this post was submitted on 11 Jul 2023
4 points (83.3% liked)

Actually Useful AI

Greetings Citizens of Hopefully Useful AI.

It has come to my attention that there are plenty of videos, as well as workflows, that would be much better if their audio content could be turned into text.

That being said, I hear that Whisper has been, at least for the past 9 months or so, the cream of the crop when it comes to speech recognition. It is also open source to boot (shocker).

Therefore, I'd be quite pleased to know if anyone has created a way to use the model more easily, because dedicating mental space to remembering specific ad hoc commands does not make for a good long-term tool.

For reference, I can throw 24 GB of VRAM at the problem if need be, and I am running a Windows machine. Is there anything like Oobabooga or A1111? (A standard program would work just as nicely.) That would be very much appreciated.

Type in your answer, and ENRICH the future of Lemmy with your knowledge. (As well as answer one's question, pretty please.)


Thank you very much for reading and have a most fine of days!

top 2 comments
[email protected] 6 points 1 year ago (1 child)

https://github.com/ahmetoner/whisper-asr-webservice
This project might be of interest to you. It is a web service/API for transcribing with Whisper. You can either use its web page or make programmatic calls to the API.
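For the programmatic route, here is a rough sketch of what a call could look like from Python, using only the standard library. It assumes the service is running locally on port 9000 and exposes a POST `/asr` endpoint that takes the file in a multipart field named `audio_file`, with `task` and `output` as query parameters. That is how I remember the project's README, so check its docs before relying on any of it. The function names and the example file path are made up for illustration.

```python
import os
import urllib.parse
import urllib.request
import uuid


def build_asr_request(audio_bytes, filename, base_url="http://localhost:9000",
                      task="transcribe", output="txt"):
    """Build a multipart POST request for the service's /asr endpoint.

    Assumptions (not confirmed in this thread): the endpoint path, the
    "audio_file" field name, and the "task"/"output" query parameters.
    """
    query = urllib.parse.urlencode({"task": task, "output": output})
    boundary = uuid.uuid4().hex  # random separator for the multipart body
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (f'Content-Disposition: form-data; name="audio_file"; '
         f'filename="{filename}"\r\n').encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/asr?{query}",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path, **kwargs):
    """Send one audio file to the service and return the response as text."""
    with open(path, "rb") as f:
        request = build_asr_request(f.read(), os.path.basename(path), **kwargs)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Example call (hypothetical file name; needs the service to be running):
# print(transcribe("meeting.mp3"))
```

Wrapping the call in a small function like this means there is one thing to remember instead of a long command line, which seems to be what the question is after.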

[email protected] 1 point 1 year ago

Oh, this is quite interesting. Quite interesting indeed! I approve of this. Seems like it may be exactly what I was looking for.

Much appreciated!