this post was submitted on 12 Jan 2025
659 points (98.0% liked)


cross-posted from: https://lemmy.ca/post/37011397

[email protected]

The popular open-source VLC video player was demonstrated on the floor of CES 2025 with automatic AI subtitling and translation, generated locally and offline in real time. Parent organization VideoLAN shared a video on Tuesday in which president Jean-Baptiste Kempf shows off the new feature, which uses open-source AI models to generate subtitles for videos in several languages. 
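For a sense of what "generated locally and offline" can look like in practice, here is a minimal sketch that writes an SRT file with the open-source openai-whisper package. It is batch rather than real time, and it is not VLC's actual implementation, just an illustration; the model name and file paths are placeholders.

```python
# A sketch of fully local subtitle generation with an open-source
# speech-to-text model (openai-whisper). Not VLC's code.
import whisper  # pip install openai-whisper; needs ffmpeg, runs offline after the model downloads


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def generate_srt(media_path: str, out_path: str, model_name: str = "small") -> None:
    model = whisper.load_model(model_name)   # runs on the local CPU/GPU
    result = model.transcribe(media_path)    # language is auto-detected
    with open(out_path, "w", encoding="utf-8") as f:
        for i, seg in enumerate(result["segments"], start=1):
            f.write(f"{i}\n")
            f.write(f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n")
            f.write(seg["text"].strip() + "\n\n")


if __name__ == "__main__":
    # Hypothetical file names:
    generate_srt("movie.mkv", "movie.srt")
```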

[–] [email protected] 280 points 2 days ago (3 children)

Finally, some good fucking AI

[–] [email protected] 168 points 2 days ago (1 children)

I was just thinking, this is exactly what AI should be used for. Pattern recognition, full stop.

[–] [email protected] 67 points 2 days ago (2 children)

Yup, and if it isn't perfect that is ok as long as it is close enough.

Like getting name spellings wrong or mixing homophones is fine because it isn't trying to be factually accurate.

[–] [email protected] 34 points 2 days ago (5 children)

Problem is that now people will say that they don't need to create accurate subtitles because VLC is doing the job for them.

Accessibility might suffer from that, because all subtitles are now just "good enough".

[–] [email protected] 25 points 2 days ago

Regular old live broadcast closed captioning is pretty much 'good enough' and that is the standard I'm comparing it to.

Actual subtitles created ahead of time should be perfect because they have the time to double check.

[–] [email protected] 32 points 2 days ago

Or they can get OK ones with this tool, and fix the errors. Might save a lot of time

[–] [email protected] 12 points 2 days ago (1 children)

Honestly though? If your audio is even half decent you’ll get like 95% accuracy. Considering a lot of media just wouldn’t have anything, that is a pretty fair trade off to me

[–] [email protected] 6 points 2 days ago* (last edited 2 days ago) (2 children)

From experience, AI translation is still garbage, especially for languages like Chinese, Japanese, and Korean, but if it only creates subtitles in the original language, such as English subtitles for English audio, then it is probably fine.

[–] [email protected] 2 points 2 days ago (1 children)

That's probably more due to lack of training than anything else. Existing models are mostly made by American companies and trained on English-language material. Naturally, the further you get from that training material, the worse the result.

[–] [email protected] 0 points 2 days ago (1 children)

It is not the lack of training material that is the issue; it doesn't understand context and cultural references. Someone commented here that Crunchyroll's AI subtitles translated Asura Hall, a name, as "asshole".

[–] [email protected] 1 points 2 days ago (1 children)

It would be able to behave like it understands context and cultural references if it had the appropriate training data, no problem.

[–] [email protected] 1 points 2 days ago* (last edited 2 days ago) (1 children)

I highly doubt that it will be as good as human translation anytime soon, maybe in around 10 years or so. Also, they have profanity filters and they hallucinate a lot. https://www.businessinsider.com/ai-peak-data-google-deepmind-researchers-solution-test-time-compute-2025-1

[–] [email protected] 1 points 1 day ago (1 children)

You said that with training data it will be able to understand. I mean that even with training data it will take years and it also has other problems like hallucinations. I admit, I didn't word it correctly.

[–] [email protected] 1 points 1 day ago (1 children)

*would, not will.

It is not known if the needed training data will ever even exist. But if it did, training an AI with that data would result in great, culturally aware subtitle generation.

[–] [email protected] 1 points 1 day ago (1 children)

Are you sure it is "would"? In the sentence you are referring to, the AI understanding culture from language is in the future tense.

[–] [email protected] 1 points 1 day ago

"Will" is future tense in the sense that it is definitely gonna happen. "Would" just means there is the possibility.

And yes, I am sure that one could brute-force a solution with enough computing power and training data. Whether it would make sense (ethically and sustainability-wise) is a whole other question.

I am sure it can, because LLMs are statistical systems, as humans are to a large extent (just not as strict as a machine). If you have enough data on actions and responses around such cultural traditions, there is nothing to suggest that an LLM would fail to replicate that.

[–] [email protected] 1 points 2 days ago

For English it’s been great for me, yes.

[–] [email protected] 11 points 2 days ago

I have a feeling that if you care enough about subtitles you're going to look for good ones, instead of using "ok" AI subs.

[–] [email protected] 2 points 2 days ago* (last edited 2 days ago) (1 children)

I imagine it would be not-exactly-simple-but-not-complicated to add a "threshold" feature. If the AI is less than X% certain, it can request human clarification.

Edit: Derp. I forgot about the "real time" part. Still, as others have said, even a single botched word would still work well enough with context.
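Something along these lines, assuming the speech-to-text model exposes a per-segment confidence score (Whisper-style segments have an avg_logprob field; the cutoff below is a made-up number, not a tuned value):

```python
# Sketch of the "threshold" idea: keep segments the model is confident about,
# flag the rest for a human pass. Assumes Whisper-style segments that carry
# an avg_logprob field; -0.8 is an illustrative cutoff, not a tested one.
def split_by_confidence(segments, cutoff=-0.8):
    confident, needs_review = [], []
    for seg in segments:
        (confident if seg["avg_logprob"] >= cutoff else needs_review).append(seg)
    return confident, needs_review
```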

[–] [email protected] 1 points 2 days ago* (last edited 2 days ago) (1 children)

That defeats the purpose of doing it in real time as it would introduce a delay.

[–] [email protected] 1 points 2 days ago

Derp. You're right, I've added an edit to my comment.

[–] [email protected] 15 points 2 days ago (1 children)

I'd like to see this fix the most annoying part about subtitles: timing. Find a transcript or any subs on the Internet and have the AI align them with the audio properly.
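Even a crude version of that seems doable today, outside VLC. A sketch, assuming the existing subs are only off by a constant delay and that the first cue should line up with the first speech the model detects (pysrt and openai-whisper are my own choices here, not anything VLC ships):

```python
# Rough re-timing sketch: estimate a constant offset from where speech
# actually starts, then shift every cue by it. Needs ffmpeg on the PATH.
import pysrt    # pip install pysrt
import whisper  # pip install openai-whisper


def resync_constant_offset(srt_path: str, media_path: str, out_path: str) -> None:
    subs = pysrt.open(srt_path)
    segments = whisper.load_model("base").transcribe(media_path)["segments"]
    if not subs or not segments:
        return  # nothing to align
    # Difference between where speech actually starts and where the first cue says it does.
    first_cue_ms = subs[0].start.ordinal  # SubRipTime.ordinal is in milliseconds
    offset_ms = int(round(segments[0]["start"] * 1000)) - first_cue_ms
    subs.shift(milliseconds=offset_ms)
    subs.save(out_path, encoding="utf-8")


# Hypothetical file names:
# resync_constant_offset("show.srt", "show.mkv", "show.resynced.srt")
```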

[–] [email protected] 2 points 2 days ago

YES! I can't stand when subtitles are misaligned to the video. If this AI tool could help with that, it would be super useful.

[–] [email protected] 12 points 2 days ago* (last edited 2 days ago)

Yeah, it’s pretty wonderful to see how far auto-generated transcription/captioning has come over the last couple of years. A wonderful victory for many communities with various disabilities.

[–] [email protected] 0 points 2 days ago

Finally some good AI fucking 🤭