this post was submitted on 12 Jan 2025
660 points (98.0% liked)
Technology
In my experience AI translation is still garbage, especially for languages like Chinese, Japanese, and Korean. But if it only subtitles in the same language, such as creating English subtitles for English audio, then it is probably fine.
That's probably due more to a lack of training data than anything else. Existing models are mostly made by American companies and trained on English-language material. Naturally, the further a language is from that training data, the worse the result.
It is not the lack of training material that is the issue: it doesn't understand context and cultural references. Someone commented here that Crunchyroll's AI subtitles translated "Asura Hall" (a name) as "asshole".
It would be able to behave as if it understands context and cultural references if it had the appropriate training data, no problem.
I highly doubt it will be as good as human translation anytime soon; maybe in 10 years or so. They also have profanity filters, and they hallucinate a lot. https://www.businessinsider.com/ai-peak-data-google-deepmind-researchers-solution-test-time-compute-2025-1
Never said that.
You said that with training data it will be able to understand. I mean that even with training data it will take years, and it has other problems like hallucinations. I admit I didn't word it correctly.
*would, not will.
It is not known whether the needed training data will ever even exist. But if it did, training an AI on that data would result in great, culturally aware subtitle generation.
Are you sure it's "would"? In the sentence, you are referring to the AI understanding culture from language, which is future tense.
"Will" is future tense in the sense that it is definitely going to happen. "Would" just means there is the possibility.
And yes, I am sure that one could brute-force a solution given enough computing power and training data. Whether it would make sense (ethically and sustainability-wise) is a whole other question.
I am sure it can, because LLMs are statistical systems, as humans are to a large extent (just not as strict as a machine). If you have enough data pairing actions and responses for such cultural traditions, there is nothing to suggest that an LLM would fail to replicate them.
For English it's been great for me, yes.