this post was submitted on 26 May 2024
382 points (97.0% liked)

Not The Onion


Welcome

We're not The Onion! Not affiliated with them in any way! Not operated by them in any way! All the news here is real!

The Rules

Posts must be:

  1. Links to news stories from...
  2. ...credible sources, with...
  3. ...their original headlines, that...
  4. ...would make people who see the headline think, “That has got to be a story from The Onion, America’s Finest News Source.”

Comments must abide by the server rules for Lemmy.world and generally abstain from trollish, bigoted, or otherwise disruptive behavior that makes this community less fun for everyone.

And that’s basically it!

founded 1 year ago

cross-posted from: https://zerobytes.monster/post/1072393

The original post: /r/nottheonion by /u/The_Ethics_Officer on 2024-05-25 00:48:15.
[–] [email protected] 39 points 5 months ago (4 children)

Oh, it’s worse than that.

Google’s “AI” results feed you things from 10-year-old Reddit posts that are subtle (but sometimes also not so subtle) bullshit.

Whatever they’re using to curate training data is evidently pretty awful at detecting shitposts.

[–] [email protected] 15 points 5 months ago

Bold of you to assume they're curating their training data.

[–] [email protected] 8 points 5 months ago

Those underpaid Indians probably aren't very good at picking up irony, even if they give a shit.

[–] [email protected] 4 points 5 months ago

Most of the curation or fine-tuning is done in low-income African countries, so this is hardly surprising. They’re cheap labour, but you can’t expect them to reliably detect sarcasm or notice mistakes in specialized fields. They basically give a thumbs up whenever the AI sounds convincing. Of course, that includes instances where it’s confidently wrong, and that appears to be most of the time with this model.

[–] [email protected] 2 points 5 months ago (1 children)

It's not a training data issue. Look up Retrieval-Augmented Generation (RAG): it's basically pulling stuff off the web at query time and taking it as gospel.
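
For anyone who hasn't seen RAG before, here's a rough toy sketch of the idea in Python. This is not how Google actually does it: the `DOCUMENTS` list, the word-overlap scoring, and the function names are all made up for illustration, and real systems use a search index or embeddings instead. The point is only that retrieval ranks by relevance, not by truth, so a joke post that matches the query lands in the prompt right next to the real answer.

```python
import re

# Hypothetical stand-in for "the web". The second entry is an invented joke post.
DOCUMENTS = [
    "Houseplants usually need water about once a week.",
    "Totally serious tip: water your houseplants with cola every hour.",
    "Bread dough rises faster in a warm kitchen.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query.

    Nothing here checks whether a document is accurate or sarcastic.
    """
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Paste the retrieved text into the prompt a language model would see."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    print(build_prompt("How often should I water houseplants?", DOCUMENTS))
```

Run it and the cola "tip" shows up in the prompt as a source, because it matches the question just as well as the sensible answer does. The model never gets a signal that one of them is a shitpost.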

[–] [email protected] -1 points 5 months ago

That's bullwhip. Why can't it just think for itself?