this post was submitted on 27 Nov 2024
65 points (100.0% liked)

[–] dgerard@awful.systems 13 points 2 weeks ago* (last edited 2 weeks ago)

Hugging Face is the most ethical AI company ever!

https://huggingface.co/DreamAngel/Hana_Bunny_v1.0

original archive: https://archive.is/stbxz

this is a model created to generate fake AI revenge porn nudes of a specific influencer

they switched off access only just now, because Bluesky was hammering the HF staff with "the fuck you're an ethical company"

(CONTENT WARNING: pic of AI boobs)

[–] Architeuthis@awful.systems 11 points 2 weeks ago* (last edited 2 weeks ago) (2 children)

And now they are sad and blame bsky for being toxic, on xitter, the chill and accepting app.

[–] self@awful.systems 13 points 2 weeks ago

my colleagues are kind, caring people & they were attacked (idc if I get attacked so long as it doesn't touch my company/colleagues) we've always seen love for our work, this incident shocked me

we'll keep shipping 📦💗 can't satisfy all

Don't take out your frustration from election results on them, LOSERS

it’s really jarring seeing one of the biggest hosts for generative AI projects simultaneously do “we’re just an uwu smol bean open source passion project why are you attacking us” while boosting and officially supporting chan-coded fash shit from an e/acc account

[–] mawhrin@awful.systems 11 points 2 weeks ago

the aggrieved attitude of “but i am a socialist” is really adorable.

[–] Architeuthis@awful.systems 8 points 2 weeks ago

404 added an update that the dataset was removed:

Update: Following the publication of this article on Tuesday evening, van Strien removed the dataset. "I've removed the Bluesky data from the repo," he wrote on Bluesky. "While I wanted to support tool development for the platform, I recognize this approach violated principles of transparency and consent in data collection. I apologize for this mistake."

[–] mountainriver@awful.systems 8 points 2 weeks ago (1 children)

So they named the product sucking the data after the Facehugger? At least they know that they are in the abomination business. Will they be releasing an AI named Bursting Chest?

[–] Architeuthis@awful.systems 11 points 2 weeks ago (1 children)

The company was named after the U+1F917 🤗 HUGGING FACE emoji.

HF is more of a platform for publishing this sort of thing, as well as the neural networks themselves and a specialized cloud service to train and deploy them, I think. They are not primarily a tool vendor, and they were around well before the LLM hype cycle.

[–] self@awful.systems 9 points 2 weeks ago (2 children)

to be honest, they give me a lot of mtgox vibes:

  • extremely stupid name
  • technically predates the worst excesses of the AI bubble
  • very eager to enable the worst excesses of the AI bubble

[–] dgerard@awful.systems 5 points 2 weeks ago

they also went full DARVO complaining how mean everyone had been to them over them abusing personal data and then telling us we needed to get with the program

[–] Architeuthis@awful.systems 4 points 2 weeks ago

It's also the place where you go to download models to use by yourself instead of sending all your data to the most unscrupulous people possible, so at least they've got that going for them.

[–] bizarroland@fedia.io 6 points 3 weeks ago (2 children)

I hope Bluesky sues them.

[–] mawhrin@awful.systems 6 points 2 weeks ago

i bet that bluesky worked with them to get their company and all their personal accounts established at bluesky as the primary social media channel.

i don't expect they will get anything more than a slap on the wrists, if at all.

[–] Breve@pawb.social 4 points 3 weeks ago (1 children)

This is a bit of a nuanced issue though. The person merely published a dataset made from publicly available data that anyone can re-create themselves using the Bluesky Firehose API. Could it be used to train a model? Yes, but that isn't the only use case and the person who posted it has no control over what other people use it for. If someone does train a model using it then that's their legal issue to work out, not the publisher's.

It's the same argument billionaires were using to justify silencing people who posted the movement of their private jets. The billionaires argued that this data could be used to harass them, but the posters argued the data is public and they aren't responsible for what other people do with it.

[–] bizarroland@fedia.io 6 points 2 weeks ago (2 children)

The legal system is the perfect place for working out nuanced issues like this.

If I were a lawyer and making this lawsuit I would argue that "publicly available" does not mean "public domain", and that without acquiring usage rights for the data then you don't have the right to use the data.

If the courts rule against a decision like this, then that would mean that any website that hosts any materials that can be accessed without an account must then provide that material, free of charge, to any person who accesses it, which is a gigantic consequence of this nuanced issue.

[–] dgerard@awful.systems 6 points 2 weeks ago

I'd just bury them in GDPR notices, which quite a few people did

[–] Breve@pawb.social 1 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

My point is that you can't talk about usage rights of a dataset without talking about a specific use case. The suggested use case was to provide a static test dataset for systems developed to use the firehose API, but the dataset could be used for literally anything from making funny memes (fair use) to training an LLM (arguably not fair use). Does the existence of an illegal use case automatically mean the dataset itself should be illegal though?

As a corollary, a photocopier can be used to create unauthorized reproductions of copyrighted works. Should making and distributing photocopiers be illegal because they are capable of and used in the process of violating copyright law, or should we accept the photocopier absent of a use case isn't breaking any laws and go after the people who use them to illegally create unauthorized reproductions?

[–] davidagain@lemmy.world 5 points 2 weeks ago

A data set isn't like a photocopier in any meaningful way.

It's not a tool, it's information, and some of it counts as personal data under EU data protection laws.