Report: Potential NYT lawsuit could force OpenAI to wipe ChatGPT and start over (arstechnica.com)

submitted 1 year ago by [email protected] to c/[email protected]

cross-posted from: https://nom.mom/post/121481

OpenAI could be fined up to $150,000 for each piece of infringing content. https://arstechnica.com/tech-policy/2023/08/report-potential-nyt-lawsuit-could-force-openai-to-wipe-chatgpt-and-start-over/#comments

[–] [email protected] 187 points 1 year ago (117 children)

Good

AI should not be given free rein to train on anything and everything we’ve ever created. Copyright holders should be able to decide if their works are allowed to be used for model training, especially commercial model training. We’re not going to stop a hobbyist, but Google/Microsoft/OpenAI should be paying for materials they’re using and compensating the creators.

[–] [email protected] 123 points 1 year ago* (last edited 1 year ago) (4 children)

While that’s understandable, I think it’s important to recognize that this is something where we’re going to have to tread pretty carefully.

If a human wants to become a writer, we tell them to read. If you want to write science fiction, you should study the craft of writing, ranging from plots and storylines to character development to Stephen King’s advice on avoiding adverbs. You also have to read science fiction so you know what has been done, how the genre handles storytelling, what is allowed versus shunned, and how the genre evolved and where it’s going. The point is not to write exactly like Heinlein (god forbid), but to throw Heinlein into the mix with other classic and contemporary authors.

Likewise, if you want to study fine art, you do so by studying other artists. You learn about composition, perspective, and color by studying works of other artists. You study art history, broken down geographically and by period. You study da Vinci’s subtle use of shading and Mondrian’s bold colors and geometry. Art students will sit in museums for hours reproducing paintings or working from photographs.

Generative AI is similar. Being software (and at a fairly early stage at that), it’s both more naive and in some ways more powerful than human artists. Once trained, it can crank out a hundred paintings or short stories per hour, but some of the people will have 14 fingers and the stories might be formulaic and dull. AI art is always better when glanced at on your phone than when looked at in detail on a big screen.

In both the cases of human learners and generative AI, a neural network(-like) structure is being conditioned to associate weights between concepts, whether it’s how to paint a picture or how to create one by using 1000 words.

A friend of mine who was an attorney used to say “bad facts make bad law.” It means that misinterpretation, over-generalization, politicization, and a sense of urgency can make for both bad legislation and bad court decisions. That’s especially true when the legislators and courts aren’t well educated in the subjects they’re asked to judge.

In a sense, it’s a new technology that we don’t fully understand - and by “we” I’m including the researchers. It’s theoretically and in some ways mechanically grounded in old technology that we also don’t understand - biological neural networks and complex adaptive systems.

We wouldn’t object to a journalism student reading articles online to learn how to write like a reporter, and we rightfully feel anger over the situation of someone like Aaron Swartz. As a scientist, I want my papers read by as many people as possible. I’ve paid thousands of dollars per paper to make sure they’re freely available and not stuck behind a paywall. On the other hand, I was paid while writing those papers. I am not paid for the paper, but writing the paper was part of my job.

I realize that is a case of the copyright holder (me) opening up my work to whoever wants a copy. On the other other hand, we would find it strange if an author forbade their work being read by someone who wants to learn from it, even if they want to learn how to write. We live in a time where technology makes things like DRM possible, which attempts to make it difficult or impossible to create a copy of that work. We live in societies that will send people to prison for copying literal bits of information without a license to do so. You can play a game, and you can make a similar game. You can play a thousand games, and make one that blends different elements of all of them. But if you violate IP, you can be sued.

I think that’s what it comes down to. We need to figure out what constitutes intellectual property and what rights go with it. What constitutes cultural property, and what rights do people have to works made available for reading or viewing? It’s easy to say that a company shouldn’t be able to hack open a paywall to get at WSJ content, but does that also go for people posting open access to Medium?

I don’t have the answers, and I do want people treated fairly. I recognize the tremendous potential for abuse of LLMs in generating viral propaganda, and I recognize that in another generation they may start making a real impact on the economy in terms of dislocating people. I’m not against legislation. I don’t expect the industry to regulate itself, because that’s not how the world works. I’d just like for it to be done deliberately and realistically and with the understanding that we’re not going to get it right and will have to keep tuning the laws as the technology and our understanding continue to evolve.

[–] [email protected] 21 points 1 year ago

Sorry this is a bit too level-headed for me, can you please repeat with a bullhorn, and use 4-letter words instead? I need to know who to blame here.

[–] [email protected] 15 points 1 year ago

This is an astonishingly well written, nuanced, and level headed response. Really on a level I'm not used to seeing on this platform.

[–] [email protected] 5 points 1 year ago

Well written, sir.

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

Both an AI and an art student are complex webs of weights that take inputs and return an output. Agreed.

But the inputs are vastly different. An art student has all the inputs of every moment leading up to the point of putting paint to canvas. Emotion, hunger, pain, and every moment that life has thrown at them. All of them lead to very different results. Every art piece affects the subsequent ones.

The AI on the other hand is purely derivative. It’s only ever told about pre-existing art and a brief interpretation of it. It does not feel emotion. It does not worry about paying its bills or falling in love. It builds a map of weights once and that is that. Every input repeated however many times will yield exactly the same output.

And yes, you have the artists who are professional plagiarists, making hand-painted Picasso imitations of someone’s chihuahua for $20 over the internet. But they’re not mass producing derivative work by the thousands.

I fully agree with the shit-in, shit-out sentiment, and researchers should be free to train their models on whatever data they need.

But monetising their models, which by definition generate derivative works, is another matter.

[–] [email protected] 1 points 1 year ago

How do you know it is purely derivative? Are you saying an AI can't write a sentence that has never been written before or are you saying that it can't have an original thought? If it is writing a brand new sentence that is an amalgamation of many other writings how is that violating a copyright (or any different than a human doing it)? The copyright claims are absurd.
