this post was submitted on 08 Jan 2024

334 points (96.1% liked)

Technology

59648 readers

2906 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related content.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below; to ask whether your bot can be added, please contact us.
Check for duplicates before posting; duplicates may be removed.

Approved Bots


Microsoft, OpenAI sued for copyright infringement by nonfiction book authors in class action claim (www.cnbc.com)

submitted 10 months ago by [email protected] to c/[email protected]

62 comments

The new copyright infringement lawsuit against Microsoft and OpenAI comes a week after The New York Times filed a similar complaint in New York.

top 50 comments


[–] [email protected] 39 points 10 months ago

I wish the protections placed on corporate control of cultural and intellectual assets were placed on the average person's privacy instead.

Like, I really don't care that someone's publicly available books and movies from the last century are analysed and used to create tools, but I do care that, without people's actual knowledge, an intense surveillance apparatus is being built to collect every minute piece of data about their lives and the lives of those around them, to be sold without ethical oversight or consent.

IP is bull, but privacy is a real concern. No one is going to use an extra copy of a NY Times article to hurt someone, but surveillance is used by authoritarians to oppress and harass innocent people.

[–] [email protected] 23 points 10 months ago* (last edited 10 months ago) (12 children)

I'm not a huge fan of Microsoft or even OpenAI by any means, but all these lawsuits just seem so... lazy and greedy?

It isn't like ChatGPT is just spewing out the entirety of their works in a single chat. In that context, I fail to see how seeing snippets of said work returned in a Google summary is any different than ChatGPT or any other LLM doing the same.

Should OpenAI and other LLM creators use ethically sourced data in the future? Absolutely. They should've been doing so all along. But to me, rich chumps like George R. R. Martin complaining that their data was stolen and profited off of without their knowledge just feels a little ironic.

Welcome to the rest of the 6+ billion people on the Internet who've been spied on, data mined, and profited off of by large corps for the last two decades. Where's my god damn check? Maybe regulators should've put tougher laws and regulations in place long ago to protect all of us against this sort of shit, not just businesses and wealthy folk able to afford launching civil suits on shaky grounds. It's not like deep learning models are anything new.

Edit:

Already seeing people come in to defend these suits. I just see it like this: AI is a tool, much like a computer or a pencil are tools. You can use a computer to copyright infringe all day, just like a pencil can. To me, an AI is only going to be plagiarizing or infringing if you tell it to. How often does AI plagiarize without a user purposefully trying to get it to do so? That's a genuine question.

Regardless, the cat's out of the bag. Multiple LLMs are already out in the wild and more variations are made each week, and there's no way in hell they're all going to be reined in. I'd rather AI not exist, personally, as I don't see protections coming for normal workers over the next decade or two against further evolutions of the technology. But, regardless, good luck to these companies fighting the new Pirate Bay-esque legal wars for the next couple of decades.

[–] [email protected] 18 points 10 months ago (12 children)

Already seeing people come in to defend these suits. I just see it like this: AI is a tool, much like a computer or a pencil are tools. You can use a computer to copyright infringe all day, just like a pencil can. To me, an AI is only going to be plagiarizing or infringing if you tell it to. How often does AI plagiarize without a user purposefully trying to get it to do so? That’s a genuine question.

You are misrepresenting the issue. The issue here is not whether a tool happens to be usable for copyright infringement in the hands of a malicious entity. The issue is whether LLM outputs are just derivative works of their training data. You cannot compare that to tools like pencils and PCs, which are much more general-purpose and which are not built on stolen copyrighted works. Notice also how AI companies bring up "fair use" in their arguments. This means they are not arguing that they aren't using copyrighted works without permission, nor that the LLM's output contains no copyrighted part of its training data (they can't argue that, because you can't trace the flow of data through an LLM), but rather that their use of the works is novel enough to qualify as an exception. And that is a really shaky argument when their services are not actually novel at all. In fact, they are designing services that are as close as possible to the services provided by the original creators of the works.


[–] [email protected] 10 points 10 months ago* (last edited 10 months ago) (2 children)

deleted

[–] [email protected] 1 points 10 months ago

Sure. Trickle-down FTW.

[–] [email protected] 1 points 10 months ago (15 children)

It's wild to me how so many people seem to have gotten it into their heads that cheering for the IP laws corporations fought so hard for is somehow left-wing and sticking up for the little guy.

[–] [email protected] 9 points 10 months ago* (last edited 10 months ago) (1 children)

deleted

[–] [email protected] 1 points 10 months ago* (last edited 10 months ago)

Just a heads-up, libertarian is usually understood, in the American sense, as meaning right libertarian, including so-called anarcho-capitalists. It's understood to mean people who believe that the right to own property is absolutely fundamental. Many libertarians don't believe in intellectual property but some do. Which is to say that in American parlance, the label "libertarian" would probably include you. Just FYI.

Also, I don't know what definition of "left" you are using, but it's not a common one. Left ideologies typically favor progress, including technological progress. They also tend to be critical of property, and (AFAIK universally) reject forms of property that allow people to draw unearned rents. They tend to side with the wider interests of the public over an individual's right to property. The grandfather comment is perfectly consistent with left ideology.


[–] [email protected] 6 points 10 months ago (2 children)

If I want to be able to argue that having any copyleft stuff in the training dataset makes all the output copyleft -- and I do -- then I necessarily have to also side with the rich chumps as a matter of consistency. It's not ideal, but it can't be helped. ¯\_(ツ)_/¯

[–] [email protected] 4 points 10 months ago (1 children)

Wait. I first thought this was sarcasm. Is this sarcasm?

[–] [email protected] 2 points 10 months ago* (last edited 10 months ago) (1 children)

No. I really do think that all AI output should be required to be copyleft if there's any copyleft in the training dataset (edit for clarity: unless there's also something else with an incompatible license in it, in which case the output isn't usable at all -- but protecting copyleft is the part I care about).

[–] [email protected] 1 points 10 months ago (1 children)

Huh. Obviously, you don't believe that a copyleft license should trump other licenses (or lack thereof). So, what are you hoping this to achieve?

[–] [email protected] 2 points 10 months ago (1 children)

Obviously, you don’t believe that a copyleft license should trump other licenses (or lack thereof)

I'm not sure what you mean. No licenses "trump" any other license; that's not how it works. You can only make something that's a derivative work of multiple differently-licensed things if the terms of all the licenses allow it, something the FSF calls "compatibility." Obviously, a proprietary license can never be compatible with a copyleft one, so what I'm hoping to achieve is a ruling that says any AI whose training dataset included both copyleft and proprietary items has completely legally-unusable output. (And also that any AI whose training dataset includes copyleft items along with permissively-licensed and public domain ones must have its output be copyleft.)

[–] [email protected] 1 points 10 months ago

Yes, but what do you hope to achieve by that?

[–] [email protected] 3 points 10 months ago (1 children)

In your mind are the publishers the rich chumps, or Microsoft?

For copyleft to work, copyright needs to be strong.

[–] [email protected] 0 points 10 months ago

I was just repeating the language the parent commenter used (probably should've quoted it in retrospect). In this case, "rich chumps" are George R.R. Martin and other authors suing Microsoft.

[–] [email protected] 2 points 10 months ago (1 children)

I fail to see how seeing snippets of said work returned in a Google summary is any different than ChatGPT or any other LLM doing the same.

Just because it was available on the public internet doesn't mean it was available legally. Google has a way to remove content from its index when asked, while OpenAI seems to have no way (or no will) to do so.

[–] [email protected] 8 points 10 months ago* (last edited 10 months ago)

deleted


[–] [email protected] 3 points 10 months ago

All the grifters coming out to feed 🫣

[–] [email protected] 3 points 10 months ago (1 children)

If it's not infringement to input copyrighted materials, then it's not infringement to take the output.

[–] [email protected] 8 points 10 months ago (13 children)

Because... why?
