Technology

Microsoft’s AI boss thinks it’s perfectly OK to steal content if it’s on the open web (www.theverge.com)

submitted 5 months ago by [email protected] to c/[email protected]

142 comments


[–] [email protected] 18 points 5 months ago* (last edited 5 months ago) (3 children)

Yeah, I'm not a fan of AI, but I'm generally of the view that anything posted on the internet and visible without a login is fair game for indexing by a search engine, snapshotting for a backup (like the Internet Archive's Wayback Machine), or running user extensions on (including ad blockers). Is training an AI model all that different?

[–] [email protected] 6 points 5 months ago (1 children)

You can't be for piracy but against LLMs for the same reason.

And I think most of the people on Lemmy are for piracy.

[–] [email protected] 3 points 5 months ago* (last edited 5 months ago) (1 children)

I'm not in favor of piracy or LLMs. I'm also not a fan of copyright as it exists today (I think we should go back to the 1790 US definition of copyright).

I think a lot of people here on lemmy who are "in favor of piracy" just hate our current copyright system, and that's quite understandable and I totally agree with them. Having a work protected for your entire lifetime sucks.

[–] [email protected] 5 points 5 months ago (1 children)

The problem with copyright has nothing to do with term limits. Those exacerbate the problem, but the fundamental problem with copyright and IP law is that it is a system of artificial scarcity where there is no need for one.

Rather than reward creators when their information is used, we ham-fistedly try to prevent others from using that information so that people sometimes have to pay them to use it.

Capitalism is flat out the wrong system for distributing digital information, because as soon as information is digitized it is effectively infinitely abundant, which sends its value to $0.

[–] [email protected] 2 points 5 months ago

Copyright is not a capitalist idea, it's collectivist. See copyright in the Soviet Union, the initial bill of which was passed in 1925, right near the start of the USSR.

A pure capitalist system would have no copyright, and works would instead be protected through exclusivity (i.e., paywalls) and DRM. Copyright is intended to promote sharing by providing a period of exclusivity (a temporary monopoly on a work). Whether it achieves those goals is certainly up for debate.

Long terms go against any benefit to society that copyright might have. I think it does have a benefit, but that benefit is pretty limited and should probably only last 10-15 years. I think eliminating copyright entirely would leave most people worse off and probably mostly benefit large orgs that can afford expensive DRM schemes in much the same way that our current copyright duration disproportionately benefits large orgs.

[–] [email protected] 3 points 5 months ago (2 children)

Yes, it kind of is. A search engine just looks for keywords and links, and that's all it retains after crawling a site. It's not producing any derivative works; it's merely looking up an index of keywords to find matches.

An LLM can essentially reproduce a work, and the whole point is to generate derivative works. So by its very nature, it runs into copyright issues. Whether a particular generated result violates copyright depends on the license of the works it's based on and how much of those works it uses. So it's complicated, but there's very much a copyright argument there.

[–] [email protected] 7 points 5 months ago (1 children)

My brain also takes information and creates derivative works from it.

Shit, am I also a data thief?

[–] [email protected] 2 points 5 months ago

That depends: do you copy verbatim? Or do you process and understand concepts, and then create new works based on that understanding? If you copy verbatim, that's plagiarism and you're a thief. If you create your own answer, it's not.

Current AI doesn't actually "understand" anything, and "learning" is just grabbing input data. If you ask it a question, it isn't understanding anything; it just matches search terms to the parts of the training data that match, regurgitates a mix of them, and usually omits the sources. That's it.

It's a tricky line in journalism since so much of it is borrowed, and it's likewise tricky with AI, but the main difference, IMO, is attribution: good journalists cite sources; AI rarely does.

[–] [email protected] 6 points 5 months ago (1 children)

> An LLM can essentially reproduce a work, and the whole point is to generate derivative works. So by its very nature, it runs into copyright issues.

Derivative works are not copyright infringement. If LLMs are spitting out exact copies, or near-enough-to-exact copies, that’s one thing. But as you said, the whole point is to generate derivative works.

[–] [email protected] 2 points 5 months ago

> Derivative works are not copyright infringement

They absolutely are, unless it's covered by "fair use." A "derivative work" doesn't mean you created something that's inspired by a work, but that you've modified the work and then distributed the modified version.

[–] [email protected] 3 points 5 months ago

None of those things replace that content, though.

Look, I dunno if this is legally a copyright issue, but as a society, I think a lot of people have decided they're willing to yield to social media and search engine indexers, but not to AI training, you know? The same way I might consent to eating a mango but not a banana.