this post was submitted on 18 Jul 2024

TechTakes

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

Proton Mail goes AI, security-focused userbase goes ‘what on earth’ (pivot-to-ai.com)

submitted 4 months ago by [email protected] to c/[email protected]

we appear to be the first to write up the outrage coherently too. much thanks to the illustrious @self

[–] [email protected] 0 points 4 months ago (2 children)

Mistral isn't trained on copyrighted data. It's based off selective databases that were open use. This article in general is full of false information. But I suppose most people only read the headlines.

[–] [email protected] 1 points 4 months ago (1 children)

was this incorrect? https://www.patronus.ai/blog/introducing-copyright-catcher

[–] [email protected] 0 points 4 months ago (1 children)

Yes, they are incorrect:

https://arstechnica.com/information-technology/2023/07/why-ai-detectors-think-the-us-constitution-was-written-by-ai/

[–] [email protected] 1 points 4 months ago (1 children)

if you're not gonna read the fucken thing then fuck off.

[–] [email protected] 0 points 4 months ago (2 children)

I did read the thing, then provided an article explaining why detecting copyrighted material / determining if something is written by AI is very inaccurate.

Perhaps take your own advice to "read the fucken thing" next time instead of making yourself look like an idiot. Though I doubt you've ever heard of "better to stay silent and let them think you the fool than to speak and remove all doubt".

Btw, I even recall that Ars specifically covered the company you linked to in a separate article as well. I'd be glad to provide it once you've come to your senses and want to discuss things like an adult.

[–] [email protected] 1 points 4 months ago (1 children)

you're conflating "detecting ai text" with "detecting an ai trained on copyrighted material"

send the relevant article or shut up

[–] [email protected] 0 points 4 months ago (3 children)

Ignoring the logical inconsistency you just spouted for a moment (can't tell if it's written by AI but knows it used copyrighted material? Do you not hear yourself?), you do realize Mistral is released under the Apache 2.0 license, a highly permissive scheme that has no restrictions on use or reproduction beyond attribution, right?

I think it's clear you're arguing in bad faith however with no intention of changing your misinformed opinion at this point. Perhaps you'd enjoy an echo chamber like the "fuckai" Lemmy instance.

[–] [email protected] 2 points 4 months ago (1 children)

wait a minute… there’s another “fuck ai” instance and they’ve already told you to go fuck yourself?

I wonder if they want to be friends

[–] [email protected] 1 points 4 months ago* (last edited 4 months ago) (1 children)

have seen one on lemmy.world. it's kinda the dancing baby version of the stubsack and techtakes tho from what I can tell

[–] [email protected] 1 points 4 months ago (1 children)

aw, it’s only a community? that’s what I get for expecting anything but garbage from Oscar the Grouch I suppose

[–] [email protected] 1 points 4 months ago

"I love trash, baka!"

— Asuka the Grouch

[–] [email protected] 1 points 4 months ago (1 children)

You are quite dumb.

[–] [email protected] 1 points 4 months ago

the reading comprehension of a llm and the contextual capacity of a gnat

[–] [email protected] 1 points 4 months ago

holy shit you really are quite dumb. the fuck is wrong with you?

actually don’t answer that

[–] [email protected] 1 points 4 months ago (2 children)

"Mistral’s Mixtral-8x7B-Instruct-v0.1 produced copyrighted content on 22% of the prompts."

did you know that a lesser-known side effect of the infinite monkeys approach is that they will produce whole sections of copyright content abso-dupo-lutely by accident? wild, I know! totes coinkeedink!

"I’d be glad to provide it once you’ve come to your senses and want to discuss things like an adult"

jesus fucking christ you must be a fucking terrible person to work with

I've seen toddlers throw more mature tantrums

[–] [email protected] 1 points 4 months ago

she wrote harry potter with an llm, didn't she?

[–] [email protected] 0 points 4 months ago (2 children)

I'm too old to discuss against bad faith arguments.

Especially with people who won't read the information I provide them showing their initial information was wrong.

One is a company that has something to sell, the other an article with citations showing why it's not easy to determine what percentage of a data set is infringing on copyright, or whether exact reproduction via "fishing expedition" prompting is a useful metric to determine if unauthorized copyright was used in training.

The dumbest take though is attacking Mistral of all LLMs, even though it's on an Apache 2.0 license.

[–] [email protected] 1 points 4 months ago* (last edited 4 months ago) (1 children)

I've read the article you've posted: it does not refute the fucking datapoint provided, it literally DOES NOT EVEN MENTION MISTRAL AT ALL.

so all I can tell you is to take your pearlclutching tantrum bullshit and please fuck off already

[–] [email protected] 1 points 4 months ago (4 children)

god these weird little fuckers’ ability to fill a thread with garbage is fucking notable isn’t it? something about loving LLMs makes you act like an LLM. how depressing for them.

[–] [email protected] 1 points 4 months ago (1 children)

I bet they go to react conferences

[–] [email protected] 1 points 4 months ago

snrk

brava.

[–] [email protected] 1 points 4 months ago

To think that when sneer club/techtakes migrated to lemmy, I was pretty sure we would not be getting a lot of incidental traffic to the communities. Just about as wrong as you can be.

[–] [email protected] 1 points 4 months ago

high willingness to accept painfully inexact responses
high tendency to side with authority when given no information
low ability to distinguish "how it is" from "how it seems like it should be"

Meta:

default expectation that others are the same way
indignant consent-ignoring gesture if they're not

[–] [email protected] 1 points 4 months ago

"I said good day, sir. good day!" [walks through revolving door]

[–] [email protected] 1 points 4 months ago

chatgpt gets it

[–] [email protected] 1 points 4 months ago (2 children)

https://huggingface.co/mistralai/Mistral-7B-v0.1/discussions/8#6527a6fca6eaf92e6c26fa59

"Unfortunately we're unable to share details about the training and the datasets (extracted from the open Web) due to the highly competitive nature of the field."

The "open web" is full of copyrighted material.

[–] [email protected] 1 points 4 months ago* (last edited 4 months ago)

We had a social contract!

Mustafa Suleyman

[–] [email protected] 1 points 4 months ago

but it's apache2 sega! tooooootes freebies!