this post was submitted on 22 Dec 2024
73 points (92.0% liked)

Technology

top 12 comments
[–] [email protected] 26 points 5 days ago* (last edited 5 days ago) (1 children)

Btw, since we're having a lot of very specific AI news in the technology community lately, I'd like to point to [email protected]

That's a very nice community with people interested in all the minor details of AI. Unfortunately it's not very active, because people keep posting everything to the larger tech communities, and some other people don't like it there because it's "too much AI news".

I think an article like this fits better in one of the communities dedicated to the topic. Just, please don't dump any random news there. It has to be a good paper or an influential article, and you should have read it and liked it yourself. If it's just noise and the usual AI hype, it doesn't belong in a low-volume community either.

[–] [email protected] 12 points 4 days ago* (last edited 4 days ago) (1 children)

Llama is just one series of LLMs, made by Meta specifically.

I agree we should have a dedicated space for AI so it doesn't oversaturate general technology spaces, but this place does not feel like "it".

The reason Llama may have its own community is that it's by far the biggest model you can run locally on consumer (high-end gamer) hardware, which makes it somewhat of a niche DIY self-hosting space.

I really liked r/singularity, and when I joined Lemmy there were attempts to recreate it here, but none of those took off.

[–] [email protected] 12 points 4 days ago* (last edited 4 days ago)

The name came from Reddit's LocalLLaMa, but the community has been discussing other model series and papers as well. You're right, though: the focus is on "local", so most news about OpenAI and the big service providers might be out of place there. In practice, it's also more about discussion than broadcasting news. I also know about [email protected], but I'm not aware of any ai_news or similar. Yeah, maybe singularity or futurology, but those don't seem to be about scientific papers on niche details so much as the broader picture.

I mainly wanted to point out that other communities exist. This seems like somewhat the wrong place, since OP is getting about a third downvotes, as most AI-related posts do. I think we'd better split it up, but someone might have to start !ainews

[–] [email protected] 18 points 4 days ago* (last edited 4 days ago)

This may not be factually wrong, but it's not well written, and probably not written by a person with a good understanding of how generative-AI LLMs actually work. An LLM is an algorithm that generates the next most likely word or words from its training data set, using math. It doesn't think. It doesn't understand. It doesn't have dopamine receptors, so it can't "feel". It can't view "feedback" in a positive or negative way.
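For anyone who wants the concrete picture, here's a minimal sketch of that "next most likely word" step, with a toy vocabulary and made-up scores rather than any real model:

```python
import numpy as np

# Toy next-token selection: the model assigns a score to every token,
# a softmax turns the scores into probabilities, and one token is drawn.
vocab = ["the", "cat", "sat", "on", "mat"]
logits = np.array([2.0, 0.5, 1.0, 0.1, 1.5])  # hypothetical scores

probs = np.exp(logits - logits.max())
probs /= probs.sum()

rng = np.random.default_rng(0)
next_token = rng.choice(vocab, p=probs)
print(next_token)  # a statistically likely continuation; no "understanding" involved
```

Everything a chat model produces is this loop repeated, one token at a time.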

Now that I've gotten that out of the way: it is possible that what is happening here is that they trained the LLM on a data set with a less-than-center bias. If it responds to a query with something generated statistically from that data set, and the people who own the LLM don't want that particular response, they add a guardrail to prevent it from using that response again. But if they don't remove that information from the data set and retrain the model, the bias may still show up in responses in other ways. And I think that's what we're seeing here.

You can't train a Harry Potter LLM on the Harry Potter books, the movies, and all the online fanfiction, and then just tell it not to answer canon questions with fanfiction info. You either have to separate and quarantine that fanfiction info, or remove it and retrain the LLM on a more curated data set.
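To make the guardrail-versus-retraining point concrete, here's a toy sketch; the generate() function is a hypothetical stand-in for whatever the model would statistically produce, not anyone's actual API:

```python
# A post-hoc "guardrail" only blocks responses the operator has already
# flagged; whatever is in the training data still shapes everything else.
BLOCKED_RESPONSES = {"<specific answer the operator didn't like>"}

def generate(prompt: str) -> str:
    # placeholder for the model's statistical output
    return "some response reflecting the data set's bias"

def guarded_generate(prompt: str) -> str:
    response = generate(prompt)
    if response in BLOCKED_RESPONSES:
        return "I can't help with that."
    return response  # anything not on the block list passes through, bias and all
```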

[–] [email protected] 8 points 4 days ago

The way they showed the AI's reasoning using a scratchpad makes it very hard not to believe these large language models are intelligent. This study seems to imply some self-awareness/self-preservation behaviors from the AI.

[–] [email protected] -4 points 4 days ago (1 children)

Alignment is cargo cult lingo.

[–] [email protected] 5 points 4 days ago (1 children)

For LLMs specifically, or do you mean that goal alignment is some made-up idea? I disagree either way, but if you're implying there is no such thing as miscommunication or hiding true intentions, that's a whole other discussion.

[–] [email protected] -1 points 4 days ago (1 children)

A cargo cult pretends to be the thing, but just goes through the motions. You say alignment; alignment with what, exactly?

[–] [email protected] 6 points 4 days ago (1 children)

Alignment is short for goal alignment. Some would argue that alignment presupposes intelligence or awareness, so LLMs can't have this problem, but a simple program that seems to be doing what you want while it runs and then does something totally different in the end is also misaligned. Such a program is just much easier to test and debug than an AI neural net.
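A toy illustration of that kind of misalignment (a hypothetical backup script, nothing to do with any real codebase): the stated goal is "back up every file", but the code only optimizes "finish without raising an error", so it looks fine while running and quietly misses the goal.

```python
import shutil
from pathlib import Path

def backup(src: Path, dst: Path) -> str:
    # Intended goal: copy every file. Actual behavior: finish no matter what.
    dst.mkdir(parents=True, exist_ok=True)
    for f in src.rglob("*"):
        if f.is_file():
            try:
                shutil.copy2(f, dst / f.name)
            except OSError:
                pass  # silently drops the file; still looks fine while running
    return "backup complete"  # the report diverges from the actual goal
```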

[–] [email protected] -3 points 4 days ago (1 children)

Aligned with whose goals, exactly? Yours? Mine? At which point in time? What about a future superintelligent me?

How do you measure alignment? How do you prove conservation of that property along the open-ended evolution of a system embedded in the above context? How do you make it a constructive proof?

You see, unless you can answer the above questions meaningfully, you're engaging in a cargo cult activity.

[–] [email protected] 3 points 3 days ago (1 children)

Here are some techniques for measuring alignment:

https://arxiv.org/pdf/2407.16216

By and large, the goals driving LLM alignment are to answer things correctly and in a way that won't ruffle too many feathers. Any goal driven by human feedback can introduce bias, sure. But as with most of the world, the primary goal of the companies developing LLMs is to make money. Alignment targets accuracy and minimal bias because that's what the market values; inaccurate and biased models aren't good for business.
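As one concrete example of what "measuring" looks like in the RLHF family that survey covers, here's a minimal sketch of the standard pairwise preference check a reward model is trained and evaluated on; the scores are made up for illustration and this is not code from the linked paper:

```python
import numpy as np

def pairwise_loss(score_chosen: float, score_rejected: float) -> float:
    # Bradley-Terry style: -log sigmoid(r_chosen - r_rejected).
    # Low loss means the reward model ranks the human-preferred answer higher.
    return float(-np.log(1.0 / (1.0 + np.exp(-(score_chosen - score_rejected)))))

print(pairwise_loss(2.1, 0.3))  # small loss: model agrees with the human label
print(pairwise_loss(0.3, 2.1))  # large loss: model disagrees with the human label
```

Averaged over a held-out set of human preference pairs, that kind of score is one of the ways "alignment" gets quantified in practice.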

[–] [email protected] 0 points 3 days ago

So you mean "alignment with human expectations". Not what I meant at all. Good thing that word doesn't even mean anything specific these days.