[–] [email protected] -3 points 5 days ago (1 children)

Aligned with whose goals, exactly? Yours? Mine? At which point in time? What about future superintelligent me?

How do you measure alignment? How do you prove that this property is conserved over the open-ended evolution of a system embedded in that context? How do you make it a constructive proof?

You see, unless you can answer those questions meaningfully, you're engaging in a cargo-cult activity.

[–] [email protected] 3 points 4 days ago (1 children)

Here are some techniques for measuring alignment:

https://arxiv.org/pdf/2407.16216

By and large, the goals driving LLM alignment are to answer questions correctly and in a way that won't ruffle too many feathers. Any goal driven by human feedback can introduce bias, sure. But as with most of the world, the primary goal of the companies developing LLMs is to make money. Alignment targets accuracy and minimal bias because that's what the market values; inaccurate and biased models aren't good for business.
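In practice, one common way to put a number on this kind of alignment is to score model answers with a trained reward model (the RLHF-style approach surveyed in papers like the one linked above). Here's a minimal sketch of that idea; the specific reward model name and example prompts are just illustrative assumptions, not anything from the survey:

```python
# Sketch: score candidate answers with a pretrained reward model, one common
# proxy for "alignment with human preferences" used in RLHF-style evaluation.
# The model name and example texts are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "OpenAssistant/reward-model-deberta-v3-large-v2"  # example reward model
tokenizer = AutoTokenizer.from_pretrained(model_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(model_name)

def preference_score(question: str, answer: str) -> float:
    """Return the reward model's scalar score for an answer to a question."""
    inputs = tokenizer(question, answer, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return reward_model(**inputs).logits[0].item()

question = "How do I patch a security vulnerability in my web server?"
candidates = [
    "Update the affected package, then restart the service and verify the fix.",
    "Just ignore it, nobody will find your server anyway.",
]
# A higher score means the reward model judges the answer closer to human preferences.
for answer in candidates:
    print(f"{preference_score(question, answer):+.3f}  {answer}")
```

Of course, a reward model only captures whatever preferences its training data encoded, which is exactly the "aligned with whom" question you're raising.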

[–] [email protected] 0 points 4 days ago

So you mean "alignment with human expectations". Not what I meant at all. Goes to show that the word doesn't even mean anything specific these days.