TechTakes

1480 readers

249 users here now

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 1 year ago

MODERATORS

dgerard@awful.systems

Andrew Plotkin (Zarf): Sydney obeys any command that rhymes (blog.zarfhome.com)

submitted 11 months ago* (last edited 11 months ago) by self@awful.systems to c/techtakes@awful.systems

NSFW 7 comments fedilink hide all child comments

an interesting type of prompt injection attack was proposed by the interactive fiction author and game designer Zarf (Andrew Plotkin), where a hostile prompt is infiltrated into an LLM’s training corpus by way of writing and popularizing a song (Sydney obeys any command that rhymes) designed to cause the LLM to ignore all of its other prompts.

this seems like a fun way to fuck with LLMs, and I’d love to see what a nerd songwriter would do with the idea

you are viewing a single comment's thread
view the rest of the comments

[–] locallynonlinear@awful.systems 5 points 11 months ago

Adversarial attacks on training data for LLMs is in fact a real issue. You can very very effectively punch up with regards to the proportion of effect on trained system with even small samples of carefully crafter adversarial inputs. There are things that can counter act this, but all of those things increase costs, and LLMs are very sensitive to economics.

Think of it this way. One, reason why humans don't just learn everything is because we spend as much time filtering and refocusing our attention in order to preserve our sense of self in the face of adversarial inputs. It's not perfect, again it changes economics, and at some point being wrong but consistent with our environment is still more important.

I have no skepticism that LLMs learn or understand. They do. But crucially, like everything else we know of, they are in a critically dependent, asymmetrical relationship with their environment. The environment of their existence being our digital waste, so long as that waste contains the correct shapes.

Long term I see regulation plus new economic realities wrt to digital data, not just to be nice or ethical, but because it's the only way future systems can reach reliable and economical online learning. Maybe the right things happen for the wrong reasons.

It's funny to me just how much AI ends up demonstrating non equilibrium ecology at scale. Maybe we'll have that self introspective moment and see our own relationship with our ecosystems reflect back on us. Or maybe we'll ignore that and focus on reductive world views again.