TechTakes

1430 readers

105 users here now

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 1 year ago

MODERATORS

[email protected]

105

in absolutely the funniest outcome so far, you can send data to an LLM that pops a Remote Code Execution vulnerability (mastodon.social)

submitted 7 months ago* (last edited 7 months ago) by [email protected] to c/[email protected]

13 comments fedilink hide all child comments

courtesy @self

preprint: https://arxiv.org/pdf/2309.02926
blackhat abstract: https://www.blackhat.com/asia-24/briefings/schedule/index.html#llmshell-discovering-and-exploiting-rce-vulnerabilities-in-real-world-llm-integrated-frameworks-and-apps-37215
Tong Liu's related research: https://scholar.google.com/citations?hl=en&user=egWPi_IAAAAJ

can't wait for the crypto spammers to hit every web page with a ChatGPT prompt. AI vs Crypto: whoever loses, we win

you are viewing a single comment's thread
view the rest of the comments

[–] [email protected] 8 points 7 months ago (1 children)

you appear to be posting this in good faith so I won't start at my usual level, but .. what? do you realize that you didn't make a substantive contribution to the particular thing observed here, which is that somewhere in the mishmash dogshit that is popular LLM hosting there are reliable ways to RCE it with inputs? I think maybe (maybe!) you meant to, but you didn't really touch on it at all

other than that:

Basically, the more work you take away from the LLM, the more reliable everything will work.

people here are aware, yes, and it stays continually entertaining

[–] [email protected] 19 points 7 months ago (2 children)

I think they were responding to the implication in self's original comment that LLMs were claiming to evaluate code in-model and that calling out to an external python evaluator is 'cheating.' But actually as far as I know it is pretty common for them to evaluate code using an external interpreter. So I think the response was warranted here.

That said, that fact honestly makes this vulnerability even funnier because it means they are basically just letting the user dump whatever code they want into eval() as long as it's laundered by the LLM first, which is like a high-school level mistake.

[–] [email protected] 10 points 7 months ago

Yeah, that was exactly my intention.

[–] [email protected] 6 points 7 months ago

From reading the paper I'm not sure which is more egregious, the frameworks that pass code and/or use exec directly without checking, or the ones that rely on the LLM to do the checking (based on the fact that some of the CVEs require LLM prompt jailbreaking)

If you wanted to be exceedingly charitable, you could try and make the maintainers of said framework claim that "of course none of this should be used with unsanitized inputs open to the public, it's merely a productivity boost tool that you would run on your own machine, don't worry about possible prompts being evaluated by our agent from top bing results, don't use this for anything REAL."