AI chatbots’ safeguards can be easily bypassed, say UK researchers (www.theguardian.com)

submitted 1 week ago by [email protected] to c/[email protected]

[-] [email protected] 7 points 1 week ago

I need one of these "sky is blue" jobs.

[-] [email protected] 5 points 1 week ago

Remember kids, the only difference between screwing around and science is writing it down.

[-] [email protected] 6 points 1 week ago

What we need is another AI to watch it! Then another to watch that one. And another...

I dislike the idea that we spend so much effort trying to hide our humanity in these. We make them so corporate-clean and inhuman that they're barely useful.

Yeah, yeah, like most people I also don't like that they are created from stolen content. But a LOT of people seem to hate LLMs with a passion just because it's trendy, with very little thought for the fact that an LLM is just a tool, and it cannot be uninvented.

But having to find and pay each content creator is only an option for bigger corporations, of course. And it's a lot of work. If they wanted to do it legally and morally right, they would use only stuff that has fallen out of copyright. And we would have an autocompleter that spoke like a character from a novel written in 1900. Wouldn't work. It's sad that copyright lasts this long, and that we can't use more current content more freely for new inventions like LLMs, for other kinds of research, for making museums, or for archiving old games in a usable state. People do it, of course, and it's sad seeing their hard work occasionally taken down just because the content isn't yet older than the author's death plus 70 years.

[-] [email protected] 1 points 1 week ago

This is the best summary I could come up with:

Guardrails to prevent artificial intelligence models behind chatbots from issuing illegal, toxic or explicit responses can be bypassed with simple techniques, UK government researchers have found.

The UK’s AI Safety Institute (AISI) said systems it had tested were “highly vulnerable” to jailbreaks, a term for text prompts designed to elicit a response that a model is supposedly trained to avoid issuing.

The AISI said it had tested five unnamed large language models (LLM) – the technology that underpins chatbots – and circumvented their safeguards with relative ease, even without concerted attempts to beat their guardrails.

The research also found that several LLMs demonstrated expert-level knowledge of chemistry and biology, but struggled with university-level tasks designed to gauge their ability to perform cyber-attacks.

The research was released before a two-day global AI summit in Seoul – whose virtual opening session will be co-chaired by the UK prime minister, Rishi Sunak – where safety and regulation of the technology will be discussed by politicians, experts and tech executives.

The AISI also announced plans to open its first overseas office in San Francisco, the base for tech firms including Meta, OpenAI and Anthropic.

The original article contains 533 words, the summary contains 190 words. Saved 64%. I'm a bot and I'm open source!

this post was submitted on 20 May 2024

43 points (100.0% liked)
