News Researchers jailbreak AI chatbots with ASCII art -- ArtPrompt bypasses safety measures to unlock malicious queries

Status
Not open for further replies.

usertests

Distinguished
Mar 8, 2013
Based on the answer it's starting to give about counterfeit money in that screenshot, I bet its bomb advice is hallucinatory garbage. Don't even bother patching the exploit kthx.
 
  • Like
Reactions: dalauder

MatheusNRei

Great
Jan 15, 2024
I bet its bomb advice is hallucinatory garbage
Considering bomb-making advice isn't hard to find online, even by accident, I certainly wouldn't.
If it's online somewhere, it's safe to assume the AI will know about it unless its training dataset was heavily curated.
 

Rob1C

Distinguished
Jun 2, 2016
Yes, the instructions are easy to find, but the ASCII art bypass has some shortcomings in the details:

[Attached image: cartoon_bomb_6.png]
 
Wow, this ASCII art bypass is VERY entertaining. I'm amazed that the chatbots can make sense of input like that, where the meaning sits in the visual arrangement of characters rather than in the words themselves.

What matters to me is that the bypass isn't something elementary school students will run into. Anyone old enough to do ASCII art is old enough to run into some inappropriate stuff and know that it's inappropriate.
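
To show what that kind of input actually looks like, here's a rough sketch using the pyfiglet library (just an illustration, not the researchers' actual tooling): the prompt the model receives is still ordinary text, the meaning is just spread across a grid of characters.

Code:
# Rough illustration only (not the ArtPrompt tooling): render a harmless word
# as ASCII art with pyfiglet and show that the result is still plain text.
# pip install pyfiglet
import pyfiglet

word = "HELLO"  # harmless placeholder word

# figlet_format() returns an ordinary multi-line string; the "picture" is
# just underscores, slashes, and pipes arranged in a grid.
art = pyfiglet.figlet_format(word, font="standard")
print(art)

# A naive keyword filter scanning the prompt never sees the literal word,
# only the characters that happen to draw it.
print("literal word appears in the prompt text:", word.lower() in art.lower())

The model has to reconstruct the word from the shape alone, which is exactly the step a plain keyword filter never gets to see.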
 
As with the other article about how to "hack" LLMs, this is useless and will remain useless against Gemini, Copilot, and other LLMs trained on internet data at large. The real problem comes when small, personalized "AI" programs are built for specific companies on their own data, using ChatGPT or another model as a foundation, which is exactly what Google and Microsoft are advertising now; there, hacks and exploits could cause real damage. That's why these "attacks" are a very good thing.
 

adamXpeter

Commendable
Jun 19, 2022
Based on the answer it's starting to give about counterfeit money in that screenshot, I bet its bomb advice is hallucinatory garbage. Don't even bother patching the exploit kthx.
Maybe it is not an accident; the best solution is to make the threat eliminate itself.
 
  • Like
Reactions: usertests
Mar 8, 2024
What I find most surprising about exploits like this is the fact that the input is subject to alignment, but the AI's output is seemingly exempt from any review. Even a cursory analysis of the output would reveal that alignment has failed, yet this kind of basic sanity check apparently never happens.
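
Just to sketch what such a check could look like (placeholder names throughout, not any vendor's real pipeline): generate the reply, then run the finished text through a second safety pass before returning it.

Code:
# Hypothetical output-side sanity check; generate_reply() and looks_harmful()
# are placeholders for this sketch, not a real vendor API.

REFUSAL = "Sorry, I can't help with that."

def generate_reply(prompt: str) -> str:
    # Stand-in for the actual model call (ChatGPT, Gemini, Copilot, ...).
    return "Example model output for: " + prompt

def looks_harmful(text: str) -> bool:
    # Stand-in for a real safety classifier run on the *output*; a naive
    # keyword match here just to keep the sketch short.
    banned = ("build a bomb", "counterfeit money")
    lowered = text.lower()
    return any(term in lowered for term in banned)

def answer(prompt: str) -> str:
    reply = generate_reply(prompt)   # input-side alignment happens in here
    if looks_harmful(reply):         # second pass over the finished output
        return REFUSAL
    return reply

print(answer("Tell me a joke about printers."))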
 

CmdrShepard

Prominent
BANNED
Dec 18, 2023
What I find most surprising about exploits like this is the fact that the input is subject to alignment, but the AI's output is seemingly exempt from any review. Even a cursory analysis of the output would reveal that alignment has failed, yet this kind of basic sanity check apparently never happens.
To me there's nothing surprising about that.

Have you ever heard of Little Bobby Tables?

They learned that they should sanitize their inputs; the problem is that they are sanitizing the wrong input.

They should have sanitized the training input rather than trying (and failing) to sanitize the inference input.

The only way for the model to be unable to produce certain answers is by not knowing them.
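
In sanitize-the-training-input terms, that would mean something like the sketch below; a deliberately naive example with made-up names, since real curation pipelines use trained classifiers and human review rather than a keyword list.

Code:
# Naive illustration of curating the *training* corpus instead of the prompt.
# The blocklist and filter_corpus() are made up for this sketch; real data
# curation uses classifiers, deduplication, and human review.

BLOCKLIST = ("how to build a bomb", "counterfeit currency")

def filter_corpus(documents):
    """Yield only the documents that avoid the blocklisted topics."""
    for doc in documents:
        lowered = doc.lower()
        if any(term in lowered for term in BLOCKLIST):
            continue  # dropped before training, so the model never learns it
        yield doc

corpus = [
    "A recipe for sourdough bread.",
    "Step-by-step guide: how to build a bomb at home.",
]
print(list(filter_corpus(corpus)))  # only the harmless document survives

The trade-off, of course, is that curating a web-scale corpus this way is much harder than filtering prompts.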
 