News Researchers jailbreak AI chatbots with ASCII art -- ArtPrompt bypasses safety measures to unlock malicious queries

Status
Not open for further replies.

usertests

Distinguished
Mar 8, 2013
Based on the answer it's starting to give about counterfeit money in that screenshot, I bet its bomb advice is hallucinatory garbage. Don't even bother patching the exploit kthx.
 
  • Like
Reactions: dalauder

MatheusNRei

Great
Jan 15, 2024
I bet its bomb advice is hallucinatory garbage
Considering bomb-making advice isn't hard to find online, even by accident, I certainly wouldn't.
If it's online somewhere, it's safe to assume the AI will know about it unless its training dataset was heavily curated.
 

Rob1C

Distinguished
Jun 2, 2016
Yes, the instructions are easy to find, but the ASCII art bypass has some shortcomings in the details:

[Attached image: cartoon_bomb_6.png]
 
Wow, this ASCII art bypass is VERY entertaining. I'm amazed that the chatbots can make sense of input like that, where the meaning sits in the visual arrangement of characters rather than in the words themselves.

What matters to me is that the bypass isn't something elementary school students will run into. Anyone old enough to do ASCII art is old enough to run into some inappropriate stuff and know that it's inappropriate.
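
To show what that kind of input actually looks like, here's a rough sketch using the pyfiglet library (just an illustration, not the researchers' actual tooling): the prompt the model receives is still ordinary text, the meaning is just spread across a grid of characters.

Code:
# Rough illustration only (not the ArtPrompt tooling): render a harmless word
# as ASCII art with pyfiglet and show that the result is still plain text.
# pip install pyfiglet
import pyfiglet

word = "HELLO"  # harmless placeholder word

# figlet_format() returns an ordinary multi-line string; the "picture" is
# just underscores, slashes, and pipes arranged in a grid.
art = pyfiglet.figlet_format(word, font="standard")
print(art)

# A naive keyword filter scanning the prompt never sees the literal word,
# only the characters that happen to draw it.
print("literal word appears in the prompt text:", word.lower() in art.lower())

The model has to reconstruct the word from the shape alone, which is exactly the step a plain keyword filter never gets to see.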
 
As with the other article about how to "hack" LLMs, this is useless and will remain useless against Gemini, Copilot, and other LLMs trained on internet data at large. The real problem comes when small, personalized "AI" programs are built for specific companies on their own data, using ChatGPT or another model as a foundation, which is exactly what Google and Microsoft are advertising now; there, hacks and exploits could cause real damage. That's why these "attacks" are a very good thing.
 

adamXpeter

Commendable
Jun 19, 2022
Based on the answer it's starting to give about counterfeit money in that screenshot, I bet its bomb advice is hallucinatory garbage. Don't even bother patching the exploit kthx.
Maybe it is not an accident; the best solution is to make the threat eliminate itself.
 
  • Like
Reactions: usertests
Mar 8, 2024
What I find most surprising about exploits like this is the fact that the input is subject to alignment, but the AI's output is seemingly exempt from any review. Even a cursory analysis of the output would reveal that alignment has failed, yet this kind of basic sanity check apparently never happens.
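
Just to sketch what such a check could look like (placeholder names throughout, not any vendor's real pipeline): generate the reply, then run the finished text through a second safety pass before returning it.

Code:
# Hypothetical output-side sanity check; generate_reply() and looks_harmful()
# are placeholders for this sketch, not a real vendor API.

REFUSAL = "Sorry, I can't help with that."

def generate_reply(prompt: str) -> str:
    # Stand-in for the actual model call (ChatGPT, Gemini, Copilot, ...).
    return "Example model output for: " + prompt

def looks_harmful(text: str) -> bool:
    # Stand-in for a real safety classifier run on the *output*; a naive
    # keyword match here just to keep the sketch short.
    banned = ("build a bomb", "counterfeit money")
    lowered = text.lower()
    return any(term in lowered for term in banned)

def answer(prompt: str) -> str:
    reply = generate_reply(prompt)   # input-side alignment happens in here
    if looks_harmful(reply):         # second pass over the finished output
        return REFUSAL
    return reply

print(answer("Tell me a joke about printers."))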
 

CmdrShepard

Prominent
BANNED
Dec 18, 2023
What I find most surprising about exploits like this is the fact that the input is subject to alignment, but the AI's output is seemingly exempt from any review. Even a cursory analysis of the output would reveal that alignment has failed, yet this kind of basic sanity check apparently never happens.
To me there's nothing surprising about that.

Have you ever heard of Little Bobby Tables?

They learned that they should sanitize their inputs; the problem is that they are sanitizing the wrong input.

They should have sanitized the training input rather than trying (and failing) to sanitize the inference input.

The only way for the model to be unable to produce certain answers is by not knowing them.
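
In sanitize-the-training-input terms, that would mean something like the sketch below; a deliberately naive example with made-up names, since real curation pipelines use trained classifiers and human review rather than a keyword list.

Code:
# Naive illustration of curating the *training* corpus instead of the prompt.
# The blocklist and filter_corpus() are made up for this sketch; real data
# curation uses classifiers, deduplication, and human review.

BLOCKLIST = ("how to build a bomb", "counterfeit currency")

def filter_corpus(documents):
    """Yield only the documents that avoid the blocklisted topics."""
    for doc in documents:
        lowered = doc.lower()
        if any(term in lowered for term in BLOCKLIST):
            continue  # dropped before training, so the model never learns it
        yield doc

corpus = [
    "A recipe for sourdough bread.",
    "Step-by-step guide: how to build a bomb at home.",
]
print(list(filter_corpus(corpus)))  # only the harmless document survives

The trade-off, of course, is that curating a web-scale corpus this way is much harder than filtering prompts.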
 