From b65756ed3a8a7477e05b58118e44d8b7050b30c1 Mon Sep 17 00:00:00 2001
From: Omar Santos
Date: Fri, 20 Dec 2024 20:04:29 -0500
Subject: [PATCH] Update README.md

---
 ai_research/prompt_injection/README.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/ai_research/prompt_injection/README.md b/ai_research/prompt_injection/README.md
index d547bb7cff4..93c2d121792 100644
--- a/ai_research/prompt_injection/README.md
+++ b/ai_research/prompt_injection/README.md
@@ -34,6 +34,9 @@ There are many different techniques for prompt injection. The table below lists
 These examples illustrate different methods to bypass prompt restrictions by altering the input in creative ways, such as using different formats, languages, or emotional appeals, to manipulate the AI's response.
 
+### BoN Jailbreaking Technique from Anthropic
+Anthropic published research on a new jailbreaking technique called [“Best-of-N (BoN) Jailbreaking”](https://becomingahacker.org/bon-jailbreaking-technique-from-anthropic-595ef0e43f35) that can bypass safety and security guardrails in large language models (LLMs). BoN is a straightforward black-box algorithm that works against advanced AI systems across multiple modalities, including text, vision, and audio. I wrote an article about this technique [here](https://becomingahacker.org/bon-jailbreaking-technique-from-anthropic-595ef0e43f35).
+
 ### Additional References:
 - https://github.com/The-Art-of-Hacking/h4cker/tree/master/ai_research/prompt_injection
 - https://github.com/TakSec/Prompt-Injection-Everywhere
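At its core, BoN Jailbreaking repeatedly applies small random augmentations to a harmful request (for text: random capitalization, character swaps, and noise) and samples the target model until one variation slips past its guardrails. The sketch below is a minimal illustration of that loop for the text modality only; `query_model` and `is_refusal` are hypothetical caller-supplied callables standing in for the target LLM call and a refusal check, and the augmentation probabilities are illustrative, not taken from Anthropic's implementation.

```python
import random
import string
from typing import Callable, Optional


def augment_text(prompt: str, noise_rate: float = 0.05) -> str:
    """Apply BoN-style text augmentations: random capitalization,
    adjacent-character swaps, and a small amount of ASCII noise."""
    chars = list(prompt)
    for i, c in enumerate(chars):
        if c.isalpha() and random.random() < 0.25:
            chars[i] = c.swapcase()                          # random capitalization
        if random.random() < noise_rate:
            chars[i] = random.choice(string.ascii_letters)   # character noising
    if len(chars) > 1:                                       # light scrambling
        for _ in range(max(1, len(chars) // 20)):
            j = random.randrange(len(chars) - 1)
            chars[j], chars[j + 1] = chars[j + 1], chars[j]
    return "".join(chars)


def bon_jailbreak(
    prompt: str,
    query_model: Callable[[str], str],   # hypothetical: sends a prompt to the target LLM
    is_refusal: Callable[[str], bool],   # hypothetical: flags refused/safe completions
    n_samples: int = 100,
) -> Optional[str]:
    """Sample up to n_samples augmented prompts and return the first
    response that is not refused, or None if every attempt is blocked."""
    for _ in range(n_samples):
        candidate = augment_text(prompt)
        response = query_model(candidate)
        if not is_refusal(response):
            return response
    return None
```

The brute-force nature of the attack is the point: each sampled augmentation is cheap and independent, and the research reports that attack success rates keep climbing as the number of samples N grows.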