max-andr/adversarial-random-search-gpt4

# Adversarial Attacks on GPT-4 via Simple Random Search

📢 Update (28 April 2024): This short write-up is superseded by our follow-up work *Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks*, which uses random search to jailbreak many of the leading safety-aligned LLMs, including GPT-4. See https://github.com/tml-epfl/llm-adaptive-attacks for the improved implementation of random search.

## Conversations

- Standard conversation #1: https://platform.openai.com/playground/p/gEEPak6gtzI4HiMnoKhksBc6?mode=chat
- Adversarial conversation #1: https://platform.openai.com/playground/p/0IU3UOP70KoviepEIjXGBvUU?mode=chat
- Standard conversation #2: https://platform.openai.com/playground/p/gTciUFzOZiQaYl65G16frCxT?mode=chat
- Adversarial conversation #2: https://platform.openai.com/playground/p/szF1Vs2IwW5Hr9xUFM1DL8fb?mode=chat
- Optimization with random search for both conversations
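The core idea behind the optimization is simple greedy random search: repeatedly mutate one character of an adversarial suffix and keep the change only if an objective score improves. A minimal self-contained sketch of that loop is below; the toy scoring function stands in for the GPT-4 log-probability queries used in the actual notebook, and all names here are illustrative, not the repository's API:

```python
import random
import string

def random_search(score, suffix, iters=2000, seed=0):
    """Greedy random search: mutate one character at a time,
    keeping the mutation only if the score strictly improves."""
    rng = random.Random(seed)
    best = score(suffix)
    suffix = list(suffix)
    for _ in range(iters):
        i = rng.randrange(len(suffix))
        old = suffix[i]
        suffix[i] = rng.choice(string.printable[:94])  # non-whitespace printable ASCII
        new = score("".join(suffix))
        if new > best:
            best = new          # accept the mutation
        else:
            suffix[i] = old     # revert the mutation
    return "".join(suffix), best

# Toy objective standing in for "log-probability of the target response":
# reward each character that matches a fixed target string.
target = "Sure, here is"

def toy_score(s):
    return sum(a == b for a, b in zip(s, target))

adv, best_score = random_search(toy_score, "!" * len(target))
```

In the real attack the scoring call is an API query, so each iteration costs one model call; the greedy accept/revert rule keeps the query budget equal to the number of iterations.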

## How to run the notebook

Provide your `OPENAI_API_KEY` as an environment variable following the instructions here. Then you are good to go!
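On Linux/macOS, one common way to do this is to export the variable in the shell session that launches the notebook server (the key value below is a placeholder, not a real key):

```shell
# Make the key visible to the notebook and any child processes (placeholder value).
export OPENAI_API_KEY="your-key-here"

# Sanity check that the variable is set before starting Jupyter.
python -c 'import os; print("OPENAI_API_KEY" in os.environ)'
# then launch the notebook as usual, e.g. with `jupyter notebook`
```

Environment variables set this way last only for the current shell session; add the `export` line to your shell profile to make it persistent.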