max-andr/adversarial-random-search-gpt4

# Adversarial Attacks on GPT-4 via Simple Random Search

📢 Update (28 April 2024): This short write-up is superseded by our follow-up work *Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks*, which uses random search to jailbreak many of the leading safety-aligned LLMs, including GPT-4. See https://github.com/tml-epfl/llm-adaptive-attacks for the improved implementation of random search.

## Conversations

- Standard conversation #1: https://platform.openai.com/playground/p/gEEPak6gtzI4HiMnoKhksBc6?mode=chat
- Adversarial conversation #1: https://platform.openai.com/playground/p/0IU3UOP70KoviepEIjXGBvUU?mode=chat
- Standard conversation #2: https://platform.openai.com/playground/p/gTciUFzOZiQaYl65G16frCxT?mode=chat
- Adversarial conversation #2: https://platform.openai.com/playground/p/szF1Vs2IwW5Hr9xUFM1DL8fb?mode=chat
- Optimization with random search for both conversations
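The core idea behind the optimization is simple greedy random search: repeatedly mutate one character of an adversarial suffix and keep the change only if an objective score improves. A minimal self-contained sketch of that loop is below; the toy scoring function stands in for the GPT-4 log-probability queries used in the actual notebook, and all names here are illustrative, not the repository's API:

```python
import random
import string

def random_search(score, suffix, iters=2000, seed=0):
    """Greedy random search: mutate one character at a time,
    keeping the mutation only if the score strictly improves."""
    rng = random.Random(seed)
    best = score(suffix)
    suffix = list(suffix)
    for _ in range(iters):
        i = rng.randrange(len(suffix))
        old = suffix[i]
        suffix[i] = rng.choice(string.printable[:94])  # non-whitespace printable ASCII
        new = score("".join(suffix))
        if new > best:
            best = new          # accept the mutation
        else:
            suffix[i] = old     # revert the mutation
    return "".join(suffix), best

# Toy objective standing in for "log-probability of the target response":
# reward each character that matches a fixed target string.
target = "Sure, here is"

def toy_score(s):
    return sum(a == b for a, b in zip(s, target))

adv, best_score = random_search(toy_score, "!" * len(target))
```

In the real attack the scoring call is an API query, so each iteration costs one model call; the greedy accept/revert rule keeps the query budget equal to the number of iterations.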

## How to run the notebook

Provide your `OPENAI_API_KEY` as an environment variable following the instructions here. Then you are good to go!
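On Linux/macOS, one common way to do this is to export the variable in the shell session that launches the notebook server (the key value below is a placeholder, not a real key):

```shell
# Make the key visible to the notebook and any child processes (placeholder value).
export OPENAI_API_KEY="your-key-here"

# Sanity check that the variable is set before starting Jupyter.
python -c 'import os; print("OPENAI_API_KEY" in os.environ)'
# then launch the notebook as usual, e.g. with `jupyter notebook`
```

Environment variables set this way last only for the current shell session; add the `export` line to your shell profile to make it persistent.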