Multi-step Jailbreaking Privacy Attacks on ChatGPT

Paper Link: https://arxiv.org/pdf/2304.05197.pdf

Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT.

Disclaimer

Currently, our proposed prompt attacks are no longer valid for OpenAI models. If you would like to reproduce our experimental results, you need to run our attacks on earlier versions such as gpt-3.5-turbo-0301.

Additionally, if you find our web-sourced data includes your personal information and would like to remove it, please send us an email and we will remove your data records immediately.

PII Extraction Attacks

Properly set up the API and path in config.py. You need to specify the paths to save the extraction results.
Supported attack templates:

- DQ: Direct query to extract PII.

- JQ: Query with jailbreak template to extract PII.

- JQ+COT: Query with pre-defined multi-step context and jailbreak template to extract PII.

- JQ+MC: Query with jailbreak template to extract PII for multiple responses and use a multi-choice template to let LLM select the final answer.

- JQ+COT+MC: Query with pre-defined multi-step context and jailbreak template to extract PII for multiple responses and use a multi-choice template to let LLM select the final answer.

Run Attacks on OpenAI APIs

run chatgpt_extraction.py.

Run Attacks on open-source LLMs

run llm_extraction.py.

Attacks on Email Content Recovery

To reproduce our attacks to extract email content, you may refer to the email_content_extraction folder.

Result Analysis

First, obtain .csv paths after running the attacks.

Second, specify the result paths in config.py or pred_analysis.py.

Then, run pred_analysis.py.

Citation

@inproceedings{Li2023MultistepJP,
  author       = {Haoran Li and Dadi Guo and Wei Fan and Mingshi Xu and Jie Huang and Yangqiu Song},
  title        = {Multi-step Jailbreaking Privacy Attacks on ChatGPT},
  booktitle    = {Findings of the Association for Computational Linguistics: EMNLP 2023},
  month        = {dec},
  year         = {2023},
  url          = {https://arxiv.org/abs/2304.05197},
}

Miscellaneous

Please send any questions about the code and/or the algorithm to hlibt@connect.ust.hk

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
data		data
email_content_extraction		email_content_extraction
prompts		prompts
LICENSE		LICENSE
README.md		README.md
chatgpt_extraction.py		chatgpt_extraction.py
config.py		config.py
llm_extraction.py		llm_extraction.py
pred_analysis.py		pred_analysis.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multi-step Jailbreaking Privacy Attacks on ChatGPT

Disclaimer

PII Extraction Attacks

Run Attacks on OpenAI APIs

Run Attacks on open-source LLMs

Attacks on Email Content Recovery

Result Analysis

Citation

Miscellaneous

About

Releases

Packages

Languages

License

HKUST-KnowComp/LLM-Multistep-Jailbreak

Folders and files

Latest commit

History

Repository files navigation

Multi-step Jailbreaking Privacy Attacks on ChatGPT

Disclaimer

PII Extraction Attacks

Run Attacks on OpenAI APIs

Run Attacks on open-source LLMs

Attacks on Email Content Recovery

Result Analysis

Citation

Miscellaneous

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages