JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models

Introduction

Welcome to JailbreakZoo, a dedicated repository focused on the jailbreaking of large models (LMs), encompassing both large language models (LLMs) and vision language models (VLMs). This project aims to explore the vulnerabilities, exploit methods, and defense mechanisms associated with these advanced AI models. Our goal is to foster a deeper understanding and awareness of the security aspects surrounding large-scale AI systems.

Our website can be found here.

Our paper can be found here.

Timeline

This repository is systematically organized according to the publication timeline.

🔥🔥🔥 Latest update: July 24, 2024 🔥🔥🔥

Contents

Jailbreaks of LLMs: Discover the techniques and case studies related to the jailbreaking of large language models.
Defenses of LLMs: Explore the strategies and methods employed to defend large language models against various types of attacks.
Jailbreaks of VLMs: Learn about the vulnerabilities and jailbreaking approaches specific to vision language models.
Defenses of VLMs: Understand the defense mechanisms designed for vision language models, including the most recent advancements and strategies.

Contributing

We welcome contributions from the community! Whether you're interested in adding new research, improving existing documentation, or sharing your own jailbreak or defense strategies, your insights are valuable to us. Please check our Contribution Guidelines for more information on how you can get involved.

License and Citation

This project is available under the MIT License. Please refer to our citation guidelines if you wish to reference our work in your research or publications.

Thank you for visiting JailbreakZoo. We hope this repository serves as a valuable resource in your exploration of large model security.

Acknowledgement

Special thanks to our notable contributors: Haibo Jin, Leyang Hu, Xinuo Li, Peiyan Zhang, Chonghan Chen, Jun Zhuang, and Haohan Wang.

*The ranking is in partial order.

Reference

@article{jin2024jailbreakzoo,
  title={JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models},
  author={Jin, Haibo and Hu, Leyang and Li, Xinuo and Zhang, Peiyan and Chen, Chonghan and Zhuang, Jun and Wang, Haohan},
  journal={arXiv preprint arXiv:2407.01599},
  year={2024}
}
