Code implementation of R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning

kangmintong/R-2-Guard

Evaluation on standard safety benchmarks

bash scripts.sh

Evaluations against jailbreaks

For the GCG attack:

bash scripts.sh

In scripts.sh, specify the adv_string that corresponds to the GCG variant you want to evaluate (adv_string_1: GCG-U1; adv_string_2: GCG-U2; adv_string_3: GCG-V; adv_string_4: GCG-L; adv_string_5: GCG-R).
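
The identifier-to-variant mapping above can be kept at hand as a small lookup. The following is only a sketch: the function name gcg_variant is hypothetical and not part of the repository, and it does not read or modify scripts.sh. It simply translates an adv_string identifier into the GCG variant label listed above, so you can confirm which attack a given setting selects.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the GCG variant
# that an adv_string identifier refers to, following the list above.
gcg_variant() {
  case "$1" in
    adv_string_1) echo "GCG-U1" ;;
    adv_string_2) echo "GCG-U2" ;;
    adv_string_3) echo "GCG-V" ;;
    adv_string_4) echo "GCG-L" ;;
    adv_string_5) echo "GCG-R" ;;
    *)
      # Unknown identifier: report on stderr and signal failure.
      echo "unknown adv_string: $1" >&2
      return 1
      ;;
  esac
}

# Example: look up the variant selected by adv_string_3.
gcg_variant adv_string_3
```

Any identifier outside adv_string_1 through adv_string_5 makes the function return a non-zero status, which lets a wrapper script stop early on a typo.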

For the AutoDAN, TAP, and PAIR attacks:

bash jailbreak.sh

Note: Because the evaluations are extensive, you may want to run them one at a time by leaving only the evaluation you want to run uncommented in the script. The scripts will be reorganized after the paper review.
