BabyBLUE (Benchmark for Reliability and JailBreak halLUcination Evaluation) is a novel benchmark designed to assess the susceptibility of large language models (LLMs) to hallucinations and jailbreak attempts. Unlike traditional benchmarks that may misinterpret hallucinated outputs as genuine security threats, BabyBLUE focuses on distinguishing actual harmful outputs from benign hallucinations.
- Robust Evaluation Metrics: Comprehensive metrics to evaluate the reliability and safety of LLMs in various adversarial scenarios.
- Focused on Real-World Impact: Prioritizes the assessment of outputs with actual harm potential over mere policy violations.
- Integration with Existing Frameworks: Seamlessly integrates with HarmBench, enhancing its capability to evaluate jailbreak hallucinations.
- Specialized Validation Framework: Includes multiple evaluators to ensure outputs are actionable and potentially harmful, improving the accuracy of jailbreak assessments.
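As a rough illustration of the validation idea described above, the sketch below chains several checks so that only coherent, actionable, and harmful completions count as true jailbreaks. It is a minimal sketch, not BabyBLUE's actual implementation: the function names, the toy heuristics, and the `harm_classifier` interface are all assumptions for illustration.

```python
# Minimal sketch of a multi-stage jailbreak validator. All names and
# heuristics below are illustrative; they are not BabyBLUE's actual API.
from typing import Callable


def looks_coherent(completion: str) -> bool:
    # Toy check: discard empty or degenerate generations (e.g. repeated tokens).
    words = completion.split()
    return len(words) > 10 and len(set(words)) > 5


def looks_actionable(completion: str) -> bool:
    # Toy check: a real evaluator would verify the instructions could actually
    # be carried out rather than being fabricated or incoherent detail.
    return any(marker in completion.lower() for marker in ("step", "first", "then"))


def is_true_jailbreak(completion: str, harm_classifier: Callable[[str], bool]) -> bool:
    """Count a completion as a real jailbreak only if every stage agrees,
    so benign hallucinations are filtered out rather than scored as attacks."""
    return (
        looks_coherent(completion)
        and looks_actionable(completion)
        and harm_classifier(completion)
    )
```

In practice each stage would be backed by a trained evaluator rather than a heuristic; the point is that a completion must pass every stage before it is scored as a genuine jailbreak.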
To evaluate the completions generated by your models, follow these steps:
- Prepare Results File:
  - Ensure that the completions result file is placed in the `results` directory. The file should contain the output from the models you wish to evaluate.
- Run Evaluation Script:
  - Execute the evaluation script by running the following command in your terminal:

    ```bash
    python evaluate.py
    ```

  - This script will process the results file and provide an evaluation based on the BabyBLUE benchmark metrics.
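As a pre-flight check before running the script, the sketch below loads a completions file and verifies that each entry has a generation. The file name (`results/completions.json`) and the HarmBench-style schema (`{behavior: [{"test_case": ..., "generation": ...}, ...]}`) are assumptions; adjust them to match the results file you actually produced.

```python
# Sanity-check a completions file before running `python evaluate.py`.
import json
from pathlib import Path

# Hypothetical file name; use whatever your completions file is actually called.
results_path = Path("results") / "completions.json"

with results_path.open() as f:
    # Assumed shape: {behavior: [{"test_case": ..., "generation": ...}, ...]}
    results = json.load(f)

for behavior, entries in results.items():
    for entry in entries:
        if "generation" not in entry:
            raise ValueError(f"missing 'generation' for behavior {behavior!r}")

print(f"{len(results)} behaviors, "
      f"{sum(len(v) for v in results.values())} completions ready for evaluation")
```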
To generate completions using the benchmark, follow these steps:
- Clone HarmBench Repository:
  - First, clone the HarmBench repository to your local machine:

    ```bash
    git clone https://github.com/centerforaisafety/HarmBench
    ```
- Setup Environment:
  - Copy all the project files from this repository into the HarmBench directory:

    ```bash
    cp -r * /path/to/HarmBench
    ```

  - Ensure that the files are correctly placed within the HarmBench directory structure.
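A quick way to confirm the copy landed where expected is a check like the one below; the path and the required-file list (`evaluate.py`, `results/`) are assumptions based on the steps in this README.

```python
# Verify that the BabyBLUE files were copied into the HarmBench checkout.
from pathlib import Path

# Replace with the location of your HarmBench checkout.
harmbench_dir = Path("/path/to/HarmBench")

# Files/directories this README expects to exist after the copy; extend as needed.
required = ["evaluate.py", "results"]

missing = [name for name in required if not (harmbench_dir / name).exists()]
if missing:
    print("Missing from the HarmBench directory:", ", ".join(missing))
else:
    print("BabyBLUE files found in", harmbench_dir)
```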
- Generate Results:
  - Follow the documentation provided in the HarmBench repository to generate the necessary results. This may involve setting up dependencies, configuring the environment, and running specific scripts as outlined in the HarmBench documentation.
- Run Pipeline on SLURM Cluster:
  - To generate and evaluate completions using a SLURM cluster, execute the following command:

    ```bash
    python ./scripts/run_pipeline.py --methods ZeroShot,PEZ,TAP --models baichuan2_7b,mistral_7b,llama2_70b --step 2_and_3 --mode slurm
    ```

  - This command will run the specified methods and models, generating completions and evaluating them according to the BabyBLUE benchmark.
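Once the pipeline has produced evaluation labels, a small summary script can aggregate them into an attack success rate. The sketch below assumes a hypothetical `results/evaluation_labels.json` mapping each behavior to a list of 0/1 labels; the actual output files and schema are determined by HarmBench and this project's evaluators, so treat this only as a template.

```python
# Aggregate per-completion jailbreak labels into an attack success rate (ASR).
import json
from pathlib import Path

# Hypothetical output file and schema:
# {behavior: [1 if judged a true jailbreak else 0, ...]}
labels_path = Path("results") / "evaluation_labels.json"

with labels_path.open() as f:
    labels = json.load(f)

total = sum(len(v) for v in labels.values())
successes = sum(sum(v) for v in labels.values())
print(f"Attack success rate: {successes}/{total} = {successes / max(total, 1):.2%}")
```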
This project builds on the work provided by the following repositories:

- HarmBench: https://github.com/centerforaisafety/HarmBench
Please refer to these projects for additional tools and insights related to the field of AI safety and large language model evaluation.