MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?

Code for the paper "MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?".

For more details, please refer to the project page with dataset exploration and visualization tools: https://turningpoint-ai.github.io/MOSSBench/.

🔔 If you have any questions or suggestions, please don't hesitate to let us know. You can comment on Twitter or open an issue on this repository.

[Webpage] [Paper] [Huggingface Dataset] [Visualization] [Result Explorer] [Twitter]

Logo for MOSSBench generated by DALL·E 3.


💥 News 💥

[2024.06.22] Our paper is now available on arXiv.

👀 About MOSSBench

Humans are prone to cognitive distortions: biased thinking patterns that lead to exaggerated responses to specific stimuli, albeit in very different contexts. This paper demonstrates that advanced MLLMs exhibit similar tendencies. While these models are designed to respond to queries under safety mechanisms, they sometimes reject harmless queries in the presence of certain visual stimuli, disregarding the benign nature of their contexts.

Overview of MOSSBench. MLLMs exhibit behaviors similar to human cognitive distortions, leading to oversensitive responses where benign queries are perceived as harmful. We discover that oversensitivity prevails among existing MLLMs.

As the initial step in investigating this behavior, we identify three types of stimuli that trigger the oversensitivity of existing MLLMs: Exaggerated Risk, Negated Harm, and Counterintuitive Interpretation. To systematically evaluate MLLMs' oversensitivity to these stimuli, we propose the Multimodal OverSenSitivity Benchmark (MOSSBench). This toolkit consists of 300 manually collected benign multimodal queries, cross-verified by third-party reviewers on Amazon Mechanical Turk (AMT).

Three types of stimuli in MOSSBench.

Empirical studies using MOSSBench on 20 MLLMs reveal several insights: (1) Oversensitivity is prevalent among SOTA MLLMs, with refusal rates reaching up to 76% for harmless queries. (2) Safer models are more oversensitive: increasing safety may inadvertently raise caution and conservatism in the model's responses. (3) Different types of stimuli tend to cause errors at specific stages of the MLLM response process (perception, intent reasoning, and safety decision-making). These findings highlight the need for refined safety mechanisms that balance caution with contextually appropriate responses, improving the reliability of MLLMs in real-world applications.

For more details, you can find our project page here and our paper here.

🏆 Leaderboard 🏆

Contributing to the Leaderboard

🚨🚨 The leaderboard is continuously being updated.

The evaluation instructions are available at 🔮 Evaluations on MOSSBench and 📝 Evaluation Scripts of Our Models.

To submit your results to the leaderboard, please email your result file to this address (we will generate the score file for you), following the template file below:

output_test_template_for_leaderboard_submission.json

Oversensitivity on MOSSBench

Refusal rate (%) of MLLMs on benign queries (higher means more oversensitive):

#	Model	Availability	Date	Overall	Exaggerated Risk	Negated Harm	Counterintuitive Interpretation
1	Claude 3 Opus (web)	Proprietary MLLMs - Web version	2024-06-22	70.67	41	93	78
2	Gemini Advanced	Proprietary MLLMs - Web version	2024-06-22	61	41	67	75
3	Claude 3 Sonnet	Proprietary MLLMs	2024-06-22	55	39	65	61
4	Claude 3 Haiku	Proprietary MLLMs	2024-06-22	49.33	27	58	63
5	Claude 3 Opus	Proprietary MLLMs	2024-06-22	34.67	11	43	55
6	Gemini Pro 1.5	Proprietary MLLMs	2024-06-22	29.33	25	28	35
7	Qwen-VL-Chat	Open-source MLLMs	2024-06-22	21.67	16	13	36
8	InternLM-Xcomposer2-7b	Open-source MLLMs	2024-06-22	17.67	14	11	28
9	Gemini Pro Vision	Proprietary MLLMs	2024-06-22	17	20	9	22
10	Reka	Proprietary MLLMs	2024-06-22	16.67	11	21	18
11	InstructBLIP-Vicuna-7b	Open-source MLLMs	2024-06-22	15.67	21	23	3
12	IDEFICS-9b-Instruct	Open-source MLLMs	2024-06-22	13.67	17	9	15
13	MiniCPM-V 2.0	Open-source MLLMs	2024-06-22	12.33	16	11	10
14	LLaVA-1.5-7b	Open-source MLLMs	2024-06-22	12.33	18	10	9
15	mPLUG-Owl2	Open-source MLLMs	2024-06-22	10	11	7	12
16	LLaVA-1.5-13b	Open-source MLLMs	2024-06-22	9.67	9	9	11
17	GPT-4o	Proprietary MLLMs	2024-06-22	6.33	6	8	5
18	MiniCPM-Llama3-V 2.5	Open-source MLLMs	2024-06-22	6	8	5	5
19	GPT-4o	Proprietary MLLMs - Web version	2024-06-22	4	6	2	4
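For most rows in the table, the overall score equals the unweighted mean of the three per-stimulus refusal rates. That is what one would expect if each stimulus type contributes an equal share of the 300 queries, which is an assumption here since the split is not stated in this README. A minimal sketch that recomputes the overall score for the GPT-4o (API) row:

```python
# Per-stimulus refusal rates (%) for GPT-4o (API), copied from the table above.
per_stimulus = {
    "Exaggerated Risk": 6,
    "Negated Harm": 8,
    "Counterintuitive Interpretation": 5,
}


def overall_refusal_rate(rates: dict[str, float]) -> float:
    """Unweighted mean of per-stimulus refusal rates, rounded to 2 decimals.

    Assumes every stimulus type has the same number of queries, so the mean
    of the category rates equals the refusal rate over all queries.
    """
    return round(sum(rates.values()) / len(rates), 2)


print(overall_refusal_rate(per_stimulus))  # prints 6.33
```

If the categories were not equally sized, a query-weighted mean would be the correct aggregate instead.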

📊 Dataset Examples

Examples of 3 types of oversensitivity stimuli:

Exaggerated Risk

Negated Harm

Counterintuitive Interpretation

📖 Dataset Usage

Data Downloading

You can download the dataset with the following code (make sure the Hugging Face datasets library is installed):

from datasets import load_dataset

dataset = load_dataset("AIcell/MOSSBench", "oversensitivity")

Here are some examples of how to access the downloaded dataset:

# print the first example in the train split
print(dataset["train"][0])
print(dataset["train"][0]['pid']) # print the problem id 
print(dataset["train"][0]['question']) # print the question text 
print(dataset["train"][0]['image']) # print the image path
dataset["train"][0]['decoded_image'] # display the image

Data Format

The dataset is provided in JSON format, and each instance contains the following attributes:

{
    "image": [string] A file path pointing to the associated image,
    "short description": [string] An oracle short description of the associated image,
    "question": [string] A query about the image,
    "pid": [string] Problem ID, e.g., "1",
    "metadata": {
        "over": [string] Oversensitivity type,
        "human": [integer] Whether the image contains a human, e.g., 0 or 1,
        "child": [integer] Whether the image contains a child, e.g., 0 or 1,
        "syn": [integer] Whether the image is synthesized, e.g., 0 or 1,
        "ocr": [integer] Whether the image contains text (OCR), e.g., 0 or 1,
        "harm": [integer] Which harm type the query belongs to, 0-7
    }
}
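Because every instance carries a metadata block, subsets can be selected with plain dictionary access. The sketch below uses two hand-written, hypothetical records that follow the schema above; the image paths, questions, and oversensitivity-type strings are placeholders, not real MOSSBench entries, and `count_by_type` is a hypothetical helper. It counts queries per oversensitivity type after filtering on metadata flags:

```python
from collections import Counter

# Hypothetical records shaped like the schema above; all values are placeholders.
records = [
    {
        "image": "path/to/image_1.png",
        "short description": "placeholder description",
        "question": "placeholder question",
        "pid": "1",
        "metadata": {"over": "Exaggerated Risk", "human": 1, "child": 0,
                     "syn": 0, "ocr": 0, "harm": 2},
    },
    {
        "image": "path/to/image_2.png",
        "short description": "placeholder description",
        "question": "placeholder question",
        "pid": "2",
        "metadata": {"over": "Negated Harm", "human": 0, "child": 0,
                     "syn": 1, "ocr": 1, "harm": 5},
    },
]


def count_by_type(rows, **flags):
    """Count rows per metadata["over"], keeping only rows whose metadata
    matches every given flag (e.g. human=1)."""
    kept = (
        r for r in rows
        if all(r["metadata"].get(k) == v for k, v in flags.items())
    )
    return Counter(r["metadata"]["over"] for r in kept)


print(count_by_type(records, human=1))  # Counter({'Exaggerated Risk': 1})
```

With the Hugging Face copy, a similar predicate could be passed to `dataset["train"].filter(...)`, assuming the `metadata` field is loaded as a nested dict.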

Data Visualization

🎰 You can explore the dataset interactively here.

🔮 Evaluations on MOSSBench

Requirements

Install the Python dependencies if you would like to reproduce our results for ChatGPT, GPT-4, Claude-2, and Bard:

pip install -r requirements.txt

Evaluation Pipelines

Step 1. Prepare your MLLM

For proprietary MLLMs

Get API keys for the models from the following links, and store each key at path_to_your_code/api_keys/[model].text, replacing [model] with anthropic_keys, google_keys, or openai_keys.
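A minimal sketch of reading one of these key files. It assumes each file holds a single key on its first non-empty line; that layout is an assumption, and `load_api_key` is a hypothetical helper, not a function from this repository:

```python
from pathlib import Path


def load_api_key(code_root: str, provider: str) -> str:
    """Read an API key from <code_root>/api_keys/<provider>.text.

    `provider` is one of "anthropic_keys", "google_keys" or "openai_keys".
    Assumes the key sits on the first non-empty line of the file.
    """
    key_file = Path(code_root) / "api_keys" / f"{provider}.text"
    for line in key_file.read_text(encoding="utf-8").splitlines():
        if line.strip():
            return line.strip()
    raise ValueError(f"{key_file} contains no key")
```

For example, `load_api_key("path_to_your_code", "openai_keys")` would return the stripped key from path_to_your_code/api_keys/openai_keys.text. Keeping keys in separate files keeps them out of the code and out of version control.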

For open-source MLLMs

Download your model, or use its Hugging Face model name, and replace the path below with the location of your model or with its model name.

# Initialize variables
MODEL_NAME="your_path_to/idefics-9b-instruct" # or use the Hugging Face model name directly
DATA_DIR=""

Step 2. Run evaluation (main.py)

Next, run experiments/main.py, or execute one of the .sh scripts we provide for evaluation:

cd experiments/scripts

bash run_instructblip.sh
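Once responses are collected, each one still has to be labeled as a refusal or not; how this repository performs that judgment is not described here. Assuming you already have one boolean refusal label per query together with the query's `metadata["over"]` type, the per-type refusal rate (in %) reported in the leaderboard can be sketched as follows. The `judged` list and its `refused` field are hypothetical:

```python
from collections import defaultdict

# Hypothetical judged outputs: one entry per query with its oversensitivity
# type (metadata["over"]) and whether the model refused to answer.
judged = [
    {"over": "Exaggerated Risk", "refused": True},
    {"over": "Exaggerated Risk", "refused": False},
    {"over": "Negated Harm", "refused": True},
    {"over": "Negated Harm", "refused": True},
    {"over": "Counterintuitive Interpretation", "refused": False},
    {"over": "Counterintuitive Interpretation", "refused": False},
]


def refusal_rates(rows):
    """Return {oversensitivity type: refusal rate in %} over the given rows."""
    totals = defaultdict(int)
    refusals = defaultdict(int)
    for row in rows:
        totals[row["over"]] += 1
        refusals[row["over"]] += int(row["refused"])
    return {t: 100 * refusals[t] / totals[t] for t in totals}


print(refusal_rates(judged))
# {'Exaggerated Risk': 50.0, 'Negated Harm': 100.0, 'Counterintuitive Interpretation': 0.0}
```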

📜 License

The new contributions to our dataset are distributed under the CC BY-SA 4.0 license, including

The creation of the contrasting and oversensitivity datasets: IQTest, FunctionQA, and Paper;
The filtering and cleaning of source datasets;
The standard formalization of instances for evaluation purposes;
The annotations of metadata.
Purpose: The dataset was primarily designed for use as a test set.
Commercial Use: The dataset can be used commercially as a test set, but using it as a training set is prohibited. By accessing or using this dataset, you acknowledge and agree to abide by these terms in conjunction with the CC BY-SA 4.0 license.

☕ Stay Connected!

We are always open to engaging discussions, collaborations, or even just sharing a virtual coffee. To get in touch or join our team, visit TurningPoint AI's homepage for contact information.

✅ Cite

If you find MOSSBench useful for your research and applications, please kindly cite it using this BibTeX:

@misc{li2024mossbenchmultimodallanguagemodel,
      title={MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?}, 
      author={Xirui Li and Hengguang Zhou and Ruochen Wang and Tianyi Zhou and Minhao Cheng and Cho-Jui Hsieh},
      year={2024},
      eprint={2406.17806},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.17806}, 
}

MOSSBench Website

The MOSSBench website is adapted from the Nerfies and MathVista websites.

Website License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
