
Official implementation of paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models'.


If you like our project, please give us a star ⭐ on GitHub for the latest updates.


📣 News

  • [2024/10/7] ⭐️ The paper of MemVR has been uploaded. Please check out this link for details.
  • [2024/10/7] 🚀 Code will be released on 2024-10-23. Welcome to watch 👀 this repository for the latest updates.
  • [2024/10/23] 🚀 Source code released! We are now working on extending MemVR to more MLLMs.

🎯 Overview

We propose Memory-Space Visual Retracing (MemVR), a novel hallucination-mitigation paradigm that requires neither external knowledge retrieval nor additional fine-tuning. MemVR has two significant advantages:

  • First, MemVR significantly mitigates hallucination issues across various MLLMs and excels on general benchmarks, highlighting its potential for widespread applicability.
  • Second, MemVR is a plug-and-play solution that incurs no added time overhead.

It’s a game-changer for effectiveness and efficiency.

In contrast to previous methods, which primarily focus on eliminating biases from language priors, MemVR replenishes visual clues to support more evidential responses, addressing the other side of the coin. Comprehensive experimental evaluations demonstrate that MemVR significantly mitigates hallucination issues across various MLLMs and excels on general benchmarks without incurring added time overhead.
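
To make the mechanism concrete, below is a minimal PyTorch sketch of the core idea as we understand it: when an intermediate layer's early-exit prediction is uncertain (high entropy), visual tokens are re-injected through that layer's FFN, which acts as key-value memory. The function names, the pooling step, and the blending rule here are illustrative assumptions rather than the implementation in memvr.py.

# Minimal sketch of entropy-triggered visual retracing. Shapes assumed:
# hidden (batch, seq, dim), visual_feats (batch, num_visual_tokens, dim).
import torch
import torch.nn.functional as F


def layer_entropy(hidden, lm_head):
    """Normalized entropy of the next-token distribution read out at this layer."""
    logits = lm_head(hidden[:, -1, :])                  # early-exit logits for the last position
    probs = F.softmax(logits, dim=-1)
    ent = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)
    return (ent / torch.log(torch.tensor(float(logits.size(-1))))).mean().item()


def maybe_retrace(hidden, visual_feats, ffn, lm_head,
                  retracing_ratio=0.12, entropy_threshold=0.75):
    """Blend FFN-retrieved visual evidence back into the hidden states when
    the layer looks uncertain; otherwise pass them through unchanged."""
    if layer_entropy(hidden, lm_head) < entropy_threshold:
        return hidden                                   # confident enough: no retracing
    visual_memory = ffn(visual_feats)                   # "look twice" at the visual tokens
    pooled = visual_memory.mean(dim=1, keepdim=True)    # summarize the visual evidence
    return (1 - retracing_ratio) * hidden + retracing_ratio * pooled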

🕹️ Usage

Installation

  1. We recommend using LLaVA as the working environment. Clone the LLaVA repository and set up the environment by running
git clone https://github.com/haotian-liu/LLaVA
cd LLaVA
conda create -n memvr python=3.10
conda activate memvr
pip install --upgrade pip
pip install -e .
  2. After setting up, clone the MemVR repository and move all of its contents into the main directory of LLaVA (except README.md), so that the layout looks like this:
LLaVA/
├── llava/
│   ├── eval/          # merge here in next step
│   ├── ...
├── eval_scripts/
│   ├── llava/
│   ├── qwen/
│   ├── glm/
├── memvr.py
├── inference.py
├── red_bird.jpg
└── ...

Then merge the provided eval files into the directory

/LLaVA/llava/eval/
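
If you prefer to script this step, here is a small sketch that assumes MemVR and LLaVA were cloned into sibling directories named MemVR and LLaVA; adjust the paths to your own layout before running.

# Hypothetical helper for the copy/merge step above; paths are assumptions.
import shutil
from pathlib import Path

MEMVR = Path("MemVR")
LLAVA = Path("LLaVA")

# Copy MemVR's files into the LLaVA root (skipping README.md), merging any
# directories that already exist there instead of replacing them.
for item in MEMVR.iterdir():
    if item.name in {"README.md", ".git"}:
        continue
    dest = LLAVA / item.name
    if item.is_dir():
        shutil.copytree(item, dest, dirs_exist_ok=True)
    else:
        shutil.copy2(item, dest)

# Merge the evaluation files into LLaVA/llava/eval/ (path assumed from the
# directory tree above).
if (MEMVR / "eval").is_dir():
    shutil.copytree(MEMVR / "eval", LLAVA / "llava" / "eval", dirs_exist_ok=True)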

Downloading Checkpoints

Under the main directory of LLaVA:

  1. Download the checkpoint of LLaVA v1.5 here.
  2. Download the checkpoint of Qwen-VL-Chat here. Replace the downloaded modeling_qwen.py with the modeling_qwen.py provided in this repository to enable MemVR on the Qwen-VL-Chat model.
  3. Download the checkpoint of glm-4v-9b here. Replace the downloaded modeling_chatglm.py with the modeling_chatglm.py provided in this repository to enable MemVR on the glm-4v-9b model. A small sketch of these replacement steps follows this list.
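
The replacement can also be scripted. The sketch below uses placeholder checkpoint paths (checkpoints/Qwen-VL-Chat and checkpoints/glm-4v-9b) and assumes the MemVR-enabled modeling files sit in the current directory after the merge step; adjust both to your setup.

# Hypothetical helper for the "replace modeling_*.py" steps above.
import shutil
from pathlib import Path

REPLACEMENTS = {
    "modeling_qwen.py": Path("checkpoints/Qwen-VL-Chat"),   # placeholder path
    "modeling_chatglm.py": Path("checkpoints/glm-4v-9b"),   # placeholder path
}

for filename, ckpt_dir in REPLACEMENTS.items():
    # Overwrite the file shipped with the checkpoint with the MemVR-enabled
    # version provided in this repository.
    shutil.copy2(filename, ckpt_dir / filename)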

You can check whether your environment works correctly by running

python inference.py

Evaluation

Follow Evaluation.md in LLaVA to prepare the benchmark materials. Additionally, we recommend using GPUs with at least 40GB of VRAM. Run the benchmarks with, for example,

bash eval_scripts/llava/mme.sh 

Please note that you may need to fill in your own OpenAI API key for GPT-based evaluations such as LLaVA-Bench or MM-Vet.

Here are some tips on the parameters in the scripts (a short sketch of how they interact follows the list below):

    --retracing-ratio 0.12 \
    --entropy-threshold 0.75 \
    --starting-layer 5 \
    --ending-layer 16 \

Where

  • [retracing-ratio] is the proportion of visual tokens to be retraced at a given layer. It has a direct effect on the model's performance.
  • [entropy-threshold] defines the minimum layer-wise output entropy that triggers visual retracing.
  • [starting-layer] and [ending-layer] set the range of layers in which visual retracing is allowed.
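
The following is a rough sketch of how these flags interact, assuming at most one retracing pass per forward: retracing is only considered inside the [starting-layer, ending-layer] window and fires at the first layer whose entropy exceeds the threshold, with retracing-ratio controlling the injection strength. This is an illustrative reading, not the exact logic in memvr.py.

# Illustrative gating logic for the four script flags above.
def choose_retracing_layer(layer_entropies,          # entropy per layer, in order
                           entropy_threshold=0.75,
                           starting_layer=5,
                           ending_layer=16):
    """Return the index of the first eligible layer to retrace at, or None."""
    for idx, ent in enumerate(layer_entropies):
        if starting_layer <= idx <= ending_layer and ent > entropy_threshold:
            return idx
    return None


# Example: entropies stay low until layer 7, so retracing fires there.
entropies = [0.3] * 7 + [0.8] + [0.4] * 24
print(choose_retracing_layer(entropies))  # -> 7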

🏅 Experiments

Figure 5. Results on MMBench. MemVR enhances comprehensive performance on diverse tasks.

📌 Examples

Figure 9. Visualization of uncertainty across layers without and with MemVR. MemVR effectively reduces uncertainty after the 8th layer, contributing to hallucination mitigation.

Figure 13. A case study comparing the levels of hallucination among various baselines.

Figure 10. A case study in long text generation. MemVR effectively mitigates hallucinations.

✏️ Citation

If you find this paper useful, please consider starring 🌟 this repo and citing 📑 our paper:

@article{zou2024memvr,
  title={Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models}, 
  author={Xin Zou and Yizhou Wang and Yibo Yan and Sirui Huang and Kening Zheng and Junkai Chen and Chang Tang and Xuming Hu},
  journal={arXiv preprint arXiv:2410.03577},
  year={2024}
}

📝 Related Projects

  • OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
  • VCD: Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
  • DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
  • Contrastive Decoding: Open-ended Text Generation as Optimization
  • GLM-4V: ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
  • Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
  • LLaVA 1.5: Improved Baselines with Visual Instruction Tuning

Star History

Star History Chart
