
Mitigating Multilingual Hallucination in Large Vision-Language Models

This is the official repository for Multilingual Hallucination Removal (MHR), a simple yet effective approach for mitigating the multilingual hallucinations prevalent in Large Vision-Language Models (LVLMs).

🎯 Overview

[Overview of the MHR framework]

  • We propose the Multilingual Hallucination Removal (MHR) strategy, a simple yet highly effective framework for eliminating hallucinations across a wide range of languages.
  • The MHR framework comprises two stages: Multilingual Supervised Fine-Tuning and Multilingual Direct Preference Optimization.
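
For reference, the two stages instantiate standard objectives; the following is the textbook formulation (Rafailov et al., 2023 for DPO), not a transcription of the repository's loss code. Stage one minimizes the usual supervised cross-entropy over multilingual instruction data, and stage two optimizes the DPO objective on preference pairs, with β corresponding to the --beta flag in the DPO script below:

    \mathcal{L}_{\text{SFT}} = -\,\mathbb{E}_{(v,\,x,\,y)} \sum_{t} \log \pi_\theta\left(y_t \mid v, x, y_{<t}\right)

    \mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(v,\,x,\,y_w,\,y_l)} \log \sigma\left( \beta \log \frac{\pi_\theta(y_w \mid v, x)}{\pi_{\text{ref}}(y_w \mid v, x)} - \beta \log \frac{\pi_\theta(y_l \mid v, x)}{\pi_{\text{ref}}(y_l \mid v, x)} \right)

Here v is the input image, x the (multilingual) prompt, y_w / y_l the preferred and dispreferred responses, and \pi_{\text{ref}} the frozen SFT model used as the DPO reference.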

🕹️ Usage

Environment Setup

conda create -n mhr python=3.9
conda activate mhr
cd MHR
pip install -r requirements.txt
pip install -e .

Train

  1. Multilingual Supervised Fine-tuning:
    • 1.1 Prepare SFT data: PALO

    • 1.2 Run SFT on the LVLM:

      SFT SCRIPTS
            PROMPT_VERSION=v1
            MODEL_VERSION=vicuna-v1-5-7b
            LM_MODEL_CKPT=lmsys/vicuna-7b-v1.5
      
            deepspeed mhr/alignment/models/llava_v1_5/train_sft.py \
                --deepspeed ./scripts/zero3.json \
                --model_name_or_path $LM_MODEL_CKPT \
                --version $PROMPT_VERSION \
                --data_path ${DATA_PATH} \
                --image_folder ${img_folder} \
                --vision_tower openai/clip-vit-large-patch14 \
                --pretrain_mm_mlp_adapter ${vision_tower_path} \
                --mm_vision_select_layer -2 \
                --mm_use_im_start_end False \
                --mm_use_im_patch_token False \
                --bf16 True \
                --output_dir ${output_dir} \
                --num_train_epochs 3 \
                --per_device_train_batch_size 16 \
                --per_device_eval_batch_size 16 \
                --gradient_accumulation_steps 1 \
                --evaluation_strategy "no" \
                --save_strategy "steps" \
                --save_steps 500 \
                --save_total_limit 1 \
                --learning_rate 2e-5 \
                --weight_decay 0. \
                --warmup_ratio 0.03 \
                --lr_scheduler_type "cosine" \
                --logging_steps 1 \
                --tf32 True \
                --model_max_length 1280 \
                --gradient_checkpointing True \
                --dataloader_num_workers 4 \
                --lazy_preprocess True \
                --report_to wandb \
                --image_aspect_ratio 'pad'
      
  2. Generate Preference Data Using the Scripts under mhr/preprocess
    • 2.1 Prepare hallucination-based English data.
    • A. For hallucination alignment or language alignment:
      • 2.2 Sample LVLM responses using lvlm_sampling.py
      • 2.3 Calculate alignment scores using calculate_PPL_score.py or desc_calculate_ppl_score.py
      • 2.4 Extract DPO data using desc_extract_dpo_data.py or extract_dpo_data.py (the sketch below illustrates the idea behind 2.3 and 2.4)
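
      A minimal sketch of the idea behind steps 2.3 and 2.4, assuming a perplexity-style alignment score; the function and variable names here are illustrative, not the actual API of calculate_PPL_score.py or extract_dpo_data.py:

        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        def response_ppl(model, tokenizer, prompt: str, response: str) -> float:
            """Perplexity of `response` given `prompt` under a reference LM.

            Lower perplexity suggests a more fluent, better-aligned response.
            """
            enc = tokenizer(prompt + response, return_tensors="pt")
            prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
            labels = enc.input_ids.clone()
            labels[:, :prompt_len] = -100  # score only the response tokens
            with torch.no_grad():
                loss = model(input_ids=enc.input_ids, labels=labels).loss
            return torch.exp(loss).item()

        # Illustrative usage: rank sampled responses, keep the extremes as a pair.
        model = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5")
        tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")
        prompt = "Describe the image."                        # illustrative
        sampled = ["A dog on a couch.", "Two cats playing."]  # from lvlm_sampling.py
        ranked = sorted(sampled, key=lambda r: response_ppl(model, tokenizer, prompt, r))
        dpo_pair = {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}
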
    • B. For translation alignment:
      • 2.2 Translate the English hallucination preference dataset into other languages using translate.py (one possible backend is sketched below)
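
      For illustration only, a sketch of the translation step using the off-the-shelf many-to-many model M2M100; whether translate.py uses this backend is an assumption, though the transformers calls themselves are standard:

        from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

        model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
        tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

        def translate(text: str, tgt_lang: str) -> str:
            """Translate English `text` into `tgt_lang` (e.g. "zh", "ru", "bg")."""
            tokenizer.src_lang = "en"
            encoded = tokenizer(text, return_tensors="pt")
            generated = model.generate(
                **encoded, forced_bos_token_id=tokenizer.get_lang_id(tgt_lang)
            )
            return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

        # Translate both sides of an English preference pair (illustrative data).
        pair = {"prompt": "Describe the image.",
                "chosen": "A dog on a couch.", "rejected": "Two cats on a couch."}
        pair_zh = {k: translate(v, "zh") for k, v in pair.items()}
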
  3. Train with Direct Preference Optimization
    • Train DPO on the LVLM (a sketch of the underlying loss follows the script):

      DPO SCRIPTS
        accelerate launch --config_file=${accelerate_config_file}  ./train_dpo.py \
        --deepspeed ./scripts/deepspeed/zero3.json \
        --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 0 \
        --model_name_or_path ${model_name_or_path} \
        --version v1 \
        --vision_tower ${vision_tower_path} \
        --mm_projector_type mlp2x_gelu \
        --mm_vision_select_layer -2 \
        --mm_use_im_start_end False \
        --mm_use_im_patch_token False \
        --image_aspect_ratio pad \
        --group_by_modality_length True \
        --bf16 True \
        --output_dir ${ckpt_save_path} \
        --num_train_epochs 9 \
        --per_device_train_batch_size 8 \
        --per_device_eval_batch_size 4 \
        --gradient_accumulation_steps 1 \
        --evaluation_strategy "no" \
        --save_strategy "steps" \
        --save_steps ${save_steps} \
        --save_total_limit 5 \
        --learning_rate 2e-6 \
        --weight_decay 0. \
        --warmup_steps 0 \
        --lr_scheduler_type "cosine" \
        --logging_steps 1 \
        --tf32 True \
        --model_max_length 2048 \
        --gradient_checkpointing True \
        --report_to wandb \
        --run_name ${ckpt_name} \
        --dataloader_num_workers 4 \
        --lazy_preprocess True \
        --beta 0.1 \
        --hallucination_data_path ${hallucination_data} \
        --hallucination_data_type "dir_of_jsonl_desc" \
        --hallucination_ratio 1 \
        --preference_data_path ${preference_data} \
        --preference_ratio 1 \
        --preference_data_type "dir_of_jsonl_desc" \
        --translation_data_path ${translation_data} \
        --translation_ratio 1 \
        --translation_data_type "dir_of_json_desc" \
        --image_folder ${image_folder} \
        --vg_path ${vg_annotation_path} \
        --resume_from_checkpoint ${resume_from_checkpoint}
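
      For intuition, a minimal sketch of the loss a DPO trainer such as the one above optimizes; this is the standard formulation (with beta matching the --beta 0.1 flag), not the repository's exact implementation:

        import torch
        import torch.nn.functional as F

        def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
            """Standard DPO loss from summed per-response log-probabilities.

            Each argument is a (batch,) tensor holding the sum of token
            log-probs of the chosen/rejected response under the trained
            policy or the frozen reference model.
            """
            # Implicit rewards: how far the policy has moved from the
            # reference on each response, scaled by beta.
            chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
            rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
            # Maximize the margin between chosen and rejected rewards.
            return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
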
      
  4. Evaluation
    • We evaluate our method with lmms-eval. Please follow its instructions to add the tasks and data you want to evaluate.

🏅 Experiments

  • MHR significantly mitigates the multilingual hallucination issue across different languages.


Table 1. Performance of the enhanced LLaVA-1.5 model on all three datasets of the POPE benchmark. We use the "popular" setting for testing. Average scores within each partition are marked in gray, and bold text denotes the best result for the same backbone.

  • MHR achieves remarkable gains on the MME hallucination subset.


Table 2. Results on the hallucination subset of MME. Higher scores indicate better performance and fewer hallucinations. The best performances within each setting are bolded. Due to space constraints, we present only 4 languages here: the high-resource languages ru and zh and the low-resource languages uk and bg. To give a sense of the overall comparison, we also report the average results over all 13 languages.


Figure 2. Performance on the full MME set, which consists of 14 tasks. Each chart shows the performance of the corresponding LLaVA-1.5 baseline and our MHR model. We present results in four languages (uk, zh, bg, and ru), as in Table 2.

  • Please refer to our paper for detailed experimental results.

📌 Examples


Figure 3. Illustration of hallucination removal by our proposed MHR, with 7 languages as an example. Hallucinated parts of each response are marked in yellow and correct parts in green.
