[CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompting for Multimodal Large Language Models" has been accepted in CVPR2024.

zycheiheihei/Transferable-Visual-Prompting

Transferable Visual Prompting for Multimodal Large Language Models

Installation

  1. Create and activate the virtual environment for the project, then install the dependencies.
cd Transferable_VP_MLLM
conda create -n transvp python=3.11
conda activate transvp
pip install -r requirements.txt
  2. Prepare the model weights.

Put the model weights under ./model_weights.

  • MiniGPT-4: Follow MiniGPT-4 and prepare the MiniGPT-4-Vicuna-V0-7B
  • InstructBLIP: Follow LAVIS and prepare the InstructBLIP-Vicuna-7b-v1.1
  • BLIP2: Follow LAVIS and prepare the BLIP2-FlanT5-xl
  • VPGTrans: Follow MiniGPT-4 and prepare Vicuna-v0-7B as the LLM
  • BLIVA: Follow BLIVA and prepare BLIVA-Vicuna-7B
  • VisualGLM-6B: No special preparation is needed.
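
Before launching a run, it can help to confirm that the weights are where you expect them. The snippet below is a minimal sketch, not part of this repository: the helper `missing_weight_dirs` and the subfolder names are hypothetical, since the exact layout under ./model_weights depends on the configs of each model. Adjust the names to match your setup.

```python
from pathlib import Path


def missing_weight_dirs(root, expected):
    """Return the names in `expected` that are not directories under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # Hypothetical subfolder names; replace them with the layout your configs expect.
    expected = ["minigpt4", "instructblip", "blip2"]
    for name in missing_weight_dirs("./model_weights", expected):
        print(f"missing: model_weights/{name}")
```

Running the script from the repository root prints one line per missing folder and nothing if all of them are present.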

Reproducing the Results

  1. On CIFAR10
python transfer_cls.py --dataset cifar10 --model_name minigpt-4 --target_models instructblip blip2 --learning_rate 10 --fca 0.005 --tse 0.001 --epochs 1
  2. Inference with a model
Specify the path to the checkpoint if you want to evaluate a trained prompt on the dataset. A reproducible checkpoint is provided at save/checkpoint_best.pth.
python transfer_cls.py --dataset cifar10 --model_name minigpt-4 --evaluate --checkpoint $PATH_TO_PROMPT
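
If you want to script several runs, the two commands above can be assembled programmatically. The following is a minimal sketch under stated assumptions: `build_command` and `run` are hypothetical helpers, not part of this repository, and they only use the flags and values that appear in the commands above. Whether `transfer_cls.py` accepts other combinations of these flags is not verified here.

```python
import shlex
import subprocess


def build_command(dataset, source_model, target_models=(), checkpoint=None, **hparams):
    """Assemble a transfer_cls.py command line from the flags shown in this README."""
    cmd = ["python", "transfer_cls.py", "--dataset", dataset, "--model_name", source_model]
    if checkpoint is not None:
        # Evaluation mode: mirror the inference command exactly.
        return cmd + ["--evaluate", "--checkpoint", str(checkpoint)]
    if target_models:
        cmd += ["--target_models", *target_models]
    # Remaining keyword arguments become "--name value" pairs, in the order given.
    for flag, value in hparams.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


train_cmd = build_command(
    "cifar10", "minigpt-4",
    target_models=["instructblip", "blip2"],
    learning_rate=10, fca=0.005, tse=0.001, epochs=1,
)
eval_cmd = build_command("cifar10", "minigpt-4", checkpoint="save/checkpoint_best.pth")

run(train_cmd)  # dry run: prints the training command
run(eval_cmd)   # dry run: prints the evaluation command
```

Pass `dry_run=False` to `run` from the repository root, with the `transvp` environment active, to actually launch the command.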

Bibtex

If you find this work helpful, please cite it with the bibtex below.

@InProceedings{Zhang_2024_CVPR,
    author    = {Zhang, Yichi and Dong, Yinpeng and Zhang, Siyuan and Min, Tianzan and Su, Hang and Zhu, Jun},
    title     = {Exploring the Transferability of Visual Prompting for Multimodal Large Language Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {26562-26572}
}
