Transferable Visual Prompting for Multimodal Large Language Models

Installation

Create and activate a virtual environment for the project, then install the dependencies.

cd Transferable_VP_MLLM
conda create -n transvp python=3.11
conda activate transvp
pip install -r requirements.txt

Prepare the model weights

Put the model weights under ./model_weights.

MiniGPT-4: Follow the MiniGPT-4 instructions and prepare MiniGPT-4-Vicuna-V0-7B.
InstructBLIP: Follow the LAVIS instructions and prepare InstructBLIP-Vicuna-7b-v1.1.
BLIP2: Follow the LAVIS instructions and prepare BLIP2-FlanT5-xl.
VPGTrans: Follow the MiniGPT-4 instructions and prepare Vicuna-v0-7B as the LLM.
BLIVA: Follow the BLIVA instructions and prepare BLIVA-Vicuna-7B.
VisualGLM-6B: No special preparation is needed.
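
Before launching a run, it can help to confirm that the weights folder is where the commands expect it. The following is a minimal sketch and not part of the repository: `list_weight_dirs` is a hypothetical helper, and because this README does not specify the layout inside ./model_weights, it only lists whatever entries are present.

```python
from pathlib import Path


def list_weight_dirs(root="./model_weights"):
    """Return the sorted names of the entries directly under the weights folder.

    Hypothetical helper: the expected sub-folder names are not stated in the
    README, so nothing is validated beyond the folder's existence.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"weights folder not found: {root_path}")
    return sorted(entry.name for entry in root_path.iterdir())
```

Calling `list_weight_dirs()` from the repository root should show one entry per prepared model; a `FileNotFoundError` indicates the folder has not been created yet.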

Reproducing the Results

On CIFAR10

python transfer_cls.py --dataset cifar10 --model_name minigpt-4 --target_models instructblip blip2 --learning_rate 10 --fca 0.005 --tse 0.001 --epochs 1
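
To try several settings in a row, the command above can be assembled programmatically. This is a minimal sketch that assumes only the flags shown in that command; `build_transfer_command` is a hypothetical helper, not part of the repository, and it builds the argument list without running anything.

```python
import shlex


def build_transfer_command(dataset, model_name, target_models,
                           learning_rate, fca, tse, epochs):
    """Assemble the argument list for transfer_cls.py.

    Only flags that appear in the README command are used; their exact
    semantics are defined by the script itself, not by this helper.
    """
    return [
        "python", "transfer_cls.py",
        "--dataset", dataset,
        "--model_name", model_name,
        "--target_models", *target_models,
        "--learning_rate", str(learning_rate),
        "--fca", str(fca),
        "--tse", str(tse),
        "--epochs", str(epochs),
    ]


# Rebuild the CIFAR10 command from the README.
cmd = build_transfer_command(
    "cifar10", "minigpt-4", ["instructblip", "blip2"],
    learning_rate=10, fca=0.005, tse=0.001, epochs=1,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root; keeping it as a list avoids shell quoting issues.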

Inference with a model

To evaluate on the dataset with a trained prompt, specify the path to its checkpoint. A reproducible checkpoint is provided at save/checkpoint_best.pth.

python transfer_cls.py --dataset cifar10 --model_name minigpt-4 --evaluate --checkpoint $PATH_TO_PROMPT
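
A missing checkpoint is easier to catch before the models are loaded. The sketch below is an illustration under stated assumptions: `build_eval_command` is a hypothetical helper, not part of the repository, and it uses only the flags from the evaluation command above, defaulting to the checkpoint path mentioned in this README.

```python
from pathlib import Path


def build_eval_command(checkpoint="save/checkpoint_best.pth",
                       dataset="cifar10", model_name="minigpt-4"):
    """Return the evaluation argument list after checking the checkpoint exists."""
    ckpt = Path(checkpoint)
    if not ckpt.is_file():
        # Fail early instead of after the model weights have been loaded.
        raise FileNotFoundError(f"prompt checkpoint not found: {ckpt}")
    return [
        "python", "transfer_cls.py",
        "--dataset", dataset,
        "--model_name", model_name,
        "--evaluate",
        "--checkpoint", str(ckpt),
    ]
```

Run from the repository root, `build_eval_command()` returns the arguments for the provided checkpoint, which can then be passed to `subprocess.run`.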

BibTeX

If you find this work helpful, please cite it using the BibTeX entry below.

@InProceedings{Zhang_2024_CVPR,
    author    = {Zhang, Yichi and Dong, Yinpeng and Zhang, Siyuan and Min, Tianzan and Su, Hang and Zhu, Jun},
    title     = {Exploring the Transferability of Visual Prompting for Multimodal Large Language Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {26562-26572}
}
