Self-Training Large Language and Vision Assistant for Medicine

The advancement of medical image understanding and reasoning critically depends on building high-quality visual instruction data, which is costly and labor-intensive to obtain, particularly in the medical domain. To mitigate this data scarcity issue, we introduce the Self-Training Large Language and Vision Assistant for Medicine (STLLaVA-Med).

Self-Training Large Language and Vision Assistant for Medical Question-Answering [paper]

Guohao Sun, Can Qin, Huazhu Fu, Linwei Wang, Zhiqiang Tao


Medical data usage and performance comparison between LLaVA-Med and our method.


Self-training pipeline for transforming a general vision-language assistant into a medical expert.

🔥 News

  • 2024.09.20 We will release our checkpoints soon!
  • 2024.09.20 🌟 Our paper has been accepted by EMNLP 2024 (main conference).
  • 2024.06.10 🌟 Our paper and code were released!

Contents

  • Install
  • Data
  • Training
  • Evaluation

Install

  1. Install the package
conda create -n stllava python=3.10 -y
conda activate stllava
pip install --upgrade pip  # enable PEP 660 support
cd STLLaVA-Med
pip install -e .
  2. Install additional packages for training
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
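
After installation, a quick sanity check can confirm the environment is usable. This is a minimal sketch, assuming the package keeps upstream LLaVA's llava module name and that PyTorch is installed as a dependency:

# Sanity check (the llava module name is assumed from upstream LLaVA)
python -c "import llava; print('llava import OK')"
python -c "import torch; print(torch.__version__, 'CUDA available:', torch.cuda.is_available())"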

Data

Visual instructional data

This project utilizes visual instruction data provided by LLaVA-Med 60k_inline_mention. However, because some image URLs have been disabled, we filtered the original data into our own version for this project.

DPO data


DPO data example.

This project auto-generates the preference dataset using the model itself, guided by GPT-4o. We sample 10K medical images from PMC-15M. You may download the dataset via STLLaVA-Med-DPO.
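
One way to fetch the preference data is via the Hugging Face CLI. The commands below are only a sketch: <hf-repo-id> is a placeholder for the STLLaVA-Med-DPO dataset linked above, and the local directory is an arbitrary example.

# Example only: <hf-repo-id> is a placeholder for the STLLaVA-Med-DPO dataset id.
pip install -U "huggingface_hub[cli]"
huggingface-cli download <hf-repo-id> --repo-type dataset --local-dir ./data/stllava_med_dpo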

Training

Training consists of two stages: (1) a visual self-questioning instruction tuning stage, which teaches the model to ask questions and follow multimodal instructions; and (2) a preference optimization stage.

Instruction tuning:

Training script with DeepSpeed ZeRO-3 and LoRA: sqllava_med.sh (see the example invocation after the flag list).

  • --mm_projector_type cluster: the prototype extractor & a two-layer MLP vision-language connector.
  • --mm_projector_type mlp2x_gelu: a two-layer MLP vision-language connector.
  • --vision_tower openai/clip-vit-large-patch14-336: CLIP ViT-L/14 336px.
  • --image_aspect_ratio pad: this pads the non-square images to square, instead of cropping them; it slightly reduces hallucination.
  • --version v1_sq: training for visual self-questioning.
  • --vit_lora_enable: optimize the vision encoder with ViT LoRA.
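
The authoritative command lives in sqllava_med.sh; the snippet below is only a rough sketch of how the flags above fit together, assuming a LLaVA-style DeepSpeed entry point and using placeholder data and output paths.

# Rough sketch only; see sqllava_med.sh for the actual command and hyperparameters.
# The entry point, DeepSpeed config, and data/output paths are placeholders.
deepspeed llava/train/train_mem.py \
    --deepspeed ./scripts/zero3.json \
    --lora_enable True \
    --vit_lora_enable \
    --version v1_sq \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type cluster \
    --image_aspect_ratio pad \
    --data_path /path/to/llava_med_60k_inline_mention.json \
    --image_folder /path/to/images \
    --output_dir ./checkpoints/stllava-med-sq-lora
# Batch size, learning rate, epochs, etc. are set in the script.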

Preference optimization:

Training script with DeepSpeed ZeRO-3 and LoRA: dpo_finetune.sh (see the example invocation below).

  • --version v1: use the standard v1 conversation template for preference optimization (self-questioning is only used in the first stage).
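
As with the first stage, dpo_finetune.sh is the source of truth; the following is a minimal sketch with an assumed entry point and placeholder paths.

# Rough sketch only; see dpo_finetune.sh for the actual command and hyperparameters.
# The entry point, DeepSpeed config, and data/output paths are placeholders.
deepspeed llava/train/train_dpo.py \
    --deepspeed ./scripts/zero3.json \
    --lora_enable True \
    --version v1 \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --image_aspect_ratio pad \
    --data_path ./data/stllava_med_dpo/preferences.json \
    --image_folder /path/to/pmc_images \
    --output_dir ./checkpoints/stllava-med-dpo-lora
# DPO-specific hyperparameters (e.g., the beta coefficient) are set in the script.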

Evaluation

Please download the raw images of the VQA-RAD, SLAKE, and PVQA datasets for the medical VQA tasks.

We evaluate the models on these three benchmarks. To ensure reproducibility, we use greedy decoding; we do not use beam search, keeping the inference process consistent with the chat demo's real-time outputs.

Acknowledgement
