GitHub - FreedomIntelligence/LongLLaVA: LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

📃 Paper • 🌐 Demo • 🤗 LongLLaVA-53B-A13B • 🤗 LongLLaVA-9B

🌈 Update

[2024.09.05] LongLLaVA repo is published! 🎉
[2024.10.12] LongLLaVA-53B-A13B, LongLLaVA-9B and Jamba-9B-Instruct are released! 🎉

Architecture

Click to view the architecture image

Results

Click to view the Results

Main Results
Diagnostic Results
Video-NIAH

Results reproduction

1. Environment Setup

pip install -r requirements.txt
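
After installation it can be useful to confirm that the listed packages are actually present in the active environment. The following is a minimal sketch, not part of the repository: `missing_packages` is a hypothetical helper that takes the lines of a requirements file and returns the names it cannot find. It assumes simple `name==version` style lines and skips options and URL-based entries rather than trying to resolve them.

```python
import re
from importlib import metadata
from typing import Iterable, List


def missing_packages(requirement_lines: Iterable[str]) -> List[str]:
    """Return distribution names from the given lines that are not installed.

    Hypothetical helper: handles plain `name` / `name==1.0` lines only.
    Comments, blank lines, pip options (`-r`, `--index-url`) and URL
    requirements are skipped, not checked.
    """
    missing = []
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-") or "://" in line:
            continue
        # The name is everything before the first version/extras/marker char.
        name = re.split(r"[<>=!~;\[\s@]", line, maxsplit=1)[0].strip()
        if not name:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Example usage (assumes requirements.txt is in the current directory):
# with open("requirements.txt") as fh:
#     print(missing_packages(fh))
```

This only checks presence, not whether the installed version satisfies the pinned one.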

2. Data Download and Construction

Dataset Taxonomy

Dataset Downloading and Construction

Coming Soon.

3. Training

Downloading Language Models

🤗 Jamba-9B-Instruct
Stage I: Single-image Alignment.
```
bash Align.sh
```
Stage II: Single-image Instruction-tuning.
```
bash SingleImageSFT.sh
```
Stage III: Multi-image Instruction-tuning.
```
bash MultiImageSFT.sh
```
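
The stage numbering suggests the three scripts are run in order. A minimal sketch of a driver that does so and stops at the first failure is shown below. It assumes the scripts are invoked from the repository root; `run_stages` and its `runner` parameter are hypothetical names introduced for illustration and do not exist in the repository.

```python
import subprocess
from typing import Callable, List, Sequence

# Script names are taken from the three stages listed above.
STAGES = ("Align.sh", "SingleImageSFT.sh", "MultiImageSFT.sh")


def run_stages(
    scripts: Sequence[str] = STAGES,
    runner: Callable[[List[str]], int] = subprocess.call,
) -> List[str]:
    """Run each script with bash, in order, stopping at the first non-zero exit.

    Returns the scripts that completed successfully. `runner` is a
    hypothetical hook so the loop can be exercised without launching training.
    """
    completed = []
    for script in scripts:
        exit_code = runner(["bash", script])
        if exit_code != 0:
            print(f"{script} failed with exit code {exit_code}; stopping.")
            break
        completed.append(script)
    return completed


# Example usage from the repository root:
# finished = run_stages()
```

Stopping early avoids starting a later stage after an earlier stage has failed.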

4. Evaluation

Command Line Interface

python cli.py --model_dir path-to-longllava

Model Inference

from cli import Chatbot

query = 'What does the picture show?'
image_paths = ['image_path1']  # image or video paths

# Replace 'path-to-longllava' with the local model directory.
bot = Chatbot('path-to-longllava')
output = bot.chat(query, image_paths)
print(output)  # prints the model's response
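
To run several queries in one go, the same `chat(query, image_paths)` call can be wrapped in a small loop. The sketch below is an illustration only: `chat_many` and `ChatLike` are hypothetical names, and it assumes `chat` returns a string, which the snippet above suggests but does not confirm. Whether `Chatbot` keeps conversation state between calls is not addressed here.

```python
from typing import Iterable, List, Protocol, Sequence, Tuple


class ChatLike(Protocol):
    """Anything exposing chat(query, image_paths), as in the snippet above."""

    def chat(self, query: str, image_paths: Sequence[str]) -> str: ...


def chat_many(
    bot: ChatLike, requests: Iterable[Tuple[str, Sequence[str]]]
) -> List[str]:
    """Send each (query, image_paths) pair to the bot and collect the answers."""
    return [bot.chat(query, list(paths)) for query, paths in requests]


# Example usage (paths are placeholders):
# from cli import Chatbot
# bot = Chatbot('path-to-longllava')
# answers = chat_many(bot, [
#     ('What does the picture show?', ['image_path1']),
#     ('What changes between these frames?', ['image_path1', 'image_path2']),
# ])
```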

Benchmarks

bash Eval.sh

5. Reproduce Other Results in the Paper

FLOPs

python ./utils/cal_flops.py

Prefill Time & Throughput & GPU Memory Usage

python ./benchmarks/Efficiency/evaluate.py
python ./benchmarks/Efficiency/evaluatevllm.py
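
For a quick sanity check of numbers like these, wall-clock time and throughput can be measured with a few lines of standard-library code. This is a generic sketch and not the logic of the repository's `evaluate.py`; `time_call` and `throughput` are hypothetical helpers.

```python
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` calls of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def throughput(num_tokens: int, seconds: float) -> float:
    """Tokens processed per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_tokens / seconds


# Example usage with a placeholder workload:
# elapsed = time_call(lambda: sum(range(1_000_000)))
# print(throughput(1_000_000, elapsed))
```

When timing GPU work with PyTorch, call `torch.cuda.synchronize()` before reading the clock so that queued kernels are included, and use `torch.cuda.max_memory_allocated()` to read peak allocated memory.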

Downcycling to Convert Jamba-MoE to Dense

python ./utils/dense_downcycling.py

TO DO

Release Data Construction Code

Acknowledgement

LLaVA: Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Citation

@misc{wang2024longllavascalingmultimodalllms,
      title={LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture}, 
      author={Xidong Wang and Dingjie Song and Shunian Chen and Chen Zhang and Benyou Wang},
      year={2024},
      eprint={2409.02889},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.02889}, 
}
