Welcome to Mini-LLaVA – a minimal, seamless implementation of a LLaVA-style vision-language model that unlocks the multimodal abilities of a large language model (based on Llama 3.1) with just a single GPU.
This project goes beyond the original LLaVA by supporting interleaved processing of multiple input types (images, videos, and text) while respecting their order of appearance. Whether you're handling complex visual-textual correlations or mixing media formats within a single prompt, Mini-LLaVA covers it with minimal code and maximum flexibility.
- [09/2024] [Minimal Implementation] Tutorial in Mini_LLaVA.ipynb showing how a pre-trained adapter can help Llama 3.1 see.
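The key component demonstrated in the notebook is a LLaVA-style adapter: a small projection that maps frozen vision-encoder features into the language model's embedding space, so image tokens can sit alongside text tokens. Below is a minimal sketch of such an adapter; the class name and dimensions are illustrative assumptions, not the exact ones used in Mini_LLaVA.ipynb.

```python
import torch
import torch.nn as nn


class VisionAdapter(nn.Module):
    """Illustrative LLaVA-style projector: maps vision-encoder patch features
    into the LLM embedding space. The dimensions are assumptions (e.g. a CLIP
    ViT-L/14 hidden size of 1024 -> a Llama 3.1 8B hidden size of 4096)."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim)
        # returns:        (batch, num_patches, llm_dim), ready to be spliced
        # into the text-embedding sequence at the image's position.
        return self.proj(patch_features)


# Example: project features for one image with 576 patches.
adapter = VisionAdapter()
image_embeds = adapter(torch.randn(1, 576, 1024))  # -> (1, 576, 4096)
```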
- Minimal Code Structure: Transform a language model (Llama 3.1) into a powerful vision-language model with minimal, easy-to-understand code.
- Simplified Implementation: Our code is significantly simpler than the original LLaVA implementation, making it easier to dive into and build upon.
- Extended Functionality: We've added support for interleaved processing of images, videos, and text, giving you more flexibility and power (see the sketch after this list).
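For instance, an interleaved prompt mixes text, image, and video segments in exactly the order they should be consumed. The snippet below is a hypothetical illustration of that interface; the segment schema and the `model.generate` call are assumptions, not the exact Mini-LLaVA API.

```python
# Hypothetical interleaved prompt: segments are consumed in the order they
# appear, so each question is grounded in the media that precedes it.
# The dict schema and the generate() call are illustrative assumptions.
interleaved_prompt = [
    {"type": "text",  "value": "Compare the scene in this photo"},
    {"type": "image", "value": "beach.jpg"},
    {"type": "text",  "value": "with the opening seconds of this clip:"},
    {"type": "video", "value": "surf.mp4"},
    {"type": "text",  "value": "What changed between them?"},
]
# response = model.generate(interleaved_prompt)
```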
- Fine-tune the language decoder
- Audio modality
- Retrieval modality
- Benchmark inference test
Run `set.sh` to set up the environment.