Inference of Large Multimodal Models in C/C++
This is still a work in progress and not yet ready for use.
This repo implements LLaVA inference in C/C++ on top of clip.cpp and llama.cpp. Eventually, it will support inference of other large multimodal models, but LLaVA is chosen as a starting point.
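As a rough illustration of how the pieces fit together: the CLIP vision encoder produces one embedding per image patch, a learned linear projector maps those embeddings into the LLaMA embedding space, and the projected vectors are fed to LLaMA as if they were token embeddings. The sketch below implements only the projection step in plain C++; `mm_project`, the toy dimensions, and the constant weights are illustrative assumptions, not the actual clip.cpp or llama.cpp API.

```cpp
// Minimal sketch of the LLaVA multimodal projection step: map CLIP patch
// embeddings into the LLaMA embedding space with one weight matrix and bias.
// All names and sizes here are illustrative, not real clip.cpp/llama.cpp calls.
#include <cstdio>
#include <vector>

// Projects n_patches CLIP embeddings (dim d_clip) into the LLaMA embedding
// space (dim d_llama). W is row-major [d_llama x d_clip], b has d_llama entries.
std::vector<float> mm_project(const std::vector<float> &clip_embd, int n_patches,
                              int d_clip, int d_llama,
                              const std::vector<float> &W, const std::vector<float> &b) {
    std::vector<float> out(n_patches * d_llama, 0.0f);
    for (int p = 0; p < n_patches; ++p) {
        for (int i = 0; i < d_llama; ++i) {
            float acc = b[i];
            for (int j = 0; j < d_clip; ++j) {
                acc += W[i * d_clip + j] * clip_embd[p * d_clip + j];
            }
            out[p * d_llama + i] = acc;
        }
    }
    return out;
}

int main() {
    const int n_patches = 2, d_clip = 4, d_llama = 3;        // toy sizes
    std::vector<float> clip_embd(n_patches * d_clip, 0.5f);  // stand-in for CLIP output
    std::vector<float> W(d_llama * d_clip, 0.1f), b(d_llama, 0.0f);
    std::vector<float> llama_embd = mm_project(clip_embd, n_patches, d_clip, d_llama, W, b);
    // The projected embeddings would be prepended to the prompt's token embeddings.
    printf("projected %d patch embeddings to dim %d\n", n_patches, d_llama);
    return 0;
}
```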
- Get rid of the text model and other unnecessary artifacts in clip.cpp.
- Write the conversion script for LLaVA. Initially, it will use a two-file format: one file for the visual encoder and the other for LLaMA.
- Come up with a way to support a single-file format that includes the CLIP backbone, the multimodal projector, and the LLaMA weights together (see the sketch after this list).
- Support other models such as InstructBLIP.
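One possible shape for the single-file format item above, purely as a sketch: a small header carrying a magic number and per-section byte offsets, followed by the three weight blobs. The `lmm_header` struct, its field names, and the magic value are hypothetical; a real format would presumably build on the GGML tensor serialization that clip.cpp and llama.cpp already use.

```cpp
// Hypothetical single-file container for the combined model: a header that
// locates the CLIP backbone, the multimodal projector, and the LLaMA weights.
// This is NOT the GGML file format; it only illustrates the idea.
#include <cstdint>
#include <cstdio>
#include <cstring>

struct lmm_header {
    uint32_t magic;          // e.g. "lmm1", identifies the combined file
    uint64_t clip_offset;    // byte offset of the CLIP backbone weights
    uint64_t proj_offset;    // byte offset of the multimodal projector weights
    uint64_t llama_offset;   // byte offset of the LLaMA weights
};

int main() {
    lmm_header hdr = {};
    memcpy(&hdr.magic, "lmm1", 4);
    hdr.clip_offset  = sizeof(lmm_header);
    hdr.proj_offset  = hdr.clip_offset + 128;  // toy section sizes
    hdr.llama_offset = hdr.proj_offset + 64;

    FILE *f = fopen("model.lmm", "wb");
    if (!f) return 1;
    fwrite(&hdr, sizeof(hdr), 1, f);
    // The conversion script would append each weight section here; a loader can
    // then fseek() to the offset it needs and read or mmap only that section.
    fclose(f);
    return 0;
}
```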