Missing LLaVA 1.6 support for handling custom templates with respect to the chosen LLM #1301
Started here #1147 but got sidetracked.

Would love to see full support for LLaVA 1.6 in this project.

Definitely. There will be new, ground-breaking LLaVA models coming this month, fine-tuned on Llama-3. It would be great to run them quantized in GGUF using this cpp-python library.

Coming soon in #1147 , already added llava1.6, obsidian, and moondream support using the new system.
We appreciate the addition of LLaVA 1.6 34B support. It would be great to also have support for smaller 7B quants and projectors, or at least a single cjpais/llava-1.6-mistral-7b-gguf. That would be truly awesome!
@Vinventive I wasn't aware there were differences in the chat formats. Do you mind sharing a link and I'll add that right away, cheers!

Here is the link: ggml-org/llama.cpp#5267
Really struggling to run LLaVA with CUDA instead of cuBLAS, and I was wondering if it's just an isolated issue. I've seen another open issue where people are running into similar problems (#1393). Maybe we're doing something incorrectly, or the README is missing info/steps on how to run it on Windows 64-bit?
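For what it's worth, a common way to get a CUDA-enabled build of llama-cpp-python is to force a source build with the CUDA backend flag set. This is a sketch, not an official recipe: it assumes the CUDA Toolkit and a C++ compiler are installed, and the exact flag name depends on the library version (older releases used `-DLLAMA_CUBLAS=on`, newer ones use `-DGGML_CUDA=on`):

```shell
# Linux/macOS shells: set CMAKE_ARGS inline before pip.
CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python

# Windows PowerShell: set the environment variable first, then install.
#   $env:CMAKE_ARGS = "-DGGML_CUDA=on"
#   pip install --force-reinstall --no-cache-dir llama-cpp-python
```

`--no-cache-dir` ensures pip does not reuse a previously built CPU-only wheel.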
I'm using the `Llava15ChatHandler`, but I don't see any `Llava16ChatHandler` when looking at the source code. Moreover, the handler uses hard-coded templating instead of supporting the custom in-model template given by the `tokenizer.chat_template` metadata property, as used e.g. by Nous Hermes 2 Yi 34B (Link), which is quite different from the hard-coded one. Any plans for that? Is LLaVA 1.6 really supported, or should I fall back to the parent project?

Update: Using the current codebase, I can get fairly okay results during inference, but I'm not sure whether there might be some regression; I need to check against the original and compare.
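To illustrate why a hard-coded template can break a fine-tune like Nous Hermes 2: the two prompt formats differ substantially. Below is a minimal sketch (not llama-cpp-python's actual code; the function names are hypothetical) contrasting the roughly LLaVA 1.5 style prompt with the ChatML style that Nous Hermes 2's `tokenizer.chat_template` metadata describes:

```python
# Hypothetical helpers contrasting two chat formats; not library code.

def format_llava15(messages):
    """Roughly the LLaVA 1.5 style: plain 'USER: ... ASSISTANT:' turns."""
    parts = []
    for m in messages:
        if m["role"] == "system":
            parts.append(m["content"])
        elif m["role"] == "user":
            parts.append(f"USER: {m['content']}")
        elif m["role"] == "assistant":
            parts.append(f"ASSISTANT: {m['content']}")
    # Trailing 'ASSISTANT:' prompts the model to generate its reply.
    return "\n".join(parts) + "\nASSISTANT:"

def format_chatml(messages):
    """ChatML, as used by Nous Hermes 2: <|im_start|>role ... <|im_end|>."""
    out = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    return out + "<|im_start|>assistant\n"

msgs = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Describe this image."},
]
print(format_llava15(msgs))
print(format_chatml(msgs))
```

A model fine-tuned on one of these formats tends to degrade noticeably when prompted with the other, which is why reading the template from the GGUF metadata rather than hard-coding it matters here.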