support llava 1.6 image embedding dimension in server #5553
Nice to see. Regarding your question: those structs use vectors and clip.h is C-style, which is why you have to include them manually.
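To illustrate the point about vectors and a C-style header, here is a minimal sketch of the usual workaround: the header only forward-declares the struct and exposes plain functions, while the `std::vector` member stays in the C++ implementation. All `demo_` names are hypothetical and chosen for illustration; this is not the actual clip.h API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// ---- What a C-style header can expose (hypothetical demo_ names) ----
#ifdef __cplusplus
extern "C" {
#endif

struct demo_image_u8;  // opaque: the layout is hidden from C callers

struct demo_image_u8 * demo_image_u8_init(int nx, int ny);
void                   demo_image_u8_free(struct demo_image_u8 * img);
size_t                 demo_image_u8_nbytes(const struct demo_image_u8 * img);

#ifdef __cplusplus
}
#endif

// ---- What stays in the C++ implementation file ----
struct demo_image_u8 {
    int nx;
    int ny;
    std::vector<uint8_t> buf;  // C++-only member, cannot appear in a C header
};

struct demo_image_u8 * demo_image_u8_init(int nx, int ny) {
    auto * img = new demo_image_u8{nx, ny, {}};
    // Assumption for this sketch: 3 bytes (RGB) per pixel.
    img->buf.resize(static_cast<size_t>(nx) * ny * 3);
    return img;
}

void demo_image_u8_free(struct demo_image_u8 * img) {
    delete img;
}

size_t demo_image_u8_nbytes(const struct demo_image_u8 * img) {
    return img->buf.size();
}
```

A caller that only sees the declarations can create, query, and free the image without knowing that a vector is involved, which keeps the header usable from plain C.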
This is awesome! Thanks so much!
This is a great point, the print here is definitely wrong. From a quick peek at the code, it looks like the same issue was in the previous version as well, though I haven't verified that via testing. I think this should be fixed. I'll take a quick look for anything obvious.
Edit: From a quick look, one thing stands out in particular. I am not familiar enough with this value yet, so I'm not sure whether it only affects the log or has a bigger impact on generation.
@cjpais thanks for looking into this. Maybe @cmp-nct or @ggerganov has more ideas on why llama.cpp reports the number of prompt tokens as 1 when an image is in the input, and how to fix it?
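For readers following the token-count question: when an image is part of the prompt, the positions the model evaluates are the text tokens plus one position per image embedding vector, so a count of 1 cannot be right. Below is a minimal sketch of that accounting. The helper names are hypothetical and this is not the server's code. It assumes a single 336x336 input split into 14x14 patches, as in the CLIP ViT-L/14-336 encoder used by LLaVA 1.5; LLaVA 1.6 produces more embeddings per image, and that count is not assumed here.

```cpp
#include <cstddef>

// Hypothetical helper: number of patch embeddings for one square image,
// assuming one embedding per non-overlapping patch.
constexpr std::size_t demo_patches_per_image(std::size_t image_size, std::size_t patch_size) {
    return (image_size / patch_size) * (image_size / patch_size);
}

// Hypothetical helper: total prompt positions = text tokens + image embeddings.
constexpr std::size_t demo_prompt_positions(std::size_t n_text_tokens, std::size_t n_image_embd) {
    return n_text_tokens + n_image_embd;
}

// 336 / 14 = 24 patches per side, 24 * 24 = 576 embeddings.
static_assert(demo_patches_per_image(336, 14) == 576, "24 x 24 patches");
```

With, say, 10 text tokens and one such image, `demo_prompt_positions(10, 576)` yields 586.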
It seems like it's a server-only problem; llava-cli seems to work. From llava-cli:
From server through API:
From server console:
* server: init working 1.6
* move clip_image to header
* remove commented code
* remove c++ style from header
* remove todo
* expose llava_image_embed_make_with_clip_img
* fix zig build
Should address #5514. I haven't tested extensively, but the results for 1.6 are as follows. 1.5 seems to work fine from very brief testing.
Baseline
Command:
./llava-cli -ngl 99 -n 325 -c 4096 --temp 0 --mmproj ~/models/llava/1.6/llava-v1.6-mistral-7b/mmproj-model-f16.gguf -m ~/models/llava/1.6/llava-v1.6-mistral-7b/llava-v1.6-mistral-7b.Q5_K_M.gguf --image ~/Downloads/beach.jpg -p "describe the image in detail"
Result:
The image shows a highway scene with a clear blue sky overhead. The road is lined with trees and appears to be in a rural or semi-rural area, as indicated by the presence of palm trees along the side. There are several vehicles on the road, including cars and trucks, suggesting that it's a busy time of day. The perspective of the image suggests it was taken from inside a vehicle traveling down the highway.
This PR
Previous
Questions:
clip_image_u8 into clip.h?
cc: @cmp-nct