
support llava 1.6 image embedding dimension in server #5553

Merged: 7 commits merged into ggerganov:master on Feb 20, 2024

Conversation

@cjpais (Contributor) commented Feb 17, 2024

Should address #5514. I haven't tested extensively, but the results for 1.6 are as follows; 1.5 seems to work fine from very brief testing.

Baseline

Command: ./llava-cli -ngl 99 -n 325 -c 4096 --temp 0 --mmproj ~/models/llava/1.6/llava-v1.6-mistral-7b/mmproj-model-f16.gguf -m ~/models/llava/1.6/llava-v1.6-mistral-7b/llava-v1.6-mistral-7b.Q5_K_M.gguf --image ~/Downloads/beach.jpg -p "describe the image in detail"

Result: The image shows a highway scene with a clear blue sky overhead. The road is lined with trees and appears to be in a rural or semi-rural area, as indicated by the presence of palm trees along the side. There are several vehicles on the road, including cars and trucks, suggesting that it's a busy time of day. The perspective of the image suggests it was taken from inside a vehicle traveling down the highway.

This PR: [screenshot: server-pr]

Previous: [screenshot: main-result]

Questions:

  • does it make sense to move clip_image_u8 into clip.h?

cc: @cmp-nct

@cjpais changed the title from "support llava 1.6 in server" to "support llava 1.6 image embedding dimension in server" on Feb 17, 2024
@cjpais mentioned this pull request on Feb 17, 2024
@cmp-nct (Contributor) commented Feb 17, 2024

Nice to see. Regarding your question: those structs use std::vector and clip.h is a C-style header; that's why you have to include them manually.
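For illustration of the constraint described above (not the actual clip.h API; all names below are hypothetical): a plain-C header can only expose such a struct as an opaque forward declaration behind extern "C", because a definition with std::vector members would not compile for C callers.

```cpp
// Hypothetical sketch (not the real clip.h contents): why a struct holding
// std::vector members cannot live in a C-style header, and the usual opaque
// workaround. All names below are illustrative.

#ifdef __cplusplus
extern "C" {
#endif

// Fine in a C header: an opaque forward declaration plus plain-C functions.
struct clip_image_u8;  // full definition (with std::vector members) stays on the C++ side
struct clip_image_u8 * clip_image_u8_init(void);
void clip_image_u8_free(struct clip_image_u8 * img);

#ifdef __cplusplus
}
#endif

// Not fine in a C header: exposing the full definition drags in C++.
//
// struct clip_image_u8 {
//     int nx, ny;
//     std::vector<uint8_t> buf;  // std::vector does not exist in C, so C callers cannot compile this
// };
```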

A review comment on examples/llava/clip.h was marked resolved.
@ggerganov merged commit 6560bed into ggerganov:master on Feb 20, 2024
54 checks passed
@chigkim commented Feb 22, 2024

This is awesome! Thanks so much!
Quick question: any idea why the server returns the prompt count as 1 token?
The server prints image embedding created: 2880 tokens, but the API doesn't return the proper number of tokens; it returns the same figures as the print_timings output (log below).
It'd be great if the server returned the right number of tokens so clients can calculate what's left in the context window.
Thanks again!

slot 0 - loaded image
slot 0 is processing [task id: 0]
slot 0 : kv cache rm - [0, end)
encode_image_with_clip: 5 segments encoded in   593.69 ms
encode_image_with_clip: image embedding created: 2880 tokens

encode_image_with_clip: image encoded in   721.64 ms by CLIP (    0.25 ms per image patch)

print_timings: prompt eval time =    7283.15 ms /     1 tokens ( 7283.15 ms per token,     0.14 tokens per second)
print_timings:        eval time =   26579.15 ms /   321 runs   (   82.80 ms per token,    12.08 tokens per second)
print_timings:       total time =   33862.30 ms
slot 0 released (322 tokens in cache)
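(Side note, not stated explicitly in the thread: the 2880 figure is consistent with LLaVA 1.6 encoding the base image plus four grid tiles, each into 576 CLIP patch embeddings (a 24×24 patch grid at 336 px), i.e. 5 × 576 = 2880 embedding positions. So a correct prompt count should come out near 2880 plus the text tokens, not 1.)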

@cjpais (Contributor, Author) commented Feb 22, 2024

This is a great point; the reported count is definitely wrong. From a quick peek at the code it looks like the same issue existed in the previous version as well, though I haven't verified that by testing. I think this should be fixed; I'll take a quick look for anything obvious.

edit: From a quick look, one thing stands out in particular: process_image is called after num_prompt_tokens_processed is set, so the number of tokens used by the image is certainly not being added to the count used for the log. A quick hack would be to update the value inside process_image, but there is probably a more unified solution. It may also be worth adding a test that verifies the input token counts match expectations.

I am not familiar enough with this value yet, so I'm not sure whether it only affects the log or has a bigger impact on generation.
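To make the ordering concrete, here is a minimal, self-contained sketch of the problem and of the "hack" fix mentioned above. It is not the real server code; only process_image and num_prompt_tokens_processed are names taken from the comment, everything else is illustrative.

```cpp
// Minimal illustration of the counter ordering issue described above.
// Only the two names from the comment are real; the rest is made up.
#include <cstdio>
#include <vector>

struct slot_state {
    int num_prompt_tokens_processed = 0;
};

constexpr int image_embed_tokens = 2880;  // what encode_image_with_clip reported above

// Hypothetical fix: let the image path bump the counter itself.
void process_image(slot_state & slot) {
    // ... encode the image with CLIP and evaluate the embeddings ...
    slot.num_prompt_tokens_processed += image_embed_tokens;
}

int main() {
    slot_state slot;
    std::vector<int> text_tokens = {1, 2, 3};  // stand-in for the tokenized text part of the prompt

    // Counter is set from the text tokens first...
    slot.num_prompt_tokens_processed = (int) text_tokens.size();
    // ...and the image is processed afterwards, so without the increment inside
    // process_image the ~2880 image positions would never be counted.
    process_image(slot);

    printf("prompt tokens processed: %d\n", slot.num_prompt_tokens_processed);  // 2883 with the fix, 3 without
    return 0;
}
```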

@chigkim commented Mar 2, 2024

@cjpais thanks for looking into this. Maybe @cmp-nct or @ggerganov has more ideas on why llama.cpp reports the number of prompt tokens as 1 when an image is included in the input, and how to fix it?

slot 0 - loaded image
slot 0 is processing [task id: 0]
slot 0 : kv cache rm - [0, end)
encode_image_with_clip: 5 segments encoded in   593.69 ms
encode_image_with_clip: image embedding created: 2880 tokens

encode_image_with_clip: image encoded in   721.64 ms by CLIP (    0.25 ms per image patch)

print_timings: prompt eval time =    7283.15 ms /     1 tokens ( 7283.15 ms per token,     0.14 tokens per second)
print_timings:        eval time =   26579.15 ms /   321 runs   (   82.80 ms per token,    12.08 tokens per second)
print_timings:       total time =   33862.30 ms
slot 0 released (322 tokens in cache)

@chigkim commented Mar 2, 2024

It seems like it's a server-only problem; llava-cli seems to work fine.

From llava-cli:


llama_print_timings:    load time =   25007.57 ms
llama_print_timings:    sample time =    68.54 ms /   256 runs   (  0.27 ms per token,  3734.94 tokens per second)
llama_print_timings: prompt eval time =  421164.62 ms /  2902 tokens (  145.13 ms per token,   6.89 tokens per second)
llama_print_timings:    eval time =   66393.95 ms /   257 runs   (  258.34 ms per token,   3.87 tokens per second)
llama_print_timings:     total time =  511967.49 ms /  3159 tokens

From server through API:

{
	......
	"timings": {
		"predicted_ms": 57040.203,
		"predicted_n": 233,
		"predicted_per_second": 4.084838197367565,
		"predicted_per_token_ms": 244.8077381974249,
		"prompt_ms": 429987.864,
		"prompt_n": 1,
		"prompt_per_second": 0.0023256470326799734,
		"prompt_per_token_ms": 429987.864
	},
	"tokens_cached": 3129,
	"tokens_evaluated": 1,
	"tokens_predicted": 233,
	"truncated": false
}
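For reference, the context window bookkeeping asked about earlier would be straightforward once the count is right. A hedged client-side sketch, using the field names from the JSON above plus the n_ctx of 4096 from the command line; it only gives sensible answers once tokens_evaluated reflects the image embedding positions:

```cpp
// Hypothetical client-side helper; tokens_evaluated and tokens_predicted are the
// field names from the /completion response above, n_ctx comes from -c 4096.
#include <cstdio>

int remaining_context(int n_ctx, int tokens_evaluated, int tokens_predicted) {
    return n_ctx - (tokens_evaluated + tokens_predicted);
}

int main() {
    // With the buggy response above: 4096 - (1 + 233) = 3862, which wildly overestimates the free space.
    printf("buggy:   %d\n", remaining_context(4096, 1, 233));
    // With a realistic prompt count (llava-cli reported 2902): 4096 - (2902 + 233) = 961.
    printf("correct: %d\n", remaining_context(4096, 2902, 233));
    return 0;
}
```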

From server console:

encode_image_with_clip: 5 segments encoded in 22462.62 ms
encode_image_with_clip: image embedding created: 2880 tokens

encode_image_with_clip: image encoded in 22495.54 ms by CLIP (    7.81 ms per image patch)
{"function":"print_timings","level":"INFO","line":260,"msg":"prompt eval time     =  429987.86 ms /     1 tokens (429987.86 ms per token,     0.00 tokens per second)","n_prompt_tokens_processed":1,"n_tokens_second":0.0023256470326799734,"slot_id":0,"t_prompt_processing":429987.864,"t_token":429987.864,"task_id":0,"tid":"8368","timestamp":1709356420}
{"function":"print_timings","level":"INFO","line":274,"msg":"generation eval time =   57040.20 ms /   233 runs   (  244.81 ms per token,     4.08 tokens per second)","n_decoded":233,"n_tokens_second":4.084838197367565,"slot_id":0,"t_token":244.8077381974249,"t_token_generation":57040.203,"task_id":0,"tid":"8368","timestamp":1709356420}
{"function":"print_timings","level":"INFO","line":283,"msg":"          total time =  487028.07 ms","slot_id":0,"t_prompt_processing":429987.864,"t_token_generation":57040.203,"t_total":487028.067,"task_id":0,"tid":"8368","timestamp":1709356420}
{"function":"update_slots","level":"INFO","line":1626,"msg":"slot released","n_cache_tokens":234,"n_ctx":4096,"n_past":3129,"n_system_tokens":0,"slot_id":0,"task_id":0,"tid":"8368","timestamp":1709356420,"truncated":false}
{"function":"log_server_request","level":"INFO","line":2693,"method":"POST","msg":"request","params":{},"path":"/completion","remote_addr":"127.0.0.1","remote_port":55351,"status":200,"tid":"7172","timestamp":1709356420}

@chigkim commented Mar 4, 2024

Created a separate issue: #5863.
Also possibly the same issue: #5852.

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* server: init working 1.6

* move clip_image to header

* remove commented code

* remove c++ style from header

* remove todo

* expose llava_image_embed_make_with_clip_img

* fix zig build
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* server: init working 1.6

* move clip_image to header

* remove commented code

* remove c++ style from header

* remove todo

* expose llava_image_embed_make_with_clip_img

* fix zig build