
support llava 1.6 image embedding dimension in server #5553

Merged: 7 commits merged into ggerganov:master on Feb 20, 2024

Conversation

@cjpais (Contributor) commented Feb 17, 2024

Should address #5514. I haven't tested extensively, but the results for 1.6 are as follows; 1.5 seems to work fine from very brief testing.

Baseline

Command: ./llava-cli -ngl 99 -n 325 -c 4096 --temp 0 --mmproj ~/models/llava/1.6/llava-v1.6-mistral-7b/mmproj-model-f16.gguf -m ~/models/llava/1.6/llava-v1.6-mistral-7b/llava-v1.6-mistral-7b.Q5_K_M.gguf --image ~/Downloads/beach.jpg -p "describe the image in detail"

Result: The image shows a highway scene with a clear blue sky overhead. The road is lined with trees and appears to be in a rural or semi-rural area, as indicated by the presence of palm trees along the side. There are several vehicles on the road, including cars and trucks, suggesting that it's a busy time of day. The perspective of the image suggests it was taken from inside a vehicle traveling down the highway.

This PR: [screenshot: server-pr]

Previous: [screenshot: main-result]

Questions:

  • does it make sense to move clip_image_u8 into clip.h?

cc: @cmp-nct

@cjpais changed the title from "support llava 1.6 in server" to "support llava 1.6 image embedding dimension in server" on Feb 17, 2024
@cjpais mentioned this pull request on Feb 17, 2024
@cmp-nct (Contributor) commented Feb 17, 2024

Nice to see. Regarding your question: those structs use std::vector and clip.h is a C-style header; that's why you have to include them manually.
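For illustration of the constraint described above (not the actual clip.h API; all names below are hypothetical): a plain-C header can only expose such a struct as an opaque forward declaration behind extern "C", because a definition with std::vector members would not compile for C callers.

```cpp
// Hypothetical sketch (not the real clip.h contents): why a struct holding
// std::vector members cannot live in a C-style header, and the usual opaque
// workaround. All names below are illustrative.

#ifdef __cplusplus
extern "C" {
#endif

// Fine in a C header: an opaque forward declaration plus plain-C functions.
struct clip_image_u8;  // full definition (with std::vector members) stays on the C++ side
struct clip_image_u8 * clip_image_u8_init(void);
void clip_image_u8_free(struct clip_image_u8 * img);

#ifdef __cplusplus
}
#endif

// Not fine in a C header: exposing the full definition drags in C++.
//
// struct clip_image_u8 {
//     int nx, ny;
//     std::vector<uint8_t> buf;  // std::vector does not exist in C, so C callers cannot compile this
// };
```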

A review comment on examples/llava/clip.h was marked resolved.
@ggerganov merged commit 6560bed into ggerganov:master on Feb 20, 2024
54 checks passed
@chigkim commented Feb 22, 2024

This is awesome! Thanks so much!
Quick question: any idea why the server returns the prompt count as 1 token?
The server prints image embedding created: 2880 tokens, but the API doesn't return the proper number of tokens; it returns the same figures as the print_timings output (log below).
It'd be great if the server returned the right number of tokens so clients can calculate what's left in the context window.
Thanks again!

slot 0 - loaded image
slot 0 is processing [task id: 0]
slot 0 : kv cache rm - [0, end)
encode_image_with_clip: 5 segments encoded in   593.69 ms
encode_image_with_clip: image embedding created: 2880 tokens

encode_image_with_clip: image encoded in   721.64 ms by CLIP (    0.25 ms per image patch)

print_timings: prompt eval time =    7283.15 ms /     1 tokens ( 7283.15 ms per token,     0.14 tokens per second)
print_timings:        eval time =   26579.15 ms /   321 runs   (   82.80 ms per token,    12.08 tokens per second)
print_timings:       total time =   33862.30 ms
slot 0 released (322 tokens in cache)
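(Side note, not stated explicitly in the thread: the 2880 figure is consistent with LLaVA 1.6 encoding the base image plus four grid tiles, each into 576 CLIP patch embeddings (a 24×24 patch grid at 336 px), i.e. 5 × 576 = 2880 embedding positions. So a correct prompt count should come out near 2880 plus the text tokens, not 1.)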

@cjpais (Contributor, Author) commented Feb 22, 2024

This is a great point; the reported count is definitely wrong. From a quick peek at the code it looks like the same issue existed in the previous version as well, though I haven't verified that by testing. I think this should be fixed; I'll take a quick look for anything obvious.

edit: From a quick look, one thing stands out in particular: process_image is called after num_prompt_tokens_processed is set, so the number of tokens used by the image is certainly not being added to the count used for the log. A quick hack would be to update the value inside process_image, but there is probably a more unified solution. It may also be worth adding a test that verifies the input token counts match expectations.

I am not familiar enough with this value yet, so I'm not sure whether it only affects the log or has a bigger impact on generation.
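To make the ordering concrete, here is a minimal, self-contained sketch of the problem and of the "hack" fix mentioned above. It is not the real server code; only process_image and num_prompt_tokens_processed are names taken from the comment, everything else is illustrative.

```cpp
// Minimal illustration of the counter ordering issue described above.
// Only the two names from the comment are real; the rest is made up.
#include <cstdio>
#include <vector>

struct slot_state {
    int num_prompt_tokens_processed = 0;
};

constexpr int image_embed_tokens = 2880;  // what encode_image_with_clip reported above

// Hypothetical fix: let the image path bump the counter itself.
void process_image(slot_state & slot) {
    // ... encode the image with CLIP and evaluate the embeddings ...
    slot.num_prompt_tokens_processed += image_embed_tokens;
}

int main() {
    slot_state slot;
    std::vector<int> text_tokens = {1, 2, 3};  // stand-in for the tokenized text part of the prompt

    // Counter is set from the text tokens first...
    slot.num_prompt_tokens_processed = (int) text_tokens.size();
    // ...and the image is processed afterwards, so without the increment inside
    // process_image the ~2880 image positions would never be counted.
    process_image(slot);

    printf("prompt tokens processed: %d\n", slot.num_prompt_tokens_processed);  // 2883 with the fix, 3 without
    return 0;
}
```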

@chigkim commented Mar 2, 2024

@cjpais thanks for looking into this. Maybe @cmp-nct or @ggerganov has more ideas on why llama.cpp reports the number of prompt tokens as 1 when an image is included in the input, and how to fix it?

slot 0 - loaded image
slot 0 is processing [task id: 0]
slot 0 : kv cache rm - [0, end)
encode_image_with_clip: 5 segments encoded in   593.69 ms
encode_image_with_clip: image embedding created: 2880 tokens

encode_image_with_clip: image encoded in   721.64 ms by CLIP (    0.25 ms per image patch)

print_timings: prompt eval time =    7283.15 ms /     1 tokens ( 7283.15 ms per token,     0.14 tokens per second)
print_timings:        eval time =   26579.15 ms /   321 runs   (   82.80 ms per token,    12.08 tokens per second)
print_timings:       total time =   33862.30 ms
slot 0 released (322 tokens in cache)

@chigkim commented Mar 2, 2024

It seems like it's a server-only problem; llava-cli seems to work fine.

From llava-cli:


llama_print_timings:    load time =   25007.57 ms
llama_print_timings:    sample time =    68.54 ms /   256 runs   (  0.27 ms per token,  3734.94 tokens per second)
llama_print_timings: prompt eval time =  421164.62 ms /  2902 tokens (  145.13 ms per token,   6.89 tokens per second)
llama_print_timings:    eval time =   66393.95 ms /   257 runs   (  258.34 ms per token,   3.87 tokens per second)
llama_print_timings:     total time =  511967.49 ms /  3159 tokens

From server through API:

{
	......
	"timings": {
		"predicted_ms": 57040.203,
		"predicted_n": 233,
		"predicted_per_second": 4.084838197367565,
		"predicted_per_token_ms": 244.8077381974249,
		"prompt_ms": 429987.864,
		"prompt_n": 1,
		"prompt_per_second": 0.0023256470326799734,
		"prompt_per_token_ms": 429987.864
	},
	"tokens_cached": 3129,
	"tokens_evaluated": 1,
	"tokens_predicted": 233,
	"truncated": false
}
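For reference, the context window bookkeeping asked about earlier would be straightforward once the count is right. A hedged client-side sketch, using the field names from the JSON above plus the n_ctx of 4096 from the command line; it only gives sensible answers once tokens_evaluated reflects the image embedding positions:

```cpp
// Hypothetical client-side helper; tokens_evaluated and tokens_predicted are the
// field names from the /completion response above, n_ctx comes from -c 4096.
#include <cstdio>

int remaining_context(int n_ctx, int tokens_evaluated, int tokens_predicted) {
    return n_ctx - (tokens_evaluated + tokens_predicted);
}

int main() {
    // With the buggy response above: 4096 - (1 + 233) = 3862, which wildly overestimates the free space.
    printf("buggy:   %d\n", remaining_context(4096, 1, 233));
    // With a realistic prompt count (llava-cli reported 2902): 4096 - (2902 + 233) = 961.
    printf("correct: %d\n", remaining_context(4096, 2902, 233));
    return 0;
}
```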

From server console:

encode_image_with_clip: 5 segments encoded in 22462.62 ms
encode_image_with_clip: image embedding created: 2880 tokens

encode_image_with_clip: image encoded in 22495.54 ms by CLIP (    7.81 ms per image patch)
{"function":"print_timings","level":"INFO","line":260,"msg":"prompt eval time     =  429987.86 ms /     1 tokens (429987.86 ms per token,     0.00 tokens per second)","n_prompt_tokens_processed":1,"n_tokens_second":0.0023256470326799734,"slot_id":0,"t_prompt_processing":429987.864,"t_token":429987.864,"task_id":0,"tid":"8368","timestamp":1709356420}
{"function":"print_timings","level":"INFO","line":274,"msg":"generation eval time =   57040.20 ms /   233 runs   (  244.81 ms per token,     4.08 tokens per second)","n_decoded":233,"n_tokens_second":4.084838197367565,"slot_id":0,"t_token":244.8077381974249,"t_token_generation":57040.203,"task_id":0,"tid":"8368","timestamp":1709356420}
{"function":"print_timings","level":"INFO","line":283,"msg":"          total time =  487028.07 ms","slot_id":0,"t_prompt_processing":429987.864,"t_token_generation":57040.203,"t_total":487028.067,"task_id":0,"tid":"8368","timestamp":1709356420}
{"function":"update_slots","level":"INFO","line":1626,"msg":"slot released","n_cache_tokens":234,"n_ctx":4096,"n_past":3129,"n_system_tokens":0,"slot_id":0,"task_id":0,"tid":"8368","timestamp":1709356420,"truncated":false}
{"function":"log_server_request","level":"INFO","line":2693,"method":"POST","msg":"request","params":{},"path":"/completion","remote_addr":"127.0.0.1","remote_port":55351,"status":200,"tid":"7172","timestamp":1709356420}

@chigkim commented Mar 4, 2024

Created a separate issue: #5863.
Also possibly the same issue: #5852.

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* server: init working 1.6

* move clip_image to header

* remove commented code

* remove c++ style from header

* remove todo

* expose llava_image_embed_make_with_clip_img

* fix zig build
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* server: init working 1.6

* move clip_image to header

* remove commented code

* remove c++ style from header

* remove todo

* expose llava_image_embed_make_with_clip_img

* fix zig build