Use n_threads param to call _embed_image_bytes func #1834
Hello. When I run the openbmb/MiniCPM-V-2_6-gguf model, I find that llama-cpp-python running as a server is slower than llama.cpp's minicpmv-cli example.
The difference is that llama-cpp-python's `_embed_image_bytes` func is called with the `n_threads_batch` param, while llama.cpp's minicpmv-cli example passes `n_threads` (whose value is cpu_cores / 2) when calling `llava_image_embed_make_with_bytes`. Using `n_threads` makes image processing more efficient and less time-consuming.
For example, on my CPU (56 cores), image embedding takes more than three times as long with `n_threads_batch`.
This parameter directly affects the runtime of the image-embedding call shown above.
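A minimal sketch of the proposed change, assuming the embedding call takes the thread count as its second argument as in llama.cpp's `llava_image_embed_make_with_bytes`; the helper name `suggested_n_threads` and the commented call site are hypothetical illustrations, not the actual llama-cpp-python code:

```python
import multiprocessing


def suggested_n_threads() -> int:
    """Mirror minicpmv-cli's default of half the logical cores (hypothetical helper)."""
    return max(1, multiprocessing.cpu_count() // 2)


# Hypothetical sketch of the proposed fix inside _embed_image_bytes:
# pass the generation thread count instead of n_threads_batch.
#
#     embed = self._llava_cpp.llava_image_embed_make_with_bytes(
#         ctx_clip,
#         suggested_n_threads(),   # was: llama.context_params.n_threads_batch
#         image_bytes,
#         len(image_bytes),
#     )

print(suggested_n_threads())
```

On a 56-core machine this would pass 28 threads to the image-embedding call, matching what minicpmv-cli does by default.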
Best wishes.