Use n_threads param to call _embed_image_bytes func #1834
Hello. When I run the openbmb/MiniCPM-V-2_6-gguf model, I find that llama-cpp-python running as a server is slower than llama.cpp's minicpmv-cli example.
The difference is that llama-cpp-python's `_embed_image_bytes` func is called with the `n_threads_batch` param, while llama.cpp's minicpmv-cli example passes `n_threads` (whose value is cpu_cores / 2) when calling `llava_image_embed_make_with_bytes`. Using `n_threads` makes image processing more efficient and less time-consuming.
For example, on my CPU (56 cores), image embedding takes more than three times as long with `n_threads_batch`.
This parameter directly affects the runtime of the image-embedding call shown above.
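A minimal sketch of the proposed change, assuming the embedding call takes the thread count as its second argument as in llama.cpp's `llava_image_embed_make_with_bytes`; the helper name `suggested_n_threads` and the commented call site are hypothetical illustrations, not the actual llama-cpp-python code:

```python
import multiprocessing


def suggested_n_threads() -> int:
    """Mirror minicpmv-cli's default of half the logical cores (hypothetical helper)."""
    return max(1, multiprocessing.cpu_count() // 2)


# Hypothetical sketch of the proposed fix inside _embed_image_bytes:
# pass the generation thread count instead of n_threads_batch.
#
#     embed = self._llava_cpp.llava_image_embed_make_with_bytes(
#         ctx_clip,
#         suggested_n_threads(),   # was: llama.context_params.n_threads_batch
#         image_bytes,
#         len(image_bytes),
#     )

print(suggested_n_threads())
```

On a 56-core machine this would pass 28 threads to the image-embedding call, matching what minicpmv-cli does by default.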
Best wishes.