Llava 1.6 different responses in CLI and Server #5514

Closed
LiquidGunay opened this issue Feb 15, 2024 · 3 comments

@LiquidGunay

I gave the same image to llava-cli and to llava hosted on the server. At temperature 0 and with no other parameters, the two give different results (with the same prompt, and also without any prompt). Is this intentional? If so, how can I get the same output from the server as from the CLI? The CLI seems to consistently give better outputs than the server.

@cmp-nct
Contributor

cmp-nct commented Feb 15, 2024

I've focused on providing the required API and functionality in llava.cpp and clip.cpp; llava-cli.cpp was only meant as a demo tool.
I took a closer look at how the server works: it implements its own image processing (for multiple images), so it will definitely need an update to work with llava-1.6.
It currently processes the image like this, from server.cpp:

 bool process_images(llama_client_slot &slot) const
    {
        for (slot_image &img : slot.images)
        {
            if (!img.request_encode_image)
            {
                continue;
            }
            clip_image_f32_batch img_res_v;
            img_res_v.size = 0;
            img_res_v.data = nullptr;
            if (!clip_image_preprocess(clp_ctx, img.img_data, img_res_v))
            {
                LOG_TEE("Error processing the given image");
                clip_free(clp_ctx);
                clip_image_f32_batch_free(img_res_v);
                return false;
            }
            if (img_res_v.size == 0)
            {
                LOG_TEE("Error processing the given image");
                return false;
            }

            // note: assumes only one image was returned by clip_image_preprocess
            clip_image_f32 * img_res = img_res_v.data;

            img.image_tokens = clip_n_patches(clp_ctx);
            img.image_embedding = (float *)malloc(clip_embd_nbytes(clp_ctx));
            if (!img.image_embedding)
            {
                LOG_TEE("Unable to allocate memory for image embeddings\n");
                clip_image_f32_batch_free(img_res_v);
                clip_free(clp_ctx);
                return false;
            }
            LOG_TEE("slot %i - encoding image [id: %i]\n", slot.id, img.id);
            if (!clip_image_encode(clp_ctx, params.n_threads, img_res, img.image_embedding))
            {
                LOG_TEE("Unable to encode image\n");
                clip_image_f32_batch_free(img_res_v);
                return false;
            }

            clip_image_f32_batch_free(img_res_v);

            img.request_encode_image = false;
        }

        return slot.images.size() > 0;
    }

So it preprocesses the image correctly and receives back multiple preprocessed images, but it only encodes and uses the first one (as with llava-1.5).
I don't know why this processing lives in server.cpp; it should happen in llava.cpp.
In llava.cpp you have this:

    static bool encode_image_with_clip(clip_ctx * ctx_clip, int n_threads, const clip_image_u8 * img, float * image_embd, int * n_img_pos);

This handles the full llava-1.6 (or 1.5) processing and returns the embeddings and the number of "tokens".

That's what server.cpp should be using; otherwise it needs duplicate code for each architecture.

I'm sure someone can implement that quickly. Examples of using that function are in llava_image_embed_make_with_clip_img(), which takes the image in u8 format, and llava_image_embed_make_with_bytes(), which takes the image as binary data (jpg, etc.).
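
For illustration only, here is a minimal sketch of what that change to process_images() might look like. It assumes llava_image_embed_make_with_clip_img() is exported from llava.h with a signature along the lines of bool (clip_ctx *, int n_threads, const clip_image_u8 *, float ** embd_out, int * n_pos_out); the fields it touches (clp_ctx, params.n_threads, img.img_data, img.image_embedding, img.image_tokens) are the ones already used in the server code above. This is a sketch, not a tested patch:

    // Hypothetical sketch: let llava.cpp handle preprocessing and encoding
    // for both llava-1.5 and llava-1.6 instead of duplicating it here.
    bool process_images(llama_client_slot &slot) const
    {
        for (slot_image &img : slot.images)
        {
            if (!img.request_encode_image)
            {
                continue;
            }
            float * image_embd = nullptr;
            int     n_img_pos  = 0;
            // builds the (possibly multi-tile) embedding and reports how many
            // image "tokens" it will occupy in the context
            if (!llava_image_embed_make_with_clip_img(clp_ctx, params.n_threads,
                                                      img.img_data, &image_embd, &n_img_pos))
            {
                LOG_TEE("Error processing the given image");
                return false;
            }
            img.image_embedding = image_embd; // buffer allocated inside llava.cpp
            img.image_tokens    = n_img_pos;
            img.request_encode_image = false;
        }

        return slot.images.size() > 0;
    }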

@An0nie
Contributor

An0nie commented Feb 15, 2024

llava-cli has its system prompt hardcoded:
https://github.com/ggerganov/llama.cpp/blob/4524290e87b8e107cc2b56e1251751546f4b9051/examples/llava/llava-cli.cpp#L173

Try passing that as the system prompt to the server (via the API call; the command-line parameter doesn't work right now), and append "\nASSISTANT:" after your prompt. After that, the results should be identical.
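
For reference, the layout llava-cli ends up evaluating is roughly "<system prompt>\nUSER:<image>\n<your prompt>\nASSISTANT:". Below is a minimal sketch of building an equivalent prompt string for a server request; the exact system-prompt wording and the "[img-1]" placeholder / image_data convention are assumptions here, so check llava-cli.cpp at the commit above and the server README for the authoritative strings. Whether you set the system prompt via the API or simply prepend it inline as below, the text the model sees should end up the same:

    #include <string>

    // Hypothetical helper: build a prompt for the server that mirrors
    // llava-cli's hardcoded chat layout. The image itself would be sent
    // separately (e.g. as base64 in the request's image_data field) and
    // referenced by the [img-1] placeholder; both of those details are
    // assumptions in this sketch.
    static std::string build_llava_prompt(const std::string & user_prompt) {
        const std::string system_prompt =
            "A chat between a curious human and an artificial intelligence assistant. "
            "The assistant gives helpful, detailed, and polite answers to the human's questions.";
        return system_prompt + "\nUSER:[img-1]\n" + user_prompt + "\nASSISTANT:";
    }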

@cjpais
Contributor

cjpais commented Feb 20, 2024

This should be fixed by the PR that was merged.
