Llava 1.6 different responses in CLI and Server #5514

Closed
LiquidGunay opened this issue Feb 15, 2024 · 3 comments

@LiquidGunay

I gave the same image to llava-cli and to llava hosted on the server. At temperature 0 and with no other parameters, the two give different results (with the same prompt, and also without any prompt). Is this intentional? If so, how can I get the same output from the server as from the CLI? The CLI seems to consistently give better outputs than the server.

@cmp-nct
Contributor

cmp-nct commented Feb 15, 2024

I've focused on providing the required API and functionality in llava.cpp and clip.cpp; llava-cli.cpp was only meant as a demo tool.
I took a closer look at how the server works: it implements its own image processing (for multiple images), so it will definitely need an update to work with llava-1.6.
It currently processes the image like this, from server.cpp:

 bool process_images(llama_client_slot &slot) const
    {
        for (slot_image &img : slot.images)
        {
            if (!img.request_encode_image)
            {
                continue;
            }
            clip_image_f32_batch img_res_v;
            img_res_v.size = 0;
            img_res_v.data = nullptr;
            if (!clip_image_preprocess(clp_ctx, img.img_data, img_res_v))
            {
                LOG_TEE("Error processing the given image");
                clip_free(clp_ctx);
                clip_image_f32_batch_free(img_res_v);
                return false;
            }
            if (img_res_v.size == 0)
            {
                LOG_TEE("Error processing the given image");
                return false;
            }

            // note: assumes only one image was returned by clip_image_preprocess
            clip_image_f32 * img_res = img_res_v.data;

            img.image_tokens = clip_n_patches(clp_ctx);
            img.image_embedding = (float *)malloc(clip_embd_nbytes(clp_ctx));
            if (!img.image_embedding)
            {
                LOG_TEE("Unable to allocate memory for image embeddings\n");
                clip_image_f32_batch_free(img_res_v);
                clip_free(clp_ctx);
                return false;
            }
            LOG_TEE("slot %i - encoding image [id: %i]\n", slot.id, img.id);
            if (!clip_image_encode(clp_ctx, params.n_threads, img_res, img.image_embedding))
            {
                LOG_TEE("Unable to encode image\n");
                clip_image_f32_batch_free(img_res_v);
                return false;
            }

            clip_image_f32_batch_free(img_res_v);

            img.request_encode_image = false;
        }

        return slot.images.size() > 0;
    }

So it preprocesses the image correctly and receives back multiple preprocessed images, but it only encodes and uses the first one (as with llava-1.5).
I don't know why this processing lives in server.cpp; it should happen in llava.cpp.
In llava.cpp you have this:

    static bool encode_image_with_clip(clip_ctx * ctx_clip, int n_threads, const clip_image_u8 * img, float * image_embd, int * n_img_pos);

This handles the full llava-1.6 (or 1.5) processing and returns the embeddings and the number of "tokens".

That's what server.cpp should be using; otherwise it needs duplicate code for each architecture.

I'm sure someone can implement that quickly. Examples of using that function are in llava_image_embed_make_with_clip_img(), which takes the image in u8 format, and llava_image_embed_make_with_bytes(), which takes the image as binary data (jpg, etc.).
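
For illustration only, here is a minimal sketch of what that change to process_images() might look like. It assumes llava_image_embed_make_with_clip_img() is exported from llava.h with a signature along the lines of bool (clip_ctx *, int n_threads, const clip_image_u8 *, float ** embd_out, int * n_pos_out); the fields it touches (clp_ctx, params.n_threads, img.img_data, img.image_embedding, img.image_tokens) are the ones already used in the server code above. This is a sketch, not a tested patch:

    // Hypothetical sketch: let llava.cpp handle preprocessing and encoding
    // for both llava-1.5 and llava-1.6 instead of duplicating it here.
    bool process_images(llama_client_slot &slot) const
    {
        for (slot_image &img : slot.images)
        {
            if (!img.request_encode_image)
            {
                continue;
            }
            float * image_embd = nullptr;
            int     n_img_pos  = 0;
            // builds the (possibly multi-tile) embedding and reports how many
            // image "tokens" it will occupy in the context
            if (!llava_image_embed_make_with_clip_img(clp_ctx, params.n_threads,
                                                      img.img_data, &image_embd, &n_img_pos))
            {
                LOG_TEE("Error processing the given image");
                return false;
            }
            img.image_embedding = image_embd; // buffer allocated inside llava.cpp
            img.image_tokens    = n_img_pos;
            img.request_encode_image = false;
        }

        return slot.images.size() > 0;
    }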

@An0nie
Contributor

An0nie commented Feb 15, 2024

llava-cli has its system prompt hardcoded:
https://github.com/ggerganov/llama.cpp/blob/4524290e87b8e107cc2b56e1251751546f4b9051/examples/llava/llava-cli.cpp#L173

Try passing that as the system prompt to the server (via the API call; the command-line parameter doesn't work right now), and append "\nASSISTANT:" after your prompt. After that, the results should be identical.
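
For reference, the layout llava-cli ends up evaluating is roughly "<system prompt>\nUSER:<image>\n<your prompt>\nASSISTANT:". Below is a minimal sketch of building an equivalent prompt string for a server request; the exact system-prompt wording and the "[img-1]" placeholder / image_data convention are assumptions here, so check llava-cli.cpp at the commit above and the server README for the authoritative strings. Whether you set the system prompt via the API or simply prepend it inline as below, the text the model sees should end up the same:

    #include <string>

    // Hypothetical helper: build a prompt for the server that mirrors
    // llava-cli's hardcoded chat layout. The image itself would be sent
    // separately (e.g. as base64 in the request's image_data field) and
    // referenced by the [img-1] placeholder; both of those details are
    // assumptions in this sketch.
    static std::string build_llava_prompt(const std::string & user_prompt) {
        const std::string system_prompt =
            "A chat between a curious human and an artificial intelligence assistant. "
            "The assistant gives helpful, detailed, and polite answers to the human's questions.";
        return system_prompt + "\nUSER:[img-1]\n" + user_prompt + "\nASSISTANT:";
    }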

@cjpais
Contributor

cjpais commented Feb 20, 2024

This should be fixed by the PR that was merged.
