fix(server): infinite loop to inference #5485

Open · wants to merge 1 commit into base: master
Conversation

@snowyu (Contributor) commented Feb 13, 2024:

Related issue (maybe): #3969

```diff
@@ -1737,6 +1737,8 @@ struct llama_server_context
             {
                 // if you get here, it means the KV cache is full - try increasing it via the context size
                 LOG_TEE("%s : failed to decode the batch, n_batch = %d, ret = %d\n", __func__, n_batch, ret);
+                LOG_ERROR("KV cache is full - try increasing it via the context size", {{"ctx-size", params.n_ctx}});
+                kv_cache_clear();
```
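For readers without the file open: this hunk sits inside the server's batch-splitting decode path. A paraphrased sketch of that path (not the verbatim upstream code; the retry-with-smaller-batch behavior is inferred from the comments visible in the hunk):

```cpp
// Paraphrased sketch, not the actual server.cpp code.
const int ret = llama_decode(ctx, batch_view);
if (ret != 0) {
    if (n_batch == 1 || ret < 0) {
        // KV cache is full and the batch cannot be split any further
        LOG_TEE("%s : failed to decode the batch, n_batch = %d, ret = %d\n", __func__, n_batch, ret);
        LOG_ERROR("KV cache is full - try increasing it via the context size", {{"ctx-size", params.n_ctx}});
        kv_cache_clear(); // the PR's fix: free the cache so the next pass can make progress;
                          // without it, this path is re-entered with a full cache forever
        return false;
    }
    // otherwise retry with half the batch size to look for free KV space
    n_batch /= 2;
}
```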
Collaborator commented:
I'm not sure it's a good idea to clear the KV cache here, because that means we're dropping the context of what the model is talking about. Take this example:

Suppose the context length is 3 and my sentence has 4 tokens: "I can drink wine". By the time it processes the word "drink", the context is already full, so we clear the KV cache. When the model processes the next word, the "I can drink" part has already been dropped, and the model only sees "wine".
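To make this concrete, here is a toy simulation of that effect (plain C++, nothing from the server; the `std::vector` merely stands in for the KV cache):

```cpp
#include <cstdio>
#include <string>
#include <vector>

int main() {
    const size_t n_ctx = 3; // toy context length from the example above
    const std::vector<std::string> sentence = {"I", "can", "drink", "wine"};

    std::vector<std::string> kv; // stands in for the KV cache contents

    for (const auto & tok : sentence) {
        if (kv.size() == n_ctx) {
            kv.clear(); // what kv_cache_clear() would do mid-sentence
        }
        kv.push_back(tok);
    }

    // the "cache" now holds only: wine
    for (const auto & tok : kv) {
        printf("%s ", tok.c_str());
    }
    printf("\n");
}
```

After the clear, everything the model could attend to from "I can drink" is gone.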

@ngxson (Collaborator) commented Feb 14, 2024:

Can you also provide steps to reproduce the problem? It seems we run into the infinite loop when the prompt is bigger than the context length.

In the case of the OpenAI API, passing a prompt longer than the context length returns an error: https://community.openai.com/t/error-this-models-maximum-context-length-is-x-tokens/328860

Maybe we should also return an error here and terminate the task.
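For illustration, a minimal sketch of what such an error response could look like, using nlohmann::json (which the server already depends on). The field layout mirrors OpenAI's context_length_exceeded error; the numbers are made up, and `n_ctx`/`n_prompt` are hypothetical stand-ins:

```cpp
#include <cstdio>
#include <string>
#include <nlohmann/json.hpp>

int main() {
    const int n_ctx    = 512; // hypothetical server context size
    const int n_prompt = 600; // hypothetical prompt length that overflows it

    // Error body in the shape the OpenAI API uses for oversized prompts.
    const nlohmann::json err = {
        {"error", {
            {"message", "this model's maximum context length is " + std::to_string(n_ctx) +
                        " tokens, however you requested " + std::to_string(n_prompt) + " tokens"},
            {"type",  "invalid_request_error"},
            {"param", nullptr},
            {"code",  "context_length_exceeded"}
        }}
    };

    printf("%s\n", err.dump(2).c_str());
}
```

Returning something like this and terminating the task would inform the client instead of silently dropping context.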
