fix(server): infinite loop to inference #5485

Open · wants to merge 1 commit into base: master
Conversation

@snowyu (Contributor) commented Feb 13, 2024:

Related issue (maybe): #3969

```diff
@@ -1737,6 +1737,8 @@ struct llama_server_context
             {
                 // if you get here, it means the KV cache is full - try increasing it via the context size
                 LOG_TEE("%s : failed to decode the batch, n_batch = %d, ret = %d\n", __func__, n_batch, ret);
+                LOG_ERROR("KV cache is full - try increasing it via the context size", {{"ctx-size", params.n_ctx}});
+                kv_cache_clear();
```
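For readers without the file open: this hunk sits inside the server's batch-splitting decode path. A paraphrased sketch of that path (not the verbatim upstream code; the retry-with-smaller-batch behavior is inferred from the comments visible in the hunk):

```cpp
// Paraphrased sketch, not the actual server.cpp code.
const int ret = llama_decode(ctx, batch_view);
if (ret != 0) {
    if (n_batch == 1 || ret < 0) {
        // KV cache is full and the batch cannot be split any further
        LOG_TEE("%s : failed to decode the batch, n_batch = %d, ret = %d\n", __func__, n_batch, ret);
        LOG_ERROR("KV cache is full - try increasing it via the context size", {{"ctx-size", params.n_ctx}});
        kv_cache_clear(); // the PR's fix: free the cache so the next pass can make progress;
                          // without it, this path is re-entered with a full cache forever
        return false;
    }
    // otherwise retry with half the batch size to look for free KV space
    n_batch /= 2;
}
```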
Collaborator commented:
I'm not sure it's a good idea to clear the KV cache here, because that means we're dropping the context of what the model is talking about. Take this example:

Suppose the context length is 3 and my sentence has 4 tokens: "I can drink wine". By the time it processes the word "drink", the context is already full, so we clear the KV cache. When the model processes the next word, the "I can drink" part has already been dropped, and the model only sees "wine".
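To make this concrete, here is a toy simulation of that effect (plain C++, nothing from the server; the `std::vector` merely stands in for the KV cache):

```cpp
#include <cstdio>
#include <string>
#include <vector>

int main() {
    const size_t n_ctx = 3; // toy context length from the example above
    const std::vector<std::string> sentence = {"I", "can", "drink", "wine"};

    std::vector<std::string> kv; // stands in for the KV cache contents

    for (const auto & tok : sentence) {
        if (kv.size() == n_ctx) {
            kv.clear(); // what kv_cache_clear() would do mid-sentence
        }
        kv.push_back(tok);
    }

    // the "cache" now holds only: wine
    for (const auto & tok : kv) {
        printf("%s ", tok.c_str());
    }
    printf("\n");
}
```

After the clear, everything the model could attend to from "I can drink" is gone.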

@ngxson (Collaborator) commented Feb 14, 2024:

Can you also provide steps to reproduce the problem? It seems we run into the infinite loop when the prompt is bigger than the context length.

In the case of the OpenAI API, passing a prompt longer than the context length returns an error: https://community.openai.com/t/error-this-models-maximum-context-length-is-x-tokens/328860

Maybe we should also return an error here and terminate the task.
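For illustration, a minimal sketch of what such an error response could look like, using nlohmann::json (which the server already depends on). The field layout mirrors OpenAI's context_length_exceeded error; the numbers are made up, and `n_ctx`/`n_prompt` are hypothetical stand-ins:

```cpp
#include <cstdio>
#include <string>
#include <nlohmann/json.hpp>

int main() {
    const int n_ctx    = 512; // hypothetical server context size
    const int n_prompt = 600; // hypothetical prompt length that overflows it

    // Error body in the shape the OpenAI API uses for oversized prompts.
    const nlohmann::json err = {
        {"error", {
            {"message", "this model's maximum context length is " + std::to_string(n_ctx) +
                        " tokens, however you requested " + std::to_string(n_prompt) + " tokens"},
            {"type",  "invalid_request_error"},
            {"param", nullptr},
            {"code",  "context_length_exceeded"}
        }}
    };

    printf("%s\n", err.dump(2).c_str());
}
```

Returning something like this and terminating the task would inform the client instead of silently dropping context.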
