server : Smart selection of available slot using Longest Common Prefix #7728
Needs a rebase onto the latest master, and then we can merge.
I'll test again and then mark the PR as ready for review.
By the way, should this be on by default? Or is it better to leave it off, as it is now?
The LCS algorithm is overkill for this purpose. All you need to find is the longest common prefix, which is much simpler to compute.
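To illustrate why the prefix is cheaper: finding the longest common prefix of two token sequences takes a single linear scan, while a longest-common-substring search typically needs an O(n·m) dynamic-programming table. The sketch below is a minimal illustration, not the server's code. It assumes tokens are `int32_t` (llama.cpp's `llama_token` is an `int32_t`), and the alias `token` and the function `common_prefix_len` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical alias standing in for llama_token (an int32_t in llama.cpp).
using token = std::int32_t;

// Length of the longest common prefix of two token sequences.
// std::mismatch (four-iterator overload, C++14) stops at the first differing
// position or at the end of the shorter sequence, so this is O(min(|a|, |b|)).
std::size_t common_prefix_len(const std::vector<token>& a, const std::vector<token>& b) {
    const auto first_diff = std::mismatch(a.begin(), a.end(), b.begin(), b.end());
    return static_cast<std::size_t>(std::distance(a.begin(), first_diff.first));
}
```

For example, `{1, 2, 3, 4}` and `{1, 2, 9}` share a prefix of length 2. That is the number of cached tokens a slot could reuse without reprocessing them.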
As far as I know, the server can reuse not only the prompt prefix but also the suffix (see llama.cpp/examples/server/server.cpp, lines 1819 to 1857 in 2b33896).
Although the … (see llama.cpp/examples/server/server.cpp, lines 2022 to 2025 in d5c938c)
So for now, it is better for the slot selection logic to follow the prompt caching logic and look only at the prefix.
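Below is a prefix-only slot selection with an LRU fallback. It is a minimal sketch under stated assumptions, not the PR's implementation. The `Slot` struct, its fields, `select_slot`, and `min_similarity` are all hypothetical. "Similarity" is assumed here to mean the shared-prefix length divided by the length of the new prompt.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

using token = std::int32_t;  // hypothetical alias for llama_token

// Hypothetical, simplified slot holding only what this sketch needs.
struct Slot {
    bool available = true;             // not currently serving a request
    std::int64_t last_used = 0;        // timestamp used by the LRU fallback
    std::vector<token> cached_tokens;  // prompt tokens still in this slot's cache
};

static std::size_t common_prefix_len(const std::vector<token>& a, const std::vector<token>& b) {
    const auto first_diff = std::mismatch(a.begin(), a.end(), b.begin(), b.end());
    return static_cast<std::size_t>(std::distance(a.begin(), first_diff.first));
}

// Returns the index of the chosen slot, or -1 if no slot is available.
// First choice: the available slot sharing the longest prefix with the prompt,
// provided that prefix covers at least min_similarity of the prompt
// (e.g. 0.5 for "n = 50%"). Otherwise: the least recently used available slot.
int select_slot(const std::vector<Slot>& slots, const std::vector<token>& prompt,
                double min_similarity) {
    int best_prefix_slot = -1;
    std::size_t best_prefix = 0;
    int lru_slot = -1;

    for (std::size_t i = 0; i < slots.size(); ++i) {
        const Slot& s = slots[i];
        if (!s.available) {
            continue;
        }
        const std::size_t prefix = common_prefix_len(s.cached_tokens, prompt);
        if (prefix > best_prefix) {
            best_prefix = prefix;
            best_prefix_slot = static_cast<int>(i);
        }
        if (lru_slot == -1 || s.last_used < slots[lru_slot].last_used) {
            lru_slot = static_cast<int>(i);
        }
    }

    if (best_prefix_slot != -1 && !prompt.empty()) {
        const double similarity = static_cast<double>(best_prefix) / prompt.size();
        if (similarity >= min_similarity) {
            return best_prefix_slot;
        }
    }
    return lru_slot;  // -1 when every slot is busy
}
```

Falling back to LRU keeps the existing behavior whenever no cached prompt is similar enough. Reusing a slot with only a tiny shared prefix would evict a cache that another user may still need, while saving little prompt processing.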
In the current implementation, an available slot is selected using LRU (Least Recently Used). This PR adds slot selection using the LCP (Longest Common Prefix) algorithm (originally LCS, Longest Common Substring), which picks a slot whose prompt has at least n% similarity to the requested prompt. This reduces prompt processing in multi-user scenarios. Additionally, this PR: