-
Notifications
You must be signed in to change notification settings - Fork 9.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server : simplify state machine for slot #9283
Conversation
Actually I got timeout error locally, due to the time it takes to download the model from internet (we use Phi-2 model for passkey test) |
Benchmark with master (with
PR:
|
@ggerganov FYI, this PR also fixed the point about
As you see on the benchmark above, |
Is it normal for the |
No, it should not. I re-ran the test and I updated the result, so (I was having some network errors during the initial run, so that's why it had less tokens) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, but I don't really know the server code well enough to review this.
According to 040fdde there was an address sanitizer issue. AFAICT the commit should not make a difference, except avoiding the copy of the task. Is the address sanitizer problem still present and how to reproduce? |
@ggerganov Thanks for pointing that out. You're right, indeed, there was a missing lock in Beside, I changed |
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : simplify state machine for slot * add SLOT_STATE_DONE_PROMPT * pop_deferred_task * add missing notify_one * fix passkey test * metrics : add n_busy_slots_per_decode * fix test step * add test * maybe fix AddressSanitizer? * fix deque ? * missing lock * pop_deferred_task: also notify * Update examples/server/server.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Currently,
state
of a slot is actually controlled by 2 variables:slot.state
(2 possible values) andslot.command
(3 possible values). This makes the total number of states become 2x3=6, but some of them are in fact invalid states.This PR aims to simplify state machine of slot by unifying them into 4 states.
The state machine can be represent using the graph below: