feat(router): drop requests when client closes the channel #202
@OlivierDehaene I notice that you've incorporated much of #138 here. I had started attempting to break that into smaller PRs, but I guess there's probably no point in continuing that. I am still curious what you think about moving the stopping evaluation logic to the router as I had done in #138?
30cf546 to b908ca7
hey @njhill!
For now I'm not sure how that would work. I'm against supporting only Rust tokenizers, so having the decoding logic in the server while having the stop logic in the router would be a bit strange. I want to investigate whether it would be possible to spawn a Python interpreter in the router with PyO3 when we don't have a Rust tokenizer. If it's easy enough, then we can move everything to the router. But I think right now the priority for this repo is to stabilize and go for a v1.0.0.
Thanks @OlivierDehaene
Since you were against moving the detokenization to the router, I was just referring to the stopping criteria. I think that could still be done there even if it's still strings being streamed back from the shards. But fair enough, we could always reevaluate after these changes.
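To make the idea above concrete, here is a minimal sketch of how stop evaluation could run on the router side while the shards keep doing the detokenization and stream back strings. It is an illustration only: `StoppingCriteria`, `update`, and the stop-reason strings are hypothetical names, not this repository's API, and it assumes each shard response carries exactly one decoded token as a string.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoppingCriteria:
    """Hypothetical router-side stop check (not the project's actual API)."""

    stop_sequences: List[str]
    max_new_tokens: int
    text: str = ""      # all decoded text received so far
    generated: int = 0  # number of pieces (assumed one per token) received

    def update(self, piece: str) -> Optional[str]:
        """Feed one decoded piece from a shard; return a stop reason or None."""
        self.text += piece
        self.generated += 1
        # Search the accumulated text rather than the last piece alone,
        # because a stop sequence may span several streamed pieces.
        for seq in self.stop_sequences:
            if seq and seq in self.text:
                return "stop_sequence"
        if self.generated >= self.max_new_tokens:
            return "length"
        return None
```

Because the check only needs strings, it does not depend on which tokenizer implementation produced them, which is the point being made in the comment above.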
That would be cool. FWIW I haven't encountered any models with tokenizers that didn't work if converted (i.e. doing
Is this related to flash attention? I'd coincidentally also made some related changes that I was about to open a PR for. But I'll hold off until you merge this to avoid the churn.
Yes but you need to handle the
I think THUDM/chatglm-6b is an example of such a model.
No, it's just a model that is under heavy load, with requests that can take a while, so some requests end up waiting in the queue for tens of seconds. I want to make sure that once they go through, the client hasn't already timed out.
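The behaviour described here can be sketched roughly as follows: when the batcher pulls entries off the queue, it skips any whose client has already gone away, so no compute is spent on a response nobody will read. This is a simplified Python model, not the router's actual Rust code; `Entry`, `Queue`, `next_batch`, and the `client_connected` flag are hypothetical stand-ins for however the router tracks a closed response channel.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Entry:
    request_id: int
    prompt: str
    # Hypothetical flag: set to False once the client's channel is closed.
    client_connected: bool = True


class Queue:
    """Simplified FIFO queue of pending generation requests."""

    def __init__(self) -> None:
        self._entries: Deque[Entry] = deque()

    def append(self, entry: Entry) -> None:
        self._entries.append(entry)

    def next_batch(self, max_size: int) -> List[Entry]:
        """Pop up to max_size live entries, dropping disconnected ones."""
        batch: List[Entry] = []
        while self._entries and len(batch) < max_size:
            entry = self._entries.popleft()
            if not entry.client_connected:
                # The client timed out or hung up while waiting: drop it.
                continue
            batch.append(entry)
        return batch
```

Checking at dequeue time means a request that waited a long time is dropped without ever reaching a shard, and its slot in the batch goes to the next live request.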
6d5a5f0 to 766c0ab
766c0ab to 3652d82
@njhill, I'm sorry, I completely forgot to add you as co-author =/
@OlivierDehaene no worries. I have a few more things to contribute :)