
feat(router): drop requests when client closes the channel #202

Merged: 9 commits merged into main from feat/time_limit on Apr 20, 2023

Conversation

OlivierDehaene (Member)
No description provided.

njhill (Contributor) commented Apr 19, 2023

@OlivierDehaene I notice that you've incorporated much of #138 here. I had started attempting to break that into smaller PRs, but I guess there's probably no point in continuing that.

I am still curious what you think about moving the stopping evaluation logic to the router as I had done in #138?

OlivierDehaene (Member, Author)
Hey @njhill!
Sorry that you were porting #138 into smaller PRs at the same time =/
I need this garbage-collection logic ASAP for one of our deployments.
I'm adding you as a co-author on this PR.

I am still curious what you think about moving the stopping evaluation logic to the router as I had done in #138?

For now I'm not sure how that would work. I'm against supporting only Rust tokenizers, so the decoding logic has to live in the server; having the stop logic in the router while the decoding happens in the server would be a bit strange.

I want to investigate whether it would be possible to spawn a Python interpreter in the router with PyO3 when we don't have a Rust tokenizer. If it's easy enough then we can move everything to the router.
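
To illustrate the idea, here is a rough sketch (not actual router code) of falling back to a Python tokenizer through PyO3, assuming the `transformers` package is importable from the embedded interpreter:

```rust
use pyo3::prelude::*;
use pyo3::types::{IntoPyDict, PyModule};

// Illustrative sketch only: decode token ids with a Python tokenizer through
// an embedded interpreter when no Rust tokenizer is available.
fn decode_with_python_tokenizer(model_id: &str, ids: Vec<u32>) -> PyResult<String> {
    Python::with_gil(|py| {
        let transformers = PyModule::import(py, "transformers")?;
        let tokenizer = transformers
            .getattr("AutoTokenizer")?
            .call_method1("from_pretrained", (model_id,))?;
        // Equivalent of tokenizer.decode(ids, skip_special_tokens=True).
        let kwargs = [("skip_special_tokens", true)].into_py_dict(py);
        tokenizer
            .call_method("decode", (ids,), Some(kwargs))?
            .extract()
    })
}
```

In practice the tokenizer object would be loaded once at startup and reused; `from_pretrained` is far too slow to call per request.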

But I think right now the priority for this repo is to stabilize and go for a v1.0.0.

njhill (Contributor) commented Apr 19, 2023

Thanks @OlivierDehaene

For now I'm not sure how that would work.

Since you were against moving the detokenization to the router, I was just referring to the stopping criteria; I think that could still be done there even if it's strings being streamed back from the shards. But fair enough, we can always reevaluate after these changes.
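
To make the suggestion concrete, a rough sketch (illustrative only, not the code from #138) of stop-sequence evaluation over the strings streamed back from the shards:

```rust
// Illustrative only: accumulate the text streamed back from the shards and
// report a stop as soon as any stop sequence shows up.
struct StopChecker {
    stop_sequences: Vec<String>,
    generated: String,
}

impl StopChecker {
    fn new(stop_sequences: Vec<String>) -> Self {
        Self {
            stop_sequences,
            generated: String::new(),
        }
    }

    // Append a streamed chunk and return true if generation should stop.
    fn push(&mut self, chunk: &str) -> bool {
        self.generated.push_str(chunk);
        self.stop_sequences
            .iter()
            .any(|stop| self.generated.contains(stop.as_str()))
    }
}
```

A real implementation would only scan the tail of the buffer (bounded by the longest stop sequence) rather than the whole generated text on every chunk.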

I want to investigate if it would be possible to spawn a Python interpreter in the router with PyO3 when we don't have a rust tokenizer.

That would be cool. FWIW I haven't encountered any models with tokenizers that didn't work if converted (i.e. doing AutoTokenizer.from_pretrained() followed by .save_pretrained()).

I need this garbage-collection logic ASAP for one of our deployments.

Is this related to flash attention? I'd coincidentally also made some related changes that I was about to open a PR for. But I'll hold off until you merge this to avoid the churn.

OlivierDehaene (Member, Author) commented Apr 19, 2023

I was just referring to the stopping criteria.

Yes, but you need to handle the generated_text in the final token payload. Today it acts as a ground truth for some users when the streaming method does not work perfectly, so it needs to keep being decoded independently from the full sequence of ids, and we can only do that in the server. So it would require a back and forth between the router and the server, right?

FWIW I haven't encountered any models with tokenizers that didn't work if converted (i.e. doing AutoTokenizer.from_pretrained() followed by .save_pretrained()).

I think THUDM/chatglm-6b is an example of such a model.

Is this related to flash attention?

No, it's just a model that is under heavy load with requests that can take a while, so some requests end up waiting in the queue for tens of seconds. I want to make sure that once they go through, the client hasn't already timed out.
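
For context, a minimal sketch of that garbage-collection idea (illustrative types, not the PR's actual implementation), assuming each queued request carries the sender half of its response channel:

```rust
use tokio::sync::mpsc::UnboundedSender;

// Illustrative sketch: a request waiting in the router queue, together with
// the channel used to stream tokens back to the client. Types are placeholders.
struct QueueEntry {
    request: String,
    response_tx: UnboundedSender<String>,
}

impl QueueEntry {
    // If the client disconnected, the response future (and with it the
    // receiving half of the channel) is dropped, so the sender reports closed.
    fn is_cancelled(&self) -> bool {
        self.response_tx.is_closed()
    }
}

// Drop stale entries before handing the rest to the next batch.
fn drain_queue(queue: &mut Vec<QueueEntry>) -> Vec<QueueEntry> {
    queue.retain(|entry| !entry.is_cancelled());
    queue.drain(..).collect()
}
```

With a check like this, a request that sat in the queue past its client's timeout is simply skipped instead of wasting a slot in the batch.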

OlivierDehaene merged commit 709d893 into main on Apr 20, 2023
OlivierDehaene deleted the feat/time_limit branch on Apr 20, 2023 at 09:07
OlivierDehaene (Member, Author)

@njhill, I'm sorry, I completely forgot to add you as a co-author =/
I wish GitHub had a better way of doing this...

njhill (Contributor) commented Apr 20, 2023

@OlivierDehaene no worries. I have a few more things to contribute :)
