Cleanup codebase: removed unnecessary code/logic #298
@merrymercy @hnyls2002 PR tested with no performance regression. Ready for review/merge.
ca73c5f to e56ce1b
Latest commit removed redundant math: since prompt_tokens = len(req.input_ids), the added term len(req.input_ids) - prompt_tokens is always 0.
Latest commit: 9472363. Removed unused ReqState.lock.
await asyncio.sleep(0.0006)
if not slept:
    await asyncio.sleep(0.0006)
- Did you find that this gives better performance?
- @hnyls2002 please also take a look
- In my limited testing with low concurrency (1-2 concurrent requests, with a range of 1-10 batches each), I did not find a performance difference.
But this is unnecessary code/logic: there is no need to sleep for another 0.6 ms (causing a potential thread context switch) when the loop has already slept for 30 ms under the earlier condition (extend_dependency_time: float = 0.03).
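To illustrate the point above, here is a minimal sketch of the guarded sleep, not the actual router code. The function `loop_iteration`, its `has_dependent_requests` parameter, and the constant names are hypothetical; only the 0.03 s and 0.0006 s values and the `slept` flag come from this thread.

```python
import asyncio

# Values quoted in this thread; the constant names are assumptions.
EXTEND_DEPENDENCY_TIME = 0.03   # 30 ms wait in the earlier condition
IDLE_SLEEP = 0.0006             # 0.6 ms sleep from the diff


async def loop_iteration(has_dependent_requests: bool) -> float:
    """Hypothetical single pass of a router-style loop.

    Returns the total sleep time requested during this pass (in seconds).
    """
    slept = False
    total = 0.0

    if has_dependent_requests:
        # Earlier condition: wait for dependent requests to arrive.
        await asyncio.sleep(EXTEND_DEPENDENCY_TIME)
        total += EXTEND_DEPENDENCY_TIME
        slept = True

    # ... forward the collected requests here (omitted in this sketch) ...

    if not slept:
        # Only yield again if this pass has not already slept.
        await asyncio.sleep(IDLE_SLEEP)
        total += IDLE_SLEEP

    return total


if __name__ == "__main__":
    print(asyncio.run(loop_iteration(True)))
    print(asyncio.run(loop_iteration(False)))
```

With the guard, a pass that already waited 30 ms skips the extra 0.6 ms sleep, while an idle pass still yields control to the event loop once.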
9472363 to c68572f
c68572f to ad6410b
ad6410b to 431cbb8
Fixed two issues in router loop:
Cleanup:
self.eos_token_id in model_rpc
EDIT: Reverted due to review: (removed useless math for calculating # of completion_tokens)