[core] separate builder init and builder prepare for each batch #12253
base: main
Signed-off-by: youkaichao <youkaichao@gmail.com>
cc @elfiegg @pavanimajety for the FlashInfer 0.2 upgrade.
Thanks Kaichao! Nice change, LGTM.
```python
self.runner = input_builder.runner
self.sliding_window = input_builder.sliding_window
self.block_size = input_builder.block_size
```
We probably need `sm_scale` too, but that can go in with the FlashInfer PR.
Yep, this PR just makes it easier to add more values to the builder. We can add them in the FlashInfer PR.
Currently, we create a new builder instance for every batch, and therefore a new attention metadata builder instance for every batch as well. However, some global information is not available when the input for each batch is created.
When upgrading to FlashInfer 0.2 (see #11194), the attention metadata builder needs access to some global information, such as the sliding window, for the `plan` function, which must run before the FlashInfer wrapper can be used.

This PR fixes the problem by separating builder initialization from per-batch preparation: the builder is constructed once, and for each batch we call `prepare` (or we could rename it to `reset`) to prepare it for the current batch. `prepare` can now access the global config stored during construction.
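A minimal Python sketch of the init/prepare split, under stated assumptions: the names `GlobalConfig`, `MetadataBuilder`, `add_seq`, and `build` are hypothetical illustrations, not vLLM's actual attention metadata builder API. Only the pattern itself (store global config at construction, call `prepare` per batch) comes from this PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GlobalConfig:
    """Hypothetical stand-in for runner-level settings the builder needs."""
    block_size: int
    sliding_window: Optional[int] = None


@dataclass
class AttentionMetadata:
    """Hypothetical per-batch metadata produced by the builder."""
    seq_lens: List[int]
    block_size: int
    sliding_window: Optional[int]


class MetadataBuilder:
    """Hypothetical builder: constructed once, re-prepared for every batch."""

    def __init__(self, config: GlobalConfig):
        # Global information is captured once, at construction time.
        self.block_size = config.block_size
        self.sliding_window = config.sliding_window
        self.prepare()

    def prepare(self) -> None:
        # Reset only per-batch state; the global fields above are untouched.
        self.seq_lens: List[int] = []

    def add_seq(self, seq_len: int) -> None:
        self.seq_lens.append(seq_len)

    def build(self) -> AttentionMetadata:
        # Both per-batch state and the stored global config are available
        # here, e.g. for a planning step that must run before attention.
        return AttentionMetadata(
            seq_lens=list(self.seq_lens),
            block_size=self.block_size,
            sliding_window=self.sliding_window,
        )


builder = MetadataBuilder(GlobalConfig(block_size=16, sliding_window=4096))
results = []
for batch in ([5, 7], [3]):
    builder.prepare()  # per-batch reset; no new builder instance
    for seq_len in batch:
        builder.add_seq(seq_len)
    results.append(builder.build())

for metadata in results:
    print(metadata)
```

Because global values such as `sliding_window` live on the long-lived builder, adding another one (for example the `sm_scale` raised in review) would only touch `__init__`, not the per-batch path.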