[misc] use out argument for flash attention #9740
Conversation
Signed-off-by: youkaichao <youkaichao@gmail.com>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
@youkaichao Now that #10811 is merged, this PR should pass the test. Please rebase the PR.
Looks good, assuming tests pass
@@ -126,7 +127,8 @@ def test_flash_attn_with_paged_kv(
         cache_seqlens=kv_lens_tensor,
         softcap=soft_cap if soft_cap is not None else 0,
         window_size=window_size,
-    ).squeeze(1)
+        out=output.unsqueeze(1),
+    )
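For context, the diff above adopts the pre-allocated-output pattern: the caller allocates the result buffer once and the kernel writes into it in place, instead of returning a freshly allocated tensor that then has to be reshaped or copied. A minimal sketch of that pattern in plain PyTorch, using torch.matmul's out= argument as a stand-in for the flash-attention call (the names and shapes here are illustrative, not the test's actual ones):

```python
import torch

# Illustrative shapes only: (num_tokens, hidden) x (hidden, hidden).
num_tokens, hidden = 16, 128
query = torch.randn(num_tokens, hidden)
weight = torch.randn(hidden, hidden)

# Pre-allocate the output buffer once, outside the hot path.
output = torch.empty(num_tokens, hidden)

# The kernel writes directly into `output`; no intermediate tensor is
# allocated and no copy back into the caller's buffer is needed.
torch.matmul(query, weight, out=output)
```

In the test, `output` plays the role of that pre-allocated buffer, and `out=output.unsqueeze(1)` hands the kernel a view with the extra singleton dimension it expects.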
suggestion - do the unsqueeze during creation (either by passing in the modified shape or just `.empty_like(...).unsqueeze(1)`). I think that will be cleaner.
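To make the comparison concrete, here is a hedged sketch of the two options the reviewer mentions; the variable names and shapes are illustrative, not the test's actual ones:

```python
import torch

num_tokens, num_heads, head_size = 16, 8, 64
query = torch.randn(num_tokens, num_heads, head_size)

# Option A (as in the diff): allocate with the query's shape, then add
# the singleton dimension at the call site.
output_a = torch.empty_like(query)
out_arg_a = output_a.unsqueeze(1)                # view, shape (16, 1, 8, 64)

# Option B (reviewer's suggestion): build the extra dimension into the
# buffer at creation time, so the call site stays clean.
output_b = torch.empty_like(query).unsqueeze(1)  # same shape, created once
# or, equivalently, pass the modified shape explicitly:
output_c = torch.empty(num_tokens, 1, num_heads, head_size)
```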
This pull request has merge conflicts that must be resolved before it can be merged.
Closing this as it has been reworked in #10822.
Rework of #5138.
cc @Yard1 @njhill if you have any comments about why it was reverted in #5478.