[torch.compile] hide slicing under custom op for inductor #8384

Merged 11 commits on Sep 12, 2024
remove empty like
youkaichao committed Sep 12, 2024
commit 3f6d186ae19a76e59ec10712ef86155409333832
1 change: 0 additions & 1 deletion vllm/attention/backends/flash_attn.py
@@ -697,7 +697,6 @@ def forward(
assert key.shape[0] == num_prefill_tokens + num_decode_tokens
assert value.shape[0] == num_prefill_tokens + num_decode_tokens

- output = torch.empty_like(query)
# Query for decode. KV is not needed because it is already cached.
decode_query = query[num_prefill_tokens:]
# QKV for prefill.
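
The context lines in this hunk rely on a token-major layout in which prefill tokens come first and decode tokens follow, which is why `query[num_prefill_tokens:]` yields the decode query. A minimal sketch of that split is below. The helper name `split_prefill_decode` is hypothetical and not part of vLLM, and plain Python lists stand in for tensors here, on the assumption that only first-dimension slicing matters for the illustration.

```python
# Hypothetical helper (not vLLM code): split a token-major batch into its
# prefill and decode parts, assuming prefill tokens are laid out first.
def split_prefill_decode(tokens, num_prefill_tokens, num_decode_tokens):
    # Same sanity check the hunk applies to key.shape[0] and value.shape[0].
    assert len(tokens) == num_prefill_tokens + num_decode_tokens
    prefill = tokens[:num_prefill_tokens]  # leading block: prefill tokens
    decode = tokens[num_prefill_tokens:]   # trailing block: decode tokens
    return prefill, decode


# Lists stand in for tensors; first-dimension slicing of a tensor
# follows the same start/stop indexing rule.
query = ["p0", "p1", "p2", "d0", "d1"]
prefill_query, decode_query = split_prefill_decode(query, 3, 2)
```

With three prefill tokens and two decode tokens, the first three entries land in `prefill_query` and the last two in `decode_query`.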