
[Feature]: Integrate flash-infer FP8 KV Cache Chunked-Prefill (Append Attention) #7450

Open
jon-chuang opened this issue Aug 13, 2024 · 5 comments

Comments

@jon-chuang (Contributor) commented Aug 13, 2024

🚀 The feature, motivation and pitch

From the new FlashInfer release: https://github.com/flashinfer-ai/flashinfer/releases/tag/v0.1.4

cc @comaniac

Additional context

Follow up to: #7208, #7185
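For context, the appeal of an FP8 KV cache is memory: keys and values are stored as 8-bit floats (e4m3) with a scale factor, roughly halving KV-cache footprint versus FP16. A minimal pure-Python sketch of the symmetric per-tensor scheme (the helper names and layout are illustrative assumptions, not FlashInfer's actual API; real kernels do this on-GPU and round to actual e4m3 code points):

```python
# Illustrative sketch of per-tensor FP8 (e4m3) KV-cache quantization.
# All names here are hypothetical; FlashInfer's real kernels operate on
# GPU tensors and use true float8 storage.

E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3


def quantize_kv(values, scale=None):
    """Scale float values into the e4m3 range; returns (quantized, scale)."""
    if scale is None:
        amax = max(abs(v) for v in values) or 1.0
        scale = amax / E4M3_MAX
    q = [max(-E4M3_MAX, min(E4M3_MAX, v / scale)) for v in values]
    # A real kernel would additionally round each entry to the nearest
    # representable e4m3 value here; this sketch keeps full precision.
    return q, scale


def dequantize_kv(q, scale):
    """Recover approximate original values from quantized form."""
    return [v * scale for v in q]


keys = [0.1, -2.5, 7.0, 0.0]
q, s = quantize_kv(keys)
restored = dequantize_kv(q, s)
```

Since this sketch skips the e4m3 rounding step, `restored` matches `keys` up to floating-point error; with true FP8 storage, the quantization error would be bounded by the e4m3 resolution around each magnitude.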

@jon-chuang (Contributor, Author)

Actually, @comaniac, I noticed that there are explicit asserts forbidding the use of FlashInfer kernels for chunked prefill:

# Currently chunked prefill is not supported

As pointed out in: flashinfer-ai/flashinfer#392 (comment)


My understanding is that this is because vLLM runs prefill and decode in two separate kernel invocations by default (as is the case for flash-attention; see #6052), and that this applies to FlashInfer as well?

Perhaps the first step is to unify the FlashInfer path to use a single kernel invocation, similar to #6052, or at least to clarify in which scenarios it is safe to run FlashInfer kernels for chunked prefill; according to @yzh119 in flashinfer-ai/flashinfer#392, FlashInfer already supports this.
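The unified-kernel idea amounts to treating a decode step as a prefill with query length 1, so that prefill chunks and decodes can share one ragged attention batch. A sketch of building the CSR-style index arrays such a batched call would consume (the `Request` type and field names are hypothetical stand-ins for vLLM's per-sequence metadata; FlashInfer's batch wrappers take indptr arrays of this general shape):

```python
# Sketch: building CSR-style indptr arrays for one unified attention call
# that mixes chunked-prefill requests (query_len > 1) and decode requests
# (query_len == 1). Request is a hypothetical stand-in for vLLM's
# per-sequence metadata, not a real vLLM class.
from dataclasses import dataclass


@dataclass
class Request:
    query_len: int  # new tokens this step: chunk size, or 1 for decode
    kv_len: int     # total KV-cache tokens visible to this request


def build_indptr(requests):
    """Prefix-sum query and KV lengths into ragged-batch offsets."""
    qo_indptr, kv_indptr = [0], [0]
    for r in requests:
        qo_indptr.append(qo_indptr[-1] + r.query_len)
        kv_indptr.append(kv_indptr[-1] + r.kv_len)
    return qo_indptr, kv_indptr


# Two prefill chunks and two decodes batched into one call:
batch = [Request(128, 128), Request(64, 512), Request(1, 300), Request(1, 7)]
qo, kv = build_indptr(batch)
```

With this layout, one append-attention kernel launch covers the whole batch; each request `i` attends its `qo[i]:qo[i+1]` query rows against its `kv[i]:kv[i+1]` cached tokens, which is exactly the scenario chunked prefill needs.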

@jon-chuang (Contributor, Author)

Anyway, please assign this to me; I will investigate further.

@comaniac (Collaborator)

We are already working on this. cc @Yard1

@pavanimajety (Contributor)

@comaniac Any updates or open PRs on this that we can take a look at?

@taegeonum

@comaniac Any updates?
