
[dlinfer]change llm op interface of paged_prefill_attention. #2977

Draft · wants to merge 2 commits into base: main
Conversation

JackWeiw (Contributor)

Motivation

This PR changes the llm op interface of paged_prefill_attention.

Modification

Added the relevant params to the dlinfer attention backend and kernel. Params requested by tmo (a sketch of how they could be derived follows this list):

- cu_seq_lens_kv (Tensor): the cumulative sequence lengths of the key/value sequences.
- max_kv_seq_len (int): the maximum length of any key/value sequence.
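
For illustration only, here is a minimal sketch of how a caller could compute these two values from per-sequence key/value lengths before invoking the kernel. The `kv_seq_len` tensor name and the leading-zero cumsum convention (as in varlen flash-attention style interfaces) are assumptions, not taken from this PR:

```python
import torch

# Hypothetical example: deriving the two new parameters from
# per-sequence key/value lengths. Names other than cu_seq_lens_kv
# and max_kv_seq_len are assumptions for illustration.
kv_seq_len = torch.tensor([5, 3, 8], dtype=torch.int32)  # 3 sequences

# cu_seq_lens_kv: cumulative key/value sequence lengths, assumed here
# to carry a leading 0 in the style of varlen attention interfaces.
cu_seq_lens_kv = torch.nn.functional.pad(
    torch.cumsum(kv_seq_len, dim=0, dtype=torch.int32), (1, 0)
)
# -> tensor([ 0,  5,  8, 16], dtype=torch.int32)

# max_kv_seq_len: the maximum length of any key/value sequence.
max_kv_seq_len = int(kv_seq_len.max().item())  # -> 8
```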

@lvhan028 requested a review from jinminxi104 on December 31, 2024
@RunningLeon changed the title from "[dlinfer]change llm op interface of paged_prefiil_attention." to "[dlinfer]change llm op interface of paged_prefill_attention." on January 2, 2025
@jinminxi104 marked this pull request as draft on January 2, 2025