
[dlinfer]change llm op interface of paged_prefill_attention. #2977

Draft · wants to merge 2 commits into base: main
Conversation

JackWeiw (Contributor)

Motivation

This PR changes the llm op interface of paged_prefill_attention.

Modification

Added the relevant params to the dlinfer attention backend and kernel. Params requested by tmo (a sketch of how they could be derived follows this list):

- cu_seq_lens_kv (Tensor): the cumulative sequence lengths of the key/value sequences.
- max_kv_seq_len (int): the maximum length of any key/value sequence.
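
For illustration only, here is a minimal sketch of how a caller could compute these two values from per-sequence key/value lengths before invoking the kernel. The `kv_seq_len` tensor name and the leading-zero cumsum convention (as in varlen flash-attention style interfaces) are assumptions, not taken from this PR:

```python
import torch

# Hypothetical example: deriving the two new parameters from
# per-sequence key/value lengths. Names other than cu_seq_lens_kv
# and max_kv_seq_len are assumptions for illustration.
kv_seq_len = torch.tensor([5, 3, 8], dtype=torch.int32)  # 3 sequences

# cu_seq_lens_kv: cumulative key/value sequence lengths, assumed here
# to carry a leading 0 in the style of varlen attention interfaces.
cu_seq_lens_kv = torch.nn.functional.pad(
    torch.cumsum(kv_seq_len, dim=0, dtype=torch.int32), (1, 0)
)
# -> tensor([ 0,  5,  8, 16], dtype=torch.int32)

# max_kv_seq_len: the maximum length of any key/value sequence.
max_kv_seq_len = int(kv_seq_len.max().item())  # -> 8
```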

@lvhan028 requested a review from jinminxi104 on December 31, 2024
@RunningLeon changed the title from "[dlinfer]change llm op interface of paged_prefiil_attention." to "[dlinfer]change llm op interface of paged_prefill_attention." on January 2, 2025
@jinminxi104 marked this pull request as draft on January 2, 2025