🤖 I have created a release beep boop
## 0.3.0 (2024-12-25)

### Features
* `MultiLevelCascadeAttentionWrapper` API (#462) (1e37989)
* `rotary_dim` argument to rope APIs for partial apply rope (#599) (eb9bc71)
* `use_tensor_cores` option to decode kernels to accelerate GQA (#317) (3b50dd5)
* `use_softmax` field in variant class (#533) (d81af97)
* `non_blocking` to plan function (#622) (560af6f)
* `merge_state_in_place` (#372) (e14fa81)
* `sm_scale` field for all attention APIs (#145) (85d4018)
* `logits_soft_cap` value (#339) (a2498f5)
* `head_dim=256` for attention kernels (#132) (0372acc)

### Bug Fixes
* `+` (#118) (af6bd10)

### Performance Improvements
* `kv_chunk_size` back to 128 (#329) (f237f5f)
* `append_paged_kv_cache` (#588) (e15f7c9)

This PR was generated with Release Please. See documentation.
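The `merge_state_in_place` entry refers to combining partial attention results computed over disjoint sets of keys. The pure-Python sketch below illustrates only the general math of that technique. It is not FlashInfer's implementation or API. It assumes that an attention state is a pair `(v, s)`, where `v` is the softmax-normalized output over one subset of keys and `s` is the natural-log log-sum-exp of that subset's scores. The helper names `attend` and `merge_states` are hypothetical.

```python
import math
from typing import List, Tuple


def attend(scores: List[float], values: List[List[float]]) -> Tuple[List[float], float]:
    """Hypothetical helper: single-query softmax attention over one key subset.

    Returns (output, lse), where lse is the natural-log log-sum-exp of the scores.
    """
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(x - m) for x in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom for d in range(dim)]
    return out, m + math.log(denom)


def merge_states(v_a: List[float], s_a: float,
                 v_b: List[float], s_b: float) -> Tuple[List[float], float]:
    """Hypothetical helper: merge two attention states from disjoint key subsets.

    Each partial output is reweighted by exp(lse) of its subset, so the result
    equals attention computed over the union of both subsets.
    """
    m = max(s_a, s_b)  # shift before exponentiating to avoid overflow
    w_a = math.exp(s_a - m)
    w_b = math.exp(s_b - m)
    total = w_a + w_b
    v = [(w_a * x + w_b * y) / total for x, y in zip(v_a, v_b)]
    return v, m + math.log(total)


if __name__ == "__main__":
    scores = [0.5, -1.0, 2.0, 0.3]
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 0.5]]
    full_v, full_s = attend(scores, values)
    # Split the keys into two chunks, attend to each, then merge the states.
    va, sa = attend(scores[:2], values[:2])
    vb, sb = attend(scores[2:], values[2:])
    merged_v, merged_s = merge_states(va, sa, vb, sb)
    print(full_v, merged_v)  # the two outputs agree up to floating-point error
    print(full_s, merged_s)
```

Keeping the log-sum-exp alongside each partial output is what makes the merge exact. The weights `exp(s_a)` and `exp(s_b)` recover each chunk's softmax denominator, so chunks can be processed independently and combined afterward.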