Comparing changes
base repository: flashinfer-ai/flashinfer
base: v0.0.8
head repository: flashinfer-ai/flashinfer
compare: v0.0.9
- 11 commits
- 49 files changed
- 5 contributors
Commits on Jul 3, 2024
- 2e64a65
Commits on Jul 4, 2024
- perf: accelerate gqa performance (#356)
Changes:
1. Prefetch page indices (we had already done this optimization on the decode kernels, but not on the append/prefill kernels, which are used for GQA).
2. Unlock the 1x4 warp layout from #322; we hadn't enabled it before because the binary size was too large, and we should further reduce some unnecessary template arguments.
3. Optimize `threadblock_sync_mdo_states` for efficiently merging the attention states of multiple warps in a threadblock. Our previous implementation assumed a small shared memory size and interleaved shared memory reads/writes with computation, which is not as efficient as bulk shared memory access.

After this PR, the GQA kernel execution time (on H100) for `batch_size=128, seq_len=1024, num_qo_heads=32, num_kv_heads=4, head_dim=128` improved from 133us to 103us.
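The per-warp merge in item 3 follows the standard log-sum-exp recombination of partial attention states: each chunk contributes an output vector and the log-sum-exp of its scores, and the chunks are combined with softmax weights. A minimal NumPy sketch of that math (the function name is illustrative, not flashinfer's API):

```python
import numpy as np

def merge_attention_states(outputs, lses):
    """Merge partial attention outputs computed over disjoint KV chunks.

    outputs: list of arrays of shape (head_dim,), each a partial attention
             output over one chunk of the KV sequence.
    lses:    list of scalars, the log-sum-exp of the attention scores over
             the corresponding chunk.
    """
    lses = np.asarray(lses, dtype=np.float64)
    m = lses.max()                      # subtract the max for numerical stability
    weights = np.exp(lses - m)          # relative softmax mass of each chunk
    merged = sum(w * o for w, o in zip(weights, outputs)) / weights.sum()
    merged_lse = m + np.log(weights.sum())
    return merged, merged_lse
```

Because the merge is associative, warps (or thread blocks) can each attend to a slice of the KV cache independently and combine their states afterwards, which is exactly what makes a bulk shared-memory merge possible.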
(e56ddad)
- 3536198
Commits on Jul 6, 2024
- bugfix: check gpu id in PyTorch APIs and use input tensor's gpu default stream (#361)
(1b84fab)
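The guard this commit adds is conceptually simple: every tensor argument must live on the same device, and the kernel should then run on that device's stream rather than assuming GPU 0. A minimal sketch of such a check (the helper name is hypothetical, not flashinfer's actual API):

```python
import torch

def check_same_device(*tensors):
    """Raise if the given tensors do not all live on the same device.

    Mirrors the kind of guard the PR adds: kernels launched for one
    device must not be fed tensors that live on another.
    (Illustrative helper, not flashinfer's actual API.)
    """
    device = tensors[0].device
    for t in tensors[1:]:
        if t.device != device:
            raise ValueError(
                f"all tensors must be on {device}, got {t.device}")
    return device
```

For CUDA inputs, a wrapper would then query `torch.cuda.current_stream(device)` for the launch instead of a hard-coded stream.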
Commits on Jul 10, 2024
- bugfix: fix decode kernels output for empty kv cache (#363)
When a request has an empty kv cache, the output of the decode kernels does not align with the prefill kernels. This PR fixes the issue. Thanks @MasterJH5574 for reporting this bug.
(ac72b1c)
- refactor: slight refactor of prefill kernels (#364)
- add `__launch_bounds__`
- add unroll hint for prefetching page indices
- change loop structure of `threadblock_sync_mdo_states`
(264082e)
- perf: Optimize tensor conversions in C++ code to avoid unnecessary copies (#366)
Small tweak to avoid unnecessary copying by combining `to` calls. Discovered during profiling.
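Combining `to` calls is the usual PyTorch trick: each `.to` that changes device or dtype materializes a new tensor, so chaining two of them pays for two copies. An illustrative sketch of the idea from the Python side (the PR itself applies this in the C++ bindings; dtype-only here so it runs without a GPU):

```python
import torch

x = torch.ones(1024, dtype=torch.int32)

# Chained conversions: each step that changes dtype (or device) makes a copy.
y_chained = x.to(torch.float32).to(torch.float64)

# Single combined call: converts straight to the final dtype in one copy.
# With a GPU, `x.to(device, dtype)` likewise fuses the device move and cast.
y_combined = x.to(torch.float64)

assert y_combined.dtype == torch.float64
assert torch.equal(y_chained, y_combined)
```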
(1116237)
- perf: accelerate alibi (#365)
Alibi experienced a performance degradation after #262 because of an increased number of integer divisions. This PR fixes the issue.
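The PR's actual change is inside the CUDA kernels, but the underlying idea, replacing repeated integer division in a hot loop with cheaper incremental bookkeeping, can be sketched in Python (the function names and the head-index example are illustrative; integer division is a multi-cycle operation on GPUs, so removing it from inner loops matters):

```python
def head_ids_naive(n, group_size):
    # One integer division per element: fine on CPUs, costly in a GPU hot loop.
    return [i // group_size for i in range(n)]

def head_ids_strength_reduced(n, group_size):
    # Strength reduction: i // group_size only changes when i crosses a
    # multiple of group_size, so track it with an incrementing counter.
    out, head, next_bump = [], 0, group_size
    for i in range(n):
        if i == next_bump:
            head += 1
            next_bump += group_size
        out.append(head)
    return out
```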
(4f0a9f9)
Commits on Jul 11, 2024
- bugfix: fix the decode kernel segfault in cudagraph mode (#368)
The `begin_forward` function in the decode attention wrappers sometimes triggers a segfault; this PR fixes the issue.
(c69cfab)
Commits on Jul 12, 2024
- refactor: reduce binary size by making `kv_layout` an argument instead of a template parameter (#370)
This PR cuts the binary size in half by moving `kv_layout` from a template parameter to an input argument. It also adds `stride_n` and `stride_h` fields to `tensor_info_t` and `paged_kv_t`, making it possible to support non-contiguous inputs (#311); however, I'll leave that for another PR.
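Carrying explicit `stride_n`/`stride_h` fields, rather than deriving offsets from a compile-time layout, is what makes runtime layout selection (and, later, non-contiguous inputs) possible. A hedged Python sketch of the stride computation for the two layouts flashinfer names, NHD (`[seq_len, num_heads, head_dim]`) and HND (`[num_heads, seq_len, head_dim]`); the helper itself is illustrative, not the library's API:

```python
def kv_strides(kv_layout, num_heads, seq_len, head_dim):
    """Return (stride_n, stride_h) in elements for a contiguous KV buffer.

    stride_n: elements to step one position along the sequence axis.
    stride_h: elements to step one position along the head axis.
    """
    if kv_layout == "NHD":          # [seq_len, num_heads, head_dim]
        return num_heads * head_dim, head_dim
    elif kv_layout == "HND":        # [num_heads, seq_len, head_dim]
        return head_dim, seq_len * head_dim
    raise ValueError(f"unknown kv_layout: {kv_layout}")
```

A kernel that indexes via `base + n * stride_n + h * stride_h` works for either layout from the same compiled binary, which is exactly why the template instantiation count (and hence binary size) halves.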
(024a79f)
- chore(main): release 0.0.9 (#359)
🤖 I have created a release *beep* *boop*

## [0.0.9](v0.0.8...v0.0.9) (2024-07-12)

### Bugfix

* fix the decode kernel segfault in cudagraph mode ([#368](https://github.com/flashinfer-ai/flashinfer/pull/368)) ([c69cfa](https://github.com/flashinfer-ai/flashinfer/commit/c69cfabc540e4a7edd991713df10d575ff3b0c21))
* fix decode kernels output for empty kv cache ([#363](https://github.com/flashinfer-ai/flashinfer/pull/363)) ([ac72b1](https://github.com/flashinfer-ai/flashinfer/commit/ac72b1cc14a6474d601f371c8d69e2600ac28d2f))
* check gpu id in PyTorch APIs and use input tensor's gpu default stream ([#361](https://github.com/flashinfer-ai/flashinfer/pull/361)) ([1b84fa](https://github.com/flashinfer-ai/flashinfer/commit/1b84fab3e4f53fb4fa26952fdb46fa8018634057))

### Performance Improvements

* accelerate alibi ([#365](#365)) ([4f0a9f9](4f0a9f9))
* accelerate gqa performance ([#356](#356)) ([e56ddad](e56ddad))
* Optimize tensor conversions in C++ code to avoid unnecessary copies ([#366](#366)) ([1116237](1116237))

### Acknowledgement

We thank [@Yard1](https://github.com/Yard1), [@Ying1123](https://github.com/Ying1123) and [@zhyncs](https://github.com/zhyncs) for their contributions.

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Zihao Ye <expye@outlook.com>
(17a5f1b)
You can try running this command locally to see the comparison on your machine:
`git diff v0.0.8...v0.0.9`