[Roadmap] FlashInfer v0.2 to v0.3 #675

yzh119 (Collaborator) opened this issue Dec 17, 2024

Milestones

Our tentative roadmap includes the following milestones:

  • SageAttention-2 in FlashAttention-3: Implement SageAttention-2 in the FlashAttention-3 template.
  • Flex-Attention Compatible Interface: Standardize the JIT interface (see the first sketch after this list). @shadowpa0327
  • SM89 Kernel Optimization: Leverage Ada's FP8 Tensor Cores for better performance on the RTX 6000 Ada and RTX 4090.
  • Template Refactoring: Refactor the FA-2 and MLA templates using CuTe.
  • MLA Acceleration: Optimize Multi-head Latent Attention (MLA) with Tensor Core support; follow-up of feat: support MLA decode #551.
  • Triton Porting: Migrate elementwise, normalization, and other kernels that are off the critical path to Triton.
  • API Standardization: Simplify and standardize the attention APIs for better usability.
  • POD-Attention Integration: Implement POD-Attention for more efficient chunked prefill.
  • Nanoflow Parallelism: Expose Python-level APIs for running GEMM and attention on a subset of SMs, which Nanoflow-style parallelism requires; see #591.
  • Fused Tree Speculative Sampling: Follow-up of sampling: fused speculative sampling kernels #259. We should support tree speculative sampling as well; we will port the fused tree-speculative-sampling implementation written by @spectrometerHBH from https://github.com/mlc-ai/mlc-llm to accelerate EAGLE, Medusa, etc. (see the second sketch after this list).
  • Improvements to Existing Top-P/Top-K Sampling Operators: Change the algorithm to guarantee that every sample succeeds within 32 rounds (see the third sketch after this list).
  • PyPI wheels: upload wheels to PyPI (pending issue: PEP 541 Request: flashinfer pypi/support#5355)
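
For context on the Flex-Attention item: the milestone targets the programming model popularized by PyTorch's FlexAttention, where an attention variant is expressed as a small `score_mod` callable that gets compiled into the kernel. Below is a minimal sketch using PyTorch's own API (torch >= 2.5) to show the shape of that interface; the constant ALiBi slope is illustrative only, and none of this is FlashInfer's final interface:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def alibi_bias(score, b, h, q_idx, kv_idx):
    # Fold an ALiBi-style relative-position bias into the raw attention
    # score; real ALiBi uses per-head slopes rather than a constant 0.5.
    return score - 0.5 * (q_idx - kv_idx)

# [batch, heads, seq_len, head_dim]
q, k, v = (torch.randn(1, 8, 128, 64, device="cuda") for _ in range(3))
out = flex_attention(q, k, v, score_mod=alibi_bias)
```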
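On fused tree speculative sampling: a tree verifier generalizes the accept/residual rule of chain speculative sampling to every root-to-leaf path of a draft tree. Here is a minimal NumPy sketch of the chain primitive being generalized; the function name and argument layout are illustrative, not the kernel's API:

```python
import numpy as np

def verify_chain(draft_tokens, q, p, rng=None):
    """Accept/reject a chain of draft tokens.
    draft_tokens: proposed token ids; q[i] / p[i]: draft / target
    distributions at step i (1-D, normalized). Returns the accepted
    prefix, plus one corrected token on the first rejection."""
    rng = rng or np.random.default_rng()
    out = []
    for i, x in enumerate(draft_tokens):
        # Accept draft token x with probability min(1, p(x) / q(x)).
        if rng.random() < min(1.0, p[i][x] / q[i][x]):
            out.append(x)
            continue
        # Rejected: sample from the residual max(p - q, 0), renormalized,
        # then stop; this keeps the output distributed exactly as p.
        r = np.maximum(p[i] - q[i], 0.0)
        out.append(int(rng.choice(len(r), p=r / r.sum())))
        break
    # (A full implementation also samples one bonus token from the target
    # model when every draft token is accepted.)
    return out
```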
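On the top-p/top-k sampling item: the goal is that the operator never reports failure, i.e. rejection rounds are bounded and an exact path takes over afterwards. A NumPy sketch of that contract follows; the actual GPU kernel organizes the pivot-based rejection differently and avoids the full sort, so everything here is illustrative:

```python
import numpy as np

def top_p_sample_bounded(probs, top_p, max_rounds=32, rng=None):
    """Top-p sampling guaranteed to succeed: pivot-based rejection
    rounds first, an exact sorted fallback afterwards."""
    rng = rng or np.random.default_rng()
    q = probs.copy()
    for _ in range(max_rounds):
        s = int(rng.choice(len(q), p=q / q.sum()))
        # s is inside the nucleus iff the probability mass strictly
        # above it has not already reached top_p.
        if probs[probs > probs[s]].sum() < top_p:
            return s
        # Rejected: raise the pivot by zeroing every token no more
        # probable than s, shrinking the support for the next round.
        q[probs <= probs[s]] = 0.0
    # Exact fallback so the call can never fail: build the nucleus by a
    # full sort and sample from the renormalized top-p mass.
    order = np.argsort(-probs)
    k = int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1
    nucleus = order[:k]
    return int(rng.choice(nucleus, p=probs[nucleus] / probs[nucleus].sum()))
```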

We welcome your feedback and suggestions!
Let us know what features you'd like to see in FlashInfer.

@yzh119 yzh119 added the roadmap label Dec 17, 2024
@yzh119 yzh119 pinned this issue Dec 17, 2024