Development Roadmap (2024 Q4) #1487
Comments
Are there any plans to optimize long context latency?
Hi, can I help with the multi-layer radix cache (GPU/CPU/Disk)? I'm really interested in that.
I am interested in contributing to the P-D split inference architecture, and I have machines available to support its development. If you have any related development plans, please let me know. Thank you @Ying1123 @zhyncs @fengyang95
@lumiere-ml @tanzelin430 Are you in the Slack channel? We can follow up on that.
@lumiere-ml @tanzelin430 You're welcome to join our Slack channel: https://join.slack.com/t/sgl-fru7574/shared_invite/zt-2ngly9muu-t37XiH87qvD~6rVBTkTEHw
Thanks for the invitation, I am in Slack now. Looking forward to collaborating with you.
Thanks for your invitation!
@lumiere-ml @zhyncs I'm also very interested, could you share which channel you're using to discuss?
If no one is actively working on supporting pipeline parallelism, I'm down to help.
Here is the development roadmap for 2024 Q4. Contributions and feedback are welcome (join the bi-weekly development meeting). The previous 2024 Q3 roadmap can be found in #634.
Performance
Parallelism
Hardware Coverage
Model Coverage
LoRA support
LMCache Integration
Quantization
@HaiShaw @zhyncs @ispobock
Server API
Observability
Others