Development Roadmap (2024 Q4) #1487

Open
8 of 33 tasks
Ying1123 opened this issue Sep 21, 2024 · 9 comments

@Ying1123
Member

Ying1123 commented Sep 21, 2024

Here is the development roadmap for 2024 Q4. Contributions and feedback are welcome (join the bi-weekly development meeting). The previous 2024 Q3 roadmap can be found in #634.

Performance

Parallelism

Hardware Coverage

Model Coverage

LoRA support

LMCache Integration

Quantization

@HaiShaw @zhyncs @ispobock

Server API

Observability

Others

@fengyang95

Are there any plans to optimize long context latency?

Ying1123 changed the title from "[WIP] Development Roadmap (2024 Q4)" to "Development Roadmap (2024 Q4)" on Sep 22, 2024
zhyncs pinned this issue on Sep 22, 2024
@lumiere-ml

Hi, can I help with the multi-layer radix cache (GPU/CPU/Disk)? I'm really interested in that.

@tanzelin430

I am interested in contributing to the P-D split inference architecture, and I have machines available to develop it on. If you have any related development plans, please let me know. Thank you @Ying1123 @zhyncs @fengyang95

@merrymercy
Contributor

@lumiere-ml @tanzelin430 Are you in the slack channel? We can follow up on that.

@zhyncs
Member

zhyncs commented Oct 20, 2024

@lumiere-ml @tanzelin430 You are welcome to join our Slack channel: https://join.slack.com/t/sgl-fru7574/shared_invite/zt-2ngly9muu-t37XiH87qvD~6rVBTkTEHw

@tanzelin430

Thanks for the invitation, I am in Slack now. Looking forward to collaborating with you.

@lumiere-ml

Thanks for your invitation!

@Edenzzzz

Edenzzzz commented Nov 11, 2024

@lumiere-ml @zhyncs I'm also very interested. Could you share which channel you're using for the discussion?
Perhaps we can combine radix tree prefix matching with P-D disaggregation, similar to Mooncake?

@mfdj2002

If no one is actively working on pipeline parallelism support, I'm down to help.

8 participants