
adapt vllm distributed module to sglang #2244

Merged: 14 commits into sgl-project:main on Dec 1, 2024

Conversation

@yizhang2077 (Collaborator) commented Nov 28, 2024

Motivation

Modifications

Move the vllm distributed module (v0.6.4.post1) into sglang. The models are still using the vllm.distributed module for now.

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

What to do Next

  • Install sgl-kernel for custom allreduce and replace vllm.distributed with sglang.srt.distributed
  • Add the TensorRT allreduce module
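The planned replacement above could be handled with an import fallback while both module layouts exist. A minimal sketch — the helper name and preference order are illustrative assumptions, not code from this PR:

```python
import importlib


def load_distributed_module():
    """Prefer the in-tree sglang copy of the distributed helpers,
    falling back to vllm.distributed while the migration is in flight.
    (Illustrative sketch; not part of this PR.)"""
    for name in ("sglang.srt.distributed", "vllm.distributed"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(
        "neither sglang.srt.distributed nor vllm.distributed is available"
    )
```

Call sites could then do `dist = load_distributed_module()` once and use `dist.get_tensor_model_parallel_rank()` etc. unchanged during the transition.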

@yizhang2077 yizhang2077 changed the title move vllm distributed module to sglang move vllm distributed module (v0.6.3.post1) to sglang Nov 28, 2024
@zhyncs zhyncs self-assigned this Nov 28, 2024
@zhyncs (Member) commented Nov 28, 2024

Quick question: why not use v0.6.4.post1?

@yizhang2077 (Collaborator, Author) replied:

> Quick question: why not use v0.6.4.post1?

I see that pyproject.toml requires vllm>=0.6.3.post1.

@zhyncs (Member) commented Nov 28, 2024

This is to maintain compatibility: 0.6.3.post1 uses torch 2.4, and 0.6.4.post1 uses torch 2.5.1. The current main branch is compatible with both torch 2.4 and torch 2.5.1. For the distributed part, I suggest updating directly to v0.6.4.post1.
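The compatibility constraint above (torch 2.4 for vllm 0.6.3.post1 vs. torch 2.5.1 for 0.6.4.post1) could be expressed as a simple version gate. A hypothetical sketch, not code from this PR — the mapping values come from the discussion, the gate itself is an assumption:

```python
from packaging import version

# Version pairs taken from the discussion above; the gate itself is
# an illustrative assumption, not part of this PR.
TORCH_FOR_VLLM = {
    "0.6.3.post1": "2.4",
    "0.6.4.post1": "2.5.1",
}


def torch_is_compatible(torch_version: str, vllm_version: str) -> bool:
    """True if the installed torch is at least the version the given
    vllm release was built against."""
    required = TORCH_FOR_VLLM[vllm_version]
    return version.parse(torch_version) >= version.parse(required)
```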

@yizhang2077 (Collaborator, Author) commented Nov 28, 2024

> This is to maintain compatibility: 0.6.3.post1 uses torch 2.4, and 0.6.4.post1 uses torch 2.5.1. The current main branch is compatible with both torch 2.4 and torch 2.5.1. For the distributed part, I suggest updating directly to v0.6.4.post1.

OK, it currently uses v0.6.4.post1 as the base.

@yizhang2077 yizhang2077 force-pushed the remove-vllm-distributed branch from 84622c3 to 21d7d58 Compare November 28, 2024 16:14
@yizhang2077 yizhang2077 changed the title move vllm distributed module (v0.6.3.post1) to sglang move vllm distributed module (v0.6.4.post1) to sglang Nov 28, 2024
@yizhang2077 yizhang2077 force-pushed the remove-vllm-distributed branch from 21d7d58 to 9fecd6e Compare November 28, 2024 16:22
@zhyncs (Member) left a comment


Overall LGTM, just left some comments. Thanks!

Review comments on python/sglang/srt/_custom_ops.py (3 marked outdated, all resolved)
@zhyncs zhyncs changed the title move vllm distributed module (v0.6.4.post1) to sglang adapt vllm distributed module to sglang Nov 30, 2024
@zhyncs (Member) left a comment


LGTM. I think we can merge it into remove-vllm-distributed and verify afterward. You can then create another PR from remove-vllm-distributed.

@zhyncs zhyncs requested a review from HaiShaw November 30, 2024 05:52
@zhyncs (Member) commented Nov 30, 2024

isort....................................................................Failed
- hook id: isort
- files were modified by this hook

Fixing /home/runner/work/sglang/sglang/python/sglang/srt/_custom_ops.py

@yizhang2077 (Collaborator, Author) commented Nov 30, 2024

I think the test failure is caused by PR #2266. @zhaochenyang20

@zhyncs (Member) left a comment


LGTM. Since it is not currently in use, I think it's safe to merge. More detailed testing and verification can be done when integrating the custom all-reduce CUDA kernel into sgl-kernel. cc @yizhang2077

@zhyncs zhyncs enabled auto-merge (squash) December 1, 2024 07:49
@zhyncs zhyncs disabled auto-merge December 1, 2024 07:54
@zhyncs zhyncs merged commit d5b95cb into sgl-project:main Dec 1, 2024
8 of 13 checks passed
3 participants