[wip][Core] Introduce SPMD worker execution using Ray accelerated DAG #5980

Draft: wants to merge 3 commits into main

stephanie-wang
Contributor

This PR introduces an SPMD (single program, multiple data) execution mode for Worker. In this mode there is no longer a driver worker, and the rank 0 worker is moved to a separate process. Every worker is expected to take an ExecuteModelRequest as input, instead of receiving its inputs over NCCL used as a control plane.
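As a rough illustration of the SPMD idea, the sketch below hands the identical request to every rank and lets each rank pick out its own share of the work. The `ExecuteModelRequest` and `Worker` classes here are hypothetical toy stand-ins, not the vLLM classes, and the strided shard only stands in for real tensor-parallel compute.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ExecuteModelRequest:
    # Hypothetical stand-in: the real vLLM request carries far more state.
    token_ids: List[int]


class Worker:
    def __init__(self, rank: int, world_size: int) -> None:
        self.rank = rank
        self.world_size = world_size

    def execute_model(self, request: ExecuteModelRequest) -> int:
        # Every rank sees the same request and selects its own slice
        # (a toy strided shard standing in for the real computation).
        shard = request.token_ids[self.rank::self.world_size]
        return sum(shard)


def spmd_step(workers: List[Worker], request: ExecuteModelRequest) -> List[int]:
    # SPMD: there is no driver worker; all ranks get the identical input.
    return [w.execute_model(request) for w in workers]


workers = [Worker(rank, world_size=2) for rank in range(2)]
partials = spmd_step(workers, ExecuteModelRequest(token_ids=[1, 2, 3, 4]))
print(partials, sum(partials))
```

The point of the sketch is only that no rank is special: the same call with the same argument goes to every worker, so nothing has to be broadcast from a driver.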

To keep the changes contained, this path currently must be used with the new Ray accelerated DAG feature. Compared to Ray Core, accelerated DAG reduces the system overhead of task execution by using an execution loop, and the overhead of argument passing by using shared memory.
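The dependency between the two modes could be checked along these lines. This is a minimal sketch, not code from this PR: the helper `resolve_execution_mode` is hypothetical, and treating the value `"1"` as "enabled" is an assumption about how the `VLLM_USE_SPMD_WORKER` and `VLLM_USE_RAY_COMPILED_DAG` environment variables are parsed.

```python
import os
from typing import Mapping


def _flag(env: Mapping[str, str], name: str) -> bool:
    # Assumption: a flag counts as enabled only when set to exactly "1".
    return env.get(name, "0") == "1"


def resolve_execution_mode(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: pick a mode and reject unsupported combinations."""
    use_spmd = _flag(env, "VLLM_USE_SPMD_WORKER")
    use_dag = _flag(env, "VLLM_USE_RAY_COMPILED_DAG")
    if use_spmd and not use_dag:
        # The SPMD path is only wired up through the accelerated DAG for now.
        raise ValueError(
            "VLLM_USE_SPMD_WORKER=1 requires VLLM_USE_RAY_COMPILED_DAG=1"
        )
    if use_spmd:
        return "spmd"
    return "compiled_dag" if use_dag else "default"


print(resolve_execution_mode(
    {"VLLM_USE_SPMD_WORKER": "1", "VLLM_USE_RAY_COMPILED_DAG": "1"}
))
```

Failing fast on the unsupported combination keeps a misconfigured run from silently falling back to the driver-based path.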

TODO:

  • Update the required Ray version once the accelerated DAG preview is released. The latest Ray release is missing some exception-handling improvements, so this PR should not be merged until the DAG preview is released.
  • Add tests: an e2e distributed test with VLLM_USE_SPMD_WORKER=1 VLLM_USE_RAY_COMPILED_DAG=1 is probably sufficient.
  • Add some benchmarks. We expect some latency improvement over the NCCL control plane for shorter sequences, but longer sequences may be worse right now because the entire sequence is repeatedly sent to all workers via ExecuteModelRequest.
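To see why longer sequences may suffer, here is a back-of-the-envelope sketch of how many tokens cross the control plane. It assumes, as a simplification, that the full sequence is resent once per decode step; the incremental variant is a hypothetical point of comparison, not something this PR implements, and the numbers are illustrative rather than measured.

```python
def tokens_sent_full_resend(prompt_len: int, decode_steps: int, num_workers: int) -> int:
    # Assumption: step i sends the prompt plus the i tokens generated so far,
    # and every worker receives its own copy.
    per_worker = sum(prompt_len + i for i in range(decode_steps))
    return per_worker * num_workers


def tokens_sent_incremental(prompt_len: int, decode_steps: int, num_workers: int) -> int:
    # Hypothetical alternative: send the prompt once, then one new token
    # on each later step.
    if decode_steps == 0:
        return 0
    per_worker = prompt_len + (decode_steps - 1)
    return per_worker * num_workers


full = tokens_sent_full_resend(prompt_len=1000, decode_steps=100, num_workers=4)
incremental = tokens_sent_incremental(prompt_len=1000, decode_steps=100, num_workers=4)
print(full, incremental)
```

Under these assumptions the resend cost grows roughly with prompt length times the number of decode steps, which is why the benchmark should cover both short and long sequences.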

Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>