[Ray Integration] Integrate vllm with experimental accelerated DAG API #2201
Closed
ip Added DeciLM-7b and DeciLM-7b-instruct (vllm-project#2062) .
560c80f to d0721ac
Q: Where should I add tests? I was thinking of adding a flag to test_models.py, but it doesn't seem to exercise the Ray configuration (where TP > 1).
…athway-integration
…into pathway-integration
Presumably this is obsolete now that #2221 is merged?
@njhill we decided to contribute the feature off by default. We will productionize it internally at Anyscale and consider re-enabling it in the future.
Decided to close this in favor of #2471.
Hi, we have an experimental accelerated DAG, developed based on the REP ray-project/enhancements#48.
TL;DR: we are implementing a compiled DAG API that can reduce the control plane overhead from Ray (see `_compiled_dag_init_dag` for details). Our microbenchmark shows a 35x reduction in control plane overhead for scatter-gather workloads, which is equivalent to how vLLM implements tensor parallelism: send a single input to N actors and get the result from all actors. See the PR summary.
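To make the scatter-gather pattern concrete, here is a minimal sketch in plain Python. It uses standard-library threads only; it is not Ray and not the compiled DAG API. The `Worker` class and `scatter_gather` function are hypothetical names introduced for illustration, standing in for the N tensor-parallel actors.

```python
from concurrent.futures import ThreadPoolExecutor


class Worker:
    """Hypothetical stand-in for one tensor-parallel worker (an actor)."""

    def __init__(self, rank: int) -> None:
        self.rank = rank

    def execute(self, request: str) -> str:
        # A real worker would run its shard of the model here.
        return f"rank{self.rank}:{request}"


def scatter_gather(workers, request, pool):
    # Scatter: submit the same input to every worker.
    futures = [pool.submit(w.execute, request) for w in workers]
    # Gather: block until every worker has replied, in rank order.
    return [f.result() for f in futures]


workers = [Worker(rank) for rank in range(4)]  # TP=4, as in the benchmark
with ThreadPoolExecutor(max_workers=len(workers)) as pool:
    results = scatter_gather(workers, "step-0", pool)
print(results)
```

Each call to `scatter_gather` pays a fixed dispatch and collection cost per worker on every step. That per-step cost is the kind of control plane overhead the compiled DAG aims to reduce.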
Benchmark
The experimental feature can be evaluated with nightly Ray. I ran the Llama 7B benchmark on a g5.12xlarge (4 A10 GPUs) with TP=4 for 10 iterations.
And got the result
which is about a 13% improvement.
Limitation
Note that the current nightly implementation has several limitations. We are working on follow-ups, and the feature itself could be enabled by default once they are implemented.
Followup
I am planning to make a follow-up PR for these 2 features after merging https://github.com/ray-project/ray/pull/41943/files. The ETA is 12/26 (when I come back from OOO). Please let me know if the async engine is a high-priority item for evaluation.