Compatibility with `vLLM` with `tensor_parallel_size` argument #805
Merged: gabrielmbmb merged 10 commits into `develop` from `compatibility-vllm-tensor-parallel-size` on Jul 23, 2024
Documentation for this PR has been built. You can view it at: https://distilabel.argilla.io/pr-805/
CodSpeed Performance Report: Merging #805 will not alter performance.
plaguss approved these changes on Jul 23, 2024
Co-authored-by: Agus <agustin@argilla.io>
…hub.com/argilla-io/rlxf into compatibility-vllm-tensor-parallel-size
Description
This PR adds a few changes that enable using `vLLM` with the `tensor_parallel_size` argument. This argument enables the use of multiple GPUs with vLLM, and to do so, vLLM uses either `multiprocessing` or `ray`:

- `multiprocessing`: this approach was not working because `vLLM` tried to create new processes but could not, since the process that `distilabel` creates was a daemon process, and daemon processes are not allowed to create child processes. To bypass this issue, a `_NoDaemonPool` class has been created that creates non-daemon processes, and it is used in `Pipeline`.
- `ray`: this approach works when installing the version in the `main` branch, which includes the changes from "[Core] Introduce SPMD worker execution using Ray accelerated DAG" (vllm-project/vllm#6032). The `VLLM_USE_RAY_COMPILED_DAG=1` and `VLLM_USE_RAY_SPMD_WORKER=1` environment variables need to be set. Update: `vllm==0.5.3` has been released, which includes the changes needed.
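
For readers unfamiliar with the daemon restriction in the `multiprocessing` case, here is a minimal sketch of the commonly used non-daemon pool pattern. It is not the code from this PR: the names `NoDaemonProcess`, `NoDaemonContext` and `NoDaemonPool` are hypothetical, and the actual `_NoDaemonPool` in `distilabel` may be implemented differently.

```python
import multiprocessing as mp
import multiprocessing.pool


class NoDaemonProcess(mp.Process):
    """Process whose `daemon` flag is always False (hypothetical name).

    `multiprocessing.pool.Pool` marks its workers as daemonic, and a
    daemonic process is not allowed to start children. Ignoring writes to
    `daemon` keeps the workers non-daemonic so they can spawn processes.
    """

    @property
    def daemon(self) -> bool:
        return False

    @daemon.setter
    def daemon(self, value: bool) -> None:
        # Silently ignore attempts (e.g. by Pool) to make the worker daemonic.
        pass


class NoDaemonContext(type(mp.get_context())):
    """Default multiprocessing context, but creating NoDaemonProcess workers."""

    Process = NoDaemonProcess


class NoDaemonPool(multiprocessing.pool.Pool):
    """Pool whose workers are non-daemonic and may create child processes."""

    def __init__(self, *args, **kwargs):
        # Force the custom context so every worker is a NoDaemonProcess.
        kwargs["context"] = NoDaemonContext()
        super().__init__(*args, **kwargs)
```

A pool built this way is used like a regular `multiprocessing.pool.Pool`. One trade-off of this design is that non-daemonic workers are not terminated automatically when the parent exits, so the pool should be closed and joined explicitly.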