[Ray backend] Better error when pg topology is bad. #7584
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
can this guarantee the driver worker node is included in the placement group?
ah, I misunderstood the requirement. I think there's a way to get it working. Are we aligned on this semantic?
gotcha. ETA eod today
Addressed comments;
```python
from vllm.utils import cuda_device_count_stateless


TARGET_TEST_SUITE = os.environ.get("TARGET_TEST_SUITE", "L4")
```
please clean up the tests and remove unnecessary code. this test does not need TARGET_TEST_SUITE, I think.
f"available in a node {current_node_id=} {current_ip=}.") | ||
# This way, at least bundle is required to be created in a current | ||
# node. | ||
placement_group_specs[0][f"node:{current_ip}"] = 0.001 |
is this the key to make sure current node is included in the placement group?
yes that's right. I failed to find another clean way to support this unfortunately...
will fix tests and address comments by this Sunday
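For context, here is a minimal sketch of the node-pinning trick discussed above (illustrative only, not the exact PR code). Ray automatically exposes a `node:<ip>` custom resource on every node, so requesting a tiny fraction of it in the first bundle forces that bundle onto the current node.

```python
# Sketch of the node-pinning idea discussed above; names are illustrative.
import ray
from ray.util.placement_group import placement_group

ray.init()

current_ip = ray.util.get_node_ip_address()

# One GPU bundle per worker (2 workers assumed for illustration).
placement_group_specs = [{"GPU": 1.0}, {"GPU": 1.0}]

# Ray creates a `node:<ip>` resource on each node automatically, so asking for
# a tiny amount of it pins the first bundle (and its worker) to this node.
placement_group_specs[0][f"node:{current_ip}"] = 0.001

pg = placement_group(placement_group_specs, strategy="PACK")
ray.get(pg.ready())
```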
```python
# Simulate a 2-node cluster, 1 GPU each.
cluster = Cluster()
head_node = cluster.add_node(num_cpus=8, num_gpus=1, resources={"head": 1})
```
why num_cpus=8 here? is 8 some magic number?
We don't have any guarantee on the number of GPUs we have for this 2-GPU test.
ray doesn't require real hardware cpus when specifying resources like this. 8 is just a random number. (I think technically num_cpus=1 should also work)
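For reference, a minimal sketch of the mock-cluster setup being discussed, assuming `ray.cluster_utils.Cluster`; the resource counts are purely logical labels, so no real GPUs or extra CPUs are needed.

```python
# Sketch of a simulated 2-node cluster for tests; resource counts are logical
# labels, not probes of real hardware.
import ray
from ray.cluster_utils import Cluster

cluster = Cluster()
# 8 CPUs is arbitrary; num_cpus=1 would likely work as well.
head_node = cluster.add_node(num_cpus=8, num_gpus=1, resources={"head": 1})
worker_node = cluster.add_node(num_cpus=8, num_gpus=1, resources={"worker": 1})

# Connect the driver to the mock cluster.
ray.init(address=cluster.address)
```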
```python
@pytest.mark.parametrize("model, distributed_executor_backend, test_suite", [
    ("facebook/opt-125m", "ray"),
])
def test_multi_node_bad_topology(
```
I think you should test it with a 2-node test (where the ray cluster is already created), try to launch the test with 1 GPU from both the head node and the worker node, and make sure the vllm instance is scheduled on the same node as the process that launched it.
hmm yeah I think we need real 2 nodes if we want to use node ip resources. let me fix it.
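A hedged sketch of the kind of assertion such a test could make; `Worker` is an illustrative stand-in for the actual vLLM worker actor, and the check simply compares the launcher's node id with the node id reported by the actor.

```python
# Sketch of the placement check described above; `Worker` is illustrative.
import ray

ray.init()

@ray.remote(num_gpus=1)
class Worker:
    def node_id(self) -> str:
        # Node this actor was actually scheduled on.
        return ray.get_runtime_context().get_node_id()

driver_node_id = ray.get_runtime_context().get_node_id()
worker = Worker.remote()

# The test would expect the worker to land on the same node as the launcher.
assert ray.get(worker.node_id.remote()) == driver_node_id
```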
```python
@ray.remote(num_gpus=1)
class Actor:
    pass


# Actors are created on the head node.
actors = [Actor.remote() for _ in range(2)]  # type: ignore
ray.get([a.__ray_ready__.remote() for a in actors])
```
is it guaranteed that these actors are allocated in the head node?
yes. it is a bit of an implementation detail, but actors are prioritized to be scheduled on the node that creates them.
Good point though. Let me just use https://docs.ray.io/en/master/ray-core/api/doc/ray.util.scheduling_strategies.NodeAffinitySchedulingStrategy.html#ray.util.scheduling_strategies.NodeAffinitySchedulingStrategy so that we don't need to rely on impl details.
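For reference, a sketch of that approach with `NodeAffinitySchedulingStrategy` (illustrative only): pin the test actor explicitly to the node the driver runs on instead of relying on locality-aware default scheduling.

```python
# Sketch: pin a test actor to the head node explicitly.
import ray
from ray.util.scheduling_strategies import NodeAffinitySchedulingStrategy

@ray.remote(num_gpus=1)
class Actor:
    pass

# Pin the actor to the node the driver is running on (the head node here).
head_node_id = ray.get_runtime_context().get_node_id()
actor = Actor.options(
    scheduling_strategy=NodeAffinitySchedulingStrategy(
        node_id=head_node_id, soft=False),
).remote()
ray.get(actor.__ray_ready__.remote())
```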
Looks like there is a misunderstanding. What I want to test is basically:

```python
worker_wrapper_kwargs = self._get_worker_wrapper_args()
for bundle_id, bundle in enumerate(placement_group.bundle_specs):
    if not bundle.get("GPU", 0):
        continue
    scheduling_strategy = PlacementGroupSchedulingStrategy(
        placement_group=placement_group,
        placement_group_capture_child_tasks=True,
        placement_group_bundle_index=bundle_id,
    )
    worker = ray.remote(
        num_cpus=0,
        num_gpus=num_gpus,
        scheduling_strategy=scheduling_strategy,
        **ray_remote_kwargs,
    )(RayWorkerWrapper).remote(**worker_wrapper_kwargs)
```

When we create actors (which need 2 GPUs) within the placement group (which has 4 GPUs in 2 nodes), we can make sure the actors are scheduled on the head node.
discussed offline. we want to make sure all GPUs on the current node are used before other nodes are used. Will modify code and tests accordingly. sorry for the delay
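One possible shape of that change (an assumption for illustration, not the merged implementation): order the placement group bundles so that bundles carrying the current node's `node:<ip>` resource are assigned to workers first.

```python
# Hedged sketch: prefer bundles that live on the current node when assigning
# workers, so local GPUs are exhausted before remote ones. Assumes the bundles
# were created with a `node:<ip>` resource, as in the earlier snippet.
import ray

def bundle_indices_local_first(placement_group) -> list:
    current_ip = ray.util.get_node_ip_address()
    local_key = f"node:{current_ip}"

    def sort_key(item):
        _, bundle = item
        # Bundles tagged with this node's resource come first.
        return 0 if local_key in bundle else 1

    indexed = sorted(enumerate(placement_group.bundle_specs), key=sort_key)
    return [idx for idx, _ in indexed]
```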
Co-authored-by: youkaichao <youkaichao@126.com>
These changes have unfortunately broken Ray 2.9 compatibility. Does this mean 2.9 is no longer supported, or should we conditionally import the correct class/function instead?
Hmm, I guess this has never been a stable API, and that's probably the issue. I think we can conditionally import this. @darthhexx would you like to tackle this real quick?
Yip, on it.
Prepped a PR to conditionally import, but as mentioned in the PR description we also need the fastapi bump in #8212 to be pinned to match Ray 2.9.x to keep Ray 2.9 compatibility.
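A minimal sketch of the conditional-import pattern being discussed; `SomeMovedClass` is a hypothetical placeholder for whichever symbol changed between Ray releases, not the actual name.

```python
# Sketch only: `SomeMovedClass` is a hypothetical placeholder, not a real Ray
# symbol; the point is the try/except fallback for older Ray versions.
try:
    from ray.util.scheduling_strategies import SomeMovedClass  # newer Ray
    HAS_NEW_API = True
except ImportError:
    SomeMovedClass = None  # Ray 2.9.x: fall back to the legacy code path
    HAS_NEW_API = False
```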
Closes #6956
Closes #2406
Also added some tests
PR Checklist
Thank you for your contribution to vLLM! Before submitting the pull request, please ensure the PR meets the following criteria. This helps vLLM maintain the code quality and improve the efficiency of the review process.
PR Title and Classification
Only specific types of PRs will be reviewed. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:
- [Bugfix] for bug fixes.
- [CI/Build] for build or continuous integration improvements.
- [Doc] for documentation fixes and improvements.
- [Model] for adding a new model or improving an existing model. Model name should appear in the title.
- [Frontend] for changes on the vLLM frontend (e.g., OpenAI API server, LLM class, etc.)
- [Kernel] for changes affecting CUDA kernels or other compute kernels.
- [Core] for changes in the core vLLM logic (e.g., LLMEngine, AsyncLLMEngine, Scheduler, etc.)
- [Hardware][Vendor] for hardware-specific changes. Vendor name should appear in the prefix (e.g., [Hardware][AMD]).
- [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.
Code Quality
The PR needs to meet the following code quality standards:

- Use format.sh to format your code.
- Add documentation to docs/source/ if the PR modifies the user-facing behaviors of vLLM. It helps vLLM users understand and utilize the new features or changes.

Notes for Large Changes
Please keep the changes as concise as possible. For major architectural changes (>500 LOC excluding kernel/data/config/test), we would expect a GitHub issue (RFC) discussing the technical design and justification. Otherwise, we will tag it with rfc-required and might not go through the PR.

What to Expect for the Reviews
The goal of the vLLM team is to be a transparent reviewing machine. We would like to make the review process transparent and efficient and make sure no contributor feels confused or frustrated. However, the vLLM team is small, so we need to prioritize some PRs over others. Here is what you can expect from the review process:

- The reviewer will put an action-required label on the PR if there are changes required. The contributor should address the comments and ping the reviewer to re-review the PR.

Thank You
Finally, thank you for taking the time to read these guidelines and for your interest in contributing to vLLM. Your contributions make vLLM a great tool for everyone!