[distributed] add function to create ipc buffers directly #10064
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
The test failure is because we don't have A100 GPUs in our CI queue.
@youkaichao I can confirm this works on my machine.
world_size = dist.get_world_size(group=group)
rank = dist.get_rank(group=group)
handles = [None] * world_size
dist.all_gather_object(handles, handle, group=group)
Why do we need to use broadcast with `device=cpu` in `_gather_ipc_meta`, but not here?
When you use this function, the `group` argument should be the `cpu_group` passed to the custom allreduce object.
See vllm/vllm/distributed/parallel_state.py, lines 231 to 236 at 4089985:
if use_custom_allreduce and self.world_size > 1:
    # Initialize a custom fast all-reduce implementation.
    self.ca_comm = CustomAllreduce(
        group=self.cpu_group,
        device=self.device,
    )
Why is `all_gather` fine here but not in `_gather_ipc_meta`?
Oh, that is because we ran into some issues with `all_gather` on tensors directly. Here we are using `all_gather_object`, so it should be fine. See pytorch/pytorch#126032 for the PyTorch issue.
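For reference, a minimal standalone sketch of the pattern discussed above (assumed setup, not vLLM's actual wiring): gathering a picklable per-rank payload, such as raw IPC handle bytes, over a CPU (gloo) group with `all_gather_object`, which sidesteps the tensor `all_gather` issue linked above.

```python
# Minimal sketch, assuming a gloo (CPU) process group initialized via env://
# (e.g. launched with torchrun). The payload is a placeholder for IPC handle bytes.
import torch.distributed as dist

dist.init_process_group(backend="gloo")
world_size = dist.get_world_size()
rank = dist.get_rank()

handle = f"handle-bytes-from-rank-{rank}".encode()  # placeholder payload
handles = [None] * world_size
# all_gather_object pickles arbitrary Python objects, so no pre-allocated,
# equally sized GPU tensors are needed (unlike tensor all_gather).
dist.all_gather_object(handles, handle)
assert len(handles) == world_size
```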
I see. Someone refactored this to return a Tensor:
https://github.com/vllm-project/vllm/pull/5047/files#diff-44d9d733ee604800cbce9858a9201db1044aeff2c940fa4a0521d0c9b6541b3eL137
A better approach would be to return a string, if the torch bindings don't support int8 directly.
Yeah, a string should be fine. #5047 aims to get rid of pybind11 so that we can release Python-version-agnostic wheels.
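A hedged sketch of what the string route could look like (hypothetical helpers, not the binding added in this PR): hex-encode the 64-byte CUDA IPC handle so it can cross an interface that only accepts strings, then decode it back to raw bytes on the receiving side.

```python
# Hypothetical helpers for passing a cudaIpcMemHandle_t's raw bytes as a string.
def handle_to_str(handle: bytes) -> str:
    return handle.hex()

def str_to_handle(s: str) -> bytes:
    return bytes.fromhex(s)

# Round trip: 64 is sizeof(cudaIpcMemHandle_t) in the CUDA runtime API.
raw = bytes(64)
assert str_to_handle(handle_to_str(raw)) == raw
```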
…lm-project#10064)" This reverts commit 4be3a45.
…ct#10064) Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Loc Huynh <jc1da.3011@gmail.com>
…ct#10064) Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Sumit Dubey <sumit.dubey2@ibm.com>
…ct#10064) Signed-off-by: youkaichao <youkaichao@gmail.com>
…ct#10064) Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
…ct#10064) Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
…ct#10064) Signed-off-by: youkaichao <youkaichao@gmail.com>
PyTorch's IPC handle format can change, and using PyTorch for CUDA IPC means we are exposed to those changes. See #9815 for an example.
cc @hanzhi713
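To illustrate what "creating IPC buffers directly" means without going through PyTorch's allocator, here is a hedged sketch (assuming ctypes access to the CUDA runtime; not this PR's actual implementation): allocate device memory with `cudaMalloc` and export its IPC handle with `cudaIpcGetMemHandle`, yielding 64 raw bytes that do not depend on PyTorch's handle format and can be exchanged with `all_gather_object`.

```python
import ctypes

# Assumes the CUDA runtime library is on the loader path.
cudart = ctypes.CDLL("libcudart.so")

CUDA_IPC_HANDLE_SIZE = 64  # sizeof(cudaIpcMemHandle_t)

class CudaIpcMemHandle(ctypes.Structure):
    _fields_ = [("reserved", ctypes.c_byte * CUDA_IPC_HANDLE_SIZE)]

def create_ipc_buffer(size_in_bytes: int):
    """Allocate device memory and return (device pointer, raw IPC handle bytes)."""
    ptr = ctypes.c_void_p()
    err = cudart.cudaMalloc(ctypes.byref(ptr), ctypes.c_size_t(size_in_bytes))
    assert err == 0, f"cudaMalloc failed: {err}"
    handle = CudaIpcMemHandle()
    err = cudart.cudaIpcGetMemHandle(ctypes.byref(handle), ptr)
    assert err == 0, f"cudaIpcGetMemHandle failed: {err}"
    # The 64 handle bytes are picklable and independent of PyTorch's IPC format.
    return ptr, bytes(handle)
```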