[core][distributed] add zmq fallback for broadcasting large objects #6183
Conversation
if self.n_local_reader > 0:
    if len(serialized_obj) >= self.buffer.max_chunk_bytes:
        with self.acquire_write() as buf:
            buf[0] = 1  # overflow
        self.local_socket.send(serialized_obj)
    else:
        with self.acquire_write() as buf:
            buf[0] = 0  # not overflow
            buf[1:len(serialized_obj) + 1] = serialized_obj
if self.n_remote_reader > 0:
    self.remote_socket.send(serialized_obj)
This is the most critical part. Previously, when the object size was too large (e.g. large image data), vLLM would error out. Now, we fall back to zmq.
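Since this fallback is the key mechanism of the PR, here is a minimal, self-contained sketch of the idea (not the actual vLLM code): a plain bytearray stands in for the shared-memory chunk, and an in-process zmq PAIR socket stands in for the fallback channel. All names, sizes, and addresses are illustrative.

import pickle
import zmq

MAX_CHUNK_BYTES = 1024            # stand-in for buffer.max_chunk_bytes
buf = bytearray(MAX_CHUNK_BYTES)  # stand-in for one shared-memory chunk

ctx = zmq.Context()
writer = ctx.socket(zmq.PAIR)
writer.bind("inproc://fallback")
reader = ctx.socket(zmq.PAIR)
reader.connect("inproc://fallback")

def broadcast(obj):
    data = pickle.dumps(obj)
    if len(data) >= MAX_CHUNK_BYTES:
        buf[0] = 1                # overflow flag: ship payload over zmq
        writer.send(data)
    else:
        buf[0] = 0                # payload fits into the chunk
        buf[1:len(data) + 1] = data

def receive():
    if buf[0] == 1:
        return pickle.loads(reader.recv())   # large object: read from zmq
    return pickle.loads(bytes(buf[1:]))      # small object: read from the chunk

broadcast({"image": b"x" * 4096})  # larger than the chunk, so the zmq path is taken
print(receive())

A single overflow byte at the start of the chunk is enough for readers to decide which channel to read from, so the fast shared-memory path is untouched for small objects.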
This idea sounds good, but I don't have much experience with cross-device communication, so I'll leave the review to someone who is more qualified.
I took a very brief look and the code in general LGTM. I didn't check the internal logic of the message-queue-based broadcast, but the interface and code changes look good. Please let me know whether I need to look into any part more carefully.
@youkaichao
@WoosukKwon thanks for testing! I think it works because I added the dependency in
[core][distributed] add zmq fallback for broadcasting large objects (vllm-project#6183) (cherry picked from commit da78cae)
[core][distributed] add zmq fallback for broadcasting large objects (vllm-project#6183)
[core][distributed] add zmq fallback for broadcasting large objects (vllm-project#6183) Signed-off-by: Alvant <alvasian@yandex.ru>
The input to vision language models contains images, which have variable length and can be quite large.
While the shared-memory broadcast introduced in #5399 works fine for LLMs, we later found that we often need to adjust the buffer size for vision language models.
Estimating an upper bound on the size can be difficult. To solve this problem, this PR adds a fallback path using zeromq: objects that do not fit in the shared-memory chunk are sent over a zmq socket instead.
In addition, shared-memory broadcast is limited to a single node, while zeromq (being socket-based) is not, so this PR also extends the broadcast to work in cross-node settings, as sketched below.
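To illustrate why a socket-based transport can cross node boundaries while shared memory cannot, here is a hedged sketch of a zmq PUB/SUB broadcast over TCP. The address, port, payload, and readiness handling are placeholders, not vLLM's actual configuration; real code would replace the sleep with a proper handshake.

import pickle
import time
import zmq

ctx = zmq.Context()

# Writer (driver) side: bind a PUB socket that any node can connect to.
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:5555")                 # port is a placeholder

# Reader side: in practice this runs on a different machine.
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://localhost:5555")      # replace with the driver's address
sub.setsockopt(zmq.SUBSCRIBE, b"")       # subscribe to all messages

# PUB/SUB has a "slow joiner" problem; wait briefly so the subscription
# is registered before the first send.
time.sleep(0.5)

pub.send(pickle.dumps({"large": b"x" * (1 << 20)}))  # ~1 MiB payload
obj = pickle.loads(sub.recv())           # blocks until the message arrives
print(len(obj["large"]))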
cc @DarkLight1337 @ywang96 for the vision-language-model-related parts.