[core][distributed] add zmq fallback for broadcasting large objects #6183
Conversation
if self.n_local_reader > 0:
    if len(serialized_obj) >= self.buffer.max_chunk_bytes:
        with self.acquire_write() as buf:
            buf[0] = 1  # overflow
        self.local_socket.send(serialized_obj)
    else:
        with self.acquire_write() as buf:
            buf[0] = 0  # not overflow
            buf[1:len(serialized_obj) + 1] = serialized_obj
if self.n_remote_reader > 0:
    self.remote_socket.send(serialized_obj)
This is the most critical part. Previously, when the object size was too large (e.g. large image data), vLLM would error out. Now, we fall back to zmq.
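Since this fallback is the key mechanism of the PR, here is a minimal, self-contained sketch of the idea (not the actual vLLM code): a plain bytearray stands in for the shared-memory chunk, and an in-process zmq PAIR socket stands in for the fallback channel. All names, sizes, and addresses are illustrative.

import pickle
import zmq

MAX_CHUNK_BYTES = 1024            # stand-in for buffer.max_chunk_bytes
buf = bytearray(MAX_CHUNK_BYTES)  # stand-in for one shared-memory chunk

ctx = zmq.Context()
writer = ctx.socket(zmq.PAIR)
writer.bind("inproc://fallback")
reader = ctx.socket(zmq.PAIR)
reader.connect("inproc://fallback")

def broadcast(obj):
    data = pickle.dumps(obj)
    if len(data) >= MAX_CHUNK_BYTES:
        buf[0] = 1                # overflow flag: ship payload over zmq
        writer.send(data)
    else:
        buf[0] = 0                # payload fits into the chunk
        buf[1:len(data) + 1] = data

def receive():
    if buf[0] == 1:
        return pickle.loads(reader.recv())   # large object: read from zmq
    return pickle.loads(bytes(buf[1:]))      # small object: read from the chunk

broadcast({"image": b"x" * 4096})  # larger than the chunk, so the zmq path is taken
print(receive())

A single overflow byte at the start of the chunk is enough for readers to decide which channel to read from, so the fast shared-memory path is untouched for small objects.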
This idea sounds good, but I don't have much experience with cross-device communication, so I'll leave the review to someone who is more qualified.
I took a very brief look and the code in general LGTM. I didn't check the internal logic of the message-queue-based broadcast, but the interface and code changes look good. Please let me know whether I need to look into any part more carefully.
@youkaichao
@WoosukKwon thanks for testing! I think it works because I added the dependency in
[core][distributed] add zmq fallback for broadcasting large objects (vllm-project#6183) (cherry picked from commit da78cae)
[core][distributed] add zmq fallback for broadcasting large objects (vllm-project#6183)
[core][distributed] add zmq fallback for broadcasting large objects (vllm-project#6183) Signed-off-by: Alvant <alvasian@yandex.ru>
The input to vision language models contains images, which have variable length and can be quite large.
While the shared-memory broadcast introduced in #5399 works fine for LLMs, we later found that we often need to adjust the buffer size for vision language models.
Estimating an upper bound on the size can be difficult. To solve this problem, this PR adds a fallback path using zeromq: objects that do not fit in the shared-memory chunk are sent over a zmq socket instead.
In addition, shared-memory broadcast is limited to a single node, while zeromq (being socket-based) is not, so this PR also extends the broadcast to work in cross-node settings, as sketched below.
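To illustrate why a socket-based transport can cross node boundaries while shared memory cannot, here is a hedged sketch of a zmq PUB/SUB broadcast over TCP. The address, port, payload, and readiness handling are placeholders, not vLLM's actual configuration; real code would replace the sleep with a proper handshake.

import pickle
import time
import zmq

ctx = zmq.Context()

# Writer (driver) side: bind a PUB socket that any node can connect to.
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:5555")                 # port is a placeholder

# Reader side: in practice this runs on a different machine.
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://localhost:5555")      # replace with the driver's address
sub.setsockopt(zmq.SUBSCRIBE, b"")       # subscribe to all messages

# PUB/SUB has a "slow joiner" problem; wait briefly so the subscription
# is registered before the first send.
time.sleep(0.5)

pub.send(pickle.dumps({"large": b"x" * (1 << 20)}))  # ~1 MiB payload
obj = pickle.loads(sub.recv())           # blocks until the message arrives
print(len(obj["large"]))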
cc @DarkLight1337 @ywang96 for the vision-language-model-related parts.