[Ray Integration] Integrate vllm with experimental accelerated DAG API #2201

Closed
rkooo567 wants to merge 17 commits from the pathway-integration branch

Conversation

@rkooo567 (Collaborator) commented Dec 19, 2023

Hi, we have an experimental accelerated DAG developed based on REP ray-project/enhancements#48.

TL;DR: we are implementing a compiled DAG API that can reduce the control plane overhead of Ray (see _compiled_dag_init_dag for details). Our microbenchmark shows a 35x reduction in control plane overhead for scatter-gather workloads, which is how vllm implements tensor parallelism, i.e., sending a single input to N actors and gathering the results from all of them.
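
For readers unfamiliar with the API, here is a minimal sketch of that scatter-gather pattern, assuming the nightly ray.dag interface from around this time (InputNode, MultiOutputNode, .bind(), experimental_compile()); the TPWorker actor and its forward method are hypothetical stand-ins for vllm's Ray workers, and the channel begin_read()/end_read() calls reflect the experimental interface of that era.

```python
import ray
from ray.dag import InputNode, MultiOutputNode

@ray.remote
class TPWorker:
    """Hypothetical stand-in for one tensor-parallel vllm worker."""
    def forward(self, inputs):
        return f"shard output for {inputs!r}"

ray.init()
workers = [TPWorker.remote() for _ in range(4)]

# Build the scatter-gather DAG once: a single input fanned out to N actors,
# with all N outputs gathered back on the driver.
with InputNode() as inp:
    dag = MultiOutputNode([w.forward.bind(inp) for w in workers])

# Compiling the DAG ahead of time is what removes the per-step control plane
# overhead of issuing N separate .remote() calls and ray.get()s every step.
compiled_dag = dag.experimental_compile()

# Per-step execution reuses the compiled DAG; with the experimental channel
# interface of this era, outputs were read via begin_read()/end_read().
channels = compiled_dag.execute("batch-0")
outputs = [chan.begin_read() for chan in channels]
for chan in channels:
    chan.end_read()
print(outputs)
```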

PR summary

  • Support a path to run compiled DAG.
  • Use pickle for serialization instead of Ray's default serializer (cloudpickle). pickle is much cheaper for serializing the inputs vllm sends each step; cloudpickle is mainly useful when you have large data that can be zero-copied (see the sketch below).
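
As a concrete (hypothetical) illustration of the serialization point: the driver pickles the small per-step control structure with the stdlib pickle module and the worker unpickles it, rather than routing it through Ray's default cloudpickle-based serializer. The payload shape and function names below are made up for the example.

```python
import pickle

# Driver side: the per-step input vllm broadcasts to workers is small control
# metadata, so stdlib pickle is cheaper than cloudpickle here. (cloudpickle's
# zero-copy path mainly pays off for large buffers, which this is not.)
step_input = {"prompt_token_ids": [[1, 2, 3]], "blocks_to_swap_in": {}}  # hypothetical payload
payload = pickle.dumps(step_input)

# Worker side: deserialize the bytes received over the compiled DAG channel
# before running the model step.
def run_step(serialized_input: bytes):
    step_input = pickle.loads(serialized_input)
    # ... the real worker would execute the model with step_input here ...
    return pickle.dumps(step_input)  # results go back as plain pickle bytes too

print(pickle.loads(run_step(payload)))
```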

Benchmark

The experimental feature can be used with nightly Ray for evaluation. I ran the Llama 2 7B latency benchmark on g5.12xlarge (4 A10 GPUs) with TP=4 and 10 iterations:

python benchmark_latency.py --use-ray-compiled-dag --tensor-parallel-size 4 --num-iters 10 --model "meta-llama/Llama-2-7b-hf"

and got the following results:

TP=4, compiled DAG (pathway): 1.7389498465017823
TP=4, default: 2.0027053648009314

which is about a 13% improvement ((2.0027 - 1.7389) / 2.0027 ≈ 13.2%).

Limitations

Note that the current nightly implementation has several limitations. We are working on follow-ups, and the feature could be enabled by default once they are implemented.

Followup

I am planning a follow-up PR for these two features after https://github.com/ray-project/ray/pull/41943/files is merged. The ETA is 12/26 (when I am back from OOO). Please let me know if the async engine is a high-priority item for evaluation.

  • Better error handling
  • Support async engine

@rkooo567 rkooo567 closed this Dec 19, 2023
@rkooo567 rkooo567 reopened this Dec 19, 2023
@rkooo567 rkooo567 force-pushed the pathway-integration branch from 560c80f to d0721ac Compare December 19, 2023 12:31
@rkooo567 rkooo567 closed this Dec 19, 2023
@rkooo567 rkooo567 reopened this Dec 19, 2023
@rkooo567 rkooo567 changed the title todo [Ray Integration] Integrate vllm with experimental accelerated DAG Dec 19, 2023
@rkooo567 rkooo567 changed the title [Ray Integration] Integrate vllm with experimental accelerated DAG [Ray Integration] Integrate vllm with experimental accelerated DAG API Dec 19, 2023
@rkooo567 (Collaborator, Author) commented Dec 19, 2023

Q: Please let me know where I should add tests. I was thinking of adding a flag to test_models.py, but it seems like it doesn't really test the Ray config (where TP > 1).

@njhill (Member) commented Jan 13, 2024

Presumably this is obsolete now that #2221 is merged?

@rkooo567 (Collaborator, Author) commented:

@njhill we decided to contribute the feature disabled by default. We will productionize it internally within Anyscale and consider re-enabling it in the future.

@rkooo567 (Collaborator, Author) commented:

Decided to close this in favor of #2471.

@rkooo567 rkooo567 closed this Jan 18, 2024