[Misc] Add tqdm progress bar during graph capture #11349

mgoin · 2024-12-19T22:59:27Z

Nice idea inspired by sgl-project/sglang#2502

Example of log:

vllm serve meta-llama/Llama-3.2-1B-Instruct
 ...
INFO 12-19 22:58:27 model_runner.py:1415] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing `gpu_memory_utilization` or switching to eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
Capturing CUDA graph shapes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:13<00:00,  2.51it/s]
INFO 12-19 22:58:41 model_runner.py:1535] Graph capturing finished in 14 secs, took 0.21 GiB

vllm serve meta-llama/Llama-3.2-1B-Instruct -tp 2
 ...
INFO 12-19 22:57:03 model_runner.py:1415] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing `gpu_memory_utilization` or switching to eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
Capturing CUDA graph shapes:  80%|██████████████████████████████████████████████████████████████████████████████████████████████▍                       | 28/35 [00:13<00:02,  2.64it/s](VllmWorkerProcess pid=2099548) INFO 12-19 22:57:17 custom_all_reduce.py:224] Registering 1155 cuda graph addresses
Capturing CUDA graph shapes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:15<00:00,  2.20it/s]
INFO 12-19 22:57:19 custom_all_reduce.py:224] Registering 1155 cuda graph addresses
INFO 12-19 22:57:19 model_runner.py:1535] Graph capturing finished in 16 secs, took 0.14 GiB
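
For readers who want to see the pattern behind the logs above, here is a minimal sketch of wrapping a capture loop in a tqdm progress bar. It is not the code from this PR: `capture_with_progress`, `capture_fn`, `show_progress`, and the batch sizes are hypothetical names and values chosen for illustration. Only the bar description "Capturing CUDA graph shapes" is taken from the log output. The `show_progress` switch is an assumption prompted by the `-tp 2` run, which displays a single bar even though more than one worker process is involved.

```python
try:
    from tqdm import tqdm
except ImportError:
    # Fallback so the sketch still runs where tqdm is not installed:
    # the loop behaves the same, just without a progress bar.
    def tqdm(iterable, **kwargs):
        return iterable


def capture_with_progress(batch_sizes, capture_fn, show_progress=True):
    """Call capture_fn once per batch size, optionally showing a tqdm bar.

    capture_fn is a hypothetical stand-in for whatever captures one graph.
    Returns the number of shapes processed.
    """
    sizes = list(batch_sizes)
    # Wrap the iterable only when a bar is wanted (for example, on one
    # process only), so other processes do not print duplicate bars.
    iterator = (
        tqdm(sizes, desc="Capturing CUDA graph shapes")
        if show_progress
        else sizes
    )
    for batch_size in iterator:
        capture_fn(batch_size)
    return len(sizes)


# Hypothetical usage: record the sizes instead of capturing real graphs.
captured = []
count = capture_with_progress([8, 4, 2, 1], captured.append)
```

Because tqdm wraps the iterable rather than the loop body, the capture logic itself does not need to change. Log lines printed by other processes can still interleave with the bar, as in the `-tp 2` example.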

Signed-off-by: mgoin <michael@neuralmagic.com>

github-actions · 2024-12-19T22:59:39Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

- Add the ready label to the PR
- Enable auto-merge

🚀

jeejeelee

It's very useful! thanks

INFO 12-20 01:21:29 model_runner.py:1415] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing `gpu_memory_utilization` or switching to eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
Capturing CUDA graph shapes: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:06<00:00,  1.32s/it]
INFO 12-20 01:21:36 model_runner.py:1535] Graph capturing finished in 7 secs, took 0.09 GiB

Add tqdm progress bar during graph capture

294cb7b

Signed-off-by: mgoin <michael@neuralmagic.com>

jeejeelee approved these changes Dec 20, 2024

DarkLight1337 enabled auto-merge (squash) December 20, 2024 03:09

github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Dec 20, 2024

DarkLight1337 merged commit b880ffb into vllm-project:main Dec 20, 2024
68 checks passed

lucas-tucker pushed a commit to lucas-tucker/vllm-lucas-tucker that referenced this pull request Dec 21, 2024

[Misc] Add tqdm progress bar during graph capture (vllm-project#11349)

547fafb

Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: lucast2021 <lucast2021@headroyce.org>

BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024

[Misc] Add tqdm progress bar during graph capture (vllm-project#11349)

59a6558

Signed-off-by: mgoin <michael@neuralmagic.com>

joennlae pushed a commit to 44ai-labs/vllm that referenced this pull request Jan 19, 2025

[Misc] Add tqdm progress bar during graph capture (vllm-project#11349)

4d9d054

Signed-off-by: mgoin <michael@neuralmagic.com>

joennlae pushed a commit to 44ai-labs/vllm that referenced this pull request Jan 19, 2025

[Misc] Add tqdm progress bar during graph capture (vllm-project#11349)

5233bef

Signed-off-by: mgoin <michael@neuralmagic.com>

abmfy pushed a commit to abmfy/vllm-flashinfer that referenced this pull request Jan 24, 2025

[Misc] Add tqdm progress bar during graph capture (vllm-project#11349)

3d64a4c

Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: Bowen Wang <abmfy@icloud.com>

abmfy pushed a commit to abmfy/vllm-flashinfer that referenced this pull request Jan 24, 2025

[Misc] Add tqdm progress bar during graph capture (vllm-project#11349)

9918253

Signed-off-by: mgoin <michael@neuralmagic.com>
