
Fix error message on TORCH_CUDA_ARCH_LIST #1239

Merged: 2 commits into main on Oct 14, 2023

Conversation

@WoosukKwon (Collaborator)

Fixes #1225

This PR fixes the error message when the TORCH_CUDA_ARCH_LIST includes an unsupported CUDA architecture.
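
For context, here is a minimal sketch of the kind of check involved, assuming a setup.py-style validation step; the function name, supported-architecture set, and message text below are illustrative assumptions, not vLLM's actual code:

```python
import os

# Hypothetical supported set for illustration; not vLLM's actual list.
SUPPORTED_ARCHS = {"7.0", "7.5", "8.0", "8.6", "8.9", "9.0"}

def validate_arch_list() -> None:
    """Raise a clear error naming any unsupported entry in TORCH_CUDA_ARCH_LIST."""
    arch_list = os.environ.get("TORCH_CUDA_ARCH_LIST")
    if not arch_list:
        return  # unset: build targets are determined elsewhere
    # PyTorch accepts space- or semicolon-separated entries, e.g. "8.0;8.6+PTX".
    for arch in arch_list.replace(";", " ").split():
        base = arch.removesuffix("+PTX")  # the "+PTX" suffix is a valid variant
        if base not in SUPPORTED_ARCHS:
            raise RuntimeError(
                f"Unsupported CUDA architecture {base!r} in TORCH_CUDA_ARCH_LIST. "
                f"Supported architectures: {sorted(SUPPORTED_ARCHS)}.")
```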

@yunfeng-scale (Contributor)

What about #1280?

@zhuohan123 (Member)

@WoosukKwon Can you take a look at #1280 and check which is a better fix?

WoosukKwon mentioned this pull request on Oct 13, 2023.
@WoosukKwon (Collaborator, Author)

@zhuohan123 It depends on whether we want to raise an error when the TORCH_CUDA_ARCH_LIST env variable includes a CUDA architecture that vLLM does not support. I feel raising the error is safer, but it might bother some users since some docker images (like NVIDIA PyTorch docker) already include TORCH_CUDA_ARCH_LIST="5.2, ...". WDYT?

@zhuohan123 (Member)

> @zhuohan123 It depends on whether we want to raise an error when the TORCH_CUDA_ARCH_LIST env variable includes a CUDA architecture that vLLM does not support. I feel raising the error is safer, but it might bother some users since some docker images (like NVIDIA PyTorch docker) already include TORCH_CUDA_ARCH_LIST="5.2, ...". WDYT?

Let's print a warning then. Otherwise, I think vLLM will block people from using NVIDIA Docker on newer hardware, and this is bad.
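
A minimal sketch of the warn-and-filter behavior being proposed (the function name, minimum capability, and message wording are assumptions here; see #1280 for the actual change):

```python
import warnings

MIN_CAPABILITY = 7.0  # assumed minimum; entries like "5.2" would be filtered out

def filter_arch_list(arch_list: str) -> str:
    """Drop unsupported entries from TORCH_CUDA_ARCH_LIST, warning for each."""
    kept = []
    # Tolerate space-, semicolon-, or comma-separated lists.
    for arch in arch_list.replace(";", " ").replace(",", " ").split():
        if float(arch.removesuffix("+PTX")) < MIN_CAPABILITY:
            warnings.warn(
                f"Ignoring unsupported CUDA architecture {arch!r} in "
                f"TORCH_CUDA_ARCH_LIST (minimum supported: {MIN_CAPABILITY}).")
        else:
            kept.append(arch)
    return " ".join(kept)

# Example: filter_arch_list("5.2 7.0 8.0+PTX") returns "7.0 8.0+PTX"
# and emits a warning for the "5.2" entry.
```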

irfansharif added a commit to modal-labs/modal-examples that referenced this pull request Oct 13, 2023
Fixes #463. PyTorch 2.1.0 (https://github.com/pytorch/pytorch/releases/tag/v2.1.0)
was released just last week, and it is built using CUDA 12.1. The
image we're using uses CUDA 11.8, as recommended by vLLM. Previously
vLLM specified a dependency on torch>=2.0.0 and picked up this 2.1.0
version. That was pinned back to 2.0.1 in
vllm-project/vllm#1290. When picking up that SHA,
however, we ran into what vllm-project/vllm#1239
fixes. So for now, we point to a temporary fork with that fix.
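
For reference, a sketch of what such a pin can look like in a Python packaging file (illustrative only; the real change lives in vllm-project/vllm#1290):

```python
# Illustrative setup.py-style dependency list; not vLLM's actual file.
install_requires = [
    "torch == 2.0.1",  # pinned from torch>=2.0.0 so pip doesn't resolve to the
                       # CUDA 12.1-built 2.1.0 wheels on a CUDA 11.8 image
]
```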
@zhuohan123 (Member) left a comment

LGTM! Thanks for the fix!

irfansharif added a commit to modal-labs/modal-examples that referenced this pull request Oct 14, 2023
WoosukKwon merged commit d0740df into main on Oct 14, 2023 (2 checks passed).
WoosukKwon deleted the fix-setup branch on October 14, 2023 at 21:48.
@WoosukKwon (Collaborator, Author)

@yunfeng-scale Thanks again for the proposal! I fixed the PR as you suggested in #1280.

gongy pushed a commit to modal-labs/modal-examples that referenced this pull request Jan 5, 2024
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
Co-authored-by: Yunfeng Bai <yunfeng.bai@scale.com>
sjchoi1 pushed a commit to casys-kaist-internal/vllm that referenced this pull request May 7, 2024
Co-authored-by: Yunfeng Bai <yunfeng.bai@scale.com>
Labels: none yet
Projects: none yet
Linked issues (may be closed by merging this pull request): Installation Error
4 participants