
Fix error message on TORCH_CUDA_ARCH_LIST #1239

Merged: 2 commits into main on Oct 14, 2023

Conversation

@WoosukKwon (Collaborator)

Fixes #1225

This PR fixes the error message when the TORCH_CUDA_ARCH_LIST includes an unsupported CUDA architecture.
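
For context, here is a minimal sketch of the kind of check involved, assuming a setup.py-style validation step; the function name, supported-architecture set, and message text below are illustrative assumptions, not vLLM's actual code:

```python
import os

# Hypothetical supported set for illustration; not vLLM's actual list.
SUPPORTED_ARCHS = {"7.0", "7.5", "8.0", "8.6", "8.9", "9.0"}

def validate_arch_list() -> None:
    """Raise a clear error naming any unsupported entry in TORCH_CUDA_ARCH_LIST."""
    arch_list = os.environ.get("TORCH_CUDA_ARCH_LIST")
    if not arch_list:
        return  # unset: build targets are determined elsewhere
    # PyTorch accepts space- or semicolon-separated entries, e.g. "8.0;8.6+PTX".
    for arch in arch_list.replace(";", " ").split():
        base = arch.removesuffix("+PTX")  # the "+PTX" suffix is a valid variant
        if base not in SUPPORTED_ARCHS:
            raise RuntimeError(
                f"Unsupported CUDA architecture {base!r} in TORCH_CUDA_ARCH_LIST. "
                f"Supported architectures: {sorted(SUPPORTED_ARCHS)}.")
```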

@yunfeng-scale (Contributor)

What about #1280?

@zhuohan123 (Member)

@WoosukKwon Can you take a look at #1280 and check which is a better fix?

WoosukKwon mentioned this pull request on Oct 13, 2023.
@WoosukKwon (Collaborator, Author)

@zhuohan123 It depends on whether we want to raise an error when the TORCH_CUDA_ARCH_LIST env variable includes a CUDA architecture that vLLM does not support. I feel raising the error is safer, but it might bother some users since some docker images (like NVIDIA PyTorch docker) already include TORCH_CUDA_ARCH_LIST="5.2, ...". WDYT?

@zhuohan123 (Member)

> @zhuohan123 It depends on whether we want to raise an error when the TORCH_CUDA_ARCH_LIST env variable includes a CUDA architecture that vLLM does not support. I feel raising the error is safer, but it might bother some users since some docker images (like NVIDIA PyTorch docker) already include TORCH_CUDA_ARCH_LIST="5.2, ...". WDYT?

Let's print a warning then. Otherwise, I think vLLM will block people from using NVIDIA Docker on newer hardware, and this is bad.
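
A minimal sketch of the warn-and-filter behavior being proposed (the function name, minimum capability, and message wording are assumptions here; see #1280 for the actual change):

```python
import warnings

MIN_CAPABILITY = 7.0  # assumed minimum; entries like "5.2" would be filtered out

def filter_arch_list(arch_list: str) -> str:
    """Drop unsupported entries from TORCH_CUDA_ARCH_LIST, warning for each."""
    kept = []
    # Tolerate space-, semicolon-, or comma-separated lists.
    for arch in arch_list.replace(";", " ").replace(",", " ").split():
        if float(arch.removesuffix("+PTX")) < MIN_CAPABILITY:
            warnings.warn(
                f"Ignoring unsupported CUDA architecture {arch!r} in "
                f"TORCH_CUDA_ARCH_LIST (minimum supported: {MIN_CAPABILITY}).")
        else:
            kept.append(arch)
    return " ".join(kept)

# Example: filter_arch_list("5.2 7.0 8.0+PTX") returns "7.0 8.0+PTX"
# and emits a warning for the "5.2" entry.
```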

irfansharif added a commit to modal-labs/modal-examples that referenced this pull request Oct 13, 2023
Fixes #463. PyTorch 2.1.0 (https://github.com/pytorch/pytorch/releases/tag/v2.1.0)
was released just last week, and it is built using CUDA 12.1. The
image we're using uses CUDA 11.8, as recommended by vLLM. Previously
vLLM specified a dependency on torch>=2.0.0 and picked up this 2.1.0
version. That was pinned back to 2.0.1 in
vllm-project/vllm#1290. When picking up that SHA,
however, we ran into what vllm-project/vllm#1239
fixes. So for now, we point to a temporary fork with that fix.
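
For reference, a sketch of what such a pin can look like in a Python packaging file (illustrative only; the real change lives in vllm-project/vllm#1290):

```python
# Illustrative setup.py-style dependency list; not vLLM's actual file.
install_requires = [
    "torch == 2.0.1",  # pinned from torch>=2.0.0 so pip doesn't resolve to the
                       # CUDA 12.1-built 2.1.0 wheels on a CUDA 11.8 image
]
```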
@zhuohan123 (Member) left a comment

LGTM! Thanks for the fix!

irfansharif added a commit to modal-labs/modal-examples that referenced this pull request Oct 14, 2023
WoosukKwon merged commit d0740df into main on Oct 14, 2023 (2 checks passed).
WoosukKwon deleted the fix-setup branch on October 14, 2023 at 21:48.
@WoosukKwon (Collaborator, Author)

@yunfeng-scale Thanks again for the proposal! I fixed the PR as you suggested in #1280.

gongy pushed a commit to modal-labs/modal-examples that referenced this pull request Jan 5, 2024
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
Co-authored-by: Yunfeng Bai <yunfeng.bai@scale.com>
sjchoi1 pushed a commit to casys-kaist-internal/vllm that referenced this pull request May 7, 2024
Co-authored-by: Yunfeng Bai <yunfeng.bai@scale.com>
Labels: none yet
Projects: none yet
Linked issues (may be closed by merging this pull request): Installation Error
4 participants