[ROCm] Fixup arch checks for ROCM #2627
Conversation
The ROCm stack with PyTorch supports a wide set of gfx architectures; the list can be displayed by printing the PYTORCH_ROCM_ARCH env variable. In the absence of PYTORCH_ROCM_ARCH, PyTorch uses the output of rocm_agent_enumerator to choose what to compile for. vllm supports a subset of these architectures (gfx908, gfx90a, ...). Because a single build may need to support multiple architectures at once (e.g. a docker image), it's important that vllm is compiled for all of them unless specified otherwise. We now gather either the PYTORCH_ROCM_ARCH env variable or the rocm_agent_enumerator output and cross-reference it with vllm's ROCM_SUPPORTED_ARCHS to generate the list of arches to build for.
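A minimal sketch of that selection logic, assuming a setup.py-style Python build script; the helper names and the exact contents of ROCM_SUPPORTED_ARCHS below are illustrative, not the PR's actual code:

```python
import os
import subprocess

# Architectures vllm builds for (illustrative set, not the PR's literal list).
ROCM_SUPPORTED_ARCHS = {"gfx908", "gfx90a", "gfx942"}


def get_pytorch_rocm_archs():
    """Return the set of gfx archs PyTorch targets.

    Prefer the PYTORCH_ROCM_ARCH env var (semicolon-separated);
    otherwise fall back to rocm_agent_enumerator's output.
    """
    env_arch = os.environ.get("PYTORCH_ROCM_ARCH")
    if env_arch:
        return set(env_arch.split(";"))
    out = subprocess.check_output(["rocm_agent_enumerator"], text=True)
    # rocm_agent_enumerator prints one agent per line and includes the
    # CPU agent as gfx000, which is not a GPU build target.
    return {line.strip() for line in out.splitlines()
            if line.strip() and line.strip() != "gfx000"}


def get_build_archs():
    """Cross-reference PyTorch's arch list with vllm's supported set."""
    archs = get_pytorch_rocm_archs() & ROCM_SUPPORTED_ARCHS
    if not archs:
        raise RuntimeError(
            "None of the detected ROCm architectures are supported by vllm.")
    return sorted(archs)


if __name__ == "__main__":
    print("Building vllm for:", ";".join(get_build_archs()))
```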
Hi @dllehr-amd, thanks for submitting the PR! This is a bit confusing. It seems like there are 3 environment variables involved here.
Flash-Attention currently supports the MI2xx and MI3xx architectures. Modify vllm's support matrix during build to reflect this.
Hi @WoosukKwon. I made a change to vllm's supported arches to match the ones supported by our Flash Attention build (GPU_ARCHS), as we'd be hard pressed to guarantee support on the other gfx targets. There are still the three variables you mentioned, but ROCM_SUPPORTED_ARCHS is reduced to the set that Flash Attention supports. The way this PR works now, it takes all of the architectures PyTorch was built with, cross-references them with what vllm supports, and only builds for those. As of this PR, a build without any additional input from the user will target only gfx90a and gfx942. Does that make things any clearer? Thanks!
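To illustrate the narrowing described above (a sketch, not the PR's literal diff), the supported set shrinks to the Flash-Attention targets, so an unconfigured build intersects PyTorch's arch list with just these two entries:

```python
# Narrowed to the archs ROCm Flash Attention covers (MI2xx -> gfx90a,
# MI3xx -> gfx942); values come from the discussion above, layout is a sketch.
ROCM_SUPPORTED_ARCHS = {"gfx90a", "gfx942"}
```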
LGTM! Thanks for the fix! BTW, it'd be nice if we could have a CI/CD pipeline for AMD GPUs :)
@WoosukKwon @dllehr-amd just so folks know, since this PR, the ROCm Flash Attention officially supports
I've opened #2792.