
Installing with ROCM #621

Closed
baderex opened this issue Jul 30, 2023 · 44 comments

Comments

@baderex

baderex commented Jul 30, 2023

Hello,

I'm trying to install vLLM on an AMD server, but I'm unable to build the package because CUDA is not installed. Is there any way to configure it to work with ROCm instead?

!pip install vllm

Error:
RuntimeError: Cannot find CUDA_HOME. CUDA must be available in order to build the package.

ROCm is installed and verified.

PyTorch 2.0 ROCm
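
For reference, a quick sanity check that the ROCm build of PyTorch is the one being picked up (a minimal sketch; torch.version.hip is only set on ROCm builds of torch):

# A ROCm build of PyTorch reports a HIP version and can see the AMD GPUs
python3 -c "import torch; print(torch.version.hip, torch.cuda.is_available(), torch.cuda.device_count())"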

@zhuohan123
Member

AMD GPUs are not supported yet; we just added this to our roadmap.

CC @naed90

@zhuohan123 added the enhancement (New feature or request) label and removed the feature request label · Aug 7, 2023
@ccbadd

ccbadd commented Aug 14, 2023

My server is running a pair of MI100s and I would love to try this out. Please add ROCm support.

@naed90
Contributor

naed90 commented Aug 14, 2023

My server is running a pair of MI100s and I would love to try this out. Please add ROCm support.

Hey! I'm currently planning to work on this. If you can reach out, we'd appreciate it (we have some questions): Dean.leitersdorf@gmail.com

@jamestwhedbee
Contributor

Hey @naed90, I would also love to try this out on MI100s. I am going to go ahead and shoot you an email as well.

@chymian

chymian commented Aug 22, 2023

Is it possible to get CLBlast/Vulkan support as well, for a wider range of supported cards?
There are a lot of older cards that are only partially supported, or not supported at all, by the latest ROCm drivers.

@GdRottoli

The enhancement label for AMD support was added two weeks ago. Do you have any info about the current state of this issue? Thanks!

@ehartford

Very interested in this feature

@smiraldr

@GdRottoli can we have some documentation on this feature too? I don't see any docs on how to use vLLM with MI100s.

@ehartford

I also want to use vLLM on MI100s.

@smiraldr

If anyone can point me towards the code that achieves this, I can experiment, and I'd also be willing to contribute to the docs!

@ardfork

ardfork commented Sep 29, 2023

The major blocker for a ROCm HIP port is xFormers (and flash attention). Without that, it shouldn't be that hard to hipify vLLM.

Edit: After taking a closer look, vLLM also uses a lot of inline PTX assembly, which will also be annoying to port.

@fxmarty

fxmarty commented Oct 4, 2023

The inline PTX seems to be mostly related to AWQ, right?

@pcmoritz
Collaborator

Some progress on this: #1313

I believe the flash attention part can be solved with https://github.com/ROCmSoftwarePlatform/flash-attention

If somebody has bandwidth to port the AWQ kernels, that would be very much appreciated, but it is not blocking for now :)

@sabreshao

Hi, this is Sabre from AMD. We recognize the value of vLLM and are already working on both xFormers and vLLM for ROCm. Stay tuned!

@ehartford

ehartford commented Oct 16, 2023 via email

@bennmann

I upvote the main post and have the same needs here

$ python -m pip list | grep rocm5.6
torch 2.2.0.dev20231003+rocm5.6

But when I go to install vllm, the wheel does not seem up to date enough to support this version of torch.

@tjtanaa
Contributor

tjtanaa commented Oct 27, 2023

I upvote the main post and have the same needs here

$ python -m pip list | grep rocm5.6
torch 2.2.0.dev20231003+rocm5.6

But when I go to install vllm, the wheel does not seem up to date enough to support this version of torch.

You can try my fork, which is a ROCm port of vLLM v0.1.4. You can follow the setup procedure at https://github.com/EmbeddedLLM/vllm-rocm
We have successfully run Llama-2 7B/13B/70B and Vicuna 7B/13B/33B on MI210.
We are also working on v0.2.x.
Stay tuned.

@bennmann

In addition to the good community forks, here's the llama.cpp ROCm PR for inspiration on this topic: https://github.com/ggerganov/llama.cpp/pull/1087/commits

@fxmarty

fxmarty commented Oct 30, 2023

I upvote the main post and have the same needs here:

$ python -m pip list | grep rocm5.6
torch 2.2.0.dev20231003+rocm5.6

But when I go to install vllm, the wheel does not seem up to date enough to support this version of torch.

You can try my fork, which is a ROCm port of vLLM v0.1.4. You can follow the setup procedure at https://github.com/EmbeddedLLM/vllm-rocm We have successfully run Llama-2 7B/13B/70B and Vicuna 7B/13B/33B on MI210. We are also working on v0.2.x. Stay tuned.

Also, building from #1313 went just fine for me.

Also building from #1313 went just fine for me.

@tanpinsiang

With the integration of flash attention v2, we can report that vLLM v0.2.1 on ROCm achieves a speedup of >2x for the LLaMA-70B model and >3x for LLaMA-7B/13B on MI210, compared to vLLM v0.1.4. We are trying to port AWQ and contribute to #1313.
[Figure: throughput_tokens benchmark chart]

@ccbadd

ccbadd commented Nov 5, 2023

Hi, this is Sabre from AMD. We recognize the value of vLLM and are already working on both xFormers and vLLM for ROCm. Stay tuned!

So are only MI210 and newer cards supported? If so, that's pretty worthless for most of us. I started the install a little while ago, but once I got to installing flash attention it stopped me in my tracks.

@ehartford

ehartford commented Nov 5, 2023 via email

@yourbuddyconner

FWIW in case anyone else is following this, it looks like ROCm support was ported from the vLLM fork linked in this thread and merged upstream.

Here's the docs: https://docs.vllm.ai/en/latest/getting_started/amd-installation.html

@hongxiayang
Collaborator

@ehartford etc: For folks who are interested in vllm on MI100: try this fork: https://github.com/hongxiayang/vllm/tree/navi3x_rocm6. Let me know if you run into problems.

@ehartford

@ehartford etc: For folks who are interested in vllm on MI100: try this fork: https://github.com/hongxiayang/vllm/tree/navi3x_rocm6. Let me know if you run into problems.

Is this for ROCm 6 specifically? I use ROCm 5.7 because the PyTorch nightly uses ROCm 5.7.

@hongxiayang
Collaborator

hongxiayang commented Feb 6, 2024

@ehartford etc: For folks who are interested in vllm on MI100: try this fork: https://github.com/hongxiayang/vllm/tree/navi3x_rocm6. Let me know if you run into problems.

Is this for ROCm 6 specifically? I use ROCm 5.7 because the PyTorch nightly uses ROCm 5.7.

This is how you can build and run the Docker image. You can change the parameters below, but I tested with the ROCm 6.0 docker image below, using Llama-2 7B model weights.

BASE_IMAGE="rocm/pytorch:rocm6.0_ubuntu20.04_py3.9_pytorch_2.1.1"
DockerImageName="vllm-${BASE_IMAGE}"
docker build --build-arg BASE_IMAGE="$BASE_IMAGE" --build-arg BUILD_FA="0"  -f Dockerfile.rocm -t "$DockerImageName" . 

sudo docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host \
      --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size 8G  -v $PATH_TO_MODEL_WEIGHTS:/app/model "$DockerImageName"
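
Once the container is up, something along these lines should serve the weights mounted at /app/model (a hedged sketch: the exact entrypoint module and flags depend on the vLLM version baked into the image, and the port is just an example):

# Inside the container: start the OpenAI-compatible API server on the mounted model
python -m vllm.entrypoints.openai.api_server --model /app/model --port 8000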

@hongxiayang
Collaborator

BASE_IMAGE for ROCm_5.7: "rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1"

@tjtanaa
Contributor

tjtanaa commented Feb 7, 2024

@ehartford I found that people have successfully compiled vLLM for MI100 on ROCm 5.7. The only change needed is to pass the AMD GPU architecture gfx908 when compiling flash-attention-rocm.
Reply by @TNT3530 in Discord:
vLLM release version v0.2.7, built from source using the docker ROCm instructions. You'll need to add "gfx908" to the flash attention valid arch array in setup.py so it'll compile as well.

@jamestwhedbee has also opened a simple PR that specifies gfx908 during vLLM compilation:
#2792

Kudos to them for verifying that flash-attention-rocm can also be compiled for MI100.
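
If you are not sure which gfx target your card reports, the ROCm tools on the host can tell you (a small aside, assuming rocminfo is installed; an MI100 should show gfx908):

# List the GPU architectures ROCm sees (MI100 -> gfx908)
rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u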

@jamestwhedbee
Contributor

No review on #2792 yet; I'm not sure of the best way to get more eyes on it.

@cocoderss

ROCm currently has many, many issues. Would it be possible to install vLLM using Vulkan or CLBlast, with some limited performance? This would be great specifically for AMD iGPUs (APUs).

@hmellor added the feature request label and removed the enhancement (New feature or request) label · Mar 15, 2024
@thebeline

ROCm currently has many, many issues. Would it be possible to install vLLM using Vulkan or CLBlast, with some limited performance? This would be great specifically for AMD iGPUs (APUs).

It looks like it was merged in, a good sign, no?

Does anyone have any benchmarks on this? How does it stack up against an A100? Lower, I am sure, but if it is even close...

@fxmarty

fxmarty commented Mar 19, 2024

@thebeline At least for TGI, which relies on vLLM's paged attention kernel, you can find some benchmarks here: https://huggingface.co/blog/huggingface-and-optimum-amd#production-solutions

@linchen111

@ehartford I found that people have successfully compiled vLLM for MI100 on ROCm 5.7. The only change needed is to pass the AMD GPU architecture gfx908 when compiling flash-attention-rocm. Reply by @TNT3530 in Discord: vLLM release version v0.2.7, built from source using the docker ROCm instructions. You'll need to add "gfx908" to the flash attention valid arch array in setup.py so it'll compile as well.

@jamestwhedbee has also opened a simple PR that specifies gfx908 during vLLM compilation: #2792

Kudos to them for verifying that flash-attention-rocm can also be compiled for MI100.

Hi, do you have a pre-built vLLM docker image for ROCm 5.7?

@lookfirst

lookfirst commented Jun 30, 2024

https://www.nscale.com/blog/nscale-benchmarks-amd-mi300x-gpus-with-gemm-tuning-improves-throughput-and-latency-by-up-to-7-2x

Docker Image

ROCm 6.1.2
Python 3.10.12
PyTorch 2.5.0
Triton 2.1.0
Flash Attention 2.0.4
rocBLAS 4.1.2
hipBLASlt 0.8.0
Rccl 2.18.6
vLLM 0.5.0

@linchen111

https://www.nscale.com/blog/nscale-benchmarks-amd-mi300x-gpus-with-gemm-tuning-improves-throughput-and-latency-by-up-to-7-2x

Docker Image

ROCm 6.1.2 Python 3.10.12 PyTorch 2.5.0 Triton 2.1.0 Flash Attention 2.0.4 rocBLAS 4.1.2 hipBLASlt 0.8.0 Rccl 2.18.6 vLLM 0.5.0

Thanks! pulling it now.

If I use MI100s with the following link topology:

=============================== Link Type between two GPUs ===============================
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7
GPU0 0 PCIE PCIE PCIE PCIE PCIE PCIE PCIE
GPU1 PCIE 0 PCIE PCIE PCIE PCIE PCIE PCIE
GPU2 PCIE PCIE 0 PCIE PCIE PCIE PCIE PCIE
GPU3 PCIE PCIE PCIE 0 PCIE PCIE PCIE PCIE
GPU4 PCIE PCIE PCIE PCIE 0 PCIE PCIE PCIE
GPU5 PCIE PCIE PCIE PCIE PCIE 0 PCIE PCIE
GPU6 PCIE PCIE PCIE PCIE PCIE PCIE 0 PCIE
GPU7 PCIE PCIE PCIE PCIE PCIE PCIE PCIE 0

Do I need any special settings?

@linchen111

https://www.nscale.com/blog/nscale-benchmarks-amd-mi300x-gpus-with-gemm-tuning-improves-throughput-and-latency-by-up-to-7-2x

Docker Image

ROCm 6.1.2 Python 3.10.12 PyTorch 2.5.0 Triton 2.1.0 Flash Attention 2.0.4 rocBLAS 4.1.2 hipBLASlt 0.8.0 Rccl 2.18.6 vLLM 0.5.0

And I have run into this:

INFO 06-30 11:07:12 selector.py:56] Using ROCmFlashAttention backend.
Traceback (most recent call last):
  File "/root/benchmarks/benchmark_throughput.py", line 411, in <module>
    main(args)
  File "/root/benchmarks/benchmark_throughput.py", line 223, in main
    elapsed_time = run_vllm(
  File "/root/benchmarks/benchmark_throughput.py", line 86, in run_vllm
    llm = LLM(
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 144, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 363, in from_engine_args
    engine = cls(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 223, in __init__
    self.model_executor = executor_class(
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 41, in __init__
    self._init_executor()
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 23, in _init_executor
    self.driver_worker.init_device()
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 110, in init_device
    self.init_gpu_memory = torch.cuda.mem_get_info()[0]
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/memory.py", line 685, in mem_get_info
    return torch.cuda.cudart().cudaMemGetInfo(device)
RuntimeError: HIP error: invalid argument
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with TORCH_USE_HIP_DSA to enable device-side assertions.
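
As the error message itself suggests, re-running with kernel serialization enabled may give a more precise failure point; pinning to a single visible GPU is an additional, purely hypothetical narrowing step, and the benchmark flags here are placeholders for whatever you were already passing:

# Serialize HIP kernels for clearer error reporting; optionally pin to one GPU while debugging
AMD_SERIALIZE_KERNEL=3 HIP_VISIBLE_DEVICES=0 python /root/benchmarks/benchmark_throughput.py --model /app/model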

@linchen111

https://www.nscale.com/blog/nscale-benchmarks-amd-mi300x-gpus-with-gemm-tuning-improves-throughput-and-latency-by-up-to-7-2x

Docker Image

ROCm 6.1.2 Python 3.10.12 PyTorch 2.5.0 Triton 2.1.0 Flash Attention 2.0.4 rocBLAS 4.1.2 hipBLASlt 0.8.0 Rccl 2.18.6 vLLM 0.5.0

It seems it's not working on MI100.

@lookfirst

Likely not, as this is focused on the MI300X. It was probably compiled for gfx942.

@linchen111

Likely not, as this is focused on the MI300X. It was probably compiled for gfx942.

I guess so; I failed to compile it for gfx908.

@bennmann

bennmann commented Jul 3, 2024 via email

@hongxiayang
Collaborator

hongxiayang commented Jul 3, 2024

For gfx908, as far as I remember:
(1) I have updated the flash-attn branch to ae7928c, as someone mentioned this branch has support for gfx908.
(2) You can build using Dockerfile.rocm by specifying FA_GFX_ARCHS="gfx908" with the --build-arg parameter when doing the docker build, as explained in that file (a rough example command is sketched after this comment).

Please build your own docker image, and let's start from there.
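
Putting the build arguments mentioned in this thread together, an MI100 build would look roughly like this (a sketch only; the base image and the image tag are example values taken from earlier comments):

# Build a ROCm vLLM image with flash-attention compiled for gfx908 (MI100)
docker build --build-arg BASE_IMAGE="rocm/pytorch:rocm6.0_ubuntu20.04_py3.9_pytorch_2.1.1" \
    --build-arg FA_GFX_ARCHS="gfx908" -f Dockerfile.rocm -t vllm-rocm-gfx908 .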

@hongxiayang
Collaborator

This issue is very old and should be closed, as the initial request for ROCm support has already been addressed. cc @zhuohan123
For any new issues, please open a new one.

@hmellor
Collaborator

hmellor commented Jul 4, 2024

@hmellor hmellor closed this as completed Jul 4, 2024
@linchen111

  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 110, in init_device
    self.init_gpu_memory = torch.cuda.mem_get_info()[0]
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/memory.py", line 685, in mem_get_info
    return torch.cuda.cudart().cudaMemGetInfo(device)
RuntimeError: HIP error: invalid argument
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with TORCH_USE_HIP_DSA to enable device-side assertions.

I built my own docker image, but it failed with the same error.

pi314ever pushed a commit to pi314ever/vllm that referenced this issue Dec 12, 2024
With this patch, mp executor does not hang at the end of application out
of the box, and exits gracefully.