Installing with ROCM #621
AMD GPUs are not supported at the moment; we just added this to our roadmap. CC @naed90 |
My server is running a pair of MI100s and I would love to try this out. Please add ROCm support. |
Hey! Currently planning to work on this. If you can reach out, we’d appreciate it (got some questions): Dean.leitersdorf@gmail.com |
Hey @naed90, I would also love to try this out on MI100s. I am going to go ahead and shoot you an email as well. |
Is it possible to get CLBlast/Vulkan support as well, for a bigger range of supported cards? |
The enhancement for AMD Support was added two weeks ago. Do you have any info about the current state of this issue? Thanks! |
Very interested in this feature |
@GdRottoli can we have some documentation on this feature too? I don't see any docs on how to use vLLM with MI100s. |
I also want to use vLLM on MI100s. |
If anyone can point me towards the code which achieves this, I can experiment, and I'd also be willing to contribute to the docs! |
The major blocker for a ROCm HIP port is xFormers (and flash attention). Without those, it shouldn't be that hard to hipify vLLM. Edit: After taking a closer look, vLLM also uses a lot of inline PTX assembly, which will also be annoying to port. |
Inline PTX seems to be mostly related to AWQ, right? |
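For context on the hipify step mentioned above: ROCm ships `hipify-perl`/`hipify-clang` for mechanically translating CUDA sources to HIP. A minimal sketch of what that could look like on one of vLLM's kernels; the file path is an example and the inline-PTX and xFormers pieces discussed above would still need manual work:

```bash
# hipify-perl ships with ROCm and prints the translated source to stdout.
# The input path is an example from the vLLM source tree and may differ by version.
hipify-perl csrc/attention/attention_kernels.cu > csrc/attention/attention_kernels.hip
```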
Some progress on this: #1313. I believe the flash attention part can be solved with https://github.com/ROCmSoftwarePlatform/flash-attention. If somebody has the bandwidth to port the AWQ kernels, that would be very much appreciated, but it is not blocking for now :) |
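A rough sketch of building the ROCm flash-attention fork mentioned above; the `GPU_ARCHS` variable name and the architecture value are assumptions, so check the repo's README for the exact build flags:

```bash
# Clone with submodules; the ROCm fork pulls in extra kernel dependencies.
git clone --recursive https://github.com/ROCmSoftwarePlatform/flash-attention
cd flash-attention
# GPU_ARCHS and the gfx90a target are assumptions: MI210/MI250 are gfx90a, MI100 is gfx908.
GPU_ARCHS="gfx90a" pip install .
```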
Hi, this is sabre from AMD. We recognize the value of vLLM and are already working on both xFormers and vLLM for ROCm. Stay tuned! |
I can't wait to try it!
|
I upvote the main post and have the same needs here. But when I go to install vLLM, the wheel does not seem up to date enough to allow this version of torch. |
You can try my fork, which is a ROCm port of vLLM v0.1.4. You can follow the setup procedure in https://github.com/EmbeddedLLM/vllm-rocm |
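For anyone who wants to try the fork, a hedged sketch of the general shape of the setup; the authoritative steps are in the fork's README, and the PyTorch ROCm image tag here is an example only:

```bash
# Clone the fork on the host.
git clone https://github.com/EmbeddedLLM/vllm-rocm
cd vllm-rocm

# Standard ROCm device/group flags for the container runtime.
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video \
    -v "$PWD":/workspace -w /workspace rocm/pytorch:latest

# Inside the container (assumes the fork's setup.py drives the HIP build):
pip install -e .
```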
In addition to the good community forks, here are the llama.cpp ROCm commits for inspiration on this topic: https://github.com/ggerganov/llama.cpp/pull/1087/commits |
Also building from #1313 went just fine for me. |
With the integration of Flash Attention v2, we can report that vLLM v0.2.1 on ROCm achieved a speedup of >2x for the LLaMA-70B model and >3x for LLaMA-7B/13B on MI210, compared to vLLM v0.1.4. We are trying to port AWQ and contribute to #1313. |
So are only the MI210 and newer cards supported? If so, that's pretty worthless for most of us. I started to install a little while ago, but once I went to install flash attention it stopped me in my tracks. |
Please support MI100s, I will be forever grateful.
|
FWIW, in case anyone else is following this: it looks like ROCm support was ported from the vLLM fork linked in this thread and merged upstream. Here are the docs: https://docs.vllm.ai/en/latest/getting_started/amd-installation.html |
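For anyone landing here later, the upstream path at the time looked roughly like the following; treat it as a sketch and defer to the linked docs, since image names and flags have changed across releases:

```bash
git clone https://github.com/vllm-project/vllm
cd vllm
# Dockerfile.rocm is the ROCm build recipe referenced by the AMD installation docs.
docker build -f Dockerfile.rocm -t vllm-rocm .

# Typical ROCm runtime flags for the resulting image.
docker run -it --network=host --ipc=host \
    --device=/dev/kfd --device=/dev/dri --group-add video \
    vllm-rocm
```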
@ehartford etc.: for folks who are interested in vLLM on MI100, try this fork: https://github.com/hongxiayang/vllm/tree/navi3x_rocm6. Let me know if you run into problems. |
Is this for ROCm 6 specifically? I use ROCm 5.7 because PyTorch nightly uses ROCm 5.7. |
This is how you can build and run the Docker image:
|
BASE_IMAGE for ROCm_5.7: "rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1" |
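If the Dockerfile of that era exposes the base image as a build argument (an assumption here), overriding it for ROCm 5.7 would look something like this:

```bash
# BASE_IMAGE as a --build-arg is an assumption; confirm the arg name in Dockerfile.rocm.
docker build -f Dockerfile.rocm \
    --build-arg BASE_IMAGE="rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1" \
    -t vllm-rocm57 .
```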
@ehartford I found there are people who successfully compiled vLLM for MI100 on ROCm 5.7. The only change that needs to be made is to pass the AMD GPU architecture, and @jamestwhedbee has opened a simple PR that specifies it. Kudos to them for verifying that flash-attention-rocm can also be compiled for MI100. |
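A sketch of what passing the GPU architecture might look like; the variable names are assumptions (PYTORCH_ROCM_ARCH is the PyTorch extension convention), so the PR mentioned above is the authoritative reference:

```bash
# MI100 is gfx908. PYTORCH_ROCM_ARCH is honored by PyTorch's C++/HIP extension build;
# whether vLLM's setup of that era picks it up directly is an assumption here.
export PYTORCH_ROCM_ARCH="gfx908"
pip install -e .                     # in the vLLM checkout
GPU_ARCHS="gfx908" pip install .     # in the ROCm flash-attention checkout (variable name assumed)
```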
No review on #2792 yet; I'm not sure of the best way to get more eyes on it. |
ROCm currently has many, many issues. Would it be possible to install vLLM using Vulkan or CLBlast, with some limited performance? This would be great specifically for AMD iGPUs (APUs). |
It looks like it was merged in, a good sign, no? Does anyone have any benchmarks on this? How does it stack up against an A100? Lower, I am sure, but if it is even close... |
@thebeline At least for TGI, which relies on vLLM's paged attention kernel, you can find some benchmarks here: https://huggingface.co/blog/huggingface-and-optimum-amd#production-solutions |
Hi, do you have a pre-built vLLM Docker image for ROCm 5.7? |
ROCm 6.1.2 |
Thanks! Pulling it now. If I use MI100s with this "Link Type between two GPUs" topology, should I do some special settings? |
And I have run into this: |
It seems not to be working on MI100. |
Likely not, as this is focused on MI300X. It is probably compiled for gfx942. |
I guess so; I failed to compile it for gfx908. |
You could try the consumer-GPU workaround that exists for stable diffusion workflows on non-supported ROCm GPUs: the HSA override variable (I forget exactly which variable, but it is easy to search for; see the sketch below).
|
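The HSA override mentioned above is an environment variable that makes the ROCm runtime report a different GPU architecture. A typical consumer-card example follows; whether any override value helps on an MI100 (gfx908) is not something this thread confirms:

```bash
# Commonly used to make unsupported RDNA2 consumer cards (e.g. gfx1031/gfx1032)
# report as the supported gfx1030; the right value depends on your card.
export HSA_OVERRIDE_GFX_VERSION=10.3.0
```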
For gfx908, as far as I remember: please build your own Docker image, and let's start from there. |
This issue is very old and should be closed, as the initial request for ROCm support has already been addressed. cc @zhuohan123 |
I did the Docker image build, but it failed with the same error. |
With this patch, the mp executor no longer hangs at the end of the application and exits gracefully out of the box.
Hello,
I'm trying to install vLLM on an AMD server. However, I am unable to build the package because CUDA is not installed. Is there any way we can configure it to work with ROCm instead?
!pip install vllm
Error:
RuntimeError: Cannot find CUDA_HOME. CUDA must be available in order to build the package.
ROCm is installed and verified.
PyTorch 2.0 (ROCm build)