
[Bug]: Image generation won't start forever (Linux+ROCm, possibly specific to RX 5000 series) #10855

cyatarow opened this issue May 30, 2023 · 47 comments
Labels
bug-report: Report of a bug, yet to be confirmed
platform:amd: Issues that apply to AMD manufactured cards


@cyatarow

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What happened?

I have newly installed v1.3.0, but image generation won't start, even many minutes after pressing the "Generate" button.

Steps to reproduce the problem

  1. Launch the UI via webui.sh
  2. Go to http://127.0.0.1:7860 with a browser
  3. Press "Generate" for any prompt or model

What should have happened?

Image generation should have started.

Commit where the problem happens

20ae71f

What Python version are you running on ?

Python 3.10.x

What platforms do you use to access the UI ?

Linux

What device are you running WebUI on?

AMD GPUs (RX 5000 below)

What browsers do you use to access the UI ?

Mozilla Firefox

Command Line Arguments

`--ckpt-dir` and `--vae-dir`
I'm using external storage to hold my model files.

List of extensions

(None)

Console logs

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on sd-amd user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
Using TCMalloc: libtcmalloc.so.4
Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0]
Version: v1.3.0
Commit hash: 20ae71faa8ef035c31aa3a410b707d792c8203a3
Installing torch and torchvision
Looking in indexes: https://download.pytorch.org/whl/rocm5.4.2
Collecting torch==2.0.1+rocm5.4.2
  Using cached https://download.pytorch.org/whl/rocm5.4.2/torch-2.0.1%2Brocm5.4.2-cp310-cp310-linux_x86_64.whl (1536.4 MB)
Collecting torchvision==0.15.2+rocm5.4.2
  Using cached https://download.pytorch.org/whl/rocm5.4.2/torchvision-0.15.2%2Brocm5.4.2-cp310-cp310-linux_x86_64.whl (62.4 MB)
Collecting filelock
  Using cached https://download.pytorch.org/whl/filelock-3.9.0-py3-none-any.whl (9.7 kB)
Collecting networkx
  Using cached https://download.pytorch.org/whl/networkx-3.0-py3-none-any.whl (2.0 MB)
Collecting sympy
  Using cached https://download.pytorch.org/whl/sympy-1.11.1-py3-none-any.whl (6.5 MB)
Collecting pytorch-triton-rocm<2.1,>=2.0.0
  Using cached https://download.pytorch.org/whl/pytorch_triton_rocm-2.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (78.4 MB)
Collecting jinja2
  Using cached https://download.pytorch.org/whl/Jinja2-3.1.2-py3-none-any.whl (133 kB)
Collecting typing-extensions
  Using cached https://download.pytorch.org/whl/typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Collecting requests
  Using cached https://download.pytorch.org/whl/requests-2.28.1-py3-none-any.whl (62 kB)
Collecting pillow!=8.3.*,>=5.3.0
  Using cached https://download.pytorch.org/whl/Pillow-9.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
Collecting numpy
  Using cached https://download.pytorch.org/whl/numpy-1.24.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
Collecting cmake
  Using cached https://download.pytorch.org/whl/cmake-3.25.0-py2.py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23.7 MB)
Collecting lit
  Using cached https://download.pytorch.org/whl/lit-15.0.7.tar.gz (132 kB)
  Preparing metadata (setup.py) ... done
Collecting MarkupSafe>=2.0
  Using cached https://download.pytorch.org/whl/MarkupSafe-2.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Collecting certifi>=2017.4.17
  Using cached https://download.pytorch.org/whl/certifi-2022.12.7-py3-none-any.whl (155 kB)
Collecting idna<4,>=2.5
  Using cached https://download.pytorch.org/whl/idna-3.4-py3-none-any.whl (61 kB)
Collecting urllib3<1.27,>=1.21.1
  Using cached https://download.pytorch.org/whl/urllib3-1.26.13-py2.py3-none-any.whl (140 kB)
Collecting charset-normalizer<3,>=2
  Using cached https://download.pytorch.org/whl/charset_normalizer-2.1.1-py3-none-any.whl (39 kB)
Collecting mpmath>=0.19
  Using cached https://download.pytorch.org/whl/mpmath-1.2.1-py3-none-any.whl (532 kB)
Using legacy 'setup.py install' for lit, since package 'wheel' is not installed.
Installing collected packages: mpmath, lit, cmake, urllib3, typing-extensions, sympy, pillow, numpy, networkx, MarkupSafe, idna, filelock, charset-normalizer, certifi, requests, jinja2, pytorch-triton-rocm, torch, torchvision
  Running setup.py install for lit ... done
Successfully installed MarkupSafe-2.1.2 certifi-2022.12.7 charset-normalizer-2.1.1 cmake-3.25.0 filelock-3.9.0 idna-3.4 jinja2-3.1.2 lit-15.0.7 mpmath-1.2.1 networkx-3.0 numpy-1.24.1 pillow-9.3.0 pytorch-triton-rocm-2.0.1 requests-2.28.1 sympy-1.11.1 torch-2.0.1+rocm5.4.2 torchvision-0.15.2+rocm5.4.2 typing-extensions-4.4.0 urllib3-1.26.13
Installing gfpgan
Installing clip
Installing open_clip
Cloning Stable Diffusion into /home/sd-amd/sd-ui-130/stable-diffusion-webui/repositories/stable-diffusion-stability-ai...
Cloning Taming Transformers into /home/sd-amd/sd-ui-130/stable-diffusion-webui/repositories/taming-transformers...
Cloning K-diffusion into /home/sd-amd/sd-ui-130/stable-diffusion-webui/repositories/k-diffusion...
Cloning CodeFormer into /home/sd-amd/sd-ui-130/stable-diffusion-webui/repositories/CodeFormer...
Cloning BLIP into /home/sd-amd/sd-ui-130/stable-diffusion-webui/repositories/BLIP...
Installing requirements for CodeFormer
Installing requirements
Launching Web UI with arguments: --ckpt-dir /mnt/W20/Stable_Diffusion/MODEL --vae-dir /mnt/W20/Stable_Diffusion/VAE
No module 'xformers'. Proceeding without it.
Calculating sha256 for /mnt/W20/Stable_Diffusion/MODEL/AnythingV5_v5PrtRE.safetensors: Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 2.7s (import torch: 0.5s, import gradio: 0.6s, import ldm: 0.6s, other imports: 0.4s, load scripts: 0.3s, create ui: 0.2s).
7f96a1a9ca9b3a3242a9ae95d19284f0d2da8d5282b42d2d974398bf7663a252
Loading weights [7f96a1a9ca] from /mnt/W20/Stable_Diffusion/MODEL/AnythingV5_v5PrtRE.safetensors
Creating model from config: /home/sd-amd/sd-ui-130/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying optimization: sdp-no-mem... done.
Textual inversion embeddings loaded(0): 
Model loaded in 4.0s (calculate hash: 2.7s, create model: 0.3s, apply weights to model: 0.3s, apply half(): 0.2s, load VAE: 0.1s, move model to device: 0.2s).

Additional information

My environment:

  • OS: Ubuntu 22.04.2
  • CPU: Intel Core i3-12100
  • GPU: AMD Radeon RX 5500 XT (8GB)
@cyatarow added the bug-report label on May 30, 2023
@olinorwell

I have exactly the same issue, used to work perfectly before.

Like you say, it just sits there and doesn't do anything, no errors anywhere.

I've uninstalled/reinstalled everything and tried various different combinations, no good.

Previously I would get the classic "MIOpen(HIP): Warning [SQLiteBase] Missing system database file: gfx1030_40.kdb Performance may degrade." warning, but after about a minute it would start and then run correctly. Now I don't get that warning, which suggests the hang happens before that point.

I'm using an AMD Radeon RX 5700 XT (8GB), Ryzen 3700 CPU, Arch Linux. So similar to you but not exactly the same.

Fingers crossed somebody can suggest something! Previously on this system I've had SD working well through all the updates from September last year to a couple of weeks ago.

@HoCoK31

HoCoK31 commented May 31, 2023

Same issue, no errors, just not generating anything
AMD Radeon RX 5700 XT, Ryzen 3600, Manjaro, kernel 6.3.4-2

@cyatarow
Author

Could it be that this problem is specific to the RX 5000 series?

@cyatarow changed the title from "[Bug]: Image generation won't start forever (Linux+ROCm, RX 5500 XT)" to "[Bug]: Image generation won't start forever (Linux+ROCm, possibly specific to RX 5000 series)" on May 31, 2023
@olinorwell

I fear it might be related to the fact that the 5000 series wasn't supposed to work originally; we got a workaround that involved 'fooling' the runtime into believing it was a different chip, after which it worked.
Perhaps that trick isn't working now, and it's just unable to function.
There must be many others in the same situation out there. Hopefully they will all comment on this post.

@olinorwell

olinorwell commented May 31, 2023

To confirm to anyone trying to help - at least in my case it used to immediately give the warning:
"MIOpen(HIP): Warning [SQLiteBase] Missing system database file: gfx1030_40.kdb Performance may degrade."

This no longer happens. So whatever is different happens after the Generate button is hit, and before the warning would be output.

[Edit: Additionally, I ran the tests for PyTorch found here - https://pytorch.org/get-started/locally/ - which suggests that PyTorch ROCm is working as expected]
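For anyone wanting to run the same sanity check, something along these lines (run inside the webui venv; a minimal sketch - the exact snippet on the PyTorch page may differ) will show whether the ROCm backend can see the GPU at all:

source venv/bin/activate
# ROCm builds of PyTorch expose the GPU through the torch.cuda API
python -c "import torch; print(torch.cuda.is_available())"
# if the above prints True, this should name the card
python -c "import torch; print(torch.cuda.get_device_name(0))"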

[Edit 2: Not sure if it's useful to know, but I did recently install OpenCL on my machine; I was reading that OpenCL/HIP backends are potentially incompatible side-by-side when using ROCm. I don't fully understand all of this, but my gut feeling is it could be something to do with that - then again, maybe others haven't recently installed OpenCL]

@cyatarow
Author

cyatarow commented May 31, 2023

In fact, inspired by this PR, I had tried the dev branch shortly before v1.3.0 was released.
But the result was the same...

The participants in the PR were all RX 6000 users, and I think it was merged without proper verification on the 5000 series.

@olinorwell

I agree, I fear that change is what has broken it for RX 5000 users. According to that PR, it was needed because the old versions were no longer available on the pytorch repos. I wonder if they are still available elsewhere. I fear we're going to need torch 1.13 again, avoiding the 2.0 version, which doesn't appear to work. It's at times like this that I really get mad at myself for updating anything! It was all working so well.

@VekuDazo

VekuDazo commented May 31, 2023

But I have the exact same issue on the 6600M (gfx1031?) with an R7 5800H.
Without --medvram it doesn't proceed past "Applying optimization: sdp-no-mem... done."
With it, the model loads, but nothing generates and nothing else happens in the terminal.

@ethragur

Same here (RX 5700) with ROCm 5.5
The only solution for now is to force downgrade to torch 1.13.1
pip install torch==1.13.1 torchvision==0.14.1 --index-url https://download.pytorch.org/whl/rocm5.2

Has anyone tried a torch 2.0 build for ROCm 5.5? For now the newest one in nightly is still 5.4.2:
https://download.pytorch.org/whl/nightly/torch/

@olinorwell

Even force-downgrading was failing for me. The instructions I had included a '+rocm' suffix next to the package versions; when I tried without it, pip appeared to download the Nvidia builds.

What would be the way to try the 5.5 version? I can try that now.

@ethragur

ethragur commented May 31, 2023

What would be the way to try the 5.5 version? I can try that now.

You would have to build pytorch yourself with the ROCm 5.5 version. Maybe something like #9591; the docker image they use no longer exists, but one from the official pytorch docker repo could still work (https://hub.docker.com/r/rocm/pytorch/tags):

rocm/pytorch:rocm5.5_ubuntu20.04_py3.8_pytorch_staging

But I'm not really sure that would make it work; even if we were able to compile it, there may be something in the new pytorch version that doesn't work with RX 5x00 graphics cards.
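For anyone who wants to try it, a rough sketch of pulling that image and getting a shell with GPU access (the tag is the one above; the device flags are the usual ones for ROCm containers, adjust as needed):

docker pull rocm/pytorch:rocm5.5_ubuntu20.04_py3.8_pytorch_staging
# pass the ROCm kernel driver and render nodes through to the container
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video \
    rocm/pytorch:rocm5.5_ubuntu20.04_py3.8_pytorch_staging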

Even force downgrading was failing for me, I had instructions that had a '+rocm' next to the package versions? When I tried without it appeared to download the Nvidia versions.

Maybe you had '--extra-index-url' instead of '--index-url'. You could also just go into your venv directory (stable-diffusion-webui/venv/lib/python3.10/site-packages) and delete torch & torchvision; afterwards you should be able to use my pip install command.

Additionally, I added export TORCH_COMMAND="pip install torch==1.13.1 torchvision==0.14.1 --index-url https://download.pytorch.org/whl/rocm5.2" to my webui-user.sh, and I started the webui with ./webui.sh.
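Putting those two suggestions together, roughly (a sketch assuming a default install layout; adjust the site-packages path to your Python version):

# remove the torch 2.0 packages from the venv
rm -rf venv/lib/python3.10/site-packages/torch*
# in webui-user.sh, pin the install command the launcher uses:
export TORCH_COMMAND="pip install torch==1.13.1 torchvision==0.14.1 --index-url https://download.pytorch.org/whl/rocm5.2"
# then relaunch
./webui.sh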

@olinorwell

(venv) [oli@ARCH-RYZEN stable-diffusion-webui]$ pip install torch==1.13.1 torchvision==0.14.1 --index-url https://download.pytorch.org/whl/rocm5.2
Looking in indexes: https://download.pytorch.org/whl/rocm5.2
ERROR: Could not find a version that satisfies the requirement torch==1.13.1 (from versions: none)
ERROR: No matching distribution found for torch==1.13.1

I wonder if the fact they bumped the Python version up to 3.11 makes a difference? I see you were running 3.10.

@ethragur

ethragur commented May 31, 2023

I wonder if the fact they bumped the Python version up to 3.11 makes a difference? I see you were running 3.10.

https://download.pytorch.org/whl/rocm5.2/torch/
It looks like it; pytorch only seems to have builds up to Python 3.10.

@olinorwell

I'm retrying now with 3.10. Fingers crossed.

@ethragur

Otherwise you could try to download the .whl file and just install it directly with pip:

pip install /path/to/file.whl
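For example, grabbing the matching Python 3.10 wheel from the same index (the filename follows PyTorch's usual wheel naming; check the index page for the exact name):

wget https://download.pytorch.org/whl/rocm5.2/torch-1.13.1%2Brocm5.2-cp310-cp310-linux_x86_64.whl
pip install torch-1.13.1+rocm5.2-cp310-cp310-linux_x86_64.whl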

@olinorwell

Success! @ethragur is the hero, his solution has worked for me.
I'm now running v1.3.0 of A1111 on my 5700XT.

My solution was this: ensure you have Python 3.10 installed and edit the webui.sh file to make sure it uses it.

Run webui.sh, let it create the venv etc., and let it fail to create an image.

Run:
source venv/bin/activate

Then run (thanks to @ethragur)
pip install torch==1.13.1 torchvision==0.14.1 --index-url https://download.pytorch.org/whl/rocm5.2

Now restart webui.sh; this time image generation will succeed, and you'll see at the bottom of the A1111 page that the version line says "torch: 1.13.1+rocm5.2".

Hopefully what has worked for me will work for others too, thanks again to @ethragur for the help - I was getting very down at not having SD to play with!

@ethragur

Perfect, good to hear that it works again.
Hopefully some future builds of pytorch will work again with the RX 5000 series; otherwise we'll be stuck on this version forever 😢. From what I've seen, 2.0 should give some performance improvements.

I'll try building the new version in a docker container, and if it works I'll upload the .whl file somewhere. But I do not have high hopes. Maybe there is some way to get more debug information out of pytorch to see where it gets stuck.

@cyatarow
Author

cyatarow commented Jun 2, 2023

Have any contributors noticed this issue?

@cyatarow
Author

cyatarow commented Jun 3, 2023

v1.3.1, released yesterday, doesn't seem to include a fix for this... too bad.

@cyatarow
Author

cyatarow commented Jun 5, 2023

@AUTOMATIC1111 please don't ignore us...

@magusman52

magusman52 commented Jun 5, 2023

Same issue, 5700 XT, both on torch 1.13.1 and 2.0. Oddly enough, I just borrowed this card today from a friend and managed to get a single gen in before this bug occurred.

EDIT: After waiting for 2 minutes, it generated the entire prompt in a couple of seconds. After that incident, my system became really sluggish. Prompts were generating again, but the speed was inconsistent.

@olinorwell

Same issue, 5700 XT, both on torch 1.13.1 and 2.0. Oddly enough, I just borrowed this card today from a friend and managed to get a single gen in before this bug occurred.

EDIT: After waiting for 2 minutes, it generated the entire prompt in a couple of seconds. After that incident, my system became really sluggish. Prompts were generating again, but the speed was inconsistent.

Is this Windows or Linux?

For me it was cut and dry, torch 2.0 doesn't work, torch 1.13.1 does. Perhaps check versions, etc?
I always have a one minute delay before generations begin each time, but that's been like that since the beginning, and after it's done what it needs to do then I don't experience problems afterwards.

@magusman52

magusman52 commented Jun 5, 2023

Same issue, 5700 XT, both on torch 1.13.1 and 2.0. Oddly enough, I just borrowed this card today from a friend and managed to get a single gen in before this bug occurred.
EDIT: After waiting for 2 minutes, it generated the entire prompt in a couple of seconds. After that incident, my system became really sluggish. Prompts were generating again, but the speed was inconsistent.

Is this Windows or Linux?

For me it was cut and dry, torch 2.0 doesn't work, torch 1.13.1 does. Perhaps check versions, etc? I always have a one minute delay before generations begin each time, but that's been like that since the beginning, and after it's done what it needs to do then I don't experience problems afterwards.

I'm on Ubuntu 22.04. And yes, it occurs with both versions of torch. The prompt loads for a minute or two, the first 90% of the gen gets done in a couple of seconds, it gets stuck at 97% again for a while, and then the prompt finishes. Also, my system seems to get really unstable after prompting, as if it's about to crash or black-screen. Quite odd.

EDIT: Tested again, now it only occurs on torch 2.0. Works alright on 1.13.1 besides the initial lag.

@DGdev91
Contributor

DGdev91 commented Jun 6, 2023

I made a PR to force pytorch 1.13.1 for RX 5000 cards; it also checks for Python <= 3.10.
Not a definitive fix, but maybe it can help other users:

#11048
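Roughly, the kind of guard the PR adds to the launch script (a sketch based on the description above, not the exact diff):

# Navi 1 (RX 5000) cards: keep the gfx override, and pin torch to 1.13.1,
# since 2.x hangs on RDNA1 and the rocm5.2 wheels only exist for Python <= 3.10
if lspci | grep -q "Navi 1"; then
    export HSA_OVERRIDE_GFX_VERSION=10.3.0
    export TORCH_COMMAND="pip install torch==1.13.1 torchvision==0.14.1 --index-url https://download.pytorch.org/whl/rocm5.2"
fi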

@cyatarow
Author

cyatarow commented Jun 6, 2023

But still, why is only the RX 5000 series soooo incompatible with torch 2.0??

@DGdev91
Contributor

DGdev91 commented Jun 7, 2023

But still, why is only the RX 5000 series soooo incompatible with torch 2.0??

That's a good question. My first guess is that we need to force HSA_OVERRIDE_GFX_VERSION to make it work, but that's also true for the RX 6000 series, which is working just fine.

Sooo.... Who knows.

We can't even be really sure it's just the RX 5000 series; maybe there are other series which have problems, but no one has reported them yet.

@olinorwell

HSA_OVERRIDE_GFX_VERSION is already forced in the script for those cards, though - it was set correctly for me even when things weren't working. Perhaps Torch v2.0 needs a further workaround or something.

I just hope code doesn't slip into the repo that's only torch 2.0 compatible; then we're in trouble.

@magusman52

But still, why is only the RX 5000 series soooo incompatible with torch 2.0??

That's a good question. My first guess is that we need to force HSA_OVERRIDE_GFX_VERSION to make it work, but that's also true for the RX 6000 series, which is working just fine.

Sooo.... Who knows.

HSA_OVERRIDE_GFX_VERSION has been enabled by default in webui.sh for a couple of releases now, I think.

We can't even be really sure it's just the RX 5000 series; maybe there are other series which have problems, but no one has reported them yet.

Before this card, I ran SD on an RX 580 4GB, which was a nightmare to get running. It didn't have this specific issue, but it had plenty of other problems that all boiled down to ROCm support.

@DGdev91
Contributor

DGdev91 commented Jun 7, 2023

HSA_OVERRIDE_GFX_VERSION is already forced in the script for those cards, though - it was set correctly for me even when things weren't working. Perhaps Torch v2.0 needs a further workaround or something.

Yes, exactly. What I meant was that my first guess was HSA_OVERRIDE_GFX_VERSION causing problems, but that can't be it, because the 6000 series also uses it without issues.

@magusman52

Just out of curiosity, would there be any significant increase in performance on torch 2.0? It would be interesting to see someone on torch 2.0 with a 5700 XT upload a benchmark to compare with 1.13.1.

@DGdev91
Contributor

DGdev91 commented Jun 8, 2023

Just out of curiosity, would there be any significant increase in performance on torch 2.0? It would be interesting to see someone on torch 2.0 with a 5700 XT upload a benchmark to compare with 1.13.1.

It surely would, if we could manage to run it - especially using --opt-sdp-attention.

On AMD we can't use xformers, and that option would surely be a huge boost.
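For reference, on a setup where torch 2.0 does run, enabling that optimization is just a launch flag:

./webui.sh --opt-sdp-attention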

@cyatarow
Author

cyatarow commented Jun 9, 2023

Related reports:
#9951 (comment)
#9951 (comment)

As far as I can tell, ROCm does not support RDNA1/Navi1.x cards.

Really? So, was it wrong of me to buy an RX 5000 GPU? And should I sell it right now??

@olinorwell

Related reports: #9951 (comment) #9951 (comment)

As far as I can tell, ROCm does not support RDNA1/Navi1.x cards.

Really? So, was it wrong of me to buy an RX 5000 GPU? And should I sell it right now??

I believe it doesn't officially, but the special override variable allows it to work. I'm using ROCm 5.2 on a Navi1.x card.

@DGdev91
Contributor

DGdev91 commented Jun 9, 2023

As far as I can tell, ROCm does not support RDNA1/Navi1.x cards.
Really? So, was it wrong of me to buy an RX 5000 GPU? And should I sell it right now??

No, it can still work with an older PyTorch and that override.

And technically, ROCm doesn't officially support any consumer-grade video card, even though they work just fine with it.

@cyatarow
Author

The PR #11048 was merged into dev and release_candidate branches.
But... is there really no way to work around the issue other than pinning torch to 1.13.1?
Could it be that since RDNA1 is not officially supported by ROCm, torch 2.0 was developed without any consideration of RDNA1??

@ethragur

Tried it with the new rocm5.5 torch release build in the pytorch nightly repo. The same problem is still present ...

@fighuass

Can confirm that I have this issue too with my RX 5700 XT. Starting to regret ever buying that GPU, tbh..

Everything worked fine last time I was into using SD, sometime last year or so.

@k1llerk3ks

k1llerk3ks commented Aug 6, 2023

I still have this issue with my RX 5700 XT. The downgrade to 1.13.1 worked for me, although there is a delay at the beginning of picture creation. I cannot use the sd-xl-base checkpoint with it, though... please @AUTOMATIC1111, fix this...

@catboxanon added the platform:amd label on Aug 7, 2023
@DGdev91
Contributor

DGdev91 commented Aug 8, 2023

I still have this issue with my RX 5700 XT. The downgrade to 1.13.1 worked for me, although there is a delay at the beginning of picture creation. I cannot use the sd-xl-base checkpoint with it, though... please @AUTOMATIC1111, fix this...

That probably isn't related to the Web UI; it's an issue in pytorch itself, or maybe in ROCm.
I'm starting to think the problem here is ROCm, because I also had issues in llama.cpp, both with CLBlast and with a fork which aims to add ROCm support.

Anyway, I found this on pytorch's github, probably related:
pytorch/pytorch#106728

@cl0ck-byte

cl0ck-byte commented Aug 8, 2023

Anyway, I found this on pytorch's github, probably related: pytorch/pytorch#106728

Indeed related: torch>=2.0.0 won't run on RDNA1 for now, even with a torch wheel targeting gfx1010, which is my card in this case.
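A quick way to check which gfx targets a given torch build was compiled for (on ROCm builds, torch.cuda.get_arch_list should report the gfx targets):

python -c "import torch; print(torch.cuda.get_arch_list())"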

edit: wow, this is worthless!
[screenshot]

@DGdev91
Contributor

DGdev91 commented Mar 12, 2024

Some time ago I found an old pytorch 2.0 build which runs on RX 5000: pytorch/pytorch#106728 (comment)

@fd400

fd400 commented Nov 12, 2024

Hi, can someone please summarize what setup is needed to run a 5000 series GPU with Torch?

Linux version
Python version
ROCm version
Torch version
Specific approaches?
THANKS !

@DGdev91
Contributor

DGdev91 commented Nov 12, 2024

Hi, can someone please summarize what setup is needed to run a 5000 series GPU with Torch?

Linux version
Python version
ROCm version
Torch version
Specific approaches?
THANKS !

Linux version: any should be fine.
Python version: 3.10. I recommend using a Conda environment for that, because most distributions currently ship a newer version.
ROCm version: in theory, anything over 5.2 should be fine. Just install the latest.
Torch version: the one I posted in my last comment.
Some time ago I made an addition to the launch script just for the 5000 series; in theory it should install that automatically. There's a Conda sketch below.
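If you go the Conda route for Python 3.10, a minimal sketch (the environment name is arbitrary):

conda create -n sdwebui python=3.10
conda activate sdwebui
# webui.sh builds its venv from whichever python it finds in this environment
./webui.sh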

Note: there has been some work in the past months to make the old GPUs work again on newer versions of ROCm and pytorch. The latest official pytorch wheels don't work on the 5000 series yet, but just a few days ago someone wrote under an issue on ROCm's github that he managed to build the latest version from git:

ROCm/ROCm#2527 (comment)


@fd400

fd400 commented Nov 15, 2024

Thank you very much. I tried, but it didn't work (torch can't work with gfx1010). I tried with ROCm 5.2, the latest ROCm update, Ubuntu 20, 22 and 24... I give up. I am selling my two 5700 XT GPUs and buying a 4070 Ti Super.

@DGdev91
Contributor

DGdev91 commented Nov 16, 2024

Thank you very much. I tried, but it didn't work (torch can't work with gfx1010). I tried with ROCm 5.2, the latest ROCm update, Ubuntu 20, 22 and 24... I give up. I am selling my two 5700 XT GPUs and buying a 4070 Ti Super.

Did you set the HSA override environment variable?

export HSA_OVERRIDE_GFX_VERSION=10.3.0
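i.e. run it in the same shell before launching (or add the line to webui-user.sh so it persists), roughly:

export HSA_OVERRIDE_GFX_VERSION=10.3.0
./webui.sh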
