[Bug]: Image generation won't start forever (Linux+ROCm, possibly specific to RX 5000 series) #10855
Comments
I have exactly the same issue, used to work perfectly before. Like you say, it just sits there and doesn't do anything, no errors anywhere. I've uninstalled/reinstalled everything and tried various different combinations, no good. Previously I would get the classic: "MIOpen(HIP): Warning [SQLiteBase] Missing system database file: gfx1030_40.kdb Performance may degrade. " warning, but then after about a minute it would start working and then work correctly. Now I don't get that warning, suggesting that might be the point that it falters. I'm using an AMD Radeon RX 5700 XT (8GB), Ryzen 3700 CPU, Arch Linux. So similar to you but not exactly the same. Fingers crossed somebody can suggest something! Previously on this system I've had SD working well through all the updates from September last year to a couple of weeks ago. |
Same issue, no errors, just not generating anything |
Could it be this problem is specific to RX 5000 series? |
I fear it might be related to the fact that the 5000 series wasn't supposed to work originally, but then we got a workaround to do with 'fooling something' into believing it was a different chip, after which it then worked. |
To confirm to anyone trying to help - at least in my case it used to immediately give the warning: This no longer happens. So whatever is different happens after the Generate button is hit and before the warning would be printed. [Edit: Additionally, I ran the tests for PyTorch found here - https://pytorch.org/get-started/locally/ - suggesting that PyTorch ROCm is working as expected] [Edit 2: Not sure if it's useful to know, but I did recently install OpenCL on my machine, and I was reading that OpenCL/HIP backends are potentially not compatible side-by-side when using ROCm. I don't fully understand all of this but my gut feeling is it could be something to do with that - but then, maybe others haven't recently installed OpenCL] |
In fact, inspired by this PR, I had tried the dev branch shortly before v1.3.0 was released. The participants in the PR were only RX 6000 users, and I think the merge was forced without decent verification with 5000 series. |
I agree, I fear that change is what has broken it for RX 5000 users. According to that PR it was needed due to old versions not being available on the pytorch repos. I wonder if they are still available elsewhere. I fear we're going to need the 1.3 version again, avoiding the 2.0 version which doesn't appear to work. It's at times like this that I really get mad at myself for updating anything! It was all working so well. |
But I have the exact same issue on the 6600M (gfx1031?) with an R7 5800H |
Same here (RX 5700) with ROCm 5.5. Has anyone tried a torch 2.0 build for ROCm 5.5? For now the newest one in nightly is still 5.4.2 |
Even force downgrading was failing for me. I had instructions that had a '+rocm' next to the package versions? When I tried without it, it appeared to download the Nvidia versions. What would be the way to try the 5.5 version? I can try that now. |
You would have to build pytorch yourself with the ROCm 5.5 version. Maybe something like #9591, the docker image they use does not exist anymore, but the one from the official pytorch docker repo could still work (https://hub.docker.com/r/rocm/pytorch/tags)
But I'm not really sure if that would make it work, even if we'd be able to compile it, maybe there is something that doesn't work in the new pytorch version with rx5X00 graphics cards.
Maybe you had '--extra-index-url' instead of '--index-url'. You could also just go into your venv directory: Additionally I added the |
I wonder if the fact they bumped the Python version up to 3.11 makes a difference? I see you were running 3.10. |
https://download.pytorch.org/whl/rocm5.2/torch/ |
I'm retrying now with 3.10. Fingers crossed. |
Otherwise you could try to download the .whl file and just install it directly with pip:
|
Success! @ethragur is the hero, his solution has worked for me. My solution was this - ensure you have Python 3.10 and edit the webui.sh file to make sure it uses Python 3.10. Run webui.sh and let it create the venv etc and then fail to create an image. Run: Then run (thanks to @ethragur) Now restart webui.sh and this time image generation will succeed, you'll see at the bottom of A1111 that the version number says "torch: 1.13.1+rocm5.2". Hopefully what has worked for me will work for others too, thanks again to @ethragur for the help - I was getting very down at not having SD to play with! |
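For anyone following the steps above, one way to double-check that the venv really ended up with the build reported as working ("torch: 1.13.1+rocm5.2") is to compare the installed version string against it. This is only a sketch: the helper names `split_version` and `is_known_good` are made up for illustration, and it reads the version from package metadata instead of importing torch. Run it with the Python interpreter inside the webui venv.

```python
from importlib.metadata import PackageNotFoundError, version

# Version reported as working on RX 5000 cards in this thread.
KNOWN_GOOD = ("1.13.1", "rocm5.2")


def split_version(ver: str) -> tuple[str, str]:
    """Split 'X.Y.Z+local' into (release, local); local is '' when absent."""
    release, _, local = ver.partition("+")
    return release, local


def is_known_good(ver: str) -> bool:
    """True only for the exact build the thread reports as working."""
    return split_version(ver) == KNOWN_GOOD


if __name__ == "__main__":
    try:
        installed = version("torch")  # reads metadata, does not import torch
    except PackageNotFoundError:
        print("torch is not installed in this environment")
    else:
        print(f"torch {installed}, known-good build: {is_known_good(installed)}")
```

A version without the `+rocm…` suffix would suggest the wrong wheel was picked up, which matches the earlier report of pip downloading the Nvidia builds.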
Perfect, good to hear that it works again. I'll try building the new version in a docker container, and if it works I'll upload the .whl file somewhere. But I do not have high hopes. Maybe there is some way to get more debug information out of pytorch to see where it is stuck |
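On the question of seeing where it is stuck: a low-tech starting point might be to run a tiny GPU workload in a separate process with a timeout, so a hang can be told apart from a crash without freezing the web UI. The sketch below is an assumption-laden example, not something from the thread: `run_probe` and `GPU_PROBE` are hypothetical names, and the probe relies on ROCm builds of PyTorch exposing the GPU under the `cuda` device name.

```python
import subprocess
import sys

# Tiny workload: allocate on the GPU, multiply, and copy the result back.
GPU_PROBE = (
    "import torch; "
    "x = torch.rand(64, 64, device='cuda'); "
    "print((x @ x).sum().item())"
)


def run_probe(code: str = GPU_PROBE, timeout_s: float = 120.0) -> str:
    """Run `code` in a child interpreter; return 'ok', 'error', or 'hang'."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child was killed after exceeding the timeout.
        return "hang"
    return "ok" if proc.returncode == 0 else "error"
```

Calling `run_probe()` from the webui venv would then report whether even a trivial kernel completes. If the GPU driver itself is wedged, the child process may not be killable, in which case the call could still block.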
Any contributors notice this issue? |
v1.3.1, released yesterday, doesn't seem to have this fix... too bad. |
@AUTOMATIC1111 please don't ignore us... |
Same issue, 5700 XT both on torch 1.13.1 and 2.0. Oddly enough, I just borrowed this card today from a friend and managed to get a single gen in before this bug occurred. EDIT: It started generating the entire prompt in a couple of seconds, after waiting for 2 minutes. After that incident, my system became really sluggish. Prompts were generating again, but the speed was inconsistent |
Is this Windows or Linux? For me it was cut and dry, torch 2.0 doesn't work, torch 1.13.1 does. Perhaps check versions, etc? |
I'm on Ubuntu 22.04. And yes, it occurs with both versions of torch. The prompt loads for a minute or two, the first 90% of the gen gets done in a couple of seconds, it gets stuck at 97% again for a while, and then it finishes the prompt. Also my system seems to get really unstable after prompting, as if it's about to crash or blackscreen. Quite odd. EDIT: Tested again, now it only occurs on torch 2.0. Works alright on 1.13.1 besides the initial lag. |
I made a PR to force pytorch 1.13.1 for RX 5000 cards. It also checks for python <= 3.10 |
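The logic described in that comment can be sketched roughly as follows. This is not the PR's code (which is not quoted in the thread): the function name `torch_pip_args`, the "Navi 1" substring match on an lspci-style description, and the torchvision pin (0.14.1, the release paired with torch 1.13.1) are all assumptions made for illustration. Only the torch version and the rocm5.2 index URL come from the thread.

```python
import sys

# Index URL quoted earlier in the thread.
ROCM52_INDEX = "https://download.pytorch.org/whl/rocm5.2"


def torch_pip_args(gpu_description, python_version=None):
    """Return pip arguments pinning torch for RX 5000 cards, else None.

    gpu_description: a string such as an lspci VGA line (assumed format).
    python_version: (major, minor); defaults to the running interpreter.
    """
    if python_version is None:
        python_version = tuple(sys.version_info[:2])
    # Assumption: RX 5000 (RDNA1) cards show up as "Navi 10", "Navi 14", etc.
    if "Navi 1" not in gpu_description:
        return None  # keep whatever torch version is the default
    if tuple(python_version) > (3, 10):
        raise RuntimeError("the pinned torch 1.13.1 path expects Python <= 3.10")
    return [
        "install",
        "torch==1.13.1+rocm5.2",
        "torchvision==0.14.1+rocm5.2",  # assumed matching torchvision build
        "--index-url",
        ROCM52_INDEX,
    ]
```

Using `--index-url` rather than `--extra-index-url` follows the earlier suggestion in this thread, since it keeps pip from falling back to the default index for these packages.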
But still, why is only RX 5000 series soooo incompatible with torch 2.0?? |
That's a good question. My first guess is that we need to force HSA_OVERRIDE_GFX_VERSION to make it work, but that's also true for RX 6000, which is working just fine. Sooo.... Who knows. We can't even be really sure it's just RX 5000; maybe there are other series which have problems but no one has reported it yet |
HSA_OVERRIDE_GFX_VERSION is already forced though in the script for those cards - it was set correctly for me even when things weren't working. Perhaps Torch v2.0 needs a further workaround or something. I just hope code doesn't slip into the repo that's only torch 2.0 compatible, then we're in trouble. |
HSA_OVERRIDE_GFX_VERSION is already enabled by default in webui.sh since a couple releases I think,
Before this card, I ran SD on a RX 580 4GB which was a nightmare to get running. It didn't have this specific issue, but plenty of others problems that all boiled down to ROCm support. |
Yes, exactly. What I meant was that my first guess was about HSA_OVERRIDE_GFX_VERSION causing problems, but that can't be it, because the 6000 series also uses it without issues. |
Just out of curiosity, would there be any significant increase in performance on torch 2.0? Would be interesting to see someone on torch 2.0 with a 5700XT upload a benchmark, to compare to 1.13.1 |
It surely would, if we can manage to run it. Especially using --opt-sdp-attention. On AMD we can't use xformers, and that option would surely be a huge boost |
Related reports:
Really? So, was it wrong of me to buy an RX 5000 GPU? And should I sell it right now?? |
I believe it doesn't officially, but with the special override define it allows it to work. I'm using ROCm 5.2 on a Navi1.x card. |
No, it can still work with an older PyTorch and that override. And technically ROCm doesn't officially support any consumer-grade video card, even if they work just fine with it. |
The PR #11048 was merged into dev and release_candidate branches. |
Tried it with the new rocm5.5 torch release build in the pytorch nightly repo. The same problem is still present ... |
Can confirm that I have this issue too with my RX 5700 XT. Starting to regret ever buying that GPU, tbh.. Everything worked fine last time I was into using SD, sometime last year or so. |
I still have this issue with RX 5700 XT. Downgrade to 1.13.1 worked for me, although there is this delay at the beginning of picture creation. I cannot use the sd-xl-base checkpoint with it though... please @AUTOMATIC1111 fix this... |
That probably isn't something related to the Web UI; it's an issue in pytorch itself. Or maybe in ROCm. Anyway, I found this on pytorch's GitHub, probably related |
Indeed related, torch>=2.0.0 won't run on RDNA1 for now, even with torch wheel targeting |
Some time ago I found an old pytorch 2.0 build which runs on RX 5000: pytorch/pytorch#106728 (comment) |
Hi, Can someone please summarize what setup is needed to run a 5000 series GPU with Torch?
|
Linux version: Any one should be fine. Note: There has been some work in the past months to make the old GPUs work again on newer versions of ROCm and pytorch. The latest official pytorch wheels don't work on the 5000 series yet, but just a few days ago someone wrote under an issue on ROCm's GitHub that he managed to build the latest version from git. |
Well..... I guess they got deleted, eventually. It's a very old nightly build, after all. Luckily, I still had those wheels on my hard drive, so I uploaded them to a git repo: https://github.com/DGdev91/pythorch_wheels_rocm5.2 |
Thank you very much. I tried but it didn't work (torch can't work with gfx1010). I tried with ROCm 5.2, the latest ROCm, and Ubuntu 20, 22 and 24... I give up. I am selling my two 5700 XT GPUs and buying a 4070 Ti Super. |
Did you set the HSA override environment variable? export HSA_OVERRIDE_GFX_VERSION=10.3.0 |
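If the web UI or a test script is launched from Python instead of a shell, the same override can be passed through the child process environment. A minimal sketch, assuming the variable has to be present before torch initializes the ROCm runtime; `env_with_gfx_override` is a hypothetical helper name, and the value 10.3.0 is the one quoted above.

```python
import os


def env_with_gfx_override(base=None, gfx_version="10.3.0"):
    """Return a copy of the environment with HSA_OVERRIDE_GFX_VERSION set.

    An existing value is left alone, so a user's own setting wins.
    """
    env = dict(os.environ if base is None else base)
    env.setdefault("HSA_OVERRIDE_GFX_VERSION", gfx_version)
    return env


# Example use (not executed here):
#   import subprocess
#   subprocess.run(["./webui.sh"], env=env_with_gfx_override(), check=True)
```

As noted earlier in the thread, webui.sh is reported to set this variable for these cards already, so this mainly matters when running torch outside that script.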
Is there an existing issue for this?
What happened?
I have newly installed v1.3.0, but image generation won't start even after many minutes of pressing "Generate" button.
Steps to reproduce the problem
What should have happened?
Image generation should have started.
Commit where the problem happens
20ae71f
What Python version are you running on ?
Python 3.10.x
What platforms do you use to access the UI ?
Linux
What device are you running WebUI on?
AMD GPUs (RX 5000 below)
What browsers do you use to access the UI ?
Mozilla Firefox
Command Line Arguments
List of extensions
(None)
Console logs
Additional information
My environment: