Fix issue with 16xx cards #4407

Merged · 3 commits merged into master from patch-1 on Dec 3, 2022

Conversation

yoinked-h
Contributor

16xx cards don't handle half precision (FP16) properly out of the box, but with this simple workaround they work without --precision full and --no-half.

@C43H66N12O12S2
Collaborator

This is a fix I've seen floated around on threads for a while now, and it's a curious one. Enabling cuDNN shouldn't have any effect, as cuDNN is enabled by default in all cases if available.

So, logically, only benchmark should be fixing this issue (and that seems more like a bug with PyTorch, tbh). Could anybody with a 16xx card test enabling only benchmarking?

Anyhow, you should gate benchmark enablement behind an SM check for 16xx cards, as enabling benchmarking has highly variable results. It degraded performance on my 3080, for example.
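
(For reference, a minimal sketch of what "enabling only benchmarking" would look like; this is illustrative and not the exact diff in this PR.)

```python
import torch

if torch.cuda.is_available():
    # Let cuDNN benchmark and cache the fastest convolution algorithms.
    torch.backends.cudnn.benchmark = True
```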

@TKoestlerx

Just added the two lines on a GTX 1660 Super (6 GB).

And indeed, I can start without command-line parameters and the image I get is fine (not black), but the performance absolutely collapses.

From 1.5 iterations/sec to 2.5 sec/iteration.

With the same two lines active but started with --no-half and --precision full, the performance is back to normal.

@drax-xard

I had a 1650 until recently and it worked fine with just "--medvram"; I didn't need to use --no-half or the like. (I'm on Linux.)

@yoinked-h
Contributor Author

Maybe we could check which GPU is enabled, if that's even possible, to filter which cards should get this.

@yoinked-h
Contributor Author

yoinked-h commented Nov 8, 2022

> This is a fix I've seen floated around on threads for a while now, and it's a curious one. Enabling cuDNN shouldn't have any effect, as cuDNN is enabled by default in all cases if available.
>
> So, logically, only benchmark should be fixing this issue (and that seems more like a bug with PyTorch, tbh). Could anybody with a 16xx card test enabling only benchmarking?
>
> Anyhow, you should gate benchmark enablement behind an SM check for 16xx cards, as enabling benchmarking has highly variable results. It degraded performance on my 3080, for example.

I am a 1660 user and I use this fix to run it. And yeah, if there is a way to check whether the GPU is a 16xx card, I'll try to implement it; I haven't found one yet.

@yoinked-h yoinked-h marked this pull request as draft November 8, 2022 01:16
@yoinked-h yoinked-h marked this pull request as ready for review November 8, 2022 02:10
@yoinked-h
Contributor Author

This might take more time on startup, since it loops over every card and over a list of Turing card names to check against, but it's better for long-run performance.

@C43H66N12O12S2
Collaborator

`torch.cuda.get_device_capability(device) == (7, 5)`
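
(For reference, a hedged sketch of how that capability check could gate the two settings; this is illustrative and may not match the exact diff in this PR.)

```python
import torch

def has_sm75_device() -> bool:
    # Compute capability (7, 5) is Turing, which includes the GTX 16xx series.
    return any(
        torch.cuda.get_device_capability(d) == (7, 5)
        for d in range(torch.cuda.device_count())
    )

if torch.cuda.is_available() and has_sm75_device():
    torch.backends.cudnn.enabled = True    # already the default when cuDNN is available
    torch.backends.cudnn.benchmark = True  # the setting that appears to fix the black images
```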

@XiteSDF

XiteSDF commented Nov 8, 2022

Why are the 20xx cards in the list though? They work fine now, and judging by other replies this change would just tank performance for no reason.

@yoinked-h
Contributor Author

> Why are the 20xx cards in the list though? They work fine now, and judging by other replies this change would just tank performance for no reason.

Some 20xx cards are Turing too. That said, as mentioned by C43H66N12O12S2, I'll implement the better solution.

@yoinked-h yoinked-h marked this pull request as draft November 8, 2022 23:17
thanks C43H66N12O12S2
@yoinked-h yoinked-h marked this pull request as ready for review November 8, 2022 23:19
@JackCopland

I can confirm this fix works for me on a 1660 SUPER. Until now I've had to use the args "--precision full" and "--no-half", otherwise I get black images. With this change made, I no longer see black images even without those args. (In both cases I am also using "--medvram" and "--xformers".)

It looks like @C43H66N12O12S2 was correct that it is the benchmarking change that is fixing this. I commented out "torch.backends.cudnn.enabled = True" and still saw this fix work. I guess that line can be removed from this change unless it has some other effect.

@AUTOMATIC1111 AUTOMATIC1111 merged commit 681c000 into AUTOMATIC1111:master Dec 3, 2022
@MrCheeze
Contributor

MrCheeze commented Dec 3, 2022

> This is a fix I've seen floated around on threads for a while now, and it's a curious one. Enabling cuDNN shouldn't have any effect, as cuDNN is enabled by default in all cases if available.
>
> So, logically, only benchmark should be fixing this issue (and that seems more like a bug with PyTorch, tbh). Could anybody with a 16xx card test enabling only benchmarking?
>
> Anyhow, you should gate benchmark enablement behind an SM check for 16xx cards, as enabling benchmarking has highly variable results. It degraded performance on my 3080, for example.

`benchmark=True` is the only thing that has an effect, yes. And as far as I know, it improves performance if anything, at least from the second generation onwards, once the benchmarking has already been done.

By the way, calculations with 16-bit floats are extremely slow on 16xx cards, so even with this fix you should always be using --no-half anyway unless you're truly desperate for vram. Might be worth updating the documentation accordingly. (Although I don't know exactly which set of cards has fast 16-bit and which set doesn't.)

@yoinked-h yoinked-h deleted the patch-1 branch December 6, 2022 05:48
Vermiliond added a commit to Vermiliond/stable-diffusion-webui that referenced this pull request Dec 9, 2022
Vermiliond added a commit to Vermiliond/stable-diffusion-webui that referenced this pull request Dec 9, 2022
…an ones related to the PR"
This reverts commit 2651267.
Vermiliond added a commit to Vermiliond/stable-diffusion-webui that referenced this pull request Dec 9, 2022
@pinyangcong

@yoinked-h @C43H66N12O12S2 Maybe I need `torch.backends.cudnn.benchmark_limit = 0`, because the default number of convolution algorithms benchmarked is small, which can still lead to the issue occurring on my 1650 card.
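
(A sketch of that combination, assuming a PyTorch version recent enough to expose `torch.backends.cudnn.benchmark_limit`; illustrative only.)

```python
import torch

if torch.cuda.is_available():
    torch.backends.cudnn.benchmark = True
    # 0 removes the cap on how many cuDNN convolution algorithms get benchmarked
    # (the default only tries a handful), per the suggestion above.
    torch.backends.cudnn.benchmark_limit = 0
```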

@yoinked-h
Contributor Author

I'll try it out with torch 2.

@pinyangcong

> I'll try it out with torch 2.

According to some tutorial websites, it seems that only the 16 series has this issue, so
`if any(["GeForce GTX 16" in torch.cuda.get_device_name(devid) for devid in range(0, torch.cuda.device_count())]):`
may be better than
`if any([torch.cuda.get_device_capability(devid) == (7, 5) for devid in range(0, torch.cuda.device_count())]):`

@yoinked-h
Contributor Author

Yep; tensor cores are the main reason the 20xx series handles FP16 fine, the 16xx series doesn't get that comfort.
