Fix issue with 16xx cards #4407
Conversation
cudnn
This is a fix I've seen floated around on threads for a while now, and it's a curious one. Enabling cuDNN shouldn't have any effect, since cuDNN is enabled by default whenever it's available. So, logically, only benchmark should be fixing this issue (and that seems more like a bug with PyTorch, to be honest). Could anybody with a 16xx card test enabling only benchmarking? In any case, you should gate benchmark enablement behind an SM or device-name check for 16xx cards, since enabling benchmarking has highly variable results. It degraded performance on my 3080, for example.
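For anyone running that isolated test, a minimal sketch is below (the attributes are the standard `torch.backends.cudnn` flags; where exactly this would go in the webui startup code is an assumption):

```python
import torch

# Only the benchmark flag, which is the part suspected to matter:
# cuDNN autotunes convolution algorithms and caches the fastest one.
torch.backends.cudnn.benchmark = True
```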
Just added the 2 lines on a GTX 1660 SUPER (6 GB). And indeed, I can start without command-line parameters and the image I get is fine (not black). Performance went from 1.5 iterations/sec down to 2.5 sec/iteration. With the same 2 lines active, but started with --no-half and --precision full, the performance is back to normal.
I had a 1650 until recently and it worked fine with just "--medvram"; I didn't need to use --no-half or the like. (I'm on Linux)
Maybe you could check which GPU is in use, if that's even possible, to filter which cards should get this fix.
I'm a 1660 user and I use this fix in order to run it; and yeah, if there's a way to check whether the GPU is a 16xx card, I'll try to implement it. I haven't found one yet.
This might add a bit of time on startup, since it loops over every card and checks its name against a list of Turing cards, but it's better for long-run performance.
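A rough sketch of the kind of name check described above, assuming plain PyTorch device queries; the list of 16xx name fragments is illustrative rather than exhaustive, and the helper name is made up:

```python
import torch

# Illustrative name fragments for GTX 16xx (Turing without tensor cores) cards.
SIXTEEN_SERIES_NAMES = ["GTX 1630", "GTX 1650", "GTX 1660"]

def has_16xx_card() -> bool:
    """Return True if any visible CUDA device looks like a GTX 16xx-class card."""
    if not torch.cuda.is_available():
        return False
    for i in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(i)
        if any(fragment in name for fragment in SIXTEEN_SERIES_NAMES):
            return True
    return False

# Only flip the flag for affected cards, so other GPUs keep default behaviour.
if has_16xx_card():
    torch.backends.cudnn.benchmark = True
```

The loop over `torch.cuda.device_count()` runs once at startup, so the cost mentioned in the comment should be negligible.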
Why are the 20xx cards in the list though? They work fine now, and judging by other replies this change would just tank performance for no reason.
Some 20xx cards are Turing.
thanks C43H66N12O12S2
I can confirm this fix works for me on a 1660 SUPER. Until now I've had to use the args "--precision full" and "--no-half", otherwise I get black images. With this change made I no longer see black images even without those args. (In both cases I am also using "--medvram" and "--xformers".) It looks like @C43H66N12O12S2 was correct that it is the benchmarking change that is fixing this. I commented out "torch.backends.cudnn.enabled = True" and still saw the fix work, so I guess that line can be removed from this change unless it has some other effect.
benchmark=True is the only thing that has an effect, yes. And as far as I know it improves performance if anything, at least from the second generation onwards, once the benchmarking has already been done. By the way, calculations with 16-bit floats are extremely slow on 16xx cards, so even with this fix you should always be using --no-half anyway unless you're truly desperate for VRAM. Might be worth updating the documentation accordingly. (Although I don't know exactly which set of cards has fast 16-bit and which doesn't.)
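For checking empirically whether a given card has fast 16-bit, a rough micro-benchmark sketch is below; the sizes and iteration counts are arbitrary, and a matmul is only a proxy for the convolution-heavy UNet workload:

```python
import time
import torch

def avg_matmul_time(dtype, size=4096, iters=20):
    """Average time per large matmul at the given dtype on the default CUDA device."""
    a = torch.randn(size, size, device="cuda", dtype=dtype)
    b = torch.randn(size, size, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    return (time.time() - start) / iters

print("fp32:", avg_matmul_time(torch.float32))
print("fp16:", avg_matmul_time(torch.float16))
# If fp16 is not clearly faster here, keep using --no-half on this card.
```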
…essary cudnn.enabled" This reverts commit 46b0d23.
…an ones related to the PR" This reverts commit 2651267.
@yoinked-h @C43H66N12O12S2 Maybe I need
I'll try it out with torch 2.
According to some tutorial websites, it seems that only the 16 series has this problem of not working.
Yep; tensor cores are the main reason the 20xx series handles fp16 normally, the 16xx cards don't get that comfort.
16xx cards don't properly support FP16 natively, but with this simple workaround they do work without --precision full and --no-half.
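For reference, the two lines discussed throughout the thread boil down to the following; per the comments above, benchmark is the one that actually matters, and enabled is already the cuDNN default:

```python
import torch

torch.backends.cudnn.enabled = True    # cuDNN default when available; likely redundant
torch.backends.cudnn.benchmark = True  # the flag that appears to fix black images on 16xx cards
```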