CUBLAS_STATUS_ARCH_MISMATCH error while trying to run Lc0 v0.29 CUDNN backend #1820

Open
Clausable opened this issue Dec 22, 2022 · 5 comments

@Clausable

I have a GeForce GTX 760 GPU and 16 GB of RAM, and I'm trying to run Lc0 v0.29.0 with the cuDNN backend. Whenever I run "go nodes 1000", I get "Unhandled exception in worker thread: CUBLAS error: CUBLAS_STATUS_ARCH_MISMATCH (c:\projects\lc0\src\neural\cuda\layers.cc:1478)". Here's the console output:

       _
|   _ | |
|_ |_ |_| v0.29.0 built Dec 13 2022
go nodes 1000
Found pb network file: 791556.pb.gz
Creating backend [cudnn-auto]...
Switching to [cudnn]...
CUDA Runtime version: 10.0.0
Cudnn version: 7.4.2
Latest version of CUDA supported by the driver: 11.4.0
GPU: NVIDIA GeForce GTX 760
GPU memory: 2 GiB
GPU clock frequency: 1058.5 MHz
GPU compute capability: 3.0
Unhandled exception in worker thread: CUBLAS error: CUBLAS_STATUS_ARCH_MISMATCH (c:\projects\lc0\src\neural\cuda\layers.cc:1478)

This doesn't happen with the v0.29 prerelease version (v0.29.0-rc0), which searches normally:

       _
|   _ | |
|_ |_ |_| v0.29.0-rc0 built Apr  3 2022
Detected 4 core(s) and 8 thread(s) in 1 group(s).
Group 0 has 4 core(s) and 8 thread(s).
go nodes 1000
Found pb network file: 320x24-2020_1206_2112_58_298.pb
Creating backend [cudnn-auto]...
Switching to [cudnn]...
CUDA Runtime version: 10.0.0
Cudnn version: 7.4.2
Latest version of CUDA supported by the driver: 11.4.0
GPU: NVIDIA GeForce GTX 760
GPU memory: 2 GiB
GPU clock frequency: 1058.5 MHz
GPU compute capability: 3.0
info depth 1 seldepth 2 time 1342 nodes 2 score cp 9 nps 25 tbhits 0 pv d2d4 g8f6
info depth 2 seldepth 3 time 1504 nodes 8 score cp 9 nps 32 tbhits 0 pv d2d4 g8f6 c2c4
info depth 2 seldepth 4 time 1672 nodes 16 score cp 9 nps 38 tbhits 0 pv d2d4 g8f6 c2c4 e7e6
info depth 3 seldepth 4 time 1784 nodes 24 score cp 9 nps 45 tbhits 0 pv d2d4 g8f6 c2c4 e7e6
info depth 3 seldepth 5 time 1950 nodes 33 score cp 10 nps 47 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3
info depth 3 seldepth 6 time 2054 nodes 37 score cp 10 nps 46 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3
info depth 3 seldepth 7 time 2188 nodes 51 score cp 10 nps 55 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3
info depth 3 seldepth 8 time 2377 nodes 114 score cp 10 nps 102 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3 f8b4
info depth 4 seldepth 8 time 2484 nodes 160 score cp 10 nps 130 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3 f8b4
info depth 4 seldepth 9 time 2734 nodes 238 score cp 10 nps 161 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3 f8b4 c1d2
info depth 4 seldepth 10 time 4559 nodes 341 score cp 10 nps 103 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3 f8b4 c1d2
info depth 4 seldepth 11 time 5405 nodes 660 score cp 10 nps 159 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3 f8b4 c1d2 b4e7
info depth 5 seldepth 11 time 6155 nodes 895 score cp 10 nps 182 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3 f8b4 c1d2 b4e7 d1c2
bestmove d2d4 ponder g8f6
@borg323
Member

borg323 commented Dec 23, 2022

Is v0.29.0 working with the same net you used for v0.29.0-rc0?

@Clausable
Author

Clausable commented Dec 23, 2022

Now it is!

       _
|   _ | |
|_ |_ |_| v0.29.0 built Dec 13 2022
go nodes 1000
Found pb network file: 320x24-2020_1206_2112_58_298.pb
Creating backend [cudnn-auto]...
Switching to [cudnn]...
CUDA Runtime version: 10.0.0
Cudnn version: 7.4.2
Latest version of CUDA supported by the driver: 11.4.0
GPU: NVIDIA GeForce GTX 760
GPU memory: 2 GiB
GPU clock frequency: 1058.5 MHz
GPU compute capability: 3.0
info depth 1 seldepth 2 time 1563 nodes 2 score cp 9 nps 21 tbhits 0 pv d2d4 g8f6
info depth 2 seldepth 3 time 1742 nodes 8 score cp 9 nps 29 tbhits 0 pv d2d4 g8f6 c2c4
info depth 2 seldepth 4 time 1915 nodes 16 score cp 9 nps 36 tbhits 0 pv d2d4 g8f6 c2c4 e7e6
info depth 3 seldepth 4 time 2027 nodes 24 score cp 9 nps 43 tbhits 0 pv d2d4 g8f6 c2c4 e7e6
info depth 3 seldepth 5 time 2177 nodes 30 score cp 10 nps 42 tbhits 0 pv d2d4 g8f6 c2c4 c7c6 g1f3
info depth 3 seldepth 6 time 2298 nodes 37 score cp 10 nps 44 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3
info depth 3 seldepth 7 time 2432 nodes 51 score cp 10 nps 53 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3
info depth 3 seldepth 8 time 2621 nodes 114 score cp 10 nps 99 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3 f8b4
info depth 4 seldepth 8 time 2728 nodes 160 score cp 10 nps 127 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3 f8b4
info depth 4 seldepth 9 time 2975 nodes 238 score cp 10 nps 158 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3 f8b4 c1d2
info depth 4 seldepth 10 time 3642 nodes 459 score cp 10 nps 211 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3 f8b4 c1d2 b4e7
info depth 5 seldepth 11 time 4622 nodes 812 score cp 10 nps 257 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3 f8b4 c1d2 b4e7 d1c2
info depth 5 seldepth 11 time 4816 nodes 889 score cp 10 nps 265 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3 f8b4 c1d2 b4e7 d1c2
bestmove d2d4 ponder g8f6

So I guess the neural network that comes with Lc0 v0.29 (791556.pb.gz) doesn't work on my machine for some reason.
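
The comparison behind that guess can be made explicit. Below is a minimal Python sketch that pulls the startup diagnostics out of two lc0 console logs and lists the fields that differ. The helper name `startup_fields` and the abbreviated log excerpts are illustrative assumptions, not part of lc0; the values are copied from the failing and working v0.29.0 runs in this thread.

```python
def startup_fields(log: str) -> dict:
    """Collect selected diagnostic lines that lc0 prints before searching."""
    fields = {}
    for line in log.splitlines():
        line = line.strip()
        if line.startswith("|_ |_ |_|"):
            # The last line of the ASCII logo carries the version and build date.
            fields["build"] = line[len("|_ |_ |_|"):].strip()
        elif line.startswith("Found pb network file:"):
            fields["network"] = line.split(":", 1)[1].strip()
        elif line.startswith(
            ("CUDA Runtime version", "Cudnn version", "GPU compute capability")
        ):
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


# Abbreviated excerpts of the two v0.29.0 runs (hypothetical variable names).
failing = """\
|_ |_ |_| v0.29.0 built Dec 13 2022
Found pb network file: 791556.pb.gz
CUDA Runtime version: 10.0.0
Cudnn version: 7.4.2
GPU compute capability: 3.0
"""
working = """\
|_ |_ |_| v0.29.0 built Dec 13 2022
Found pb network file: 320x24-2020_1206_2112_58_298.pb
CUDA Runtime version: 10.0.0
Cudnn version: 7.4.2
GPU compute capability: 3.0
"""

a, b = startup_fields(failing), startup_fields(working)
differing = sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
print(differing)  # ['network']
```

With the same binary, CUDA runtime, cuDNN version, and GPU, the network file is the only field that changes between the crashing run and the working one.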

@borg323
Member

borg323 commented Dec 23, 2022

It is far more likely there is an issue with the new nets on cards with compute capability 3.0. I'll keep this issue report open while we investigate.

borg323 reopened this Dec 23, 2022
@borg323
Member

borg323 commented Dec 23, 2022

Can you try this executable? https://ci.appveyor.com/api/buildjobs/tekvtsr36mdby89j/artifacts/build%2Flc0.exe
Update: https://ci.appveyor.com/api/buildjobs/jyf4lebup28gi3l4/artifacts/build%2Flc0.exe is the one to test.
Use 791556.pb.gz as the network file.

@Clausable
Author

Clausable commented Dec 25, 2022

It runs, but it seems to run unusually quickly.

       _
|   _ | |
|_ |_ |_| v0.30.0-dev+git.bd7566f built Dec 23 2022
go nodes 1000
Found pb network file: 791556.pb.gz
Creating backend [cudnn-auto]...
Switching to [cudnn]...
CUDA Runtime version: 10.0.0
Cudnn version: 7.4.2
Latest version of CUDA supported by the driver: 11.4.0
GPU: NVIDIA GeForce GTX 760
GPU memory: 2 GiB
GPU clock frequency: 1058.5 MHz
GPU compute capability: 3.0
info depth 1 seldepth 2 time 1029 nodes 2 score cp 15 nps 66 tbhits 0 pv e2e4 e7e5
info depth 2 seldepth 3 time 1078 nodes 6 score cp 14 nps 76 tbhits 0 pv e2e4 e7e5 g1f3
info depth 2 seldepth 4 time 1114 nodes 10 score cp 14 nps 87 tbhits 0 pv e2e4 e7e5 g1f3 g8f6
info depth 3 seldepth 5 time 1143 nodes 17 score cp 14 nps 118 tbhits 0 pv e2e4 e7e5 g1f3 g8f6 d2d4
info depth 3 seldepth 6 time 1195 nodes 34 score cp 15 nps 174 tbhits 0 pv e2e4 e7e5 g1f3 g8f6 d2d4 f6e4
info depth 3 seldepth 7 time 1210 nodes 37 score cp 15 nps 176 tbhits 0 pv e2e4 e7e5 g1f3 g8f6 d2d4 f6e4 f1d3
info depth 3 seldepth 8 time 1219 nodes 38 score cp 14 nps 173 tbhits 0 pv e2e4 e7e5 g1f3 g8f6 d2d4 f6e4 f1d3 d7d5
info depth 4 seldepth 9 time 1259 nodes 75 score cp 15 nps 288 tbhits 0 pv e2e4 e7e5 g1f3 g8f6 d2d4 f6e4 f1d3 d7d5 f3e5
info depth 4 seldepth 10 time 1316 nodes 127 score cp 15 nps 401 tbhits 0 pv e2e4 e7e5 g1f3 g8f6 d2d4 f6e4 f1d3 d7d5 f3e5 b8d7
info depth 4 seldepth 11 time 1378 nodes 206 score cp 14 nps 544 tbhits 0 pv e2e4 e7e5 g1f3 g8f6 d2d4 f6e4 f3e5 d7d6 e5f3 d6d5
info depth 5 seldepth 12 time 1476 nodes 307 score cp 15 nps 643 tbhits 0 pv e2e4 e7e5 g1f3 g8f6 d2d4 f6e4 f1d3 d7d5 f3e5 b8d7 e5d7 c8d7
info depth 5 seldepth 13 time 1593 nodes 453 score cp 15 nps 763 tbhits 0 pv e2e4 e7e5 g1f3 g8f6 d2d4 f6e4 f1d3 d7d5 f3e5 b8d7 b1c3 d7e5 d4e5
info depth 5 seldepth 14 time 1762 nodes 668 score cp 15 nps 876 tbhits 0 pv e2e4 e7e5 g1f3 g8f6 d2d4 f6e4 f1d3 d7d5 f3e5 b8d7 b1c3 d7e5 d4e5 f8b4
info depth 6 seldepth 15 time 1905 nodes 843 score cp 15 nps 931 tbhits 0 pv e2e4 e7e5 g1f3 g8f6 d2d4 f6e4 f1d3 d7d5 f3e5 b8d7 b1c3 d7e5 d4e5 f8b4 e1g1
info depth 6 seldepth 16 time 1975 nodes 903 score cp 15 nps 926 tbhits 0 pv e2e4 e7e5 g1f3 g8f6 d2d4 f6e4 f1d3 d7d5 f3e5 b8d7 b1c3 d7e5 d4e5 f8b4 e1g1 e4c3
info depth 6 seldepth 17 time 2035 nodes 939 score cp 15 nps 907 tbhits 0 pv e2e4 e7e5 g1f3 g8f6 d2d4 f6e4 f1d3 d7d5 f3e5 b8d7 b1c3 d7e5 d4e5 f8b4 e1g1 e4c3 b2c3
bestmove e2e4 ponder e7e5
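
To put a number on "unusually quickly", here is a minimal Python sketch that reads the engine-reported `nps` from the last UCI `info` line of each run in this thread. The helper name `final_stats` and the run labels are illustrative assumptions; the `info` lines are copied verbatim from the logs above. Note that the runs use different network files, so this is not a like-for-like speed comparison.

```python
def final_stats(log: str) -> dict:
    """Return the integer fields of the last UCI `info` line in a log."""
    last = None
    for line in log.splitlines():
        if line.startswith("info "):
            last = line
    if last is None:
        raise ValueError("no info line found")
    tokens = last.split()
    # Each key is followed by its value as the next whitespace-separated token.
    return {
        key: int(tokens[tokens.index(key) + 1])
        for key in ("depth", "seldepth", "time", "nodes", "nps")
    }


# Last `info` line of each run (labels are hypothetical).
runs = {
    "v0.29.0-rc0, 320x24 net": "info depth 5 seldepth 11 time 6155 nodes 895 score cp 10 nps 182 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3 f8b4 c1d2 b4e7 d1c2",
    "v0.29.0, 320x24 net": "info depth 5 seldepth 11 time 4816 nodes 889 score cp 10 nps 265 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 g2g3 f8b4 c1d2 b4e7 d1c2",
    "v0.30.0-dev, 791556 net": "info depth 6 seldepth 17 time 2035 nodes 939 score cp 15 nps 907 tbhits 0 pv e2e4 e7e5 g1f3 g8f6 d2d4 f6e4 f1d3 d7d5 f3e5 b8d7 b1c3 d7e5 d4e5 f8b4 e1g1 e4c3 b2c3",
}

nps = {name: final_stats(line)["nps"] for name, line in runs.items()}
for name, value in nps.items():
    print(f"{name}: {value} nps")
```

By the engine's own figures, the test build with 791556.pb.gz finishes at 907 nps, against 265 nps and 182 nps for the two runs with the 320x24 network.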
