CUDNN_STATUS_INTERNAL_ERROR on 12.2? #56

Closed
ityonemo opened this issue Oct 18, 2023 · 9 comments

Comments

@ityonemo
Contributor

I had some serious struggles with CUDA 11.8 (EXLA 0.6 fails on this platform), so I upgraded to CUDA 12, but I wound up with 12.2:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jul_11_02:20:44_PDT_2023
Cuda compilation tools, release 12.2, V12.2.128
Build cuda_12.2.r12.2/compiler.33053471_0
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.113.01             Driver Version: 535.113.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10                     On  | 00000000:06:00.0 Off |                    0 |
|  0%   30C    P8              17W / 150W |     18MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1699      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+

This seems to cause CUDNN_STATUS_INTERNAL_ERROR.

Downgrading to 11.8 and EXLA 0.5 works, but other libraries (e.g. Bumblebee) fail on EXLA 0.5.

@seanmor5
Contributor

Can you send the logs around that internal error?

@josevalim
Contributor

And please include your relevant XLA_TARGETs :)

@ityonemo
Contributor Author

Thanks! Which logs should I send?

XLA_TARGET=cuda120

@josevalim
Contributor

Thank you, and what is the CUDNN version?

@josevalim
Contributor

You can also try building XLA from source and see if you have better luck.

@ityonemo
Contributor Author

ityonemo commented Oct 18, 2023

Unfortunately, building XLA from source stopped with the error "Inconsistent CUDA toolkit path: /usr vs /usr/lib", possibly because I switched from 11.8 to 12.x?

@ityonemo
Contributor Author

I actually can't figure out how to find out what cuDNN version I have directly. Some instructions on how to determine these in the README might be helpful. I'll make a PR. Also, a lot of people don't know this, but nvidia-smi will lie about the CUDA version (the only way to know for sure is nvcc -V).
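
To make the nvidia-smi vs nvcc point concrete: the "CUDA Version" in the nvidia-smi header is the highest CUDA version the installed driver supports, while nvcc -V reports the toolkit that is actually installed, so the two can differ. Below is a minimal sketch that extracts both numbers from captured command output. The function names are made up for illustration, and the sample strings are copied from the output quoted earlier in this thread.

```python
import re

# Sample lines copied from the nvcc and nvidia-smi output quoted in this thread.
NVCC_SAMPLE = "Cuda compilation tools, release 12.2, V12.2.128"
SMI_SAMPLE = (
    "| NVIDIA-SMI 535.113.01             Driver Version: 535.113.01"
    "   CUDA Version: 12.2     |"
)


def toolkit_version(nvcc_output):
    """Return (major, minor) of the installed toolkit from `nvcc -V` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def driver_cuda_version(smi_output):
    """Return (major, minor) from the 'CUDA Version' field of nvidia-smi, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


print("toolkit (nvcc -V):", toolkit_version(NVCC_SAMPLE))
print("driver supports up to (nvidia-smi):", driver_cuda_version(SMI_SAMPLE))
```

On a real machine the two strings would come from running the commands themselves (for example via subprocess); here both happen to report 12.2, but after a partial upgrade or purge they may not match.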

@jonatanklosko
Member

jonatanklosko commented Oct 18, 2023

@ityonemo which OS do you use? On Debian/Ubuntu you can usually find the cuDNN package version with apt-cache policy libcudnn8.

@ityonemo
Contributor Author

Unable to locate package libcudnn8

I guess I don't have cuDNN installed. Or I might have accidentally wiped it when I purged 11.8 =(

OK, thanks. I think we can close this. I'll reopen if I install cuDNN and can't get it working.
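
For checking cuDNN without going through the package manager, one option is to read the version macros from cudnn_version.h, which cuDNN 8 and later ship. This is only a minimal sketch: the header path /usr/include/cudnn_version.h, the sample header values, and the function names are illustrative assumptions, not something confirmed in this thread.

```python
import re
from pathlib import Path

# Assumed location; package and tarball installs may place the header elsewhere.
ASSUMED_HEADER = Path("/usr/include/cudnn_version.h")

# Hypothetical header excerpt, used only to demonstrate the parsing.
SAMPLE_HEADER = """
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 9
#define CUDNN_PATCHLEVEL 2
"""


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from cudnn_version.h text, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            rf"^\s*#define\s+{name}\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def installed_cudnn_version(path=ASSUMED_HEADER):
    """Read the header at `path`; return None if it is missing or unparseable."""
    try:
        return parse_cudnn_version(Path(path).read_text())
    except OSError:
        return None


print("parsed from sample header:", parse_cudnn_version(SAMPLE_HEADER))
print("found on this machine:", installed_cudnn_version())
```

A result of None from installed_cudnn_version() only means the header was not found at the assumed path; it does not prove the runtime library is absent.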
