CUBLAS_STATUS_NOT_INITIALIZED #164

Open
gpwood opened this issue Jun 4, 2024 · 5 comments

@gpwood

gpwood commented Jun 4, 2024

Hello, I just installed this package on an A10G with CUDA 12:

    [gwood@gaia-single-gpu-dy-g5-4xlarge-1 ~]$ nvidia-smi
    Tue Jun  4 12:15:18 2024       
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
    |                                         |                      |               MIG M. |
    |=========================================+======================+======================|
    |   0  NVIDIA A10G                    On  | 00000000:00:1E.0 Off |                    0 |
    |  0%   24C    P8              22W / 300W |      4MiB / 23028MiB |      0%      Default |
    |                                         |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+
                                                                                             
    +---------------------------------------------------------------------------------------+
    | Processes:                                                                            |
    |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
    |        ID   ID                                                             Usage      |
    |=======================================================================================|
    |  No running processes found                                                           |
    +---------------------------------------------------------------------------------------+

when I run a simple example:

import pyscf
from pyscf.dft import rks

atom ='''
O       0.0000000000    -0.0000000000     0.1174000000
H      -0.7570000000    -0.0000000000    -0.4696000000
H       0.7570000000     0.0000000000    -0.4696000000
'''

mol = pyscf.M(atom=atom, basis='def2-tzvpp')
mf = rks.RKS(mol, xc='LDA').density_fit().to_gpu()  # move PySCF object to GPU4PySCF object
e_dft = mf.kernel()  # compute total energy

I get the following error:

         ~~~~~~^~~~~~~~~~~~~~~~~~
  File "cupy/_core/core.pyx", line 1289, in cupy._core.core._ndarray_base.__matmul__
  File "cupy/_core/_routines_linalg.pyx", line 846, in cupy._core._routines_linalg.matmul
  File "cupy/_core/_routines_linalg.pyx", line 536, in cupy._core._routines_linalg.dot
  File "cupy/_core/_routines_linalg.pyx", line 626, in cupy._core._routines_linalg.tensordot_core
  File "cupy/_core/_routines_linalg.pyx", line 763, in cupy._core._routines_linalg.tensordot_core_v11
  File "cupy_backends/cuda/libs/cublas.pyx", line 1426, in cupy_backends.cuda.libs.cublas.gemmEx
  File "cupy_backends/cuda/libs/cublas.pyx", line 1454, in cupy_backends.cuda.libs.cublas.gemmEx
  File "cupy_backends/cuda/libs/cublas.pyx", line 438, in cupy_backends.cuda.libs.cublas.check_status
cupy_backends.cuda.libs.cublas.CUBLASError: CUBLAS_STATUS_NOT_INITIALIZED

I'm running Python 3.11.9, any ideas?

@wxj6000
Collaborator

wxj6000 commented Jun 4, 2024

This could be an incompatibility between the CUDA driver, the CUDA toolkit, and cupy. Can you also post the output of nvcc --version?

@gpwood
Author

gpwood commented Jun 4, 2024

This is the output:

(/exs/shared/collaboration/teams/qmteam/shared/gwood/pyqc/.venv) [gwood@gaia-single-gpu-dy-g5-4xlarge-1 pyqc]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

I'm loading it through spack

 spack load --best-arch cuda

@wxj6000
Collaborator

wxj6000 commented Jun 4, 2024

It seems that you are using CUDA toolkit v11 (nvcc reports release 11.1), so you will need to install gpu4pyscf-cuda11x instead. Thank you for your feedback. We should make the CUDA version requirement clearer in the installation instructions.
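As a rough illustration of the check described above, the following sketch reads the toolkit release from `nvcc --version` output and maps its major version to a package name. The helper names (`toolkit_major`, `suggested_package`) and the mapping table are hypothetical and written only for this example; the two package names are the ones mentioned in this thread, and the actual requirement may depend on more than the nvcc release.

```python
import re

# Package names as mentioned in this thread; the mapping itself is an
# illustrative assumption, not an official compatibility table.
PACKAGES = {11: "gpu4pyscf-cuda11x", 12: "gpu4pyscf-cuda12x"}


def toolkit_major(nvcc_output: str) -> int:
    """Extract the CUDA toolkit major version from `nvcc --version` text."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' entry found in nvcc output")
    return int(match.group(1))


def suggested_package(nvcc_output: str) -> str:
    """Map the toolkit major version to the matching package name."""
    major = toolkit_major(nvcc_output)
    if major not in PACKAGES:
        raise ValueError(f"no package listed here for CUDA {major}")
    return PACKAGES[major]


# The relevant line from the nvcc output posted earlier in this thread.
sample = "Cuda compilation tools, release 11.1, V11.1.105"
print(suggested_package(sample))  # prints gpu4pyscf-cuda11x
```

On a real machine the text would come from running `nvcc --version`; the sample string here is taken from the output posted above so that the example runs without a CUDA installation.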

@gpwood
Author

gpwood commented Jun 4, 2024

OK, thank you. Does the installation of gpu4pyscf include cupy? I followed the install instructions as written, using the cuda11 versions as recommended, but now I get this error:

  File "/exs/shared/collaboration/teams/qmteam/shared/gwood/pyqc/.venv/lib/python3.11/site-packages/gpu4pyscf/lib/diis.py", line 25, in <module>
    import cupy
ModuleNotFoundError: No module named 'cupy'

I've just tried to install this with pip3 but it fails:

        File "/tmp/pip-install-ct08rl2u/cupy_03a99d85276d427f869a5ec942870dbd/install/cupy_builder/_compiler.py", line 148, in _nvcc_gencode_options
          assert False
                 ^^^^^
      AssertionError
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for cupy
  Running setup.py clean for cupy
Failed to build cupy
ERROR: Could not build wheels for cupy, which is required to install pyproject.toml-based projects

@wxj6000
Collaborator

wxj6000 commented Jun 4, 2024

@gpwood GPU4PySCF does include cupy as a dependency. Since you had previously installed gpu4pyscf-cuda12x, pip probably did not install cupy again. You will need to completely uninstall gpu4pyscf-cuda12x and cupy-cuda12x via

pip3 uninstall gpu4pyscf-cuda12x
pip3 uninstall cupy-cuda12x

Then run pip3 install gpu4pyscf-cuda11x.

If you want to install cupy on its own, use pip3 install cupy-cuda11x. A plain pip3 install cupy builds cupy from its source code, which will generally fail.
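To see whether leftovers from an earlier install are still present before reinstalling, something like the sketch below can list the cupy and gpu4pyscf distributions in the active environment. It uses only the standard library's `importlib.metadata`; the helper names (`gpu_stack_packages`, `installed_names`) are hypothetical and written for this example.

```python
from importlib import metadata


def gpu_stack_packages(names):
    """Return the cupy / gpu4pyscf distribution names from `names`, sorted."""
    prefixes = ("cupy", "gpu4pyscf")
    return sorted({n for n in names if n and n.lower().startswith(prefixes)})


def installed_names():
    """Names of all distributions visible in the current environment."""
    return [dist.metadata["Name"] for dist in metadata.distributions()]


found = gpu_stack_packages(installed_names())
print(found)
```

If the list mixes cuda11x and cuda12x variants, or contains both a plain cupy and a cupy-cuda11x entry, that would suggest the uninstall step above has not fully taken effect in this environment.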
