"Could not create cudnn handle" error #79

Closed
gregszumel opened this issue Mar 15, 2024 · 3 comments

@gregszumel

Hi - I'm not 100% sure this is an EXLA error, but it's my best guess. I'm running into an issue when trying to do ops on tensors on CUDA (see below). Do you know what might be causing this? I've tried a few things (playing with :preallocate and :memory_fraction, reinstalling cuDNN, downgrading CUDA, etc.), but nothing has worked so far. I have verified that cuDNN was installed properly through here.

# running Nx -> 0.7.1, Exla -> 0.7.1, xla -> 0.6.0
iex(1)> t = Nx.tensor([1], backend: EXLA.Backend)

08:25:16.940 [info] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355

08:25:16.942 [info] XLA service <service> initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:

08:25:16.942 [info]   StreamExecutor device (0): NVIDIA RTX A6000, Compute Capability 8.6

08:25:16.942 [info] Using BFC allocator.

08:25:16.942 [info] XLA backend allocating 45932072140 bytes on device 0 for BFCAllocator.
#Nx.Tensor<
  s64[1]
  EXLA.Backend<cuda:0, 0.2762047049.2204500040.162304>
  [1]
>

iex(2)> Nx.add(t, t)

08:23:37.926 [error] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

08:23:37.926 [error] Memory usage: 4734255104 bytes free, 51035635712 bytes total.
** (RuntimeError) DNN library initialization failed. Look at the errors above for more details.
    (exla 0.7.1) lib/exla/mlir/module.ex:127: EXLA.MLIR.Module.unwrap!/1
    (exla 0.7.1) lib/exla/mlir/module.ex:113: EXLA.MLIR.Module.compile/5
    (stdlib 5.2.1) timer.erl:270: :timer.tc/2
    (exla 0.7.1) lib/exla/defn.ex:599: anonymous fn/12 in EXLA.Defn.compile/8
    (exla 0.7.1) lib/exla/mlir/context_pool.ex:10: anonymous fn/3 in EXLA.MLIR.ContextPool.checkout/1
    (nimble_pool 1.0.0) lib/nimble_pool.ex:349: NimblePool.checkout!/4
    (exla 0.7.1) lib/exla/defn/locked_cache.ex:36: EXLA.Defn.LockedCache.run/2
    iex:1: (file)
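
For reference, a minimal sketch of how the :preallocate and :memory_fraction client options mentioned above can be set in config/config.exs (the values here are illustrative, not a recommendation):

# config/config.exs
import Config

# EXLA client options for the CUDA device; :preallocate and :memory_fraction
# control how much GPU memory the BFC allocator reserves up front.
config :exla, :clients,
  cuda: [platform: :cuda, preallocate: false, memory_fraction: 0.5]

# Route Nx tensors/ops to the EXLA backend on that client.
config :nx, default_backend: {EXLA.Backend, client: :cuda}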

Versions

  • OS: Ubuntu 22.04
  • Nvidia driver version: 545.29.06
  • CUDA version
> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
  • cuDNN: installed via here and verified it works using the verification steps in the install guide
@polvalente

What's your cuDNN version? IIRC we require cuDNN 8, not cuDNN 9.
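
A quick way to check, as a sketch from iex — this assumes cuDNN was installed from the Ubuntu packages so the version header sits at /usr/include/cudnn_version.h (older installs define these in cudnn.h instead):

# Read the cuDNN major version straight from the header file.
header = File.read!("/usr/include/cudnn_version.h")
[_, major] = Regex.run(~r/#define CUDNN_MAJOR (\d+)/, header)
IO.puts("cuDNN major version: #{major}")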

@gregszumel

It is 9! I'll downgrade and report back

@gregszumel

Fixed, thanks for the speedy reply!
