Local runtime discovery does not work for external libraries (CUDNN, CUTENSOR) #1850
We see a similar issue with CUDA.jl v4.1 not being able to find the cuTENSOR library, as discussed in ITensor/ITensors.jl#1107. At least for cuTENSOR, the issue seems to be that CUDA.jl v4.1 bumps support to CUDA 12.1, but cuTENSOR doesn't support CUDA 12.1 yet.
That's a different issue. You 'just' need to downgrade the CUDA runtime version to one that CUTENSOR supports. Importing CUTENSOR.jl should warn about that (or at least mention that the library isn't available for your platform).
As a workaround, it should be possible to download the artifacts on the login node by specifying the CUDA runtime version to use on the compute nodes. The documentation has some details on that, as it's similar to how containers work: https://cuda.juliagpu.org/stable/installation/overview/#Containers
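A hedged sketch of that login-node workaround, based on the linked docs (the pinned version `v"11.8"` here is illustrative, not taken from this thread; pick whatever your compute nodes actually provide):

```julia
using CUDA

# On the login node (with internet access): pin the runtime version the
# compute nodes will use, so the matching artifacts are fetched up front.
CUDA.set_runtime_version!(v"11.8")  # illustrative; match your cluster's toolkit

# Then restart Julia and load the packages once while still online,
# which triggers artifact download and precompilation:
# using CUDA, cuTENSOR
```

After that, jobs on the offline compute nodes should find the pre-downloaded artifacts without needing network access.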
If anybody wants to help fix this, it would involve adding (optional) discovery of sublibraries to https://github.com/JuliaGPU/CUDA_Runtime_Discovery.jl, and using that JLL conditionally, just like it's done now in CUDA.jl proper (lines 23 to 31 at commit 5c51766).
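For context, the conditional pattern in CUDA.jl proper looks roughly like this (a simplified sketch, not the exact code at the referenced lines; the preference name and module names are assumptions):

```julia
using Preferences

# Decide at precompile time whether to use the JLL-provided runtime or
# local discovery, based on the "version" preference set by the user.
const local_toolkit =
    load_preference(CUDA_Runtime_jll, "version", nothing) == "local"

if local_toolkit
    using CUDA_Runtime_Discovery
    const CUDA_Runtime = CUDA_Runtime_Discovery
else
    using CUDA_Runtime_jll
    const CUDA_Runtime = CUDA_Runtime_jll
end
```

Extending this to sublibraries would mean giving CUDA_Runtime_Discovery.jl the ability to locate CUDNN, CUTENSOR, etc. on the local system, and having the corresponding wrapper packages apply the same preference-driven switch.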
Indeed, this is the error that was reported:

```
┌ Error: cuTENSOR is not available for your platform (x86_64-linux-gnu-libgfortran5-cxx11-libstdcxx30-cuda+12.1-julia_version+1.8.5)
└ @ cuTENSOR ~/.julia/packages/cuTENSOR/WTpqy/src/cuTENSOR.jl:99
```
[...] It's a bit opaque, since it doesn't specifically say the issue is the CUDA version, so maybe that could be improved a bit if possible (say, by reporting the closest platform that is supported). I guess I was hoping this could get handled in some way by the Julia package manager (for example by setting an upper bound on the compat of [...]
Yes, I agree. The whole transformation to JLLs is fairly new, so some issues are expected (hence this was a breaking release).
Sadly, that's not possible. This was explicitly decided against by the Pkg developers because it would make package resolution a much harder problem.
I am running into the same issue. I have CUDA set up through the NVIDIA HPC SDK and cuDNN as a separate install. All the paths are set up properly, but after I set [...] A separate note, and not sure if it is related: even with the "local" runtime version, CUDA seems to still load [...]
I am using CUDA and Flux on an HPC cluster. There is no internet access when I mount a GPU via Slurm, so I ran `CUDA.set_runtime_version!("local")`, which CUDA.jl 4 requires for finding local installs. I ensure the local installations of CUDA and cuDNN are loaded via the Slurm `module load` command. Importing Flux gives an error indicating cuDNN is not available for my platform, and broadcasting functions from Flux onto CuArrays does not work. See the MWE below.
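One way to sanity-check what local discovery actually picked up in a setup like this (assuming a recent CUDA.jl; the exact output format varies by version):

```julia
using CUDA

# Prints the discovered driver/runtime versions and library paths; with the
# "local" runtime this should point into the module-provided installation
# rather than into ~/.julia/artifacts.
CUDA.versioninfo()
```

If the printed toolkit path does not match the `module load`-ed install, the environment was likely not propagated into the Slurm job.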
Expected behavior
With CUDA@3.12.0, it was able to discover all local installations of CUDA and cuDNN, and the MWE above would work fine. The CUDA runtime version is 11.5 and cuDNN is 8.3.1.
Version info
Details on Julia:
Details on CUDA:
Additional context
Here is the MWE with `JULIA_DEBUG=all`: