System-wide CUDA in LD_LIBRARY_PATH breaks CUBLAS #1755
Comments
I just realized that this was due to there being a system-wide CUDA installation in |
That is unlikely, and is not how JLLs work. Even if you set a LD_LIBRARY_PATH,
Could you run both scenarios (i.e., the working one without LD_LIBRARY_PATH and the broken one with LD_LIBRARY_PATH set to a system-wide installation) while LD_DEBUG is set to |
Here it is, hope it helps! Output with system CUDA in LD_LIBRARY_PATH
Output without system CUDA in LD_LIBRARY_PATH
|
Thanks. That shows the issue happens with
vs
And through Maybe we should be patching The other position is to deem setting |
First, I consider Note that setting up an RPATH isn't necessarily a solution either, because
So in the end, my advice is:
|
I agree, but tell that to Nix or HPC users 🙂 Thanks for the details on RPATH, I'll just add
Some libraries don't expose their dependencies though. Take
Yet:
julia> cudnnActivationDescriptor(CUDNN_ACTIVATION_RELU, CUDNN_NOT_PROPAGATE_NAN, 0)
550427: find library=libcudnn_ops_infer.so.8 [0]; searching
550427: search path=/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/glibc-hwcaps/x86-64-v3:/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/glibc-hwcaps/x86-64-v2:/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/tls/x86_64/x86_64:/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/tls/x86_64:/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/tls/x86_64:/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/tls:/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/x86_64/x86_64:/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/x86_64:/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/x86_64:/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib (RUNPATH from file /home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/libcudnn.so)
550427: trying file=/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/glibc-hwcaps/x86-64-v3/libcudnn_ops_infer.so.8
550427: trying file=/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/glibc-hwcaps/x86-64-v2/libcudnn_ops_infer.so.8
550427: trying file=/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/tls/x86_64/x86_64/libcudnn_ops_infer.so.8
550427: trying file=/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/tls/x86_64/libcudnn_ops_infer.so.8
550427: trying file=/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/tls/x86_64/libcudnn_ops_infer.so.8
550427: trying file=/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/tls/libcudnn_ops_infer.so.8
550427: trying file=/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/x86_64/x86_64/libcudnn_ops_infer.so.8
550427: trying file=/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/x86_64/libcudnn_ops_infer.so.8
550427: trying file=/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/x86_64/libcudnn_ops_infer.so.8
550427: trying file=/home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/libcudnn_ops_infer.so.8
550427:
550427:
550427: calling init: /home/tim/Julia/depot/artifacts/39edbd07a46d182c2681130c16ff339251297514/lib/libcudnn_ops_infer.so.8
julia>
i.e. the library uses a plug-in system that dynamically |
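For readers comparing two such traces (with and without a system toolkit on LD_LIBRARY_PATH), the log lines above can be summarized mechanically. The following is a minimal Python sketch, not something used in this thread: the function names `summarize_ld_debug` and `diff_resolutions` are made up for illustration, and it assumes the trace contains the `find library=`, `trying file=` and `calling init:` lines shown in the output above.

```python
import re

# Patterns for the three kinds of dynamic-linker trace lines seen in the log above.
FIND_RE = re.compile(r"find library=(\S+)")
TRY_RE = re.compile(r"trying file=(\S+)")
INIT_RE = re.compile(r"calling init: (\S+)")


def summarize_ld_debug(log_text):
    """Map each requested library name to the paths probed and the path initialized."""
    summary = {}
    current = None  # library named by the most recent "find library=" line
    for line in log_text.splitlines():
        m = FIND_RE.search(line)
        if m:
            current = m.group(1)
            summary.setdefault(current, {"tried": [], "initialized": None})
            continue
        m = TRY_RE.search(line)
        if m and current is not None:
            summary[current]["tried"].append(m.group(1))
            continue
        m = INIT_RE.search(line)
        if m:
            path = m.group(1)
            name = path.rsplit("/", 1)[-1]
            # Only record init lines whose file name matches a requested library.
            if name in summary:
                summary[name]["initialized"] = path
    return summary


def diff_resolutions(log_a, log_b):
    """Return {name: (path_in_a, path_in_b)} for libraries initialized from different paths."""
    a = summarize_ld_debug(log_a)
    b = summarize_ld_debug(log_b)
    return {
        name: (a[name]["initialized"], b[name]["initialized"])
        for name in a.keys() & b.keys()
        if a[name]["initialized"] != b[name]["initialized"]
    }
```

Feeding the two captured outputs to `diff_resolutions` would list every library that was picked up from a different directory in the two runs, which is usually the quickest way to spot a system copy shadowing the artifact copy.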
If the library dlopen's by calculating a path relative to itself, that will work just fine. If it loads it by SONAME, that will also be just fine, as we'll already have loaded something with the same SONAME. The only case in which it won't work is if it tries to load the library by a non-canonical name; e.g. something like loading |
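The SONAME case can be checked on a Linux machine with a small experiment. This is a hedged sketch in Python using `ctypes`, unrelated to how JLLs are implemented: it opens the C library once by the absolute path it is already mapped from and once by its SONAME, then compares the two handles. The helper name is made up, the lookup through `/proc/self/maps` is Linux-specific, and the function returns `None` when the platform does not allow the check.

```python
import ctypes
import ctypes.util


def same_handle_by_path_and_soname():
    """Open libc by absolute path, then by SONAME; report whether the handles match.

    Returns True/False on success, or None if the check cannot be run here.
    """
    soname = ctypes.util.find_library("c")  # e.g. "libc.so.6" on glibc systems
    if soname is None:
        return None
    try:
        with open("/proc/self/maps") as maps:
            mapped = {line.split()[-1] for line in maps if "/" in line}
    except OSError:
        return None  # no /proc, e.g. not Linux
    # The mapped file may be named "libc.so.6" or, on older glibc, "libc-2.xx.so".
    full_path = next(
        (p for p in sorted(mapped)
         if p.rsplit("/", 1)[-1].startswith(("libc.so", "libc-"))),
        None,
    )
    if full_path is None:
        return None
    by_path = ctypes.CDLL(full_path)   # dlopen with an absolute path
    by_soname = ctypes.CDLL(soname)    # dlopen with the bare SONAME
    # If the loader reuses the already-loaded object, both handles are identical.
    return by_path._handle == by_soname._handle
```

On a glibc system this is expected to report matching handles, which is the behaviour the comment above relies on: a later request by SONAME is satisfied by the object that is already loaded rather than by a fresh search.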
But how do you know we will have loaded the sublibrary? How does the JLL know to load the sublibrary first if there's no dependency path from libcudnn.so to libcudnn_ops_infer.so? EDIT: in this case it will, but I think some OpenCL libraries dlopen the implementation sublibrary during initialization, i.e., not lazily when a call happens. Maybe I shouldn't worry about this though until people complain 🙂 |
Yeah, we don't have a good way right now to force an ordering on products within a single JLL. |
Anyway, this is fixed for the specific issue at hand here. It will take a while before this lands in a release though, so please continue not populating LD_LIBRARY_PATH with a toolkit directory for the time being. |
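To check whether an environment is affected, one can look for toolkit libraries in the directories listed in LD_LIBRARY_PATH. The sketch below is a rough Python illustration under stated assumptions: the helper `toolkit_dirs_in_path` and the prefix list are hypothetical, and the prefixes are only examples of CUDA library names, not an exhaustive list.

```python
import os

# Assumed, non-exhaustive list of library name prefixes that indicate a CUDA toolkit.
TOOLKIT_PREFIXES = ("libcublas", "libcudart", "libcudnn")


def toolkit_dirs_in_path(ld_library_path, prefixes=TOOLKIT_PREFIXES):
    """Return {directory: [matching shared libraries]} for entries of an LD_LIBRARY_PATH string."""
    hits = {}
    for entry in ld_library_path.split(os.pathsep):
        # Skip empty entries and entries that are not existing directories.
        if not entry or not os.path.isdir(entry):
            continue
        found = sorted(
            name for name in os.listdir(entry)
            if name.startswith(prefixes) and ".so" in name
        )
        if found:
            hits[entry] = found
    return hits


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    for directory, libs in toolkit_dirs_in_path(current).items():
        print(f"{directory}: {', '.join(libs)}")
```

Any directory this reports is a candidate for removal from LD_LIBRARY_PATH before starting Julia, in line with the advice above.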
Describe the bug
The norm(x) function seems to be broken in CUDA.jl v4.
To reproduce
The Minimal Working Example (MWE) for this bug:
gives the error:
Manifest.toml
Version info
Details on Julia:
Details on CUDA:
Additional context
I checked that the master branch also shows the error, while everything works fine in CUDA v3.13.0. The error also shows up with any other element type I checked (Float32, ComplexF64, etc).