Install CUDA and CUDA-Samples via the bot #381
Conversation
Unfortunately this is tripping over the issue with removing software from the overlay:
Co-authored-by: Kenneth Hoste <kenneth.hoste@ugent.be>
To actually test the final build you will need to have the drivers in place. The steps below should work, or at least give you an idea of what is required (a PR is in the works):
# Change directory to a location under host_injections
mkdir -p /cvmfs/pilot.eessi-hpc.org/host_injections/nvidia/host
cd /cvmfs/pilot.eessi-hpc.org/host_injections/nvidia/host
# Gather libraries on the host (_must_ be host ldconfig)
ldconfig -p | awk '{print $NF}' > libs.txt
# Allow for the fact that we may be in a container
ls /.singularity.d/libs/* >> libs.txt
# Link the relevant libraries
curl -O https://raw.githubusercontent.com/apptainer/apptainer/main/etc/nvliblist.conf
grep '\.so$' nvliblist.conf | xargs -i grep {} libs.txt | xargs -i ln -s {}
# Record the NVIDIA driver version in this dir
nvidia-smi --query-gpu=driver_version --format=csv,noheader | tail -n1 > version.txt
# Make latest symlink for NVIDIA drivers
cd ..
ln -s host latest
# Make sure the libraries can be found by the EESSI linker
source /cvmfs/pilot.eessi-hpc.org/versions/2023.06/init/bash
host_injection_linker_dir=${EESSI_EPREFIX/versions/host_injections}
mkdir -p $host_injection_linker_dir
cd $host_injection_linker_dir
ln -s /cvmfs/pilot.eessi-hpc.org/host_injections/nvidia/latest lib
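To sanity-check the result, something like this should show the driver libraries resolving through the symlink chain (a sketch; libcuda.so is just one example of a library listed in nvliblist.conf):
# Confirm the EESSI-side symlink points at the injected host libraries
ls -l ${host_injection_linker_dir}/lib
# Confirm a driver library is reachable through it
ls -l ${host_injection_linker_dir}/lib/libcuda.so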
function find_host_ldconfig () {
  if [ -n "${EESSI_HOST_LDCONFIG}" ]; then
    echo "${EESSI_HOST_LDCONFIG}"
  else
    if [ -f /sbin/ldconfig ]; then
      echo /sbin/ldconfig
    elif [ -f /usr/sbin/ldconfig ]; then
      echo /usr/sbin/ldconfig
    else
      echo "This is weird, you should set EESSI_HOST_LDCONFIG (and open a support issue)" >&2
      exit 1
    fi
  fi
}
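Used together with the library-gathering step above, the helper would be called like this (a minimal usage sketch):
# Always use the host's ldconfig, never EESSI's, to enumerate host libraries
host_ldconfig=$(find_host_ldconfig)
${host_ldconfig} -p | awk '{print $NF}' > libs.txt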
On a system without CUDA:
$ nvidia_smi_out=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2> nvidia-smi.errors)
$ echo $?
127
$ cat nvidia-smi.errors
-bash: nvidia-smi: command not found
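A script can guard against this case explicitly instead of relying on the 127 exit code (a hedged sketch):
# Check that nvidia-smi exists before querying the driver version
if ! command -v nvidia-smi > /dev/null 2>&1; then
  echo "No nvidia-smi found; cannot determine the NVIDIA driver version" >&2
  exit 1
fi
driver_version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | tail -n1)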
I'm in the EESSI container and ran everything in #381 (comment) except the
I now have:
Note that
Works fine. I'm a bit confused: how is our runtime linker supposed to pick up on things from
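For what it's worth, one way to inspect what the EESSI runtime linker actually searches (a sketch, assuming the compat layer's glibc uses its own ld.so.conf under ${EESSI_EPREFIX}):
# List the linker's search directories and check whether the driver libraries are visible
cat ${EESSI_EPREFIX}/etc/ld.so.conf
${EESSI_EPREFIX}/sbin/ldconfig -p | grep -i libcuda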
I updated the last line of the script in my previous comment; it should have been
Yeah, I used your code from after your last edit, so that isn't the issue here:
Where / how is that arranged? (Then I can check whether, for me, that is somehow not correct.)
If you want to use
You don't have to do that though, you can just load the CUDA-Samples module and try to run
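That would amount to something like this (a sketch; deviceQuery is one of the standard CUDA samples and is an assumption here, as is the exact module name):
# Load the module built by the bot and run one of the samples on a GPU node
module load CUDA-Samples
deviceQuery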
Hm,
and
actually look perfectly fine. It seems to be on the search path for the linker. But... of course
I guess for software that is actually built in the software layer, this wouldn't be an option.
Yes, I came to this conclusion as well. Unfortunately, my writable overlay has disappeared (again), and with it all the contents. So no more
Oh, btw, it does mean that if your script is to work in the EESSI container (and we probably want that), you'll have to copy nvidia-smi and patchelf the linker before you run it with
No, not really, you can just do:
True, and much easier / more foolproof. I meant: just remember to include support for running it in the container if you make the PR for that :) Oh, and it seems it needs to end in
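For reference, the copy-plus-patchelf step mentioned above could look roughly like this (a hedged sketch; the destination directory and the loader path are assumptions, not the actual script):
# Copy nvidia-smi somewhere writable and point it at the EESSI compat layer's dynamic linker
cp "$(command -v nvidia-smi)" "${TMPDIR:-/tmp}/nvidia-smi"
patchelf --set-interpreter "${EESSI_EPREFIX}/lib64/ld-linux-x86-64.so.2" "${TMPDIR:-/tmp}/nvidia-smi"
"${TMPDIR:-/tmp}/nvidia-smi" --query-gpu=driver_version --format=csv,noheader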
…into cuda_install
LGTM! But we're hitting the annoying rate limit again on the CI... We'll have to wait a bit before we try again, I guess :(
This actually needs to be built by the bot, but won't work without first using the script to install CUDA under
GPU support implemented with #434