enable installation of samples for CUDA > 10.1 #2374

Merged
merged 8 commits into from
May 29, 2021

bartoldeman
Contributor

For CUDA > 5 and < 10.1 the samples are installed by default.
For CUDA > 10.1 we need to add --samples to the installer command line.
I could not find any evidence of libglut being needed as an OS dependency
(even for CUDA 6), so I deleted those comments.
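
A minimal sketch of the version gate described above, in Python (the language of EasyBuild easyblocks). The helper names `parse_version` and `needs_samples_flag` are hypothetical and not taken from the easyblock. The description leaves 10.1 itself unassigned; treating it as needing `--samples` is an assumption made here, so adjust the comparison if that turns out to be wrong.

```python
def parse_version(version):
    """Turn a dotted version string such as '10.1.243' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def needs_samples_flag(version):
    """Return True if the installer must be told explicitly to install samples.

    Assumption: versions from 10.1 onwards need --samples, while older
    versions (above 5) install the samples by default.
    """
    return parse_version(version) >= (10, 1)


for v in ["8.0.44", "10.0.130", "10.1.105", "11.0.2"]:
    print(v, needs_samples_flag(v))
```

Comparing integer tuples avoids the pitfalls of comparing version strings directly, where "9.2" would sort after "10.1".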

@bartoldeman bartoldeman added this to the next release (4.3.4?) milestone Mar 30, 2021
@mboisson
Contributor

mboisson commented Mar 30, 2021

We should probably add a check for this to the sanity checks?

Never mind, it's already there.

@bartoldeman
Contributor Author

Test report by @bartoldeman

Overview of tested easyconfigs (in order)

  • SUCCESS CUDAcore-11.0.2.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
build-node.computecanada.ca - Linux centos linux 7.9.2009, x86_64, Intel Xeon Processor (Skylake, IBRS), Python 3.7.7
See https://gist.github.com/1410f224b2affb5eb477a7d8a954fb68 for a full test report.

@bartoldeman
Contributor Author

I just found out that this works with 10.2 and 11.0 but fails with 10.1. Looking into why.

@boegel boegel changed the title from "cuda: install samples" to "enable installation of samples for CUDA > 10.1" Mar 30, 2021
@boegel
Member

boegel commented Mar 30, 2021

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS CUDA-10.1.105-GCC-8.2.0-2.31.1.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
node3506.doduo.os - Linux RHEL 8.2, x86_64, AMD EPYC 7552 48-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/32b205a99a9717fa2dad0d9decf7552d for a full test report.

@boegel
Member

boegel commented Mar 30, 2021

@bartoldeman Works fine for me with CUDA 10.1?

@boegel
Member

boegel commented Mar 30, 2021

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS CUDA-7.0.28.eb
  • SUCCESS CUDA-8.0.44.eb

Build succeeded for 2 out of 2 (2 easyconfigs in total)
node2682.swalot.os - Linux centos linux 7.9.2009, x86_64, Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz (haswell), Python 3.6.8
See https://gist.github.com/ae056bef5071cf8d176df82614089106 for a full test report.

@bartoldeman
Contributor Author

> @bartoldeman Works fine for me with CUDA 10.1?

@boegel it's odd: it works for me as a normal user, but if I use sudo to run as a different user (our software installation user) it complains. I'm still digging to figure out why.

samples are installed in two places with identical copies:
self.installdir/samples and $HOME/NVIDIA_CUDA-11.2_Samples.
Changing the second location to a scratch location (self.builddir) avoids the duplicate.
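
The approach in this commit could be sketched roughly as follows. Only `--samples` and `--samplespath` come from the discussion; the helper name `samples_options` and the example path are made up for illustration, and any other installer options are left out.

```python
def samples_options(builddir):
    """Build the installer options that enable the samples.

    The second copy of the samples, which would otherwise land under $HOME,
    is redirected to `builddir`, a scratch location that is cleaned up later.
    """
    return ["--samples", "--samplespath=%s" % builddir]


# Hypothetical scratch directory, standing in for self.builddir.
print(" ".join(samples_options("/tmp/eb-build/CUDA")))
```

Returning a list rather than a preformatted string keeps the options easy to append to whatever other arguments the installer receives.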
@bartoldeman
Contributor Author

@boegel two things happening here:

  • without --samplespath=xxx the installer creates two copies of the samples, one in self.installdir/samples and one in $HOME/NVIDIA_CUDA-10.1_Samples (the latter is documented here: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html). We want to avoid the HOME installation, and the only way I found that works is to point it to a scratch location.
  • if you run eb via sudo, the installer tries to derive HOME from SUDO_USER and also chowns the files, even if you change --samplespath (blindly assuming that sudo made it run as root). However, we're using sudo -u ebuser -i eb... and the chown fails. I've worked around this by unsetting SUDO_USER in our eb wrapper script. Please let me know if it would be appropriate to do this in the easyblock, or if it's too site-specific.

This avoids issues when eb is called via sudo -iu someuser ...,
as the CUDA installer tries to chown sample files to $SUDO_USER
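
One way to keep SUDO_USER away from the installer is to run it with a cleaned copy of the environment. This is only a sketch of the idea and not necessarily how the easyblock does it; the function name `env_without_sudo_user` is hypothetical.

```python
import os


def env_without_sudo_user(environ=None):
    """Return a copy of the environment with SUDO_USER removed.

    The original mapping is left untouched, so only the child process
    started with this copy is affected.
    """
    env = dict(os.environ if environ is None else environ)
    env.pop("SUDO_USER", None)
    return env


clean_env = env_without_sudo_user({"SUDO_USER": "someuser", "HOME": "/home/ebuser"})
print("SUDO_USER" in clean_env)
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching the installer, so the variable does not need to be unset in a site-specific wrapper script.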
@boegel boegel modified the milestones: from "4.4.0" to "release after 4.4.0" May 27, 2021
@boegel
Member

boegel commented May 29, 2021

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS CUDA-7.0.28.eb
  • SUCCESS CUDA-7.5.18.eb
  • SUCCESS CUDA-8.0.44.eb
  • SUCCESS CUDA-9.0.176.eb
  • SUCCESS CUDA-9.2.88.eb
  • SUCCESS CUDA-10.0.130.eb
  • SUCCESS CUDA-10.1.105.eb
  • SUCCESS CUDA-10.1.243.eb
  • SUCCESS CUDA-11.0.2-GCC-9.3.0.eb
  • SUCCESS CUDA-11.1.1-GCC-10.2.0.eb

Build succeeded for 10 out of 10 (10 easyconfigs in total)
node2618.swalot.os - Linux centos linux 7.9.2009, x86_64, Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz (haswell), Python 3.6.8
See https://gist.github.com/0264bd8e43185a80feb2926315ffb8d4 for a full test report.

@boegel boegel modified the milestones: from "release after 4.4.0" to "4.4.0" May 29, 2021