nvidia-docker module/package #51733
Conversation
@Mic92 Can we get this closer to merging somehow? I don't think any of the linked issues are likely to show activity soon, and the feature is gated behind an extra switch, so the risk of breaking stuff is low.
@Mic92 Comments addressed, and the branch has been rebased on top of master.
I realized that there are actually two problems left:
How can I fix that? In particular, accessing the actual kernelPackages from a package is probably not possible. So should I patch the binaries in a module and put them in /run or something?
So I modified the original nvidia_x11 packages to provide usable binaries which are linked into /run, similar to the OpenGL libraries. It doesn't look very elegant, but at least this way there's less chance of version incompatibilities.
@infinisil This could be re-reviewed, or I could rebase and squash the fixups on top of the current master.
@FRidh You can ignore this, I accidentally pushed a commit to the wrong branch.
Have tested with the command you gave, working well, thanks! If you want it in the stable release you can make a backport to 19.03.
Oh, actually, before I merge you should split the commit into two: one for adding the package, and one for adding the NixOS option. And format your commit message according to the Contribution Guidelines, also linked from the PR template.
nvidia_x11 and persistenced were modified to provide binaries which can be mounted inside a Docker container and executed there. Most ldconfig-based discovery of bundled NVIDIA libraries is patched out. ldconfig itself is patched to be able to deal with patchelf'ed libraries; see https://sourceware.org/bugzilla/show_bug.cgi?id=23964
Awesome. I'll make a backport PR once this is merged.
If I want to run a container requiring a GPU on NixOS with this MR, do I still have to pass these parameters into the container?
I currently do this to ensure that the
Do you have an example image and command to test this?
Not with me, but when I created images using tensorflowWithCuda in 18.09, I would then run
Ok, I checked the list of auto-mounted libs again: https://github.com/NVIDIA/libnvidia-container/blob/773b1954446b73921ce16919248c764ff62d29ad/src/nvc_info.c#L73
@averelld I'll be trying that right now. Although I want to know that by
I just tried it now:
It seems that there's still a problem, as that DSO error is exactly what I got before when I didn't mount the host's CUDA driver into it. I had to add these in:
To ensure that the right
Just thinking about this... this might be a Nix-specific issue, as the image that I'm trying to run used
Regarding the runtime: that is only selected when using the nvidia-docker wrapper or explicitly. The enable flag just prepares mountable libraries like libcuda and such.
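For context, "explicitly" here means registering the runtime with dockerd and selecting it per container. The usual nvidia-container-runtime setup looks roughly like this daemon.json fragment (a generic sketch of the upstream nvidia-docker configuration, not necessarily what this module generates):

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

With something like that in place, `docker run --runtime=nvidia …` opts a single container into the NVIDIA runtime, while plain `docker run` is unaffected.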
Yes, it is definitely the older version baked into the image. That's how
It does, yes, along with a selection of nvidia libs from
The ones in
I found it. The NVIDIA docker on NixOS puts
When I used bash to enter my container, the only place that had libraries was
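The precedence issue above comes down to search order: directories in LD_LIBRARY_PATH are searched left to right, and before libraries baked into the image at other locations. A small shell illustration (the second path is a made-up example, not something the module mandates):

```shell
# LD_LIBRARY_PATH is searched left to right, so listing the directory that
# holds the host-injected driver libraries first makes them win over any
# CUDA libraries baked into the image elsewhere.
export LD_LIBRARY_PATH=/usr/lib64:/opt/some/baked/in/cuda/lib64

# The first component is what the dynamic linker tries first:
echo "${LD_LIBRARY_PATH%%:*}"  # prints /usr/lib64
```

This is why setting `LD_LIBRARY_PATH='/usr/lib64'` in the run command below forces the container to pick up the host's driver libraries.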
@averelld The above solution now works for my tensorflow container. However, I wanted to do a little test on a basic trivial container to see what the nvidia-docker integration does step by step. I ended up hitting an unexpected error.

```shell
# first I build a trivial container containing only bash and coreutils
docker load --input "$(
  nix-build \
    --no-out-link \
    -E 'with import <nixpkgs> {}; dockerTools.buildImage { name = "bash"; tag = "latest"; contents = [ bash coreutils ]; }'
)"

# then I run it with --runtime=nvidia (same result with nvidia-docker ...)
docker run --runtime=nvidia -it --rm \
  --env LD_LIBRARY_PATH='/usr/lib64' \
  --env NVIDIA_DRIVER_CAPABILITIES='compute' \
  --env NVIDIA_VISIBLE_DEVICES=all \
  bash:latest \
  /bin/bash
```

The result unfortunately is this error:
Regarding the paths above: those paths are image dependent, there is no "NVIDIA docker on NixOS" path unfortunately. The run command above apparently fails because there is no
Requires
@CMCDragonkai do you have an example of using buildPytorch.nix:
but no GPU available |
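A quick way to check GPU visibility from inside a PyTorch container is a one-liner like the following. The image name is a placeholder, and this sketch assumes a docker daemon with the nvidia runtime and a GPU on the host, so it can only serve as an illustration:

```shell
# Hypothetical image name; requires the nvidia runtime and a GPU on the host.
docker run --runtime=nvidia --rm \
  --env NVIDIA_VISIBLE_DEVICES=all \
  my-pytorch-image:latest \
  python -c 'import torch; print(torch.cuda.is_available())'
```

If this prints False, the driver libraries are not visible inside the container, which matches the "no GPU available" symptom described above.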
Motivation for this change
See #27999 and NixOS/patchelf/issues/44 for some reasons why this is so messy. The ldconfig patch should go in standard glibc, but it's probably better to do that separately later.
Quick test:

```shell
nvidia-docker run nvidia/cuda:10.0-runtime nvidia-smi
```
Things done

- Tested using sandboxing (`sandbox` in `nix.conf` on non-NixOS)
- Tested compilation of all pkgs that depend on this change using `nix-shell -p nox --run "nox-review wip"`
- Tested execution of all binary files (usually in `./result/bin/`)
- Determined the impact on package closure size (by running `nix path-info -S` before and after)