nixos/virtualisation.containers.cdi.dynamic.nvidia: EGL broken #297537
Comments
I'll submit a patch to nvidia that should fix this. It seems like the nvidia tooling should be using XDG_DATA_DIRS instead of hardcoding /usr/share. On the nixos side though, we should be adding |
This is the CDI output I get when using XDG_DATA_DIRS that includes /run/opengl-driver/share:
@SomeoneSerge does this include everything you expect?
For my current use-case I think this is the thing, so yes. Of course we still need to ensure that the closure is included in the container, including the symlink targets
I'm pretty sure we do do that for libglvnd somehow, but I'd need to search for the concrete references
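To illustrate the point about symlink targets: if only the symlink is mounted, it dangles inside the container unless its target is mounted as well. The sketch below is a minimal Python illustration under that assumption; `mounts_with_symlink_targets` is a hypothetical helper, not part of nixpkgs or nvidia-container-toolkit. It only resolves the paths themselves and does not follow references stored inside the config files, nor does it compute a full Nix closure.

```python
import os


def mounts_with_symlink_targets(paths):
    """Return each path plus its fully resolved target, without duplicates.

    Hypothetical helper: order is preserved so that the result can be fed
    to whatever generates the mount list.
    """
    mounts = []
    for p in paths:
        # os.path.realpath resolves every symlink along the path.
        for candidate in (p, os.path.realpath(p)):
            if candidate not in mounts:
                mounts.append(candidate)
    return mounts
```

For a config file that is a symlink into another directory, the result contains both the link and the file it points to, so both can be exposed to the container.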
Thanks @jmbaur. I assume this works with the change you plan to submit to nvidia-container-toolkit, right?
Wouldn't this be a matter of setting |
We need to somehow synchronize at least the following:
In the worst case we track them manually as we clarify and implement #141803. EDIT: CC @Atemu
Sorry, what exactly is there to synchronise? They all appear to use some subdir of |
Hi! I tagged you because a potential implication of #141803 is that the location of these configs might change in the future, and the current issue is about making |
Yes! PR is here: NVIDIA/nvidia-container-toolkit#425
Awesome!
@SomeoneSerge what kind of container are you running? I'm worried that in the case where the container is built using nix, everything works when we mount the paths under |
Yes, there are two layers to the issue:
Steps To Reproduce
Steps to reproduce the behavior:
Enable CDI and podman:
Run any program doing off-screen rendering in a container, cf. https://github.com/SomeoneSerge/cdi-vtk-egl-repro/blob/b841dade859918856622b623aa9e366738cb2062/nix/nixos-cdi-glvnd-repro/package.nix#L36-L41 (vtk-egl defined in https://github.com/SomeoneSerge/cdi-vtk-egl-repro/blob/master/nix/vtk-egl/package.nix):
Verify the issue is eliminated by mounting the glvnd configs:
See what gets mounted into the container:
Note how the paths from /etc/egl are present, but /run/opengl-driver/share/{egl,glvnd,vulkan} are omitted. This is likely because https://github.com/NVIDIA/nvidia-container-toolkit/blob/1ddc859700c0d698f7f155fdbf7ae6f77ea0c1f5/internal/discover/graphics.go#L64-L77 hard-codes the FHS paths.
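For illustration, a lookup driven by XDG_DATA_DIRS rather than a fixed /usr/share could look roughly like the sketch below. This is a minimal Python illustration, not the toolkit's actual Go code: `data_dirs` and `find_graphics_configs` are hypothetical names, the subdirectories egl, glvnd and vulkan are taken from the paths in this issue, and the fallback /usr/local/share:/usr/share is the default from the XDG Base Directory specification.

```python
import os
from pathlib import Path

# Default from the XDG Base Directory specification when the variable
# is unset or empty.
XDG_DEFAULT = "/usr/local/share:/usr/share"

# Subdirectories mentioned in this issue (assumption: the real tooling
# may look at a different or larger set).
SUBDIRS = ("egl", "glvnd", "vulkan")


def data_dirs(environ=None):
    """Return the XDG data directories in priority order (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    raw = environ.get("XDG_DATA_DIRS") or XDG_DEFAULT
    return [Path(p) for p in raw.split(":") if p]


def find_graphics_configs(environ=None):
    """Collect *.json files under <data dir>/{egl,glvnd,vulkan}."""
    found = []
    for base in data_dirs(environ):
        for sub in SUBDIRS:
            root = base / sub
            # followlinks=True because the entries may be symlinks;
            # missing directories simply yield nothing.
            for dirpath, _dirnames, filenames in os.walk(root, followlinks=True):
                for name in sorted(filenames):
                    if name.endswith(".json"):
                        found.append(Path(dirpath) / name)
    return found
```

With XDG_DATA_DIRS containing /run/opengl-driver/share, such a lookup would visit /run/opengl-driver/share/{egl,glvnd,vulkan} in addition to any FHS locations listed in the variable.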
Expected behavior
Glvnd configs are part of the host configuration and should be mounted into the containers
Notify maintainers
CC @ereslibre @jmbaur
CC @ShamrockLee it's the same issue with apptainer exec --nv
Metadata
Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.
9df3e30ce24fd28c7b3e2de0d986769db5d6225d