nixos/virtualisation.containers.cdi.dynamic.nvidia: EGL broken #297537

Closed
SomeoneSerge opened this issue Mar 20, 2024 · 11 comments · Fixed by #314840

Comments

@SomeoneSerge
Contributor

SomeoneSerge commented Mar 20, 2024

Steps To Reproduce

Steps to reproduce the behavior:

Enable CDI and podman:

{
  virtualisation.podman.enable = true;
  virtualisation.containers.cdi.dynamic.nvidia.enable = true;
  services.xserver.videoDrivers = [ "nvidia" ];
  hardware.opengl.enable = true;
}

Run any program doing off-screen rendering in a container, cf. https://github.com/SomeoneSerge/cdi-vtk-egl-repro/blob/b841dade859918856622b623aa9e366738cb2062/nix/nixos-cdi-glvnd-repro/package.nix#L36-L41 (vtk-egl defined in https://github.com/SomeoneSerge/cdi-vtk-egl-repro/blob/master/nix/vtk-egl/package.nix):

nix run github:SomeoneSerge/cdi-vtk-egl-repro -- fail
...
2024-03-20 21:08:55.171 (   0.028s) [        F70FA740] vtkEGLRenderWindow.cxx:383   WARN| vtkEGLRenderWindow (0x1ea0e70): Setting an EGL display to device index: -1 require EGL_EXT_device_base EGL_EXT_platform_device EGL_EXT_platform_base extensions
2024-03-20 21:08:55.171 (   0.028s) [        F70FA740] vtkEGLRenderWindow.cxx:388   WARN| vtkEGLRenderWindow (0x1ea0e70): Attempting to use EGL_DEFAULT_DISPLAY...
2024-03-20 21:08:55.171 (   0.028s) [        F70FA740] vtkEGLRenderWindow.cxx:393    ERR| vtkEGLRenderWindow (0x1ea0e70): Could not initialize a device. Exiting...
2024-03-20 21:08:55.171 (   0.028s) [        F70FA740]vtkOpenGLRenderWindow.c:511    ERR| vtkEGLRenderWindow (0x1ea0e70): GLEW could not be initialized: Missing GL version

Verify the issue is eliminated by mounting the glvnd configs:

❯ nix run github:SomeoneSerge/cdi-vtk-egl-repro -- fix
...
❯ # (0 exit status)

See what gets mounted into the container:

❯ rg egl /run/cdi/nvidia-container-toolkit.json 
126:        "hostPath": "/etc/egl/egl_external_platform.d/10_nvidia_wayland.json",
127:        "containerPath": "/etc/egl/egl_external_platform.d/10_nvidia_wayland.json",
136:        "hostPath": "/etc/egl/egl_external_platform.d/15_nvidia_gbm.json",
137:        "containerPath": "/etc/egl/egl_external_platform.d/15_nvidia_gbm.json",
246:        "hostPath": "/nix/store/b0cwmgcn3qiziwcqsfmf6frpfll8nx6l-nvidia-x11-550.54.14-6.1.80/lib/libnvidia-egl-gbm.so.1.1.1",
247:        "containerPath": "/nix/store/b0cwmgcn3qiziwcqsfmf6frpfll8nx6l-nvidia-x11-550.54.14-6.1.80/lib/libnvidia-egl-gbm.so.1.1.1",
256:        "hostPath": "/nix/store/b0cwmgcn3qiziwcqsfmf6frpfll8nx6l-nvidia-x11-550.54.14-6.1.80/lib/libnvidia-eglcore.so.550.54.14",
257:        "containerPath": "/nix/store/b0cwmgcn3qiziwcqsfmf6frpfll8nx6l-nvidia-x11-550.54.14-6.1.80/lib/libnvidia-eglcore.so.550.54.14",
❯ rg glvnd /run/cdi/nvidia-container-toolkit.json
❯ 

Note how the paths from /etc/egl are present, but /run/opengl-driver/share/{egl,glvnd,vulkan} are omitted.
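To make the check above repeatable, here is a minimal sketch that lists which mounts in a generated CDI spec fall under a given host prefix. The function names `mounts_under` and `load_spec` are hypothetical helpers for illustration, not part of nvidia-container-toolkit; the default path is the one used in the `rg` commands above.

```python
import json


def mounts_under(spec: dict, prefix: str) -> list[str]:
    """Return the hostPath of every mount (top-level and per-device) under prefix."""
    # A CDI spec can carry containerEdits at the top level and on each device.
    edits = [spec.get("containerEdits", {})]
    edits += [dev.get("containerEdits", {}) for dev in spec.get("devices", [])]

    paths = []
    for edit in edits:
        for mount in edit.get("mounts", []):
            host = mount.get("hostPath", "")
            if host.startswith(prefix):
                paths.append(host)
    return sorted(paths)


def load_spec(path: str = "/run/cdi/nvidia-container-toolkit.json") -> dict:
    """Load a CDI spec from disk (path taken from the commands above)."""
    with open(path) as f:
        return json.load(f)
```

On an affected host, `mounts_under(load_spec(), "/run/opengl-driver/share")` would be expected to return an empty list, while the `/etc/egl` prefix returns the two external-platform JSON files.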

This is likely because https://github.com/NVIDIA/nvidia-container-toolkit/blob/1ddc859700c0d698f7f155fdbf7ae6f77ea0c1f5/internal/discover/graphics.go#L64-L77 hard-codes the FHS paths.

Expected behavior

The glvnd configs are part of the host configuration and should be mounted into the containers.

Notify maintainers

CC @ereslibre @jmbaur
CC @ShamrockLee it's the same issue with apptainer exec --nv

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

9df3e30ce24fd28c7b3e2de0d986769db5d6225d

Add a 👍 reaction to issues you find important.

@SomeoneSerge SomeoneSerge added the 0.kind: bug Something is broken label Mar 20, 2024
@ereslibre ereslibre self-assigned this Mar 21, 2024
@jmbaur
Contributor

jmbaur commented Mar 22, 2024

I'll submit a patch to nvidia that should fix this. It seems like the nvidia tooling should be using XDG_DATA_DIRS instead of hardcoding /usr/share. On the nixos side though, we should be adding /run/opengl-driver/share to XDG_DATA_DIRS.
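As a rough illustration of the idea (not the toolkit's actual Go code), the lookup could derive its search directories from `XDG_DATA_DIRS` and fall back to the XDG Base Directory default only when the variable is unset or empty. The function name and the `subdirs` tuple are assumptions made for this sketch; the subdirectory names are the ones mentioned earlier in this issue.

```python
import os

# Default from the XDG Base Directory specification when XDG_DATA_DIRS is unset or empty.
XDG_DEFAULT = "/usr/local/share:/usr/share"


def data_dir_candidates(env=None, subdirs=("egl", "glvnd", "vulkan")):
    """Return the directories to scan for driver config files, in priority order."""
    env = os.environ if env is None else env
    raw = env.get("XDG_DATA_DIRS") or XDG_DEFAULT
    # Drop empty entries and trailing slashes so joined paths stay clean.
    roots = [root.rstrip("/") for root in raw.split(":") if root]
    return [f"{root}/{sub}" for root in roots for sub in subdirs]
```

With `XDG_DATA_DIRS=/run/opengl-driver/share` this yields `/run/opengl-driver/share/egl`, `/run/opengl-driver/share/glvnd`, and `/run/opengl-driver/share/vulkan`, which are the locations missing from the spec above.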

@jmbaur
Contributor

jmbaur commented Mar 22, 2024

This is the CDI output I get when using XDG_DATA_DIRS that includes /run/opengl-driver/share:

{
  "cdiVersion": "0.5.0",
  "kind": "nvidia.com/gpu",
  "devices": [
    {
      "name": "0",
      "containerEdits": {
        "deviceNodes": [
          {
            "path": "/dev/nvidia0"
          },
          {
            "path": "/dev/dri/card0"
          },
          {
            "path": "/dev/dri/renderD128"
          }
        ],
        "hooks": [
          {
            "hookName": "createContainer",
            "path": "/nix/store/hk2l16qfn6ghisw123mqkk5bg1hv0icb-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-ctk",
            "args": [
              "nvidia-ctk",
              "hook",
              "create-symlinks",
              "--link",
              "../card0::/dev/dri/by-path/pci-0000:65:00.0-card",
              "--link",
              "../renderD128::/dev/dri/by-path/pci-0000:65:00.0-render"
            ]
          },
          {
            "hookName": "createContainer",
            "path": "/nix/store/hk2l16qfn6ghisw123mqkk5bg1hv0icb-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-ctk",
            "args": [
              "nvidia-ctk",
              "hook",
              "chmod",
              "--mode",
              "755",
              "--path",
              "/dev/dri"
            ]
          }
        ]
      }
    },
    {
      "name": "all",
      "containerEdits": {
        "deviceNodes": [
          {
            "path": "/dev/nvidia0"
          },
          {
            "path": "/dev/dri/card0"
          },
          {
            "path": "/dev/dri/renderD128"
          }
        ],
        "hooks": [
          {
            "hookName": "createContainer",
            "path": "/nix/store/hk2l16qfn6ghisw123mqkk5bg1hv0icb-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-ctk",
            "args": [
              "nvidia-ctk",
              "hook",
              "create-symlinks",
              "--link",
              "../card0::/dev/dri/by-path/pci-0000:65:00.0-card",
              "--link",
              "../renderD128::/dev/dri/by-path/pci-0000:65:00.0-render"
            ]
          },
          {
            "hookName": "createContainer",
            "path": "/nix/store/hk2l16qfn6ghisw123mqkk5bg1hv0icb-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-ctk",
            "args": [
              "nvidia-ctk",
              "hook",
              "chmod",
              "--mode",
              "755",
              "--path",
              "/dev/dri"
            ]
          }
        ]
      }
    }
  ],
  "containerEdits": {
    "deviceNodes": [
      {
        "path": "/dev/nvidia-modeset"
      },
      {
        "path": "/dev/nvidia-uvm"
      },
      {
        "path": "/dev/nvidia-uvm-tools"
      },
      {
        "path": "/dev/nvidiactl"
      }
    ],
    "hooks": [
      {
        "hookName": "createContainer",
        "path": "/nix/store/hk2l16qfn6ghisw123mqkk5bg1hv0icb-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-ctk",
        "args": [
          "nvidia-ctk",
          "hook",
          "update-ldcache",
          "--ldconfig-path",
          "/nix/store/2ksh88m9fnnmj8xn5a2a0z2q9vakbjpj-glibc-2.38-44-bin/bin/ldconfig",
          "--folder",
          "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib"
        ]
      }
    ],
    "mounts": [
      {
        "hostPath": "/etc/egl/egl_external_platform.d/10_nvidia_wayland.json",
        "containerPath": "/etc/egl/egl_external_platform.d/10_nvidia_wayland.json",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/etc/egl/egl_external_platform.d/15_nvidia_gbm.json",
        "containerPath": "/etc/egl/egl_external_platform.d/15_nvidia_gbm.json",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libEGL_nvidia.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libEGL_nvidia.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libGLESv1_CM_nvidia.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libGLESv1_CM_nvidia.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libGLESv2_nvidia.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libGLESv2_nvidia.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libGLX_nvidia.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libGLX_nvidia.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libcuda.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libcuda.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libcudadebugger.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libcudadebugger.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libglxserver_nvidia.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libglxserver_nvidia.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvcuvid.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvcuvid.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-allocator.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-allocator.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-cfg.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-cfg.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-egl-gbm.so.1.1.1",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-egl-gbm.so.1.1.1",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-eglcore.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-eglcore.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-encode.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-encode.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-fbc.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-fbc.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-glcore.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-glcore.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-glsi.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-glsi.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-glvkspirv.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-glvkspirv.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-gpucomp.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-gpucomp.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-ml.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-ml.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-ngx.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-ngx.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-nvvm.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-nvvm.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-opencl.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-opencl.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-opticalflow.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-opticalflow.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-pkcs11-openssl3.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-pkcs11-openssl3.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-pkcs11.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-pkcs11.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-ptxjitcompiler.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-ptxjitcompiler.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-rtcore.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-rtcore.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-tls.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvidia-tls.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvoptix.so.550.67",
        "containerPath": "/nix/store/qx6j94l99y8179vgdsyydcd1y0qs60z4-nvidia-x11-550.67-6.6.22/lib/libnvoptix.so.550.67",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/run/current-system/sw/bin/nvidia-cuda-mps-control",
        "containerPath": "/run/current-system/sw/bin/nvidia-cuda-mps-control",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/run/current-system/sw/bin/nvidia-cuda-mps-server",
        "containerPath": "/run/current-system/sw/bin/nvidia-cuda-mps-server",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/run/current-system/sw/bin/nvidia-debugdump",
        "containerPath": "/run/current-system/sw/bin/nvidia-debugdump",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/run/current-system/sw/bin/nvidia-smi",
        "containerPath": "/run/current-system/sw/bin/nvidia-smi",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/run/opengl-driver/share/egl/egl_external_platform.d/10_nvidia_wayland.json",
        "containerPath": "/run/opengl-driver/share/egl/egl_external_platform.d/10_nvidia_wayland.json",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/run/opengl-driver/share/egl/egl_external_platform.d/15_nvidia_gbm.json",
        "containerPath": "/run/opengl-driver/share/egl/egl_external_platform.d/15_nvidia_gbm.json",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/run/opengl-driver/share/glvnd/egl_vendor.d/10_nvidia.json",
        "containerPath": "/run/opengl-driver/share/glvnd/egl_vendor.d/10_nvidia.json",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/run/opengl-driver/share/vulkan/implicit_layer.d/nvidia_layers.json",
        "containerPath": "/run/opengl-driver/share/vulkan/implicit_layer.d/nvidia_layers.json",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      }
    ]
  }
}

@SomeoneSerge does this include everything you expect?

@SomeoneSerge
Contributor Author

    "containerPath": "/run/opengl-driver/share/glvnd/egl_vendor.d/10_nvidia.json",

For my current use case, I think this is the entry that matters, so yes. Of course, we still need to ensure that the whole closure is included in the container, including the symlink targets.
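One way to see which symlink targets would also need to be present in the container is to resolve every mounted host path and report those that point elsewhere. This is only a sketch under the assumption that the mounted paths (or their parent directories) are symlinks on the host; `symlink_targets` is a hypothetical helper, and it does not compute the full Nix closure.

```python
import os


def symlink_targets(host_paths):
    """Map each path that resolves to a different location to its resolved target."""
    extra = {}
    for path in host_paths:
        real = os.path.realpath(path)  # follows every symlink along the path
        if real != os.path.abspath(path):
            extra[path] = real
    return extra
```

Any path that appears as a key in the result is bind-mounted under its symlinked name, so the resolved target (and whatever it references in turn) has to be reachable inside the container as well.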

On the nixos side though, we should be adding /run/opengl-driver/share to XDG_DATA_DIRS.

I'm pretty sure we do that for libglvnd somehow, but I'd need to search for the concrete references

@ereslibre
Member

This is the CDI output I get when using XDG_DATA_DIRS that includes /run/opengl-driver/share:

Thanks @jmbaur. I assume this works with the change you plan to submit to nvidia-container-toolkit, right?

I'm pretty sure we do that for libglvnd somehow, but I'd need to search for the concrete references

Wouldn't this be a matter of setting serviceConfig.Environment = "XDG_DATA_DIRS=/run/opengl-driver/share"; on the unit that calls the CDI generator?

@SomeoneSerge
Contributor Author

SomeoneSerge commented Mar 23, 2024

We need to somehow synchronize at least the following:

In the worst case we track them manually as we clarify and implement #141803

EDIT: CC @Atemu

@Atemu
Member

Atemu commented Mar 23, 2024

Sorry, what exactly is there to synchronise? They all appear to use some subdir of /run/opengl-driver/share/.

@SomeoneSerge
Contributor Author

Sorry, what exactly is there to synchronise? They all appear to use some subdir of /run/opengl-driver/share/

Hi! I tagged you because a potential implication of #141803 is that the location of these configs might change in the future, and the current issue is about making nvidia-container-toolkit detect them correctly. So however we solve this for nvidia-container-toolkit, the solution has to account for a potential move later.

@jmbaur
Contributor

jmbaur commented Mar 23, 2024

Thanks @jmbaur. I assume this works with the change you plan to submit to nvidia-container-toolkit, right?

Yes! PR is here: NVIDIA/nvidia-container-toolkit#425

@ereslibre
Member

Yes! PR is here: NVIDIA/nvidia-container-toolkit#425

Awesome!

@jmbaur
Contributor

jmbaur commented Mar 23, 2024

@SomeoneSerge what kind of container are you running? I'm worried that in the case where the container is built using nix, everything works when we mount the paths under /run/opengl-driver/share/* since the software running in the container is already configured to look for stuff in /run/opengl-driver. But perhaps in the case where the host is nixos and the container is something else (like ubuntu), that software may end up looking in /usr/share and will not find what has been shared from the host. It seems that XDG_DATA_DIRS from the host would need to be shared with the container.
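For illustration only, sharing the host's data directories could be expressed through the `env` list that CDI allows in `containerEdits` (entries of the form `KEY=VALUE`). The helper name `with_xdg_data_dirs` is hypothetical, and whether an injected `XDG_DATA_DIRS` is appropriate for a given image is an open question, since it would replace any value the image sets itself.

```python
import copy


def with_xdg_data_dirs(spec, dirs):
    """Return a copy of a CDI spec whose top-level env sets XDG_DATA_DIRS to dirs."""
    out = copy.deepcopy(spec)  # leave the caller's spec untouched
    edits = out.setdefault("containerEdits", {})
    # Replace any existing XDG_DATA_DIRS entry instead of appending a duplicate.
    env = [e for e in edits.get("env", []) if not e.startswith("XDG_DATA_DIRS=")]
    env.append("XDG_DATA_DIRS=" + ":".join(dirs))
    edits["env"] = env
    return out
```

A caller would probably pass something like `["/run/opengl-driver/share", "/usr/local/share", "/usr/share"]` so that the image's usual FHS locations remain on the search path.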

@SomeoneSerge
Contributor Author

SomeoneSerge commented Mar 23, 2024

what kind of container are you running? ... perhaps in the case where the host is nixos and the container is something else (like ubuntu)

Yes, there are two layers to the issue:

  1. Currently, EGL is "broken" even in containers based on images produced by nixpkgs' dockerTools. I think this is because the current implementation of virtualisation.containers.cdi fails to mount the full closure into the container (including the store path with the JSON files). This is already tracked in "[Tracking issue] CDI Support" (#290609).
  2. Then there is support for FHS images, for which we need to remap parts of driverLink to the locations expected by FHS distributions. Your PR hopefully addresses this.
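The remapping in the second point could look roughly like the sketch below, which rewrites only the `containerPath` of mounts under the driver link so that an FHS image finds them under `/usr/share`. This is an assumption-laden illustration, not what the upstream PR does: `remap_to_fhs` is a hypothetical name, `/usr/share` is assumed to be where the image's libglvnd looks, and the library paths referenced inside the JSON files are not adjusted here.

```python
import copy


def remap_to_fhs(spec, src="/run/opengl-driver/share", dst="/usr/share"):
    """Return a copy of a CDI spec with containerPaths under src moved under dst."""
    out = copy.deepcopy(spec)  # do not mutate the input spec
    for mount in out.get("containerEdits", {}).get("mounts", []):
        container_path = mount.get("containerPath", "")
        # Match src itself or anything below it, but not e.g. "/run/opengl-driver/shared".
        if container_path == src or container_path.startswith(src + "/"):
            mount["containerPath"] = dst + container_path[len(src):]
    return out
```

The `hostPath` stays untouched, so the file is still read from the NixOS host location and only appears at the FHS location inside the container.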

@samueldr samueldr added the 5. scope: tracked Issue (or PR) is linked back to a `5. scope: tracking` issue label Apr 23, 2024