Entering fails if NVIDIA Persistence Daemon is used #1572

Closed
jbtrystram opened this issue Oct 23, 2024 · 8 comments
Labels
1. Bug Something isn't working

Comments

@jbtrystram
Contributor

jbtrystram commented Oct 23, 2024

Describe the bug
When trying to enter a toolbox, podman fails with: Error: mount: /run/nvidia-persistenced/socket: mount point does not exist

Steps how to reproduce the behaviour

  1. f40 kinoite with a nvidia GPU
  2. toolbox create
  3. toolbox enter
  4. See error

Expected behaviour
toolbox working awesome, as it's been for months

Actual behaviour

jib@fedora:/var/home/jib$ toolbox enter
Error: mount: /run/nvidia-persistenced/socket: mount point does not exist.
       dmesg(1) may have more information after failed mount system call.
failed to apply mount from Container Device Interface for NVIDIA

Output of toolbox --version (v0.0.90+)

toolbox version 0.0.99.6

Toolbx package info (rpm -q toolbox)

toolbox-0.0.99.6-1.fc40.x86_64

Output of podman version

Client:       Podman Engine
Version:      5.2.3
API Version:  5.2.3
Go Version:   go1.22.7
Built:        Tue Sep 24 02:00:00 2024
OS/Arch:      linux/amd64

Podman package info (rpm -q podman)
podman-5.2.3-1.fc40.x86_64

Info about your OS
universal-blue kinoite-nvidia build (f40)

Additional context

I think this coincides with me setting up a podman container that uses NVIDIA CUDA capabilities. Note that my other container works as expected.
I ran nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml, following the podman documentation.

See attached log of toolbox enter -vv
toolbox-nvidia-issue.txt

@jbtrystram jbtrystram added the 1. Bug Something isn't working label Oct 23, 2024
@lmgarret

lmgarret commented Oct 23, 2024

I believe that this is a recently introduced problem; I'm running Bazzite on both my Desktop with a Nvidia GPU, and on a laptop with a disabled Nvidia dGPU (with envycontrol). I recently updated both Bazzite installations and can no longer enter any toolbox container, with the same error message that you shared.

Also, this may be a red herring, but dmesg does bring up something about the PID file in the same directory. Could it be related?

[   11.540144] systemd[1]: /usr/lib/systemd/system/nvidia-persistenced.service:7: PIDFile= references a path below legacy directory /var/run/, updating /var/run/nvidia-persistenced/nvidia-persistenced.pid → /run/nvidia-persistenced/nvidia-persistenced.pid; please update the unit file accordingly.

@LoGaIta99

LoGaIta99 commented Oct 23, 2024

Adding myself to the list of affected users. I am on Fedora Kinoite, and the last working build, which I pinned, was 40.20241011.0. Since then, these packages were updated and could be related to my issue:

amd-gpu-firmware 20240909-1.fc40 -> 20241017-2.fc40
intel-gpu-firmware 20240909-1.fc40 -> 20241017-2.fc40
nvidia-gpu-firmware 20240909-1.fc40 -> 20241017-2.fc40
toolbox 0.0.99.5-11.fc40 -> 0.0.99.6-1.fc40

Notice that neither the Nvidia driver nor podman was updated between the working and faulty deployments.

I should also report that when the Nvidia GPU is deactivated through envycontrol, the error I get is:

Error: failed to initialize NVIDIA Management Library

I never installed the NVIDIA Container Toolkit, and I don't need my Nvidia GPU inside the containers.

I read in a previous issue that everything should work seamlessly. That is no longer the case.
Distrobox is unaffected by this problem.
I previously explained my situation on Fedora Discussion.

@tfmoraes

I'm having the same problem. Adding the log from the systemd journal:

Oct 23 15:28:17 watchmen.scartissue conmon[33520]: conmon ccb5befa180ef889abac <ndebug>: failed to write to /proc/self/oom_score_adj: Permission denied
Oct 23 15:28:17 watchmen.scartissue conmon[33521]: conmon ccb5befa180ef889abac <ndebug>: addr{sun_family=AF_UNIX, sun_path=/proc/self/fd/12/attach}
Oct 23 15:28:17 watchmen.scartissue conmon[33521]: conmon ccb5befa180ef889abac <ndebug>: terminal_ctrl_fd: 12
Oct 23 15:28:17 watchmen.scartissue conmon[33521]: conmon ccb5befa180ef889abac <ndebug>: winsz read side: 15, winsz write side: 16
Oct 23 15:28:17 watchmen.scartissue systemd[3000]: Started libpod-ccb5befa180ef889abaca823d1fb3b52a21e90529fc66f59b8b14f92c5939d5c.scope - libcrun container.
Oct 23 15:28:17 watchmen.scartissue conmon[33521]: conmon ccb5befa180ef889abac <ndebug>: container PID: 33523
Oct 23 15:28:17 watchmen.scartissue podman[33501]: 2024-10-23 15:28:17.337042921 -0300 -03 m=+0.079351406 container init ccb5befa180ef889abaca823d1fb3b52a21e90529fc66f59b8b14f92c5939d5c (image=registry.fedoraproject.org/fedora-toolbox:41, name=fedora-toolbox-41, org.opencontainers.image.url=https://fedoraproject.org/, license=MIT, org.opencontainers.image.name=fedora-toolbox, org.opencontainers.image.license=MIT, org.opencontainers.image.version=41, version=41, vendor=Fedora Project, com.github.containers.toolbox=true, io.buildah.version=1.37.5, name=fedora-toolbox, org.opencontainers.image.vendor=Fedora Project)
Oct 23 15:28:17 watchmen.scartissue podman[33501]: 2024-10-23 15:28:17.340022121 -0300 -03 m=+0.082330606 container start ccb5befa180ef889abaca823d1fb3b52a21e90529fc66f59b8b14f92c5939d5c (image=registry.fedoraproject.org/fedora-toolbox:41, name=fedora-toolbox-41, version=41, io.buildah.version=1.37.5, org.opencontainers.image.version=41, name=fedora-toolbox, org.opencontainers.image.url=https://fedoraproject.org/, org.opencontainers.image.vendor=Fedora Project, license=MIT, org.opencontainers.image.license=MIT, org.opencontainers.image.name=fedora-toolbox, vendor=Fedora Project, com.github.containers.toolbox=true)
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Running as real user ID 0"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Resolved absolute path to the executable as /usr/bin/toolbox"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="TOOLBX_DELAY_ENTRY_POINT is "
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="TOOLBX_FAIL_ENTRY_POINT is "
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="TOOLBOX_PATH is /usr/bin/toolbox"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Migrating to newer Podman"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Migration not needed: running inside a container"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Setting up configuration"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Setting up configuration: file /etc/containers/toolbox.conf not found"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Setting up configuration: file /root/.config/containers/toolbox.conf not found"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Resolving container and image names"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Container: ''"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Distribution (CLI): ''"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Image (CLI): ''"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Release (CLI): ''"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Resolved container and image names"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Container: 'fedora-toolbox-41'"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Image: 'fedora-toolbox:41'"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Release: '41'"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Creating /run/.toolboxenv"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Path /run/host/etc exists"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Resolved /etc/localtime to /run/host/usr/share/zoneinfo/America/Sao_Paulo"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Creating regular file /etc/machine-id"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Binding /etc/machine-id to /run/host/etc/machine-id"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Creating directory /run/libvirt"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Binding /run/libvirt to /run/host/run/libvirt"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Creating directory /run/systemd/journal"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Binding /run/systemd/journal to /run/host/run/systemd/journal"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Creating directory /run/systemd/resolve"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Binding /run/systemd/resolve to /run/host/run/systemd/resolve"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Creating directory /run/systemd/sessions"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Binding /run/systemd/sessions to /run/host/run/systemd/sessions"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Creating directory /run/systemd/system"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Binding /run/systemd/system to /run/host/run/systemd/system"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Creating directory /run/systemd/users"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Binding /run/systemd/users to /run/host/run/systemd/users"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Creating directory /run/udev/data"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Binding /run/udev/data to /run/host/run/udev/data"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Creating directory /run/udev/tags"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Binding /run/udev/tags to /run/host/run/udev/tags"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Creating directory /tmp"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Binding /tmp to /run/host/tmp"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Creating directory /var/lib/flatpak"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Binding /var/lib/flatpak to /run/host/var/lib/flatpak"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Creating directory /var/lib/libvirt"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Binding /var/lib/libvirt to /run/host/var/lib/libvirt"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Creating directory /var/lib/systemd/coredump"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Binding /var/lib/systemd/coredump to /run/host/var/lib/systemd/coredump"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Creating directory /var/log/journal"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Binding /var/log/journal to /run/host/var/log/journal"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Creating directory /var/mnt"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Binding /var/mnt to /run/host/var/mnt"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Creating directory /sys/fs/selinux"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Binding /sys/fs/selinux to /usr/share/empty"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Preparing to redirect /home to /var/home"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="/var/home isn't a symbolic link"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Redirecting /home to /var/home"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Looking up group for sudo"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Group for sudo is wheel"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Modifying user thiago with UID 1000:"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg=usermod
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg=--append
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg=--groups
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg=wheel
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg=--home
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg=/var/home/thiago
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg=--password
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg=--shell
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg=/usr/bin/fish
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg=--uid
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg=1000
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg=thiago
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: usermod: Warning: missing or non-executable shell '/usr/bin/fish'
Oct 23 15:28:17 watchmen.scartissue usermod[33559]: change user 'thiago' password
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Removing password for user root"
Oct 23 15:28:17 watchmen.scartissue passwd[33565]: password for 'root' changed by 'root'
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Creating runtime directory /run/user/1000/toolbox"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Loading Container Device Interface for NVIDIA from file /run/user/1000/toolbox/cdi-nvidia.json"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Applying Container Device Interface for NVIDIA"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Binding /run/nvidia-persistenced/socket to /run/host/run/nvidia-persistenced/socket"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: mount: /run/nvidia-persistenced/socket: mount point does not exist.
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]:        dmesg(1) may have more information after failed mount system call.
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: level=debug msg="Applying Container Device Interface for NVIDIA: failed to bind /run/nvidia-persistenced/socket to /run/host/run/nvidia-persistenced/socket"
Oct 23 15:28:17 watchmen.scartissue fedora-toolbox-41[33521]: Error: failed to apply mount from Container Device Interface for NVIDIA
Oct 23 15:28:17 watchmen.scartissue conmon[33521]: conmon ccb5befa180ef889abac <ninfo>: container 33523 exited with status 1
Oct 23 15:28:17 watchmen.scartissue conmon[33521]: conmon ccb5befa180ef889abac <nwarn>: Failed to open cgroups file: /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/user.slice/libpod-ccb5befa180ef889abaca823d1fb3b52a21e90529fc66f59b8b14f92c5939d5c.scope/container/memory.events
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Called cleanup.PersistentPreRunE(/usr/bin/podman --root /var/home/thiago/.local/share/containers/storage --runroot /run/user/1000/containers --log-level debug --cgroup-manager systemd --tmpdir /run/user/1000/libpod/tmp --network-config-dir  --network-backend netavark --volumepath /var/home/thiago/.local/share/containers/storage/volumes --db-backend sqlite --transient-store=false --runtime crun --storage-driver overlay --events-backend journald --syslog container cleanup ccb5befa180ef889abaca823d1fb3b52a21e90529fc66f59b8b14f92c5939d5c)"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Setting custom database backend: \"sqlite\""
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Using conmon: \"/usr/bin/conmon\""
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=info msg="Using sqlite as database backend"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Using graph driver overlay"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Using graph root /var/home/thiago/.local/share/containers/storage"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Using run root /run/user/1000/containers"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Using static dir /var/home/thiago/.local/share/containers/storage/libpod"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Using tmp dir /run/user/1000/libpod/tmp"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Using volume path /var/home/thiago/.local/share/containers/storage/volumes"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Using transient store: false"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="[graphdriver] trying provided driver \"overlay\""
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Cached value indicated that overlay is supported"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Cached value indicated that overlay is supported"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Cached value indicated that metacopy is not being used"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Cached value indicated that native-diff is usable"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="backingFs=btrfs, projectQuotaSupported=false, useNativeDiff=true, usingMetacopy=false"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Initializing event backend journald"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Configured OCI runtime crun-wasm initialization failed: no valid executable found for OCI runtime crun-wasm: invalid argument"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Configured OCI runtime runj initialization failed: no valid executable found for OCI runtime runj: invalid argument"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Configured OCI runtime runsc initialization failed: no valid executable found for OCI runtime runsc: invalid argument"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Configured OCI runtime youki initialization failed: no valid executable found for OCI runtime youki: invalid argument"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Configured OCI runtime crun-vm initialization failed: no valid executable found for OCI runtime crun-vm: invalid argument"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Configured OCI runtime kata initialization failed: no valid executable found for OCI runtime kata: invalid argument"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Configured OCI runtime krun initialization failed: no valid executable found for OCI runtime krun: invalid argument"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Configured OCI runtime ocijail initialization failed: no valid executable found for OCI runtime ocijail: invalid argument"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Using OCI runtime \"/usr/bin/crun\""
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=info msg="Setting parallel job count to 49"
Oct 23 15:28:17 watchmen.scartissue podman[33571]: 2024-10-23 15:28:17.416983073 -0300 -03 m=+0.026547419 container died ccb5befa180ef889abaca823d1fb3b52a21e90529fc66f59b8b14f92c5939d5c (image=registry.fedoraproject.org/fedora-toolbox:41, name=fedora-toolbox-41, com.github.containers.toolbox=true, license=MIT, org.opencontainers.image.url=https://fedoraproject.org/, org.opencontainers.image.vendor=Fedora Project, version=41, name=fedora-toolbox, org.opencontainers.image.name=fedora-toolbox, org.opencontainers.image.version=41, vendor=Fedora Project, io.buildah.version=1.37.5, org.opencontainers.image.license=MIT)
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Sending signal 9 to container ccb5befa180ef889abaca823d1fb3b52a21e90529fc66f59b8b14f92c5939d5c"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Cleaning up container ccb5befa180ef889abaca823d1fb3b52a21e90529fc66f59b8b14f92c5939d5c"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Network is already cleaned up, skipping..."
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Successfully cleaned up container ccb5befa180ef889abaca823d1fb3b52a21e90529fc66f59b8b14f92c5939d5c"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Unmounted container \"ccb5befa180ef889abaca823d1fb3b52a21e90529fc66f59b8b14f92c5939d5c\""
Oct 23 15:28:17 watchmen.scartissue podman[33571]: 2024-10-23 15:28:17.458947593 -0300 -03 m=+0.068511929 container cleanup ccb5befa180ef889abaca823d1fb3b52a21e90529fc66f59b8b14f92c5939d5c (image=registry.fedoraproject.org/fedora-toolbox:41, name=fedora-toolbox-41, vendor=Fedora Project, version=41, name=fedora-toolbox, org.opencontainers.image.name=fedora-toolbox, org.opencontainers.image.version=41, license=MIT, org.opencontainers.image.url=https://fedoraproject.org/, org.opencontainers.image.vendor=Fedora Project, org.opencontainers.image.license=MIT, io.buildah.version=1.37.5, com.github.containers.toolbox=true)
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Called cleanup.PersistentPostRunE(/usr/bin/podman --root /var/home/thiago/.local/share/containers/storage --runroot /run/user/1000/containers --log-level debug --cgroup-manager systemd --tmpdir /run/user/1000/libpod/tmp --network-config-dir  --network-backend netavark --volumepath /var/home/thiago/.local/share/containers/storage/volumes --db-backend sqlite --transient-store=false --runtime crun --storage-driver overlay --events-backend journald --syslog container cleanup ccb5befa180ef889abaca823d1fb3b52a21e90529fc66f59b8b14f92c5939d5c)"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=debug msg="Shutting down engines"
Oct 23 15:28:17 watchmen.scartissue /usr/bin/podman[33571]: time="2024-10-23T15:28:17-03:00" level=info msg="Received shutdown.Stop(), terminating!" PID=33571

@debarshiray
Member

I suspect this is because nvidia-persistenced.service is enabled on the host operating system, and it's exposing a bug in our common bind mounting code that only handles directories and regular files, but not sockets.

@tfmoraes

Stopping nvidia-persistenced.service makes toolbox work.

@debarshiray
Member

> Stopping nvidia-persistenced.service makes toolbox work.

Thanks for the confirmation! I won't be able to get to this until Tuesday. Maybe you want to submit a pull request? :)

The problem lies in the mountBind function in src/cmd/initContainer.go. I think the conditional branch for fileMode.IsRegular() also needs to cover fileMode&os.ModeSocket != 0.

jbtrystram added a commit to jbtrystram/ctrs-toolbox that referenced this issue Oct 24, 2024
When a socket is bind-mounted to the container, also create a file
mount point for it. Nvidia CDI on the proprietary driver added a
socket for `nvidia-persistenced.service` which was failing to be mounted
in the container as no mount point existed.

More logs in the issue below.
Fixes containers#1572
@jbtrystram
Contributor Author

jbtrystram commented Oct 24, 2024

> > Stopping nvidia-persistenced.service makes toolbox work.
>
> Thanks for the confirmation! I won't be able to get to this until Tuesday. Maybe you want to submit a pull request? :)
>
> The problem lies in the mountBind function in src/cmd/initContainer.go. I think the conditional branch for fileMode.IsRegular() also needs to cover fileMode&os.ModeSocket != 0.

@debarshiray thanks for such a precise hint.
With instructions that detailed, I couldn't not do it :D
It now works for me! :)

debarshiray pushed a commit to debarshiray/toolbox that referenced this issue Oct 29, 2024
If the NVIDIA Persistence Daemon is used, then 'enter' fails with:
  $ sudo systemctl start nvidia-persistenced.service
  $ toolbox enter
  Error: mount: /run/nvidia-persistenced/socket: mount point does not exist.
         dmesg(1) may have more information after failed mount system call.
  failed to apply mount from Container Device Interface for NVIDIA

This is due to the socket at /run/nvidia-persistenced/socket being
listed in the Container Device Interface specification when the NVIDIA
Persistence Daemon is used.

Fallout from 6e848b2

containers#1572
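To check whether a generated Container Device Interface specification lists the socket, one could parse its mounts. The sketch below is an illustration only: it assumes a JSON specification with mounts under the top-level containerEdits key (field names hostPath and containerPath as in the CDI format), the type and function names are hypothetical, and the embedded sample is made up for the demo rather than copied from a real nvidia-ctk output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// cdiMount mirrors the two mount fields this sketch cares about.
type cdiMount struct {
	HostPath      string `json:"hostPath"`
	ContainerPath string `json:"containerPath"`
}

// cdiSpec is a deliberately partial view of a CDI specification.
type cdiSpec struct {
	ContainerEdits struct {
		Mounts []cdiMount `json:"mounts"`
	} `json:"containerEdits"`
}

// listMounts is a hypothetical helper that returns the top-level mounts
// of a JSON-encoded CDI specification.
func listMounts(data []byte) ([]cdiMount, error) {
	var spec cdiSpec
	if err := json.Unmarshal(data, &spec); err != nil {
		return nil, fmt.Errorf("parsing CDI specification: %w", err)
	}
	return spec.ContainerEdits.Mounts, nil
}

// sampleSpec is an illustrative fragment, not real generated output.
const sampleSpec = `{
  "kind": "nvidia.com/gpu",
  "containerEdits": {
    "mounts": [
      {
        "hostPath": "/run/nvidia-persistenced/socket",
        "containerPath": "/run/nvidia-persistenced/socket"
      }
    ]
  }
}`

func main() {
	mounts, err := listMounts([]byte(sampleSpec))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, m := range mounts {
		fmt.Printf("%s -> %s\n", m.HostPath, m.ContainerPath)
	}
}
```

To inspect a real file, the sample could be swapped for the result of os.ReadFile on the JSON specification that the journal log above shows Toolbx loading from the user's runtime directory.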
@debarshiray debarshiray changed the title from "nvidia-persistenced.socket bind error" to "Entering fails if NVIDIA Persistence Daemon is used" Oct 29, 2024
@debarshiray
Member

Fixed by #1576 (and #1577)

Thanks for your contribution, @jbtrystram !

5 participants