Mount NVIDIA device files and shared libraries directly rather than delegating this to `nvidia-container-cli` #11224

EtiennePerot · 2024-11-26T19:20:00Z

Description

Currently, runsc's NVIDIA GPU support relies on nvidia-container-cli's configure subcommand to mount NVIDIA device files and shared libraries (*.so files) into the container's root filesystem during container setup. In runc containers, this runs as a "prestart hook" in the OCI spec. In gVisor, that approach does not work, because prestart hooks run at a time when the gVisor sandbox filesystem is already isolated from the host using pivot_root(2). So runsc has logic to unconditionally skip the NVIDIA prestart hook, and to instead run the nvidia-container-cli configure subcommand earlier in the sandbox startup sequence, within the context of the Gofer process.
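To make the "skip the NVIDIA prestart hook" step concrete, here is a minimal Go sketch of how such filtering could look. It is not runsc's actual code: the `Hook` struct only mirrors the `path` and `args` fields of an OCI hook, and the hook binary names in `nvidiaHookNames` as well as the `splitPrestartHooks` helper are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"path"
)

// Hook mirrors the two OCI hook fields that matter for this sketch.
type Hook struct {
	Path string
	Args []string
}

// nvidiaHookNames lists binary names treated as the NVIDIA prestart hook.
// Assumption: illustrative only; the names runsc really checks may differ.
var nvidiaHookNames = map[string]bool{
	"nvidia-container-runtime-hook": true,
	"nvidia-container-toolkit":      true,
}

// splitPrestartHooks separates NVIDIA hooks (to be skipped and handled
// earlier in startup) from the hooks that should still run as usual.
func splitPrestartHooks(hooks []Hook) (keep, skipped []Hook) {
	for _, h := range hooks {
		if nvidiaHookNames[path.Base(h.Path)] {
			skipped = append(skipped, h)
			continue
		}
		keep = append(keep, h)
	}
	return keep, skipped
}

func main() {
	hooks := []Hook{
		{Path: "/usr/bin/nvidia-container-runtime-hook", Args: []string{"nvidia-container-runtime-hook", "prestart"}},
		{Path: "/usr/local/bin/some-other-hook"}, // hypothetical unrelated hook
	}
	keep, skipped := splitPrestartHooks(hooks)
	fmt.Println("hooks still run at prestart:", len(keep))
	fmt.Println("NVIDIA hooks skipped:", len(skipped))
}
```

Matching on the binary's base name keeps the check independent of where the hook is installed, at the cost of missing hooks that were renamed or wrapped.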

runsc also has GKE-specific code that detects whether GPUs were requested for the container by scanning the list of devices to be mounted in the container, and manually mounts these device files if such a request is present. In this environment, shared libraries are already mounted by another GKE component that inserts a bind mount of shared libraries into the container spec, so runsc doesn't need specific code to mount those shared libraries itself.
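As a rough illustration of the device-scanning approach, the sketch below filters a list of device paths for NVIDIA device files. The `requestedNVIDIADevices` helper and the `/dev/nvidia` prefix rule are assumptions for this example; the actual GKE-specific check in runsc may use different criteria.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// requestedNVIDIADevices returns the cleaned paths of devices that look like
// NVIDIA device files (e.g. /dev/nvidia0, /dev/nvidiactl, /dev/nvidia-uvm).
// Assumption: a simple prefix match stands in for the real detection logic.
func requestedNVIDIADevices(devicePaths []string) []string {
	var found []string
	for _, p := range devicePaths {
		clean := path.Clean(p)
		if strings.HasPrefix(clean, "/dev/nvidia") {
			found = append(found, clean)
		}
	}
	return found
}

func main() {
	// Device paths as they might appear in a container spec.
	devices := []string{"/dev/null", "/dev/nvidia0", "/dev/nvidiactl", "/dev/nvidia-uvm"}
	gpus := requestedNVIDIADevices(devices)
	if len(gpus) == 0 {
		fmt.Println("no GPU requested")
		return
	}
	fmt.Println("GPU device files to mount:", gpus)
}
```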

This issue tracks the removal of the first codepath. runsc should do all the work and not invoke nvidia-container-cli configure. This has the following advantages:

- Faster container startup (no fork/exec required)
- Fewer dependencies
- No inheritance of security vulnerabilities in nvidia-container-cli

... at the cost of more brittleness when the behavior of nvidia-container-cli changes. But given that we already maintain a codepath that avoids nvidia-container-cli entirely, this doesn't seem like a large incremental cost. The main cost seems to be to add logic to mount the required shared libraries into the container's root filesystem.

Is this feature related to a specific bug?

Perhaps.

Do you have a specific solution in mind?

Remove the use of nvidia-container-cli configure in runsc. Replace it with manual mounting of device files and shared libraries.
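One possible shape for the shared-library half of this work is sketched below: given the file names found in a host library directory, it plans read-only bind mounts for those that match a set of patterns. Everything here is an assumption for illustration, including the `Mount` struct, the `planLibraryMounts` helper, the glob patterns, the directory, and the example file names; it does not reflect how runsc or nvidia-container-cli select libraries.

```go
package main

import (
	"fmt"
	"path"
)

// Mount is a simplified stand-in for an OCI-style mount entry.
type Mount struct {
	Source      string
	Destination string
	Type        string
	Options     []string
}

// planLibraryMounts returns read-only bind mounts for every name that
// matches at least one pattern. It only plans; it performs no mounting.
func planLibraryMounts(hostDir, containerDir string, names, patterns []string) ([]Mount, error) {
	var mounts []Mount
	for _, name := range names {
		for _, pat := range patterns {
			ok, err := path.Match(pat, name)
			if err != nil {
				return nil, fmt.Errorf("bad pattern %q: %w", pat, err)
			}
			if ok {
				mounts = append(mounts, Mount{
					Source:      path.Join(hostDir, name),
					Destination: path.Join(containerDir, name),
					Type:        "bind",
					Options:     []string{"bind", "ro"},
				})
				break // one mount per file, even if several patterns match
			}
		}
	}
	return mounts, nil
}

func main() {
	// Hypothetical directory listing and patterns.
	names := []string{"libc.so.6", "libcuda.so.1", "libnvidia-ml.so.1"}
	patterns := []string{"libcuda.so*", "libnvidia-*.so*"}
	dir := "/usr/lib/x86_64-linux-gnu"
	mounts, err := planLibraryMounts(dir, dir, names, patterns)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range mounts {
		fmt.Printf("%s -> %s %v\n", m.Source, m.Destination, m.Options)
	}
}
```

Separating planning from mounting keeps the selection logic testable without privileges; the actual mount calls would still need to happen before the sandbox filesystem is isolated from the host.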

EtiennePerot added the type: enhancement New feature or request label Nov 26, 2024
