Mount NVIDIA device files and shared libraries directly rather than delegating this to nvidia-container-cli
#11224
Labels
type: enhancement
New feature or request
Description
Currently, `runsc`'s NVIDIA GPU support relies on `nvidia-container-cli`'s `configure` subcommand to mount NVIDIA device files and shared libraries (`*.so`'s) into the container's root filesystem during container setup. In `runc` containers, this runs as a "prestart hook" in the OCI spec. In gVisor, this does not work as a prestart hook, because prestart hooks run at a time when the gVisor sandbox filesystem is already isolated from the host using `pivot_root(2)`. So `runsc` also has logic to unconditionally skip the NVIDIA prestart hook, and to instead run the `nvidia-container-cli configure` subcommand earlier in the sandbox startup sequence, within the context of the Gofer process.
`runsc` also has GKE-specific code that detects the request of GPUs for the container by scanning the list of devices to be mounted in the container, and manually mounts these device files if any are present. In this environment, shared libraries are already mounted by another GKE component that inserts a bind mount of shared libraries into the container spec, so `runsc` doesn't need to have specific code to mount those shared libraries itself.
This issue tracks the removal of the first codepath.
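The device-scanning step of the GKE-specific codepath described above could be sketched roughly as follows. This is only an illustration, not `runsc`'s actual code: the `Device` type, the `nvidiaDevices` helper, and the exact set of device names matched are assumptions made for the example.

```go
package main

import "regexp"

// nvidiaDevicePattern matches device nodes conventionally created by the
// NVIDIA driver, such as /dev/nvidia0, /dev/nvidiactl and /dev/nvidia-uvm.
// Assumption: this set is illustrative and may not match what runsc handles.
var nvidiaDevicePattern = regexp.MustCompile(`^/dev/nvidia(\d+|ctl|-uvm|-uvm-tools)$`)

// Device is a hypothetical stand-in for a device entry in a container spec.
type Device struct {
	Path string
}

// nvidiaDevices returns the paths of the requested devices that look like
// NVIDIA device files, preserving their order in the input list.
func nvidiaDevices(devices []Device) []string {
	var out []string
	for _, d := range devices {
		if nvidiaDevicePattern.MatchString(d.Path) {
			out = append(out, d.Path)
		}
	}
	return out
}
```

If `nvidiaDevices` returns a non-empty list, the container has requested GPUs, and those paths are the device files that would need to be mounted.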
`runsc` should do all the work and not invoke `nvidia-container-cli configure`. This has the following advantages:
`nvidia-container-cli`
... at the cost of more brittleness when the behavior of `nvidia-container-cli` changes. But given that we already maintain a codepath that avoids `nvidia-container-cli` entirely, this doesn't seem like a large incremental cost. The main cost seems to be adding logic to mount the required shared libraries into the container's root filesystem.
Is this feature related to a specific bug?
Perhaps.
Do you have a specific solution in mind?
Remove the use of `nvidia-container-cli configure` in `runsc`. Replace it with manual mounting of device files and shared libraries.
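As a rough sketch of what the shared-library half of that manual mounting might involve, the snippet below computes a read-only bind mount for each host library at the same path under the container's root filesystem. It only plans the mounts and does not perform them. The `BindMount` type, the `planLibraryMounts` helper, and the choice of `bind` and `ro` options are hypothetical and not taken from `runsc`.

```go
package main

import "path/filepath"

// BindMount is a hypothetical description of a single bind mount.
type BindMount struct {
	Source      string   // path on the host
	Destination string   // path under the container's root filesystem
	Options     []string // mount options
}

// planLibraryMounts maps each host library path to the same path under
// rootfs, as a read-only bind mount. It does not touch the filesystem.
func planLibraryMounts(rootfs string, hostLibs []string) []BindMount {
	mounts := make([]BindMount, 0, len(hostLibs))
	for _, lib := range hostLibs {
		mounts = append(mounts, BindMount{
			Source:      lib,
			Destination: filepath.Join(rootfs, lib),
			Options:     []string{"bind", "ro"},
		})
	}
	return mounts
}
```

How the list of host libraries is discovered is left open here; that discovery is part of what `nvidia-container-cli configure` currently does on `runsc`'s behalf.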