Use NVIDIA's gpu-operator for GPU node support #1017
Comments
NVIDIA's gpu-operator doesn't support containerd yet, per the discussion in NVIDIA/gpu-operator#7.
/assign @mboersma
Containerd 1.4 support is now live in gpu-operator 1.4. +100 for leveraging gpu-operator.
@mboersma are you working on this? If not, can I pick this one up?
/assign @shysank
@shysank I am not currently working on this, so please have at it (and thank you). When I looked at it in December, the issue was that restarting a node made Kubernetes lose track of the GPU device, which didn't seem to be a problem with the existing
/kind feature
Describe the solution you'd like
#1002 implemented the "nvidia-gpu" flavor via postKubeadmCommands recommended by NVIDIA, as explained in this comment. But NVIDIA's gpu-operator seems like a cleaner, more future-proof solution. We should investigate whether it supports containerd now and whether the current implementation could be replaced with gpu-operator.
Anything else you would like to add:
See the discussion in #426 and the current implementation in #1002.
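When evaluating a replacement, one concern raised in the comments is a restarted node no longer advertising its GPU. A rough way to spot that is to look at each node's allocatable resources. The sketch below is only an illustration, not part of either implementation: it assumes the GPU is exposed under the extended resource name nvidia.com/gpu (the name used by NVIDIA's device plugin) and that the input is the JSON printed by kubectl get nodes -o json. The function name nodes_missing_gpu is hypothetical.

```python
import json
from typing import List

# Assumption: GPUs are advertised under NVIDIA's device-plugin resource name.
GPU_RESOURCE = "nvidia.com/gpu"


def nodes_missing_gpu(node_list_json: str) -> List[str]:
    """Return the names of nodes that advertise no allocatable GPU.

    node_list_json is expected to be a Kubernetes NodeList document,
    such as the output of `kubectl get nodes -o json`.
    """
    node_list = json.loads(node_list_json)
    missing = []
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Extended resource quantities are reported as strings, e.g. "1".
        if int(allocatable.get(GPU_RESOURCE, "0")) < 1:
            missing.append(name)
    return missing
```

Running this against the node list before and after rebooting a GPU node would show whether the node drops out of the GPU-capable set. Note that it flags every node without a GPU, so on a mixed cluster the result should be compared against the nodes that are expected to have one.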
Environment:
kubectl version:
/etc/os-release: