Add support for GPU nodes #426
Comments
We should look into how/if https://github.com/NVIDIA/gpu-operator can be leveraged for this. See https://docs.nvidia.com/datacenter/kubernetes/openshift-on-gpu-install-guide/index.html. |
/assign |
Looks like NVIDIA If we don't want to wait for |
If anyone wants to install drivers and device plugin manually, here are instructions:
You should see following output in device plugin logs:
|
Can we possibly pre-provide a kubeadm config template to simplify this? |
@alexeldeib you mean leverage post kubeadm commands to do the install? |
/unassign @sozercan |
yeah, or even just stick it in a file and have the postKubeadmCommands be I'm warming up to the idea of using the templatized types as a way to simplify defaulting / best practices. We could have something like a default GPU kubeadm config template, so users don't need to bring their own |
Yeah I like the idea of having a "reference" flavor template for GPU w/ docs using the bash script Sertac shared above for now, and then maybe open a separate issue for switching the instructions to use the nvidia operator once that works with containerd. I'm going to mark this as help wanted. /help |
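For anyone picking this up, here is a rough sketch of what a "reference" GPU kubeadm config template along those lines could look like: write the install script to a file on the node, then run it from postKubeadmCommands. It only builds the manifest and prints it as JSON (which is also valid YAML). The script path, the placeholder script body, the template name, and the apiVersion default are all assumptions for illustration; the actual install script shared above is not reproduced here.

```python
import json

# Hypothetical path and placeholder body: the real driver/device plugin
# install script from this thread is NOT reproduced here.
GPU_SCRIPT_PATH = "/opt/install-gpu.sh"
GPU_SCRIPT_BODY = (
    "#!/bin/bash\n"
    "set -euo pipefail\n"
    "# driver and device plugin install steps go here\n"
)


def gpu_kubeadm_config_template(
    name: str,
    # Assumed API version; adjust to whatever your Cluster API release serves.
    api_version: str = "bootstrap.cluster.x-k8s.io/v1beta1",
) -> dict:
    """Build a KubeadmConfigTemplate that drops a script on the node
    and runs it after kubeadm finishes."""
    return {
        "apiVersion": api_version,
        "kind": "KubeadmConfigTemplate",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    # Write the install script to disk on the node.
                    "files": [
                        {
                            "path": GPU_SCRIPT_PATH,
                            "owner": "root:root",
                            "permissions": "0744",
                            "content": GPU_SCRIPT_BODY,
                        }
                    ],
                    # Run the script once kubeadm join/init has completed.
                    "postKubeadmCommands": [f"bash {GPU_SCRIPT_PATH}"],
                }
            }
        },
    }


if __name__ == "__main__":
    # "gpu-md-0" is a made-up template name.
    print(json.dumps(gpu_kubeadm_config_template("gpu-md-0"), indent=2))
```

Keeping the script in `files` rather than inlining every command in postKubeadmCommands means the template stays the same if the install steps later switch to the NVIDIA operator; only the script body changes.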
There is also a VM extension available on Azure that might be worth looking into: https://docs.microsoft.com/en-us/azure/virtual-machines/extensions/hpccompute-gpu-linux Not sure if it works with containerd though. |
/assign |
/kind feature
Describe the solution you'd like
[A clear and concise description of what you want to happen.]
https://docs.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup
Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
Environment:
Kubernetes version (use kubectl version):
OS (e.g. from /etc/os-release):