Handle kubelet restarts #59

Merged 3 commits on Sep 23, 2022

cdesiniotis
Contributor

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
This allows us to perform basic health checks even if NVML
is not available inside the container (i.e. when the NVIDIA
Container Toolkit is not running on the system). If NVML
is available inside the container, we also watch for any
reported XID errors.

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
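
For readers skimming the commit message above, here is a rough sketch of the fallback it describes. This is not the code from this PR. The names `basic_health`, `plan_health_checks`, and `nvml_available` are hypothetical, and the assumption that a "basic" check means "the device node still exists on disk" is mine, since the commit message does not say what the basic check inspects.

```python
import os
from typing import Callable, Dict, Iterable


def basic_health(paths: Iterable[str]) -> Dict[str, bool]:
    # ASSUMPTION: a device counts as healthy while its node exists on disk.
    # The PR does not state what the basic check actually looks at.
    return {path: os.path.exists(path) for path in paths}


def plan_health_checks(
    paths: Iterable[str],
    nvml_available: Callable[[], bool],
) -> dict:
    # `nvml_available` is a hypothetical probe supplied by the caller,
    # e.g. a wrapper that tries to initialize NVML and reports success.
    try:
        has_nvml = bool(nvml_available())
    except Exception:
        # Treat any probe failure as "NVML not available" and fall back
        # to the basic checks only.
        has_nvml = False

    return {
        "devices": basic_health(paths),  # always performed
        "watch_xid": has_nvml,           # only when NVML is usable
    }
```

The point of the split is that the basic checks never depend on NVML, so a missing library only turns off XID monitoring rather than all health reporting.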
@rthallisey rthallisey merged commit f7b0769 into NVIDIA:master Sep 23, 2022
2 participants