Failed to initialize NVML: ERROR_UNKNOWN #452
Comments
It seems your NVIDIA driver may not be installed correctly. You can try installing nvidia-device-plugin v0.14 to see whether it launches correctly.
NVIDIA GPU Operator works fine, but nvidia-device-plugin v0.14.5 has the same error:
you can look
You mean NVIDIA Container Toolkit?
If I install HAMi without privileged=true in daemonsetnvidia.yaml, the device-plugin pod goes into CrashLoopBackOff.
Here is device-plugin's log:
If I install HAMi with privileged=true in daemonsetnvidia.yaml, device-plugin works well. However, containers that request vGPU encounter the following error:
Here is vgpu-scheduler-extender's log:
Ubuntu: 22.04.4
Kubernetes: RKE2 1.28.12
Containerd: v1.7.17-k3s1
NVIDIA Container Toolkit: 1.15.0
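
One check that might help narrow this down is whether the NVIDIA device nodes are visible at all from inside the affected container, since the behavior changes with privileged=true. The snippet below is only a rough diagnostic sketch, not part of HAMi or the device plugin. It assumes the nodes live under /dev with names starting with "nvidia" (for example /dev/nvidia0 and /dev/nvidiactl), and the helper names are made up for illustration.

```python
import glob
import os


def list_nvidia_device_nodes(dev_dir="/dev"):
    """Return sorted paths under dev_dir whose names start with 'nvidia'.

    Assumption: NVIDIA device nodes follow the usual /dev/nvidia* naming.
    """
    return sorted(glob.glob(os.path.join(dev_dir, "nvidia*")))


def summarize(nodes):
    """Build a one-line, human-readable summary of the nodes found."""
    if not nodes:
        return "no NVIDIA device nodes visible"
    return "visible NVIDIA device nodes: " + ", ".join(nodes)


if __name__ == "__main__":
    # Run this inside the container that fails to initialize NVML.
    print(summarize(list_nvidia_device_nodes()))
```

If no nodes are listed in the non-privileged container, the NVML failure may be a device visibility problem rather than a driver installation problem, though this check alone does not prove either.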