Is there a way to verify if plugin is installed and running as it should be ? #35

Open
bettaps opened this issue Feb 7, 2024 · 1 comment

Comments

bettaps commented Feb 7, 2024

Hello,

We have enabled the GPU plugin on our Nomad client node, but GPU jobs are not being allocated to this node. Furthermore, running 'nomad node status' on this node does not show any information about the number of GPUs available.
I can see the plugin installed under the plugins directory. I am also able to run the nvidia/cuda Docker container directly on this node, and the nvidia-smi command reports the correct number of GPUs.
I suspect that this is a plugin issue. How can I verify that the plugin has been installed correctly?
Or could the issue be something else?

Thanks!

bettaps changed the title from "Is there a way to verify if plugin is installed and running successfully ?" to "Is there a way to verify if plugin is installed and running as it should be ?" on Feb 7, 2024

dobecad commented Feb 8, 2024

Do your Nomad client logs say that your GPU is failing to get fingerprinted? If so, that is the same issue I was having. The only fix I found was to recompile the nomad-device-nvidia plugin with an updated version of the go-nvml library (specifically, v0.12.0-2).
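
Besides the logs, one way to check whether the client fingerprinted any GPUs is to look at the node's device list. Below is a minimal Python sketch, not an official check. It assumes the `nomad` CLI is on the PATH and that the output of `nomad node status -self -json` lists devices under `NodeResources.Devices`, with `Vendor`, `Type`, `Name`, and `Instances` fields. The field names are an assumption and may differ between Nomad versions. The helper names `gpu_devices` and `local_node` are made up for this example.

```python
import json
import subprocess


def gpu_devices(node):
    """Summarize the GPU device groups in a parsed node JSON document.

    Assumes devices are listed under node["NodeResources"]["Devices"];
    adjust the keys if your Nomad version uses a different layout.
    """
    resources = node.get("NodeResources") or {}
    devices = resources.get("Devices") or []
    found = []
    for dev in devices:
        # Only keep device groups whose type is "gpu".
        if str(dev.get("Type", "")).lower() != "gpu":
            continue
        instances = dev.get("Instances") or []
        found.append({
            "vendor": dev.get("Vendor"),
            "name": dev.get("Name"),
            "total": len(instances),
            "healthy": sum(1 for inst in instances if inst.get("Healthy")),
        })
    return found


def local_node():
    """Fetch the local client's node document through the Nomad CLI."""
    result = subprocess.run(
        ["nomad", "node", "status", "-self", "-json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Running `gpu_devices(local_node())` on the client node should return an empty list if no GPU was fingerprinted. In that case the plugin binary may be present but not loading or not detecting devices, and the client logs are the next place to look.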
