nvidia: support disabling the nvidia plugin #8353

Merged (2 commits) on Jul 21, 2020

@notnoop notnoop commented Jul 3, 2020

This provides a workaround for cases where Nomad does not support the NVIDIA driver library installed on the host. The crash in #8303 results from an attempt to fingerprint using a C function that cannot be resolved, so the process crashes. Because this happens before the agent is up, the logs are completely hidden.

To disable the plugin, add the following to the agent config:

plugin "nvidia-gpu" {
  config {
    enabled = false
  }
}
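
As a rough illustration of why this setting avoids the crash, here is a minimal Go sketch of a device plugin that skips fingerprinting when disabled. It is not the code from this PR: `PluginConfig`, `GPUDevice`, `NewGPUDevice`, and `initNative` are hypothetical names, and the unresolved C symbol is simulated with a panic.

```go
package main

import "fmt"

// PluginConfig is a hypothetical decoded form of the plugin's config block.
type PluginConfig struct {
	Enabled bool
}

// GPUDevice is a hypothetical stand-in for a device plugin.
type GPUDevice struct {
	enabled    bool
	initNative func() // stands in for loading the native driver library
}

// NewGPUDevice returns a device that is enabled by default.
func NewGPUDevice(initNative func()) *GPUDevice {
	return &GPUDevice{enabled: true, initNative: initNative}
}

// SetConfig applies the operator-supplied configuration.
func (d *GPUDevice) SetConfig(c PluginConfig) {
	d.enabled = c.Enabled
}

// Fingerprint returns a status string. It touches the native
// library only when the device is enabled.
func (d *GPUDevice) Fingerprint() string {
	if !d.enabled {
		return "disabled"
	}
	d.initNative()
	return "fingerprinted"
}

func main() {
	// Simulate a driver library with an unresolved symbol.
	broken := func() { panic("undefined symbol (simulated)") }

	d := NewGPUDevice(broken)
	d.SetConfig(PluginConfig{Enabled: false})
	fmt.Println(d.Fingerprint()) // prints "disabled"
}
```

With `Enabled` left at its default of `true` in this sketch, the same call would reach the simulated native initialization and crash, which mirrors the failure described above.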

Fixes #8303

@notnoop notnoop self-assigned this Jul 3, 2020

@drewbailey drewbailey left a comment

lgtm


@tgross tgross left a comment

Implementation here LGTM. I left one comment about potentially confusing or noisy logging, but if I'm barking up the wrong tree on that, this looks good to go.

Review comment on devices/gpu/nvidia/device.go (outdated)
@notnoop notnoop added this to the 0.12.1 milestone Jul 21, 2020
@notnoop notnoop merged commit 60dd7ae into master Jul 21, 2020
@notnoop notnoop deleted the f-disable-nvidia-option branch July 21, 2020 14:11
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 29, 2022
Linked issue: nomad agent dev fails to start with 'undefined symbol: nvmlDeviceGetPciInfo_v3'