Fix default package list for upgrade workflow #33

Merged 2 commits into NVIDIA:master on Apr 14, 2021

@ajdecon (Collaborator) commented Mar 18, 2021

Due to the way the package dependencies are structured, trying to upgrade to a new driver branch with the Canonical packages currently fails.

For example, if we install the 450 driver branch with DeepOps, and then try to run the playbook again with nvidia_driver_ubuntu_branch: 460, we see an error like this:

TASK [nvidia.nvidia_driver : install driver packages] ***********************************************************************************
failed: [gpu01] (item=['nvidia-headless-460-server', 'nvidia-utils-460-server']) => changed=false
  ansible_loop_var: item
  cache_update_time: 1615947575
  cache_updated: false
  item:
  - nvidia-headless-460-server
  - nvidia-utils-460-server
  msg: |-
    '/usr/bin/apt-get -y -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"      install 'nvidia-headless-460-server' 'nvidia-utils-460-server'' failed: E: Unable to correct problems, you have held broken packages.
  rc: 100
  stderr: |-
    E: Unable to correct problems, you have held broken packages.
  stderr_lines: <omitted>
  stdout: |-
    Reading package lists...
    Building dependency tree...
    Reading state information...
    Some packages could not be installed. This may mean that you have
    requested an impossible situation or if you are using the unstable
    distribution that some required packages have not yet been created
    or been moved out of Incoming.
    The following information may help to resolve the situation:
    The following packages have unmet dependencies:
     nvidia-headless-460-server : Depends: nvidia-headless-no-dkms-460-server but it is not going to be installed
  stdout_lines: <omitted>
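
The missing package can be read straight out of the apt output above. As a minimal sketch (not part of the role; the function name `unmet_dependencies` and the regular expression are illustrative assumptions), the following pulls each "package : Depends: dependency" pair that apt reports as "not going to be installed":

```python
import re

# Matches single-line apt reports of the form:
#   " <package> : Depends: <dependency> but it is not going to be installed"
# Continuation lines that omit the package name are not handled in this sketch.
UNMET_RE = re.compile(
    r"^\s*(?P<package>\S+)\s*:\s*Depends:\s*(?P<dependency>\S+)"
    r".*but it is not going to be installed"
)


def unmet_dependencies(apt_stdout):
    """Return (package, dependency) pairs that apt refused to install."""
    pairs = []
    for line in apt_stdout.splitlines():
        match = UNMET_RE.match(line)
        if match:
            pairs.append((match.group("package"), match.group("dependency")))
    return pairs


# Excerpt copied from the failed task's stdout above.
sample_stdout = """\
The following packages have unmet dependencies:
 nvidia-headless-460-server : Depends: nvidia-headless-no-dkms-460-server but it is not going to be installed
"""

print(unmet_dependencies(sample_stdout))
# [('nvidia-headless-460-server', 'nvidia-headless-no-dkms-460-server')]
```

Here the only pair reported is the dependency on nvidia-headless-no-dkms-460-server, which is the package this change adds to the explicit list.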

Adding nvidia-headless-no-dkms-{{ nvidia_driver_ubuntu_branch }}-server to the list of packages we specify explicitly as part of the install appears to resolve the issue. The playbook runs successfully, and the package list afterwards shows that every installed package is in the new driver branch (460). (Note that the leftover 450 packages are all in the rc state, i.e. removed with only their configuration files remaining.)

vagrant@ubuntu1804:~$ dpkg -l | grep nvidia
ii  libnvidia-cfg1-460-server:amd64       460.32.03-0ubuntu0.18.04.1        amd64        NVIDIA binary OpenGL/GLX configuration library
rc  libnvidia-compute-450-server:amd64    450.102.04-0ubuntu0.18.04.1       amd64        NVIDIA libcompute package
ii  libnvidia-compute-460-server:amd64    460.32.03-0ubuntu0.18.04.1        amd64        NVIDIA libcompute package
rc  nvidia-compute-utils-450-server       450.102.04-0ubuntu0.18.04.1       amd64        NVIDIA compute utilities
ii  nvidia-compute-utils-460-server       460.32.03-0ubuntu0.18.04.1        amd64        NVIDIA compute utilities
rc  nvidia-dkms-450-server                450.102.04-0ubuntu0.18.04.1       amd64        NVIDIA DKMS package
ii  nvidia-dkms-460-server                460.32.03-0ubuntu0.18.04.1        amd64        NVIDIA DKMS package
ii  nvidia-headless-460-server            460.32.03-0ubuntu0.18.04.1        amd64        NVIDIA headless metapackage
ii  nvidia-headless-no-dkms-460-server    460.32.03-0ubuntu0.18.04.1        amd64        NVIDIA headless metapackage - no DKMS
rc  nvidia-kernel-common-450-server       450.102.04-0ubuntu0.18.04.1       amd64        Shared files used with the kernel module
ii  nvidia-kernel-common-460-server       460.32.03-0ubuntu0.18.04.1        amd64        Shared files used with the kernel module
ii  nvidia-kernel-source-460-server       460.32.03-0ubuntu0.18.04.1        amd64        NVIDIA kernel source package
ii  nvidia-utils-460-server               460.32.03-0ubuntu0.18.04.1        amd64        NVIDIA Server Driver support binaries
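
The same check can be scripted. The sketch below is an illustration only, not part of the role; the helper names `summarize_nvidia_packages` and `installed_branches` are assumptions. It groups `dpkg -l` lines by driver branch and dpkg state, where `ii` means installed and `rc` means removed with configuration files remaining:

```python
import re
from collections import defaultdict

# Captures the branch number from names such as "nvidia-dkms-460-server"
# or "libnvidia-compute-450-server:amd64".
BRANCH_RE = re.compile(r"-(\d+)-server(?::\w+)?$")


def summarize_nvidia_packages(dpkg_output):
    """Map driver branch -> dpkg state -> sorted list of package names."""
    summary = defaultdict(lambda: defaultdict(list))
    for line in dpkg_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        state, name = fields[0], fields[1]
        match = BRANCH_RE.search(name)
        if match is None:
            continue  # header line or a package without a branch suffix
        summary[match.group(1)][state].append(name)
    return {
        branch: {state: sorted(names) for state, names in states.items()}
        for branch, states in summary.items()
    }


def installed_branches(dpkg_output):
    """Return the set of branches that still have packages in the 'ii' state."""
    summary = summarize_nvidia_packages(dpkg_output)
    return {branch for branch, states in summary.items() if "ii" in states}


# A few lines copied from the listing above.
sample = """\
rc  nvidia-dkms-450-server                450.102.04-0ubuntu0.18.04.1       amd64        NVIDIA DKMS package
ii  nvidia-dkms-460-server                460.32.03-0ubuntu0.18.04.1        amd64        NVIDIA DKMS package
ii  libnvidia-compute-460-server:amd64    460.32.03-0ubuntu0.18.04.1        amd64        NVIDIA libcompute package
"""

print(sorted(installed_branches(sample)))
# ['460']
```

A result containing only the target branch indicates that no packages from the old branch are still installed.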

@ajdecon requested a review from @dholt on March 18, 2021 22:06
@acoastalfog commented

The upgrade path from 450 to 460 failed on my single-node install unless I also added

  • "nvidia-kernel-source-{{ nvidia_driver_ubuntu_branch }}-server"

to the package list.

@ajdecon (Collaborator, Author) commented Mar 25, 2021

@acoastalfog : Hmm. That wasn't needed in my testing, but OTOH I don't see a downside to including it in the explicit package list. Added!
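
For reference, here is a rough sketch of how the package names discussed in this thread expand for a given branch. It assumes the explicit list consists of the two packages from the failed task plus the two additions mentioned above; the actual contents and variable name in the role's defaults are not shown in this conversation, and `PACKAGE_TEMPLATES` and `render_packages` are hypothetical names.

```python
# Package name patterns mentioned in this thread. "{branch}" stands in for
# the Jinja expression {{ nvidia_driver_ubuntu_branch }} used by the role.
PACKAGE_TEMPLATES = [
    "nvidia-headless-{branch}-server",
    "nvidia-utils-{branch}-server",
    "nvidia-headless-no-dkms-{branch}-server",  # added by this PR
    "nvidia-kernel-source-{branch}-server",     # added after review feedback
]


def render_packages(branch):
    """Expand every template for the given driver branch (e.g. 460)."""
    return [template.format(branch=branch) for template in PACKAGE_TEMPLATES]


for name in render_packages(460):
    print(name)
# nvidia-headless-460-server
# nvidia-utils-460-server
# nvidia-headless-no-dkms-460-server
# nvidia-kernel-source-460-server
```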

@dholt merged commit 9f83a51 into NVIDIA:master on Apr 14, 2021