chore: bumping the grid driver affecting linux kernels 5.15.1063+ versions #381

Merged

Bryce-Soghigian (Collaborator) commented May 31, 2024

Description
See Azure/AgentBaker#4429 for additional context on why we are bumping the driver.
How was this change tested?

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

Release Note


Bryce-Soghigian (Collaborator, Author) left a comment

/test

coveralls commented May 31, 2024

Pull Request Test Coverage Report for Build 9356764125

Details

  • 1 of 1 (100.0%) changed or added relevant line in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 97.774%

Totals
Change from base Build 9074172591: 0.0%
Covered Lines: 36279
Relevant Lines: 37105

💛 - Coveralls

Bryce-Soghigian (Collaborator, Author) left a comment

/test

EDIT: I believe this won't work without the SHA.

Bryce-Soghigian (Collaborator, Author) left a comment

/test

Bryce-Soghigian (Collaborator, Author) commented May 31, 2024

Looks like it's fixed by the version bump going in. I ran the E2Es locally and via the pipeline, and both pass.

~/dev/focus/karpenter-provider-azure (bsoghigian/gpu-driver-bump*) » k get pods -A
NAMESPACE           NAME                                                   READY   STATUS    RESTARTS   AGE
default             devourerwhite-5-vfh02dctly-7968d96bb-mdjdv             1/1     Running   0          3m5s
gatekeeper-system   gatekeeper-audit-57fc5568f8-5b987                      1/1     Running   0          2m26s
gatekeeper-system   gatekeeper-controller-6494586d5d-7k4db                 1/1     Running   0          2m26s
gatekeeper-system   gatekeeper-controller-6494586d5d-dqnfd                 1/1     Running   0          2m26s
karpenter           karpenter-66d689464f-jljmz                             1/1     Running   0          6m23s
kube-system         azure-cns-5s6sv                                        1/1     Running   0          43s
kube-system         azure-cns-7g95z                                        1/1     Running   0          11m
kube-system         azure-cns-p6xn8                                        1/1     Running   0          11m
kube-system         azure-cns-z7c45                                        1/1     Running   0          11m
kube-system         azure-ip-masq-agent-8njp2                              1/1     Running   0          11m
kube-system         azure-ip-masq-agent-dxlfm                              1/1     Running   0          11m
kube-system         azure-ip-masq-agent-fp8j2                              1/1     Running   0          11m
kube-system         azure-ip-masq-agent-k89d4                              1/1     Running   0          43s
kube-system         azure-policy-d76896767-r8rp9                           1/1     Running   0          2m26s
kube-system         azure-policy-webhook-564c9d7c7b-v65wr                  1/1     Running   0          2m26s
kube-system         azure-wi-webhook-controller-manager-7585698f56-4jnwz   1/1     Running   0          10m
kube-system         azure-wi-webhook-controller-manager-7585698f56-5s9g9   1/1     Running   0          10m
kube-system         cilium-fc5b5                                           1/1     Running   0          42s
kube-system         cilium-mp6sj                                           1/1     Running   0          11m
kube-system         cilium-operator-559887cf4-7w6rt                        1/1     Running   0          11m
kube-system         cilium-operator-559887cf4-gdw9m                        1/1     Running   0          11m
kube-system         cilium-sml28                                           1/1     Running   0          11m
kube-system         cilium-wvrgh                                           1/1     Running   0          11m
kube-system         cloud-node-manager-lzv8v                               1/1     Running   0          11m
kube-system         cloud-node-manager-q6vx2                               1/1     Running   0          11m
kube-system         cloud-node-manager-xwkws                               1/1     Running   0          42s
kube-system         cloud-node-manager-z7sfg                               1/1     Running   0          11m
kube-system         coredns-767bfbd4fb-n876d                               1/1     Running   0          11m
kube-system         coredns-767bfbd4fb-zrjfk                               1/1     Running   0          10m
kube-system         coredns-autoscaler-c6649b67c-b5x85                     1/1     Running   0          11m
kube-system         csi-azuredisk-node-779m5                               3/3     Running   0          11m
kube-system         csi-azuredisk-node-fv7ll                               3/3     Running   0          11m
kube-system         csi-azuredisk-node-mzv6c                               3/3     Running   0          11m
kube-system         csi-azuredisk-node-x6cnl                               3/3     Running   0          43s
kube-system         csi-azurefile-node-4pwqk                               3/3     Running   0          11m
kube-system         csi-azurefile-node-kldhr                               3/3     Running   0          11m
kube-system         csi-azurefile-node-mkkd7                               3/3     Running   0          42s
kube-system         csi-azurefile-node-rgw62                               3/3     Running   0          11m
kube-system         konnectivity-agent-c98b47dbd-5vqm9                     1/1     Running   0          11m
kube-system         konnectivity-agent-c98b47dbd-dpkts                     1/1     Running   0          11m
kube-system         metrics-server-76d77694d4-9pv6p                        2/2     Running   0          10m
kube-system         metrics-server-76d77694d4-r8r6d                        2/2     Running   0          10m
kube-system         nvidia-device-plugin-daemonset-vv8t7                   1/1     Running   0          21s
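
For anyone who wants to check a listing like the one above without reading it by eye, here is a minimal sketch. It is not part of this PR: the function name `unhealthy_pods` is hypothetical, and it assumes the default whitespace-separated `kubectl get pods -A` columns (NAMESPACE, NAME, READY, STATUS, ...). It flags any pod that is not `Running` or whose READY count is below its total.

```python
def unhealthy_pods(kubectl_output: str) -> list[str]:
    """Return 'namespace/name' for pods that are not Running or not fully ready."""
    bad = []
    lines = kubectl_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or malformed lines
        namespace, name, ready, status = fields[:4]
        ready_count, _, total = ready.partition("/")  # e.g. "3/3" -> "3", "3"
        if status != "Running" or ready_count != total:
            bad.append(f"{namespace}/{name}")
    return bad


# A few rows copied from the listing above.
SAMPLE = """\
NAMESPACE NAME READY STATUS RESTARTS AGE
karpenter karpenter-66d689464f-jljmz 1/1 Running 0 6m23s
kube-system csi-azuredisk-node-x6cnl 3/3 Running 0 43s
kube-system nvidia-device-plugin-daemonset-vv8t7 1/1 Running 0 21s
"""

print(unhealthy_pods(SAMPLE))  # prints [] because every sampled pod is ready
```

An empty list only says the pods are up. It does not by itself confirm that the GPU driver loaded correctly, which is what the GPU E2Es cover.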

Bryce-Soghigian (Collaborator, Author) left a comment

/test

Bryce-Soghigian (Collaborator, Author) left a comment

/test

Bryce-Soghigian added the area/gpu label (Issues or PRs related to GPUs) Jun 3, 2024
pkg/utils/gpu.go: review thread resolved (outdated)

Bryce-Soghigian (Collaborator, Author) left a comment

/test

Bryce-Soghigian (Collaborator, Author) left a comment

/test

Bryce-Soghigian marked this pull request as ready for review June 3, 2024 20:24

Bryce-Soghigian (Collaborator, Author) commented

Going to merge, as the failing E2Es are unrelated to my change and the GPU ones are passing. The E2Es are failing on the cluster-create steps, not in the Karpenter logic itself.

Bryce-Soghigian merged commit cf27169 into Azure:main Jun 4, 2024
15 of 18 checks passed
Bryce-Soghigian added a commit that referenced this pull request Sep 12, 2024
…sions (#381)

* chore: bumping the grid driver affecting linux kernels 5.15.1063+ versions

* chore: bumping the cuda driver versions too

* chore: fix gpu sha

* fix: use different driver version for 550

* fix: bumping major version to 550

* ci
Labels
area/gpu Issues or PRs related to GPUs

3 participants