BREAKING CHANGE: replace node.kubernetes.io/unschedulable with a Karpenter-specific taint #508

Merged: 30 commits, Oct 12, 2023

njtran (Contributor) commented Sep 11, 2023

Fixes #N/A

Description
Design Doc: #585

Karpenter now uses the karpenter.sh/disrupting taint instead of the node.kubernetes.io/unschedulable taint when deprovisioning nodes. Karpenter adds this taint only once it has begun spinning up replacement nodes for a deprovisioning action; it then deletes the candidate node.

  • Karpenter would add/remove this taint, assuming no other agent had been managing this taint.
  • This fixes an issue where, in rare cases when Karpenter restarted while executing a deprovisioning action, it could leave nodes that were meant to be deprovisioned in a cordoned state.
  • This breaks some users with pods that tolerate the node.kubernetes.io/unschedulable taint or that tolerate all keys (a * toleration). Users who want to ensure that their daemonsets and other pods with such a toleration are not evicted as part of termination will need to add a toleration for the karpenter.sh/disrupting taint to those pods.
  • This also modifies the cluster state logic to remove the stateNode.MarkedForDeletion variable. Karpenter can now rely on the existence of the karpenter.sh/disrupting taint to know when a node has been marked for deletion.
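
The last two points can be sketched in Go. This is a minimal illustration, not the code from this PR: the Taint and Toleration structs are simplified stand-ins for the Kubernetes core/v1 types, IsMarkedForDeletion and Tolerates are hypothetical helper names, and the NoSchedule effect used in main is an assumption, since the description above does not state the taint's effect. The matching rules are intended to follow the usual Kubernetes toleration semantics.

```go
package main

import "fmt"

// DisruptingTaintKey is the taint key named in the PR description.
const DisruptingTaintKey = "karpenter.sh/disrupting"

// Taint is a simplified stand-in for the Kubernetes core/v1 Taint type.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// Toleration is a simplified stand-in for the Kubernetes core/v1 Toleration type.
type Toleration struct {
	Key      string // empty key with Operator "Exists" matches every key
	Operator string // "Exists" or "Equal"; empty is treated as "Equal"
	Value    string
	Effect   string // empty matches every effect
}

// IsMarkedForDeletion (hypothetical helper) reports whether a node's taints
// include the disrupting taint, which is the signal the description says
// can stand in for a separate in-memory flag.
func IsMarkedForDeletion(taints []Taint) bool {
	for _, t := range taints {
		if t.Key == DisruptingTaintKey {
			return true
		}
	}
	return false
}

// Tolerates (hypothetical helper) checks a single toleration against a single
// taint: the effect and key must match when set, then the operator decides.
func Tolerates(tol Toleration, taint Taint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return tol.Value == taint.Value
	}
	return false
}

func main() {
	// The effect here is an assumption made for illustration only.
	taint := Taint{Key: DisruptingTaintKey, Effect: "NoSchedule"}

	old := Toleration{Key: "node.kubernetes.io/unschedulable", Operator: "Exists"}
	updated := Toleration{Key: DisruptingTaintKey, Operator: "Exists"}

	fmt.Println("old toleration matches new taint:", Tolerates(old, taint))
	fmt.Println("updated toleration matches new taint:", Tolerates(updated, taint))
	fmt.Println("node marked for deletion:", IsMarkedForDeletion([]Taint{taint}))
}
```

A toleration keyed on node.kubernetes.io/unschedulable does not match a taint with a different key, which is why pods relying on it would need the new key added.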

How was this change tested?

  • make presubmit
  • make apply && manual test

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@njtran njtran requested a review from a team as a code owner September 11, 2023 17:31
@njtran njtran requested a review from tzneal September 11, 2023 17:31
@njtran njtran added the blocked label (Unable to make progress due to some dependency) Sep 15, 2023
njtran (Contributor Author) commented Sep 15, 2023

Adding the blocked label so we can first vet if changing this taint right now is what we want.

jonathan-innis (Member) left a comment

Looks good! Just a few thoughts.

pkg/apis/v1beta1/labels.go (outdated, resolved)
pkg/controllers/deprovisioning/controller.go (outdated, resolved)
pkg/controllers/state/statenode.go (outdated, resolved)
@jonathan-innis jonathan-innis removed the blocked label (Unable to make progress due to some dependency) Oct 10, 2023
pkg/controllers/state/statenode.go (outdated, resolved)
pkg/controllers/deprovisioning/controller.go (outdated, resolved)
pkg/controllers/deprovisioning/controller.go (outdated, resolved)
pkg/controllers/deprovisioning/controller.go (outdated, resolved)
pkg/apis/v1beta1/taints.go (resolved)
pkg/controllers/deprovisioning/controller.go (outdated, resolved)
pkg/controllers/deprovisioning/controller.go (resolved)
pkg/controllers/deprovisioning/controller.go (resolved)
pkg/controllers/deprovisioning/suite_test.go (outdated, resolved)
pkg/controllers/deprovisioning/suite_test.go (resolved)
jonathan-innis (Member) previously approved these changes Oct 12, 2023

LGTM 🚀

jonathan-innis (Member) left a comment

LGTM 🚀

@njtran njtran enabled auto-merge (squash) October 12, 2023 20:47
coveralls commented Oct 12, 2023

Pull Request Test Coverage Report for Build 6500758663

  • 54 of 64 (84.38%) changed or added relevant lines in 4 files are covered.
  • 4 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.006%) to 82.189%

Changes Missing Coverage
  File | Covered Lines | Changed/Added Lines | %
  pkg/controllers/deprovisioning/controller.go | 43 | 53 | 81.13%

Files with Coverage Reduction
  File | New Missed Lines | %
  pkg/controllers/deprovisioning/controller.go | 4 | 80.95%

Totals
  Change from base Build 6500425875: -0.006%
  Covered Lines: 9077
  Relevant Lines: 11044

💛 - Coveralls
