Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add LB exclusion label when deleting node #2518

Merged
merged 1 commit into from
Sep 20, 2022
Merged

feat: Add LB exclusion label when deleting node #2518

merged 1 commit into from
Sep 20, 2022

Conversation

DWSR
Copy link
Contributor

@DWSR DWSR commented Sep 16, 2022

Fixes # N/A

Description

Currently, when Karpenter drains and then deletes a Node from the
cluster, if that node is registered in a Target Group for an ALB/NLB the
corresponding EC2 instance is not removed. This leads to the potential
for increased errors when deleting nodes via Karpenter.

In order to help resolve this issue, this change adds the well-known
node.kubernetes.io/exclude-from-external-balancers label, which will
case the AWS LB controller to remove the node from the Target Group
while Karpenter is draining the node. This is similar to how the AWS
Node Termination Handler works (see
aws/aws-node-termination-handler#316).

In future, Karpenter might be enhanced to be able to wait for a
configurable period before deleting the Node and terminating the
associated instance as currently there's a race condition between the
Pods being drained off of the Node and the EC2 instance being removed
from the target group.

How was this change tested?

  • Running Karpenter against a manual cluster that had the AWS LB Controller installed and observing the de-registration in progress.

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

Release Note

NONE

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@DWSR DWSR requested a review from a team as a code owner September 16, 2022 01:43
@DWSR DWSR requested a review from tzneal September 16, 2022 01:43
@netlify
Copy link

netlify bot commented Sep 16, 2022

Deploy Preview for karpenter-docs-prod ready!

Name Link
🔨 Latest commit 3278012
🔍 Latest deploy log https://app.netlify.com/sites/karpenter-docs-prod/deploys/6329d1b77d89bf00097b1bbe
😎 Deploy Preview https://deploy-preview-2518--karpenter-docs-prod.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@bwagner5
Copy link
Contributor

Thanks for this! Would you be able to add a test as well within the termination suite_test.go?

pkg/controllers/termination/terminate.go Outdated Show resolved Hide resolved
pkg/controllers/termination/terminate.go Show resolved Hide resolved
@DWSR
Copy link
Contributor Author

DWSR commented Sep 17, 2022

Thanks for this! Would you be able to add a test as well within the termination suite_test.go?

Added, but I'm not entirely happy with the fact that I'm (ab)using a do-not-evict Pod. Feel free to suggest a better way to perform the test

@DWSR DWSR requested review from jonathan-innis and removed request for tzneal September 17, 2022 20:57
Currently, when Karpenter drains and then deletes a Node from the
cluster, if that node is registered in a Target Group for an ALB/NLB the
corresponding EC2 instance is not removed. This leads to the potential
for increased errors when deleting nodes via Karpenter.

In order to help resolve this issue, this change adds the well-known
`node.kubernetes.io/exclude-from-external-balancers` label, which will
case the AWS LB controller to remove the node from the Target Group
while Karpenter is draining the node. This is similar to how the AWS
Node Termination Handler works (see
aws/aws-node-termination-handler#316).

In future, Karpenter might be enhanced to be able to wait for a
configurable period before deleting the Node and terminating the
associated instance as currently there's a race condition between the
Pods being drained off of the Node and the EC2 instance being removed
from the target group.
Copy link
Contributor

@dewjam dewjam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants