feat: Add LB exclusion label when deleting node #2518

DWSR · 2022-09-16T01:43:04Z

Fixes # N/A

Description

Currently, when Karpenter drains and then deletes a Node, the
corresponding EC2 instance is not removed from any ALB/NLB Target Group
in which the node is registered. The load balancer can keep routing
traffic to that instance, which can increase request errors while
Karpenter deletes nodes.

To help resolve this issue, this change adds the well-known
node.kubernetes.io/exclude-from-external-load-balancers label, which
causes the AWS LB Controller to remove the node from the Target Group
while Karpenter is draining the node. This is similar to how the AWS
Node Termination Handler works (see
aws/aws-node-termination-handler#316).
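
As a rough illustration of the labeling step described above, here is a minimal, stdlib-only Go sketch. It is not the code in this PR: `withLBExclusion` is a hypothetical helper, the plain `map[string]string` stands in for a Node's labels, and the label value `"karpenter"` is an assumption chosen for the example. The key is the well-known Kubernetes label `node.kubernetes.io/exclude-from-external-load-balancers`.

```go
package main

import "fmt"

// excludeFromLBLabel is the well-known Kubernetes label key that asks
// load balancer controllers to stop targeting a node.
const excludeFromLBLabel = "node.kubernetes.io/exclude-from-external-load-balancers"

// withLBExclusion is a hypothetical helper (not part of Karpenter). It
// returns a copy of labels with the exclusion label set, leaving the
// input map untouched.
func withLBExclusion(labels map[string]string, value string) map[string]string {
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	out[excludeFromLBLabel] = value
	return out
}

func main() {
	// Stand-in for node.Labels on a node that is about to be drained.
	labels := map[string]string{"topology.kubernetes.io/zone": "us-west-2a"}

	// The value "karpenter" is an assumption made for this sketch.
	updated := withLBExclusion(labels, "karpenter")

	_, present := updated[excludeFromLBLabel]
	fmt.Println("exclusion label present:", present)
}
```

In a real controller the updated labels would be written back to the Node object through the Kubernetes API before draining begins, so that the load balancer controller has time to start deregistering the instance.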

In future, Karpenter might be enhanced to wait for a configurable
period before deleting the Node and terminating the associated
instance, because there is currently a race condition between the Pods
being drained off the Node and the EC2 instance being removed from the
Target Group.

How was this change tested?

Ran Karpenter against a manual cluster that had the AWS LB Controller installed and observed the target deregistration in progress.

Does this change impact docs?

Yes, PR includes docs updates
Yes, issue opened: #
No

Release Note

NONE

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

netlify · 2022-09-16T01:43:31Z

✅ Deploy Preview for karpenter-docs-prod ready!

Name	Link
🔨 Latest commit	`3278012`
🔍 Latest deploy log	https://app.netlify.com/sites/karpenter-docs-prod/deploys/6329d1b77d89bf00097b1bbe
😎 Deploy Preview	https://deploy-preview-2518--karpenter-docs-prod.netlify.app

bwagner5 · 2022-09-16T14:22:25Z

Thanks for this! Would you be able to add a test as well within the termination suite_test.go?

pkg/controllers/termination/terminate.go

DWSR · 2022-09-17T15:30:10Z

> Thanks for this! Would you be able to add a test as well within the termination suite_test.go?

Added, but I'm not entirely happy with the fact that I'm (ab)using a do-not-evict Pod. Feel free to suggest a better way to perform the test.

pkg/controllers/termination/suite_test.go

dewjam

LGTM

Changes were implemented.

DWSR requested a review from a team as a code owner September 16, 2022 01:43

DWSR requested a review from tzneal September 16, 2022 01:43

jonathan-innis previously requested changes Sep 16, 2022

pkg/controllers/termination/terminate.go

DWSR force-pushed the main branch from 85747e7 to 5c48d5e Compare September 17, 2022 15:28

DWSR requested review from jonathan-innis and removed request for tzneal September 17, 2022 20:57

jonathan-innis assigned jonathan-innis and dewjam and unassigned jonathan-innis Sep 19, 2022

dewjam reviewed Sep 20, 2022

pkg/controllers/termination/suite_test.go

DWSR force-pushed the main branch from 5c48d5e to 3278012 Compare September 20, 2022 14:44

dewjam approved these changes Sep 20, 2022

dewjam merged commit 2c15b29 into aws:main Sep 20, 2022

adri mentioned this pull request May 30, 2023

Feature request: Deregister terminating nodes from ALB to avoid 5xx errors zalando-incubator/kube-ingress-aws-controller#604

Closed

derbauer97 mentioned this pull request Jul 13, 2023

feat: add label to exclude from external loadbalancers TwiN/aws-eks-asg-rolling-update-handler#131

Merged
