Implement priority based evictor #6139

Merged
merged 1 commit into kubernetes:master on Dec 21, 2023

Conversation

@damikag damikag (Contributor) commented Sep 25, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR implements a priority-based evictor for pod eviction during scale-down. The evictor can be used to ensure that non-critical pods are evicted before critical pods. When it is enabled, pods are grouped by priority and evicted group by group, from the lowest priority to the highest.
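
To illustrate the grouping idea, here is a minimal Go sketch (assumed names only - evictByPriority and evictPod are hypothetical helpers, not the actual evictor code added by this PR):

```go
// Sketch only: group a node's pods by priority and evict the groups in
// ascending priority order, so the lowest-priority pods go first.
package sketch

import (
	"sort"

	apiv1 "k8s.io/api/core/v1"
)

func evictByPriority(pods []*apiv1.Pod, evictPod func(*apiv1.Pod) error) error {
	// Group pods by their (defaulted) priority.
	groups := map[int32][]*apiv1.Pod{}
	for _, p := range pods {
		prio := int32(0)
		if p.Spec.Priority != nil {
			prio = *p.Spec.Priority
		}
		groups[prio] = append(groups[prio], p)
	}
	// Sort the priorities ascending and evict group by group.
	priorities := make([]int32, 0, len(groups))
	for prio := range groups {
		priorities = append(priorities, prio)
	}
	sort.Slice(priorities, func(i, j int) bool { return priorities[i] < priorities[j] })
	for _, prio := range priorities {
		for _, p := range groups[prio] {
			if err := evictPod(p); err != nil {
				return err
			}
		}
	}
	return nil
}
```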

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

This PR introduces a new evictor that can be enabled and configured via the --drain-priority-config flag. Setting an empty string disables the feature and uses the default unordered evictor.

Release-note

A new flag (--drain-priority-config) is introduced which allows users to configure drain behavior during scale-down based on pod priority. The new flag is mutually exclusive with --max-graceful-termination-sec. --max-graceful-termination-sec can still be used if the new configuration options are not needed. The default behavior is preserved (simple config, default value of --max-graceful-termination-sec).

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot added the kind/bug, cncf-cla: yes, and size/XL labels on Sep 25, 2023
@k8s-ci-robot added the needs-rebase label (merge conflicts with HEAD) on Sep 27, 2023
@towca towca (Collaborator) left a comment

Separate from the review comments, I'm wondering if this is what we want to do. kubectl drain also performs node drain, but it doesn't reuse the kubelet logic: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#drain. Most notably, it doesn't evict DaemonSet pods at all, on the basis that they will be recreated and scheduled on the same node again (DaemonSet pods bypass the usual unready/unschedulable checks while scheduling), and will then be forcibly terminated anyway. I've definitely seen this happen in practice, which makes me think that maybe we should consider reverting to our previous behavior of not evicting DaemonSet pods at all. At which point, maybe this priority feature is not needed.

@x13n @MaciekPytel I'm curious to hear your thoughts on this.

Review comments (outdated, resolved) on: cluster-autoscaler/main.go, cluster-autoscaler/metrics/metrics.go, cluster-autoscaler/core/scaledown/actuation/drain.go
@x13n x13n (Member) commented Oct 10, 2023

My understanding was that DS eviction was introduced to give some of them a heads-up before deleting a node - e.g. logging agents may need a non-negligible amount of time to flush logs to some backend. If we just delete the VM under them, their cleanup may not complete in time.

We don't currently set the unschedulable bit in CA, we only taint the node before deletion. Proper cordoning (setting unschedulable: true) should prevent DS pods from scheduling again. Maybe we should do this first? The problem with DaemonSets is that they sometimes use a wildcard toleration - we shouldn't have such pods schedule again, so setting the unschedulable field instead of (or in addition to) the taint seems like a better option.
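
For reference, "proper cordoning" via the unschedulable field would look roughly like this with client-go (a sketch under assumed names - cordonNode is a hypothetical helper, not code from this PR or from CA):

```go
// Sketch only: cordon a node by setting spec.unschedulable, instead of
// (or in addition to) adding a CA-owned taint.
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func cordonNode(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	node.Spec.Unschedulable = true // equivalent to `kubectl cordon <node>`
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```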

@x13n x13n (Member) commented Oct 10, 2023

/assign @towca

(since you're already reviewing it anyway)

@towca towca (Collaborator) commented Oct 10, 2023

Good point, although with taints it's easy to determine their "ownership" by name, so we know which taints to clean up in error-handling flows. With the unschedulable bit, we wouldn't be able to know if it was CA or something else setting it.

In any case, it seems like we have important use cases for evicting DaemonSet pods even if they schedule back on the node afterwards. Let's move forward with this PR @damikag

@MaciekPytel MaciekPytel (Contributor) commented:

> Does this PR introduce a user-facing change?

This very much does introduce a user-facing behavior change and should have a release note explaining it.

Review comments (outdated, resolved) on: cluster-autoscaler/core/scaledown/actuation/drain_test.go, cluster-autoscaler/core/scaledown/actuation/evictor.go, cluster-autoscaler/main.go, cluster-autoscaler/core/scaledown/actuation/drain.go
@damikag damikag requested a review from towca November 14, 2023 09:37
@towca towca (Collaborator) left a comment

Thanks for addressing my previous comments; this approach is way more readable at a high level. Sorry for the delay and the number of new comments. This is a critical part of CA that has historically had huge readability problems, so I'm erring on the side of thoroughness here.

Review comments (outdated, resolved) on: cluster-autoscaler/metrics/metrics.go, cluster-autoscaler/main.go, cluster-autoscaler/config/autoscaling_options.go, cluster-autoscaler/core/scaledown/actuation/actuator.go, cluster-autoscaler/core/scaledown/actuation/drain.go, cluster-autoscaler/core/scaledown/actuation/drain_test.go
@damikag damikag requested a review from towca November 28, 2023 09:28
@k8s-ci-robot removed the needs-rebase label (merge conflicts with HEAD) on Dec 4, 2023
Review comments (outdated, resolved) on: cluster-autoscaler/core/scaledown/actuation/drain.go, cluster-autoscaler/main.go, cluster-autoscaler/core/scaledown/actuation/actuator.go, cluster-autoscaler/config/autoscaling_options.go
@towca towca (Collaborator) commented Dec 21, 2023

Could you also add Release Notes to the PR description before we merge?

@towca towca (Collaborator) commented Dec 21, 2023

The release note goes a bit too deep into details for an average reader. I'd put something like this:

A new flag (--drain-priority-config) is introduced which allows users to configure drain behavior during scale-down based on pod priority. The new flag is mutually exclusive with --max-graceful-termination-sec. --max-graceful-termination-sec can still be used if the new configuration options are not needed. The default behavior is preserved (simple config, default value of --max-graceful-termination-sec).

@towca towca (Collaborator) commented Dec 21, 2023

Thanks for all the hard work on this!
/lgtm
/approve

@k8s-ci-robot added the lgtm and approved labels on Dec 21, 2023
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: damikag, towca

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit fc48d5c into kubernetes:master Dec 21, 2023
6 checks passed
@kost2191 kost2191 commented Feb 4, 2024

@towca Can you please give an example of this config? Can I prioritise the drain of nodes in the same way as with the priority expander?

@towca towca (Collaborator) commented Feb 8, 2024

@kost2191 No, this allows you to configure the order in which pods are evicted from a single node, whenever CA decides to scale it down. It doesn't affect the choice of which node is scaled down.

An example config could be 0:3600,2000000000:600. This means that whenever CA scales down a node, it does so in two batches. First, all pods with priority in the [0, 2000000000) range are evicted and CA waits 60min before they're force-terminated. After all pods from the first batch are evicted (whether forcibly or not), CA evicts all pods with priority >=2000000000, and waits 10min before force-terminating them.

The feature can be useful if some of the pods depend on other pods (e.g. metric/logging agents) still running on the node during graceful termination.
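
A rough sketch of how a value like 0:3600,2000000000:600 breaks down into priority/grace-period pairs (illustrative Go only - the type and function names here are assumptions, not the actual Cluster Autoscaler parsing code):

```go
// Sketch only: parse a --drain-priority-config style value such as
// "0:3600,2000000000:600" into (priority, gracePeriodSeconds) pairs.
package sketch

import (
	"fmt"
	"strconv"
	"strings"
)

type priorityGracePeriod struct {
	Priority           int32 // lowest pod priority covered by this group
	GracePeriodSeconds int64 // time to wait before force-terminating the group
}

func parseDrainPriorityConfig(s string) ([]priorityGracePeriod, error) {
	if s == "" {
		return nil, nil // empty string: feature disabled, default unordered evictor is used
	}
	var out []priorityGracePeriod
	for _, entry := range strings.Split(s, ",") {
		parts := strings.Split(entry, ":")
		if len(parts) != 2 {
			return nil, fmt.Errorf("invalid entry %q", entry)
		}
		prio, err := strconv.ParseInt(parts[0], 10, 32)
		if err != nil {
			return nil, err
		}
		grace, err := strconv.ParseInt(parts[1], 10, 64)
		if err != nil {
			return nil, err
		}
		out = append(out, priorityGracePeriod{Priority: int32(prio), GracePeriodSeconds: grace})
	}
	return out, nil
}
```

With the example value, this yields two groups: priorities starting at 0 with a 3600s (60 min) grace period, and priorities starting at 2000000000 with a 600s (10 min) grace period, matching the two-batch behavior described above.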

Labels

approved - Indicates a PR has been approved by an approver from all required OWNERS files.
area/cluster-autoscaler
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/bug - Categorizes issue or PR as related to a bug.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
size/XL - Denotes a PR that changes 500-999 lines, ignoring generated files.