Prepare pod evictor for the descheduling framework plugin #846
Conversation
Force-pushed from 21acbc7 to a024e0c (Compare)
CC @knelasevero
/retest
Force-pushed from 7fda69f to 9b38519 (Compare)
klog.ErrorS(err, "Error evicting pod", "pod", klog.KObj(pod))
break
}
podEvictor.EvictPod(ctx, pod)
It appears that functionality is being changed here? In other words, before it would break on a single "failed" eviction but now it loops through all pods.
Should it be preserved?
Suggested change:
-	podEvictor.EvictPod(ctx, pod)
+	if !podEvictor.EvictPod(ctx, pod) {
+		break
+	}
In the past there was only a single error issued (exceeding the limit on the number of pods allowed to be evicted per node), so it made sense to break and move to another node. However, a few months back we added a limit for the number of pods evicted per namespace. So when the namespace limit gets exceeded we can still continue evicting.
Though, I plan to introduce a check for when the node limit is exceeded. I will incorporate it in this PR once #847 is merged. I am still making changes to figure out the smallest amount of refactoring needed so I can move the evictor filter bits into a plugin. Once that's done we can start moving the strategies into plugins.
Just introduced a NodeLimitExceeded method for performing the check. #847 needs to be merged first.
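As a rough sketch of how a strategy loop might use that check once #847 lands (the NodeLimitExceeded signature below is assumed for illustration, and podsOnNode/node are placeholders for the strategy's local variables, not the final API):

for _, pod := range podsOnNode {
	if podEvictor.EvictPod(ctx, pod) {
		continue
	}
	// Eviction did not happen. If the per-node limit was hit there is no
	// point in trying more pods on this node; otherwise (e.g. a per-namespace
	// limit) keep going with the next pod on the same node.
	if podEvictor.NodeLimitExceeded(node) {
		break
	}
}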
Force-pushed from 9b38519 to 8874ce6 (Compare)
Rebasing on top of #847
Force-pushed from 8874ce6 to f47cfe4 (Compare)
reason = ctx.Value("evictionReason").(string)
}

if pod.Spec.NodeName == "" {
The PodLifeTime strategy allows Pods in the Pending state. I assume there could be a state where the Pod has not been scheduled yet (or cannot be) but needs to be evicted for a retry?
/b2418ef481298c6caf185a5f88dd0bb6ddc1cdbf/pkg/descheduler/strategies/pod_lifetime.go#L42
I will alter the code to take this case into account. Thanks for noticing!
On second thought... after scanning the code, it seems that it only lists Pods on Nodes. We should probably discuss changing that, but not as part of this PR.
We might remove the condition in https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/descheduler/pod/pods.go#L129-L131 to have pending pods included in the getPodsAssignedToNode function, treating an empty nodeName as a special case.
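Purely as an illustration of that idea (a hypothetical filter sketch, not the actual change to pods.go), the special case could look roughly like this:

// Hypothetical sketch: keep pending pods (empty .spec.nodeName) in the
// listing instead of filtering them out when indexing pods by node.
func includePod(pod *v1.Pod, nodeName string) bool {
	if pod.Spec.NodeName == "" {
		// Pending pod: treated as a special case rather than dropped.
		return true
	}
	return pod.Spec.NodeName == nodeName
}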
The method uses the node object only to get the node name, and the node name can be retrieved from the pod object instead. Some strategies might try to evict a pod in Pending state which does not have the .spec.nodeName field set; in that case the check for the node limit is skipped.
When an error is returned, a strategy either stops completely or starts processing another node. Given the error can be transient, or only one of the limits may have been exceeded, it is fairer to just skip a pod that failed eviction and proceed to the next one instead. To optimize the processing and stop earlier, it is more practical to implement a check which reports when a limit has been exceeded.
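A minimal sketch of what that can look like inside EvictPod, assuming placeholder helper names (nodeLimitReached, namespaceLimitReached, the private evictPod call, pe.client, pe.policyGroupVersion) rather than the actual implementation:

func (pe *PodEvictor) EvictPod(ctx context.Context, pod *v1.Pod) bool {
	// A pod in Pending state has no .spec.nodeName, so the per-node limit
	// check is skipped for it.
	if pod.Spec.NodeName != "" && pe.nodeLimitReached(pod.Spec.NodeName) {
		return false
	}
	if pe.namespaceLimitReached(pod.Namespace) {
		return false
	}
	// Delegate the actual API call to the private helper; the caller decides
	// whether a false return means "skip this pod" or "stop processing the node".
	if err := evictPod(ctx, pe.client, pod, pe.policyGroupVersion); err != nil {
		klog.ErrorS(err, "Error evicting pod", "pod", klog.KObj(pod))
		return false
	}
	return true
}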
Force-pushed from f47cfe4 to c838614 (Compare)
// EvictPod evicts a pod while exercising eviction limits.
// Returns true when the pod is evicted on the server side.
// Eviction reason can be set through the ctx's evictionReason:STRING pair
func (pe *PodEvictor) EvictPod(ctx context.Context, pod *v1.Pod) bool {
I'm curious to know why the return signature was changed to not return the error.
Previously, we had the flexibility to return an error for namespace limit reached, node limit reached, and the potential for some other limit in the future. But now it's up to each strategy to handle limit exceeded logic.
Is it because each plugin should now be responsible for deciding if it should continue/abort?
Is it because each plugin should now be responsible for deciding if it should continue/abort?
There are several limits now. Not all of them require stopping the processing of the current node. E.g. when a namespace limit is reached, a plugin can continue processing another pod on the same node. So yes, a plugin is allowed to decide whether the right course of action is to stop processing a node and continue with another one, or to just skip a pod and take another on the same node.
// EvictPod evicts a pod while exercising eviction limits.
// Returns true when the pod is evicted on the server side.
// Eviction reason can be set through the ctx's evictionReason:STRING pair
func (pe *PodEvictor) EvictPod(ctx context.Context, pod *v1.Pod) bool {
Can we continue to leave the v1.Node parameter? See #859 for use-case
Replied in #859 (comment)
/lgtm
@@ -74,6 +74,7 @@ func RemovePodsViolatingNodeAffinity(ctx context.Context, client clientset.Inter

switch nodeAffinity {
case "requiredDuringSchedulingIgnoredDuringExecution":
lgtm actually, but I just wonder why this name, requiredDuringSchedulingIgnoredDuringExecution, is so long...
Chosen by the designers: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity. There's probably a historical reason for it.
Got it. Looks good to me now~
/ok-to-test
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: a7i
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@@ -302,7 +302,7 @@ func RunDeschedulerStrategies(ctx context.Context, rs *options.DeschedulerServer
continue
}
evictorFilter := evictions.NewEvictorFilter(nodes, getPodsAssignedToNode, evictLocalStoragePods, evictSystemCriticalPods, ignorePvcPods, evictBarePods, evictions.WithNodeFit(nodeFit), evictions.WithPriorityThreshold(thresholdPriority))
f(ctx, rs.Client, strategy, nodes, podEvictor, evictorFilter, getPodsAssignedToNode)
f(context.WithValue(ctx, "strategyName", string(name)), rs.Client, strategy, nodes, podEvictor, evictorFilter, getPodsAssignedToNode)
Is this intended to be the final approach, or just a work-in-progress step? (passing values through the context)
Passing some values through the context is the final approach, at least the strategyName, which will get changed into pluginName. The context key can then be read and used in other places.
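For example, reading those values back out of the context (a sketch only; the keys match the diff above, everything else is illustrative):

strategy := ""
if v, ok := ctx.Value("strategyName").(string); ok {
	strategy = v
}
reason := ""
if v, ok := ctx.Value("evictionReason").(string); ok {
	reason = v
}
// The values are only used for logging/metrics, e.g.:
klog.V(1).InfoS("Evicted pod", "pod", klog.KObj(pod), "reason", reason, "strategy", strategy)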
is there a reason for doing that rather than adding a new parameter to the functions that use it? Looks like this is just being used to pass the strategy name and reason to EvictPod, is that right?
I think a better approach would be to define an "options" struct to pass as an optional param to EvictPod. That struct can have strategy and reasons fields to start with, and if we decide to add more options in the future then we don't need to change EvictPod's signature.
What do you think about something like that? As this is, I don't think it's an appropriate use of context.
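Something roughly like this (a sketch of the suggestion only; the EvictOptions name and its fields are hypothetical, not an agreed API):

// EvictOptions carries optional, informational eviction metadata.
type EvictOptions struct {
	// StrategyName identifies the strategy/plugin requesting the eviction.
	StrategyName string
	// Reason is a human-readable explanation recorded with the eviction.
	Reason string
}

func (pe *PodEvictor) EvictPod(ctx context.Context, pod *v1.Pod, opts EvictOptions) bool {
	// ... eviction logic unchanged; opts only feeds logging and metrics ...
	return true
}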
that said, I don't need to block on this right now since this PR has been open for a while (and that's my bad for not getting around to reviewing until now). If my idea sounds good I'll take it as a follow up refactor and we can unhold this PR
We might as well move the strategy name out of EvictPod. It's used mostly for metrics, plus two lines of logging which can be logged without the strategy name (the same goes for the reason). Instead, both the strategy name and the reason (if there is any) can be printed outside of EvictPod through a wrapper which will get introduced after the framework primitives are implemented.
My reasoning is to use EvictPod only for the actual eviction. The method does not need to know anything about the strategy/plugin/reason/etc. in order to evict a pod. We can have higher-level invokers log the additional information.
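i.e. something like a thin wrapper around EvictPod (a hypothetical sketch; the helper name and parameters are made up for illustration):

// evictWithContext logs the plugin name and reason at a higher level,
// while EvictPod itself only performs the actual eviction.
func evictWithContext(ctx context.Context, pe *evictions.PodEvictor, pod *v1.Pod, pluginName, reason string) bool {
	evicted := pe.EvictPod(ctx, pod)
	if evicted {
		klog.V(1).InfoS("Evicted pod", "pod", klog.KObj(pod), "plugin", pluginName, "reason", reason)
	}
	return evicted
}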
If my idea sounds good I'll take it as a follow up refactor and we can unhold this PR
+1 for refactoring the code more in the follow up PR.
I think the idea of passing additional info/options to EvictPod makes sense. It's already a wrapper for a private evictPod function (not sure why though), but as the public interface it seems like a good spot to expose a handle for customization/logging/metrics.
/hold
Unholding based on #846 (comment)
Prerequisite for #837