Allow a user specifiable node draining timeout #2331

Closed
ajwdev opened this issue Feb 13, 2020 · 17 comments · Fixed by #3662
Labels
area/machine Issues or PRs related to machine lifecycle management
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/feature Categorizes issue or PR as related to a new feature.
lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor.
priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@ajwdev

ajwdev commented Feb 13, 2020

As a user/operator, I would like the option to specify a timeout for how long the machine controller will wait for a node to drain before forcibly removing the machine.

Today, if a user has specified a pod disruption budget that cannot be fulfilled, CAPI could wait forever on a machine whose node is running a pod that will never successfully drain. I would like the option to say ...

  1. Wait X minutes for a node to drain.
  2. If the node does not drain within that amount of time, continue with the deletion anyway.

This would prevent CAPI from getting stuck in an infinite reconcile loop during certain cluster conditions.

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 13, 2020
@vincepri
Member

/help
/milestone v0.3.0
/priority backlog

@k8s-ci-robot
Contributor

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help
/milestone v0.3.0
/priority backlog

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Feb 14, 2020
@k8s-ci-robot k8s-ci-robot added this to the v0.3.0 milestone Feb 14, 2020
@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Feb 14, 2020
@SataQiu
Member

SataQiu commented Feb 15, 2020

/cc

@vincepri
Member

vincepri commented Mar 2, 2020

/milestone Next

@k8s-ci-robot k8s-ci-robot modified the milestones: v0.3.0, Next Mar 2, 2020
@vincepri vincepri added the area/machine Issues or PRs related to machine lifecycle management label Mar 2, 2020
@ncdc
Contributor

ncdc commented May 11, 2020

@williamsandrew does #2165 solve this for you?

@ncdc
Contributor

ncdc commented May 11, 2020

Oh wait, that PR is only for unreachable/unready nodes

@ajwdev
Author

ajwdev commented May 11, 2020

Correct. My use case would include nodes that are accessible but are still unable to drain within a certain period of time.

@wfernandes
Contributor

/assign
Investigating what needs to be done here.

@wfernandes
Contributor

Just wanted to get some clarification on expected behavior.

In the Machine Controller, when we are draining the nodes, the default behavior is to Evict the pods before deleting them.

In this scenario of evicting the pods and then deleting them, we wait forever for the pods to be deleted. The timeout property is set to MaxInt64 in evictPods().

params := waitForDeleteParams{
    ctx: ctx,
    pods: []corev1.Pod{pod},
    interval: 1 * time.Second,
    timeout: time.Duration(math.MaxInt64),
    usingEviction: true,
    getPodFn: getPodFn,
    onDoneFn: d.OnPodDeletedOrEvicted,
    globalTimeout: globalTimeout,
    skipWaitForDeleteTimeoutSeconds: d.SkipWaitForDeleteTimeoutSeconds,
    out: d.Out,
}
_, err := waitForDelete(params)

However, if we just delete the pods directly by calling deletePods(), we only wait 20s for deletion.
Here timeout is set to globalTimeout, which is 20s by default.

params := waitForDeleteParams{
    ctx: ctx,
    pods: pods,
    interval: 1 * time.Second,
    timeout: globalTimeout,
    usingEviction: false,
    getPodFn: getPodFn,
    onDoneFn: d.OnPodDeletedOrEvicted,
    globalTimeout: globalTimeout,
    skipWaitForDeleteTimeoutSeconds: d.SkipWaitForDeleteTimeoutSeconds,
    out: d.Out,
}
_, err := waitForDelete(params)

It seems that the MachineController always calls drainNode with DisableEviction: false (the default value for kubedrain.Helper).

drainer := &kubedrain.Helper{
    Client: kubeClient,
    Force: true,
    IgnoreAllDaemonSets: true,
    DeleteLocalData: true,
    GracePeriodSeconds: -1,
    // If a pod is not evicted in 20 seconds, retry the eviction next time the
    // machine gets reconciled again (to allow other machines to be reconciled).
    Timeout: 20 * time.Second,
    OnPodDeletedOrEvicted: func(pod *corev1.Pod, usingEviction bool) {
        verbStr := "Deleted"
        if usingEviction {
            verbStr = "Evicted"
        }
        logger.Info(fmt.Sprintf("%s pod from Node", verbStr),
            "pod", fmt.Sprintf("%s/%s", pod.Name, pod.Namespace))
    },
    Out: writer{klog.Info},
    ErrOut: writer{klog.Error},
    DryRun: false,
}
if noderefutil.IsNodeUnreachable(node) {
    // When the node is unreachable and some pods are not evicted for as long as this timeout, we ignore them.
    drainer.SkipWaitForDeleteTimeoutSeconds = 60 * 5 // 5 minutes
}
if err := kubedrain.RunCordonOrUncordon(drainer, node, true); err != nil {
    // Machine will be re-reconciled after a cordon failure.
    logger.Error(err, "Cordon failed")
    return errors.Errorf("unable to cordon node %s: %v", node.Name, err)
}
if err := kubedrain.RunNodeDrain(drainer, node.Name); err != nil {
    // Machine will be re-reconciled after a drain failure.
    logger.Error(err, "Drain failed")
    return &capierrors.RequeueAfterError{RequeueAfter: 20 * time.Second}
}

So we are going to wait forever when trying to delete pods after eviction. My question is: why are we waiting forever in one case but only 20s in the other?

Feel free to reach out on slack to discuss more. 🙂

Other thoughts:

  • Can we use the timeout field and make that configurable (see the sketch below), instead of adding yet another NodeDrainTimeout to the kubedrain helper?
  • Where did we get Timeout: 20s from?
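
For illustration, a minimal sketch of the first bullet, assuming a hypothetical nodeDrainTimeout value plumbed down from the Machine API (it is not part of the API today); the existing kubedrain.Helper Timeout field would simply be set from it instead of the hard-coded 20s:

import "time"

// drainAttemptTimeout is a sketch only: nodeDrainTimeout is a hypothetical,
// user-configurable value. Zero keeps the current hard-coded 20s that each
// reconcile attempt spends waiting on evictions.
func drainAttemptTimeout(nodeDrainTimeout time.Duration) time.Duration {
    if nodeDrainTimeout > 0 {
        return nodeDrainTimeout
    }
    return 20 * time.Second
}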

@vincepri
Member

vincepri commented Jul 6, 2020

Thanks for the great write-up @wfernandes, let's sync up at the community meeting. +1 to using a different default where we're currently using MaxInt64; that seems like a rather large timeout. What would be a good default we can get into v0.3.7, @detiber?

@wfernandes
Contributor

@vincepri I added a topic under the General Questions section for the July 8th meeting.

@detiber
Member

detiber commented Jul 6, 2020

If I had to hazard a guess, the idea was to err on the side of workload safety: pod deletion should be subject to any pod disruption budgets that are configured, and if we force early deletion we could be violating those budgets.

It does seem that MaxInt64 is a tad excessive, but I would think that 20s is probably a bit short if we want to avoid evicting or deleting user workloads in a way that could violate disruption budgets.

That said, if we want to respect PDBs, then we also need to ensure we are calling the eviction APIs by default: https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#how-disruption-budgets-work
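
For reference, a minimal sketch of what calling the eviction API looks like with client-go (assuming v0.18-era signatures; the evictPod helper name is made up). The point is that the Eviction subresource is refused by the API server when it would violate a PDB, unlike a plain pod delete:

import (
    "context"

    policyv1beta1 "k8s.io/api/policy/v1beta1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// evictPod asks the API server to evict a single pod. The request is rejected
// (HTTP 429) if deleting the pod would violate a PodDisruptionBudget, which is
// how drains end up honoring PDBs.
func evictPod(ctx context.Context, c kubernetes.Interface, namespace, name string) error {
    return c.CoreV1().Pods(namespace).Evict(ctx, &policyv1beta1.Eviction{
        ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
    })
}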

@namnx228
Contributor

Hi all, is there anyone working on this issue? If not, I'd like to take care of it. My plan is:

  • First, add a NodeDrainTimeout parameter to KCP and MachineDeployment, and have these objects pass the parameter down to the corresponding machines.
  • Next, record the time of the machine's first drain attempt in its status/conditions.
  • Finally, once the NodeDrainTimeout has elapsed, forcefully delete the machine (a rough sketch follows below).

What do you think of this solution?
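
A rough sketch of how steps 2 and 3 could fit together; the field and helper names are illustrative only and assume NodeDrainTimeout is stored as a *metav1.Duration:

import (
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// drainTimedOut reports whether the controller should give up draining and let
// machine deletion proceed. A nil or zero NodeDrainTimeout preserves today's
// behavior of retrying the drain indefinitely. firstDrainAttempt would come
// from the machine's status/conditions (step 2 above).
func drainTimedOut(nodeDrainTimeout *metav1.Duration, firstDrainAttempt metav1.Time) bool {
    if nodeDrainTimeout == nil || nodeDrainTimeout.Duration == 0 {
        return false
    }
    return time.Since(firstDrainAttempt.Time) > nodeDrainTimeout.Duration
}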

@wfernandes
Contributor

Hey @namnx228, this issue is all yours. Feel free to /assign and mark it as /lifecycle active.

I believe this issue was specifically for worker machines and not the control plane nodes. However, if there is a use case for the control plane then I think this can be done all at once.

I don't have much context for this specifically, but maybe @detiber and/or @vincepri can offer some insight. Otherwise, bring it up as a topic during this week's CAPI meeting.

/unassign

@namnx228
Contributor

/assign
/lifecycle active
@wfernandes that is great, thanks.
Yes, there is a use case where we need to run workloads on control plane nodes on bare metal, so it would be good to take care of KCP as well.
I will ask on Slack or in the meeting if I need more explanation.

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Sep 15, 2020
@ncdc
Contributor

ncdc commented Sep 15, 2020

@namnx228 thanks for working on this!

First, add a NodeDrainTimeout parameter to KCP and MachineDeployment, and have these objects pass the parameter down to the corresponding machines.

I think it makes sense to add this to the KCP spec, but the MachineDeployment already has the MachineSpec, so I don't know that you'd need to add anything to the MachineDeployment (since you'll be adding a field to MachineSpec). WDYT?
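
Something like the following on MachineSpec would cover MachineDeployments (which embed a MachineSpec in their template), with KCP getting its own field; this is a sketch only, and the exact field name and doc text are assumptions:

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type MachineSpec struct {
    // ...existing fields...

    // NodeDrainTimeout is the total time the controller spends draining the
    // node before giving up and deleting the Machine anyway. A nil or zero
    // value means no timeout, i.e. today's behavior of waiting indefinitely.
    // +optional
    NodeDrainTimeout *metav1.Duration `json:"nodeDrainTimeout,omitempty"`
}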

@namnx228
Contributor

@ncdc yes, that is a good idea to make use of the MachineSpec instead of making a change to the MachineDeployment. Thanks for your suggestion.
