
Unhandled case when leader pod has status reason "Terminated" #2

Open
Levitan opened this issue Oct 19, 2022 · 4 comments


Levitan commented Oct 19, 2022

I have a case where all nodes of the k8s cluster were rebooted, and the pod running the operator ended up with status Failed and reason Terminated.
The library does not handle this case, so the Vitess cluster does not start after the incident on the k8s cluster. The new pod waits for the old leader to be deleted and does nothing.

Name:                 vitess-operator-78b664544-26vc5
Namespace:            default
Priority:             5000
Priority Class Name:  vitess-operator-control-plane
Service Account:      vitess-operator
Node:                 node-name-hash/0.0.0.0
Start Time:           Wed, 19 Oct 2022 12:41:30 +0300
Labels:               app=vitess-operator
                      pod-template-hash=78b664544
Annotations:          kubectl.kubernetes.io/restartedAt: 2022-10-18T15:04:56+03:00
Status:               Failed
Reason:               Terminated
Message:              Pod was terminated in response to imminent node shutdown.

yoheimuta commented Oct 21, 2022

Same here.
I also found the same issue at https://www.digitalocean.com/community/questions/vitess-deadlock-after-kubernetes-restart.

I think this case could impact availability.

A new pod waits until the lock ConfigMap named vitess-operator-lock is deleted by the Kubernetes GC, and that lock is only deleted right after its ownerReference, i.e. the old leader pod, is deleted.
The point is that a pod terminated by a graceful node shutdown is deleted by GC only when the number of terminated Pods reaches a threshold, not by its controller, and it could take days for that GC to kick in.
In short, there may be no running operator until the next GC.
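
For context, here is a minimal sketch of how a Leader-for-life lock is typically acquired with the operator-lib leader package. The lock name matches the ConfigMap mentioned above, but the exact wiring inside vitess-operator may differ.

    // Minimal sketch of Leader-for-life acquisition via operator-lib.
    // Assumption: the operator calls something equivalent to this at startup.
    package main

    import (
        "context"
        "log"

        "github.com/operator-framework/operator-lib/leader"
    )

    func main() {
        ctx := context.Background()

        // Become blocks until this pod owns the lock ConfigMap. The ConfigMap
        // carries an ownerReference to the leader pod, so it is only garbage
        // collected after the old leader pod object itself is deleted.
        if err := leader.Become(ctx, "vitess-operator-lock"); err != nil {
            log.Fatalf("failed to become leader: %v", err)
        }

        // ... start the manager and controllers here ...
    }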

Currently, we have only two tricky options when a node shutdown happens:

  1. Wait until the number of terminated Pods reaches the threshold determined by --terminated-pod-gc-threshold in the kube-controller-manager. By default, --terminated-pod-gc-threshold is set to 12500.
  2. Manually delete the ConfigMap or the old leader pod, as sketched below.
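
For option 2, the cleanup is usually done by hand with kubectl; the following is a rough client-go equivalent. The namespace, ConfigMap name, and pod name are taken from this issue and are only placeholders for any other deployment.

    // Hedged sketch of the manual workaround: delete the lock ConfigMap, or
    // delete the old leader pod so GC removes the lock via its ownerReference.
    package main

    import (
        "context"
        "log"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            log.Fatal(err)
        }
        cs, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            log.Fatal(err)
        }

        ctx := context.Background()
        ns := "default" // namespace from this issue

        // Option 2a: remove the lock ConfigMap directly.
        if err := cs.CoreV1().ConfigMaps(ns).Delete(ctx, "vitess-operator-lock", metav1.DeleteOptions{}); err != nil {
            log.Printf("delete configmap: %v", err)
        }

        // Option 2b: delete the failed leader pod (name from the describe output above);
        // GC then removes the lock ConfigMap through its ownerReference.
        if err := cs.CoreV1().Pods(ns).Delete(ctx, "vitess-operator-78b664544-26vc5", metav1.DeleteOptions{}); err != nil {
            log.Printf("delete pod: %v", err)
        }
    }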

@yoheimuta

I think we have two approaches to resolve this issue:

  1. Merge the fix from Remove locks for Pods in "Shutdown" status operator-framework/operator-lib#77
  2. Use Leader-with-lease election instead of Leader-for-life. We need to catch up with operator-sdk v1.0.0 to use this new default.

Approach 1 is easier, but keeping up with the termination reasons k8s introduces would be harder.

Approach 2 might be laborious, but it addresses the root cause and is in line with the de facto standard upstream.
Linking planetscale/vitess-operator#226 here might be helpful.
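
As a rough illustration of Approach 2, this is what lease-based election looks like with controller-runtime; the election ID and timings below are placeholders, not values taken from vitess-operator.

    // Hedged sketch of leader-with-lease election via controller-runtime,
    // as scaffolded by operator-sdk v1.x.
    package main

    import (
        "log"
        "time"

        ctrl "sigs.k8s.io/controller-runtime"
    )

    func main() {
        leaseDuration := 15 * time.Second
        renewDeadline := 10 * time.Second

        mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
            // Lease-based election: the lock expires on its own, so a leader
            // lost to a node shutdown is replaced after LeaseDuration instead
            // of waiting for terminated-pod garbage collection.
            LeaderElection:   true,
            LeaderElectionID: "vitess-operator-leader-election", // placeholder ID
            LeaseDuration:    &leaseDuration,
            RenewDeadline:    &renewDeadline,
        })
        if err != nil {
            log.Fatalf("unable to create manager: %v", err)
        }

        // ... register controllers, then run until shutdown:
        if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
            log.Fatalf("manager exited: %v", err)
        }
    }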

@yoheimuta

Merge the fix from operator-framework/operator-lib#77

For this issue, adding Terminated as a handled reason is appropriate:

- podEvicted := pod.Status.Reason == "Evicted" || pod.Status.Reason == "Shutdown"
+ podEvicted := pod.Status.Reason == "Evicted" || pod.Status.Reason == "Terminated"
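
For context, that check lives in a small helper in operator-lib's leader package. The sketch below shows roughly where the changed line lands; the surrounding code is paraphrased and may differ from the current upstream source.

    package leader

    import corev1 "k8s.io/api/core/v1"

    // isPodEvicted reports whether a Failed leader pod should have its lock
    // released. Sketch only: the actual operator-lib code may differ.
    func isPodEvicted(pod corev1.Pod) bool {
        podFailed := pod.Status.Phase == corev1.PodFailed
        podEvicted := pod.Status.Reason == "Evicted" || pod.Status.Reason == "Terminated"
        return podFailed && podEvicted
    }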


Levitan commented Oct 26, 2022

I think the best solution would be to not check the reason at all: just switch the leader and delete the failed pod.
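
A minimal sketch of that suggestion, assuming the same operator-lib helper as above (the function name here is hypothetical):

    package leader

    import corev1 "k8s.io/api/core/v1"

    // isLeaderPodDead (hypothetical name) treats any Failed leader pod as dead,
    // regardless of Status.Reason, so the lock can be freed and a new leader
    // can take over.
    func isLeaderPodDead(pod corev1.Pod) bool {
        return pod.Status.Phase == corev1.PodFailed
    }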
