
Unhandled case when leader pod has status reason "Terminated" #2

Open
Levitan opened this issue Oct 19, 2022 · 4 comments


Levitan commented Oct 19, 2022

I have a case where all nodes of the k8s cluster were rebooted, and the pod running the operator ended up with status Failed and reason Terminated.
The library does not handle this case, so the Vitess cluster does not start after the incident on the k8s cluster. The new pod waits for the old leader to be deleted and does nothing.

Name:                 vitess-operator-78b664544-26vc5
Namespace:            default
Priority:             5000
Priority Class Name:  vitess-operator-control-plane
Service Account:      vitess-operator
Node:                 node-name-hash/0.0.0.0
Start Time:           Wed, 19 Oct 2022 12:41:30 +0300
Labels:               app=vitess-operator
                      pod-template-hash=78b664544
Annotations:          kubectl.kubernetes.io/restartedAt: 2022-10-18T15:04:56+03:00
Status:               Failed
Reason:               Terminated
Message:              Pod was terminated in response to imminent node shutdown.

yoheimuta commented Oct 21, 2022

Same here.
I also found the same issue at https://www.digitalocean.com/community/questions/vitess-deadlock-after-kubernetes-restart.

I think this case could impact availability.

A new pod waits until the lock ConfigMap named vitess-operator-lock is deleted by the Kubernetes GC, and that lock is only deleted right after its ownerReference, i.e. the old leader pod, is deleted.
The point is that a pod terminated by a graceful node shutdown is deleted by GC only when the number of terminated Pods reaches a threshold, not by its controller, and it could take days for that GC to kick in.
In short, there may be no running operator until the next GC.
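
For context, here is a minimal sketch of how a Leader-for-life lock is typically acquired with the operator-lib leader package. The lock name matches the ConfigMap mentioned above, but the exact wiring inside vitess-operator may differ.

    // Minimal sketch of Leader-for-life acquisition via operator-lib.
    // Assumption: the operator calls something equivalent to this at startup.
    package main

    import (
        "context"
        "log"

        "github.com/operator-framework/operator-lib/leader"
    )

    func main() {
        ctx := context.Background()

        // Become blocks until this pod owns the lock ConfigMap. The ConfigMap
        // carries an ownerReference to the leader pod, so it is only garbage
        // collected after the old leader pod object itself is deleted.
        if err := leader.Become(ctx, "vitess-operator-lock"); err != nil {
            log.Fatalf("failed to become leader: %v", err)
        }

        // ... start the manager and controllers here ...
    }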

Currently, we have only two tricky options when a node shutdown happens:

  1. Wait until the number of terminated Pods reaches the threshold determined by --terminated-pod-gc-threshold in the kube-controller-manager. By default, --terminated-pod-gc-threshold is set to 12500.
  2. Manually delete the ConfigMap or the old leader pod, as sketched below.
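
For option 2, the cleanup is usually done by hand with kubectl; the following is a rough client-go equivalent. The namespace, ConfigMap name, and pod name are taken from this issue and are only placeholders for any other deployment.

    // Hedged sketch of the manual workaround: delete the lock ConfigMap, or
    // delete the old leader pod so GC removes the lock via its ownerReference.
    package main

    import (
        "context"
        "log"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            log.Fatal(err)
        }
        cs, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            log.Fatal(err)
        }

        ctx := context.Background()
        ns := "default" // namespace from this issue

        // Option 2a: remove the lock ConfigMap directly.
        if err := cs.CoreV1().ConfigMaps(ns).Delete(ctx, "vitess-operator-lock", metav1.DeleteOptions{}); err != nil {
            log.Printf("delete configmap: %v", err)
        }

        // Option 2b: delete the failed leader pod (name from the describe output above);
        // GC then removes the lock ConfigMap through its ownerReference.
        if err := cs.CoreV1().Pods(ns).Delete(ctx, "vitess-operator-78b664544-26vc5", metav1.DeleteOptions{}); err != nil {
            log.Printf("delete pod: %v", err)
        }
    }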

@yoheimuta

I think we have two approaches to resolve this issue:

  1. Merge the fix from Remove locks for Pods in "Shutdown" status operator-framework/operator-lib#77
  2. Use Leader-with-lease election instead of Leader-for-life. We need to catch up with operator-sdk v1.0.0 to use this new default.

Approach 1 is easier, but keeping up with the termination reasons k8s introduces would be harder.

Approach 2 might be laborious, but it addresses the root cause and is in line with the de facto standard upstream.
Linking planetscale/vitess-operator#226 here might be helpful.
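
As a rough illustration of Approach 2, this is what lease-based election looks like with controller-runtime; the election ID and timings below are placeholders, not values taken from vitess-operator.

    // Hedged sketch of leader-with-lease election via controller-runtime,
    // as scaffolded by operator-sdk v1.x.
    package main

    import (
        "log"
        "time"

        ctrl "sigs.k8s.io/controller-runtime"
    )

    func main() {
        leaseDuration := 15 * time.Second
        renewDeadline := 10 * time.Second

        mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
            // Lease-based election: the lock expires on its own, so a leader
            // lost to a node shutdown is replaced after LeaseDuration instead
            // of waiting for terminated-pod garbage collection.
            LeaderElection:   true,
            LeaderElectionID: "vitess-operator-leader-election", // placeholder ID
            LeaseDuration:    &leaseDuration,
            RenewDeadline:    &renewDeadline,
        })
        if err != nil {
            log.Fatalf("unable to create manager: %v", err)
        }

        // ... register controllers, then run until shutdown:
        if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
            log.Fatalf("manager exited: %v", err)
        }
    }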

@yoheimuta

Merge the fix from operator-framework/operator-lib#77

For this issue, adding Terminated as a handled reason is appropriate:

- podEvicted := pod.Status.Reason == "Evicted" || pod.Status.Reason == "Shutdown"
+ podEvicted := pod.Status.Reason == "Evicted" || pod.Status.Reason == "Terminated"
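
For context, that check lives in a small helper in operator-lib's leader package. The sketch below shows roughly where the changed line lands; the surrounding code is paraphrased and may differ from the current upstream source.

    package leader

    import corev1 "k8s.io/api/core/v1"

    // isPodEvicted reports whether a Failed leader pod should have its lock
    // released. Sketch only: the actual operator-lib code may differ.
    func isPodEvicted(pod corev1.Pod) bool {
        podFailed := pod.Status.Phase == corev1.PodFailed
        podEvicted := pod.Status.Reason == "Evicted" || pod.Status.Reason == "Terminated"
        return podFailed && podEvicted
    }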


Levitan commented Oct 26, 2022

I think the best solution would be to not check the reason at all: just switch the leader and delete the failed pod.
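
A minimal sketch of that suggestion, assuming the same operator-lib helper as above (the function name here is hypothetical):

    package leader

    import corev1 "k8s.io/api/core/v1"

    // isLeaderPodDead (hypothetical name) treats any Failed leader pod as dead,
    // regardless of Status.Reason, so the lock can be freed and a new leader
    // can take over.
    func isLeaderPodDead(pod corev1.Pod) bool {
        return pod.Status.Phase == corev1.PodFailed
    }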
