Implement Ready status condition #24
Comments
Any option like "pods & sts are ready but quorum is lost"? Shouldn't we check the health of the cluster throughout its entire lifecycle? I mean, the etcd cluster controller will react to changes in the sts state and act accordingly since the controller reference is set, but if a loss of quorum or anything else happens to the etcd application without affecting the pod/sts state, the operator will not be informed. Thinking out loud.
@aobort If I got it right, the case you mentioned is the first if. We should check that the sts is ready AND that the cluster has quorum at all and that there is the desired number of members. If not, update the cluster state accordingly.
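For illustration, a minimal sketch of such a combined check, assuming the operator talks to etcd with go.etcd.io/etcd/client/v3; the function names and the way endpoints are passed in are illustrative, not this operator's actual API:

```go
// Sketch: "ready" here means the StatefulSet reports all replicas ready AND
// the etcd cluster has a leader and the desired number of members.
package controller

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	appsv1 "k8s.io/api/apps/v1"
)

// stsReady reports whether every desired StatefulSet replica is ready.
func stsReady(sts *appsv1.StatefulSet) bool {
	return sts.Spec.Replicas != nil && sts.Status.ReadyReplicas == *sts.Spec.Replicas
}

// hasQuorum checks that a reachable member currently sees a raft leader and
// that the member list has the desired size.
func hasQuorum(ctx context.Context, endpoints []string, desiredMembers int) (bool, error) {
	cli, err := clientv3.New(clientv3.Config{Endpoints: endpoints, DialTimeout: 5 * time.Second})
	if err != nil {
		return false, err
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()

	status, err := cli.Status(ctx, endpoints[0])
	if err != nil {
		return false, err
	}
	members, err := cli.MemberList(ctx)
	if err != nil {
		return false, err
	}
	// Leader == 0 means this member sees no leader: quorum is lost or an
	// election is still in progress.
	return status.Leader != 0 && len(members.Members) == desiredMembers, nil
}
```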
@sircthulhu I didn't get from the description and from the answer whether this status should be continuously updated or just once. I see the etcd-operator (controller) as a "watchdog" that updates the status on certain events or on a regular schedule. I will try to dive deeper into the docs and code to get such answers, but currently it is not clear whether this logic should be implemented while working on this issue or not.
@Kirill-Garbar If I got things right, the goal at the moment is to implement such a check on etcd cluster creation:
Therefore, from my perspective, the workflow should be like the following:
The next reconciliation run should be triggered by the update of the sts state, NotReady -> Ready:
Continuous state tracking should definitely be implemented. But, from my perspective, it should be implemented separately from the controller's code, maybe as another service which implements
@aobort I'd like to make it more precise:
We do not rely on what's written in the status to get the current state. I agree with the point that continuous tracking should be implemented as
Absolutely. I meant that an update of the sts object (it does not matter whether it is a regular status update, a manual one, or whatnot) will trigger reconciliation of the etcd cluster object, because there is a controller reference applied to the sts.
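For reference, a minimal controller-runtime sketch of that wiring; the API group import path is an assumption, not necessarily this repo's actual package:

```go
// Sketch: with the controller reference in place, Owns() makes any update to
// the StatefulSet (including its status flipping NotReady -> Ready) enqueue a
// reconcile request for the owning EtcdCluster.
package controller

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	etcdv1alpha1 "github.com/example/etcd-operator/api/v1alpha1" // illustrative import path
)

// EtcdClusterReconciler follows the usual kubebuilder scaffolding.
type EtcdClusterReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}

func (r *EtcdClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... fetch the EtcdCluster, check STS/quorum, update status ...
	return ctrl.Result{}, nil
}

func (r *EtcdClusterReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&etcdv1alpha1.EtcdCluster{}). // the custom resource itself
		Owns(&appsv1.StatefulSet{}).      // STS carrying the controller reference
		Complete(r)
}
```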
This requires an additional condition, e.g.:
Internally we agreed to implement just a single condition for now - `Ready`.
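A sketch of how that single condition could be recorded with the standard apimachinery helpers; the reason string is illustrative, and Status.Conditions is assumed to be a plain []metav1.Condition slice:

```go
// Sketch: maintain one "Ready" condition on the EtcdCluster status.
package controller

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func setReadyCondition(conditions *[]metav1.Condition, ready bool, reason, message string) {
	status := metav1.ConditionFalse
	if ready {
		status = metav1.ConditionTrue
	}
	// SetStatusCondition bumps LastTransitionTime only when Status changes.
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:    "Ready",
		Status:  status,
		Reason:  reason,
		Message: message,
	})
}
```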
Every pod in the StatefulSet should have healthchecks defined:
We can borrow the logic from kubeadm, for example:
Here is the latest issue where they were updated:
However, @Uburro noted that:
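As a rough illustration of what those healthchecks could look like on the etcd container, a sketch in the spirit of the kubeadm static-pod probes; the port (2381) and all thresholds are assumptions, not values from this operator:

```go
// Sketch: readiness and liveness probes for the etcd container, assuming etcd
// serves /readyz and /livez on its metrics/health port.
package controller

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func etcdProbes() (readiness, liveness *corev1.Probe) {
	readiness = &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{Path: "/readyz", Port: intstr.FromInt(2381)},
		},
		InitialDelaySeconds: 10,
		PeriodSeconds:       5,
		FailureThreshold:    3,
	}
	liveness = &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{Path: "/livez", Port: intstr.FromInt(2381)},
		},
		InitialDelaySeconds: 10,
		PeriodSeconds:       10,
		FailureThreshold:    8,
	}
	return readiness, liveness
}
```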
The operator is not a monitoring solution. Also, if you implemented it, the data in the CR would always be stale.
Agreed. However, given the plans to implement cluster maintenance, monitoring of its state becomes a mandatory capability.
It depends. If the cluster health has not changed, there would be no reason for the watchdog to update the CR.
According to this KEP, etcd introduced full-fledged readiness and liveness probes. For us, a failing /readyz means there is no raft leader, a raft loop deadlock, or data corruption. In that case the pod will disappear from the endpoints, and for our controller this will be the signal that everything is ok or not ok with the cluster. We also subscribe to events from the shared informer, so there is no need for the controller to go through the clusters and poll their status; this is done the Kubernetes-native way by the kubelet itself. /livez will be a signal about a problem with the database (application) itself and a trigger for its restart. This way we avoid unnecessarily restarting etcd. This is also useful in cases where etcd is used outside of Kubernetes.
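To sketch the "subscribe instead of poll" part, an illustrative Watches clause (assuming controller-runtime v0.15+ and an assumed app.kubernetes.io/instance label on the Endpoints; in practice this would live in the same SetupWithManager as the Owns() call above):

```go
// Sketch: members failing /readyz drop out of the Service endpoints; watching
// Endpoints and mapping them back to the owning EtcdCluster turns that into a
// reconcile trigger, so the controller does not have to poll cluster health.
package controller

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	etcdv1alpha1 "github.com/example/etcd-operator/api/v1alpha1" // illustrative import path
)

func (r *EtcdClusterReconciler) setupEndpointsWatch(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&etcdv1alpha1.EtcdCluster{}).
		Watches(&corev1.Endpoints{},
			handler.EnqueueRequestsFromMapFunc(func(ctx context.Context, obj client.Object) []reconcile.Request {
				// The label pointing at the EtcdCluster name is an assumed convention.
				name, ok := obj.GetLabels()["app.kubernetes.io/instance"]
				if !ok {
					return nil
				}
				return []reconcile.Request{{
					NamespacedName: types.NamespacedName{Namespace: obj.GetNamespace(), Name: name},
				}}
			})).
		Complete(r)
}
```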
etcd-io/etcd#16666 - we can think about --experimental-wait-cluster-ready-timeout with a startup probe in the future.
Update cluster Ready status according to StatefulSet status and update ConfigMap cluster state to `existing` after the first time the STS is ready (fixes #24)
In general
We need to implement checking the quorum status of an initialized cluster and update the `Ready` status condition accordingly. After initializing the cluster and making sure the pods found each other and formed a cluster, the controller must update the cluster state ConfigMap to set the cluster state to `existing` (from `new`).
Design proposal
When the cluster is initialized, we should check whether the `Ready` condition can be set and whether the ConfigMap cluster state should be switched to `existing`. If the ConfigMap already has the `existing` state, do not change it anymore, as the cluster should already be initialized.
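Pulling the sketches from the discussion together, the proposed flow could look roughly like this; the ConfigMap name suffix, the data key, and the clientEndpoints helper are illustrative assumptions, and stsReady, hasQuorum and setReadyCondition refer to the earlier sketches:

```go
// Sketch: once the StatefulSet is ready and the cluster is quorate, set the
// Ready condition and flip the cluster-state ConfigMap from "new" to
// "existing" exactly once, never the other way around.
package controller

import (
	"context"
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"

	etcdv1alpha1 "github.com/example/etcd-operator/api/v1alpha1" // illustrative import path
)

// clientEndpoints is a hypothetical helper returning the cluster's client URL(s).
func clientEndpoints(cluster *etcdv1alpha1.EtcdCluster) []string {
	return []string{fmt.Sprintf("http://%s-client.%s.svc:2379", cluster.Name, cluster.Namespace)}
}

func (r *EtcdClusterReconciler) updateClusterState(ctx context.Context, cluster *etcdv1alpha1.EtcdCluster, sts *appsv1.StatefulSet) error {
	ready := stsReady(sts)
	if ready {
		quorate, err := hasQuorum(ctx, clientEndpoints(cluster), int(*sts.Spec.Replicas))
		ready = err == nil && quorate
	}

	setReadyCondition(&cluster.Status.Conditions, ready, "ClusterStateChecked", "StatefulSet readiness and quorum checked")
	if err := r.Status().Update(ctx, cluster); err != nil {
		return err
	}
	if !ready {
		return nil
	}

	// Flip the cluster-state ConfigMap to "existing" once; if it is already
	// "existing", leave it alone, as the cluster is already initialized.
	var cm corev1.ConfigMap
	key := types.NamespacedName{Namespace: cluster.Namespace, Name: cluster.Name + "-cluster-state"} // assumed name
	if err := r.Get(ctx, key, &cm); err != nil {
		return err
	}
	if cm.Data["ETCD_INITIAL_CLUSTER_STATE"] == "existing" { // assumed key
		return nil
	}
	if cm.Data == nil {
		cm.Data = map[string]string{}
	}
	cm.Data["ETCD_INITIAL_CLUSTER_STATE"] = "existing"
	return r.Update(ctx, &cm)
}
```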