Use "tolerate-unready-endpoints" to check etcd pod readiness #622

hongchaodeng · 2017-01-06T23:18:18Z

ref:

Readiness probe optionally not affecting receiving traffic via service kubernetes/kubernetes#39207
StatefulSet should allow optional burst mode (don't wait for readiness) kubernetes/kubernetes#39363

When etcd starts, it has a bootstrap phase that talks to other peers:

Start -> bootstrapping -> running/serving

If the "bootstrapping" phase fails, the member is as good as dead: no data is available and no quorum information is known.

However, when etcd runs in a Kubernetes Pod, the Pod entering the running state doesn't mean etcd itself is serving. It is important for the operator to know the status of the etcd member in this phase. We can only proceed with this member after it's "ready".
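A minimal sketch of the lifecycle described above, to make the distinction between "Pod is running" and "etcd is serving" concrete. The names `MemberPhase`, `member_phase`, and `can_proceed` are hypothetical and are not taken from the operator's code; the three boolean inputs are assumed to be signals the operator could observe.

```python
from enum import Enum


class MemberPhase(Enum):
    """Phases from the issue text, plus an explicit failure state."""
    START = "start"
    BOOTSTRAPPING = "bootstrapping"
    SERVING = "running/serving"
    FAILED = "failed"


def member_phase(pod_running, bootstrap_failed, etcd_serving):
    """Map observed signals to a phase (hypothetical helper)."""
    if not pod_running:
        return MemberPhase.START
    if bootstrap_failed:
        # A failed bootstrap is treated the same as a dead member.
        return MemberPhase.FAILED
    if etcd_serving:
        return MemberPhase.SERVING
    # The Pod is running, but etcd is still talking to its peers.
    return MemberPhase.BOOTSTRAPPING


def can_proceed(phase):
    """The operator only proceeds with a member once it is serving."""
    return phase is MemberPhase.SERVING


if __name__ == "__main__":
    phase = member_phase(pod_running=True, bootstrap_failed=False, etcd_serving=False)
    print(phase.value, can_proceed(phase))
```

The point of the sketch is that a running Pod alone maps to `BOOTSTRAPPING`, so the operator would keep waiting rather than treat the member as usable.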

Kubernetes has an annotation that decouples readiness from endpoint membership:

TolerateUnreadyEndpointsAnnotation = "service.alpha.kubernetes.io/tolerate-unready-endpoints"

Making use of this will help us differentiate the "bootstrapping" phase.
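As a rough illustration, the annotation would be set on the Service that fronts the etcd pods. The sketch below builds such a manifest as a plain Python dict. The service name, the `etcd_cluster` label key, and the choice of a headless service (`clusterIP: None`) are assumptions for the example, not details from this issue; 2379 and 2380 are etcd's usual client and peer ports.

```python
import json

# Annotation key quoted in the issue text.
TOLERATE_UNREADY_ENDPOINTS_ANNOTATION = (
    "service.alpha.kubernetes.io/tolerate-unready-endpoints"
)


def build_etcd_service(name, cluster_label):
    """Return a Service manifest that keeps unready pods in its endpoints.

    `name` and `cluster_label` are hypothetical; the label key below is
    an assumption made for this example.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": {TOLERATE_UNREADY_ENDPOINTS_ANNOTATION: "true"},
        },
        "spec": {
            "clusterIP": "None",  # headless service: an assumption
            "selector": {"etcd_cluster": cluster_label},
            "ports": [
                {"name": "client", "port": 2379},
                {"name": "peer", "port": 2380},
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_etcd_service("example-etcd", "example-etcd"), indent=2))
```

With the annotation in place, a pod that is running but not yet ready would still be addressable through the service, while its readiness probe could separately report whether bootstrapping has finished.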


xiang90 · 2017-01-10T21:14:18Z

We cannot use alpha features (as a general rule). We can start trying it out when it becomes beta. Let's keep our workaround for now.

hongchaodeng · 2017-03-13T18:34:39Z

This only affects self-hosted etcd.
Another use case: on self-hosted etcd, a pod restart can cause the pod's endpoint to be removed from the service. This is dangerous and unnecessary, because the etcd pod should restart and recover. The endpoint shouldn't be removed unless the pod is deleted.

For example, say we have a 3-member etcd cluster and all three members die. In that case, the etcd service has no endpoints and the self-hosted Kubernetes cluster can't recover. However, if the service tolerates unready pods and doesn't remove their endpoints, the etcd pods can restart and the cluster recovers by itself.
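The three-member scenario can be modeled in a few lines. This is a deliberately simplified sketch of endpoint selection, not the actual Kubernetes endpoints controller; `Pod`, `service_endpoints`, and the IP addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    ip: str
    ready: bool
    deleted: bool = False


def service_endpoints(pods, tolerate_unready):
    """Simplified model: which pod IPs stay listed behind the service.

    Deleted pods are always dropped. Unready pods are kept only when
    the service tolerates unready endpoints.
    """
    return [
        p.ip for p in pods
        if not p.deleted and (p.ready or tolerate_unready)
    ]


# All three members restart at once, so none of them is ready.
members = [Pod(f"etcd-{i}", f"10.0.0.{i + 1}", ready=False) for i in range(3)]

if __name__ == "__main__":
    # Without tolerance the service has no endpoints left.
    print(service_endpoints(members, tolerate_unready=False))
    # With tolerance every restarting member stays reachable.
    print(service_endpoints(members, tolerate_unready=True))
```

In this model the first call returns an empty list, which corresponds to the unrecoverable case described above, while the second keeps all three addresses so the members can find each other as they come back.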

hongchaodeng · 2017-04-25T21:56:40Z

The annotation is still alpha in k8s 1.6.

hongchaodeng · 2017-06-28T18:55:44Z

Some real world experience:
Due to an issue on the node, e.g. node pressure or a network partition, the etcd pod was restarted and its endpoint was removed.
This could be better tolerated.

xiang90 · 2017-07-22T07:08:35Z

This is already done.

xiang90 added the priority/P3 label Jan 10, 2017

xiang90 added this to the future milestone Jan 10, 2017

hongchaodeng changed the title ~~Use "tolerante-unready-endpoints" to check etcd pod readiness~~ Use "tolerate-unready-endpoints" to check etcd pod readiness Apr 25, 2017

chancez mentioned this issue Jun 21, 2017

self hosted etcd has a circular dependency on service ip. kubernetes-retired/bootkube#599

Closed

xiang90 closed this as completed Jul 22, 2017
