This repository has been archived by the owner on Mar 28, 2020. It is now read-only.

Use "tolerate-unready-endpoints" to check etcd pod readiness #622

Closed
hongchaodeng opened this issue Jan 6, 2017 · 5 comments

@hongchaodeng
Member

ref:

When etcd starts, it has a bootstrap phase that talks to other peers:

Start -> bootstrapping -> running/serving

If the "bootstrap" phase fails, the member is effectively dead: no data is available and no quorum information is known.

However, when we start etcd in a Kubernetes Pod, the Pod reaching the running state does not mean etcd is serving. The operator needs to know the status of the etcd member during this phase, and we can only proceed with the member once it is "ready".

Kubernetes has a Service annotation that decouples readiness from endpoints:

TolerateUnreadyEndpointsAnnotation = "service.alpha.kubernetes.io/tolerate-unready-endpoints"

Making use of this will help us differentiate the "bootstrapping" phase.
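As a rough illustration of the idea, the sketch below sets the annotation on a service's metadata. It is not the operator's code: `serviceMeta` and `withTolerateUnready` are hypothetical stand-ins for the real Kubernetes Service object, and the value `"true"` is my assumption about what the annotation expects.

```go
package main

import "fmt"

// Annotation key quoted in this issue.
const TolerateUnreadyEndpointsAnnotation = "service.alpha.kubernetes.io/tolerate-unready-endpoints"

// serviceMeta is a hypothetical, simplified stand-in for the metadata
// of a Kubernetes Service (name plus annotations only).
type serviceMeta struct {
	Name        string
	Annotations map[string]string
}

// withTolerateUnready returns a copy of m with the annotation set.
// The value "true" is an assumption made for this sketch.
func withTolerateUnready(m serviceMeta) serviceMeta {
	out := serviceMeta{Name: m.Name, Annotations: map[string]string{}}
	for k, v := range m.Annotations {
		out.Annotations[k] = v // keep existing annotations
	}
	out.Annotations[TolerateUnreadyEndpointsAnnotation] = "true"
	return out
}

func main() {
	svc := withTolerateUnready(serviceMeta{Name: "etcd-cluster"})
	fmt.Println(svc.Annotations[TolerateUnreadyEndpointsAnnotation])
}
```

With the annotation in place, the service would keep publishing endpoints for pods that are still bootstrapping, so the operator can tell "not ready yet" apart from "gone" by checking pod readiness separately.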

@xiang90
Collaborator

xiang90 commented Jan 10, 2017

As a general rule, we cannot use alpha features. We can start trying it out when it becomes beta. Let's keep our workaround for now.

@xiang90 xiang90 added this to the future milestone Jan 10, 2017
@hongchaodeng
Member Author

hongchaodeng commented Mar 13, 2017

This only affects self-hosted etcd.
Another use case: on self-hosted etcd, an etcd pod restart can cause the pod's endpoint to be removed from the service. This is dangerous and unnecessary, because the etcd pod should restart and recover. The endpoint shouldn't be removed unless the pod is deleted.

For example, say we have a 3-member etcd cluster and all three members die. In that case, the etcd service will have no endpoints and the self-hosted Kubernetes cluster won't be able to recover itself. However, if the service tolerates such unready pods and doesn't remove their endpoints, the etcd pods will restart and recover by themselves.
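The three-member scenario can be modeled with a small sketch. This is a simplified model of the behavior described above, not the actual Kubernetes endpoints controller; the `pod` type, the `serviceEndpoints` function, and the IP addresses are all hypothetical.

```go
package main

import "fmt"

// pod is a hypothetical, simplified view of an etcd pod backing the service.
type pod struct {
	IP    string
	Ready bool
}

// serviceEndpoints returns the pod IPs the service would publish in this
// simplified model: ready pods are always included, and unready pods are
// included only when tolerateUnready is set.
func serviceEndpoints(pods []pod, tolerateUnready bool) []string {
	var ips []string
	for _, p := range pods {
		if p.Ready || tolerateUnready {
			ips = append(ips, p.IP)
		}
	}
	return ips
}

func main() {
	// All three members have died and are restarting, so none is ready.
	pods := []pod{
		{IP: "10.0.0.1", Ready: false},
		{IP: "10.0.0.2", Ready: false},
		{IP: "10.0.0.3", Ready: false},
	}
	fmt.Println(len(serviceEndpoints(pods, false))) // prints 0
	fmt.Println(len(serviceEndpoints(pods, true)))  // prints 3
}
```

Without toleration the service ends up with no endpoints at all, which is the unrecoverable case for a self-hosted cluster; with toleration the three addresses stay published while the pods restart.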

@hongchaodeng hongchaodeng changed the title Use "tolerante-unready-endpoints" to check etcd pod readiness Use "tolerate-unready-endpoints" to check etcd pod readiness Apr 25, 2017
@hongchaodeng
Member Author

The annotation is still alpha in k8s 1.6.

@hongchaodeng
Member Author

Some real-world experience:
Due to an issue on the node, e.g. node pressure or a network partition, an etcd pod was restarted and its endpoint was removed.
This could be better tolerated.

@xiang90
Collaborator

xiang90 commented Jul 22, 2017

This is already done.

@xiang90 xiang90 closed this as completed Jul 22, 2017

2 participants