This repository has been archived by the owner on Jul 30, 2021. It is now read-only.

Self-hosted etcd has a circular dependency on the service IP. #599

Closed
coresolve opened this issue Jun 21, 2017 · 6 comments · Fixed by #626

Comments

@coresolve
Contributor

With self-hosted etcd, we see that significant memory pressure on the masters can cause the cluster to become unstable.

This means the kubelet will eventually mark all the etcd pods unhealthy, and kube-proxy will then reject packets headed for kube-system/etcd-service:client.

At this point the apiserver is unable to start, because it is configured to connect to the etcd cluster through the service IP of the self-hosted cluster. The service continues to have no healthy endpoints, because the kubelet can't talk to the apiserver to tell it that the service has healthy endpoints.

So we end up with a healthy etcd cluster but no way to update the endpoint record in etcd through the apiserver.
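The deadlock described above is a cycle in a "needs" graph. The sketch below models it in Go and finds the cycle with a depth-first search. The node labels are an informal shorthand for the components named in this issue, and `findCycle` is a hypothetical helper written for illustration; neither comes from bootkube or Kubernetes.

```go
package main

import "fmt"

// Hypothetical, simplified model of the failure described in this issue.
// An edge "A -> B" means "A needs B to be working".
var deps = map[string][]string{
	"kube-apiserver":            {"etcd-service (service IP)"},
	"etcd-service (service IP)": {"ready etcd endpoints"},
	"ready etcd endpoints":      {"kubelet status update"},
	"kubelet status update":     {"kube-apiserver"},
}

// findCycle runs a depth-first search from start and returns the first cycle
// it finds (first node repeated at the end), or nil if none is reachable.
func findCycle(graph map[string][]string, start string) []string {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := map[string]int{} // missing keys read as unvisited (0)
	var path []string
	var visit func(n string) []string
	visit = func(n string) []string {
		state[n] = inProgress
		path = append(path, n)
		for _, m := range graph[n] {
			switch state[m] {
			case inProgress:
				// m is already on the current path: slice out m -> ... -> n -> m.
				for i, p := range path {
					if p == m {
						cycle := append([]string{}, path[i:]...)
						return append(cycle, m)
					}
				}
			case unvisited:
				if c := visit(m); c != nil {
					return c
				}
			}
		}
		path = path[:len(path)-1]
		state[n] = done
		return nil
	}
	return visit(start)
}

func main() {
	fmt.Println(findCycle(deps, "kube-apiserver"))
}
```

Every node on the printed cycle waits on the next one, so none of them can recover on its own; breaking any single edge (for example, not requiring ready endpoints behind the service IP) removes the deadlock.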

@chancez
Contributor

chancez commented Jun 21, 2017

coreos/etcd-operator#622 seems related.

@hongchaodeng
Contributor

Tolerating unready endpoints is not enough; we also need to account for restarts.

@hongchaodeng
Contributor

From this comment: kubernetes/kubernetes#25283 (comment)

It seems the tolerate-unready feature might also tolerate restarts (rolling upgrades). We need to test it, though.

@coresolve
Contributor Author

Further, when you restart a node in this condition, we aren't checkpointing the self-hosted (sh) etcd cluster, so there is no way to get the cluster back after a reboot.

@hongchaodeng
Contributor

hongchaodeng commented Jun 28, 2017

Filed upstream issue: kubernetes/kubernetes#47880

@hongchaodeng
Contributor

We are going to enable the alpha annotation TolerateUnreadyEndpointsAnnotation on the etcd service. Will submit a PR shortly.
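To illustrate what the annotation changes, here is a simplified Go sketch of how an endpoints controller sorts pod IPs into ready and not-ready addresses. The annotation key `service.alpha.kubernetes.io/tolerate-unready-endpoints` is the one Kubernetes used for this alpha feature (later superseded by the Service field `spec.publishNotReadyAddresses`). The `Pod` and `Service` types and `splitEndpoints` are hypothetical stand-ins, not the real API types or controller code, and treating only the exact string `"true"` as enabled is a simplification.

```go
package main

import "fmt"

// Alpha annotation key for the tolerate-unready-endpoints feature.
const tolerateUnreadyEndpointsAnnotation = "service.alpha.kubernetes.io/tolerate-unready-endpoints"

// Pod and Service are hypothetical, stripped-down stand-ins for the API types.
type Pod struct {
	IP    string
	Ready bool
}

type Service struct {
	Annotations map[string]string
}

// splitEndpoints (simplified) decides which pod IPs are published as ready
// addresses. kube-proxy only routes to ready addresses, so if that list is
// empty, traffic to the service IP has nowhere to go.
func splitEndpoints(svc Service, pods []Pod) (ready, notReady []string) {
	tolerate := svc.Annotations[tolerateUnreadyEndpointsAnnotation] == "true"
	for _, p := range pods {
		if p.Ready || tolerate {
			ready = append(ready, p.IP)
		} else {
			notReady = append(notReady, p.IP)
		}
	}
	return ready, notReady
}

func main() {
	// Two etcd pods that the kubelet has marked not ready.
	pods := []Pod{{"10.2.0.5", false}, {"10.2.0.6", false}}
	plain := Service{Annotations: map[string]string{}}
	tolerant := Service{Annotations: map[string]string{tolerateUnreadyEndpointsAnnotation: "true"}}

	r, _ := splitEndpoints(plain, pods)
	fmt.Println("without annotation:", r) // prints "without annotation: []"
	r, _ = splitEndpoints(tolerant, pods)
	fmt.Println("with annotation:", r) // prints "with annotation: [10.2.0.5 10.2.0.6]"
}
```

With the annotation, readiness no longer gates which etcd pods sit behind the service IP. As noted earlier in the thread, this alone does not cover restarts.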


4 participants