Client Service "tolerate-unready-endpoints" annotation causes connection issues #2030
Comments
Argh... after much googling and even reading Kubernetes source code, I found the problem:
Why is the tolerate-unready-endpoints annotation set on the client Service?
Looks like it is deprecated as well, in favor of the publishNotReadyAddresses field.
Documentation for publishNotReadyAddresses (ServiceSpec, Kubernetes core v1 API):

"publishNotReadyAddresses, when set to true, indicates that DNS implementations must publish the notReadyAddresses of subsets for the Endpoints associated with the Service. The default value is false. The primary use case for setting this field is to use a StatefulSet's Headless Service to propagate SRV records for its Pods without respect to their readiness for purpose of peer discovery."
Well, this certainly does not fit the use case. This is neither a StatefulSet, nor is the Service a headless one.
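For reference, here is a small sketch of the annotation in question next to the ServiceSpec field that supersedes it, against the k8s.io/api core v1 types; the helper name is hypothetical:

```go
package example

import (
	v1 "k8s.io/api/core/v1"
)

// markTolerateUnready shows both spellings side by side; svc is any
// *v1.Service. The alpha annotation is what etcd-operator set at the
// time; PublishNotReadyAddresses is the field that replaces it.
func markTolerateUnready(svc *v1.Service) {
	if svc.Annotations == nil {
		svc.Annotations = map[string]string{}
	}
	// Deprecated alpha annotation:
	svc.Annotations["service.alpha.kubernetes.io/tolerate-unready-endpoints"] = "true"
	// Replacement field in ServiceSpec:
	svc.Spec.PublishNotReadyAddresses = true
}
```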
So, #1257 says "Set TolerateUnreadyEndpoints for service of peer URLs". I have no problem with the peer service; it makes sense to have TolerateUnreadyEndpoints there. But the same patch also added TolerateUnreadyEndpoints to the client service. I think that one was accidental, and it is harmful. cc @hongchaodeng
@gjcarneiro I agree. The peer URLs are networked by a headless Service, and the annotation is used there in much the same manner as it would be in a StatefulSet (as mentioned in the documentation of publishNotReadyAddresses quoted above). Each pod checks its own DNS entry before initializing; see etcd-operator/pkg/util/k8sutil/k8sutil.go, line 373, at commit aeb3e3e.
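To illustrate why the peer Service needs this behaviour, here is a minimal, self-contained sketch of that kind of DNS self-check; the environment variable names and control flow are assumptions for illustration, not the operator's actual code:

```go
package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

func main() {
	// The pod's own DNS name under the headless peer Service, e.g.
	// <pod>.<peer-service>.<namespace>.svc. PEER_SERVICE and NAMESPACE
	// are hypothetical environment variables for this sketch.
	hostname, _ := os.Hostname()
	fqdn := fmt.Sprintf("%s.%s.%s.svc",
		hostname, os.Getenv("PEER_SERVICE"), os.Getenv("NAMESPACE"))

	// Chicken-and-egg: the pod stays unready until etcd starts, but etcd
	// is not started until this lookup succeeds. Without
	// publishNotReadyAddresses (or the old annotation) on the headless
	// peer Service, the DNS record would never appear and this loop
	// would spin forever.
	for {
		if _, err := net.LookupHost(fqdn); err == nil {
			break
		}
		time.Sleep(time.Second)
	}
	// ...proceed to start etcd once the peer URL is resolvable.
}
```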
If only the headless Service (the one which serves the peer URLs) is annotated with tolerate-unready-endpoints, the client Service will route traffic only to Ready pods, and this problem should go away.
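A minimal sketch of that split, against the k8s.io/api core v1 types; the service names, selector, and ports here are illustrative:

```go
package example

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// peerService is headless and publishes not-ready addresses so that a new
// member can resolve its own peer URL while it is still initializing.
func peerService(clusterName string) *v1.Service {
	return &v1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: clusterName},
		Spec: v1.ServiceSpec{
			ClusterIP:                v1.ClusterIPNone, // headless
			PublishNotReadyAddresses: true,
			Selector:                 map[string]string{"etcd_cluster": clusterName},
			Ports: []v1.ServicePort{
				{Name: "peer", Port: 2380, TargetPort: intstr.FromInt(2380)},
			},
		},
	}
}

// clientService deliberately omits PublishNotReadyAddresses, so client
// traffic only reaches pods that have passed their readiness check.
func clientService(clusterName string) *v1.Service {
	return &v1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: clusterName + "-client"},
		Spec: v1.ServiceSpec{
			Selector: map[string]string{"etcd_cluster": clusterName},
			Ports: []v1.ServicePort{
				{Name: "client", Port: 2379, TargetPort: intstr.FromInt(2379)},
			},
		},
	}
}
```

Only peer discovery has the bootstrap chicken-and-egg problem, so only the peer Service needs the unready addresses.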
Good catch :)
Previously, unready etcd nodes were already receiving client connections while they were still in the initialization phase and not able to accept any traffic. This caused connection failures or high latency. Fixes #2030

Signed-off-by: Christian Köhn <christian.koehn@figo.io>
Original issue report:

I don't know whether this is just a question, a bug in etcd-operator, or a bug in Kubernetes itself.
In any case, I was experimenting with a 3-node etcd cluster and observing the impact on clients when I delete one of the cluster members.
I run a simple command like

etcdctl get --prefix /

in a loop while I delete one pod and wait for a replacement pod to appear. The problem happens when the new pod appears: it is not yet Ready, but the Service endpoints already include it.
But at the same time, the pod itself is still initialising.
As a result, each etcdctl call randomly either gets delayed by a second or so, fails with an error, or completes quickly.
As soon as the new pod finishes initialising, the service is restored.
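For reproduction purposes, the same probe can be written against the Go client instead of etcdctl; a minimal sketch, assuming a client Service named example-etcd-cluster-client and the go.etcd.io/etcd/clientv3 package:

```go
package main

import (
	"context"
	"log"
	"time"

	"go.etcd.io/etcd/clientv3"
)

func main() {
	// "example-etcd-cluster-client:2379" is an assumed client Service
	// address; substitute your cluster's client Service.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"example-etcd-cluster-client:2379"},
		DialTimeout: 2 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Probe in a loop; while an unready pod sits behind the client
	// Service, responses alternate between fast, ~1s slow, and failed.
	for {
		start := time.Now()
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		_, err := cli.Get(ctx, "/", clientv3.WithPrefix())
		cancel()
		log.Printf("get / took %v, err=%v", time.Since(start), err)
		time.Sleep(500 * time.Millisecond)
	}
}
```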
To be honest, to me this seems like a Kubernetes bug -- it shouldn't add a pod to a Service's endpoints list before the pod is Ready -- but I could be missing something. Otherwise, how can you achieve zero downtime with etcd, when routine node maintenance requires evicting pods from nodes once in a while?
Any thoughts?