This repository has been archived by the owner on Mar 28, 2020. It is now read-only.

Client Service "tolerate-unready-endpoints" annotation causes connection issues #2030

Closed
gjcarneiro opened this issue Dec 18, 2018 · 6 comments · Fixed by #2063

Comments

@gjcarneiro

I don't know whether this is just a question, a bug in etcd-operator, or a bug in Kubernetes itself.

In any case, I was just playing with a 3-node etcd cluster, looking at the impact on clients when I delete one of the cluster members.

I run a simple command like etcdctl get --prefix / in a loop while I delete one pod and wait for a replacement pod to appear.
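A rough sketch of such a probe loop (the ETCDCTL_CMD/N knobs and the timing output are additions for illustration; the real invocation is just etcdctl get --prefix / against the client Service):

```shell
# Run the read repeatedly and time each attempt, so slow or failed
# reads stand out while a member pod is being deleted and replaced.
probe=${ETCDCTL_CMD:-"etcdctl get --prefix /"}   # override for testing
n=${N:-30}
for i in $(seq 1 "$n"); do
  start=$(date +%s%N)
  if $probe >/dev/null 2>&1; then status=ok; else status=error; fi
  end=$(date +%s%N)
  echo "probe $i: $status $(( (end - start) / 1000000 ))ms"
done
```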

The problem happens when the new pod appears: it is not yet Ready, but the Service endpoints already include it:

$ kubectl get ep example-etcd-cluster-client
[...]
example-etcd-cluster-client   10.125.27.21:2379,10.125.6.10:2379,10.126.8.18:2379   41m

But at the same time the pod is still initialising:

example-etcd-cluster-kbn69vn2xn   0/1     Init:0/1   0          6s    10.125.6.10    hex-48b-pm.k2.gambit   <none>           <none>

As a result, my etcdctl randomly either gets delayed by a second or so, gets an error, or works fast.
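This apparently random behaviour is what you would expect if the client spreads requests across all published endpoints, including the unready one. A toy sketch (plain bash; the endpoint addresses are taken from the output above, and the round-robin client is an assumption for illustration, not etcd's actual balancer):

```shell
# Toy model: a client round-robins over the Service's published
# endpoints; every request routed to the unready pod fails.
endpoints=(10.125.27.21:2379 10.125.6.10:2379 10.126.8.18:2379)
unready=10.125.6.10:2379            # the pod still in Init:0/1
ok=0; failed=0
for i in $(seq 0 8); do             # nine requests, round-robin
  ep=${endpoints[$((i % 3))]}
  if [ "$ep" = "$unready" ]; then
    failed=$((failed + 1))          # connection refused / timeout
  else
    ok=$((ok + 1))
  fi
done
echo "ok=$ok failed=$failed"        # ok=6 failed=3: every third request hits the unready pod
```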

As soon as the new pod finishes initialising, the service is restored.

To be honest, to me this seems like a Kubernetes bug -- it shouldn't add a pod to a Service's endpoints list before the pod is Ready -- but I could be missing something. Otherwise, how can you have zero downtime with etcd, when routine node maintenance requires me to evict pods from nodes once in a while?

Any thoughts?

@gjcarneiro
Author

Argh... after much googling and even reading the Kubernetes source code, I found the problem:

07:42:09 {master} ~/Downloads/etcd-operator$ kubectl --context=dev get svc  example-etcd-cluster-client -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"

Why is the tolerate-unready-endpoints annotation being used, and how can I get rid of it?

@hexfusion
Member

Why is the tolerate-unready-endpoints annotation being used, and how can I get rid of it?

Looks like that annotation is deprecated in favor of Service.spec.publishNotReadyAddresses, but we need to understand it better too. Can you keep digging on this for us?

kubernetes/kubernetes#63742

@gjcarneiro
Author

This is something you set here

This is the commit: 5635478

I think this is a very bad idea. Clients will try to connect to an etcd node that is not ready, causing high latency or even connection failures. And unfortunately I don't even see an option to disable this.

@gjcarneiro
Author

Documentation for publishNotReadyAddresses:

publishNotReadyAddresses, when set to true, indicates that DNS implementations must publish the notReadyAddresses of subsets for the Endpoints associated with the Service. The default value is false. The primary use case for setting this field is to use a StatefulSet’s Headless Service to propagate SRV records for its Pods without respect to their readiness for purpose of peer discovery. This field will replace the service.alpha.kubernetes.io/tolerate-unready-endpoints when that annotation is deprecated and all clients have been converted to use this field.

Well, this certainly does not fit the use case here: this is neither a StatefulSet, nor is the Service headless.

@gjcarneiro gjcarneiro changed the title When a pod is still initialising, the -client service endpoints already include it Client Service "tolerate-unready-endpoints" annotation causes connection issues Dec 19, 2018
@gjcarneiro
Author

So, #1257 says "Set TolerateUnreadyEndpoints for service of peer URLs".

I have no problem with the peer service. Makes sense to have TolerateUnreadyEndpoints for it.

But the same patch also added TolerateUnreadyEndpoints to the client service. I think this one was accidental and is harmful. cc @hongchaodeng.

@alaypatel07
Collaborator

@gjcarneiro I agree: the peer URLs are served by a headless service, and the annotation is used there in the same way it would be in a StatefulSet (as mentioned in the documentation of publishNotReadyAddresses).

Each pod checks its own DNS entry before initializing, in this init container:

Name: "check-dns",

If only the headless service (the one serving the peer URLs) is annotated with publishNotReadyAddresses:

  1. A DNS entry will be populated because of the headless service, and check-dns will resolve correctly.
  2. The [cluster-name]-client endpoint will not be populated with an unready pod IP:port-number, resolving this issue.

Good catch :)
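For illustration, a minimal sketch of the split described above (service names and ports assumed to follow etcd-operator's conventions; this is not the operator's actual manifest):

```yaml
# Headless peer Service: keep publishing unready addresses so a new
# member's DNS name resolves during its check-dns init step.
apiVersion: v1
kind: Service
metadata:
  name: example-etcd-cluster          # peer service (name assumed)
spec:
  clusterIP: None                     # headless
  publishNotReadyAddresses: true
  ports:
  - name: peer
    port: 2380
---
# Client Service: no publishNotReadyAddresses, so only Ready pods
# receive client traffic.
apiVersion: v1
kind: Service
metadata:
  name: example-etcd-cluster-client
spec:
  ports:
  - name: client
    port: 2379
```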

hexfusion pushed a commit that referenced this issue Apr 17, 2019
Previously, unready etcd nodes were already receiving client connections
although they were still in the initialization phase and unable to accept
any traffic. This caused connection failures or high latency.

Fixes #2030

Signed-off-by: Christian Köhn <christian.koehn@figo.io>
kapouille pushed a commit to Polystream/etcd-operator that referenced this issue May 16, 2019
Previously, unready etcd nodes were already receiving client connections
although they were still in the initialization phase and unable to accept
any traffic. This caused connection failures or high latency.

Fixes coreos#2030

Signed-off-by: Christian Köhn <christian.koehn@figo.io>

3 participants