etcd2 client doesn't cycle to next cluster member in some error conditions #8515

Closed

rrati opened this issue Sep 7, 2017 · 0 comments

rrati commented Sep 7, 2017

In OpenShift we have seen errors like:

apiserver received an error that is not an metav1.Status: dial tcp :2379: getsockopt: connection refused

This can happen when the etcd cluster leader goes away. In my testing I reproduced it by manually shutting down the etcd process on the node acting as the cluster leader. When this condition occurs, the etcd client keeps trying to hit the downed etcd leader and doesn't cycle to the other available etcd servers for a very long time (10+ minutes).

Looking at the code in client/client.go, it seems there are some error cases where the client won't attempt to contact the next server. The OneShot case definitely won't cycle on error (see the sketch below).
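For reference, here is a minimal sketch of the kind of per-request endpoint cycling being described. This is not the actual etcd client code; the endpoint URLs and helper names are made up. It just illustrates how an early return on failure means the remaining members are never tried:

```go
// Illustrative sketch only (not the actual etcd client code): a client that is
// supposed to try each configured endpoint in turn, but that returns early on
// failure instead of moving on to the next member.
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// doOnce issues a single GET against one endpoint (hypothetical key path).
func doOnce(ctx context.Context, c *http.Client, endpoint string) (*http.Response, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint+"/v2/keys/foo", nil)
	if err != nil {
		return nil, err
	}
	return c.Do(req)
}

// do is meant to cycle through endpoints, falling through to the next one on
// failure. The early "oneShot" return mirrors the behaviour described above:
// when it is taken, the remaining cluster members are never contacted.
func do(ctx context.Context, endpoints []string, oneShot bool) (*http.Response, error) {
	c := &http.Client{Timeout: 2 * time.Second}
	var lastErr error
	for _, ep := range endpoints {
		resp, err := doOnce(ctx, c, ep)
		if err == nil && resp.StatusCode/100 != 5 {
			return resp, nil // a healthy member answered
		}
		if err == nil {
			resp.Body.Close()
			err = fmt.Errorf("server error %d from %s", resp.StatusCode, ep)
		}
		lastErr = err
		if oneShot {
			// Bail out after the first failure instead of trying the next
			// member -- this is the case the question is about.
			return nil, lastErr
		}
	}
	if lastErr == nil {
		lastErr = errors.New("no endpoints configured")
	}
	return nil, lastErr
}

func main() {
	// Hypothetical cluster members; the first one is assumed to be down.
	eps := []string{"http://127.0.0.1:2379", "http://127.0.0.2:2379"}
	if _, err := do(context.Background(), eps, true); err != nil {
		fmt.Println("request failed:", err)
	}
}
```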

Is this expected? It seems like the client should cycle to the next cluster member on each request, regardless of whether the previous request succeeded or failed.
