etcd2 client doesn't cycle to next cluster member in some error conditions #8515

Closed

rrati opened this issue Sep 7, 2017 · 0 comments

rrati commented Sep 7, 2017

In OpenShift we have seen errors like:

apiserver received an error that is not an metav1.Status: dial tcp :2379: getsockopt: connection refused

This can happen when the etcd cluster leader goes away. In my testing I reproduced it by manually shutting down the etcd process on the node acting as the cluster leader. When this condition occurs, the etcd client keeps trying to hit the downed etcd leader and doesn't cycle to the other available etcd servers for a very long time (10+ minutes).

Looking at the code in client/client.go, it seems there are some error cases where the client won't attempt to contact the next server. The OneShot case definitely won't cycle on error (see the sketch below).
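For reference, here is a minimal sketch of the kind of per-request endpoint cycling being described. This is not the actual etcd client code; the endpoint URLs and helper names are made up. It just illustrates how an early return on failure means the remaining members are never tried:

```go
// Illustrative sketch only (not the actual etcd client code): a client that is
// supposed to try each configured endpoint in turn, but that returns early on
// failure instead of moving on to the next member.
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// doOnce issues a single GET against one endpoint (hypothetical key path).
func doOnce(ctx context.Context, c *http.Client, endpoint string) (*http.Response, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint+"/v2/keys/foo", nil)
	if err != nil {
		return nil, err
	}
	return c.Do(req)
}

// do is meant to cycle through endpoints, falling through to the next one on
// failure. The early "oneShot" return mirrors the behaviour described above:
// when it is taken, the remaining cluster members are never contacted.
func do(ctx context.Context, endpoints []string, oneShot bool) (*http.Response, error) {
	c := &http.Client{Timeout: 2 * time.Second}
	var lastErr error
	for _, ep := range endpoints {
		resp, err := doOnce(ctx, c, ep)
		if err == nil && resp.StatusCode/100 != 5 {
			return resp, nil // a healthy member answered
		}
		if err == nil {
			resp.Body.Close()
			err = fmt.Errorf("server error %d from %s", resp.StatusCode, ep)
		}
		lastErr = err
		if oneShot {
			// Bail out after the first failure instead of trying the next
			// member -- this is the case the question is about.
			return nil, lastErr
		}
	}
	if lastErr == nil {
		lastErr = errors.New("no endpoints configured")
	}
	return nil, lastErr
}

func main() {
	// Hypothetical cluster members; the first one is assumed to be down.
	eps := []string{"http://127.0.0.1:2379", "http://127.0.0.2:2379"}
	if _, err := do(context.Background(), eps, true); err != nil {
		fmt.Println("request failed:", err)
	}
}
```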

Is this expected? It seems like the client should cycle to the next cluster member on each request, regardless of whether the previous request succeeded or failed.
