This repository has been archived by the owner on Mar 28, 2020. It is now read-only.

etcd member failing to bootstrap and exits because of dial tcp timeout #1330

Closed
hasbro17 opened this issue Jul 27, 2017 · 1 comment


@hasbro17
Contributor

An etcd member can occasionally fail to bootstrap and exit because it cannot reach another member: the request to the peer fails with a dial tcp timeout.

# Logs of etcd member test-etcd-m2pfm-0001
etcdserver: could not get cluster response from http://test-etcd-m2pfm-0000.test-etcd-m2pfm.e2e-etcd-operator-flake-164.svc.cluster.local:2380: Get http://test-etcd-m2pfm-0000.test-etcd-m2pfm.e2e-etcd-operator-flake-164.svc.cluster.local:2380/members: dial tcp: i/o timeout

etcdmain: cannot fetch cluster info from peer urls: could not retrieve cluster information from the given urls

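If this happens to the second member added to the cluster, the cluster can lose quorum and go into disaster recovery. A two-member cluster needs both members for quorum, so when the new member exits, the single remaining member no longer has a majority.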
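
The quorum arithmetic behind this can be sketched in Go. This is only an illustration of the majority rule that etcd's Raft consensus relies on; the function names `quorum` and `faultTolerance` are made up for this example and are not part of etcd or the operator.

```go
package main

import "fmt"

// quorum returns how many members must be available for a cluster of the
// given size to make progress: a strict majority.
// (Hypothetical helper for illustration only.)
func quorum(members int) int {
	return members/2 + 1
}

// faultTolerance returns how many members can be down while the cluster
// still has quorum. (Hypothetical helper for illustration only.)
func faultTolerance(members int) int {
	return members - quorum(members)
}

func main() {
	for n := 1; n <= 5; n++ {
		fmt.Printf("members=%d quorum=%d tolerated failures=%d\n",
			n, quorum(n), faultTolerance(n))
	}
}
```

For two members the quorum is two and the tolerated failure count is zero, which is why a failed bootstrap of the second member is enough to stall the cluster.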

The exact cause of the dial tcp timeout is not known. Filing this issue as a reference in case we encounter the problem more frequently.

@hasbro17
Contributor Author

This is already tracked in #1300.
