etcd does an SRV lookup on restart and then reads the cluster file. It should do only one of file/SRV #5444

Closed
raoofm opened this issue May 24, 2016 · 6 comments · Fixed by #5918

raoofm commented May 24, 2016

On restart, etcd does an SRV lookup and then reads the cluster file. It should not do an SRV lookup when the member is already initialized and the cluster's member info is present in the store.

2016-05-24 17:52:58.457637 I | etcdmain: etcd Version: 2.3.5
2016-05-24 17:52:58.457741 I | etcdmain: Git SHA: a535dc9
2016-05-24 17:52:58.457750 I | etcdmain: Go Version: go1.6.2
2016-05-24 17:52:58.457757 I | etcdmain: Go OS/Arch: linux/amd64
2016-05-24 17:52:58.457765 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2016-05-24 17:52:58.457821 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2016-05-24 17:52:58.475233 N | discovery: got bootstrap from DNS for etcd-server-ssl at https://node01.etcd.dev.com:2380
2016-05-24 17:52:58.475268 N | discovery: got bootstrap from DNS for etcd-server-ssl at https://node02.etcd.dev.com:2380
2016-05-24 17:52:58.475283 N | discovery: got bootstrap from DNS for etcd-server-ssl at https://node03.etcd.dev.com:2380
2016-05-24 17:52:58.477767 I | etcdmain: peerTLS: cert = /var/home/rm/node03.etcd.dev.com.cert.pem, key = /var/home/rm/node03.etcd.dev.com.pem, ca = , trusted-ca = /var/home/rm/ca.cert.pem, client-cert-auth = false
2016-05-24 17:52:58.479693 I | etcdmain: listening for peers on https://node03.etcd.dev.com:2380
2016-05-24 17:52:58.479712 I | etcdmain: clientTLS: cert = /var/home/rm/node03.etcd.dev.com.cert.pem, key = /var/home/rm/node03.etcd.dev.com.pem, ca = , trusted-ca = /var/home/rm/ca.cert.pem, client-cert-auth = false
2016-05-24 17:52:58.481425 I | etcdmain: listening for client requests on https://node03.etcd.dev.com:2379
2016-05-24 17:52:58.489862 I | etcdserver: recovered store from snapshot at index 44374437
2016-05-24 17:52:58.489883 I | etcdserver: name = node03
2016-05-24 17:52:58.489890 I | etcdserver: data dir = /var/lib/etcd/cluster/datadir
2016-05-24 17:52:58.489898 I | etcdserver: member dir = /var/lib/etcd/cluster/datadir/member
2016-05-24 17:52:58.489904 I | etcdserver: heartbeat = 200ms
2016-05-24 17:52:58.489909 I | etcdserver: election = 2000ms
2016-05-24 17:52:58.489915 I | etcdserver: snapshot count = 10000
2016-05-24 17:52:58.489926 I | etcdserver: advertise client URLs = https://node03.etcd.dev.com:2379
2016-05-24 17:52:58.855136 I | etcdserver: restarting member 4c5854cca2fdc9c in cluster baf099e94f57d45e at commit index 44375903
2016-05-24 17:52:58.855268 I | raft: 4c5854cca2fdc9c became follower at term 244
2016-05-24 17:52:58.855301 I | raft: newRaft 4c5854cca2fdc9c [peers: [4c5854cca2fdc9c,5f97c3330e24a56d,f95caeae097ab105], term: 244, commit: 44375903, applied: 44374437, lastindex: 44375903, lastterm: 244]
2016-05-24 17:52:58.856002 I | etcdserver: added member 4c5854cca2fdc9c [https://node03.etcd.dev.com:2380] to cluster baf099e94f57d45e from store
2016-05-24 17:52:58.856018 I | etcdserver: added member 5f97c3330e24a56d [https://node02.etcd.dev.com:2380] to cluster baf099e94f57d45e from store
2016-05-24 17:52:58.856027 I | etcdserver: added member f95caeae097ab105 [https://node01.etcd.dev.com:2380] to cluster baf099e94f57d45e from store
2016-05-24 17:52:58.856037 I | etcdserver: set the cluster version to 2.3 from store
2016-05-24 17:52:58.874243 I | etcdserver: starting server... [version: 2.3.5, cluster version: 2.3]
2016-05-24 17:52:58.959339 I | rafthttp: the connection with 5f97c3330e24a56d became active
2016-05-24 17:52:58.969477 I | rafthttp: the connection with f95caeae097ab105 became active
2016-05-24 17:52:59.001823 I | raft: raft.node: 4c5854cca2fdc9c elected leader f95caeae097ab105 at term 244
2016-05-24 17:52:59.045741 I | etcdserver: published {Name:node03 ClientURLs:[https://node03.etcd.dev.com:2379]} to cluster baf099e94f57d45e

The member list should come from one of the two, not from both. I wonder what the behavior would be if the SRV lookup fails.

Similar to #3753

xiang90 self-assigned this May 27, 2016
xiang90 added this to the v3.1.0 milestone May 29, 2016
xiang90 commented May 29, 2016

@raoofm You are right. We need to fix this. However, making etcd server discovery better requires major surgery, and that is not going to happen in our 3.0 timeline.

raoofm commented May 30, 2016

@xiang90 😄 makes sense

raoofm commented May 31, 2016

@xiang90 Will this be released as v2.3.7 or as v3.1.0?

xiang90 commented May 31, 2016

@raoofm 3.1.0 or 3.0.x

raoofm commented May 31, 2016

@xiang90 Do you plan to backport it?

Just checking, so I can decide whether to wait for it or go ahead with the 2.3.6 prod upgrade.

xiang90 commented May 31, 2016

@raoofm It depends. If it goes into 3.0.x, we will definitely backport it. If it goes into 3.1, we might (TBD).
