Snapshot does not match after re-joining into a V3 cluster. #5898

Closed
sergeyfd opened this issue Jul 7, 2016 · 7 comments

Comments

@sergeyfd

sergeyfd commented Jul 7, 2016

When I re-add a failed node to a cluster by removing it first and then adding it back, I often see the following issue:

The member joins the cluster with no issues:

...
2016-07-07 17:58:51.762640 I | rafthttp: stopped peer dc0dc2965696a868
2016-07-07 17:58:51.762650 I | rafthttp: removed peer dc0dc2965696a868
2016-07-07 17:58:51.762657 I | etcdserver: finished removing old peers from network
2016-07-07 17:58:51.762663 I | etcdserver: adding peers from new cluster configuration into network...
2016-07-07 17:58:51.762688 I | rafthttp: starting peer 21441eef36bcf00f...
2016-07-07 17:58:51.762717 I | rafthttp: started HTTP pipelining with peer 21441eef36bcf00f
2016-07-07 17:58:51.767398 I | rafthttp: started peer 21441eef36bcf00f
2016-07-07 17:58:51.767433 I | rafthttp: added peer 21441eef36bcf00f
2016-07-07 17:58:51.767454 I | rafthttp: starting peer dc0dc2965696a868...
2016-07-07 17:58:51.767510 I | rafthttp: started HTTP pipelining with peer dc0dc2965696a868
2016-07-07 17:58:51.772597 I | rafthttp: started peer dc0dc2965696a868
2016-07-07 17:58:51.772639 I | rafthttp: added peer dc0dc2965696a868
2016-07-07 17:58:51.772651 I | etcdserver: finished adding peers from new cluster configuration into network...
2016-07-07 17:58:51.772664 I | etcdserver: finished applying incoming snapshot at index 0
2016-07-07 17:58:51.783640 I | rafthttp: started streaming with peer 21441eef36bcf00f (writer)
2016-07-07 17:58:51.783683 I | rafthttp: started streaming with peer 21441eef36bcf00f (writer)
2016-07-07 17:58:51.783705 I | rafthttp: started streaming with peer 21441eef36bcf00f (stream MsgApp v2 reader)
2016-07-07 17:58:51.784035 I | rafthttp: started streaming with peer 21441eef36bcf00f (stream Message reader)
2016-07-07 17:58:51.784112 I | rafthttp: started streaming with peer dc0dc2965696a868 (writer)
2016-07-07 17:58:51.784136 I | rafthttp: started streaming with peer dc0dc2965696a868 (writer)
2016-07-07 17:58:51.784152 I | rafthttp: started streaming with peer dc0dc2965696a868 (stream MsgApp v2 reader)
2016-07-07 17:58:51.784459 I | rafthttp: started streaming with peer dc0dc2965696a868 (stream Message reader)
2016-07-07 17:58:51.963328 I | rafthttp: peer dc0dc2965696a868 became active
2016-07-07 17:58:51.963362 I | rafthttp: established a TCP streaming connection with peer dc0dc2965696a868 (stream MsgApp v2 reader)
2016-07-07 17:58:52.097748 I | rafthttp: established a TCP streaming connection with peer dc0dc2965696a868 (stream MsgApp v2 writer)
2016-07-07 17:58:52.099848 I | rafthttp: established a TCP streaming connection with peer dc0dc2965696a868 (stream Message reader)
2016-07-07 17:58:52.102643 I | rafthttp: established a TCP streaming connection with peer dc0dc2965696a868 (stream Message writer)
2016-07-07 17:58:52.103427 I | rafthttp: peer 21441eef36bcf00f became active
2016-07-07 17:58:52.103444 I | rafthttp: established a TCP streaming connection with peer 21441eef36bcf00f (stream MsgApp v2 writer)
2016-07-07 17:58:52.145793 I | rafthttp: established a TCP streaming connection with peer 21441eef36bcf00f (stream MsgApp v2 reader)
2016-07-07 17:58:52.151191 I | rafthttp: established a TCP streaming connection with peer 21441eef36bcf00f (stream Message reader)
2016-07-07 17:58:52.168383 I | rafthttp: established a TCP streaming connection with peer 21441eef36bcf00f (stream Message writer)
2016-07-07 17:58:52.175224 I | etcdserver: published {Name:.. ClientURLs:[...]} to cluster fb9e691a81dea324
2016-07-07 17:58:52.175241 I | etcdmain: ready to serve client requests
2016-07-07 17:58:52.175596 I | etcdmain: serving client requests on ...
2016-07-07 17:58:52.175817 E | etcdmain: failed to notify systemd for readiness: No socket
2016-07-07 17:58:52.175829 E | etcdmain: forgot to set Type=notify in systemd service file?
2016-07-07 17:58:52.236021 I | api: enabled capabilities for version 3.0.0

But if I then stop that member and try to start it again, I get the following error:

2016-07-07 18:01:17.132749 I | etcdserver: heartbeat = 100ms
2016-07-07 18:01:17.132760 I | etcdserver: election = 1000ms
2016-07-07 18:01:17.132770 I | etcdserver: snapshot count = 10000
2016-07-07 18:01:17.132789 I | etcdserver: advertise client URLs = ...
2016-07-07 18:01:17.135415 I | etcdserver: restarting member 9f305e80ae209f5 in cluster fb9e691a81dea324 at commit index 30653
2016-07-07 18:01:17.135647 I | raft: 9f305e80ae209f5 became follower at term 6
2016-07-07 18:01:17.135677 I | raft: newRaft 9f305e80ae209f5 [peers: [9f305e80ae209f5,21441eef36bcf00f,dc0dc2965696a868], term: 6, commit: 30653, applied: 30363, lastindex: 30653, lastterm: 6]
2016-07-07 18:01:17.135843 I | membership: added member 21441eef36bcf00f [...] to cluster fb9e691a81dea324 from store
2016-07-07 18:01:17.135859 I | membership: added member 9f305e80ae209f5 [...] to cluster fb9e691a81dea324 from store
2016-07-07 18:01:17.135869 I | membership: added member dc0dc2965696a868 [...] to cluster fb9e691a81dea324 from store
2016-07-07 18:01:17.135880 I | membership: set the cluster version to 3.0 from store
2016-07-07 18:01:17.137083 I | etcdmain: stopping listening for client requests on...
2016-07-07 18:01:17.137112 I | etcdmain: stopping listening for peers on: ...
2016-07-07 18:01:17.137125 C | etcdmain: database file (/var/etcd/member/snap/db index 30353) does not match with snapshot (index 30363).
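
For anyone triaging a similar startup failure, the two indexes in the fatal line can be pulled out and compared with a short script. This is only a sketch and not part of etcd: the function name `parse_mismatch` and the regular expression are assumptions based solely on the message format shown above.

```python
import re

# Pattern inferred from the fatal log line in this report; other etcd
# versions may word the message differently.
MISMATCH_RE = re.compile(
    r"database file \((?P<path>\S+) index (?P<db>\d+)\) "
    r"does not match with snapshot \(index (?P<snap>\d+)\)"
)


def parse_mismatch(line):
    """Return (db_path, db_index, snapshot_index), or None if no match."""
    m = MISMATCH_RE.search(line)
    if m is None:
        return None
    return m.group("path"), int(m.group("db")), int(m.group("snap"))


line = (
    "2016-07-07 18:01:17.137125 C | etcdmain: database file "
    "(/var/etcd/member/snap/db index 30353) does not match with "
    "snapshot (index 30363)."
)
path, db_index, snap_index = parse_mismatch(line)
# Show how far the database file lags behind the snapshot index.
print(path, db_index, snap_index, snap_index - db_index)
```

For the line above, the database file index is 10 behind the snapshot index, and the snapshot index equals the `applied: 30363` value reported by `newRaft`.
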
@xiang90
Contributor

xiang90 commented Jul 7, 2016

@sergeyfd Which version of etcd are you running?

@sergeyfd
Author

sergeyfd commented Jul 7, 2016

3.0.0

@xiang90
Contributor

xiang90 commented Jul 7, 2016

Oh. There is a bug (#5862) in that release. We just fixed it. We will do another release tomorrow to include the fix. In the meantime, can you try the master branch to see if it works?

@xiang90
Contributor

xiang90 commented Jul 10, 2016

Released 3.0.2. This issue should be fixed.

@xiang90 xiang90 closed this as completed Jul 10, 2016
@lishu2006ll

aa
2016-07-15 01:54:05.338566 I | etcdmain: ready to serve client requests
2016-07-15 01:54:05.338828 N | etcdmain: serving insecure client requests on 192.168.56.133:2379, this is strongly discouraged!
2016-07-15 01:54:05.338969 E | etcdmain: failed to notify systemd for readiness: No socket
2016-07-15 01:54:05.338977 E | etcdmain: forgot to set Type=notify in systemd service file?
2016-07-15 01:54:34.341608 W | rafthttp: the clock difference against peer bdea1e0343578f7a is too high [1m22.150199592s > 1s]
2016-07-15 01:55:04.341961 W | rafthttp: the clock difference against peer bdea1e0343578f7a is too high [1m22.191514647s > 1s]
2016-07-15 01:55:34.342699 W | rafthttp: the clock difference against peer bdea1e0343578f7a is too high [1m22.227260344s > 1s]
2016-07-15 01:56:04.343417 W | rafthttp: the clock difference against peer bdea1e0343578f7a is too high [1m22.26853987s > 1s]
2016-07-15 01:56:34.343730 W | rafthttp: the clock difference against peer bdea1e0343578f7a is too high [1m22.304243031s > 1s]
2016-07-15 01:57:04.344476 W | rafthttp: the clock difference against peer bdea1e0343578f7a is too high [1m22.345121299s > 1s]
2016-07-15 01:57:34.345220 W | rafthttp: the clock difference against peer bdea1e0343578f7a is too high [1m22.381443609s > 1s]
2016-07-15 01:58:04.345574 W | rafthttp: the clock difference against peer bdea1e0343578f7a is too high [1m22.422995422s > 1s]
2016-07-15 01:58:34.345892 W | rafthttp: the clock difference against peer bdea1e0343578f7a is too high [1m22.458314658s > 1s]
2016-07-15 01:59:04.346521 W | rafthttp: the clock difference against peer bdea1e0343578f7a is too high [1m22.49941648s > 1s]
2016-07-15 01:59:34.346782 W | rafthttp: the clock difference against peer bdea1e0343578f7a is too high [1m22.535548761s > 1s]
2016-07-15 02:00:04.347013 W | rafthttp: the clock difference against peer bdea1e0343578f7a is too high [1m22.576600927s > 1s]
2016-07-15 02:00:34.347253 W | rafthttp: the clock difference against peer bdea1e0343578f7a is too high [1m22.61267504s > 1s]
2016-07-15 02:01:04.347968 W | rafthttp: the clock difference against peer bdea1e0343578f7a is too high [1m22.65374916s > 1s]

The problem still exists in the released 3.0.2.

@sergeyfd
Author

It seems like a different issue. Do you run NTP on your nodes? It looks like time is out of sync on them.
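
To confirm how large the skew is, the `rafthttp` warnings can be parsed into seconds. A rough sketch, assuming the warning format stays exactly as in the log above; `duration_seconds` and `clock_skew` are hypothetical helper names, not etcd APIs.

```python
import re

# Matches the rafthttp warning format shown in the log above.
WARN_RE = re.compile(
    r"clock difference against peer (?P<peer>[0-9a-f]+) is too high "
    r"\[(?P<diff>\S+) > (?P<limit>\S+)\]"
)
# Go-style duration components, e.g. "1m22.150199592s".
PART_RE = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")
UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0,
}


def duration_seconds(text):
    """Convert a Go-style duration string to seconds."""
    parts = PART_RE.findall(text)
    # Reject strings that the component pattern does not fully cover.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError("unrecognized duration: %r" % text)
    return sum(float(n) * UNIT_SECONDS[u] for n, u in parts)


def clock_skew(line):
    """Return (peer_id, skew_seconds, limit_seconds), or None if no match."""
    m = WARN_RE.search(line)
    if m is None:
        return None
    return (
        m.group("peer"),
        duration_seconds(m.group("diff")),
        duration_seconds(m.group("limit")),
    )


warning = (
    "2016-07-15 01:54:34.341608 W | rafthttp: the clock difference against "
    "peer bdea1e0343578f7a is too high [1m22.150199592s > 1s]"
)
peer, skew, limit = clock_skew(warning)
print(peer, round(skew, 3), limit)
```

The warnings above report a skew of roughly 82 seconds against a 1 second limit.
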

@yangunang

@xiang90 Is this issue fixed?
