Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcd crash when start member added into cluster #8608

Closed
zyf0330 opened this issue Sep 26, 2017 · 12 comments
Closed

etcd crash when start member added into cluster #8608

zyf0330 opened this issue Sep 26, 2017 · 12 comments
Assignees
Labels

Comments

@zyf0330
Copy link

zyf0330 commented Sep 26, 2017

When I startup a etcd member which is added into a cluster at the first time startup, I got these logs. And this problem always appears and when I startup etcd at second times, it works normally.

2017-09-26 10:36:44.551441 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_CLUSTER=node01=https://node01.example.com:2380,node02=https://node02.example.com:2380,node03=https://node03.example.com:2380
2017-09-26 10:36:44.551600 I | etcdmain: etcd Version: 3.2.0+git
2017-09-26 10:36:44.551613 I | etcdmain: Git SHA: 4afb99f
2017-09-26 10:36:44.551621 I | etcdmain: Go Version: go1.9
2017-09-26 10:36:44.551631 I | etcdmain: Go OS/Arch: linux/amd64
2017-09-26 10:36:44.551639 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2017-09-26 10:36:44.551766 I | embed: listening for peers on http://127.0.0.1:2380
2017-09-26 10:36:44.551808 I | embed: listening for client requests on 127.0.0.1:2379
2017-09-26 10:36:44.866216 I | pkg/netutil: resolving node01.example.com:2380 to 101.251.220.234:2380
2017-09-26 10:36:44.866633 I | pkg/netutil: resolving node01.example.com:2380 to 101.251.220.234:2380
2017-09-26 10:36:44.912582 I | pkg/netutil: resolving node02.example.com:2380 to 121.41.10.207:2380
2017-09-26 10:36:44.913277 I | pkg/netutil: resolving node02.example.com:2380 to 121.41.10.207:2380
2017-09-26 10:36:44.958004 I | pkg/netutil: resolving node03.example.com:2380 to 182.61.35.130:2380
2017-09-26 10:36:44.958736 I | pkg/netutil: resolving node03.example.com:2380 to 182.61.35.130:2380
2017-09-26 10:36:45.120634 I | etcdserver: name = node03
2017-09-26 10:36:45.120680 I | etcdserver: data dir = /var/lib/etcd
2017-09-26 10:36:45.120693 I | etcdserver: member dir = /var/lib/etcd/member
2017-09-26 10:36:45.120700 I | etcdserver: heartbeat = 100ms
2017-09-26 10:36:45.120705 I | etcdserver: election = 1000ms
2017-09-26 10:36:45.120711 I | etcdserver: snapshot count = 100000
2017-09-26 10:36:45.120734 I | etcdserver: advertise client URLs = https://node03.example.com:2379
2017-09-26 10:36:45.135568 I | etcdserver: starting member 437d42737c50ded2 in cluster 328523468af88e35
2017-09-26 10:36:45.135652 I | raft: 437d42737c50ded2 became follower at term 0
2017-09-26 10:36:45.135685 I | raft: newRaft 437d42737c50ded2 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2017-09-26 10:36:45.135697 I | raft: 437d42737c50ded2 became follower at term 1
2017-09-26 10:36:45.138961 W | auth: simple token is not cryptographically signed
2017-09-26 10:36:45.140140 I | rafthttp: started HTTP pipelining with peer 951e2b33215b168
2017-09-26 10:36:45.140164 I | rafthttp: started HTTP pipelining with peer 420a46dd477e236d
2017-09-26 10:36:45.140181 I | rafthttp: starting peer 951e2b33215b168...
2017-09-26 10:36:45.140197 I | rafthttp: started HTTP pipelining with peer 951e2b33215b168
2017-09-26 10:36:45.144066 I | rafthttp: started streaming with peer 951e2b33215b168 (writer)
2017-09-26 10:36:45.144570 I | rafthttp: started streaming with peer 951e2b33215b168 (writer)
2017-09-26 10:36:45.146104 I | rafthttp: started peer 951e2b33215b168
2017-09-26 10:36:45.146154 I | rafthttp: added peer 951e2b33215b168
2017-09-26 10:36:45.146175 I | rafthttp: starting peer 420a46dd477e236d...
2017-09-26 10:36:45.146197 I | rafthttp: started HTTP pipelining with peer 420a46dd477e236d
2017-09-26 10:36:45.146218 I | rafthttp: started streaming with peer 951e2b33215b168 (stream MsgApp v2 reader)
2017-09-26 10:36:45.147195 I | rafthttp: started streaming with peer 951e2b33215b168 (stream Message reader)
2017-09-26 10:36:45.148146 I | rafthttp: started streaming with peer 420a46dd477e236d (writer)
2017-09-26 10:36:45.150374 I | rafthttp: started streaming with peer 420a46dd477e236d (writer)
2017-09-26 10:36:45.152302 I | rafthttp: started peer 420a46dd477e236d
2017-09-26 10:36:45.152337 I | rafthttp: added peer 420a46dd477e236d
2017-09-26 10:36:45.152345 I | rafthttp: started streaming with peer 420a46dd477e236d (stream MsgApp v2 reader)
2017-09-26 10:36:45.152389 I | rafthttp: started streaming with peer 420a46dd477e236d (stream Message reader)
2017-09-26 10:36:45.152453 I | etcdserver: starting server... [version: 3.2.0+git, cluster version: to_be_decided]
2017-09-26 10:36:45.153466 I | rafthttp: peer 420a46dd477e236d became active
2017-09-26 10:36:45.153503 I | rafthttp: established a TCP streaming connection with peer 420a46dd477e236d (stream Message writer)
2017-09-26 10:36:45.153728 I | rafthttp: established a TCP streaming connection with peer 420a46dd477e236d (stream MsgApp v2 writer)
2017-09-26 10:36:45.275454 I | rafthttp: established a TCP streaming connection with peer 420a46dd477e236d (stream Message reader)
2017-09-26 10:36:45.286420 I | rafthttp: established a TCP streaming connection with peer 420a46dd477e236d (stream MsgApp v2 reader)
2017-09-26 10:36:45.309592 I | rafthttp: peer 951e2b33215b168 became active
2017-09-26 10:36:45.309646 I | rafthttp: established a TCP streaming connection with peer 951e2b33215b168 (stream MsgApp v2 reader)
2017-09-26 10:36:45.316705 I | rafthttp: established a TCP streaming connection with peer 951e2b33215b168 (stream Message reader)
2017-09-26 10:36:45.351640 I | rafthttp: established a TCP streaming connection with peer 951e2b33215b168 (stream MsgApp v2 writer)
2017-09-26 10:36:45.353178 I | raft: 437d42737c50ded2 [term: 1] received a MsgHeartbeat message with higher term from 951e2b33215b168 [term: 21]
2017-09-26 10:36:45.353212 I | raft: 437d42737c50ded2 became follower at term 21
2017-09-26 10:36:45.353244 I | raft: raft.node: 437d42737c50ded2 elected leader 951e2b33215b168 at term 21
2017-09-26 10:36:45.365595 I | rafthttp: established a TCP streaming connection with peer 951e2b33215b168 (stream Message writer)
2017-09-26 10:36:45.436034 D | raft: 437d42737c50ded2 [logterm: 0, index: 21515] rejected msgApp [logterm: 21, index: 21515] from 951e2b33215b168
2017-09-26 10:36:45.458693 D | raft: 437d42737c50ded2 [logterm: 0, index: 21515] rejected msgApp [logterm: 21, index: 21515] from 951e2b33215b168
2017-09-26 10:36:45.458756 D | raft: 437d42737c50ded2 [logterm: 0, index: 21515] rejected msgApp [logterm: 21, index: 21515] from 951e2b33215b168
2017-09-26 10:36:45.458840 D | raft: 437d42737c50ded2 [logterm: 0, index: 21515] rejected msgApp [logterm: 21, index: 21515] from 951e2b33215b168
2017-09-26 10:36:45.458859 D | raft: 437d42737c50ded2 [logterm: 0, index: 21515] rejected msgApp [logterm: 21, index: 21515] from 951e2b33215b168
2017-09-26 10:36:45.531710 I | rafthttp: receiving database snapshot [index:21522, from 951e2b33215b168] ...
2017-09-26 10:36:52.152882 E | etcdserver: publish error: etcdserver: request timed out, possibly due to connection lost
2017-09-26 10:36:59.153213 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:37:06.153555 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:37:13.153771 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:37:20.153994 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:37:27.154334 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:37:34.154639 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:37:41.154889 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:37:48.155243 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:37:55.155470 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:38:02.155760 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:38:09.155982 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:38:16.156326 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:38:23.156667 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:38:30.156952 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:38:37.157208 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:38:44.157547 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:38:51.157834 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:38:58.158161 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:39:05.158413 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:39:12.158703 E | etcdserver: publish error: etcdserver: request timed out
2017-09-26 10:39:14.721256 I | snap: saved database snapshot to disk [total bytes: 87572480]
2017-09-26 10:39:14.721341 I | rafthttp: received and saved database snapshot [index: 21522, from: 951e2b33215b168] successfully
2017-09-26 10:39:14.721471 I | raft: 437d42737c50ded2 [commit: 0, lastindex: 0, lastterm: 0] starts to restore snapshot [index: 21522, term: 21]
2017-09-26 10:39:14.721701 I | raft: log [committed=0, applied=0, unstable.offset=1, len(unstable.Entries)=0] starts to restore snapshot [index: 21522, term: 21]
2017-09-26 10:39:14.721767 I | raft: 437d42737c50ded2 restored progress of 951e2b33215b168 [next = 21523, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
2017-09-26 10:39:14.721789 I | raft: 437d42737c50ded2 restored progress of 420a46dd477e236d [next = 21523, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
2017-09-26 10:39:14.721819 I | raft: 437d42737c50ded2 restored progress of 437d42737c50ded2 [next = 21523, match = 21522, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
2017-09-26 10:39:14.721833 I | raft: 437d42737c50ded2 [commit: 21522] restored snapshot [index: 21522, term: 21]
2017-09-26 10:39:14.722140 I | etcdserver: applying snapshot at index 0...
2017-09-26 10:39:14.723510 I | etcdserver: raft applied incoming snapshot at index 21522
2017-09-26 10:39:14.839194 I | etcdserver: recovering lessor...
2017-09-26 10:39:14.842021 I | etcdserver: finished recovering lessor
2017-09-26 10:39:14.842058 I | etcdserver: restoring mvcc store...
2017-09-26 10:39:15.131638 I | etcdserver: finished restoring mvcc store
2017-09-26 10:39:15.131718 I | etcdserver: recovering alarms...
2017-09-26 10:39:15.131737 I | etcdserver: closing old backend...
2017-09-26 10:39:15.132560 I | etcdserver: finished closing old backend
2017-09-26 10:39:15.132735 I | etcdserver: finished recovering alarms
2017-09-26 10:39:15.132760 I | etcdserver: recovering auth store...
2017-09-26 10:39:15.132779 I | etcdserver: finished recovering auth store
2017-09-26 10:39:15.132785 I | etcdserver: recovering store v2...
2017-09-26 10:39:15.133895 I | etcdserver: finished recovering store v2
2017-09-26 10:39:15.133932 I | etcdserver: recovering cluster configuration...
2017-09-26 10:39:15.134049 I | etcdserver/api: enabled capabilities for version 3.2
2017-09-26 10:39:15.134082 I | etcdserver/membership: added member 437d42737c50ded2 [https://node03.example.com:2380] to cluster 328523468af88e35 from store
2017-09-26 10:39:15.134096 I | etcdserver/membership: added member 951e2b33215b168 [https://node01.example.com:2380] to cluster 328523468af88e35 from store
2017-09-26 10:39:15.134103 I | etcdserver/membership: added member 420a46dd477e236d [https://node02.example.com:2380] to cluster 328523468af88e35 from store
2017-09-26 10:39:15.134111 I | etcdserver/membership: set the cluster version to 3.2 from store
2017-09-26 10:39:15.134116 I | etcdserver: finished recovering cluster configuration
2017-09-26 10:39:15.134126 I | etcdserver: removing old peers from network...
2017-09-26 10:39:15.134139 I | rafthttp: stopping peer 951e2b33215b168...
2017-09-26 10:39:15.134342 I | rafthttp: closed the TCP streaming connection with peer 951e2b33215b168 (stream MsgApp v2 writer)
2017-09-26 10:39:15.134367 I | rafthttp: stopped streaming with peer 951e2b33215b168 (writer)
2017-09-26 10:39:15.134524 I | rafthttp: closed the TCP streaming connection with peer 951e2b33215b168 (stream Message writer)
2017-09-26 10:39:15.134538 I | rafthttp: stopped streaming with peer 951e2b33215b168 (writer)
2017-09-26 10:39:15.134696 I | rafthttp: stopped HTTP pipelining with peer 951e2b33215b168
2017-09-26 10:39:15.134792 W | rafthttp: lost the TCP streaming connection with peer 951e2b33215b168 (stream MsgApp v2 reader)
2017-09-26 10:39:15.134828 I | rafthttp: stopped streaming with peer 951e2b33215b168 (stream MsgApp v2 reader)
2017-09-26 10:39:15.134939 W | rafthttp: lost the TCP streaming connection with peer 951e2b33215b168 (stream Message reader)
2017-09-26 10:39:15.135008 I | rafthttp: stopped streaming with peer 951e2b33215b168 (stream Message reader)
2017-09-26 10:39:15.135026 I | rafthttp: stopped peer 951e2b33215b168
2017-09-26 10:39:15.135045 I | rafthttp: removed peer 951e2b33215b168
2017-09-26 10:39:15.135053 I | rafthttp: stopping peer 420a46dd477e236d...
2017-09-26 10:39:15.135211 I | rafthttp: closed the TCP streaming connection with peer 420a46dd477e236d (stream MsgApp v2 writer)
2017-09-26 10:39:15.135228 I | rafthttp: stopped streaming with peer 420a46dd477e236d (writer)
2017-09-26 10:39:15.135372 I | rafthttp: closed the TCP streaming connection with peer 420a46dd477e236d (stream Message writer)
2017-09-26 10:39:15.135387 I | rafthttp: stopped streaming with peer 420a46dd477e236d (writer)
2017-09-26 10:39:15.135585 I | rafthttp: stopped HTTP pipelining with peer 420a46dd477e236d
2017-09-26 10:39:15.135676 W | rafthttp: lost the TCP streaming connection with peer 420a46dd477e236d (stream MsgApp v2 reader)
2017-09-26 10:39:15.135693 I | rafthttp: stopped streaming with peer 420a46dd477e236d (stream MsgApp v2 reader)
2017-09-26 10:39:15.135732 W | rafthttp: lost the TCP streaming connection with peer 420a46dd477e236d (stream Message reader)
2017-09-26 10:39:15.135755 I | rafthttp: stopped streaming with peer 420a46dd477e236d (stream Message reader)
2017-09-26 10:39:15.135763 I | rafthttp: stopped peer 420a46dd477e236d
2017-09-26 10:39:15.135781 I | rafthttp: removed peer 420a46dd477e236d
2017-09-26 10:39:15.135791 I | etcdserver: finished removing old peers from network
2017-09-26 10:39:15.135801 I | etcdserver: adding peers from new cluster configuration into network...
2017-09-26 10:39:15.135828 I | rafthttp: starting peer 951e2b33215b168...
2017-09-26 10:39:15.135950 I | rafthttp: started HTTP pipelining with peer 951e2b33215b168
2017-09-26 10:39:15.136340 I | rafthttp: started streaming with peer 951e2b33215b168 (writer)
2017-09-26 10:39:15.136510 I | rafthttp: started streaming with peer 951e2b33215b168 (writer)
2017-09-26 10:39:15.136759 I | rafthttp: started peer 951e2b33215b168
2017-09-26 10:39:15.136799 I | rafthttp: added peer 951e2b33215b168
2017-09-26 10:39:15.136823 I | rafthttp: started streaming with peer 951e2b33215b168 (stream MsgApp v2 reader)
2017-09-26 10:39:15.136958 I | rafthttp: starting peer 420a46dd477e236d...
2017-09-26 10:39:15.137033 I | rafthttp: started HTTP pipelining with peer 420a46dd477e236d
2017-09-26 10:39:15.137097 I | rafthttp: started streaming with peer 951e2b33215b168 (stream Message reader)
2017-09-26 10:39:15.137419 I | rafthttp: started streaming with peer 420a46dd477e236d (writer)
2017-09-26 10:39:15.137490 I | rafthttp: started streaming with peer 420a46dd477e236d (writer)
2017-09-26 10:39:15.137761 I | rafthttp: started peer 420a46dd477e236d
2017-09-26 10:39:15.137804 I | rafthttp: started streaming with peer 420a46dd477e236d (stream MsgApp v2 reader)
2017-09-26 10:39:15.137824 I | rafthttp: started streaming with peer 420a46dd477e236d (stream Message reader)
2017-09-26 10:39:15.137921 I | rafthttp: added peer 420a46dd477e236d
2017-09-26 10:39:15.137957 I | etcdserver: finished adding peers from new cluster configuration into network...
2017-09-26 10:39:15.137973 I | etcdserver: finished applying incoming snapshot at index 0
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x90197c]

goroutine 133 [running]:
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/auth.(*simpleTokenTTLKeeper).addSimpleToken(0x0, 0xc4239045c0, 0x16)
	/root/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/auth/simple_token.go:59 +0x3c
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/auth.(*tokenSimple).assignSimpleTokenToUser(0xc4203f64e0, 0xc421688178, 0x4, 0xc4239045c0, 0x16)
	/root/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/auth/simple_token.go:128 +0x12d
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/auth.(*tokenSimple).assign(0xc4203f64e0, 0x14db740, 0xc4238feae0, 0xc421688178, 0x4, 0xbb, 0x7ffd9f077a28, 0x0, 0x0, 0xc42002e700)
	/root/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/auth/simple_token.go:191 +0x1e7
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/auth.(*authStore).Authenticate(0xc4201e8280, 0x14db740, 0xc4238feae0, 0xc421688178, 0x4, 0xc4216881c0, 0xd, 0x0, 0x0, 0x0)
	/root/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/auth/store.go:282 +0x16d
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*applierV3backend).Authenticate(0xc420474120, 0xc4238fea50, 0x0, 0x0, 0xc422cc34d0)
	/root/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/apply.go:654 +0x199
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*quotaApplierV3).Authenticate(0xc420474140, 0xc4238fea50, 0xe55440, 0x1, 0xc4238fea80)
	<autogenerated>:1 +0x46
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*authApplierV3).Authenticate(0xc420496000, 0xc4238fea50, 0xc4202d2300, 0xc420237d80, 0x3)
	<autogenerated>:1 +0x46
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*applierV3backend).Apply(0xc420474120, 0xc42041b960, 0xc4202d2300)
	/root/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/apply.go:130 +0xc41
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*quotaApplierV3).Apply(0xc420474140, 0xc42041b960, 0xc420496030)
	<autogenerated>:1 +0x46
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*authApplierV3).Apply(0xc420496000, 0xc42041b960, 0x0)
	/root/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/apply_auth.go:58 +0xe2
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).applyEntryNormal(0xc420080000, 0xc422cc3788)
	/root/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/server.go:1375 +0x236
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).apply(0xc420080000, 0xc4217e0800, 0x1b, 0x1c, 0xc4201e8640, 0x44c2a8, 0xabde3b18c222c, 0x8399751)
	/root/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/server.go:1297 +0x37e
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).applyEntries(0xc420080000, 0xc4201e8640, 0xc4203858c0)
	/root/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/server.go:934 +0xc7
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).applyAll(0xc420080000, 0xc4201e8640, 0xc4203858c0)
	/root/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/server.go:796 +0xb3
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).run.func6(0x14db680, 0xc420052380)
	/root/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/server.go:758 +0x3c
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/pkg/schedule.(*fifo).run(0xc420182f60)
	/root/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/pkg/schedule/schedule.go:157 +0xe6
created by github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/pkg/schedule.NewFIFOScheduler
	/root/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/pkg/schedule/schedule.go:70 +0x166
@gyuho
Copy link
Contributor

gyuho commented Sep 26, 2017

How did you configure etcd cluster? e.g. etcd flags, or env vars?

@fanminshi
Copy link
Member

@gyuho looks like something went wrong when restoring auth store.

@zyf0330
Copy link
Author

zyf0330 commented Sep 27, 2017

@gyuho

--name node01 \
 --initial-cluster-token nodes \
 --data-dir /var/lib/etcd \
 --initial-cluster-state new \
 --discovery-srv example.com \
 --initial-advertise-peer-urls https://node01.example.com:2380 \
 --listen-peer-urls http://127.0.0.1:2380 \
 --advertise-client-urls https://node01.example.com:2379 \
 --listen-client-urls http://127.0.0.1:2379

@gyuho
Copy link
Contributor

gyuho commented Sep 27, 2017

When I startup a etcd member which is added into a cluster at the first time startup

So you bootstrap 3-node cluster with SRV with TLS.
And add a member.
And restarted the added member to get this panic?
We need exact steps to reproduce.

@zyf0330
Copy link
Author

zyf0330 commented Sep 28, 2017

@gyuho My bad English misleaded you. I mean that etcd member has been etcdctl member add into cluster.
The whole steps are:

  1. create 3-node cluster with SRV with TLS
  2. etcdctl remove a member
  3. etcdctl add member with the same advertise-peer-url as that member removed in step 2
  4. startup this new node first time and get that error
  5. after re-startup this new node, everything is ok

@xiang90
Copy link
Contributor

xiang90 commented Oct 4, 2017

@jpbetz can you help on this issue?

@jpbetz
Copy link
Contributor

jpbetz commented Oct 4, 2017

@xiang90 Sure, I'll have a look.

@gyuho
Copy link
Contributor

gyuho commented Oct 6, 2017

@zyf0330 I cannot reproduce this...
@jpbetz Have you tried reproducing this?

We've recently added containers to simulate local SRV-based cluster.

Below is what I tried to reproduce this, with master branch:

diff --git a/e2e/docker-dns-srv/run.sh b/e2e/docker-dns-srv/run.sh
index 7c7415f8d..710e3da3a 100755
--- a/e2e/docker-dns-srv/run.sh
+++ b/e2e/docker-dns-srv/run.sh
@@ -14,3 +14,12 @@ ETCDCTL_API=3 ./etcdctl \
   --key=/certs//server-wildcard.key.insecure \
   --discovery-srv etcd.local \
   put foo bar
+
+ETCDCTL_API=3 ./etcdctl \
+  --cacert=/certs/ca.crt \
+  --cert=/certs/server-wildcard.crt \
+  --key=/certs//server-wildcard.key.insecure \
+  --discovery-srv etcd.local \
+  member list
+
+sleep 1h

And

make docker-test-build
make docker-test-compile
make docker-dns-srv-test-build
make docker-dns-srv-test-run

And

# SSH into SRV docker container
docker exec -it 076cd3c03d40 /bin/bash

cd /etcd

ETCDCTL_API=3 ./etcdctl \
  --cacert=/certs/ca.crt \
  --cert=/certs/server-wildcard.crt \
  --key=/certs//server-wildcard.key.insecure \
  --discovery-srv etcd.local \
  member list

# remove m3
ETCDCTL_API=3 ./etcdctl \
  --cacert=/certs/ca.crt \
  --cert=/certs/server-wildcard.crt \
  --key=/certs//server-wildcard.key.insecure \
  --discovery-srv etcd.local \
  member remove 1757793ebd94f915

# remove data dir
rm -rf m3.etcd/

# add member
ETCDCTL_API=3 ./etcdctl \
  --cacert=/certs/ca.crt \
  --cert=/certs/server-wildcard.crt \
  --key=/certs//server-wildcard.key.insecure \
  --discovery-srv etcd.local \
  member add m3 --peer-urls=https://m3.etcd.local:32380

To start with '--initial-cluster-state existing'

./etcd --name m3 --listen-client-urls https://127.0.0.1:32379 --advertise-client-urls https://m3.etcd.local:32379 --listen-peer-urls https://127.0.0.1:32380 --initial-advertise-peer-urls=https://m3.etcd.local:32380 --initial-cluster-token tkn --discovery-srv=etcd.local --initial-cluster-state existing --peer-cert-file=/certs/server-wildcard.crt --peer-key-file=/certs/server-wildcard.key.insecure --peer-trusted-ca-file=/certs/ca.crt --peer-client-cert-auth --cert-file=/certs/server-wildcard.crt --key-file=/certs/server-wildcard.key.insecure --trusted-ca-file=/certs/ca.crt --client-cert-auth

And everything works as expected in my side.

@zyf0330 Did you run with etcd master?

@zyf0330
Copy link
Author

zyf0330 commented Oct 7, 2017

No, I ran with commit 4afb99f
And network status between us is not same. My three servers are in different location.
I cannot provide more help. Just say, this problem happened every time when I did this, total twice.

@jpbetz
Copy link
Contributor

jpbetz commented Oct 7, 2017

@gyuho You're one step ahead of me! I just finished attempting to reproduce it, using steps nearly identical to yours. I was also unable to reproduce. Everything worked as expected.

I'll have look at the stack trace in more detail, see if I back track what might have gone wrong from it.

@jpbetz
Copy link
Contributor

jpbetz commented Oct 13, 2017

Looks like the disabled state for TokenProvider is not being respected for assign, this will be fixed with #8695.

@xiang90
Copy link
Contributor

xiang90 commented Oct 14, 2017

i believe this is fixed by #8695

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

5 participants