Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V0.4.2: Hitting "panic: raft: Index is beyond end of log:" on a 3 node cluster within a few seconds of starting it #832

Closed
fleiner opened this issue Jun 6, 2014 · 3 comments
Labels

Comments

@fleiner
Copy link

fleiner commented Jun 6, 2014

all three etcd daemons run on the same machine, but use different ports.
The three log files I get are:

etcd.1.log:[etcd] Jun  6 12:25:16.392 INFO      | etcddcpd.sh-1 is starting a new cluster
etcd.1.log:[etcd] Jun  6 12:25:16.394 INFO      | etcd server [name etcddcpd.sh-1, listen on :18003, advertised url http://localhost:18003]
etcd.1.log:[etcd] Jun  6 12:25:16.394 INFO      | peer server [name etcddcpd.sh-1, listen on :15162, advertised url http://localhost:15162]
etcd.1.log:[etcd] Jun  6 12:25:16.394 INFO      | etcddcpd.sh-1 starting in peer mode
etcd.1.log:[etcd] Jun  6 12:25:16.394 INFO      | etcddcpd.sh-1: state changed from 'initialized' to 'follower'.
etcd.1.log:[etcd] Jun  6 12:25:16.394 INFO      | etcddcpd.sh-1: state changed from 'follower' to 'leader'.
etcd.1.log:[etcd] Jun  6 12:25:16.394 INFO      | etcddcpd.sh-1: leader changed from '' to 'etcddcpd.sh-1'.
etcd.1.log:[etcd] Jun  6 12:25:17.456 INFO      | etcddcpd.sh-1: peer added: 'etcddcpd.sh-2'
etcd.1.log:[etcd] Jun  6 12:25:19.844 INFO      | etcddcpd.sh-1: peer added: 'etcddcpd.sh-3'
etcd.1.log:[etcd] Jun  6 12:25:20.410 INFO      | etcddcpd.sh-1: state changed from 'leader' to 'follower'.
etcd.1.log:[etcd] Jun  6 12:25:20.410 INFO      | etcddcpd.sh-1: term #1935 started.
etcd.1.log:[etcd] Jun  6 12:25:20.410 INFO      | etcddcpd.sh-1: leader changed from 'etcddcpd.sh-1' to ''.
etcd.1.log:[etcd] Jun  6 12:25:22.092 INFO      | etcddcpd.sh-1: state changed from 'follower' to 'candidate'.
etcd.1.log:[etcd] Jun  6 12:25:22.092 INFO      | etcddcpd.sh-1: state changed from 'candidate' to 'leader'.
etcd.1.log:[etcd] Jun  6 12:25:22.093 INFO      | etcddcpd.sh-1: leader changed from '' to 'etcddcpd.sh-1'.
etcd.1.log:panic: raft: Index is beyond end of log: 13 618063
etcd.1.log:goroutine 40 [running]:
etcd.1.log:runtime.panic(0x734040, 0xc2100dacf0)
etcd.1.log:     /usr/lib64/golang/src/pkg/runtime/panic.c:266 +0xb6
etcd.1.log:github.com/coreos/etcd/third_party/github.com/goraft/raft.(*Log).getEntriesAfter(0xc2100517e0, 0x96e4f, 0x7d0, 0x0, 0x0, ...)
etcd.1.log:     /bld/etcd/gopath/src/github.com/coreos/etcd/third_party/github.com/goraft/raft/log.go:256 +0x31f
etcd.1.log:github.com/coreos/etcd/third_party/github.com/goraft/raft.(*Peer).flush(0xc2101089a0)
etcd.1.log:     /bld/etcd/gopath/src/github.com/coreos/etcd/third_party/github.com/goraft/raft/peer.go:175 +0x171
etcd.1.log:github.com/coreos/etcd/third_party/github.com/goraft/raft.(*Peer).heartbeat(0xc2101089a0, 0xc21010ee40)
etcd.1.log:     /bld/etcd/gopath/src/github.com/coreos/etcd/third_party/github.com/goraft/raft/peer.go:163 +0x480
etcd.1.log:github.com/coreos/etcd/third_party/github.com/goraft/raft.func·005()
etcd.1.log:     /bld/etcd/gopath/src/github.com/coreos/etcd/third_party/github.com/goraft/raft/peer.go:97 +0x75
etcd.1.log:created by github.com/coreos/etcd/third_party/github.com/goraft/raft.(*Peer).startHeartbeat
etcd.1.log:     /bld/etcd/gopath/src/github.com/coreos/etcd/third_party/github.com/goraft/raft/peer.go:98 +0x12e
etcd.1.log:goroutine 1 [select]:
etcd.1.log:github.com/coreos/etcd/etcd.(*Etcd).runServer(0xc21000a680)
etcd.1.log:     /bld/etcd/gopath/src/github.com/coreos/etcd/etcd/etcd.go:318 +0x985
etcd.1.log:github.com/coreos/etcd/etcd.(*Etcd).Run(0xc21000a680)
etcd.1.log:     /bld/etcd/gopath/src/github.com/coreos/etcd/etcd/etcd.go:289 +0x323e
etcd.1.log:main.main()
etcd.1.log:     /bld/etcd/gopath/src/github.com/coreos/etcd/main.go:43 +0x2b2
etcd.1.log:goroutine 3 [syscall]:
etcd.1.log:os/signal.loop()
etcd.1.log:     /usr/lib64/golang/src/pkg/os/signal/signal_unix.go:21 +0x1e
etcd.1.log:created by os/signal.init·1
etcd.1.log:     /usr/lib64/golang/src/pkg/os/signal/signal_unix.go:27 +0x31
etcd.1.log:goroutine 5 [IO wait]:
etcd.1.log:net.runtime_pollWait(0x7fcbe530e130, 0x72, 0x0)
etcd.1.log:     /usr/lib64/golang/src/pkg/runtime/netpoll.goc:116 +0x6a
etcd.1.log:net.(*pollDesc).Wait(0xc2100ad1b0, 0x72, 0x7fcbe530cf88, 0xb)
etcd.1.log:     /usr/lib64/golang/src/pkg/net/fd_poll_runtime.go:81 +0x34
etcd.1.log:net.(*pollDesc).WaitRead(0xc2100ad1b0, 0xb, 0x7fcbe530cf88)
etcd.1.log:     /usr/lib64/golang/src/pkg/net/fd_poll_runtime.go:86 +0x30
etcd.1.log:net.(*netFD).accept(0xc2100ad150, 0x94ab08, 0x0, 0x7fcbe530cf88, 0xb)
etcd.1.log:     /usr/lib64/golang/src/pkg/net/fd_unix.go:382 +0x2c2
etcd.1.log:net.(*TCPListener).AcceptTCP(0xc210000db8, 0x18, 0xc2100ca840, 0x520123)
etcd.1.log:     /usr/lib64/golang/src/pkg/net/tcpsock_posix.go:233 +0x47
etcd.1.log:net.(*TCPListener).Accept(0xc210000db8, 0x420b49, 0x800, 0x0, 0x10)
etcd.1.log:     /usr/lib64/golang/src/pkg/net/tcpsock_posix.go:243 +0x27
etcd.1.log:net/http.(*Server).Serve(0xc2100b2550, 0x7fcbe530f4b8, 0xc210000db8, 0x0, 0x0)
etcd.1.log:     /usr/lib64/golang/src/pkg/net/http/server.go:1622 +0x91
etcd.1.log:github.com/coreos/etcd/etcd.func·001(()
etcd.1.log:     /bld/etcd/gopath/src/github.com/coreos/etcd/etcd/etcd.go:273 +0x93
etcd.1.log:created by github.com/coreos/etcd/etcd.(*Etcd).Run
etcd.1.log:     /bld/etcd/gopath/src/github.com/coreos/etcd/etcd/etcd.go:278 +0x31b0
etcd.1.log:goroutine 6 [IO wait]:
etcd.1.log:net.runtime_pollWait(0x7fcbe530e088, 0x72, 0x0)
etcd.1.log:     /usr/lib64/golang/src/pkg/runtime/netpoll.goc:116 +0x6a
etcd.1.log:net.(*pollDesc).Wait(0xc2100ad300, 0x72, 0x7fcbe530cf88, 0xb)
etcd.1.log:     /usr/lib64/golang/src/pkg/net/fd_poll_runtime.go:81 +0x34
etcd.1.log:net.(*pollDesc).WaitRead(0xc2100ad300, 0xb, 0x7fcbe530cf88)
etcd.1.log:     /usr/lib64/golang/src/pkg/net/fd_poll_runtime.go:86 +0x30
etcd.1.log:net.(*netFD).accept(0xc2100ad2a0, 0x94ab08, 0x0, 0x7fcbe530cf88, 0xb)
etcd.1.log:     /usr/lib64/golang/src/pkg/net/fd_unix.go:382 +0x2c2
etcd.1.log:net.(*TCPListener).AcceptTCP(0xc210000de0, 0x52034b, 0x7fcbe5185e88, 0x52034b)
etcd.1.log:     /usr/lib64/golang/src/pkg/net/tcpsock_posix.go:233 +0x47
etcd.1.log:net.(*TCPListener).Accept(0xc210000de0, 0x7fcbe530fa78, 0xc2100d4438, 0xc210102880, 0x0)
etcd.1.log:     /usr/lib64/golang/src/pkg/net/tcpsock_posix.go:243 +0x27
etcd.1.log:net/http.(*Server).Serve(0xc2100b25a0, 0x7fcbe530f4b8, 0xc210000de0, 0x0, 0x0)
etcd.1.log:     /usr/lib64/golang/src/pkg/net/http/server.go:1622 +0x91
etcd.1.log:github.com/coreos/etcd/etcd.func·002(()
etcd.1.log:     /bld/etcd/gopath/src/github.com/coreos/etcd/etcd/etcd.go:282 +0x93
etcd.1.log:created by github.com/coreos/etcd/etcd.(*Etcd).Run
etcd.1.log:     /bld/etcd/gopath/src/github.com/coreos/etcd/etcd/etcd.go:287 +0x3228
etcd.1.log:goroutine 7 [select]:
etcd.1.log:github.com/coreos/etcd/third_party/github.com/goraft/raft.(*server).leaderLoop(0xc210052120)
etcd.1.log:     /bld/etcd/gopath/src/github.com/coreos/etcd/third_party/github.com/goraft/raft/server.go:835 +0x5dc
etcd.1.log:github.com/coreos/etcd/third_party/github.com/goraft/raft.(*server).loop(0xc210052120)
etcd.1.log:     /bld/etcd/gopath/src/github.com/coreos/etcd/third_party/github.com/goraft/raft/server.go:610 +0x3a0
etcd.1.log:github.com/coreos/etcd/third_party/github.com/goraft/raft.func·007()
etcd.1.log:     /bld/etcd/gopath/src/github.com/coreos/etcd/third_party/github.com/goraft/raft/server.go:471 +0x5c
etcd.1.log:created by github.com/coreos/etcd/third_party/github.com/goraft/raft.(*server).Start
etcd.1.log:     /bld/etcd/gopath/src/github.com/coreos/etcd/third_party/github.com/goraft/raft/server.go:472 +0x3c0
etcd.1.log:goroutine 17 [IO wait]:
etcd.1.log:net.runtime_pollWait(0x7fcbe530dfe0, 0x72, 0x0)
etcd.1.log:     /usr/lib64/golang/src/pkg/runtime/netpoll.goc:116 +0x6a
etcd.1.log:net.(*pollDesc).Wait(0xc210051920, 0x72, 0x7fcbe530cf88, 0xb)
etcd.1.log:     /usr/lib64/golang/src/pkg/net/fd_poll_runtime.go:81 +0x34
etcd.1.log:net.(*pollDesc).WaitRead(0xc210051920, 0xb, 0x7fcbe530cf88)
etcd.1.log:     /usr/lib64/golang/src/pkg/net/fd_poll_runtime.go:86 +0x30
etcd.1.log:net.(*netFD).Read(0xc2100518c0, 0xc2100cd000, 0x1000, 0x1000, 0x0, ...)
etcd.1.log:     /usr/lib64/golang/src/pkg/net/fd_unix.go:204 +0x2a0
etcd.1.log:net.(*conn).Read(0xc2100d4098, 0xc2100cd000, 0x1000, 0x1000, 0x0, ...)
etcd.1.log:     /usr/lib64/golang/src/pkg/net/net.go:122 +0xc5
etcd.1.log:net/http.(*liveSwitchReader).Read(0xc21000a528, 0xc2100cd000, 0x1000, 0x1000, 0x7339e0, ...)
etcd.1.log:     /usr/lib64/golang/src/pkg/net/http/server.go:204 +0xa5
etcd.2.log:[etcd] Jun  6 12:25:17.456 INFO      | Send Join Request to http://localhost:15162/join
etcd.2.log:[etcd] Jun  6 12:25:17.515 INFO      | etcddcpd.sh-2 joined the cluster via peer localhost:15162
etcd.2.log:[etcd] Jun  6 12:25:17.516 INFO      | etcd server [name etcddcpd.sh-2, listen on :15367, advertised url http://localhost:15367]
etcd.2.log:[etcd] Jun  6 12:25:17.516 INFO      | peer server [name etcddcpd.sh-2, listen on :16980, advertised url http://localhost:16980]
etcd.2.log:[etcd] Jun  6 12:25:17.517 INFO      | etcddcpd.sh-2 starting in peer mode
etcd.2.log:[etcd] Jun  6 12:25:17.517 INFO      | etcddcpd.sh-2: state changed from 'initialized' to 'follower'.
etcd.2.log:[etcd] Jun  6 12:25:17.809 INFO      | etcddcpd.sh-2: peer added: 'etcddcpd.sh-1'
etcd.2.log:[etcd] Jun  6 12:25:20.173 INFO      | etcddcpd.sh-2: peer added: 'etcddcpd.sh-3'
etcd.2.log:[etcd] Jun  6 12:25:21.551 INFO      | etcddcpd.sh-2: state changed from 'follower' to 'candidate'.
etcd.2.log:[etcd] Jun  6 12:25:21.551 INFO      | etcddcpd.sh-2: leader changed from 'etcddcpd.sh-1' to ''.
etcd.2.log:[etcd] Jun  6 12:25:21.552 INFO      | etcddcpd.sh-2: state changed from 'candidate' to 'follower'.
etcd.2.log:[etcd] Jun  6 12:25:21.552 INFO      | etcddcpd.sh-2: term #1935 started.
etcd.2.log:[etcd] Jun  6 12:25:22.092 INFO      | etcddcpd.sh-2: term #1936 started.
etcd.3.log:[etcd] Jun  6 12:25:20.460 INFO      | Send Join Request to http://localhost:15162/join
etcd.3.log:[etcd] Jun  6 12:25:20.461 INFO      | etcddcpd.sh-3 attempted to join via localhost:15162 failed: fail on join request: Raft Internal Error ()
etcd.3.log:[etcd] Jun  6 12:25:20.462 INFO      | Send Join Request to http://localhost:16980/join
etcd.3.log:[etcd] Jun  6 12:25:20.463 INFO      | etcddcpd.sh-3 attempted to join via localhost:16980 failed: fail on join request: Raft Internal Error ()
etcd.3.log:[etcd] Jun  6 12:25:20.464 INFO      | etcddcpd.sh-3 is unable to join the cluster using any of the peers [localhost:15162 localhost:16980] at 0th time. Retrying in 2.6 seconds
etcd.3.log:[etcd] Jun  6 12:25:22.465 INFO      | Send Join Request to http://localhost:15162/join
@jonboulle
Copy link
Contributor

This could be another manifestation of #829?

@yichengq
Copy link
Contributor

yichengq commented Jun 6, 2014

@fleiner
I think the case is as this:
etcddcpd.sh-1 starts with no '-peers', no '-discovery' and no log data, so it starts as a one-node new cluster.
etcddcpd.sh-2 starts with old log data in data-dir, and recover the old cluster info. But etcddcpd.sh-1 joins this cluster and the logs of these two cluster collides.

If that fits your case well, there's some way to make it run again:

  1. rm data-dir of etcddcpd.sh-1
  2. start etcddcpd.sh-2 first
  3. start etcddcpd.sh-1 with -peers=$etcddcpd.sh-2.peerAddr

We are sorry for the inconvenience that caused by this. We have plans to introduce cluster-id to avoid the collision, and it should be finished sooner or later.

@jonboulle We think this is caused by another reason. But they all end up with raft panic because raft cannot handle any of them. In further plan, this case should return a readable error.

@kelseyhightower
Copy link
Contributor

Fixes for issue like these have been pushed to master, and also noted in #829. Closing this out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

4 participants