Nomad 0.6.0 panic: no future #3044

Closed
discobean opened this issue Aug 17, 2017 · 2 comments · Fixed by #3051


@discobean

discobean commented Aug 17, 2017

I have 3 Nomad servers, and 2 of them panicked with the same error.

The cluster lost quorum. After restarting the 2 dead instances, things came back to normal.

Nomad version

0.6.0 (Linux), on a fresh cluster that had been running for 2 days.

Issue

Aug 16 04:00:39 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:39 [INFO] raft: Initial configuration (index=0): []
Aug 16 04:00:39 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:39 [INFO] serf: EventMemberJoin: ip-10-123-13-59.global 10.123.13.59
Aug 16 04:00:39 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:39.297246 [INFO] nomad: starting 1 scheduling worker(s) for [service batch system _core]
Aug 16 04:00:39 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:39 [INFO] raft: Node at 10.123.13.59:4647 [Follower] entering Follower state (Leader: "")
Aug 16 04:00:39 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:39.302296 [INFO] nomad: adding server ip-10-123-13-59.global (Addr: 10.123.13.59:4647) (DC: ap-southeast-2)
Aug 16 04:00:39 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:39.327648 [ERR] consul: error looking up Nomad servers: server.nomad: unable to query Consul datacenters: Unexpected response code: 500 (No known Consul servers)
Aug 16 04:00:40 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:40 [WARN] raft: no known peers, aborting election
Aug 16 04:00:48 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:48 [INFO] serf: EventMemberJoin: ip-10-123-11-76.global 10.123.11.76
Aug 16 04:00:48 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:48 [INFO] serf: EventMemberJoin: ip-10-123-14-242.global 10.123.14.242
Aug 16 04:00:48 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:48.337052 [INFO] nomad: adding server ip-10-123-11-76.global (Addr: 10.123.11.76:4647) (DC: ap-southeast-2)
Aug 16 04:00:48 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:48.348335 [INFO] nomad: Found expected number of peers (10.123.13.59:4647,10.123.11.76:4647,10.123.14.242:4647), attempting to bootstrap cluster...
Aug 16 04:00:48 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:48.354164 [INFO] nomad: adding server ip-10-123-14-242.global (Addr: 10.123.14.242:4647) (DC: ap-southeast-2)
Aug 16 04:00:48 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:48 [WARN] raft: Heartbeat timeout from "" reached, starting election
Aug 16 04:00:48 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:48 [INFO] raft: Node at 10.123.13.59:4647 [Candidate] entering Candidate state in term 2
Aug 16 04:00:48 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:48 [INFO] raft: Election won. Tally: 2
Aug 16 04:00:48 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:48 [INFO] raft: Node at 10.123.13.59:4647 [Leader] entering Leader state
Aug 16 04:00:48 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:48 [INFO] raft: Added peer 10.123.11.76:4647, starting replication
Aug 16 04:00:48 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:48 [INFO] raft: Added peer 10.123.14.242:4647, starting replication
Aug 16 04:00:48 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:48.916382 [INFO] nomad: cluster leadership acquired
Aug 16 04:00:48 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:48 [INFO] raft: pipelining replication to peer {Voter 10.123.11.76:4647 10.123.11.76:4647}
Aug 16 04:00:48 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:48 [WARN] raft: AppendEntries to {Voter 10.123.14.242:4647 10.123.14.242:4647} rejected, sending older logs (next: 1)
Aug 16 04:00:48 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:48 [INFO] raft: pipelining replication to peer {Voter 10.123.14.242:4647 10.123.14.242:4647}
Aug 16 04:00:51 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:00:51.336997 [INFO] server.nomad: successfully contacted 2 Nomad Servers
Aug 16 04:01:55 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:01:55.220404 [ERR] http: Request /v1/job/domain-listing-indexer-release-3, error: job not found
Aug 16 04:01:55 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:01:55.335967 [ERR] http: Request /v1/job/fe-dora-master-6, error: job not found
Aug 16 04:01:55 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:01:55.380096 [ERR] http: Request /v1/job/fe-server-listing-aggregate-api-master-32, error: job not found
Aug 16 04:39:45 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:39:45.228948 [ERR] nomad.rpc: multiplex conn accept failed: read tcp 10.123.13.59:4647->10.123.13.247:38618: read: connection reset by peer
Aug 16 04:39:58 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:39:58.633236 [WARN] nomad.heartbeat: node '7a02dcf8-004d-f107-ec0a-152ee6af4ae7' TTL expired
Aug 16 04:40:05 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:40:05.032088 [WARN] nomad.heartbeat: node 'de60ecd0-6519-34c6-0a2d-0a7562b94dc2' TTL expired
Aug 16 04:43:08 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:43:08.645632 [WARN] nomad.heartbeat: node '9b674d67-c0bc-bc13-bcb2-7ac3ce963e58' TTL expired
Aug 16 04:43:16 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:43:16.190246 [WARN] nomad.heartbeat: node '56618808-7fb3-b18d-12c9-3a8b28f8a1d1' TTL expired
Aug 16 04:44:38 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:44:38.245521 [WARN] nomad.heartbeat: node 'ab16140c-8f22-462e-932a-29e40b998d28' TTL expired
Aug 16 04:44:58 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 04:44:58.611083 [WARN] nomad.heartbeat: node 'a09d491c-b08e-49b3-0c90-82c38e7ba1dd' TTL expired
Aug 16 05:04:13 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 05:04:13.701494 [WARN] nomad.heartbeat: node 'd8afbb80-95b1-0e32-56a9-e3443364de12' TTL expired
Aug 16 05:10:08 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 05:10:08.729648 [WARN] nomad.heartbeat: node '0e4e6441-ea6d-1c44-e786-8533e13fff12' TTL expired
Aug 16 05:12:30 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 05:12:30.439047 [ERR] http: Request /v1/job/fe-server-listing-aggregate-api-master-42, error: job not found
Aug 16 05:24:17 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 05:24:17.090007 [WARN] nomad.heartbeat: node 'af73b024-c4ed-8f2a-df9c-b66da4371fde' TTL expired
Aug 16 05:38:25 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 05:38:25.293549 [WARN] nomad.heartbeat: node '58ca6080-0589-bf0f-d561-e462e39cd592' TTL expired
Aug 16 06:28:02 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 06:28:02 [WARN] raft: Failed to contact 10.123.11.76:4647 in 500.12918ms
Aug 16 06:28:03 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 06:28:03 [WARN] raft: Failed to contact 10.123.11.76:4647 in 500.156758ms
Aug 16 06:29:09 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 06:29:09 [WARN] raft: Failed to contact 10.123.11.76:4647 in 500.266741ms
Aug 16 06:29:10 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 06:29:10 [WARN] raft: Failed to contact 10.123.11.76:4647 in 939.749937ms
Aug 16 07:10:27 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 07:10:27.862386 [WARN] nomad.heartbeat: node 'f1a2bbc7-9fad-0668-bcd7-0ee8a808b20e' TTL expired
Aug 16 11:25:24 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 11:25:24.558548 [WARN] nomad.heartbeat: node '8c4410f4-f83f-625f-68c9-84a66cdef6eb' TTL expired
Aug 16 21:08:58 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 21:08:58.839080 [WARN] nomad.heartbeat: node '1ebef74f-43f0-02f3-c115-fed3f4948a37' TTL expired
Aug 16 21:10:31 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 21:10:31.135279 [WARN] nomad.heartbeat: node '689dfe9a-028c-c553-9c27-7905e1088c30' TTL expired
Aug 16 22:55:06 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 22:55:06.618910 [WARN] nomad.heartbeat: node '0e6f6366-eea8-21c4-a0fc-1d74f358a91a' TTL expired
Aug 16 22:55:10 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 22:55:10.893190 [WARN] nomad.heartbeat: node '62a44c26-42d2-7c45-6a4b-17f203eb40ad' TTL expired
Aug 16 22:56:07 ip-10-123-13-59 nomad.sh[1251]:     2017/08/16 22:56:07.410547 [WARN] nomad.heartbeat: node 'ea469dd8-330b-ba36-b452-bb4db6fda41d' TTL expired
Aug 17 02:44:51 ip-10-123-13-59 nomad.sh[1251]:     2017/08/17 02:44:51.296404 [WARN] nomad.heartbeat: node '745a369f-710e-e54b-3618-601b9587f449' TTL expired
Aug 17 02:51:25 ip-10-123-13-59 nomad.sh[1251]:     2017/08/17 02:51:25 [INFO] raft: Starting snapshot up to 10367
Aug 17 02:51:25 ip-10-123-13-59 nomad.sh[1251]:     2017/08/17 02:51:25 [INFO] snapshot: Creating new snapshot at /var/lib/nomad/server/raft/snapshots/2-10367-1502938285137.tmp
Aug 17 02:51:25 ip-10-123-13-59 nomad.sh[1251]:     2017/08/17 02:51:25 [INFO] raft: Compacting logs from 1 to 128
Aug 17 02:51:25 ip-10-123-13-59 nomad.sh[1251]:     2017/08/17 02:51:25 [INFO] raft: Snapshot to 10367 complete
Aug 17 02:53:54 ip-10-123-13-59 nomad.sh[1251]: panic: no future
Aug 17 02:53:54 ip-10-123-13-59 nomad.sh[1251]: goroutine 139 [running]:
Aug 17 02:53:54 ip-10-123-13-59 nomad.sh[1251]: github.com/hashicorp/nomad/nomad/deploymentwatcher.(*EvalBatcher).batcher(0xc4203e6240)
Aug 17 02:53:54 ip-10-123-13-59 nomad.sh[1251]:         /opt/gopath/src/github.com/hashicorp/nomad/nomad/deploymentwatcher/batcher.go:82 +0x588
Aug 17 02:53:54 ip-10-123-13-59 nomad.sh[1251]: created by github.com/hashicorp/nomad/nomad/deploymentwatcher.NewEvalBatcher
Aug 17 02:53:54 ip-10-123-13-59 nomad.sh[1251]:         /opt/gopath/src/github.com/hashicorp/nomad/nomad/deploymentwatcher/batcher.go:42 +0xde
Aug 17 02:53:54 ip-10-123-13-59 systemd[1]: nomad.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 17 02:53:54 ip-10-123-13-59 systemd[1]: nomad.service: Unit entered failed state.
Aug 17 02:53:54 ip-10-123-13-59 systemd[1]: nomad.service: Failed with result 'exit-code'.

Reproduction steps

N/A

@dadgar
Contributor

dadgar commented Aug 17, 2017

Thanks for the report, @discobean. This will be fixed in 0.6.1!

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

The github-actions bot locked this issue as resolved and limited conversation to collaborators on Dec 10, 2022.