Periodic jobs "non-tracked" after server restart #2829

carlpett · 2017-07-13T18:05:32Z

Nomad version

Nomad v0.5.6

Operating system and Environment details

Centos 7, 3 server nodes

Issue

About two days ago, one of the server nodes in our cluster panicked and exited, and was then restarted. Since then, some of our periodic jobs have stopped working. The server logs contain many lines like these:

[ERR] nomad.periodic: force run of periodic job "consul-snapshot" failed: can't force run non-tracked job consul-snapshot
[ERR] nomad: failed to establish leadership: force run of periodic job "consul-snapshot" failed: can't force run non-tracked job consul-snapshot

As well as these:

[ERR] nomad.periodic: failed to dispatch job "logstash-curator": timed out enqueuing operation
[ERR] nomad.client: alloc update failed: timed out enqueuing operation

(There is roughly one of these for every 100 of the dispatch failures above.)

What are my options here? Should I just remove the jobs and reschedule them? They had been working for at least several months, up until two days ago.

Server logs

These are the last log lines from the node that crashed. I'm not sure whether they are related:

2017/07/11 14:42:12.495355 [ERR] nomad: failed to establish leadership: force run of periodic job "consul-snapshot" failed: can't force run non-tracked job consul-snapshot
2017/07/11 14:42:46.146890 [INFO] fingerprint.consul: consul agent is unavailable
2017/07/11 14:42:46 [WARN] raft: Failed to contact quorum of nodes, stepping down
2017/07/11 14:42:46 [INFO] raft: Node at 192.168.123.154:4647 [Follower] entering Follower state (Leader: "")
2017/07/11 14:42:46.210437 [ERR] nomad.client: Register failed: node is not the leader
2017/07/11 14:42:46.210478 [ERR] client: registration failure: node is not the leader
2017/07/11 14:42:46.210421 [INFO] nomad: cluster leadership lost
2017/07/11 14:42:46 [INFO] raft: aborting pipeline replication to peer {Voter 192.168.123.118:4647 192.168.123.118:4647}
2017/07/11 14:42:46 [INFO] raft: aborting pipeline replication to peer {Voter 192.168.123.116:4647 192.168.123.116:4647}
2017/07/11 14:42:46.213334 [ERR] worker: failed to dequeue evaluation: eval broker disabled
panic: close of closed channel
goroutine 176795521 [running]:
github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).run(0xc42039d3e0)
/opt/gopath/src/github.com/hashicorp/nomad/nomad/periodic.go:325 +0x221
created by github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).Start
/opt/gopath/src/github.com/hashicorp/nomad/nomad/periodic.go:171 +0x71
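The panic itself is Go's runtime error for closing a channel that is already closed. This snippet does not claim to reproduce what `periodic.go:325` does. It demonstrates the panic and shows a common guard, a `sync.Once` around the close. The `closeTwice` and `stopper` names are hypothetical illustrations.

```go
package main

import (
	"fmt"
	"sync"
)

// closeTwice closes a channel twice and returns the recovered panic text.
func closeTwice() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r) // the runtime error's message
		}
	}()
	ch := make(chan struct{})
	close(ch)
	close(ch) // panics: close of closed channel
	return "no panic"
}

// stopper is a hypothetical helper whose stop method may be called any
// number of times (e.g. once per leadership loss) but closes ch only once.
type stopper struct {
	once sync.Once
	ch   chan struct{}
}

func newStopper() *stopper {
	return &stopper{ch: make(chan struct{})}
}

func (s *stopper) stop() {
	s.once.Do(func() { close(s.ch) })
}

func main() {
	fmt.Println(closeTwice())
	s := newStopper()
	s.stop()
	s.stop() // second call is a no-op instead of a panic
	<-s.ch   // receiving from a closed channel returns immediately
	fmt.Println("stopped once")
}
```

A component that is disabled on leadership loss and later re-enabled would need a fresh `stopper` (new channel, new `sync.Once`) on each enable. Otherwise it would reuse a channel that is already closed.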

The consul-snapshot job is both periodic and parameterized. I'm guessing that one of its parameterized instances has been failing for longer than that, since we do not seem to have any snapshots from that Consul cluster.
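The "can't force run non-tracked job" message suggests that the leader's periodic dispatcher keeps a set of tracked jobs and refuses to force-run anything outside that set. Below is a minimal Go model under that assumption. `tracker`, `restore`, and `restoreSkippingUntracked` are hypothetical names, and this is not Nomad's actual code. It shows how a restore step could reproduce exactly the logged error. Such a step declines to track a periodic, parameterized parent but still force-runs it. The model also shows why treating that error as fatal would fail leadership establishment as a whole.

```go
package main

import "fmt"

// job is a hypothetical, stripped-down stand-in for a Nomad job.
type job struct {
	ID            string
	Periodic      bool
	Parameterized bool
}

// tracker models the set of jobs a periodic dispatcher would launch.
type tracker struct {
	tracked map[string]*job
}

func newTracker() *tracker {
	return &tracker{tracked: make(map[string]*job)}
}

// Add tracks only periodic jobs that are not parameterized. This sketch
// assumes a parameterized parent is a template that is never launched
// directly, so it is deliberately left untracked.
func (t *tracker) Add(j *job) {
	if !j.Periodic || j.Parameterized {
		return
	}
	t.tracked[j.ID] = j
}

// ForceRun launches a tracked job immediately and rejects anything else.
func (t *tracker) ForceRun(id string) error {
	if _, ok := t.tracked[id]; !ok {
		return fmt.Errorf("can't force run non-tracked job %s", id)
	}
	return nil
}

// restore re-adds each stored periodic job and force-runs it, as if every
// launch had been missed while there was no leader. The first error aborts
// the whole restore.
func restore(t *tracker, jobs []*job) error {
	for _, j := range jobs {
		if !j.Periodic {
			continue
		}
		t.Add(j)
		if err := t.ForceRun(j.ID); err != nil {
			return fmt.Errorf("force run of periodic job %q failed: %w", j.ID, err)
		}
	}
	return nil
}

// restoreSkippingUntracked only force-runs jobs that Add actually tracked.
func restoreSkippingUntracked(t *tracker, jobs []*job) error {
	for _, j := range jobs {
		if !j.Periodic {
			continue
		}
		t.Add(j)
		if _, ok := t.tracked[j.ID]; !ok {
			continue // e.g. a parameterized parent
		}
		if err := t.ForceRun(j.ID); err != nil {
			return fmt.Errorf("force run of periodic job %q failed: %w", j.ID, err)
		}
	}
	return nil
}

func main() {
	jobs := []*job{
		{ID: "logstash-curator", Periodic: true},
		{ID: "consul-snapshot", Periodic: true, Parameterized: true},
	}
	fmt.Println(restore(newTracker(), jobs))
	fmt.Println(restoreSkippingUntracked(newTracker(), jobs))
}
```

The first call prints the same text as the logged error. The guarded variant returns `<nil>` because it skips the untracked parent. Whether the real fix takes this shape is not confirmed here.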


github-actions · 2022-12-10T02:15:55Z

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

schmichael added type/bug theme/client stage/needs-investigation labels Jul 31, 2017

dadgar mentioned this issue Aug 3, 2017

Fix restoration of parameterized, periodic jobs #2959

Merged

dadgar removed the theme/client label Aug 3, 2017

dadgar closed this as completed in #2959 Aug 7, 2017

github-actions bot locked as resolved and limited conversation to collaborators Dec 10, 2022
