
When one of the servers is rebooted, Nomad can decide that some client nodes' TTLs have expired and reallocate their allocations #3035

Closed
tantra35 opened this issue Aug 16, 2017 · 1 comment · Fixed by #3890

Comments

@tantra35
Contributor

Nomad version

Nomad v0.6.0

Issue

If the machine hosting a Nomad server is rebooted, Nomad can wrongly decide that client nodes' TTLs have expired.
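
To make the mechanism concrete, here is a toy Go model of server-side heartbeat tracking. All names are hypothetical; this is not Nomad's actual internals. It only illustrates that each client node must heartbeat again before its server-side TTL timer fires, and that a firing timer is indistinguishable from a dead node even when the client is healthy and the server simply stopped processing heartbeats (e.g. during a reboot or leader election):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// heartbeatTracker is a toy model of the server side of node heartbeats:
// every client must call resetTTL again before its timer fires, otherwise
// onExpire runs and the node is treated as down.
type heartbeatTracker struct {
	mu     sync.Mutex
	timers map[string]*time.Timer // node ID -> pending TTL timer
}

func newHeartbeatTracker() *heartbeatTracker {
	return &heartbeatTracker{timers: make(map[string]*time.Timer)}
}

// resetTTL records a heartbeat from nodeID, pushing its expiry out by ttl.
func (h *heartbeatTracker) resetTTL(nodeID string, ttl time.Duration, onExpire func(string)) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if t, ok := h.timers[nodeID]; ok {
		t.Reset(ttl)
		return
	}
	h.timers[nodeID] = time.AfterFunc(ttl, func() { onExpire(nodeID) })
}

func main() {
	tr := newHeartbeatTracker()
	onExpire := func(id string) {
		fmt.Printf("[WARN] heartbeat: node '%s' TTL expired -> allocs get rescheduled\n", id)
	}

	// The node heartbeats once; then the server stops processing
	// heartbeats (reboot / leader change). The timer fires even though
	// the client itself is perfectly healthy.
	tr.resetTTL("08fba0d2", 100*time.Millisecond, onExpire)
	time.Sleep(200 * time.Millisecond)
}
```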

For example, after rebooting one server node (it was the leader at the time), we saw the following on the current leader:

Aug 16 16:23:53 vol-cl-control-03 nomad[8553]:     2017/08/16 16:23:53.197919 [WARN] nomad.heartbeat: node '08fba0d2-1ee9-a43a-5ade-76d99fc76028' TTL expired
Aug 16 16:23:53 vol-cl-control-03 nomad[8553]:     2017/08/16 16:23:53 [INFO] memberlist: Suspect vol-cl-control-02.global has failed, no acks received
Aug 16 16:23:53 vol-cl-control-03 nomad[8553]:     2017/08/16 16:23:53 [DEBUG] raft: Failed to contact 172.16.9.89:4647 in 37.430211249s
Aug 16 16:23:53 vol-cl-control-03 nomad[8553]:     2017/08/16 16:23:53.335928 [DEBUG] worker: dequeued evaluation 299e62e8-8fde-67f5-f09b-3e149507e3b3
Aug 16 16:23:53 vol-cl-control-03 nomad[8553]:     2017/08/16 16:23:53.336096 [DEBUG] worker: dequeued evaluation 05d16754-2cc1-e0c0-bf74-0d2783b5c328
Aug 16 16:23:53 vol-cl-control-03 nomad[8553]:     2017/08/16 16:23:53.336137 [DEBUG] sched: <Eval '299e62e8-8fde-67f5-f09b-3e149507e3b3' JobID: 'haproxy'>: Total changes: (place 1) (destr
Aug 16 16:23:53 vol-cl-control-03 nomad[8553]: Desired Changes for "haproxy": (place 1) (inplace 0) (destructive 0) (stop 1) (migrate 0) (ignore 1) (canary 0)
Aug 16 16:23:53 vol-cl-control-03 nomad[8553]:     2017/08/16 16:23:53.336224 [DEBUG] sched: <Eval '05d16754-2cc1-e0c0-bf74-0d2783b5c328' JobID: 'townshipDynamoTeamServer'>: Total changes:
Aug 16 16:23:53 vol-cl-control-03 nomad[8553]: Desired Changes for "townshipDynamoTeamServer": (place 1) (inplace 0) (destructive 0) (stop 1) (migrate 0) (ignore 0) (canary 0)
Aug 16 16:23:53 vol-cl-control-03 nomad[8553]:     2017/08/16 16:23:53.416086 [DEBUG] worker: submitted plan for evaluation 299e62e8-8fde-67f5-f09b-3e149507e3b3
Aug 16 16:23:53 vol-cl-control-03 nomad[8553]:     2017/08/16 16:23:53.416139 [DEBUG] sched: <Eval '299e62e8-8fde-67f5-f09b-3e149507e3b3' JobID: 'haproxy'>: setting status to complete
Aug 16 16:23:53 vol-cl-control-03 nomad[8553]:     2017/08/16 16:23:53.499625 [DEBUG] worker: updated evaluation <Eval '299e62e8-8fde-67f5-f09b-3e149507e3b3' JobID: 'haproxy'>
Aug 16 16:23:53 vol-cl-control-03 nomad[8553]:     2017/08/16 16:23:53.499653 [DEBUG] worker: ack for evaluation 299e62e8-8fde-67f5-f09b-3e149507e3b3
Aug 16 16:23:53 vol-cl-control-03 nomad[8553]:     2017/08/16 16:23:53.499673 [DEBUG] worker: submitted plan for evaluation 05d16754-2cc1-e0c0-bf74-0d2783b5c328
Aug 16 16:23:53 vol-cl-control-03 nomad[8553]:     2017/08/16 16:23:53.499687 [DEBUG] sched: <Eval '05d16754-2cc1-e0c0-bf74-0d2783b5c328' JobID: 'townshipDynamoTeamServer'>: setting status
Aug 16 16:23:53 vol-cl-control-03 nomad[8553]:     2017/08/16 16:23:53.565375 [DEBUG] worker: updated evaluation <Eval '05d16754-2cc1-e0c0-bf74-0d2783b5c328' JobID: 'townshipDynamoTeamServ
Aug 16 16:23:53 vol-cl-control-03 nomad[8553]:     2017/08/16 16:23:53.565400 [DEBUG] worker: ack for evaluation 05d16754-2cc1-e0c0-bf74-0d2783b5c328
Aug 16 16:23:53 vol-cl-control-03 nomad[8553]:     2017/08/16 16:23:53 [DEBUG] raft: Failed to contact 172.16.9.89:4647 in 37.878762738s

As you can see, Nomad logs Aug 16 16:23:53 vol-cl-control-03 nomad[8553]: 2017/08/16 16:23:53.197919 [WARN] nomad.heartbeat: node '08fba0d2-1ee9-a43a-5ade-76d99fc76028' TTL expired, but this decision is wrong: the client node with ID 08fba0d2-1ee9-a43a-5ade-76d99fc76028 was healthy and working fine the whole time. As a result of this wrong decision, Nomad began reallocating the allocations placed on that node, which is entirely unnecessary.
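
For what it's worth, a plausible mitigation (and, I would assume, roughly the direction the eventual fix in #3890 took, though I have not verified its exact mechanism) is for a newly elected leader to reinitialize every known node's heartbeat timer with a generous failover TTL instead of honoring deadlines inherited from the previous leader, so healthy clients get time to re-establish heartbeats before anything is marked down. Extending the toy heartbeatTracker sketch above (again, hypothetical names, not Nomad's code):

```go
// establishLeadership would run when this server wins an election. It
// discards any inherited notion of node deadlines and arms a fresh,
// longer failover TTL per node, giving clients that were heartbeating
// the old leader a grace window to re-heartbeat here before being
// marked down.
func (h *heartbeatTracker) establishLeadership(nodeIDs []string, failoverTTL time.Duration, onExpire func(string)) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for _, id := range nodeIDs {
		id := id // capture per-iteration value for the closure
		if t, ok := h.timers[id]; ok {
			t.Stop()
		}
		h.timers[id] = time.AfterFunc(failoverTTL, func() { onExpire(id) })
	}
}
```

Separately, Nomad's server stanza exposes a heartbeat_grace setting that pads the heartbeat TTL to absorb network and processing delays, though padding alone would not cover a multi-second leader outage like the one logged above.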

github-actions bot commented Dec 3, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Dec 3, 2022