
System jobs may not be allocated on newly initialized nodes #5579

Closed
notnoop opened this issue Apr 18, 2019 · 1 comment · Fixed by #5585 or #5900



notnoop commented Apr 18, 2019

Nomad version

Nomad v0.9.0 (18dd59056ee1d7b2df51256fe900a98460d3d6b9)

Operating system and Environment details

Linux, amd64 on Ubuntu 16.04

Issue

Nodes that spin up after a system job registration may never run the system job.

Looks like there is a race between node placement and the node-update hooks: the scheduler evaluates the placement of the job immediately on node registration, and because the node isn't fully initialized yet, the scheduler fails to place the allocation.

The generic schedulers mitigate this scenario with blocked evaluations that get re-evaluated on node updates. The system scheduler doesn't create blocked evals, though, as reported in #4072, so the job never runs on that node.
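
To make that contrast concrete, here is a minimal Go sketch, with hypothetical types and names rather than Nomad's actual scheduler code, of the two behaviors: the generic scheduler parks the failed placement as a blocked eval that a later node update re-triggers, while the system scheduler completes the eval without one.

```go
package main

import "fmt"

// Eval is a stand-in for Nomad's evaluation struct (hypothetical shape).
type Eval struct {
	ID     string
	Status string // "complete" or "blocked"
}

// genericSchedule: on a placement failure, create a blocked eval so that a
// later node update (e.g. drivers finishing fingerprinting) re-evaluates it.
func genericSchedule(placementFailed bool, blockedEvals chan<- *Eval) *Eval {
	e := &Eval{ID: "eval-generic", Status: "complete"}
	if placementFailed {
		e.Status = "blocked"
		blockedEvals <- e // re-dequeued on the next node update
	}
	return e
}

// systemSchedule: the failure lands in FailedTGAllocs but no blocked eval is
// created (the behavior reported in #4072), so nothing re-triggers placement
// once the node's drivers become healthy.
func systemSchedule(placementFailed bool) *Eval {
	_ = placementFailed // recorded as a failed alloc, but otherwise dropped
	return &Eval{ID: "eval-system", Status: "complete"}
}

func main() {
	blockedEvals := make(chan *Eval, 1)
	fmt.Println(genericSchedule(true, blockedEvals).Status) // blocked
	fmt.Println(systemSchedule(true).Status)                // complete: alloc never placed
}
```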

Initially, I thought it was due to the driver manager being async, but on closer look that might not be the issue. I added a 30-second sleep in exec fingerprinting, and comparing the client and server logs, the server only dequeues evals after the node has detected all drivers, rather than immediately when the node comes up, so the race is probably occurring somewhere else.
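
For reference, a rough Go sketch of that delay experiment; the names are illustrative, not Nomad's real driver plugin interface:

```go
package main

import (
	"fmt"
	"time"
)

// Fingerprint is a stand-in for a driver health report (hypothetical shape).
type Fingerprint struct {
	Driver string
	Health string // "healthy" or "undetected"
}

// fingerprintExec simulates the exec driver's fingerprint with the injected
// 30-second sleep used in the experiment.
func fingerprintExec(out chan<- Fingerprint) {
	time.Sleep(30 * time.Second) // artificial delay before first health report
	out <- Fingerprint{Driver: "exec", Health: "healthy"}
}

func main() {
	ch := make(chan Fingerprint, 1)
	go fingerprintExec(ch)

	// The node registers here, before the driver reports healthy; the
	// question is whether the server schedules against it in this window.
	fmt.Println("node registered; exec driver still undetected")

	fp := <-ch
	fmt.Printf("driver %s now %s\n", fp.Driver, fp.Health)
}
```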

[1] #4072

Reproduction steps

  1. Spin up a Nomad server with some clients.
  2. Submit the job file below; note that it runs on all existing clients.
  3. Spin up a new client.
  4. Run nomad node status and nomad job status example.

Expected a new alloc on all nodes, but the new node doesn't have one:

vagrant@nomad-server-01:~$ nomad node status
ID        DC   Name             Class   Drain  Eligibility  Status
451d1b74  dc1  nomad-client-04  <none>  false  eligible     ready
f02fc6f9  dc1  nomad-client-03  <none>  false  eligible     ready
e2ec56fd  dc1  nomad-client-02  <none>  false  eligible     ready
13720d7f  dc1  nomad-client-02  <none>  false  eligible     down
03ce9e96  dc1  nomad-client-01  <none>  false  eligible     ready
vagrant@nomad-server-01:~$ nomad job status example
ID            = example
Name          = example
Submit Date   = 2019-04-18T14:15:21Z
Type          = system
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         3        0       0         1

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
75601224  f02fc6f9  cache       0        run      running  37m30s ago  37m27s ago
10f01367  e2ec56fd  cache       0        run      running  47m52s ago  47m51s ago
65a89311  13720d7f  cache       0        stop     lost     50m19s ago  49m7s ago
87c23673  03ce9e96  cache       0        run      running  53m5s ago   53m3s ago

Note that node 451d1b74 doesn't have a corresponding alloc. Below you can find the failed evaluation for the job, triggered by the node update.

Job file (if appropriate)

job "example" {
  datacenters = ["dc1"]
  type = "system"

  group "cache" {
    task "redis" {
      driver = "exec"

      config {
        command = "/bin/sleep"
        args = ["50000"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

Failed to Place Evaluation

I found the following evaluation for node `451d1b74` and the job, showing the placement failure:

{
  "AnnotatePlan": false,
  "BlockedEval": "",
  "ClassEligibility": null,
  "CreateIndex": 76,
  "DeploymentID": "",
  "EscapedComputedClass": false,
  "FailedTGAllocs": {
    "cache": {
      "AllocationTime": 6493,
      "ClassExhausted": null,
      "ClassFiltered": null,
      "CoalescedFailures": 0,
      "ConstraintFiltered": {
        "missing drivers": 1
      },
      "DimensionExhausted": null,
      "NodesAvailable": {
        "dc1": 4
      },
      "NodesEvaluated": 1,
      "NodesExhausted": 0,
      "NodesFiltered": 1,
      "QuotaExhausted": null,
      "ScoreMetaData": null,
      "Scores": null
    }
  },
  "ID": "a6bfbdfc-94c4-ac26-7b90-7f3c5d5711c9",
  "JobID": "example",
  "JobModifyIndex": 0,
  "LeaderACL": "",
  "ModifyIndex": 77,
  "Namespace": "default",
  "NextEval": "",
  "NodeID": "451d1b74-107e-ba71-7386-9cdc3265094d",
  "NodeModifyIndex": 75,
  "PreviousEval": "",
  "Priority": 50,
  "QueuedAllocations": {
    "cache": 0
  },
  "QuotaLimitReached": "",
  "SnapshotIndex": 76,
  "Status": "complete",
  "StatusDescription": "",
  "TriggeredBy": "node-update",
  "Type": "system",
  "Wait": 0,
  "WaitUntil": "0001-01-01T00:00:00Z"
}


Nomad Client logs (if appropriate)

Apr 18 14:37:09 nomad-client-04 nomad[13539]: 2019-04-18T14:37:05.267Z [INFO ] agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
Apr 18 14:37:09 nomad-client-04 nomad[13539]: 2019-04-18T14:37:05.267Z [INFO ] agent: detected plugin: name=exec type=driver plugin_version=0.1.0
Apr 18 14:37:09 nomad-client-04 nomad[13539]: 2019-04-18T14:37:09.293Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=raw_exec health=undetected description=disabled
Apr 18 14:37:09 nomad-client-04 nomad[13539]: 2019-04-18T14:37:09.294Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=rkt health=undetected description="Failed to execute rkt version: exec: "rkt": executable file not found in $PATH"
Apr 18 14:37:14 nomad-client-04 nomad[13539]: 2019-04-18T14:37:14.293Z [DEBUG] client.driver_mgr: detected drivers: drivers="[mock_driver exec]"
Apr 18 14:37:39 nomad-client-04 nomad[13539]: 2019-04-18T14:37:39.294Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=exec health=healthy description=Healthy


Nomad Server logs (if appropriate)

Apr 18 14:37:39 nomad-server-01 nomad[13543]: 2019-04-18T14:37:39.663Z [DEBUG] worker: dequeued evaluation: eval_id=a6bfbdfc-94c4-ac26-7b90-7f3c5d5711c9
Apr 18 14:37:39 nomad-server-01 nomad[13543]: 2019-04-18T14:37:39.664Z [DEBUG] worker.system_sched: reconciled current state with desired state: eval_id=a6bfbdfc-94c4-ac26-7b90-7f3c5d5711c9 job_id=example namespace=default place=1 update=0 migrate=0 stop=0 ignore=3 lost=0
Apr 18 14:37:39 nomad-server-01 nomad[13543]: 2019-04-18T14:37:39.664Z [DEBUG] worker.system_sched: setting eval status: eval_id=a6bfbdfc-94c4-ac26-7b90-7f3c5d5711c9 job_id=example namespace=default status=complete
Apr 18 14:37:39 nomad-server-01 nomad[13543]: 2019-04-18T14:37:39.667Z [DEBUG] worker: updated evaluation: eval="<Eval "a6bfbdfc-94c4-ac26-7b90-7f3c5d5711c9" JobID: "example" Namespace: "default">"
Apr 18 14:37:39 nomad-server-01 nomad[13543]: 2019-04-18T14:37:39.667Z [DEBUG] worker: ack evaluation: eval_id=a6bfbdfc-94c4-ac26-7b90-7f3c5d5711c9

notnoop pushed a commit that referenced this issue Apr 19, 2019
Here we retain the 0.8.7 behavior of waiting for driver fingerprints before registering a node, with some timeout. This is needed for system jobs, as system job scheduling for a node occurs at node registration, and the race might mean that a system job doesn't get placed on the node because of missing drivers.

The timeout isn't strictly necessary, but I'm raising it to 1 minute as that is closer to blocking indefinitely than 1 second is. We need to keep the value high enough to capture as many drivers/devices as possible, but low enough that it doesn't risk blocking too long on a misbehaving plugin.

Fixes #5579
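
A minimal Go sketch of the wait-with-timeout pattern the commit message describes (assumed names, not the actual Nomad client code):

```go
package main

import (
	"fmt"
	"time"
)

// waitForFingerprints blocks until driver fingerprinting signals completion,
// or until the timeout elapses, whichever comes first.
func waitForFingerprints(done <-chan struct{}, timeout time.Duration) bool {
	select {
	case <-done:
		return true // all drivers fingerprinted; register with full info
	case <-time.After(timeout):
		return false // don't block forever on a misbehaving plugin
	}
}

func main() {
	done := make(chan struct{})
	go func() {
		time.Sleep(100 * time.Millisecond) // simulated fingerprinting work
		close(done)
	}()

	// The commit raises the timeout to 1 minute, closer to "wait indefinitely"
	// than the old 1-second value, while still bounding a stuck plugin.
	if waitForFingerprints(done, time.Minute) {
		fmt.Println("registering node with complete driver fingerprints")
	} else {
		fmt.Println("timed out; registering node with partial fingerprints")
	}
}
```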
preetapan pushed a commit that referenced this issue Apr 22, 2019 (same commit message as above)
github-actions bot commented

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Nov 24, 2022