
Job Evaluations are not correctly adjusting for dead worker nodes #1663

Closed
BSick7 opened this issue Aug 30, 2016 · 6 comments

Comments


BSick7 commented Aug 30, 2016

Nomad version

Clients and Servers

$ nomad -v
Nomad v0.4.1

Operating system and Environment details

Our clients have the following set:

leave_on_interrupt = true
leave_on_terminate = true

We run Nomad workers on immutable infrastructure.
We have two workgroups (blue and green), which lets us upgrade worker boxes without downtime.
Our Nomad jobs target both workgroups using node-class constraints.
Our job status may look something like this:

$ nomad status deploy
ID          = deploy
Name        = deploy
Type        = system
Priority    = 50
Datacenters = us-east-1a,us-east-1b,us-east-1d
Status      = running
Periodic    = false

Summary
Task Group    Queued  Starting  Running  Failed  Complete  Lost
deploy-blue   0       0         3        0       0         0
deploy-green  0       0         3        0       0         0

Allocations
ID        Eval ID   Node ID   Task Group    Desired  Status   Created At
0e2eee8f  c340a16a  82e0e895  deploy-green  run      running  08/29/16 13:54:58 UTC
23884cb5  fcc2b1fc  92264826  deploy-green  run      running  08/29/16 13:54:55 UTC
12f809a5  4e1fee7c  11cdc54f  deploy-green  run      running  08/29/16 13:54:52 UTC
006ab90f  93168101  ec3aa0f7  deploy-blue   run      running  08/29/16 12:22:24 UTC
75712d72  93168101  19dea94b  deploy-blue   run      running  08/29/16 12:22:24 UTC
8d55a1bf  93168101  6ff73ef5  deploy-blue   run      running  08/29/16 12:22:24 UTC

Issue

The issue arises when we upgrade our nodes.
This job is scheduled as a system job, so we expect it to run on every live worker node.

Instead, the job appears to be placed by comparing the total number of live workers against the total number of allocations.
From the job status above, all 6 allocations are on worker nodes that are now down.

$ nomad node-status
ID        DC          Name              Class  Drain  Status
0d72463c  us-east-1a  worker-green-17   green  false  ready
772737ae  us-east-1b  worker-green-69   green  false  ready
f2424785  us-east-1d  worker-green-148  green  false  ready
2f1c84e1  us-east-1b  worker-green-69   green  false  down
0683e842  us-east-1a  worker-green-19   green  false  down
f26a326a  us-east-1d  worker-green-158  green  false  down
8a2b653d  us-east-1a  worker-green-8    green  false  down
a9fc0246  us-east-1b  worker-green-86   green  false  down
b88c0861  us-east-1d  worker-green-133  green  false  down
11cdc54f  us-east-1b  worker-green-69   green  false  down
82e0e895  us-east-1d  worker-green-140  green  false  down
92264826  us-east-1a  worker-green-22   green  false  down
ec3aa0f7  us-east-1b  worker-blue-85    blue   false  down
6ff73ef5  us-east-1d  worker-blue-139   blue   false  down
19dea94b  us-east-1a  worker-blue-20    blue   false  down

Since a system job should run on every live node, we would expect 3 running allocations, one per live node.
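To cross-check the mismatch, each allocation above can be looked up against the node it was placed on (a rough sketch using the 0.4.1 CLI; the short IDs come from the output above, and the CLI accepts ID prefixes):

$ nomad alloc-status 0e2eee8f   # one of the allocations reported as "running"
$ nomad node-status 82e0e895    # the node it was placed on, which node-status reports as "down"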

Reproduction steps

  1. Start with a Nomad cluster that has a single worker node.
  2. Run a system job that places on that node.
  3. Start another worker node that satisfies the same constraint.
  4. Stop the original worker node and re-check the job status (a rough CLI sketch of these steps follows below).
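
A rough shell sketch of those steps (the config paths and file names here are assumptions for illustration, not taken from the report):

# 1. Start a cluster with a single worker node (server plus one client)
$ nomad agent -config /etc/nomad/server.hcl
$ nomad agent -config /etc/nomad/client-green-1.hcl

# 2. Register the system job so it places on that node
$ nomad run deploy.nomad

# 3. Start a second worker node that satisfies the same node.class constraint
$ nomad agent -config /etc/nomad/client-green-2.hcl

# 4. Stop the original worker node, then re-check placement
$ nomad status deploy
$ nomad node-status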

dadgar commented Aug 30, 2016

In your example there are only 3 nodes up but the status says it is running on 6?


dadgar commented Aug 30, 2016

Would it be possible to share two node configs and a job file that expose this behavior?


BSick7 commented Aug 31, 2016

There are 3 live nodes (think of these as worker-green v2). The 6 running allocations are placed on 3 worker-green v1 nodes (dead) and 3 worker-blue v1 nodes (dead).

Deploy Job Spec

NOTE: I dropped Config (docker config), Env, Services, and Resources from the definition.

{
    "Job": {
        "Region": "us-east",
        "ID": "deploy",
        "Name": "deploy",
        "Type": "system",
        "Priority": 50,
        "AllAtOnce": false,
        "Datacenters": [
            "us-east-1b",
            "us-east-1e"
        ],
        "Constraints": null,
        "TaskGroups": [
            {
                "Name": "deploy-blue",
                "Count": 1,
                "Constraints": [
                    {
                        "LTarget": "${node.class}",
                        "RTarget": "blue",
                        "Operand": "="
                    }
                ],
                "Tasks": [
                    {
                        "Name": "deploy",
                        "Driver": "docker",
                        "User": "",
                        "Config": {},
                        "Constraints": null,
                        "Services": [],
                        "Env": {},
                        "Resources": {},
                        "Meta": null,
                        "KillTimeout": 5000000000,
                        "LogConfig": {
                            "MaxFiles": 10,
                            "MaxFileSizeMB": 10
                        },
                        "Artifacts": null
                    }
                ],
                "RestartPolicy": {
                    "Interval": 60000000000,
                    "Attempts": 1,
                    "Delay": 15000000000,
                    "Mode": "delay"
                },
                "Meta": null
            },
            {
                "Name": "deploy-green",
                "Count": 1,
                "Constraints": [
                    {
                        "LTarget": "${node.class}",
                        "RTarget": "green",
                        "Operand": "="
                    }
                ],
                "Tasks": [
                    {
                        "Name": "deploy",
                        "Driver": "docker",
                        "User": "",
                        "Config": {},
                        "Constraints": null,
                        "Services": [],
                        "Env": {},
                        "Resources": {},
                        "Meta": null,
                        "KillTimeout": 5000000000,
                        "LogConfig": {
                            "MaxFiles": 10,
                            "MaxFileSizeMB": 10
                        },
                        "Artifacts": null
                    }
                ],
                "RestartPolicy": {
                    "Interval": 60000000000,
                    "Attempts": 1,
                    "Delay": 15000000000,
                    "Mode": "delay"
                },
                "Meta": null
            }
        ],
        "Update": {
            "Stagger": 0,
            "MaxParallel": 0
        },
        "Periodic": null,
        "Meta": null,
        "Status": "running",
        "StatusDescription": "",
        "CreateIndex": 313764,
        "ModifyIndex": 313766,
        "JobModifyIndex": 313764
    }
}
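
For completeness, a spec in this JSON form can be (re)registered directly through the HTTP API (a sketch, assuming the document above is saved as deploy.json and NOMAD_ADDR points at a server):

$ curl -X PUT -d @deploy.json ${NOMAD_ADDR}/v1/job/deploy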

Sample Green Worker Config

data_dir = "/var/lib/nomad"
leave_on_interrupt = true
leave_on_terminate = true
disable_update_check = true
datacenter = "<scrubbed>"
region = "<scrubbed>"
bind_addr = "<scrubbed>"
client {
  node_class = "green"
}
consul {
  address = "<scrubbed>"
}

Sample Blue Worker Config

data_dir = "/var/lib/nomad"
leave_on_interrupt = true
leave_on_terminate = true
disable_update_check = true
datacenter = "<scrubbed>"
region = "<scrubbed>"
bind_addr = "<scrubbed>"
client {
  node_class = "blue"
}
consul {
  address = "<scrubbed>"
}

dadgar added this to the v0.5.0 milestone Aug 31, 2016
@steve-jansen

@dadgar

This problem appears to be limited to system jobs.

It's also worth noting that the nodes remain listed in nomad node-status after they have been terminated in AWS, even after we issue a curl -X PUT ${NOMAD_ADDR}/v1/system/gc to garbage-collect dead nodes.

It's unclear why system jobs remain "running" on a ghost node.
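
One way to inspect those ghost placements is to ask the HTTP API for the allocations still recorded against a down node (a sketch; NODE_ID stands in for the full ID of one of the down nodes from the node-status output above, and the gc call is the one mentioned above):

# Allocations the scheduler still tracks for a node that is marked down
$ curl -s ${NOMAD_ADDR}/v1/node/${NODE_ID}/allocations

# Force a garbage collection pass and re-check
$ curl -X PUT ${NOMAD_ADDR}/v1/system/gc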


dadgar commented Sep 12, 2016

@steve-jansen Thanks for the additional detail. Will get this fixed before releasing 0.5!

github-actions bot locked as resolved and limited conversation to collaborators Dec 19, 2022