
Job update always leads to allocation migration (node-reschedule-penalty) #5856

Closed
capone212 opened this issue Jun 19, 2019 · 11 comments · Fixed by #6781

Comments

@capone212
Contributor

capone212 commented Jun 19, 2019

Nomad version

Output from nomad version
Nomad v0.9.3

Operating system and Environment details

3 test Linux boxes in client+server mode

Issue

After changing a single Meta or env var for a single task in the job file, Nomad re-schedules the whole job, with several tasks, to another node. I think this behavior is new (compared to older versions) and suboptimal.

Digging with the console tool, I found the following. Originally the job, with a single task group and 2 tasks in the group, was allocated on node 4aae63e4. Then I changed a single env var value and ran nomad job plan:

nomad job plan --verbose job.hcl 

+/- Job: "test.DeviceIpint"
+/- Task Group: "default" (1 create/destroy update)
  +/- Task: "DeviceIpint.1" (forces create/destroy update)
    +/- Env[RestartTag]: "tag2" => "tag3"
      Task: "VideDecoder.2"

Then I updated the job with nomad job run job.hcl. As a result, Nomad moved the allocation to another node (9e3a3d78).

To understand why, I looked at the alloc status, and it seems it's due to node-reschedule-penalty.

nomad alloc status -verbose <alloc_id>

---skipped some info----
Placement Metrics
Node                                  node-reschedule-penalty  node-affinity  binpack  job-anti-affinity  final score
9e3a3d78-261f-2b52-ddc8-4a770196a325  0                        0              0.498    0                  0.498
3b36f877-4790-bc64-84a5-6e74d0cd4167  0                        0              0.403    0                  0.403
4aae63e4-6614-2e55-0168-c1b12cc992df  -1                       0              0.417    0                  -0.292
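
For reference, the final score column appears to be the average of a node's non-zero component scores; that's an inference from the rows above, not documented behavior. A quick check of the arithmetic for node 4aae63e4:

package main

import "fmt"

// finalScore averages the non-zero scoring components for a node. The
// formula is inferred from the placement metrics table above, not taken
// from Nomad documentation.
func finalScore(components map[string]float64) float64 {
	sum, n := 0.0, 0
	for _, s := range components {
		if s != 0 {
			sum += s
			n++
		}
	}
	if n == 0 {
		return 0
	}
	return sum / float64(n)
}

func main() {
	// Components for node 4aae63e4 from the table above.
	score := finalScore(map[string]float64{
		"binpack":                 0.417,
		"node-reschedule-penalty": -1,
	})
	fmt.Printf("%.4f\n", score) // -0.2915, shown rounded as -0.292 above
}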

I think node-reschedule-penalty should not be taken into account in this case (because the task is not failing); it is also clear that restarting the allocation on the same node is more appropriate here than migrating to another node.

I would like Nomad to try to place allocations on the same server by default. Is there any flag or trick I can use to force Nomad not to move allocations?

@Dirrk

Dirrk commented Jun 19, 2019

job "docs" {
  group "example" {
    ephemeral_disk {
      migrate = true
      size    = "500"
      sticky  = true
    }
  }
}

I attach ephemeral_disk to jobs that I want to stay on the same host. From the docs:

Specifies that Nomad should make a best-effort attempt to place the updated allocation on the same machine. This will move the local/ and alloc/data directories to the new allocation.

https://www.nomadproject.io/docs/job-specification/ephemeral_disk.html

@preetapan
Contributor

@capone212 You can add an affinity or constraint on the set of nodes you want the job to run on. I would recommend against pinning to the same node using a constraint, because then if that node goes down the job can't run anywhere else until it's back.

As for the node-reschedule-penalty - could you share the output of nomad alloc status -json for all three allocs? It should only add a penalty if a previous alloc for the same job failed on that node. The JSON output should help us debug further.

@capone212
Contributor Author

Hi @preetapan, thanks for your response!
Please find the requested info at the following link: https://gist.github.com/capone212/280bb0d9cfdd11298eae8aed75fc0700

Please let me know if you need anything else. Note that I use tasks with a custom external driver implemented using the plugin interface.

@stale

stale bot commented Sep 18, 2019

Hey there

Since this issue hasn't had any activity in a while, we're going to automatically close it in 30 days. If you're still seeing this issue with the latest version of Nomad, please respond here and we'll keep this open and take another look at this.

Thanks!

@capone212
Contributor Author

Still actual

@kdsnice

kdsnice commented Oct 29, 2019

Still actual with Nomad v0.9.5

@tgross tgross added this to Needs Triage in Nomad - Community Issues Triage via automation Oct 29, 2019
@tgross tgross self-assigned this Nov 25, 2019
@tgross
Member

tgross commented Nov 25, 2019

I've verified this behavior on 0.10.2-rc1 as well. Reproduction steps on a cluster using our e2e setup (4 client nodes):

▶ nomad job init -short
▶ nomad job run example.nomad
==> Monitoring evaluation "85ea5dee"
    Evaluation triggered by job "example"
    Evaluation within deployment: "1cc9eeb6"
    Allocation "7b881420" created: node "ff8ed4ac", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "85ea5dee" finished with status "complete"

▶ nomad job status example
...
Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created  Modified
7b881420  ff8ed4ac  cache       0        run      running  6s ago   1s ago

# Edit the job file to add an `env { version = "1" }` stanza to the task
▶ emacs example.nomad
...

▶ nomad job run example.nomad
==> Monitoring evaluation "89db0813"
    Evaluation triggered by job "example"
    Evaluation within deployment: "fc89ecc2"
    Allocation "686d1da1" created: node "1c61fbb7", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "89db0813" finished with status "complete"

# note we've landed on 2 different nodes
▶ nomad job status example
...
Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created  Modified
686d1da1  1c61fbb7  cache       1        run      running   5s ago   1s ago
7b881420  ff8ed4ac  cache       0        stop     complete  37s ago  4s ago

# previous node has node-reschedule-penalty set
▶ nomad alloc status -verbose 686d1da1
ID                  = 686d1da1-cbfb-251c-b9e2-ccb85b4c8bf3
Eval ID             = 89db0813-e218-4ca4-ec6f-69e55dd7ce86
Name                = example.cache[0]
Node ID             = 1c61fbb7-df88-1737-c03f-7221fb8331eb
Node Name           = ip-172-31-29-84
Job ID              = example
Job Version         = 1
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 2019-11-25T09:05:35-05:00
Modified            = 2019-11-25T09:05:49-05:00
Deployment ID       = fc89ecc2-a05b-4fa2-26d6-1f350b0d6ce0
Deployment Health   = healthy
Evaluated Nodes     = 2
Filtered Nodes      = 0
Exhausted Nodes     = 0
Allocation Time     = 72.165µs
Failures            = 0

Task "redis" is "running"
Task Resources
CPU        Memory           Disk     Addresses
2/500 MHz  6.3 MiB/256 MiB  300 MiB  db: 172.31.29.84:27806

Task Events:
Started At     = 2019-11-25T14:05:39Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type        Description
2019-11-25T09:05:39-05:00  Started     Task started by client
2019-11-25T09:05:36-05:00  Driver      Downloading image
2019-11-25T09:05:36-05:00  Task Setup  Building Task Directory
2019-11-25T09:05:35-05:00  Received    Task received by client

Placement Metrics
Node                                  binpack  job-anti-affinity  node-affinity  node-reschedule-penalty  final score
1c61fbb7-df88-1737-c03f-7221fb8331eb  0.2      0                  0              0                        0.2
ff8ed4ac-6b01-f8e2-490c-8c3016a4745f  0.2      0                  0              -1                       -0.4
alloc 7b881420 JSON
{
  "AllocModifyIndex": 69,
  "AllocatedResources": {
    "Shared": {
      "DiskMB": 300,
      "Networks": null
    },
    "Tasks": {
      "redis": {
        "Cpu": {
          "CpuShares": 500
        },
        "Memory": {
          "MemoryMB": 256
        },
        "Networks": [
          {
            "CIDR": "",
            "Device": "eth0",
            "DynamicPorts": [
              {
                "Label": "db",
                "To": 0,
                "Value": 21304
              }
            ],
            "IP": "172.31.25.125",
            "MBits": 10,
            "Mode": "",
            "ReservedPorts": null
          }
        ]
      }
    }
  },
  "ClientDescription": "All tasks have completed",
  "ClientStatus": "complete",
  "CreateIndex": 58,
  "CreateTime": 1574690702968629200,
  "DeploymentID": "1cc9eeb6-b553-7d09-2686-e61f6478f85c",
  "DeploymentStatus": {
    "Canary": false,
    "Healthy": true,
    "ModifyIndex": 63,
    "Timestamp": "2019-11-25T14:05:17.313989414Z"
  },
  "DesiredDescription": "alloc is being updated due to job update",
  "DesiredStatus": "stop",
  "DesiredTransition": {
    "Migrate": null,
    "Reschedule": null
  },
  "EvalID": "85ea5dee-ab03-3bd2-3678-a4ea4ebae49a",
  "FollowupEvalID": "",
  "ID": "7b881420-729f-6fec-7d76-e43dd86d0352",
  "Job": {
    "Affinities": null,
    "AllAtOnce": false,
    "Constraints": null,
    "CreateIndex": 56,
    "Datacenters": [
      "dc1"
    ],
    "Dispatched": false,
    "ID": "example",
    "JobModifyIndex": 56,
    "Meta": null,
    "Migrate": null,
    "ModifyIndex": 57,
    "Name": "example",
    "Namespace": "default",
    "ParameterizedJob": null,
    "ParentID": "",
    "Payload": null,
    "Periodic": null,
    "Priority": 50,
    "Region": "global",
    "Reschedule": null,
    "Spreads": null,
    "Stable": false,
    "Status": "pending",
    "StatusDescription": "",
    "Stop": false,
    "SubmitTime": 1574690702955314200,
    "TaskGroups": [
      {
        "Affinities": null,
        "Constraints": null,
        "Count": 1,
        "EphemeralDisk": {
          "Migrate": false,
          "SizeMB": 300,
          "Sticky": false
        },
        "Meta": null,
        "Migrate": {
          "HealthCheck": "checks",
          "HealthyDeadline": 300000000000,
          "MaxParallel": 1,
          "MinHealthyTime": 10000000000
        },
        "Name": "cache",
        "Networks": null,
        "ReschedulePolicy": {
          "Attempts": 0,
          "Delay": 30000000000,
          "DelayFunction": "exponential",
          "Interval": 0,
          "MaxDelay": 3600000000000,
          "Unlimited": true
        },
        "RestartPolicy": {
          "Attempts": 2,
          "Delay": 15000000000,
          "Interval": 1800000000000,
          "Mode": "fail"
        },
        "Services": null,
        "Spreads": null,
        "Tasks": [
          {
            "Affinities": null,
            "Artifacts": null,
            "Config": {
              "image": "redis:3.2",
              "port_map": [
                {
                  "db": 6379
                }
              ]
            },
            "Constraints": null,
            "DispatchPayload": null,
            "Driver": "docker",
            "Env": {
              "version": "0"
            },
            "KillSignal": "",
            "KillTimeout": 5000000000,
            "Kind": "",
            "Leader": false,
            "LogConfig": {
              "MaxFileSizeMB": 10,
              "MaxFiles": 10
            },
            "Meta": null,
            "Name": "redis",
            "Resources": {
              "CPU": 500,
              "Devices": null,
              "DiskMB": 0,
              "IOPS": 0,
              "MemoryMB": 256,
              "Networks": [
                {
                  "CIDR": "",
                  "Device": "",
                  "DynamicPorts": [
                    {
                      "Label": "db",
                      "To": 0,
                      "Value": 0
                    }
                  ],
                  "IP": "",
                  "MBits": 10,
                  "Mode": "",
                  "ReservedPorts": null
                }
              ]
            },
            "Services": null,
            "ShutdownDelay": 0,
            "Templates": null,
            "User": "",
            "Vault": null,
            "VolumeMounts": null
          }
        ],
        "Update": {
          "AutoPromote": false,
          "AutoRevert": false,
          "Canary": 0,
          "HealthCheck": "checks",
          "HealthyDeadline": 300000000000,
          "MaxParallel": 1,
          "MinHealthyTime": 10000000000,
          "ProgressDeadline": 600000000000,
          "Stagger": 30000000000
        },
        "Volumes": null
      }
    ],
    "Type": "service",
    "Update": {
      "AutoPromote": false,
      "AutoRevert": false,
      "Canary": 0,
      "HealthCheck": "",
      "HealthyDeadline": 0,
      "MaxParallel": 1,
      "MinHealthyTime": 0,
      "ProgressDeadline": 0,
      "Stagger": 30000000000
    },
    "VaultToken": "",
    "Version": 0
  },
  "JobID": "example",
  "Metrics": {
    "AllocationTime": 106081,
    "ClassExhausted": null,
    "ClassFiltered": null,
    "CoalescedFailures": 0,
    "ConstraintFiltered": null,
    "DimensionExhausted": null,
    "NodesAvailable": {
      "dc1": 2
    },
    "NodesEvaluated": 2,
    "NodesExhausted": 0,
    "NodesFiltered": 0,
    "QuotaExhausted": null,
    "ScoreMetaData": [
      {
        "NodeID": "1c61fbb7-df88-1737-c03f-7221fb8331eb",
        "NormScore": 0.20002653302990245,
        "Scores": {
          "job-anti-affinity": 0,
          "node-reschedule-penalty": 0,
          "node-affinity": 0,
          "binpack": 0.20002653302990245
        }
      },
      {
        "NodeID": "ff8ed4ac-6b01-f8e2-490c-8c3016a4745f",
        "NormScore": 0.20002653302990245,
        "Scores": {
          "job-anti-affinity": 0,
          "node-reschedule-penalty": 0,
          "node-affinity": 0,
          "binpack": 0.20002653302990245
        }
      }
    ],
    "Scores": null
  },
  "ModifyIndex": 72,
  "ModifyTime": 1574690735900604400,
  "Name": "example.cache[0]",
  "Namespace": "default",
  "NextAllocation": "686d1da1-cbfb-251c-b9e2-ccb85b4c8bf3",
  "NodeID": "ff8ed4ac-6b01-f8e2-490c-8c3016a4745f",
  "NodeName": "ip-172-31-25-125",
  "PreemptedAllocations": null,
  "PreemptedByAllocation": "",
  "PreviousAllocation": "",
  "RescheduleTracker": null,
  "Resources": {
    "CPU": 500,
    "Devices": null,
    "DiskMB": 300,
    "IOPS": 0,
    "MemoryMB": 256,
    "Networks": [
      {
        "CIDR": "",
        "Device": "eth0",
        "DynamicPorts": [
          {
            "Label": "db",
            "To": 0,
            "Value": 21304
          }
        ],
        "IP": "172.31.25.125",
        "MBits": 10,
        "Mode": "",
        "ReservedPorts": null
      }
    ]
  },
  "Services": null,
  "TaskGroup": "cache",
  "TaskResources": {
    "redis": {
      "CPU": 500,
      "Devices": null,
      "DiskMB": 0,
      "IOPS": 0,
      "MemoryMB": 256,
      "Networks": [
        {
          "CIDR": "",
          "Device": "eth0",
          "DynamicPorts": [
            {
              "Label": "db",
              "To": 0,
              "Value": 21304
            }
          ],
          "IP": "172.31.25.125",
          "MBits": 10,
          "Mode": "",
          "ReservedPorts": null
        }
      ]
    }
  },
  "TaskStates": {
    "redis": {
      "Events": [
        {
          "Details": {},
          "DiskLimit": 0,
          "DiskSize": 0,
          "DisplayMessage": "Task received by client",
          "DownloadError": "",
          "DriverError": "",
          "DriverMessage": "",
          "ExitCode": 0,
          "FailedSibling": "",
          "FailsTask": false,
          "GenericSource": "",
          "KillError": "",
          "KillReason": "",
          "KillTimeout": 0,
          "Message": "",
          "RestartReason": "",
          "SetupError": "",
          "Signal": 0,
          "StartDelay": 0,
          "TaskSignal": "",
          "TaskSignalReason": "",
          "Time": 1574690702985412600,
          "Type": "Received",
          "ValidationError": "",
          "VaultError": ""
        },
        {
          "Details": {
            "message": "Building Task Directory"
          },
          "DiskLimit": 0,
          "DiskSize": 0,
          "DisplayMessage": "Building Task Directory",
          "DownloadError": "",
          "DriverError": "",
          "DriverMessage": "",
          "ExitCode": 0,
          "FailedSibling": "",
          "FailsTask": false,
          "GenericSource": "",
          "KillError": "",
          "KillReason": "",
          "KillTimeout": 0,
          "Message": "Building Task Directory",
          "RestartReason": "",
          "SetupError": "",
          "Signal": 0,
          "StartDelay": 0,
          "TaskSignal": "",
          "TaskSignalReason": "",
          "Time": 1574690702991575300,
          "Type": "Task Setup",
          "ValidationError": "",
          "VaultError": ""
        },
        {
          "Details": {
            "image": "redis:3.2"
          },
          "DiskLimit": 0,
          "DiskSize": 0,
          "DisplayMessage": "Downloading image",
          "DownloadError": "",
          "DriverError": "",
          "DriverMessage": "Downloading image",
          "ExitCode": 0,
          "FailedSibling": "",
          "FailsTask": false,
          "GenericSource": "",
          "KillError": "",
          "KillReason": "",
          "KillTimeout": 0,
          "Message": "",
          "RestartReason": "",
          "SetupError": "",
          "Signal": 0,
          "StartDelay": 0,
          "TaskSignal": "",
          "TaskSignalReason": "",
          "Time": 1574690703021879300,
          "Type": "Driver",
          "ValidationError": "",
          "VaultError": ""
        },
        {
          "Details": {},
          "DiskLimit": 0,
          "DiskSize": 0,
          "DisplayMessage": "Task started by client",
          "DownloadError": "",
          "DriverError": "",
          "DriverMessage": "",
          "ExitCode": 0,
          "FailedSibling": "",
          "FailsTask": false,
          "GenericSource": "",
          "KillError": "",
          "KillReason": "",
          "KillTimeout": 0,
          "Message": "",
          "RestartReason": "",
          "SetupError": "",
          "Signal": 0,
          "StartDelay": 0,
          "TaskSignal": "",
          "TaskSignalReason": "",
          "Time": 1574690707311462700,
          "Type": "Started",
          "ValidationError": "",
          "VaultError": ""
        },
        {
          "Details": {
            "kill_timeout": "5s"
          },
          "DiskLimit": 0,
          "DiskSize": 0,
          "DisplayMessage": "Sent interrupt. Waiting 5s before force killing",
          "DownloadError": "",
          "DriverError": "",
          "DriverMessage": "",
          "ExitCode": 0,
          "FailedSibling": "",
          "FailsTask": false,
          "GenericSource": "",
          "KillError": "",
          "KillReason": "",
          "KillTimeout": 5000000000,
          "Message": "",
          "RestartReason": "",
          "SetupError": "",
          "Signal": 0,
          "StartDelay": 0,
          "TaskSignal": "",
          "TaskSignalReason": "",
          "Time": 1574690735483180800,
          "Type": "Killing",
          "ValidationError": "",
          "VaultError": ""
        },
        {
          "Details": {
            "oom_killed": "false",
            "exit_code": "0",
            "signal": "0"
          },
          "DiskLimit": 0,
          "DiskSize": 0,
          "DisplayMessage": "Exit Code: 0",
          "DownloadError": "",
          "DriverError": "",
          "DriverMessage": "",
          "ExitCode": 0,
          "FailedSibling": "",
          "FailsTask": false,
          "GenericSource": "",
          "KillError": "",
          "KillReason": "",
          "KillTimeout": 0,
          "Message": "",
          "RestartReason": "",
          "SetupError": "",
          "Signal": 0,
          "StartDelay": 0,
          "TaskSignal": "",
          "TaskSignalReason": "",
          "Time": 1574690735795506400,
          "Type": "Terminated",
          "ValidationError": "",
          "VaultError": ""
        },
        {
          "Details": {},
          "DiskLimit": 0,
          "DiskSize": 0,
          "DisplayMessage": "Task successfully killed",
          "DownloadError": "",
          "DriverError": "",
          "DriverMessage": "",
          "ExitCode": 0,
          "FailedSibling": "",
          "FailsTask": false,
          "GenericSource": "",
          "KillError": "",
          "KillReason": "",
          "KillTimeout": 0,
          "Message": "",
          "RestartReason": "",
          "SetupError": "",
          "Signal": 0,
          "StartDelay": 0,
          "TaskSignal": "",
          "TaskSignalReason": "",
          "Time": 1574690735811976700,
          "Type": "Killed",
          "ValidationError": "",
          "VaultError": ""
        }
      ],
      "Failed": false,
      "FinishedAt": "2019-11-25T14:05:35.819294074Z",
      "LastRestart": null,
      "Restarts": 0,
      "StartedAt": "2019-11-25T14:05:07.311467296Z",
      "State": "dead"
    }
  }
}
alloc 686d1da1 JSON
{
  "AllocModifyIndex": 69,
  "AllocatedResources": {
    "Shared": {
      "DiskMB": 300,
      "Networks": null
    },
    "Tasks": {
      "redis": {
        "Cpu": {
          "CpuShares": 500
        },
        "Memory": {
          "MemoryMB": 256
        },
        "Networks": [
          {
            "CIDR": "",
            "Device": "eth0",
            "DynamicPorts": [
              {
                "Label": "db",
                "To": 0,
                "Value": 27806
              }
            ],
            "IP": "172.31.29.84",
            "MBits": 10,
            "Mode": "",
            "ReservedPorts": null
          }
        ]
      }
    }
  },
  "ClientDescription": "Tasks are running",
  "ClientStatus": "running",
  "CreateIndex": 69,
  "CreateTime": 1574690735466495500,
  "DeploymentID": "fc89ecc2-a05b-4fa2-26d6-1f350b0d6ce0",
  "DeploymentStatus": {
    "Canary": false,
    "Healthy": true,
    "ModifyIndex": 76,
    "Timestamp": "2019-11-25T14:05:49.356320114Z"
  },
  "DesiredDescription": "",
  "DesiredStatus": "run",
  "DesiredTransition": {
    "Migrate": null,
    "Reschedule": null
  },
  "EvalID": "89db0813-e218-4ca4-ec6f-69e55dd7ce86",
  "FollowupEvalID": "",
  "ID": "686d1da1-cbfb-251c-b9e2-ccb85b4c8bf3",
  "Job": {
    "Affinities": null,
    "AllAtOnce": false,
    "Constraints": null,
    "CreateIndex": 56,
    "Datacenters": [
      "dc1"
    ],
    "Dispatched": false,
    "ID": "example",
    "JobModifyIndex": 67,
    "Meta": null,
    "Migrate": null,
    "ModifyIndex": 67,
    "Name": "example",
    "Namespace": "default",
    "ParameterizedJob": null,
    "ParentID": "",
    "Payload": null,
    "Periodic": null,
    "Priority": 50,
    "Region": "global",
    "Reschedule": null,
    "Spreads": null,
    "Stable": false,
    "Status": "running",
    "StatusDescription": "",
    "Stop": false,
    "SubmitTime": 1574690735115202300,
    "TaskGroups": [
      {
        "Affinities": null,
        "Constraints": null,
        "Count": 1,
        "EphemeralDisk": {
          "Migrate": false,
          "SizeMB": 300,
          "Sticky": false
        },
        "Meta": null,
        "Migrate": {
          "HealthCheck": "checks",
          "HealthyDeadline": 300000000000,
          "MaxParallel": 1,
          "MinHealthyTime": 10000000000
        },
        "Name": "cache",
        "Networks": null,
        "ReschedulePolicy": {
          "Attempts": 0,
          "Delay": 30000000000,
          "DelayFunction": "exponential",
          "Interval": 0,
          "MaxDelay": 3600000000000,
          "Unlimited": true
        },
        "RestartPolicy": {
          "Attempts": 2,
          "Delay": 15000000000,
          "Interval": 1800000000000,
          "Mode": "fail"
        },
        "Services": null,
        "Spreads": null,
        "Tasks": [
          {
            "Affinities": null,
            "Artifacts": null,
            "Config": {
              "image": "redis:3.2",
              "port_map": [
                {
                  "db": 6379
                }
              ]
            },
            "Constraints": null,
            "DispatchPayload": null,
            "Driver": "docker",
            "Env": {
              "version": "1"
            },
            "KillSignal": "",
            "KillTimeout": 5000000000,
            "Kind": "",
            "Leader": false,
            "LogConfig": {
              "MaxFileSizeMB": 10,
              "MaxFiles": 10
            },
            "Meta": null,
            "Name": "redis",
            "Resources": {
              "CPU": 500,
              "Devices": null,
              "DiskMB": 0,
              "IOPS": 0,
              "MemoryMB": 256,
              "Networks": [
                {
                  "CIDR": "",
                  "Device": "",
                  "DynamicPorts": [
                    {
                      "Label": "db",
                      "To": 0,
                      "Value": 0
                    }
                  ],
                  "IP": "",
                  "MBits": 10,
                  "Mode": "",
                  "ReservedPorts": null
                }
              ]
            },
            "Services": null,
            "ShutdownDelay": 0,
            "Templates": null,
            "User": "",
            "Vault": null,
            "VolumeMounts": null
          }
        ],
        "Update": {
          "AutoPromote": false,
          "AutoRevert": false,
          "Canary": 0,
          "HealthCheck": "checks",
          "HealthyDeadline": 300000000000,
          "MaxParallel": 1,
          "MinHealthyTime": 10000000000,
          "ProgressDeadline": 600000000000,
          "Stagger": 30000000000
        },
        "Volumes": null
      }
    ],
    "Type": "service",
    "Update": {
      "AutoPromote": false,
      "AutoRevert": false,
      "Canary": 0,
      "HealthCheck": "",
      "HealthyDeadline": 0,
      "MaxParallel": 1,
      "MinHealthyTime": 0,
      "ProgressDeadline": 0,
      "Stagger": 30000000000
    },
    "VaultToken": "",
    "Version": 1
  },
  "JobID": "example",
  "Metrics": {
    "AllocationTime": 72165,
    "ClassExhausted": null,
    "ClassFiltered": null,
    "CoalescedFailures": 0,
    "ConstraintFiltered": null,
    "DimensionExhausted": null,
    "NodesAvailable": {
      "dc1": 2
    },
    "NodesEvaluated": 2,
    "NodesExhausted": 0,
    "NodesFiltered": 0,
    "QuotaExhausted": null,
    "ScoreMetaData": [
      {
        "NodeID": "1c61fbb7-df88-1737-c03f-7221fb8331eb",
        "NormScore": 0.20002653302990245,
        "Scores": {
          "binpack": 0.20002653302990245,
          "job-anti-affinity": 0,
          "node-reschedule-penalty": 0,
          "node-affinity": 0
        }
      },
      {
        "NodeID": "ff8ed4ac-6b01-f8e2-490c-8c3016a4745f",
        "NormScore": -0.39998673348504876,
        "Scores": {
          "node-affinity": 0,
          "binpack": 0.20002653302990245,
          "job-anti-affinity": 0,
          "node-reschedule-penalty": -1
        }
      }
    ],
    "Scores": null
  },
  "ModifyIndex": 76,
  "ModifyTime": 1574690749450031000,
  "Name": "example.cache[0]",
  "Namespace": "default",
  "NextAllocation": "",
  "NodeID": "1c61fbb7-df88-1737-c03f-7221fb8331eb",
  "NodeName": "ip-172-31-29-84",
  "PreemptedAllocations": null,
  "PreemptedByAllocation": "",
  "PreviousAllocation": "7b881420-729f-6fec-7d76-e43dd86d0352",
  "RescheduleTracker": null,
  "Resources": {
    "CPU": 500,
    "Devices": null,
    "DiskMB": 300,
    "IOPS": 0,
    "MemoryMB": 256,
    "Networks": [
      {
        "CIDR": "",
        "Device": "eth0",
        "DynamicPorts": [
          {
            "Label": "db",
            "To": 0,
            "Value": 27806
          }
        ],
        "IP": "172.31.29.84",
        "MBits": 10,
        "Mode": "",
        "ReservedPorts": null
      }
    ]
  },
  "Services": null,
  "TaskGroup": "cache",
  "TaskResources": {
    "redis": {
      "CPU": 500,
      "Devices": null,
      "DiskMB": 0,
      "IOPS": 0,
      "MemoryMB": 256,
      "Networks": [
        {
          "CIDR": "",
          "Device": "eth0",
          "DynamicPorts": [
            {
              "Label": "db",
              "To": 0,
              "Value": 27806
            }
          ],
          "IP": "172.31.29.84",
          "MBits": 10,
          "Mode": "",
          "ReservedPorts": null
        }
      ]
    }
  },
  "TaskStates": {
    "redis": {
      "Events": [
        {
          "Details": {},
          "DiskLimit": 0,
          "DiskSize": 0,
          "DisplayMessage": "Task received by client",
          "DownloadError": "",
          "DriverError": "",
          "DriverMessage": "",
          "ExitCode": 0,
          "FailedSibling": "",
          "FailsTask": false,
          "GenericSource": "",
          "KillError": "",
          "KillReason": "",
          "KillTimeout": 0,
          "Message": "",
          "RestartReason": "",
          "SetupError": "",
          "Signal": 0,
          "StartDelay": 0,
          "TaskSignal": "",
          "TaskSignalReason": "",
          "Time": 1574690735475090000,
          "Type": "Received",
          "ValidationError": "",
          "VaultError": ""
        },
        {
          "Details": {
            "message": "Building Task Directory"
          },
          "DiskLimit": 0,
          "DiskSize": 0,
          "DisplayMessage": "Building Task Directory",
          "DownloadError": "",
          "DriverError": "",
          "DriverMessage": "",
          "ExitCode": 0,
          "FailedSibling": "",
          "FailsTask": false,
          "GenericSource": "",
          "KillError": "",
          "KillReason": "",
          "KillTimeout": 0,
          "Message": "Building Task Directory",
          "RestartReason": "",
          "SetupError": "",
          "Signal": 0,
          "StartDelay": 0,
          "TaskSignal": "",
          "TaskSignalReason": "",
          "Time": 1574690736017497600,
          "Type": "Task Setup",
          "ValidationError": "",
          "VaultError": ""
        },
        {
          "Details": {
            "image": "redis:3.2"
          },
          "DiskLimit": 0,
          "DiskSize": 0,
          "DisplayMessage": "Downloading image",
          "DownloadError": "",
          "DriverError": "",
          "DriverMessage": "Downloading image",
          "ExitCode": 0,
          "FailedSibling": "",
          "FailsTask": false,
          "GenericSource": "",
          "KillError": "",
          "KillReason": "",
          "KillTimeout": 0,
          "Message": "",
          "RestartReason": "",
          "SetupError": "",
          "Signal": 0,
          "StartDelay": 0,
          "TaskSignal": "",
          "TaskSignalReason": "",
          "Time": 1574690736039536600,
          "Type": "Driver",
          "ValidationError": "",
          "VaultError": ""
        },
        {
          "Details": {},
          "DiskLimit": 0,
          "DiskSize": 0,
          "DisplayMessage": "Task started by client",
          "DownloadError": "",
          "DriverError": "",
          "DriverMessage": "",
          "ExitCode": 0,
          "FailedSibling": "",
          "FailsTask": false,
          "GenericSource": "",
          "KillError": "",
          "KillReason": "",
          "KillTimeout": 0,
          "Message": "",
          "RestartReason": "",
          "SetupError": "",
          "Signal": 0,
          "StartDelay": 0,
          "TaskSignal": "",
          "TaskSignalReason": "",
          "Time": 1574690739354516700,
          "Type": "Started",
          "ValidationError": "",
          "VaultError": ""
        }
      ],
      "Failed": false,
      "FinishedAt": null,
      "LastRestart": null,
      "Restarts": 0,
      "StartedAt": "2019-11-25T14:05:39.354520859Z",
      "State": "running"
    }
  }
}

jobspec
job "example" {
  datacenters = ["dc1"]

  group "cache" {
    task "redis" {
      driver = "docker"

      env {
        version = "1"
      }

      config {
        image = "redis:3.2"

        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 500
        memory = 256

        network {
          mbits = 10
          port  "db"  {}
        }
      }
    }
  }
}

I'm digging into why this is happening, but my initial look at scheduler/generic_sched.go#L573 suggests we're unconditionally adding prevAllocation.NodeID to the list of penalized nodes. But I may be misunderstanding what prevAllocation is in this context, so I'm going to work up a quick test to walk myself through it.
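
In simplified terms, the suspect pattern looks like the sketch below. This is a paraphrase with stand-in types, not the actual Nomad source; getSelectOptions, SelectOptions, and the field names are approximations of the code under discussion.

// Simplified sketch of the placement-option logic around
// scheduler/generic_sched.go#L573. Types are minimal stand-ins for
// Nomad's internal structs, not the real definitions.

type RescheduleEvent struct {
	PrevNodeID string
}

type Allocation struct {
	NodeID           string
	ClientStatus     string // e.g. "running", "failed", "complete"
	RescheduleEvents []RescheduleEvent
}

type SelectOptions struct {
	PenaltyNodeIDs map[string]struct{}
}

func getSelectOptions(prev *Allocation) *SelectOptions {
	opts := &SelectOptions{PenaltyNodeIDs: map[string]struct{}{}}
	if prev != nil {
		// Unconditional: the previous node is penalized even when the
		// alloc there is healthy and is only being replaced because the
		// job was updated. This is what pushes updated allocs onto new
		// nodes.
		opts.PenaltyNodeIDs[prev.NodeID] = struct{}{}

		// Nodes from earlier reschedule attempts are penalized as well.
		for _, ev := range prev.RescheduleEvents {
			opts.PenaltyNodeIDs[ev.PrevNodeID] = struct{}{}
		}
	}
	return opts
}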

@tgross tgross added this to the unscheduled milestone Nov 25, 2019
@tgross tgross moved this from Needs Triage to In Progress in Nomad - Community Issues Triage Nov 25, 2019
@tgross
Member

tgross commented Nov 25, 2019

I just ran a test with that line commented out to at least verify that this is where the node penalty is coming from, and that results in the new allocation not being moved, just as we'd expect.

Looks like that path was originally introduced in 5ecb789. The unit tests for the scheduler still all pass with that change, so that suggests there are some unexercised code paths there.

▶ nomad job run example.nomad
==> Monitoring evaluation "8fb4bba0"
    Evaluation triggered by job "example"
    Evaluation within deployment: "1e731e13"
    Allocation "c3aa1e43" created: node "e30d500b", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "8fb4bba0" finished with status "complete"

▶ nomad job status example
...
Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created  Modified
c3aa1e43  e30d500b  cache       0        run      running  5s ago   2s ago

▶ nomad job run example.nomad
==> Monitoring evaluation "0d5b56d5"
    Evaluation triggered by job "example"
    Evaluation within deployment: "7e744299"
    Allocation "4d37ba01" created: node "e30d500b", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "0d5b56d5" finished with status "complete"

▶ nomad job status example
...
Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created  Modified
4d37ba01  e30d500b  cache       1        run      running   5s ago   4s ago
c3aa1e43  e30d500b  cache       0        stop     complete  24s ago  5s ago

▶ nomad alloc status -verbose 4d37ba01
ID                  = 4d37ba01-9cec-f93f-a681-0d22bfce895c
Eval ID             = 0d5b56d5-a1d1-21bd-9ed7-bf032608fdb7
Name                = example.cache[0]
Node ID             = e30d500b-0480-63e5-2077-0d07abd8226c
Node Name           = ip-172-31-25-80
Job ID              = example
Job Version         = 1
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 2019-11-25T10:09:16-05:00
Modified            = 2019-11-25T10:09:27-05:00
Deployment ID       = 7e744299-c05a-e959-3bc4-94bcaeac2165
Deployment Health   = healthy
Evaluated Nodes     = 2
Filtered Nodes      = 0
Exhausted Nodes     = 0
Allocation Time     = 93.481µs
Failures            = 0

Task "redis" is "running"
Task Resources
CPU        Memory           Disk     Addresses
2/500 MHz  6.3 MiB/256 MiB  300 MiB  db: 172.31.25.80:29354

Task Events:
Started At     = 2019-11-25T15:09:17Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type        Description
2019-11-25T10:09:17-05:00  Started     Task started by client
2019-11-25T10:09:17-05:00  Task Setup  Building Task Directory
2019-11-25T10:09:16-05:00  Received    Task received by client

Placement Metrics
Node                                  binpack  job-anti-affinity  node-affinity  node-reschedule-penalty  final score
0ad76a7b-ab6e-569a-dfa9-83417379c8db  0.2      0                  0              0                        0.2
e30d500b-0480-63e5-2077-0d07abd8226c  0.2      0                  0              0                        0.2

@tgross
Member

tgross commented Nov 26, 2019

I removed that line and did some tests on the behavior around failing allocations, and it looks like leaving it out removes the node-reschedule-penalty for failed allocations, which we don't want either. So that block might need to be reworked to pick up the penalty only on failure, not on updates.
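
In the same simplified terms as the earlier sketch (again a paraphrase reusing those stand-in types, not the shipped patch), the rework would gate the previous-node penalty on the alloc's client status:

// Reworked sketch: penalize the previous node only when the prior alloc
// actually failed there, so a plain job update keeps its placement.
func getSelectOptionsReworked(prev *Allocation) *SelectOptions {
	opts := &SelectOptions{PenaltyNodeIDs: map[string]struct{}{}}
	if prev != nil {
		// Only a failed alloc makes its node less attractive; a healthy
		// alloc replaced by a create/destroy update adds no penalty.
		if prev.ClientStatus == "failed" {
			opts.PenaltyNodeIDs[prev.NodeID] = struct{}{}
		}

		// Still penalize nodes from earlier reschedule attempts, since a
		// group of hosts may be failing together (e.g. an AZ outage).
		for _, ev := range prev.RescheduleEvents {
			opts.PenaltyNodeIDs[ev.PrevNodeID] = struct{}{}
		}
	}
	return opts
}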

tgross added a commit that referenced this issue Nov 26, 2019
Fixes #5856

When the scheduler looks for a placement for an allocation that's
replacing another allocation, it's supposed to penalize the previous
node if the allocation had been rescheduled or failed. But we're
currently always penalizing the node, which leads to unnecessary
migrations on job update.

This commit leaves in place the existing behavior where if the
previous alloc was itself rescheduled, its previous nodes are also
penalized. This is conservative but the right behavior especially on
larger clusters where a group of hosts might be having correlated
trouble (like an AZ failure).
@tgross
Member

tgross commented Nov 26, 2019

I've opened #6781 for review which I think should fix this.

@tgross tgross modified the milestones: unscheduled, 0.10.3 Nov 26, 2019
@tgross tgross moved this from In Progress to In Review in Nomad - Community Issues Triage Nov 26, 2019
tgross added a commit that referenced this issue Dec 3, 2019
Fixes #5856

When the scheduler looks for a placement for an allocation that's
replacing another allocation, it's supposed to penalize the previous
node if the allocation had been rescheduled or failed. But we're
currently always penalizing the node, which leads to unnecessary
migrations on job update.

This commit leaves in place the existing behavior where if the
previous alloc was itself rescheduled, its previous nodes are also
penalized. This is conservative but the right behavior especially on
larger clusters where a group of hosts might be having correlated
trouble (like an AZ failure).

Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>
Nomad - Community Issues Triage automation moved this from In Review to Done Dec 3, 2019
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 16, 2022