v0.9.5: node-affinity stays after removing it from job spec #6334

Closed
a-vorobiev opened this issue Sep 16, 2019 · 2 comments · Fixed by #6703

@a-vorobiev

Nomad version

v0.9.5 (1cbb2b9)

Operating system and Environment details

Debian GNU/Linux 10 (buster)

Issue

The node affinity stays in the Placement Metrics after it is removed from the job spec.

Reproduction steps

The test Nomad cluster consists of a single server and three client nodes; two of them have meta.tier = hi, and the third has meta.tier = lo.

Submitting the following job definition:

job "example" {
  datacenters = ["dc1"]
  type = "service"

  affinity {
    attribute = "${meta.tier}"
    value     = "hi"
    weight    = 100
  }

  group "cache" {
    count = 2

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"
        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 150 # 150 MHz
        memory = 64 # 64 MB
        network {
          mbits = 10
          port "db" {}
        }
      }

      service {
        name = "redis-cache"
        tags = ["global", "cache"]
        port = "db"
        check {
          name     = "alive"
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
root@master:~# nomad run redis.nomad
==> Monitoring evaluation "6b5c6106"
    Evaluation triggered by job "example"
    Allocation "98832f04" created: node "6fc7950f", group "cache"
    Allocation "9a2567d2" created: node "16a5491e", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "6b5c6106" finished with status "complete"

One node wins due to node-affinity (two of them match, because there are two nodes with meta.tier = hi):

root@master:~# nomad alloc status -verbose 98832f04
ID                  = 98832f04-2bd7-a1ac-500a-ea0dec112bf3
Eval ID             = 6b5c6106-9291-a65b-cbd6-98b88edd7dd6
Name                = example.cache[0]
Node ID             = 6fc7950f-6af1-001f-dc85-76690174e6a2
Node Name           = worker1
Job ID              = example
Job Version         = 824636458128
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 2019-09-16T16:57:40Z
Modified            = 2019-09-16T16:58:05Z
Evaluated Nodes     = 3
Filtered Nodes      = 0
Exhausted Nodes     = 0
Allocation Time     = 137.848µs
Failures            = 0

Task "redis" is "running"
Task Resources
CPU        Memory           Disk     Addresses
6/150 MHz  1008 KiB/64 MiB  300 MiB  db: 10.0.2.15:29844

Task Events:
Started At     = 2019-09-16T16:57:50Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type        Description
2019-09-16T16:57:50Z  Started     Task started by client
2019-09-16T16:57:40Z  Driver      Downloading image
2019-09-16T16:57:40Z  Task Setup  Building Task Directory
2019-09-16T16:57:40Z  Received    Task received by client

Placement Metrics
Node                                  node-affinity  binpack  job-anti-affinity  node-reschedule-penalty  final score
6fc7950f-6af1-001f-dc85-76690174e6a2  1              0.192    0                  0                        0.596
16a5491e-a3ed-8f4e-af18-e29e05f45b52  1              0.192    0                  0                        0.596
d711dc5e-20f5-fb6f-f227-2c91081672d2  0              0.192    0                  0                        0.192

Now I decide to remove the affinity:

job "example" {
  datacenters = ["dc1"]
  type = "service"

#  affinity {
#    attribute = "${meta.tier}"
#    value     = "hi"
#    weight    = 100
#  }

  group "cache" {
    count = 2

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"
        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 150 # 150 MHz
        memory = 64 # 64 MB
        network {
          mbits = 10
          port "db" {}
        }
      }

      service {
        name = "redis-cache"
        tags = ["global", "cache"]
        port = "db"
        check {
          name     = "alive"
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}

Looks like it'll go away as expected:

root@master:~# nomad plan redis.nomad 
+/- Job: "example"
- Affinity {
  - LTarget: "${meta.tier}"
  - Operand: "="
  - RTarget: "hi"
  - Weight:  "100"
  }
  Task Group: "cache" (2 in-place update)
    Task: "redis"

Scheduler dry-run:
- All tasks successfully allocated.
root@master:~# nomad job run redis.nomad 
==> Monitoring evaluation "d49d8c57"
    Evaluation triggered by job "example"
    Allocation "98832f04" modified: node "6fc7950f", group "cache"
    Allocation "9a2567d2" modified: node "16a5491e", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "d49d8c57" finished with status "complete"

But the allocations were only modified in place, and there are still two nodes with a node-affinity score of 1:

root@master:~# nomad alloc status -verbose 98832f04
ID                  = 98832f04-2bd7-a1ac-500a-ea0dec112bf3
Eval ID             = d49d8c57-d5be-a1f4-bb9e-67afe53aae3f
Name                = example.cache[0]
Node ID             = 6fc7950f-6af1-001f-dc85-76690174e6a2
Node Name           = worker1
Job ID              = example
Job Version         = 824635190912
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 2019-09-16T16:57:40Z
Modified            = 2019-09-16T16:59:47Z
Evaluated Nodes     = 3
Filtered Nodes      = 0
Exhausted Nodes     = 0
Allocation Time     = 137.848µs
Failures            = 0

Task "redis" is "running"
Task Resources
CPU        Memory           Disk     Addresses
5/150 MHz  1008 KiB/64 MiB  300 MiB  db: 10.0.2.15:29844

Task Events:
Started At     = 2019-09-16T16:57:50Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type        Description
2019-09-16T16:57:50Z  Started     Task started by client
2019-09-16T16:57:40Z  Driver      Downloading image
2019-09-16T16:57:40Z  Task Setup  Building Task Directory
2019-09-16T16:57:40Z  Received    Task received by client

Placement Metrics
Node                                  node-affinity  binpack  job-anti-affinity  node-reschedule-penalty  final score
6fc7950f-6af1-001f-dc85-76690174e6a2  1              0.192    0                  0                        0.596
16a5491e-a3ed-8f4e-af18-e29e05f45b52  1              0.192    0                  0                        0.596
d711dc5e-20f5-fb6f-f227-2c91081672d2  0              0.192    0                  0                        0.192
@drewbailey
Contributor

After taking a closer look at this issue, we determined that Nomad currently does not check for differences in affinities or constraints when deciding whether to do an in-place update or create a new allocation.
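
For illustration, here is a rough sketch of the kind of comparison that is missing (this is not Nomad's actual source; the types and the groupUpdated function below are made up for the example). If affinities are left out of the old-vs-new task group diff, removing the affinity block looks like a no-op, the allocations are updated in place, and the stale node-affinity score is carried along in the placement metrics:

package sketch

import "reflect"

// Affinity is a simplified stand-in for an affinity block
// (hypothetical type, for illustration only).
type Affinity struct {
    LTarget string // e.g. "${meta.tier}"
    RTarget string // e.g. "hi"
    Operand string // e.g. "="
    Weight  int    // e.g. 100
}

// TaskGroup is a simplified stand-in for a task group definition.
type TaskGroup struct {
    Name       string
    Affinities []Affinity
    // tasks, resources, services, etc. omitted
}

// groupUpdated sketches the decision between an in-place update and a
// new allocation: it returns true when the group definition changed in
// a way that requires re-placement.
func groupUpdated(oldTG, newTG *TaskGroup) bool {
    // The missing piece described above: compare affinities
    // (and, likewise, constraints) between job versions.
    if !reflect.DeepEqual(oldTG.Affinities, newTG.Affinities) {
        return true
    }
    // ... comparisons of tasks, drivers, resources, and so on
    return false
}

With a check along these lines, dropping the affinity would count as a change, and the placement metrics would be recomputed rather than reused.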

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
