v0.9.5: node-affinity stays after removing it from job spec #6334

Closed
a-vorobiev opened this issue Sep 16, 2019 · 2 comments · Fixed by #6703

@a-vorobiev

Nomad version

v0.9.5 (1cbb2b9)

Operating system and Environment details

Debian GNU/Linux 10 (buster)

Issue

The node affinity stays in the Placement Metrics after it is removed from the job spec.

Reproduction steps

The test Nomad cluster consists of a single server and three client nodes; two of them have meta.tier = hi, and the third has meta.tier = lo.

Submitting the following job definition:

job "example" {
  datacenters = ["dc1"]
  type = "service"

  affinity {
    attribute = "${meta.tier}"
    value     = "hi"
    weight    = 100
  }

  group "cache" {
    count = 2

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"
        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 150 # 150 MHz
        memory = 64 # 64 MB
        network {
          mbits = 10
          port "db" {}
        }
      }

      service {
        name = "redis-cache"
        tags = ["global", "cache"]
        port = "db"
        check {
          name     = "alive"
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
root@master:~# nomad run redis.nomad
==> Monitoring evaluation "6b5c6106"
    Evaluation triggered by job "example"
    Allocation "98832f04" created: node "6fc7950f", group "cache"
    Allocation "9a2567d2" created: node "16a5491e", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "6b5c6106" finished with status "complete"

One node wins due to node-affinity (two of them match, because there are two nodes with meta.tier = hi):

root@master:~# nomad alloc status -verbose 98832f04
ID                  = 98832f04-2bd7-a1ac-500a-ea0dec112bf3
Eval ID             = 6b5c6106-9291-a65b-cbd6-98b88edd7dd6
Name                = example.cache[0]
Node ID             = 6fc7950f-6af1-001f-dc85-76690174e6a2
Node Name           = worker1
Job ID              = example
Job Version         = 824636458128
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 2019-09-16T16:57:40Z
Modified            = 2019-09-16T16:58:05Z
Evaluated Nodes     = 3
Filtered Nodes      = 0
Exhausted Nodes     = 0
Allocation Time     = 137.848µs
Failures            = 0

Task "redis" is "running"
Task Resources
CPU        Memory           Disk     Addresses
6/150 MHz  1008 KiB/64 MiB  300 MiB  db: 10.0.2.15:29844

Task Events:
Started At     = 2019-09-16T16:57:50Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type        Description
2019-09-16T16:57:50Z  Started     Task started by client
2019-09-16T16:57:40Z  Driver      Downloading image
2019-09-16T16:57:40Z  Task Setup  Building Task Directory
2019-09-16T16:57:40Z  Received    Task received by client

Placement Metrics
Node                                  node-affinity  binpack  job-anti-affinity  node-reschedule-penalty  final score
6fc7950f-6af1-001f-dc85-76690174e6a2  1              0.192    0                  0                        0.596
16a5491e-a3ed-8f4e-af18-e29e05f45b52  1              0.192    0                  0                        0.596
d711dc5e-20f5-fb6f-f227-2c91081672d2  0              0.192    0                  0                        0.192

Now I decide to remove the affinity:

job "example" {
  datacenters = ["dc1"]
  type = "service"

#  affinity {
#    attribute = "${meta.tier}"
#    value     = "hi"
#    weight    = 100
#  }

  group "cache" {
    count = 2

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"
        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 150 # 150 MHz
        memory = 64 # 64 MB
        network {
          mbits = 10
          port "db" {}
        }
      }

      service {
        name = "redis-cache"
        tags = ["global", "cache"]
        port = "db"
        check {
          name     = "alive"
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}

Looks like it'll go away as expected:

root@master:~# nomad plan redis.nomad 
+/- Job: "example"
- Affinity {
  - LTarget: "${meta.tier}"
  - Operand: "="
  - RTarget: "hi"
  - Weight:  "100"
  }
  Task Group: "cache" (2 in-place update)
    Task: "redis"

Scheduler dry-run:
- All tasks successfully allocated.
root@master:~# nomad job run redis.nomad 
==> Monitoring evaluation "d49d8c57"
    Evaluation triggered by job "example"
    Allocation "98832f04" modified: node "6fc7950f", group "cache"
    Allocation "9a2567d2" modified: node "16a5491e", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "d49d8c57" finished with status "complete"

But the allocations were only modified in place, and there are still two nodes with a node-affinity score of 1:

root@master:~# nomad alloc status -verbose 98832f04
ID                  = 98832f04-2bd7-a1ac-500a-ea0dec112bf3
Eval ID             = d49d8c57-d5be-a1f4-bb9e-67afe53aae3f
Name                = example.cache[0]
Node ID             = 6fc7950f-6af1-001f-dc85-76690174e6a2
Node Name           = worker1
Job ID              = example
Job Version         = 824635190912
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 2019-09-16T16:57:40Z
Modified            = 2019-09-16T16:59:47Z
Evaluated Nodes     = 3
Filtered Nodes      = 0
Exhausted Nodes     = 0
Allocation Time     = 137.848µs
Failures            = 0

Task "redis" is "running"
Task Resources
CPU        Memory           Disk     Addresses
5/150 MHz  1008 KiB/64 MiB  300 MiB  db: 10.0.2.15:29844

Task Events:
Started At     = 2019-09-16T16:57:50Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type        Description
2019-09-16T16:57:50Z  Started     Task started by client
2019-09-16T16:57:40Z  Driver      Downloading image
2019-09-16T16:57:40Z  Task Setup  Building Task Directory
2019-09-16T16:57:40Z  Received    Task received by client

Placement Metrics
Node                                  node-affinity  binpack  job-anti-affinity  node-reschedule-penalty  final score
6fc7950f-6af1-001f-dc85-76690174e6a2  1              0.192    0                  0                        0.596
16a5491e-a3ed-8f4e-af18-e29e05f45b52  1              0.192    0                  0                        0.596
d711dc5e-20f5-fb6f-f227-2c91081672d2  0              0.192    0                  0                        0.192
@drewbailey
Contributor

After taking a closer look at this issue, we determined that Nomad currently does not check for differences in affinities or constraints when deciding whether to do an in-place update or create a new allocation.
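
For illustration, here is a rough sketch of the kind of comparison that is missing (this is not Nomad's actual source; the types and the groupUpdated function below are made up for the example). If affinities are left out of the old-vs-new task group diff, removing the affinity block looks like a no-op, the allocations are updated in place, and the stale node-affinity score is carried along in the placement metrics:

package sketch

import "reflect"

// Affinity is a simplified stand-in for an affinity block
// (hypothetical type, for illustration only).
type Affinity struct {
    LTarget string // e.g. "${meta.tier}"
    RTarget string // e.g. "hi"
    Operand string // e.g. "="
    Weight  int    // e.g. 100
}

// TaskGroup is a simplified stand-in for a task group definition.
type TaskGroup struct {
    Name       string
    Affinities []Affinity
    // tasks, resources, services, etc. omitted
}

// groupUpdated sketches the decision between an in-place update and a
// new allocation: it returns true when the group definition changed in
// a way that requires re-placement.
func groupUpdated(oldTG, newTG *TaskGroup) bool {
    // The missing piece described above: compare affinities
    // (and, likewise, constraints) between job versions.
    if !reflect.DeepEqual(oldTG.Affinities, newTG.Affinities) {
        return true
    }
    // ... comparisons of tasks, drivers, resources, and so on
    return false
}

With a check along these lines, dropping the affinity would count as a change, and the placement metrics would be recomputed rather than reused.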

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
