Preempted dispatch alloc is not replaced after resources become available #9890

Closed · tgross opened this issue Jan 26, 2021 · 3 comments · Fixed by #13205

Labels: hcc/cst, stage/accepted, theme/batch, theme/preemption, type/bug

tgross (Member) commented on Jan 26, 2021

When a dispatch job allocation is displaced by preemption, it is never replaced, even after resources become available.

Users have reported that they expect the dispatch job to be displaced only temporarily: the evicted allocation should be replaced, but left in a blocked state while waiting for resources to become available, just as happens with placement failures.

This is borderline between a bug and an enhancement, because the behavior is not well defined in the documentation, but it is certainly surprising to users.

To reproduce, run Nomad and enable batch preemption:

curl -XPUT -d '{"PreemptionConfig": {"BatchSchedulerEnabled": true }}' \
    "localhost:4646/v1/operator/scheduler/configuration"

Verify the resources available:

$ nomad node status -self
...
Allocated Resources
CPU          Memory       Disk
0/18424 MHz  0 B/1.9 GiB  0 B/44 GiB

Low-priority job:

job "low" {
  datacenters = ["dc1"]
  type        = "batch"

  parameterized {
    meta_optional = ["test"]
  }

  group "group" {

    task "task" {
      driver = "exec"
      config {
        command = "bash"
        args    = ["-c", "sleep 120"]
      }
      resources {
        memory = 1000
      }
    }
  }
}

High-priority job, with a memory requirement that forces preemption (the node has roughly 1.9 GiB of memory, so with 1000 MB already allocated to the low-priority job there is not enough room left for the 1500 MB this job requests):

job "high" {
  datacenters = ["dc1"]
  type        = "batch"

  priority = 80

  parameterized {
    meta_optional = ["test"]
  }

  group "group" {

    task "task" {
      driver = "exec"
      config {
        command = "bash"
        args    = ["-c", "sleep 120"]
      }
      resources {
        memory = 1500
      }
    }
  }
}

Register both jobs.

$ nomad job run ./low.nomad
Job registration successful
$ nomad job run ./high.nomad
Job registration successful

Dispatch the low-priority job and note that it's running:

$ nomad job dispatch low
Dispatched Job ID = low/dispatch-1611671592-fa7829a8
Evaluation ID     = b6423703

==> Monitoring evaluation "b6423703"
    Evaluation triggered by job "low/dispatch-1611671592-fa7829a8"
    Allocation "2ce23ac8" created: node "5ba86c7d", group "group"
==> Monitoring evaluation "b6423703"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "b6423703" finished with status "complete"

$ nomad job status
ID                                Type                 Priority  Status   Submit Date
high                              batch/parameterized  80        running  2021-01-26T09:33:04-05:00
low                               batch/parameterized  50        running  2021-01-26T09:33:01-05:00
low/dispatch-1611671592-fa7829a8  batch                50        running  2021-01-26T09:33:12-05:00

While that job is still running, dispatch the high-priority job and note that the low-priority dispatched job is now dead because it has been evicted:

$ nomad job dispatch high
Dispatched Job ID = high/dispatch-1611671603-e1b8559d
Evaluation ID     = 1b8c7b4f

==> Monitoring evaluation "1b8c7b4f"
    Evaluation triggered by job "high/dispatch-1611671603-e1b8559d"
    Allocation "e0b4c1b2" created: node "5ba86c7d", group "group"
==> Monitoring evaluation "1b8c7b4f"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "1b8c7b4f" finished with status "complete"

$ nomad job status
ID                                 Type                 Priority  Status   Submit Date
high                               batch/parameterized  80        running  2021-01-26T09:33:04-05:00
high/dispatch-1611671603-e1b8559d  batch                80        running  2021-01-26T09:33:23-05:00
low                                batch/parameterized  50        running  2021-01-26T09:33:01-05:00
low/dispatch-1611671592-fa7829a8   batch                50        dead     2021-01-26T09:33:12-05:00

$ nomad job status low/dispatch-1611671592-fa7829a8
...
Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created    Modified
2ce23ac8  5ba86c7d  group       0        evict    complete  1m42s ago  1m31s ago
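
Optionally, inspect the evicted allocation directly; its desired-status description should indicate the preemption (a suggested diagnostic, not shown in the original report):

$ nomad alloc status 2ce23ac8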

Wait for the high-priority job to complete and note that the low-priority job is not replaced:

$ nomad job status
ID                                 Type                 Priority  Status   Submit Date
high                               batch/parameterized  80        running  2021-01-26T09:33:04-05:00
high/dispatch-1611671603-e1b8559d  batch                80        dead     2021-01-26T09:33:23-05:00
low                                batch/parameterized  50        running  2021-01-26T09:33:01-05:00
low/dispatch-1611671592-fa7829a8   batch                50        dead     2021-01-26T09:33:12-05:00
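
To confirm that the scheduler left no blocked (queued) evaluation behind for the preempted dispatch job, its evaluations can be listed (a suggested check, not part of the original report; assumes the -evals flag is available in this Nomad version):

$ nomad job status -evals low/dispatch-1611671592-fa7829a8
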
@tgross tgross added the type/bug, stage/needs-investigation, theme/batch, theme/preemption, stage/accepted, and hcc/cst labels and removed the stage/accepted and stage/needs-investigation labels on Jan 26, 2021
@tgross tgross added this to Needs Roadmapping in Nomad - Community Issues Triage Feb 12, 2021
@tgross tgross removed this from Needs Roadmapping in Nomad - Community Issues Triage Mar 3, 2021
Fuco1 (Contributor) commented on Apr 29, 2022

What I observed is that it doesn't even need to be a dispatched job; any batch job will do.

I started two batch jobs with count = 1400, one with a higher priority and one with a priority 10 points lower. The allocations from the low-priority job were quickly preempted, but they never returned after the high-priority job finished (and about 300 allocations went straight into a failed state and never recovered). In the end Nomad was reporting 1000 queued tasks, but nothing was happening.

Nomad version is 1.2.6.

@mmcquillan mmcquillan modified the milestones: 1.3.2, 1.3.x May 17, 2022
shoenig (Member) commented on May 31, 2022

Indeed, I'm able to reproduce this with just a normal batch job. It seems that when the alloc is evicted, Nomad doesn't queue up a replacement.

nomad.hcl
client {
  enabled = true
}

server {
  enabled = true
  default_scheduler_config {
    preemption_config {
      service_scheduler_enabled = true
      batch_scheduler_enabled = true
    }
  }
}
low.nomad
job "low" {
  datacenters = ["dc1"]
  priority = 50
  type = "batch"

  group "group" {
    count = 3    
    task "sleep" {
      driver = "exec"

      config {
        command = "/bin/sleep"
        args    = ["10000"]
      }

      resources {
        cpu    = 500
        memory = 10000
      }
    }
  }
}
high.nomad
job "low" {
  datacenters = ["dc1"]
  priority = 50
  type = "batch"

  group "group" {
    count = 3    
    task "sleep" {
      driver = "exec"

      config {
        command = "/bin/sleep"
        args    = ["30"]
      }

      resources {
        cpu    = 500
        memory = 10000
      }
    }
  }
}
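
The exact invocation isn't shown; presumably something along these lines (assumed commands, using the agent config above), after which the low-priority job's status shows the evicted allocation never being replaced:

$ nomad agent -dev -config=nomad.hcl
$ nomad job run low.nomad
$ nomad job run high.nomad
$ nomad job status low
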
Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost  Unknown
group       0       0         2        0       1         0     0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created    Modified
01fd9a9e  5fedcd77  group       0        evict    complete  6m48s ago  6m24s ago
7125b2c9  5fedcd77  group       0        run      running   6m48s ago  6m44s ago
7cc0bebb  5fedcd77  group       0        run      running   6m48s ago  6m44s ago

@shoenig shoenig self-assigned this May 31, 2022
shoenig added commits that referenced this issue on Jun 2 and Jun 3, 2022, and ChaiWithJai pushed a commit that referenced this issue on Jun 3, 2022, all with the same message:

This PR fixes a bug where an evicted batch job would not be rescheduled
once resources become available.

Closes #9890
@lgfa29 lgfa29 modified the milestones: 1.3.x, 1.3.2 Aug 24, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators on Dec 22, 2022