Periodic jobs are never marked as complete in the job summary list #8903

Closed
lgfa29 opened this issue Sep 16, 2020 · 4 comments
Labels
hcc/cst Admin - internal theme/batch Issues related to batch jobs and scheduling type/bug
lgfa29 commented Sep 16, 2020

Nomad version

Nomad v0.12.1+

Operating system and Environment details

Reproduced on MacOS and Linux

Issue

After running a periodic job, the spawned jobs never transition to a complete state, even if they run successfully. They will always be marked as pending in the job summary list:

ID                   = batch
Name                 = batch
Submit Date          = 2020-09-16T13:24:07-04:00
Type                 = batch
Priority             = 50
Datacenters          = dc1
Namespace            = default
Status               = running
Periodic             = true
Parameterized        = false
Next Periodic Launch = 2020-09-16T17:25:00Z (5s from now)

Children Job Summary
Pending  Running  Dead
10       0        10

Previously Launched Jobs
ID                         Status
batch/periodic-1600277050  dead
batch/periodic-1600277055  dead
batch/periodic-1600277060  dead
batch/periodic-1600277065  dead
batch/periodic-1600277070  dead
batch/periodic-1600277075  dead
batch/periodic-1600277080  dead
batch/periodic-1600277085  dead
batch/periodic-1600277090  dead
batch/periodic-1600277095  dead

But each job, alloc, and eval completed successfully:

$ nomad status batch/periodic-1600277095
ID            = batch/periodic-1600277095
Name          = batch/periodic-1600277095
Submit Date   = 2020-09-16T13:24:55-04:00
Type          = batch
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = dead
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
batch       0       0         0        0       1         0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created    Modified
53257249  1f38c7ad  batch       0        run      complete  1m24s ago  1m24s ago

$ nomad alloc status 53257249
ID                  = 53257249-963c-886c-c0d1-721bea8d2518
Eval ID             = 1926d34f
Name                = batch/periodic-1600277095.batch[0]
Node ID             = 1f38c7ad
Node Name           = MacBook-Pro.local
Job ID              = batch/periodic-1600277095
Job Version         = 0
Client Status       = complete
Client Description  = All tasks have completed
Desired Status      = run
Desired Description = <none>
Created             = 1m42s ago
Modified            = 1m42s ago

Task "batch" is "dead"
Task Resources
CPU        Memory       Disk     Addresses
0/100 MHz  0 B/300 MiB  300 MiB

Task Events:
Started At     = 2020-09-16T17:24:55Z
Finished At    = 2020-09-16T17:24:55Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type        Description
2020-09-16T13:24:55-04:00  Terminated  Exit Code: 0
2020-09-16T13:24:55-04:00  Started     Task started by client
2020-09-16T13:24:55-04:00  Task Setup  Building Task Directory
2020-09-16T13:24:55-04:00  Received    Task received by client

$ nomad eval status 1926d34f

ID                 = 1926d34f
Create Time        = 1m53s ago
Modify Time        = 1m53s ago
Status             = complete
Status Description = complete
Type               = batch
TriggeredBy        = periodic-job
Job ID             = batch/periodic-1600277095
Priority           = 50
Placement Failures = false

The issue started in commit 97c69ee. Building from its parent commit doesn't trigger the problem. It also happens only with periodic jobs; regular batch and dispatched jobs are not affected.

Reproduction steps

  1. Run sample job
  2. Wait a few seconds for periodic instances to be triggered
  3. Run nomad status batch

The list shows all jobs as pending.
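To check for this symptom without reading the output by eye, the following is a minimal sketch that parses the text printed by `nomad status <job>` and flags the case where the Children Job Summary reports more pending children than the Previously Launched Jobs list actually shows. The function names (`parse_children_summary`, `parse_launched_jobs`, `has_stale_pending`) are hypothetical helpers, not part of Nomad, and the sketch assumes the output layout shown in this issue and that the launched-jobs list is not truncated.

```python
def parse_children_summary(status_text):
    """Return the 'Children Job Summary' counts as a dict, e.g. {'Pending': 10, ...}."""
    lines = status_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "Children Job Summary":
            header = lines[i + 1].split()                   # Pending Running Dead
            values = [int(v) for v in lines[i + 2].split()]
            return dict(zip(header, values))
    raise ValueError("no 'Children Job Summary' section found")


def parse_launched_jobs(status_text):
    """Return {child job ID: status} from the 'Previously Launched Jobs' table."""
    jobs = {}
    in_section = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped == "Previously Launched Jobs":
            in_section = True
        elif in_section:
            if not stripped:                                # blank line ends the table
                break
            parts = stripped.split()
            if len(parts) == 2 and parts != ["ID", "Status"]:
                jobs[parts[0]] = parts[1]
    return jobs


def has_stale_pending(status_text):
    """True if the summary counts more pending children than the list shows."""
    summary = parse_children_summary(status_text)
    jobs = parse_launched_jobs(status_text)
    listed_pending = sum(1 for status in jobs.values() if status == "pending")
    return summary["Pending"] > listed_pending


# Shortened sample in the layout reported in this issue.
sample = """Children Job Summary
Pending  Running  Dead
10       0        10

Previously Launched Jobs
ID                         Status
batch/periodic-1600277050  dead
"""
print(has_stale_pending(sample))  # True
```

In practice the input would be the captured output of step 3. The sample here is a shortened copy of the output above.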

Job file (if appropriate)

job "batch" {
  datacenters = ["dc1"]
  type        = "batch"

  periodic {
    cron = "*/5 * * * * * *"
  }

  group "batch" {
    task "batch" {
      driver = "raw_exec"

      config {
        command = "echo"
        args    = ["hi"]
      }
    }
  }
}
@lgfa29 lgfa29 added theme/batch Issues related to batch jobs and scheduling type/bug labels Sep 16, 2020
@lgfa29 lgfa29 changed the title Periodic jobs are never marked as complete in thee job summary list Periodic jobs are never marked as complete in the job summary list Sep 16, 2020
evandam commented Feb 3, 2021

Hey folks - any updates on this? Experiencing this on Nomad 1.0.3. All of our batch jobs show thousands of pending jobs, equal to the number of dead jobs.

Thanks!

tgross commented Feb 5, 2021

Hi @evandam! I suspect this is fixed by #9768, which will land in 1.0.4. That was a fix for #8692, but looking at the commit ref you posted, I know that commit was discussed as part of the cause there. I'll try to reproduce and verify that patch, probably early next week, and we'll figure out what to do from there, if anything.

@tgross tgross self-assigned this Feb 5, 2021
@tgross tgross added the hcc/cst Admin - internal label Feb 5, 2021
tgross commented Feb 8, 2021

Using the repro above:

$ nomad status batch
ID                   = batch
Name                 = batch
Submit Date          = 2021-02-08T08:49:13-05:00
Type                 = batch
Priority             = 50
Datacenters          = dc1
Namespace            = default
Status               = running
Periodic             = true
Parameterized        = false
Next Periodic Launch = 2021-02-08T13:49:50Z (3s from now)

Children Job Summary
Pending  Running  Dead
0        0        7

Previously Launched Jobs
ID                         Status
batch/periodic-1612792155  dead
batch/periodic-1612792160  dead
batch/periodic-1612792165  dead
batch/periodic-1612792170  dead
batch/periodic-1612792175  dead
batch/periodic-1612792180  dead
batch/periodic-1612792185  dead

Going to close this and mark it for release in 1.0.4

@tgross tgross closed this as completed Feb 8, 2021
@tgross tgross added this to the 1.0.4 milestone Feb 8, 2021
@tgross tgross removed their assignment Feb 8, 2021
@github-actions
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 24, 2022