Forcing periodic jobs results in one pending and one running #9775

Closed
thetooth opened this issue Jan 12, 2021 · 3 comments
Labels
stage/accepted (Confirmed, and intend to work on. No timeline commitment though.), theme/batch (Issues related to batch jobs and scheduling), type/bug

Comments

@thetooth

Nomad version

Nomad v1.0.1 (c9c68aa)

Issue

When forcing a periodic job to run, a mystery pending job appears and hangs around forever. Stopping the running instance also results in the entire job getting garbage collected (the parent job goes from running to dead when stopping myjob/periodic-xxx), which prevents the actual scheduled launch of the job.

Reproduction steps

$ nomad run myjob.hcl
Job registration successful
Approximate next launch time: 2021-01-13T06:00:00+10:30 (17h35m2s from now)
$ nomad job periodic force myjob
==> Monitoring evaluation "5e6a0dd5"
    Evaluation triggered by job "livesale-streamer/periodic-1610416639"
    Allocation "1124a587" created: node "a6a13bca", group "streamer"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "5e6a0dd5" finished with status "complete"
$ nomad job status livesale-streamer
ID                   = livesale-streamer
Name                 = livesale-streamer
Submit Date          = 2021-01-12T12:24:58+10:30
Type                 = batch
Priority             = 50
Datacenters          = mtc1
Namespace            = default
Status               = running
Periodic             = true
Parameterized        = false
Next Periodic Launch = 2021-01-13T06:00:00+10:30 (17h29m25s from now)

Children Job Summary
Pending  Running  Dead
1        1        0

Previously Launched Jobs
ID                                     Status
livesale-streamer/periodic-1610416639  running
$ nomad job stop livesale-streamer/periodic-1610416639
==> Monitoring evaluation "f1c849d0"
    Evaluation triggered by job "livesale-streamer/periodic-1610416639"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "f1c849d0" finished with status "complete"
$ nomad job status livesale-streamer
ID                   = livesale-streamer
Name                 = livesale-streamer
Submit Date          = 2021-01-12T12:24:58+10:30
Type                 = batch
Priority             = 50
Datacenters          = mtc1
Namespace            = default
Status               = dead (stopped)
Periodic             = true
Parameterized        = false
Next Periodic Launch = none (job stopped)

Children Job Summary
Pending  Running  Dead
1        0        1

Previously Launched Jobs
ID                                     Status
livesale-streamer/periodic-1610416639  dead
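
To spot the stuck child without reading the whole status dump, the "Children Job Summary" table can be pulled out with a small filter. This is only a rough sketch: the function name pending_children is made up for illustration, and it assumes the status output keeps the exact layout shown above (a "Children Job Summary" line, a header row, then one row of counts).

```shell
#!/bin/sh
# Hypothetical helper (not part of Nomad): reads `nomad job status <job>`
# output on stdin and prints the Pending count from the
# "Children Job Summary" table. Assumes the layout shown in this report.
pending_children() {
  awk '
    /^Children Job Summary/ { in_summary = 1; next }  # table starts here
    in_summary && /^Pending/ { next }                 # skip the header row
    in_summary && NF >= 3    { print $1; exit }       # first column = Pending
  '
}
```

Usage would look like `nomad job status livesale-streamer | pending_children`. In the transcripts above the Pending column stays at 1 both before and after the forced child is stopped, which is the symptom described in this issue.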

Job file (if appropriate)

job "livesale-streamer" {
  datacenters = ["mtc1"]
  type = "batch"

  periodic {
    cron = "0 6 * * WED"
    time_zone = "Australia/Adelaide"
  }

  reschedule {
    unlimited      = true
    delay          = "5s"
    delay_function = "constant"
    attempts       = 0
  }

  group "streamer" {
    restart {
      attempts = 5
      delay    = "1s"
    }
    task "head" {
      constraint {
        attribute = "${meta.capturecard}"
        operator  = "="
        value     = "yes"
      }

      driver = "raw_exec"

      config {
        command = "/bin/bash"
        args    = [
          "-c",
          "cd /opt && /opt/head.sh"
        ]
      }

      resources {
        cpu    = 6500 # MHz
        memory = 1024 # MB
      }
    }
  }
}
@tgross tgross added type/bug stage/needs-investigation theme/batch Issues related to batch jobs and scheduling labels Jan 12, 2021
@tgross
Member

tgross commented Jan 12, 2021

Thanks for reporting this @thetooth. Looks like this might be another case of #8692, and that it should be fixed by @drewbailey's PR: #9768

@tgross tgross added stage/accepted Confirmed, and intend to work on. No timeline commitment though. and removed stage/needs-investigation labels Jan 12, 2021
@thetooth
Author

Excellent, I did try to find an issue for this but must have missed it. Good to see we have a fix already. If I correct my application to return 0 on termination, the parent job does not enter the dead state; it can also be brought back from the dead by simply running it again before GC. Periodic jobs are indeed running when they should.
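
For anyone wanting the same "return 0 on termination" behaviour without changing the application itself, a wrapper along these lines might work. It is a minimal sketch under assumptions: run_clean is a hypothetical name, the real /opt/head.sh is not shown in this thread, and which signal the task actually receives on stop depends on the task configuration, so both TERM and INT are trapped here.

```shell
#!/bin/sh
# Hypothetical wrapper (illustration only): run a command in the background,
# and if this shell is asked to stop, forward SIGTERM to the child and
# exit 0 so that a requested stop is not reported as a failure.
# Without a stop signal, the child's own exit status is passed through.
run_clean() {
  child_pid=""
  trap 'if [ -n "$child_pid" ]; then
          kill -TERM "$child_pid" 2>/dev/null
          wait "$child_pid" 2>/dev/null
        fi
        exit 0' TERM INT
  "$@" &
  child_pid=$!
  wait "$child_pid"
}
```

In a script this could be called as `run_clean /opt/head.sh`. Genuine failures still surface, because only the signal path is mapped to exit code 0.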

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 25, 2022