Dispatch batch job summary reporting negative values #10222

Open
lgfa29 opened this issue Mar 24, 2021 · 2 comments
Labels
stage/accepted (Confirmed, and intend to work on. No timeline commitment though.)
theme/batch (Issues related to batch jobs and scheduling)
theme/job-summary
type/bug

Comments

Contributor

lgfa29 commented Mar 24, 2021

Nomad version

Nomad v1.0.4 (9294f35f9aa8dbb4acb6e85fa88e3e2534a3e41a)

Operating system and Environment details

MacOS 10.15.7

Issue

When using parameterized batch jobs, the job summary returned by /v1/job/<job>/summary miscounts the number of running children: the Running count stays at 0 even while a dispatched child job is running.

This results in negative values for Running when the dispatched job finishes.

The problem doesn't happen in Nomad v1.0.3 and git bisect shows that the problem started in this commit: 85129bb

Reproduction steps

  1. Start nomad agent
$ sudo nomad agent -dev
  2. Run sample job
$ nomad run counter.nomad
  3. Dispatch job
$ nomad job dispatch -meta start=0 counter
  4. Check job summary
$ curl http://localhost:4646/v1/job/counter/summary | jq .Children
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                               Dload  Upload   Total   Spent    Left  Speed
100   214  100   214    0     0  42800      0 --:--:-- --:--:-- --:--:-- 53500
{
"Pending": 0,
"Running": 0,
"Dead": 0
}
  5. Wait for job to finish and check summary again
$ curl http://localhost:4646/v1/job/counter/summary | jq .Children
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                               Dload  Upload   Total   Spent    Left  Speed
100   215  100   215    0     0  71666      0 --:--:-- --:--:-- --:--:-- 71666
{
"Pending": 0,
"Running": -1,
"Dead": 1
}
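
The manual check in steps 4 and 5 can also be scripted. The following is a minimal sketch, assuming the dev agent from step 1 is listening on http://localhost:4646 and that the summary response has the Children object shown above. The helper names `negative_child_counts` and `fetch_summary` are hypothetical and not part of Nomad or its API clients.

```python
import json
from urllib.request import urlopen

# Counters present in the "Children" object of the job summary output above.
CHILD_FIELDS = ("Pending", "Running", "Dead")


def negative_child_counts(summary):
    """Return the Children counters of a job summary that are below zero."""
    children = summary.get("Children") or {}
    return {k: children[k] for k in CHILD_FIELDS if children.get(k, 0) < 0}


def fetch_summary(job, addr="http://localhost:4646"):
    """Fetch /v1/job/<job>/summary from a Nomad agent (address is an assumption)."""
    with urlopen(f"{addr}/v1/job/{job}/summary") as resp:
        return json.load(resp)
```

Calling `negative_child_counts(fetch_summary("counter"))` after the dispatched job finishes should return a non-empty dict on an affected version, given the step 5 output above.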

Expected Result

In step 4 there should've been 1 job running, and 0 in step 5.

Actual Result

0 children running in step 4 and -1 in step 5.

Job file

job "counter" {
  datacenters = ["dc1"]
  type        = "batch"

  parameterized {
    meta_required = ["start"]
  }

  group "counter" {
    task "counter" {
      driver = "docker"

      config {
        image   = "alpine:3.13"
        command = "count.sh"
        volumes = [
          "local/count.sh:/usr/bin/count.sh",
        ]
      }

      template {
        data        = <<EOF
#!/bin/sh

count=${NOMAD_META_start}
while [ $count -lt 100000 ]
do
  echo $count
  count=$((count + 1))
done
        EOF
        destination = "local/count.sh"
        perms       = "777"
      }
    }
  }
}
Contributor

notnoop commented Mar 24, 2021

This seems like a regression in 1.0.4. It's very similar to #10145, in that the initial status is not properly set but in a different path.
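
To illustrate why a missing initial status would produce the reported numbers, here is a toy bookkeeping model. It is only a sketch and not Nomad's implementation: the functions `apply_transition` and `simulate` are hypothetical, and the assumption is that each status change decrements the old counter and increments the new one.

```python
# Toy model only: this is NOT Nomad's code, just an illustration of the arithmetic.
STATUS_TO_FIELD = {"pending": "Pending", "running": "Running", "dead": "Dead"}


def apply_transition(children, old, new):
    """Move one child job from status `old` to `new`.

    `old=None` stands for a newly registered child with no previous status.
    """
    if old is not None:
        children[STATUS_TO_FIELD[old]] -= 1
    children[STATUS_TO_FIELD[new]] += 1
    return children


def simulate(count_initial_status):
    """Dispatch one child, let it finish, and return the resulting counters."""
    children = {"Pending": 0, "Running": 0, "Dead": 0}
    if count_initial_status:
        # Expected behaviour: the child's first status is recorded.
        apply_transition(children, None, "running")
    # The child finishes; the decrement of Running happens either way.
    apply_transition(children, "running", "dead")
    return children
```

In this model, `simulate(False)` skips the initial increment and ends with Running at -1 and Dead at 1, the same values reported in step 5, while `simulate(True)` ends with Running at 0.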

Member

tgross commented Jul 25, 2022

Some other issues that look related to this one: #13519 #4731 #10338 #13897.

tgross added this to Needs Triage in Nomad - Community Issues Triage via automation Jul 25, 2022
tgross moved this from Needs Triage to Needs Roadmapping in Nomad - Community Issues Triage Jul 25, 2022
tgross added the stage/accepted (Confirmed, and intend to work on. No timeline commitment though.) label Jul 25, 2022