Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[documentation] Periodic Dispatch jobs execute immediately #2470

Closed
nugend opened this issue Mar 22, 2017 · 8 comments · Fixed by #2489
Closed

[documentation] Periodic Dispatch jobs execute immediately #2470

nugend opened this issue Mar 22, 2017 · 8 comments · Fixed by #2489

Comments

@nugend
Copy link

nugend commented Mar 22, 2017

Nomad version

0.5.5

Issue

When nomad job dispatching a periodic dispatch job, the job will begin executing immediately. This is fine, but the behavior is not clearly documented as far as I can tell. Plain batch periodic jobs do not immediately execute and are deferred until the next time as indicated by the periodic stanza.

If this is not the intended behavior, then I guess this is a bug?

@dadgar
Copy link
Contributor

dadgar commented Mar 22, 2017

@nugend I am not seeing that behavior:

$ nomad run example.nomad
Job registration successful

$ nomad status
ID       Type   Priority  Status
example  batch  50        running

$ nomad status e
ID            = example
Name          = example
Type          = batch
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = true
Parameterized = true

Parameterized Job
Payload           = optional
Required Metadata = <none>
Optional Metadata = <none>

Parameterized Job Summary
Pending  Running  Dead
0        0        0

No dispatched instances of parameterized job found
$ nomad job dispatch example
Dispatched Job ID = example/dispatch-1490214152-eedcccbf
Evaluation ID     = abceb25f

==> Monitoring evaluation "abceb25f"
    Evaluation triggered by job "example/dispatch-1490214152-eedcccbf"
    Allocation "305b4449" created: node "564daec4", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "abceb25f" finished with status "complete"

$ nomad status
ID                                    Type   Priority  Status
example                               batch  50        running
example/dispatch-1490214152-eedcccbf  batch  50        running

$ nomad status example/dispatch-1490214152-eedcccbf
ID                   = example/dispatch-1490214152-eedcccbf
Name                 = example/dispatch-1490214152-eedcccbf
Type                 = batch
Priority             = 50
Datacenters          = dc1
Status               = running
Periodic             = true
Parameterized        = false
Next Periodic Launch = 03/22/17 23:00:00 UTC (2h37m13s from now)

Children Job Summary
Pending  Running  Dead
0        0        0

No instances of periodic job found

$ cat example.nomad | head -n 20
job "example" {
  datacenters = ["dc1"]
  type = "batch"

  periodic {
    cron = "* */23 * * * *"
  }

  parameterized {}

@nugend
Copy link
Author

nugend commented Mar 22, 2017

Can you provide the full example.nomad file? I'll try and reproduce with the example and if the behavior is different I will see if I can figure out what's going on.

@nugend
Copy link
Author

nugend commented Mar 24, 2017

@dadgar I'm seeing similar output from the command line, but I'm using hashi-ui to actually look at most of my jobs, so I noticed that it had attempted to execute. When I look at the HTTP allocations end point, I see that the job is executing.

The following jq filter provides a concise summary:

curl -s YOURHOST:4646/v1/allocations | jq -c '.[] | {job: .JobID, status: .ClientStatus, allocation_id: .ID, restart_reason: .TaskStates[].Events[-1].RestartReason}'

Maybe this is also a bug with the status sub-command?

@dadgar
Copy link
Contributor

dadgar commented Mar 26, 2017

Sure:

job "example" {
  datacenters = ["dc1"]
  type = "batch"

  parameterized {}

  periodic {
    cron = "* */23 * * * *"
  }

  group "cache" {
    count = 1

    task "redis" {
      driver = "docker"
      config {
        image = "redis:3.2"
	command = "sleep"
	args = ["10"]
        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 500 # 500 MHz
        memory = 256 # 256MB
        network {
          mbits = 10
          port "db" {}
        }
      }
    }
  }
}
$ nomad status example
ID            = example
Name          = example
Type          = batch
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = true
Parameterized = true

Parameterized Job
Payload           = optional
Required Metadata = <none>
Optional Metadata = <none>

Parameterized Job Summary
Pending  Running  Dead
0        0        0

No dispatched instances of parameterized job found

$ curl http://127.0.0.1:4646/v1/allocations
[]

I am going to close this till it is verified broken because twice I have run it now and it is working correctly.

@dadgar dadgar closed this as completed Mar 26, 2017
@nugend
Copy link
Author

nugend commented Mar 27, 2017

@dadgar I can only run the raw_exec driver in my environment. The following task reproduces the issue:

job "example" {
  datacenters = ["dc1"]
  type = "batch"

  parameterized {}

  periodic {
    cron = "* */23 * * * *"
  }

  group "quick_fail" {
    count = 1

    task "run" {
      driver = "raw_exec"
      config {
        command = "bash"
        args = ["-c","echo `date` && echo 'I AM RUNNING' && false"]
      }

      resources {
        cpu    = 500 # 500 MHz
        memory = 256 # 256MB
        network {
          mbits = 10
          port "db" {}
        }
      }
    }
  }
}

Part of the problem is that you can't see the allocation with just nomad status

$ nomad run example.nomad
Job registration successful
$ nomad job dispatch example
Dispatched Job ID = example/dispatch-1490647455-84f64fdb
Evaluation ID     = 92d0607f

==> Monitoring evaluation "92d0607f"
    Evaluation triggered by job "example/dispatch-1490647455-84f64fdb"
    Allocation "34f3f86f" created: node "96133a71", group "quick_fail"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "92d0607f" finished with status "complete"
$ nomad status
ID                                    Type   Priority  Status
example                               batch  50        dead
example/dispatch-1490647455-84f64fdb  batch  50        running
$ nomad status example/dispatch-1490647455-84f64fdb
ID                   = example/dispatch-1490647455-84f64fdb
Name                 = example/dispatch-1490647455-84f64fdb
Type                 = batch
Priority             = 50
Datacenters          = dc1
Status               = running
Periodic             = true
Parameterized        = false
Next Periodic Launch = 03/27/17 23:00:00 UTC (2h15m4s from now)

Children Job Summary
Pending  Running  Dead
0        0        0

No instances of periodic job found
$ curl txsrfpexglb010:4646/v1/job/example/dispatch-1490647455-84f64fdb/allocations
[{"ID":"34f3f86f-4945-e210-9410-0acea3a92754","EvalID":"92d0607f-ccc5-11a6-42ec-e5db1bac21b4","Name":"example/dispatch-1490647455-84f64fdb.quick_fail[0]","NodeID":"96133a71-dd01-b4b5-6d63-aac7721dbdd8","JobID":"example/dispatch-1490647455-84f64fdb","TaskGroup":"quick_fail","DesiredStatus":"run","DesiredDescription":"","ClientStatus":"pending","ClientDescription":"","TaskStates":{"run":{"State":"pending","Failed":false,"Events":[{"Type":"Restarting","Time":1490647455695144620,"FailsTask":false,"RestartReason":"Restart within policy","SetupError":"","DriverError":"","ExitCode":0,"Signal":0,"Message":"","KillTimeout":0,"KillError":"","KillReason":"","StartDelay":16807003639,"DownloadError":"","ValidationError":"","DiskLimit":0,"FailedSibling":"","VaultError":"","TaskSignalReason":"","TaskSignal":"","DriverMessage":""},{"Type":"Started","Time":1490647472546359827,"FailsTask":false,"RestartReason":"","SetupError":"","DriverError":"","ExitCode":0,"Signal":0,"Message":"","KillTimeout":0,"KillError":"","KillReason":"","StartDelay":0,"DownloadError":"","ValidationError":"","DiskLimit":0,"FailedSibling":"","VaultError":"","TaskSignalReason":"","TaskSignal":"","DriverMessage":""},{"Type":"Terminated","Time":1490647472551581430,"FailsTask":false,"RestartReason":"","SetupError":"","DriverError":"","ExitCode":1,"Signal":0,"Message":"","KillTimeout":0,"KillError":"","KillReason":"","StartDelay":0,"DownloadError":"","ValidationError":"","DiskLimit":0,"FailedSibling":"","VaultError":"","TaskSignalReason":"","TaskSignal":"","DriverMessage":""},{"Type":"Restarting","Time":1490647472557417380,"FailsTask":false,"RestartReason":"Restart within policy","SetupError":"","DriverError":"","ExitCode":0,"Signal":0,"Message":"","KillTimeout":0,"KillError":"","KillReason":"","StartDelay":18609631515,"DownloadError":"","ValidationError":"","DiskLimit":0,"FailedSibling":"","VaultError":"","TaskSignalReason":"","TaskSignal":"","DriverMessage":""},{"Type":"Started","Time":1490647491211856799,"FailsTask":false,"RestartReason":"","SetupError":"","DriverError":"","ExitCode":0,"Signal":0,"Message":"","KillTimeout":0,"KillError":"","KillReason":"","StartDelay":0,"DownloadError":"","ValidationError":"","DiskLimit":0,"FailedSibling":"","VaultError":"","TaskSignalReason":"","TaskSignal":"","DriverMessage":""},{"Type":"Terminated","Time":1490647491220924977,"FailsTask":false,"RestartReason":"","SetupError":"","DriverError":"","ExitCode":1,"Signal":0,"Message":"","KillTimeout":0,"KillError":"","KillReason":"","StartDelay":0,"DownloadError":"","ValidationError":"","DiskLimit":0,"FailedSibling":"","VaultError":"","TaskSignalReason":"","TaskSignal":"","DriverMessage":""},{"Type":"Restarting","Time":1490647491232587847,"FailsTask":false,"RestartReason":"Restart within policy","SetupError":"","DriverError":"","ExitCode":0,"Signal":0,"Message":"","KillTimeout":0,"KillError":"","KillReason":"","StartDelay":17888868061,"DownloadError":"","ValidationError":"","DiskLimit":0,"FailedSibling":"","VaultError":"","TaskSignalReason":"","TaskSignal":"","DriverMessage":""},{"Type":"Started","Time":1490647509162282241,"FailsTask":false,"RestartReason":"","SetupError":"","DriverError":"","ExitCode":0,"Signal":0,"Message":"","KillTimeout":0,"KillError":"","KillReason":"","StartDelay":0,"DownloadError":"","ValidationError":"","DiskLimit":0,"FailedSibling":"","VaultError":"","TaskSignalReason":"","TaskSignal":"","DriverMessage":""},{"Type":"Terminated","Time":1490647509166102926,"FailsTask":false,"RestartReason":"","SetupError":"","DriverError":"","ExitCode":1,"Signal":0,"Message":"","KillTimeout":0,"KillError":"","KillReason":"","StartDelay":0,"DownloadError":"","ValidationError":"","DiskLimit":0,"FailedSibling":"","VaultError":"","TaskSignalReason":"","TaskSignal":"","DriverMessage":""},{"Type":"Restarting","Time":1490647509171676939,"FailsTask":false,"RestartReason":"Restart within policy","SetupError":"","DriverError":"","ExitCode":0,"Signal":0,"Message":"","KillTimeout":0,"KillError":"","KillReason":"","StartDelay":18078739009,"DownloadError":"","ValidationError":"","DiskLimit":0,"FailedSibling":"","VaultError":"","TaskSignalReason":"","TaskSignal":"","DriverMessage":""}]}},"CreateIndex":42317,"ModifyIndex":42323,"CreateTime":1490647455593222081}]
$ nomad logs 34f3f86f-4945-e210-9410-0acea3a92754
Mon Mar 27 16:44:15 EDT 2017
I AM RUNNING
Mon Mar 27 16:44:32 EDT 2017
I AM RUNNING
Mon Mar 27 16:44:51 EDT 2017
I AM RUNNING
Mon Mar 27 16:45:09 EDT 2017
I AM RUNNING
Mon Mar 27 16:45:27 EDT 2017
I AM RUNNING
Mon Mar 27 16:45:43 EDT 2017
I AM RUNNING

So, I used the HTTP API to look up the allocations for the dispatch job, took the returned ID (which should be the first one), and can see that it is currently executing by inspecting the logs.

If I'm doing something blatantly wrong here with the task definition, please let me know.

@dadgar dadgar reopened this Mar 27, 2017
@dadgar
Copy link
Contributor

dadgar commented Mar 27, 2017

Yeah I see now. It is when dispatched an alloc is made! Will get a fix for 0.5.6

dadgar added a commit that referenced this issue Mar 27, 2017
This PR fixes an issue in which when a periodic and parameterized job
was dispatched, an allocation would be immediately created.

Fixes #2470
@nugend
Copy link
Author

nugend commented Mar 28, 2017

Awesome! Thanks!

@github-actions
Copy link

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 14, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants