Restart-Stanza: Delay not working #15198

Closed
fraenku opened this issue Nov 10, 2022 · 2 comments · Fixed by #15215

fraenku commented Nov 10, 2022

Nomad version

1.4.1

Operating system and Environment details

System: Linux 4.18.0-372.26.1.el8_6.x86_64 x86_64 (Rocky Linux release 8.6 (Green Obsidian))

Issue

The delay parameter in the restart stanza is not working as expected.

Reproduction steps

Run the attached job specification.

Expected Result

The job should restart after approximately 15 seconds (±25%).
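
For reference, here is a minimal sketch of the restart block this expectation is based on. The job file below does not set one, so the values shown are assumed to be Nomad's documented defaults for a job of type "service", not something configured explicitly:

```hcl
# Assumed defaults for type = "service"; this block is not part of the posted job file.
# It could be placed at the group or task level.
restart {
  attempts = 2      # restarts allowed within `interval`
  interval = "30m"  # window over which `attempts` is counted
  delay    = "15s"  # wait before restarting; Nomad adds up to 25% random jitter
  mode     = "fail" # behavior once `attempts` is exhausted
}
```

Under these assumptions, an announced delay between 15s and roughly 18.75s would be in range.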

Actual Result

The job restarts immediately (within about one second), even though the event log announces a delay of more than 15 seconds:

Nov 09, '22 08:42:**53** +0100 | Started | Task started by client
Nov 09, '22 08:42:**52** +0100 | Restarting | Task restarting in **15.587610296s**
Nov 09, '22 08:42:52 +0100 | Terminated | Exit Code: 2, Exit Message: Docker container exited with non-zero exit code: 2
Nov 09, '22 08:42:51 +0100 | Restart Signaled | healthcheck: check fail_service health using http endpoint ‘/health’ unhealthy
Nov 09, '22 08:41:**57** +0100 | Started | Task started by client
Nov 09, '22 08:41:**56** +0100 | Restarting | Task restarting in **16.822710794s**
Nov 09, '22 08:41:56 +0100 | Terminated | Exit Code: 2, Exit Message: Docker container exited with non-zero exit code: 2

Job file (if appropriate)

(sorry for the formatting issue, I do not know why, but the editor did not like my example somehow...)

job "fail-service" {
  datacenters = ["isys_poc"]
 
  type = "service"
 
  group "fail-service" {
    count = 1
 
    network {
      port "http" {
        to = 8080
      }
    }

    task "fail-service" {
      driver = "docker"
      config {
        image = "thobe/fail_service:v0.0.12"
        ports = ["http"]
      }
 
      service {
        name = "${TASK}"
        port = "http"
        check {
          name     = "fail_service health using http endpoint '/health'"
          port     = "http"
          type     = "http"
          path     = "/health"
          method   = "GET"
          interval = "10s"
          timeout  = "2s"
        }
        tags = [
          "traefik.enable=true",
          "traefik.http.routers.fail-service.rule=Host(`fail-service.poc-nomad.intersys.internal`)",
        ]
      }
 
      env {
        HEALTHY_FOR    = -1 # Stays healthy forever
      }
 
      resources {
        cpu    = 100 # MHz
        memory = 256 # MB
      }
    }
  }
}
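
The "Restart Signaled" events in the output above are attributed to a failing health check, which in Nomad is driven by a check_restart block nested inside check. The job file as posted does not include one, so the following is only a hypothetical sketch of what that block might look like; the limit and grace values are assumptions, not taken from the original job:

```hcl
# Hypothetical: would be nested inside the existing `check { ... }` block.
check_restart {
  limit           = 3     # restart after this many consecutive failed checks (assumed value)
  grace           = "30s" # time to wait after a (re)start before counting failures (assumed value)
  ignore_warnings = false # treat "warning" check status as unhealthy
}
```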

shoenig commented Nov 11, 2022

Thanks for reporting, @fraenku! I was able to bisect the broken behavior all the way back to the 1.3.0, 1.2.7, and 1.1.13 releases, so at least now we have a place to start looking.
