Health check routine leaks using new nomad provider #15477
Labels

stage/accepted: Confirmed, and intend to work on. No timeline commitment though.
theme/service-discovery/nomad
type/bug

Comments
shoenig added the stage/accepted label (Confirmed, and intend to work on. No timeline commitment though.) and removed the stage/needs-investigation label on Jan 23, 2023.
Thanks for the report @thetooth, and apologies for the slow response. I was able to reproduce this with this simpler job and bash script. AFAICT the duplication in requests to the healthcheck happens on

```hcl
job "demo" {
  datacenters = ["dc1"]

  group "group1" {
    network {
      mode = "host"
      port "http" {
        static = 8888
      }
    }

    reschedule {
      unlimited      = true
      delay          = "15s"
      delay_function = "constant"
      attempts       = 0
    }

    restart {
      attempts = 2
      delay    = "1s"
      interval = "15s"
      mode     = "fail"
    }

    task "task1" {
      driver = "raw_exec"
      user   = "shoenig"

      config {
        command = "python3"
        args    = ["-m", "http.server", "8888", "--directory", "/tmp"]
      }

      service {
        provider = "nomad"
        port     = "http"

        check {
          path     = "/"
          type     = "http"
          interval = "3s"
          timeout  = "1s"
        }
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
```
shoenig added a commit that referenced this issue on Jan 23, 2023:

> This PR fixes a bug where alloc pre-kill hooks were not run in the edge case where there are no live tasks remaining, but it is also the final update to process for the (terminal) allocation. We need to run cleanup hooks here, otherwise they will not run until the allocation gets garbage collected (i.e. via Destroy()), possibly at a distant time in the future. Fixes #15477
shoenig added a commit that referenced this issue on Jan 27, 2023:

> * client: run alloc pre-kill hooks on last pass despite no live tasks. This PR fixes a bug where alloc pre-kill hooks were not run in the edge case where there are no live tasks remaining, but it is also the final update to process for the (terminal) allocation. We need to run cleanup hooks here, otherwise they will not run until the allocation gets garbage collected (i.e. via Destroy()), possibly at a distant time in the future. Fixes #15477
> * client: do not run ar cleanup hooks if client is shutting down
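The shape of the fix described in the commit message can be sketched in Go. The types and names below (`update`, `handleUpdate`, `runPreKillHooks`) are hypothetical simplifications for illustration, not Nomad's actual alloc-runner types:

```go
package main

import "fmt"

// update is a hypothetical, simplified view of an allocation update
// as processed by the client.
type update struct {
	liveTasks int  // number of tasks still running
	terminal  bool // true when this is the final update for the alloc
}

// hooksRun records which cleanup hooks fired, for illustration.
var hooksRun []string

func runPreKillHooks() {
	// In Nomad these hooks would deregister services and stop
	// check watchers; here we just record that cleanup happened.
	hooksRun = append(hooksRun, "prekill")
}

// handleUpdate sketches the fixed logic. Before the fix, hooks ran
// only when there were live tasks to kill, so a terminal update with
// zero live tasks skipped cleanup until garbage collection (Destroy).
// After the fix, the final update runs cleanup hooks regardless,
// except when the client itself is shutting down.
func handleUpdate(u update, clientShuttingDown bool) {
	if u.liveTasks > 0 {
		runPreKillHooks() // normal path: hooks run before killing tasks
		return
	}
	if u.terminal && !clientShuttingDown {
		runPreKillHooks() // edge case fixed here: cleanup runs now
	}
}

func main() {
	handleUpdate(update{liveTasks: 0, terminal: true}, false)
	fmt.Println(hooksRun) // cleanup ran despite no live tasks
}
```

This is why the leaked check loops accumulated: the service/check cleanup never ran for the terminal allocation, so each rescheduled allocation stacked another live check on top of the old ones.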
Nomad version
Nomad v1.4.3 (f464aca)
Issue
I have a service that logs HTTP requests, and I noticed that the endpoint used for health checking is being hit a few hundred times per second. There is a pretty aggressive restart policy on this job, and we had a netsplit issue last night which led to the service restarting around 600 times, so the logs are quite busy, to say the least.
Reproduction steps
Run the job below and either stop and resubmit the job or have the process crash. The number of requests hitting the service increases until Nomad is restarted.
Job file (if appropriate)