
[Feature Request] Add option to nullify docker container healthchecks #5310

Closed
hvindin opened this issue Feb 9, 2019 · 3 comments · Fixed by #14089

Comments

hvindin commented Feb 9, 2019

Description

When building Docker containers, the option exists to declare a HEALTHCHECK during the build process. Nomad doesn't use this information at all; it declares its own checks to monitor container health.

Essentially, this means that unless you are running more than one scheduler to manage your containers, there is no reason to have these health checks running.

Thus, it would be nice to be able to disable the native healthchecks at runtime when the containers being scheduled have a useless but resource-consuming HEALTHCHECK defined.

Use case

A long time ago, possibly before we even started using Nomad to manage our clusters, I made the extremely naive decision to throw in a

HEALTHCHECK --interval=2s --timeout=5s --retries=5 CMD curl -Ssi http://127.0.0.1:8080/healthcheck | grep -q 200 

In some of our Dev and Test environments we have up to 70 jobs running per node, some of which contain code that takes a long time to start up. Looking at the processes running on a node when we have just drained one of these giant nodes and have 60+ jobs starting at once reveals literally thousands of curl commands building up before things stabilise. Furthermore, after some time, when a couple of jobs on a node have become unhealthy, the Docker daemon seems to lock up and Nomad is unable to remove the now-unresponsive container.

We didn't actually notice that this was an issue previously because our nodes were a more sensible size, so the cumulative impact went unnoticed.

Given this issue, our current way forward is likely going to be to just throw another layer onto the currently running versions of our containers, with

HEALTHCHECK NONE

as the only change. However, one of the fun things about working in a large organisation is that I'm sure some of the technical owners are going to want to manually regression test the change, because it's a change to their container that they don't understand.
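
For reference, a minimal sketch of that wrapper layer (the image name is illustrative); per Docker's documentation, HEALTHCHECK NONE in a child image disables any healthcheck inherited from the base image:

FROM registry.example.internal/myapp:current
HEALTHCHECK NONE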

Examples

The Docker CLI provides a --no-healthcheck option at run time, and the API allows NONE to be passed as the healthcheck test to disable any predefined health checks. From memory, Kubernetes has disabled Docker health checks unconditionally since roughly mid-2017.
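
At the CLI level this is a single flag (the image name is again illustrative):

docker run --no-healthcheck registry.example.internal/myapp:current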

So this seems like it should be incredibly simple to do at the point where the container configuration is being put together, just before the container starts.

My assumption would be that it is better to default to leaving Docker health checks as they are, on the off chance that the results are being used by someone for something, but to provide an option in the task config to disable the native health checks if desired.
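
As a rough sketch of what that opt-out could look like in the docker driver's task config (the block and attribute names here are purely illustrative, not an existing Nomad option):

task "app" {
  driver = "docker"

  config {
    image = "registry.example.internal/myapp:current"

    # Hypothetical opt-out: ask the driver to create the container with the
    # image's HEALTHCHECK disabled (equivalent to docker run --no-healthcheck).
    healthchecks {
      disable = true
    }
  }
}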

rgl commented Dec 15, 2020

Also please consider the other way around: using the status of the HEALTHCHECK CMD defined in the container as a valid check and propagating that to Nomad/Consul. That is, when the container has a HEALTHCHECK CMD, use it (maybe in addition to the checks that are defined in the Nomad job).

@tgross tgross added this to Needs Roadmapping in Nomad - Community Issues Triage Feb 12, 2021
@tgross tgross removed this from Needs Roadmapping in Nomad - Community Issues Triage Mar 4, 2021
@Amier3 Amier3 added the help-wanted We encourage community PRs for these issues! label Apr 1, 2022
@SrMouraSilva

I agree with @rgl.
I think that, now that Nomad (1.3.0) has native service discovery, it could also report the container status to the Services API. That way, Traefik or another reverse proxy could take the Docker container's health state into account when deciding whether to expose it, or perhaps prevent deploying an unhealthy canary.

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 11, 2022