Health checks using scripts within task/container #311

Closed
F21 opened this issue Oct 20, 2015 · 13 comments

Comments

@F21

F21 commented Oct 20, 2015

From the discussion I have seen, Nomad plans to lean heavily on Consul for service discovery and health checks.

While Consul can perform a check by running a script, the script must exist within the context of Consul. For example, if I have an HDFS cluster running on Nomad, it would be a good idea to perform a check using the hadoop command. As it stands, a copy of the hadoop binary would need to be available to Consul, which complicates deployment (what if we update the HDFS containers to a new version and need to deploy new hadoop binaries to all Consul agents?).

It would be really awesome if, through some sort of bridge, Consul could run the check within the Docker container or task. For example, if we could ask Consul to run a command using the hadoop binary inside the container, it would make things much simpler.

I know there's also the possibility of creating a TTL health check by having a process inside the container run the checks and then send the results to Consul over HTTP, but this creates more moving parts and can become complex quickly.
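
For context, the TTL approach mentioned above could look roughly like the sketch below. It assumes a Consul agent reachable at 127.0.0.1:8500, a TTL check that has already been registered under the hypothetical ID `hdfs-namenode`, and a Consul version whose agent endpoints expect PUT. The helper names are illustrative and not part of Nomad or Consul.

```python
import subprocess
import urllib.request
from typing import List

# Assumption: a Consul agent is listening on its default local HTTP address.
CONSUL_ADDR = "http://127.0.0.1:8500"


def ttl_endpoint(exit_code: int) -> str:
    """Map an exit code to a TTL endpoint name, following Consul's
    script-check convention: 0 is passing, 1 is warning, anything else
    is critical."""
    if exit_code == 0:
        return "pass"
    if exit_code == 1:
        return "warn"
    return "fail"


def ttl_url(check_id: str, exit_code: int) -> str:
    """Build the agent URL that records the result for a TTL check."""
    return f"{CONSUL_ADDR}/v1/agent/check/{ttl_endpoint(exit_code)}/{check_id}"


def report(check_id: str, command: List[str]) -> None:
    """Run the check command locally (inside the container) and push the
    outcome to the Consul agent."""
    exit_code = subprocess.run(command).returncode
    request = urllib.request.Request(ttl_url(check_id, exit_code), method="PUT")
    urllib.request.urlopen(request, timeout=5)


# Hypothetical usage, run periodically by a process inside the container:
# report("hdfs-namenode", ["hadoop", "fs", "-ls", "/"])
```

Something has to run this loop, keep it alive, and reach the agent over the network, which is the extra moving part the paragraph above refers to.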

@diptanu
Contributor

diptanu commented Oct 22, 2015

@F21 We are currently adding support in Consul for running a script inside a Docker container as a health check. When Nomad registers a task with Consul, if the user has defined a health check in the job, we will pass that information to Consul as well, and Consul will then run the check scripts inside the Docker container.

If a user is running tasks with the exec or raw exec drivers, Consul could do health checks by running the scripts directly on the host.

If the user is using Qemu to run KVM or Xen based workloads, Consul would support only HTTP, TCP, or TTL health checks, since it won't be able to execute a script within a VM.
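
The plan above can be summarized as a small lookup, sketched here under stated assumptions: the driver keys (`docker`, `exec`, `raw_exec`, `qemu`) and the function name are illustrative, and it assumes HTTP, TCP, and TTL checks are available for every driver. It reflects the plan as described in this comment, not necessarily what eventually shipped.

```python
# Where a script check would run for each driver, per the comment above.
# None means script checks are not possible for that driver.
SCRIPT_LOCATION = {
    "docker": "inside the container",
    "exec": "on the host",
    "raw_exec": "on the host",
    "qemu": None,  # Consul cannot execute a script inside a VM
}

# Assumption: these check types need no access to the task's filesystem,
# so they are treated as available for all drivers.
NETWORK_CHECKS = frozenset({"http", "tcp", "ttl"})


def supported_checks(driver: str) -> set:
    """Return the set of check types planned for the given driver."""
    if driver not in SCRIPT_LOCATION:
        raise ValueError(f"unknown driver: {driver}")
    checks = set(NETWORK_CHECKS)
    if SCRIPT_LOCATION[driver] is not None:
        checks.add("script")
    return checks
```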

@F21
Author

F21 commented Oct 22, 2015

Awesome! In the case of Consul running a script directly on the host for exec or raw exec jobs, I assume it would do so using the Consul remote execution feature, right?

@diptanu
Contributor

diptanu commented Oct 22, 2015

@F21 I think we can avoid remote execution for health checks on tasks running via exec and raw exec on Nomad by running the Consul agent as a system task on the node in raw exec mode. Consul would then be able to run any script on the host or in any chroot.

@F21
Author

F21 commented Oct 22, 2015

That sounds like the way to do it! I hope Consul support is added to Nomad soon; I would love to give it a spin!

@adrianlop
Contributor

Hi @diptanu,
I see that the Docker check has been added in the Consul 0.6 release.
According to the docs (https://consul.io/docs/agent/checks.html):

{
  "check": {
    "id": "mem-util",
    "name": "Memory utilization",
    "docker_container_id": "f972c95ebf0e",
    "shell": "/bin/bash",
    "script": "/usr/local/bin/check_mem.py",
    "interval": "10s"
  }
}
How can I pass the container ID from a Nomad job file to the service check? Is there a Nomad variable that already exposes the container_id or container_name value?
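
To make the question concrete, here is a minimal sketch of filling in `docker_container_id` programmatically. It assumes the container ID has already been obtained some other way (for example from `docker ps -q`); nothing here is a Nomad feature, and the function name is hypothetical. The field names are the ones from the Consul check definition quoted above.

```python
import json


def docker_check(container_id: str, script: str, check_id: str = "mem-util",
                 name: str = "Memory utilization", shell: str = "/bin/bash",
                 interval: str = "10s") -> dict:
    """Build a Consul Docker check definition for a known container ID."""
    return {
        "check": {
            "id": check_id,
            "name": name,
            "docker_container_id": container_id,
            "shell": shell,
            "script": script,
            "interval": interval,
        }
    }


# Render the definition as JSON, ready to be written to a Consul config file.
definition = docker_check("f972c95ebf0e", "/usr/local/bin/check_mem.py")
rendered = json.dumps(definition, indent=2)
```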

thanks!

@diptanu
Contributor

diptanu commented Dec 9, 2015

@poll0rz The only check types supported in 0.2.1 were http and tcp. I will try to integrate the script type check for 0.3.

@adrianlop
Contributor

@diptanu great, thanks!

@pires

pires commented Jan 8, 2016

> I will try to integrate the script type check for 0.3

Wow, bring it on!

@adrianlop
Contributor

Hi @diptanu, this didn't make it into 0.3 in the end, right?

I have microservices that don't listen on a port; they just consume Kafka messages and write to Cassandra. For now, I register them with a 'dummy port', so they're always marked as critical in Consul.
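
For a port-less worker like this, a script check is a natural fit. The sketch below is one hypothetical way to do it: it assumes the worker touches a heartbeat file after each processed batch, and the path and thresholds are made up for illustration. The exit codes follow Consul's script-check convention (0 passing, 1 warning, anything else critical).

```python
import os
import time


def exit_code_for_age(age_seconds: float, warn_after: float = 30.0,
                      critical_after: float = 120.0) -> int:
    """Translate heartbeat age into a Consul script-check exit code."""
    if age_seconds < warn_after:
        return 0  # passing
    if age_seconds < critical_after:
        return 1  # warning
    return 2  # critical


def check(heartbeat_path: str, now: float = None) -> int:
    """Return the exit code for the worker whose heartbeat file is given.
    A missing or unreadable file is treated as critical."""
    now = time.time() if now is None else now
    try:
        modified = os.path.getmtime(heartbeat_path)
    except OSError:
        return 2
    return exit_code_for_age(now - modified)


# Hypothetical usage as the check script's entry point:
# import sys; sys.exit(check("/tmp/worker.heartbeat"))
```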

thanks!! great work

@diptanu
Contributor

diptanu commented Feb 26, 2016

@adrianlop This is slotted for 0.3.1! I'm finally getting around to working on this. We need to do this across all our drivers, so it is taking some time.

@diptanu diptanu self-assigned this Mar 14, 2016
@diptanu
Contributor

diptanu commented Mar 26, 2016

Fixed via #986

@diptanu diptanu closed this as completed Mar 26, 2016
@diptanu
Contributor

diptanu commented Mar 26, 2016

@adrianlop @F21 @pires This is done now and will go out with Nomad 0.3.2.

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 24, 2022