Nomad does not register service checks of type script #2180

Closed
gokhansengun opened this issue Jan 11, 2017 · 14 comments · Fixed by #2591
Comments

@gokhansengun

gokhansengun commented Jan 11, 2017

Environment

Nomad v0.5.2
Consul v0.7.2
Linux 4.4.0-31-generic

Issue

Nomad does not register service checks of type script when the job is scaled out. Service checks of type http, by contrast, are registered properly (details below). The scripts have been added to the container using the volumes configuration.

I observed the same behaviour with both the raw_exec and Docker drivers. It reproduces 100% of the time with the same pattern, so it should be easy to reproduce.

Reproduction steps

  1. Start with the job definition file below (a) and replace LOCAL_SCRIPT_PATH with the local path where you put the script (b).

    (a) https://gist.github.com/gokhansengun/c3bc63a38649784b0b6ff33b43190b70#file-nginx-nomad
    (b) https://gist.github.com/gokhansengun/c3bc63a38649784b0b6ff33b43190b70#file-cpu-utilization-sh

  2. Schedule the job using nomad run <job_file_path>. Everything is fine until this point.

    [Screenshot: nomad_service_check_initial_status]
  3. Increase the count to 2 and reschedule with nomad run <job_file_path>.

    You should see that the http check named Serving Pages Status is registered for the new instance, but the script check named nginx - cpu util is not.

    [Screenshot: nomad_service_check_scale_by_1_status]
  4. Increase the count to 3 and again reschedule.

    You should see that two instances have both service checks registered, but one instance does not.

    [Screenshot: nomad_service_check_scale_by_2_status]
@gokhansengun gokhansengun changed the title Nomad does not register service checks of type script with Docker driver Nomad does not register service checks of type script Jan 11, 2017
@dadgar
Contributor

dadgar commented Feb 1, 2017

Hey, I could not reproduce this. I followed your directions and went from 1 all the way up to 7 instances (I ran out of resources at that point):

https://gist.github.com/dadgar/d84680bc870b8daa90b80b1a8ed2d4ce

Nomad 0.5.4 (few commits ahead)
Consul v0.7.3
Linux version 4.4.0-51-generic

It may just have been the UI acting weird. Instead, use this command to check what is registered:
curl http://127.0.0.1:8500/v1/agent/checks?pretty=true
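The same check list can be inspected programmatically. Below is a minimal Python sketch, assuming the Consul agent listens on the default address used above and that both checks are attached to the same service, so every instance's ServiceID should carry both check names. fetch_agent_checks and missing_checks are hypothetical helpers, not part of Consul or Nomad.

```python
import json
from collections import defaultdict
from urllib.request import urlopen


def fetch_agent_checks(addr="http://127.0.0.1:8500"):
    """Return the raw JSON body of Consul's GET /v1/agent/checks."""
    with urlopen(addr + "/v1/agent/checks") as resp:
        return resp.read().decode("utf-8")


def missing_checks(checks_json, expected_names):
    """Map each ServiceID to the expected check names not registered for it.

    checks_json is an object keyed by CheckID whose values carry Name and
    ServiceID. Assumption: every expected check belongs to the same service.
    """
    checks = json.loads(checks_json)
    names_by_service = defaultdict(set)
    for check in checks.values():
        service_id = check.get("ServiceID", "")
        if service_id:  # node-level checks (e.g. serfHealth) have no ServiceID
            names_by_service[service_id].add(check["Name"])
    expected = set(expected_names)
    return {
        sid: sorted(expected - names)
        for sid, names in names_by_service.items()
        if expected - names  # only report instances that lack something
    }
```

Usage: `missing_checks(fetch_agent_checks(), ["Serving Pages Status", "nginx - cpu util"])` returns only the instances lacking a check. An instance with no checks at all has no entries in the response, so it will not appear in the result.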

We can re-open if you are still hitting this.

@dadgar dadgar closed this as completed Feb 1, 2017
@gokhansengun
Author

Hey @dadgar, with the same versions as you (except that I use Nomad 0.5.4), I can still replicate the problem 100% of the time. It is probably a difference in our settings. I have looked at the agent logs with no luck.

Could you suggest how I can narrow this down?

@dadgar
Contributor

dadgar commented Feb 27, 2017

@gokhansengun Could you get reproduction steps working on our Vagrant box? Then we can debug it.

@gokhansengun
Author

@dadgar thanks, I will give it a try, but I am still on 0.5.2 due to problems with my setup on 0.5.4. I will get back to you as soon as I move to 0.5.5.

@schmichael
Member

@gokhansengun I've reproduced your script check issue, and it will be fixed in 0.6 by #2467.

I've made #2478 to describe the changes and track related issues in one place.

@gokhansengun
Author

@schmichael thanks, that is good news, since I had to set debugging this aside for a while. Will 0.6 be the next version after 0.5.5? From the changelog and some comments, I infer that 0.5.6 is the next release.

@schmichael
Member

@gokhansengun 0.5.6 is next (either today or Monday). 0.6 after that (unless we need another urgent patch release for some reason).

I'm going to be working hard to get that PR and some other Consul related work merged next week. I'd be happy to post some pre-release binaries for you to test with if you have time!

@gokhansengun
Author

@schmichael thanks a lot for the info. I have already set up my development environment for Nomad (hopefully I will send a few patches in the coming weeks), so I can build master myself. I am watching master and will build a new binary as soon as you merge your branch into it.

I have a lot of jobs (a lot of them :) ) run by Nomad that are tightly coupled to Consul, so I need your work. I will certainly try it on the first day and give you feedback. Thanks a ton for refactoring this :)

@gokhansengun
Author

@schmichael, FYI, I have built master locally and this issue still occurs for me.

$ nomad --version
Nomad v0.6.0-dev (53eb407+CHANGES)

@schmichael
Member

@gokhansengun Confirmed! Updated tasks weren't getting their env var interpolated properly which broke the way Nomad tracks them internally.

Should have a fix up tomorrow.

schmichael added a commit that referenced this issue Apr 26, 2017
Previously was interpolating the original task's services again.

Fixes #2180

Also fixes a slight memory leak in the new consul agent. Script check
handles weren't being deleted after cancellation.
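To make the two problems in the commit message concrete, here is a toy Python model. It is not Nomad's code (Nomad is written in Go); the names interpolate_services, services_for_update, and ScriptCheckRegistry are hypothetical, and a task is modelled as a plain dict with a "services" list of dicts whose string values may contain ${VAR} placeholders.

```python
from string import Template


def interpolate_services(services, env):
    """Return copies of the service dicts with ${VAR} placeholders filled from env."""
    return [
        {key: Template(val).safe_substitute(env) if isinstance(val, str) else val
         for key, val in svc.items()}
        for svc in services
    ]


def services_for_update(updated_task, env):
    # Interpolate the *updated* task's services. Re-interpolating the original
    # task's services (the behaviour the commit message describes) would hand
    # stale definitions to the registration bookkeeping.
    return interpolate_services(updated_task["services"], env)


class ScriptCheckRegistry:
    """Hypothetical bookkeeping for running script checks, keyed by check ID."""

    def __init__(self):
        self.handles = {}  # check ID -> cancel callable

    def start(self, check_id, cancel):
        self.handles[check_id] = cancel

    def cancel(self, check_id):
        # pop() both fetches and deletes the handle, so cancelled checks do not
        # pile up in the dict (the kind of leak the commit message mentions).
        cancel = self.handles.pop(check_id, None)
        if cancel is not None:
            cancel()
```

Keying handles by check ID and deleting on cancellation keeps the registry's size equal to the number of checks that are actually running.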
@schmichael
Member

@gokhansengun Here's a build from #2591 if you're able to test.

linux_amd64.zip

@gokhansengun
Author

@schmichael thanks a lot. This was blocking me from further testing. I will give it a go and let you know.

@gokhansengun
Author

@schmichael seems to be fixed for me. Great job, thanks.

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 13, 2022
3 participants