Always deregister from Consul #136
I'm just running the Docker container (in the foreground) via docker-compose, and when I want to make changes I'm using Ctrl-C to shut it down and just running
Looks like a shutdown race. I'll try to make it more robust and provide a patch. Can you work with that and build a fabio binary yourself, or do you need a full Docker container to test this?
What's the race? Is it that docker-compose isn't giving enough time for the shutdown? If so, I can tell docker-compose to wait longer before sending the kill signal. If it's something internal then yes, I can probably build my own Docker image from source or, failing that, wait for a new image release. As long as I know the problem is being sorted, I can cope with cleaning out the Consul data manually during testing :)
Actually, I don't think there is a race. You should have the following log output:
Ah, right, I think I see the problem. My mistake. It must be that it only leaves the data in Consul if it dies for some reason while processing the config file or when it fails to talk to Vault. I think I'd miscounted the number of left-over health checks versus the number of times I'd started and stopped the container.
It also doesn't help that the 'Down' line and everything after it aren't visible via docker-compose during the shutdown, although if you run
Thanks for looking into things, and sorry, it was my error!
I think by moving the deregistration into a
Ah, that would be great :) I guess leaving the health check in place to turn critical when/if Fabio fails is a good thing, as long as Fabio has started up correctly and has been running for a few seconds. As I run consul-alert, I find out pretty quickly if health checks are left dangling!
Yeah, that's the edge case. If fabio FATALs after
fabio was not deregistering from consul if there was a critical failure (FATAL) after the initial consul registration. This patch removes the calls to log.Fatal() and replaces them with exit.Fatal() which waits until all registered exit listeners have been called.
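Why `log.Fatal()` skipped the deregistration: in Go, `log.Fatal` is `log.Print` followed by `os.Exit(1)`, and `os.Exit` terminates immediately without running deferred functions or any other cleanup. The sketch below shows the general exit-listener idea the patch describes. It is a minimal, hypothetical illustration, not fabio's actual `exit` package: the names `Listen`, `Fatal`, `runListeners` and the swappable `osExit` variable are assumptions made for this example.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"sync"
)

// Hypothetical exit-listener registry, loosely modelled on the
// exit.Fatal() approach described above (not fabio's real code).
var (
	mu        sync.Mutex
	listeners []func()
)

// Listen registers fn to be called before the process exits,
// e.g. a function that deregisters the service from Consul.
func Listen(fn func()) {
	mu.Lock()
	defer mu.Unlock()
	listeners = append(listeners, fn)
}

// runListeners calls every registered listener exactly once,
// in reverse order of registration (like deferred calls).
func runListeners() {
	mu.Lock()
	fns := listeners
	listeners = nil
	mu.Unlock()
	for i := len(fns) - 1; i >= 0; i-- {
		fns[i]()
	}
}

// osExit is a variable only so that a test can replace it.
var osExit = os.Exit

// Fatal logs v, runs all listeners and only then exits with status 1.
// Calling log.Fatal instead would exit before any cleanup ran.
func Fatal(v ...interface{}) {
	log.Print(v...)
	runListeners()
	osExit(1)
}

func main() {
	Listen(func() { fmt.Println("deregistering service from Consul") })
	// Simulate a startup error after registration, e.g. an unreachable Vault.
	Fatal("[FATAL] cannot read certificates from Vault")
}
```

The key design point is that every fatal path goes through one function that runs the cleanup first, so a config or Vault error after registration no longer leaves the service behind in Consul.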
I've pushed a patch which seems to address the issue.
I've tried building, but I'm not a Go developer and the build is failing with a weird error I don't understand:
I suspect this is just because something is set up wrong for building, but it might be better for someone else to test ;) For reference, if you want to reproduce the exit I was getting that was leaving the service registration in Consul, I think you can simply point a Vault cs config at an invalid URL,
e.g. set an invalid VAULT_ADDR and VAULT_TOKEN and set up a cs in the config.
Try this:
If you are on OSX and need to build a
Ah, great, that did it. I managed to build and package it into a Docker image on my home machine and push it to the work server for testing, and it now appears to deregister correctly after it encounters an error. I tested it with the setup I was using with the invalid Vault token:
OK, then I'll let that rest for a couple of days. It helps me to reason about the approach a bit longer. If nothing comes up, I'll merge it. Looking at #134 now.
fabio was not deregistering from consul if there was a critical failure (FATAL) after the initial consul registration. This patch replaces the calls to log.Fatal()/log.Fatalf() with exit.Fatal()/exit.Fatalf(), which wait until all registered exit listeners have been called.
I've had another look at it and think the approach is sound. I've added a test for the exit behavior and made sure that
A good way to test this is
One question: does the health check always get removed on any termination, or does your patch only mean that problems during startup are caught? I'm pretty new to health checks, so maybe my thinking is a bit wrong, but to my mind, if a process has been running for longer than a 'grace' period of maybe a few minutes and then terminates, there are situations where leaving the health check in place (so that it then fails) is a good thing. If you are not using any form of orchestration system and Fabio fails unexpectedly after it's been running for a while, the health check is a way to be informed of this, so it shouldn't be cleared up. If you are using an orchestration system, then I assume it will notice when the container exits and re-instantiate it, so Fabio will just update its details and health check on restart anyway. Or, as I said, maybe I'm thinking about this wrong :)
The health check disappears with the service, since you can't register a health check without a service.
OK, so if Fabio crashes with an exception, will the service and health check be deregistered in all cases? Maybe it would be good to document which situations will cause the health check to fail and which will cause fabio to exit and deregister the health check, so that the scope of the health check as a warning of incorrect operation is understood :)
No, in that case fabio will not deregister itself, but Consul will mark fabio as down because the health check fails. Just restart fabio or clear the registration manually. If you use Consul for things other than fabio (e.g. service discovery between your services), then this is not a problem. I also don't recall Consul ever crashing on us. And after running fabio in production for a year on several fairly heavily loaded public websites, with tens of fabio instances and hundreds of microservices, we have literally never had that happen with any of the releases (knock on wood). This doesn't mean it can't happen, but it is not something you have to deal with on a regular basis, and it is fairly easy to fix if it does. I've come to call fabio the piece of infrastructure we forget exists.
Great, that's what I hoped would happen! I'd like to be alerted when/if Fabio ever dies unexpectedly, and I just wanted to make sure the changes you made to clean up on exit after a config error etc. didn't change that behaviour :) I'm glad to hear Fabio has been so stable. I've had the same experience as you with Consul: I've never had any unexpected behaviour from it, and it's been as solid as a rock.
I've hunted through the wiki documentation but can't find anything about how to shut down fabio gracefully so that it deregisters the service and its associated health check from Consul. I know other apps let you send a particular signal (e.g. SIGINT) for a graceful shutdown. Does Fabio support such behaviour and, if not, what's the recommended approach for a graceful shutdown that also cleans up the Consul records?