Always deregister from Consul #136

far-blue · 2016-07-26T13:39:44Z

I've hunted through the wiki documentation but can't find anywhere referring to how to shutdown fabio in a graceful manner that deregisters the service and associated health check from Consul. I know with other apps you can send a different sig (e.g. sigint) for a graceful shutdown. Does Fabio support such behaviour and, if not, what's the recommended approach for graceful shutdown that also cleans up Consul records?

magiconair · 2016-07-26T13:53:42Z

SIGQUIT or SIGTERM should do this. Let me check whether this is a regression. How are you shutting it down?

far-blue · 2016-07-26T13:56:03Z

I'm just running the docker container (in the foreground) via docker-compose and when I want to make changes I'm using ctl-c to shut it down and just running docker-compose up again.

magiconair · 2016-07-26T14:44:32Z

Looks like a shutdown race. I'll try to make it more robust and provide a patch. Can you work with that and build a fabio binary yourself or do you need a full Docker container to test this?

far-blue · 2016-07-26T14:50:23Z

What's the race? Is it that docker-compose isn't giving enough time for the shutdown? If that's the case I can tell docker-compose to wait longer before sending the kill signal.

If it's something internal then yes, I can prob. build my own docker image from the source or, if not, I can wait for a new image release. As long as I know the problem is being sorted I can cope with cleaning out the Consul data manually during testing :)

magiconair · 2016-07-26T14:56:55Z

Actually, I don't think there is a race. startListeners blocks until it gets a shutdown signal which is triggered by signal.Notify(sigchan, os.Interrupt, os.Kill, syscall.SIGTERM). That closes the quit channel which returns startListeners and then triggers registry.Default.Deregister() as the last call. And that one waits for completion, AFAICT.

You should have the following log output:

2016/07/26 16:50:53 [INFO] Updated config to
2016/07/26 16:50:53 [INFO] consul: Registered fabio with id "fabio-LM-AMS-00963965-9998"
2016/07/26 16:50:53 [INFO] consul: Registered fabio with address "10.249.126.25"
2016/07/26 16:50:53 [INFO] consul: Registered fabio with tags ""
2016/07/26 16:50:53 [INFO] consul: Registered fabio with health check to "http://[10.249.126.25]:9998/health"
2016/07/26 16:50:53 [INFO] consul: Health changed to #3360
2016/07/26 16:50:54 [INFO] consul: Health changed to #3361
2016/07/26 16:51:00 [INFO] Graceful shutdown over 0
2016/07/26 16:51:00 [INFO] Down
2016/07/26 16:51:00 [INFO] consul: Deregistering fabio
2016/07/26 16:51:00 [INFO] consul: Health changed to #3363

far-blue · 2016-07-26T15:10:37Z

Ah, right, I think I see the problem. My mistake. It must be that it only leaves the data in Consul if it dies for some reason while processing the config file or it fails to talk to Vault. I think I'd miss-counted the number of left-over healthchecks vs the number of times I'd started and stopped the container.

far-blue · 2016-07-26T15:23:55Z

It also doesn't help that the 'Down' line and all after it are not visible via docker-compose during the shutdown - although if you run docker-compose logs after the container has terminated you can then see the log output :)

Thanks for looking into things and sorry it was my error!

magiconair · 2016-07-26T15:25:37Z

I think by moving the deregistration into a defer block should take care of a stale registration if fabio fatals during startup but after registration. I'll have a look.

far-blue · 2016-07-26T15:31:56Z

ah, that would be great :) I guess leaving the health check in place to turn critical when/if Fabio fails is a good thing as long as Fabio has started up correctly and has been running for a few seconds.

As I run consul-alert I find out pretty quickly if health checks are left dangling!

magiconair · 2016-07-26T15:32:27Z

Yeah, that's the edge case. If fabio FATALs after initBackend it does not clean up after itself. defer won't help since FATAL calls os.Exit(1) I'll see how to improve that.

fabio was not deregistering from consul if there was a critical failure (FATAL) after the initial consul registration. This patch removes the calls to log.Fatal() and replaces them with exit.Fatal() which waits until all registered exit listeners have been called.

magiconair · 2016-07-26T15:57:56Z

I've pushed a patch which seems to address the issue. log.Fatal has been replaced with exit.Fatal which waits until all registered exit handlers have been called and then exits. The line numbers of the last log lines should be off and I need to think about the approach in general a bit more but it would be great if you could test it.

fabio was not deregistering from consul if there was a critical failure (FATAL) after the initial consul registration. This patch removes the calls to log.Fatal() and replaces them with exit.Fatal() which waits until all registered exit listeners have been called.

far-blue · 2016-07-26T16:31:01Z

I've tried building but I'm not a go developer and the build is failing with a weird error I don't understand:

src/github.com/eBay/fabio/admin/server.go:17: cannot use cfg (type *"github.com/eBay/fabio/vendor/github.com/eBay/fabio/config".Config) as type *"github.com/eBay/fabio/vendor/github.com/eBay/fabio/vendor/github.com/eBay/fabio/config".Config in assignment

I suspect this is just because something is set up wrong for building but it might be better for someone else to test ;)

For reference, if you want to repeat the exit I was getting that was leaving the service registration in Consul I think you can replicate it by simply pointing a vault cs config at an invalid url.

far-blue · 2016-07-26T16:33:34Z

e.g. set an invalid VAULT_ADDR and VAULT_TOKEN and setup a cs in the config

proxy.cs = cs=foo;type=vault;cert=wibble
proxy.addr = :443;cs=foo

magiconair · 2016-07-26T17:00:23Z

Try this:

mkdir ~/gopath
export GOPATH=~/gopath
go get -u github.com/eBay/fabio
cd ~/gopath/src/github.com/eBay/fabio
git checkout issue-136-always-deregister-from-consul
go build (or make)
./fabio

If you are on OSX and need to build a linux/amd64 binary run make linux

far-blue · 2016-07-26T18:45:39Z

ah, great, that did it. Managed to build and package into a docker image on my home machine and push up to the work server for testing and it now appears to correctly deregister after it encounters an error. I tested it with the setup I was using with the invalid vault token:

fabio_1 | 2016/07/26 18:40:21 [INFO] Version v1.2-4-g75c5c70 starting
fabio_1 | 2016/07/26 18:40:21 [INFO] Go runtime is go1.6.3
fabio_1 | 2016/07/26 18:40:21 [INFO] Setting GOGC=800
fabio_1 | 2016/07/26 18:40:21 [INFO] Setting GOMAXPROCS=8
fabio_1 | 2016/07/26 18:40:21 [INFO] Metrics disabled
fabio_1 | 2016/07/26 18:40:21 [INFO] consul: Connecting to "bar:8500" in datacenter "foo"
fabio_1 | 2016/07/26 18:40:21 [INFO] Admin server listening on ":9998"
fabio_1 | 2016/07/26 18:40:21 [INFO] Using routing strategy "rnd"
fabio_1 | 2016/07/26 18:40:21 [INFO] Using routing matching "prefix"
fabio_1 | 2016/07/26 18:40:21 [INFO] consul: Using dynamic routes
fabio_1 | 2016/07/26 18:40:21 [INFO] consul: Using tag prefix "urlprefix-"
fabio_1 | 2016/07/26 18:40:21 [INFO] consul: Watching KV path "/fabio/config"
fabio_1 | 2016/07/26 18:40:21 [INFO] consul: Manual config changed to #8719119
fabio_1 | 2016/07/26 18:40:21 [INFO] Updated config to
fabio_1 | 2016/07/26 18:40:21 [INFO] consul: Health changed to #8719118
fabio_1 | 2016/07/26 18:40:21 [FATAL] vault: client: Error making API request.
fabio_1 |
fabio_1 | URL: POST http://ping:8200/v1/auth/token/create
fabio_1 | Code: 400. Errors:
fabio_1 |
fabio_1 | * root or sudo privileges required to create orphan token
fabio_1 | 2016/07/26 18:40:21 [INFO] Graceful shutdown over 0
fabio_1 | 2016/07/26 18:40:21 [INFO] Down
fabio_1 | 2016/07/26 18:40:21 [INFO] consul: Registered fabio with id "fabio-c49206c22615-9998"
fabio_1 | 2016/07/26 18:40:21 [INFO] consul: Registered fabio with address "192.168.5.3"
fabio_1 | 2016/07/26 18:40:21 [INFO] consul: Registered fabio with tags ""
fabio_1 | 2016/07/26 18:40:21 [INFO] consul: Registered fabio with health check to "http://[192.168.5.3]:9998/health"
fabio_1 | 2016/07/26 18:40:21 [INFO] consul: Deregistering fabio
fabio_1 | 2016/07/26 18:40:21 [INFO] Updated config to
fabio_1 | route add Test-Nginx wibble/ http://192.168.5.2:80/ tags "urlprefix-wibble/"
fabio_1 | 2016/07/26 18:40:21 [INFO] consul: Health changed to #8719122

magiconair · 2016-07-26T18:48:29Z

OK, then I'll let that rest for a couple of days. Helps me to reason about the approach a bit longer. If nothing comes up then I'll merge it. Looking at #134 now.

fabio was not deregistering from consul if there was a critical failure (FATAL) after the initial consul registration. This patch replaces the calls to log.Fatal()/log.Fatalf() and with exit.Fatal()/exit.Fatalf() which wait until all registered exit listeners have been called.

magiconair · 2016-07-30T10:02:08Z

I've had a look at it again and think the approach is sound. I've added a test for the exit behavior and made sure that exit.FatalF can be called concurrently. Just merged this to master.

magiconair · 2016-07-30T10:02:56Z

Good way to test this is ./fabio -proxy.addr 1.2.3.4 assuming that you don't have 1.2.3.4 on that machine.

far-blue · 2016-07-31T12:07:49Z

One question - does the health check always get removed on any termination or does your patch only mean any problems during startup are caught?

I'm pretty new to health checks so maybe my thinking is a bit wrong but to my mind if a process has been running for more than a 'grace' period of maybe a few mins and then terminates there are situations where leaving the health check in place (so it then fails) is a good thing. If you are not using any form of orchestration system and Fabio fails unexpectedly after it's been running for a while then the health check is a way to be informed of this so shouldn't be cleared up. If you are using an orchestration system then I assume it will notice if the container exits and re-instantiate it so Fabio will just update its details and health check anyhow on restart.

Or, as I said, maybe I'm thinking this wrong :)

magiconair · 2016-07-31T12:39:37Z

Health check disappears with the service since you can't register a health check without a service.

far-blue · 2016-07-31T15:04:10Z

ok, so if Fabio crashes out with an exception then the service and health check will be deregistered in all cases? Maybe it would be good to document what situations will cause the health check to fail and what situations will cause fabio to exit and deregister the health check so the scope of the health check as a warning of incorrect operation is understood :)

magiconair · 2016-07-31T15:38:04Z

No, in that case fabio will not de-register itself but consul will mark fabio as down as the health check fails. Just restart fabio or clear the registration manually. If you use consul for things other than fabio (e.g. service discovery between your services) then this is not a problem. I also don't recall consul crashing on us either.

However, after having fabio in production for a year on several fairly heavily loaded public websites with tens of fabio instances and hundreds of micro-services we literally never had that happen with any of the releases (knock on wood). This doesn't mean it can't happen but this is not the kind of thing you have to deal with on a regular basis and it is fairly easy to fix if it does happen.

I've come to call fabio the piece of infrastructure we forget it exists.

far-blue · 2016-08-01T01:12:22Z

Great, that's what I hoped would happen! I'd like to be alerted when / if Fabio was ever to die unexpectedly and just wanted to make sure the changes you made to clean up on exit after config error etc. didn't change that behaviour :)

I'm glad to hear Fabio has been so stable and I've had the same experiences as you with Consul - I've never had any unexpected behaviour from it and it's been as stable as a rock.

magiconair added the looking label Jul 26, 2016

magiconair changed the title ~~Deregistering from Consul on 'polite' shutdown~~ Always deregister from Consul Jul 26, 2016

magiconair added bug and removed looking labels Jul 26, 2016

magiconair closed this as completed Jul 30, 2016

magiconair added this to the 1.2.1 milestone Oct 10, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Always deregister from Consul #136

Always deregister from Consul #136

far-blue commented Jul 26, 2016

magiconair commented Jul 26, 2016

far-blue commented Jul 26, 2016 •

edited

Loading

magiconair commented Jul 26, 2016

far-blue commented Jul 26, 2016

magiconair commented Jul 26, 2016

far-blue commented Jul 26, 2016

far-blue commented Jul 26, 2016

magiconair commented Jul 26, 2016

far-blue commented Jul 26, 2016

magiconair commented Jul 26, 2016 •

edited

Loading

magiconair commented Jul 26, 2016

far-blue commented Jul 26, 2016 •

edited

Loading

far-blue commented Jul 26, 2016

magiconair commented Jul 26, 2016

far-blue commented Jul 26, 2016

magiconair commented Jul 26, 2016

magiconair commented Jul 30, 2016

magiconair commented Jul 30, 2016

far-blue commented Jul 31, 2016

magiconair commented Jul 31, 2016

far-blue commented Jul 31, 2016

magiconair commented Jul 31, 2016

far-blue commented Aug 1, 2016

Always deregister from Consul #136

Always deregister from Consul #136

Comments

far-blue commented Jul 26, 2016

magiconair commented Jul 26, 2016

far-blue commented Jul 26, 2016 • edited Loading

magiconair commented Jul 26, 2016

far-blue commented Jul 26, 2016

magiconair commented Jul 26, 2016

far-blue commented Jul 26, 2016

far-blue commented Jul 26, 2016

magiconair commented Jul 26, 2016

far-blue commented Jul 26, 2016

magiconair commented Jul 26, 2016 • edited Loading

magiconair commented Jul 26, 2016

far-blue commented Jul 26, 2016 • edited Loading

far-blue commented Jul 26, 2016

magiconair commented Jul 26, 2016

far-blue commented Jul 26, 2016

magiconair commented Jul 26, 2016

magiconair commented Jul 30, 2016

magiconair commented Jul 30, 2016

far-blue commented Jul 31, 2016

magiconair commented Jul 31, 2016

far-blue commented Jul 31, 2016

magiconair commented Jul 31, 2016

far-blue commented Aug 1, 2016

far-blue commented Jul 26, 2016 •

edited

Loading

magiconair commented Jul 26, 2016 •

edited

Loading

far-blue commented Jul 26, 2016 •

edited

Loading