Consul SD causing 100% cpu usage if the service is down #627

Xzya · 2017-10-26T11:23:22Z

Hey,

It seems that if the Consul service is down, the Instancer retries fetching the list of services in a tight loop, causing 100% CPU usage until the service is back up. There should probably be a backoff mechanism so that it retries only after some delay. Here is how I'm using the client:

var client consulsd.Client
{
	consulConfig := consulapi.DefaultConfig()
	consulConfig.WaitTime = 10 * time.Second
	consulConfig.Address = config.Consul.Address
	consulClient, err := consulapi.NewClient(consulConfig)
	if err != nil {
		return Endpoints{}, err
	}
	client = consulsd.NewClient(consulClient)
}

var (
	retryMax     = 1
	retryTimeout = 2 * time.Second
	tags         = []string{environment}
	passingOnly  = true
	endpoints    = Endpoints{}
	instancer    = consulsd.NewInstancer(client, logger, ServiceName, tags, passingOnly)
)
{
	factory := MainDBClientFactory(MakeRegisterEndpoint)
	endpointer := sd.NewEndpointer(instancer, factory, logger)
	balancer := lb.NewRoundRobin(endpointer)
	retry := lb.Retry(retryMax, retryTimeout, balancer)
	endpoints.RegisterEndpoint = retry
}
...

Terminal output:
ts=2017-10-26T11:08:33.168160872Z caller=instancer.go:69 service=maindb tags=[local] err="Get http://192.168.99.100:8500/v1/health/service/maindb?passing=1&tag=local&wait=10000ms: dial tcp 192.168.99.100:8500: getsockopt: connection refused"
ts=2017-10-26T11:08:33.171253831Z caller=instancer.go:69 service=maindb tags=[local] err="Get http://192.168.99.100:8500/v1/health/service/maindb?passing=1&tag=local&wait=10000ms: dial tcp 192.168.99.100:8500: getsockopt: connection refused"
ts=2017-10-26T11:08:33.175242518Z caller=instancer.go:69 service=maindb tags=[local] err="Get http://192.168.99.100:8500/v1/health/service/maindb?passing=1&tag=local&wait=10000ms: dial tcp 192.168.99.100:8500: getsockopt: connection refused"
ts=2017-10-26T11:08:33.176126961Z caller=instancer.go:69 service=maindb tags=[local] err="Get http://192.168.99.100:8500/v1/health/service/maindb?passing=1&tag=local&wait=10000ms: dial tcp 192.168.99.100:8500: getsockopt: connection refused"
...


* Add backoff package
  Justification for jitter and growth factor: https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/. Add backoff to the Consul instancer loop. Fixes go-kit/kit#627.
* Revert "Add backoff package"
  This reverts commit 924501ae1fcfadaa27593e9c019283412c513928.
* Get rid of external package and update exponential
* Add instancer backoff
* Fix old exponential name
* Add doc comment
* Fixup & respond to review

peterbourgon added the bug and help wanted labels Oct 26, 2017

nicot pushed a commit to nicot/kit that referenced this issue Dec 3, 2017

Add backoff package

439fef3

Add backoff to the Consul instancer loop. Fixes go-kit#627.

nicot pushed a commit to nicot/kit that referenced this issue Dec 3, 2017

Add backoff package

924501a

Justification for jitter and growth factor: https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/. Add backoff to the Consul instancer loop. Fixes go-kit#627.

nicot mentioned this issue Dec 3, 2017

Add backoff package and fix Consul CPU usage #635

Merged

peterbourgon closed this as completed in #635 Apr 2, 2018
