Service Discovery and Healthcheck
Instance health is an important aspect of any cloud-ready application. It is used for service discovery as well as for terminating bad instances. Through the HealthCheck API, an application can expose a REST endpoint that external monitoring can ping for health status, or it can integrate with Eureka so that service discovery registration is based on instance health state. An instance can be in one of two lifecycle states: Healthy and Unhealthy. More information is available in the Netflix Runtime HealthCheck documentation.
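To make the idea of a health endpoint concrete, here is a minimal sketch using only the Python standard library. It is not the Netflix Runtime HealthCheck API: the `/healthcheck` path, the `make_handler` and `serve` helpers, the default port and the 200/503 status mapping are all assumptions chosen for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(check):
    """Build a handler class that reports the result of check() over HTTP."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthcheck":  # hypothetical path
                self.send_error(404)
                return
            healthy = check()
            body = b"Healthy" if healthy else b"Unhealthy"
            # Assumption: 200 for Healthy, 503 for Unhealthy.
            self.send_response(200 if healthy else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet; a real service would log requests

    return HealthHandler


def serve(check, port=8080):
    """Serve the health endpoint until interrupted (port is an assumption)."""
    HTTPServer(("0.0.0.0", port), make_handler(check)).serve_forever()
```

An external monitor would then poll the endpoint and treat any non-200 response as Unhealthy.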
Dynomite-manager schedules a Quartz job (a lightweight thread) every 15 seconds that checks the health of both Dynomite and the underlying storage engine. Since most of our current production deployments use Redis or storage engines based on the Redis Serialization Protocol (RESP), the healthcheck involves a three-step approach.
- Check if Dynomite and Redis are running as Linux processes.
- Check if Dynomite can listen to a Redis `PING` and respond with a Redis `PONG`. This step ensures that neither Dynomite nor Redis is a zombie process and that both are operational.
- Check if Dynomite can respond `OK` to a Redis `SETEX` with a 1-second TTL. We use `SETEX` because the key expires on its own, without needing to fire another delete. This step ensures that Dynomite and Redis are not only operational but can also accept write traffic, which is not the case if the available memory has been exhausted or Redis for some reason runs in slave mode.
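The three steps above can be sketched in Python as follows. This is an illustration, not Dynomite-manager's actual (Java) implementation. It assumes the process names `dynomite` and `redis-server`, a Dynomite client port of 8102, the availability of `pgrep` on the host, and a hypothetical key name `healthcheck_key`. Commands are sent over a raw socket in RESP, and only the first reply line is inspected.

```python
import socket
import subprocess


def process_running(name):
    """Step 1 helper: True if a process with this exact name exists (pgrep)."""
    result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def send_command(host, port, *args, timeout=1.0):
    """Send one command and return the first reply line without the CRLF."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_command(*args))
        with sock.makefile("rb") as reader:
            return reader.readline().rstrip(b"\r\n")


def is_healthy(host="127.0.0.1", port=8102,
               process_check=process_running, send=send_command):
    """Run the three checks in order; any failure means Unhealthy."""
    try:
        # Step 1: both processes exist (process names are assumptions).
        if not (process_check("dynomite") and process_check("redis-server")):
            return False
        # Step 2: a PING sent through Dynomite must come back as +PONG.
        if send(host, port, "PING") != b"+PONG":
            return False
        # Step 3: a write with a 1-second TTL must be acknowledged with +OK.
        # Error replies (for example out-of-memory or read-only) fail here.
        return send(host, port, "SETEX", "healthcheck_key", 1, "ok") == b"+OK"
    except OSError:
        # Connection refused, timeouts, or a missing pgrep count as unhealthy.
        return False
```

The `process_check` and `send` parameters exist only so the logic can be exercised without a live node, for example by passing in fakes that return canned replies.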
If any of the above checks fails, Dynomite-manager informs Eureka (the Netflix service registry for resilient mid-tier load balancing and failover) and the node is removed from Discovery. This ensures that the Dyno client can gracefully fail over the traffic to another Dynomite node with the same token.
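The reporting flow described here, a periodic check whose result is pushed to a service registry, might look roughly like the sketch below. It uses a plain Python thread rather than Quartz, and `HealthMonitor` and the `report_status` callback are hypothetical names standing in for whatever call actually updates the registry. The `"UP"` and `"DOWN"` strings mirror Eureka's instance statuses of the same name. Reporting only on a change of state is a design choice of this sketch, not something the text above specifies.

```python
import threading


class HealthMonitor:
    """Runs a check on a fixed interval and reports status changes."""

    def __init__(self, check, report_status, interval_seconds=15.0):
        self.check = check
        self.report_status = report_status  # hypothetical registry callback
        self.interval_seconds = interval_seconds
        self.last_status = None
        self._stop = threading.Event()

    def run_once(self):
        """Run the check once and notify the registry if the state changed."""
        try:
            status = "UP" if self.check() else "DOWN"
        except Exception:
            # A check that crashes is treated the same as a failed check.
            status = "DOWN"
        if status != self.last_status:
            self.report_status(status)
            self.last_status = status
        return status

    def start(self):
        """Start the background loop and return its thread."""
        def loop():
            # wait() returns False on timeout and True once stop() is called.
            while not self._stop.wait(self.interval_seconds):
                self.run_once()

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def stop(self):
        self._stop.set()
```

Once the registry sees the node as down, clients that resolve nodes through it stop routing traffic there until a later check reports it as up again.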