
Autoscaled instances registered before active health checks take effect #16

Closed
adybuxton opened this issue Jan 18, 2019 · 3 comments

@adybuxton

We have an issue where instances are being added to the upstream lists before they are actually ready. The active NGINX health checks, which run every 5 seconds, are ignored for roughly the first 30 seconds after the instance has been added. This results in the services timing out for about 30 seconds, until the active health check marks the new instance as 'down', while Chef installs the services.

We've noticed that this solution also adds auto-scaled instances regardless of their state. So setting a lifecycle hook that marks the instance as initially Pending and then lets it time out into the InService state after several minutes (while the services are provisioning in the background) still results in the new instance being added immediately when it spins up.

Modifying the sync interval doesn't make any difference in this scenario, as an instance could be added towards the end of the sync interval and still show up in the upstream list immediately.

Is this a limitation of the service, or are there other approaches to mitigate it? Is there a reason why there is such a large delay before active health checks mark the services as down?

@pleshakov added the question label on Jan 18, 2019
@pleshakov

@adybuxton
Health checks take into account the connection and other timeouts related to establishing a connection and reading/sending a response from/to a backend instance. Even if the health check interval is 5 seconds, when the value of a timeout is bigger than 5s, the first health check for an unavailable instance will not fail until that timeout expires. So, to make sure a health check fails fast, you can decrease the values of the connection and other timeouts. For example:

        proxy_connect_timeout 5s;
        proxy_read_timeout 5s;
        proxy_send_timeout 5s;

Additionally, you can tell NGINX Plus not to consider a newly added instance healthy until its first health check passes. For this, use the mandatory parameter when defining a health check -- http://nginx.org/en/docs/http/ngx_http_upstream_hc_module.html#health_check

Here is an example that uses low timeout values and the mandatory parameter. Please note that the health checks can be put into a separate internal location for convenience:

upstream webapp1 {
    zone webapp1 64k;
    state /var/lib/nginx/state/webapp1.conf;
}

server {
    location /webapp1 {
        proxy_pass http://webapp1;
    }

    # Separate location used only for the health checks, so the low
    # timeouts do not affect regular client traffic.
    location @hc-webapp1 {
        internal;
        proxy_connect_timeout 1s;
        proxy_read_timeout 1s;
        proxy_send_timeout 1s;

        proxy_pass http://webapp1;
        health_check interval=1s mandatory;
    }
}

Is this a limitation of the service, or are there other approaches to mitigate it? Is there a reason why there is such a large delay before active health checks mark the services as down?

Yes, nginx-asg-sync doesn't take the state of the instances of an Auto Scaling group into consideration. However, to make sure that NGINX Plus never starts using an unhealthy instance, you can use mandatory health checks as described above.

@adybuxton

Thanks, I'll take a look. One thing that may add additional value and flexibility is allowing instances with specific lifecycle hook states to be filtered out of the returned instance list, for example Pending:Wait, to cover situations where the instance is told to wait for a finite time (e.g. for provisioning) before it transitions into the InService state.
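
For illustration only, here is a minimal sketch of how that kind of filtering could look using the AWS SDK for Go. This is not the actual nginx-asg-sync code, nor what was later implemented in #39; the Auto Scaling group name is hypothetical:

    package main

    import (
        "fmt"
        "log"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/autoscaling"
    )

    // inServiceInstanceIDs returns only the instances of the given Auto Scaling
    // group whose lifecycle state is "InService", skipping states such as
    // "Pending:Wait" that indicate the instance is still being provisioned.
    func inServiceInstanceIDs(svc *autoscaling.AutoScaling, groupName string) ([]string, error) {
        out, err := svc.DescribeAutoScalingGroups(&autoscaling.DescribeAutoScalingGroupsInput{
            AutoScalingGroupNames: []*string{aws.String(groupName)},
        })
        if err != nil {
            return nil, err
        }
        var ids []string
        for _, group := range out.AutoScalingGroups {
            for _, inst := range group.Instances {
                if aws.StringValue(inst.LifecycleState) == "InService" {
                    ids = append(ids, aws.StringValue(inst.InstanceId))
                }
            }
        }
        return ids, nil
    }

    func main() {
        sess := session.Must(session.NewSession())
        svc := autoscaling.New(sess)

        // "webapp1-asg" is a hypothetical group name used for illustration.
        ids, err := inServiceInstanceIDs(svc, "webapp1-asg")
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(ids)
    }

If only the instances returned by such a filter were pushed into the upstream state, an instance held in Pending:Wait by a lifecycle hook would not receive traffic until it transitions to InService.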

@pleshakov

implemented in #39
