Implement graceful ASG shutdowns #376

jpswinski · 2024-02-12T14:00:31Z

When a cluster scales in, the terminating nodes are shut down almost immediately, with no consideration for the requests they may still be processing. In-flight requests simply error out, and for proxied requests the results are lost.

The desired solution is a graceful shutdown using ASG lifecycle hooks, specifically the terminate hook. The terminating node should consume the notification, stop registering with the orchestrator, and then wait 10 minutes. Ideally, if the node can determine whether any requests are still running on it, it could instead wait only until all of them have completed before sending the continue command to the lifecycle hook (this would avoid the global 10 minute wait).
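
A rough sketch of what the drain step on the node might look like, just to make the idea concrete. All names here (`drain_then_continue`, `stop_registering`, `count_active_requests`, `make_boto3_completer`) are hypothetical and not part of the existing code. How the node learns about the terminate notification, and how it counts running requests, is left open. The only real API assumed is boto3's `complete_lifecycle_action` on the Auto Scaling client.

```python
import time
from typing import Callable


def drain_then_continue(
    stop_registering: Callable[[], None],
    count_active_requests: Callable[[], int],
    complete_lifecycle_action: Callable[[], None],
    max_wait_s: float = 600.0,  # the 10 minute cap from the description
    poll_s: float = 5.0,        # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Stop taking new work, wait for running requests, then release the hook.

    Returns the number of seconds spent waiting.
    """
    # Step 1: stop registering so the orchestrator sends no new requests here.
    stop_registering()

    # Step 2: wait until nothing is running, but never longer than max_wait_s.
    start = clock()
    while count_active_requests() > 0 and clock() - start < max_wait_s:
        sleep(poll_s)
    waited = clock() - start

    # Step 3: tell the ASG that termination may proceed.
    complete_lifecycle_action()
    return waited


def make_boto3_completer(asg_name: str, hook_name: str, instance_id: str) -> Callable[[], None]:
    """Build the step 3 callback using boto3 (group and hook names are placeholders)."""

    def complete() -> None:
        import boto3  # imported lazily so the sketch loads without boto3 installed

        boto3.client("autoscaling").complete_lifecycle_action(
            AutoScalingGroupName=asg_name,
            LifecycleHookName=hook_name,
            InstanceId=instance_id,
            LifecycleActionResult="CONTINUE",
        )

    return complete
```

If the node has no way to count running requests, passing a `count_active_requests` that always returns 1 falls back to the plain 10 minute wait.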

jpswinski · 2024-02-12T14:00:37Z

See https://circleci.com/blog/graceful-shutdown-using-aws
