This project is an extension of the ideas presented in the AWS Blog post and the GitHub sample, with key differences:
- Receives Auto Scaling lifecycle hook events through CloudWatch Events rules, so a single function can serve multiple ECS clusters.
- Utilizes the Serverless Framework for deployment.
- Implemented in Golang.
- Supports draining Spot Instances via the Spot Instance Interruption Notice.
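Because one function receives both kinds of CloudWatch event, it has to tell them apart before doing anything else. The following is a minimal, standard-library-only sketch of that routing step, not the project's actual handler. It assumes the documented AWS event shapes: `detail-type` is `EC2 Instance-terminate Lifecycle Action` (instance ID in `detail.EC2InstanceId`) or `EC2 Spot Instance Interruption Warning` (instance ID in `detail.instance-id`). The function name `terminatingInstance` and its return values are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cloudWatchEvent holds the envelope fields shared by both event types.
type cloudWatchEvent struct {
	DetailType string          `json:"detail-type"`
	Source     string          `json:"source"`
	Detail     json.RawMessage `json:"detail"`
}

// lifecycleDetail mirrors the detail of an Auto Scaling terminate lifecycle action.
type lifecycleDetail struct {
	AutoScalingGroupName string `json:"AutoScalingGroupName"`
	LifecycleHookName    string `json:"LifecycleHookName"`
	LifecycleActionToken string `json:"LifecycleActionToken"`
	EC2InstanceID        string `json:"EC2InstanceId"`
}

// spotDetail mirrors the detail of a Spot Instance interruption warning.
type spotDetail struct {
	InstanceID     string `json:"instance-id"`
	InstanceAction string `json:"instance-action"`
}

// terminatingInstance (hypothetical helper) returns the instance ID and
// whether the event came from a lifecycle hook that must be completed later.
func terminatingInstance(raw []byte) (instanceID string, fromHook bool, err error) {
	var ev cloudWatchEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return "", false, err
	}
	switch ev.DetailType {
	case "EC2 Instance-terminate Lifecycle Action":
		var d lifecycleDetail
		if err := json.Unmarshal(ev.Detail, &d); err != nil {
			return "", false, err
		}
		return d.EC2InstanceID, true, nil
	case "EC2 Spot Instance Interruption Warning":
		var d spotDetail
		if err := json.Unmarshal(ev.Detail, &d); err != nil {
			return "", false, err
		}
		return d.InstanceID, false, nil
	default:
		return "", false, fmt.Errorf("unsupported detail-type %q", ev.DetailType)
	}
}

func main() {
	event := []byte(`{"source":"aws.autoscaling","detail-type":"EC2 Instance-terminate Lifecycle Action","detail":{"EC2InstanceId":"i-0123456789abcdef0","LifecycleHookName":"ASGTerminateHook"}}`)
	id, hook, err := terminatingInstance(event)
	fmt.Println(id, hook, err) // i-0123456789abcdef0 true <nil>
}
```

Returning `fromHook` lets the caller skip the lifecycle-completion step for Spot events, which carry no lifecycle action token.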
During ECS instance AMI updates, the Auto Scaling Group (ASG) may replace instances without draining them, which can cause brief container downtime. This function automates draining of ECS cluster instances before they are terminated, improving availability.
The ecs-drainer function:
- Receives a CloudWatch event, either:
  - An Auto Scaling lifecycle terminate event, emitted by an EC2 Auto Scaling lifecycle hook configured for the `autoscaling:EC2_INSTANCE_TERMINATING` transition, or
  - A Spot Instance Interruption Notice. (Note: AWS does not guarantee that the instance will be drained in time; it may be terminated before the notice arrives.)
- Retrieves the ID of the terminating instance.
- Extracts the ECS cluster name from the instance's UserData (`ECS_CLUSTER=xxxxxxxxx` format).
- Initiates draining if ECS tasks are running on the instance.
- Waits for all ECS tasks to shut down.
- Completes the lifecycle hook, allowing the ASG to proceed with instance termination.
Requirements:
- Serverless Framework
- Golang
- GNU Make
- EC2 Auto Scaling lifecycle hooks configured for the `autoscaling:EC2_INSTANCE_TERMINATING` transition, for example in CloudFormation:
```yaml
ASGTerminateHook:
  Type: "AWS::AutoScaling::LifecycleHook"
  Properties:
    AutoScalingGroupName: !Ref ECSAutoScalingGroup
    DefaultResult: "ABANDON"
    HeartbeatTimeout: "900"
    LifecycleTransition: "autoscaling:EC2_INSTANCE_TERMINATING"
```
To deploy:

```shell
git clone https://github.com/moabukar/ecs-drainer.git
cd ecs-drainer
make deploy
# To deploy to a different AWS region, use:
sls deploy -v --region <region>
```
For Terraform deployment, refer to the ecs-drainer Terraform Module.
Notes:
- The function waits up to 15 minutes for the drain to complete; if draining takes longer, the function times out.
- If the function fails, the lifecycle hook falls back to its default result (ABANDON or CONTINUE). For termination hooks, both results let the instance terminate: ABANDON skips any remaining actions, such as other lifecycle hooks, while CONTINUE lets them complete.