fix(aws-ecs): drain hook lambda allows tasks to stop gracefully (#13559)
fixes #13506

### Description

After the container instance is set to draining, the tasks running on it transition from RUNNING > DEACTIVATING > STOPPING > DEPROVISIONING > STOPPED. The current way of counting running tasks via `instance['runningTasksCount'] + instance['pendingTasksCount']` does not include tasks in those transitional states, leading to the EC2 instance being terminated prematurely.

### Verification

I have verified the change by manually updating the automatically created drain hook lambda and then running an ASG refresh. I ran the test with additional debug output to compare the old logic of `runningTasksCount + pendingTasksCount` and the new logic that fetches the status of the tasks. I interleaved the logs from the ECS events, the application running in the task, and the drain hook lambda:

```
2021-03-11T15:56:52.608-08:00 Instance i-1234567890abcdefg has container instance ARN arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv
2021-03-11T15:56:52.649-08:00 Instance ARN arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has task ARNs arn:aws:ecs:us-west-2:123456789012:task/fooservice/1234567890abcdefghijklmnopqrstuv
2021-03-11T15:57:03.018-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:03.051-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:13.215-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:13.280-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:15.280-08:00 service fooservice has stopped 1 running tasks: task 1234567890abcdefghijklmnopqrstuv.
2021-03-11T15:57:23.438-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:23.490-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:33.632-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:33.690-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:43.853-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:43.890-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:46.000-08:00 service fooservice has started 1 tasks: task 1234567890abcdefghijklmnopqrstuv.
2021-03-11T15:57:46.000-08:00 (service fooservice, taskSet ecs-svc/1234567890abcdefghi) has begun draining connections on 2 tasks.
2021-03-11T15:57:46.000-08:00 service fooservice deregistered 1 targets in target-group fooservice-vpce-target
2021-03-11T15:57:46.000-08:00 service fooservice deregistered 1 targets in target-group fooservice-target
2021-03-11T15:57:54.032-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:57:54.090-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:57:58.000-08:00 service fooservice registered 1 targets in target-group fooservice-vpce-target
2021-03-11T15:57:58.000-08:00 service fooservice registered 1 targets in target-group fooservice-target
2021-03-11T15:58:04.242-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:04.270-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:14.430-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:14.470-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:24.611-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:24.650-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:34.796-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:34.850-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:44.999-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:45.030-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 1 tasks
2021-03-11T15:58:49.000-08:00 app received SIGTERM
2021-03-11T15:58:54.000-08:00 service fooservice has reached a steady state.
2021-03-11T15:58:55.170-08:00 OLD: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:55.210-08:00 NEW: Instance arn:aws:ecs:us-west-2:123456789012:container-instance/fooservice/1234567890abcdefghijklmnopqrstuv has 0 tasks
2021-03-11T15:58:55.210-08:00 Terminating instance i-1234567890abcdefg
```

The logs show that the new approach allows ECS to drain connections, deregister the target and respect the `deregistrationDelay` (set to 1 minute in this case). The old approach would have terminated the EC2 instance 23 seconds prior to ECS even deregistering the target, leading to 502 errors.

### Pull Request Checklist

- [x] Testing - I was not able to find any tests validating the functionality of the lambda. However, I have updated `expected.json` files to expect the new lambda function code.
- [ ] Docs - *Not Applicable* No previously documented behavior has changed
- [x] Title and Description
- [ ] Sensitive Modules (requires 2 PR approvers) - *Not Applicable*

### Impact

End users utilizing ECS on EC2 with capacity provided by an ASG will see an increase in instance termination time; however, the process is now much safer, respects the ALB's `deregistrationDelay`, and will reduce connection errors.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
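### Illustration

For readers who want to see the difference between the two counting strategies, here is a minimal Python sketch. It is not the lambda code from this change. The function names are hypothetical, the `ecs` argument is assumed to behave like `boto3.client("ecs")`, and the task ARNs are assumed to have been collected beforehand. Only the `runningTasksCount`, `pendingTasksCount`, and `lastStatus` fields and the `describe_tasks` call come from the ECS API.

```python
from typing import Any, Dict, Iterable, List


def legacy_task_count(instance: Dict[str, Any]) -> int:
    """Old logic: only RUNNING and PENDING tasks are reflected in these counters."""
    return instance["runningTasksCount"] + instance["pendingTasksCount"]


def unfinished_task_count(tasks: Iterable[Dict[str, Any]]) -> int:
    """New logic (sketch): any task whose lastStatus is not STOPPED still counts.

    This keeps DEACTIVATING, STOPPING, and DEPROVISIONING tasks in the total.
    """
    return sum(1 for task in tasks if task["lastStatus"] != "STOPPED")


def count_unfinished_tasks(ecs: Any, cluster: str, task_arns: List[str]) -> int:
    """Fetch the current status of known task ARNs and count the unfinished ones.

    Assumption: `ecs` is a boto3-style ECS client. Batching is ignored here,
    although describe_tasks accepts at most 100 task ARNs per call.
    """
    if not task_arns:
        # describe_tasks should not be called with an empty task list.
        return 0
    tasks = ecs.describe_tasks(cluster=cluster, tasks=task_arns)["tasks"]
    return unfinished_task_count(tasks)
```

With a task in DEACTIVATING, `legacy_task_count` sees both counters at zero while `unfinished_task_count` still returns 1, which matches the `OLD: ... has 0 tasks` / `NEW: ... has 1 tasks` pairs in the logs above.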