fixing an edge case for 'START' container dependency #3927

Merged
merged 3 commits into from
Oct 5, 2023

singholt
Contributor

@singholt singholt commented Sep 26, 2023

Summary

ECS supports specifying container ordering dependencies in task definitions (see the ECS documentation).

This PR fixes an edge case with the START container ordering dependency. Consider the following customer use case:

  • An ECS task definition specifies 2 containers - A and B.
  • ctr A is essential and depends on ctr B with the START condition.
  • ctr B is non-essential. It can be a sidecar that shouldn't impact the task's uptime, for example a logging sidecar like FireLens or a miscellaneous cleanup job runner.
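
To make the use case concrete, here is a minimal sketch of the two-container layout, printed as the `containerDefinitions` fragment of a task definition. The Go types and the helper `exampleContainers` are hypothetical and carry only the fields relevant here; the names "A" and "B" are the placeholders used above, and a real task definition needs more fields (images, memory, and so on).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Dependency mirrors one entry of a container's "dependsOn" list.
type Dependency struct {
	ContainerName string `json:"containerName"`
	Condition     string `json:"condition"`
}

// Container holds only the fields relevant to this example (hypothetical type).
type Container struct {
	Name      string       `json:"name"`
	Essential bool         `json:"essential"`
	DependsOn []Dependency `json:"dependsOn,omitempty"`
}

// exampleContainers returns the layout from the use case:
// A is essential and waits for B to START; B is a non-essential sidecar.
func exampleContainers() []Container {
	return []Container{
		{
			Name:      "A",
			Essential: true,
			DependsOn: []Dependency{{ContainerName: "B", Condition: "START"}},
		},
		{Name: "B", Essential: false},
	}
}

func main() {
	doc := map[string][]Container{"containerDefinitions": exampleContainers()}
	out, err := json.MarshalIndent(doc, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```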

The edge case becomes apparent when ctr B starts and stops before ctr A has reached the PULLED state or later. By definition of the START dependency condition, ctr A should proceed to run as long as ctr B has started; it shouldn't matter when or how ctr B stops. Other dependency conditions (COMPLETE, SUCCESS, HEALTHY) let customers control the stop behavior.

Implementation details

Updated the START condition check: it now returns true (i.e. the dependency is resolved) when the dependency container's known status is STOPPED.
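
The rule can be sketched roughly as follows. This is an illustration under stated assumptions, not the agent's actual code: the `Status` type, its constants, and `startDependencyResolved` are hypothetical names, and treating a running dependency as resolved is an assumption about the surrounding check. Only the STOPPED branch is what this PR describes.

```go
package main

import "fmt"

// Status is a simplified, hypothetical stand-in for the agent's
// container status type.
type Status int

const (
	StatusNone Status = iota
	StatusPulled
	StatusCreated
	StatusRunning
	StatusStopped
)

// startDependencyResolved reports whether a START dependency counts as
// satisfied, given the dependency container's known status.
func startDependencyResolved(depKnown Status) bool {
	switch depKnown {
	case StatusRunning:
		// Assumption: a dependency that is running has started.
		return true
	case StatusStopped:
		// Described in this PR: a stopped dependency also resolves the
		// condition, because it must have started before it stopped.
		return true
	default:
		return false
	}
}

func main() {
	all := []Status{StatusNone, StatusPulled, StatusCreated, StatusRunning, StatusStopped}
	for _, s := range all {
		fmt.Printf("known status %d -> resolved=%v\n", s, startDependencyResolved(s))
	}
}
```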

Testing

New tests cover the changes: yes

  1. Added new unit test cases for when the dependency container's known status is stopped.
  2. Added a new integration test for the START condition. We seem to have integration tests for all other conditions except START. I ran this integration test on an AL2 instance about 100 times to ensure it's not flaky.
  3. Manually tested that this change fixes the corner case without breaking existing use cases. The case is reproducible with 2 different container images: ctr A depends on START of B; the image for A is considerably larger than that of B; B exits before A's image has finished pulling. The task then stays in PENDING forever. If the images are cached, the task runs fine, so the same task behaves in two inconsistent ways.
  4. Also verified that a container's known status is STOPPED only when it has transitioned from RUNNING -> STOPPED. This implies that since it reached RUNNING, the START dependency condition has been fulfilled.
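
The two behaviors from the manual test can be contrasted in a small sketch. Everything here is hypothetical: `resolved` is an illustrative predicate, and the `stoppedCounts` flag stands in for "before" and "after" this change, on the assumption that the earlier check did not accept a stopped dependency (which the description implies by the task staying pending).

```go
package main

import "fmt"

// Status is a hypothetical, reduced set of known statuses for ctr B at
// the moment ctr A is ready to evaluate its START dependency.
type Status int

const (
	Running Status = iota
	Stopped
)

// resolved reports whether the START dependency is satisfied.
// stoppedCounts=false models the assumed earlier behavior;
// stoppedCounts=true models the behavior described in this PR.
func resolved(dep Status, stoppedCounts bool) bool {
	return dep == Running || (stoppedCounts && dep == Stopped)
}

func main() {
	scenarios := []struct {
		name    string
		bStatus Status
	}{
		{"images cached, B still running", Running},
		{"A's pull is slow, B already exited", Stopped},
	}
	for _, sc := range scenarios {
		fmt.Printf("%s: before=%v after=%v\n",
			sc.name, resolved(sc.bStatus, false), resolved(sc.bStatus, true))
	}
}
```

In this model, the slow-pull scenario is the only one whose result differs between the two settings, which matches the report that cached images hid the problem.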

Description for the changelog

bugfix: fixing an edge case for 'START' container dependency

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@singholt singholt changed the base branch from master to dev September 26, 2023 18:56
@singholt singholt changed the title from "wip" to "container ordering: fixing a corner case for 'START' dependency" Sep 26, 2023
@singholt singholt changed the title from "container ordering: fixing a corner case for 'START' dependency" to "fixing an edge case for 'START' container dependency" Sep 26, 2023
@singholt singholt marked this pull request as ready for review September 26, 2023 22:42
@singholt singholt requested a review from a team as a code owner September 26, 2023 22:42
@singholt singholt merged commit d4039d2 into aws:dev Oct 5, 2023
40 checks passed
@singholt singholt mentioned this pull request Oct 12, 2023
4 participants