[ECS] [Proposal]: Container Ordering #123
Comments
I already moved away from designing this kind of dependency after reading moby/moby#31333, in particular comment moby/moby#31333 (comment).
Your orchestration is now out of order. Should ECS forcefully terminate B? What if there's no time? That being said, the START, SUCCESS and COMPLETE scenarios described here do seem safe to use, as they represent an immutable container state, e.g. a started container will never not have been started, and a container that has already shut down will never not have been executed.
@deleugpn your points are well taken, and perhaps AWS should add a section to the documentation to warn users of the potential pitfalls and complexities of the feature, but overall I think it is a net positive for most situations. Take for example a case where you have multiple containers depending on a database or message queue to fully start. Currently, each container is responsible for implementing the exact same logic in a custom entrypoint script. With this feature, the DB or message queue container can take on the responsibility of implementing the check and marking itself as healthy, and then all other containers simply reuse that information. Regarding your example, the developer can decide what will happen by marking Container A as essential or not. If it is, then everything restarts; if not, not. So I agree, some people may get a false sense of security from this feature and may end up having trouble troubleshooting if they don't have adequate fallbacks and proper timeouts, but overall I think it is much better to have the functionality available (and off by default). Good documentation and practical examples will help as well.
This would be extremely useful for us. At the moment, if a container instance dies there's a land rush for our ECS services to launch tasks, which causes lots of unnecessary logs to be written and alarms to go off. For example, we have a Consul ECS service which launches a daemon task on each cluster instance. Our microservices (each expressed as their own ECS service + task definition) expect this container to be running when they launch and inevitably fail until the Consul task is ready. Our current solution for this involves using bash for loops and silent curl calls in the entrypoint scripts.
Regarding 'healthy': this feature is designed for local coordination and should not be considered a solution for fault tolerance. The example that @deleugpn provided is a clear failure condition that this wouldn't catch by itself. The ordering won't report that your app is healthy or not -- rather, it is intended to replace the need to use a sleep / wait loop until resources are available. Your applications will still need to handle the case where container A breaks, either during startup or hours after the task starts. If you are trying to implement self-healing architecture as described in the moby thread, you could use the ECS service abstraction paired with health checks on your essential containers. Envoy proxy is the example we have been using to justify 'healthy' as a dependency condition. For Envoy, it is not enough to validate that the container has started. We also need to ensure that the container is ready to receive traffic. This means that containers that depend on Envoy can start knowing that Envoy has already finished its initialization sequences. However, this doesn't mean an application depending on Envoy can assume that it will always be available. You would still need to implement a failure path, even if that failure path is reporting that the container is unhealthy and signaling the scheduler to restart the task.
That is a wonderful positioning for the feature. Although I personally don't need it yet, I think you have the heart of the feature in the right place.
How does this relate to supporting the k8s concept of init containers? They might not still be running, so the reverse shutdown order should skip over containers that have already completed or stopped.
I have a need for this and agree the proposal looks well thought out. However, a colleague pointed out the documentation doesn't explicitly state that the dependencies would be run on the same instance. I think it's somewhat implied, but in reading the proposal, statements like "agent will ensure that dependencies are run" could imply that these containers are run but not necessarily on the local host. In my case I would need dependencies run on the same host as the primary container, and it sounds like @alexbilbie has the same need. I think clarification of the proposal on this point would be good. |
This could be helpful for one of my use cases. I recently had to implement custom entrypoint logic that pauses containers on startup if there are database migrations pending, plus a companion task that actually applies the pending migrations, orchestrated through CloudWatch Events and Lambda. Couple of questions, though.
To answer some of the questions in the thread: this doesn't work across services or tasks. The ordering applies strictly within the task boundary. You will still need to employ other strategies to enforce dependencies across services / clusters. |
Very helpful feature. Any plans for when this will be supported by CloudFormation?
I understand that this is bound to the task definition boundary, but how can we achieve this? I am using Fargate & Terraform to provision my components. I am able to set this up, but when I look at the task definition in AWS, it has dependsOn set to null. The platform version is 1.3.0, and since I am using Fargate there is no agent available to configure.
@petderek Is this feature available via the
I need to configure the order of containers across different services. Has anyone worked out a way to do this?
Number 2 is not reasonable and cuts out the following use case involving 2 containers: Container 1 starts; Container 2 depends on Container 1 finishing (otherwise there is no result file for Container 2 to find). This can be achieved with Docker Compose; however, it doesn't seem possible with an ECS task definition.
We have an nginx container which depends on a HEALTHY app container in a service, which works fine for start-up. On shut-down, we need the nginx container to stay open until the app has finished processing requests (while receiving a shutdown signal). We would need something like a dependsOnShutDown container definition parameter.
The ECS team is planning on implementing container startup and shutdown
ordering for tasks. We would like to get feedback on our current plan.
Specifically, we'd like to know:
- Does this design cover your needs around application startup?
- Do you have use cases that depend on shutdown order?
Thanks!
Problem Statement
ECS does not currently have an explicit mechanism to ensure that containers
start in any particular order. Yet, many applications and services have
cross-container prerequisite dependencies. Common examples may include:
- A container that gathers data or applies options before the rest of the application may start.
- An application that has a runtime expectation that another application defined within the task has already started.
- A container that reuses resources defined by other containers. Some resources, such as volumes, are implicitly handled in ECS today.
Overview of Solution
ECS will address these use cases by improving container dependency management.
We will introduce the following concepts into our task definition:
- A means to explicitly declare dependencies on other containers within a task
- A parameter to describe the condition of each dependency
- Granular timeouts for container start and stop
These three components can be added to the container definition shape as follows:
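A minimal sketch of that shape, assuming the dependsOn, startTimeout, and stopTimeout field names that the feature eventually shipped with (container names and image are illustrative):

```json
{
  "name": "app",
  "image": "example/app:latest",
  "essential": true,
  "dependsOn": [
    { "containerName": "db", "condition": "HEALTHY" }
  ],
  "startTimeout": 120,
  "stopTimeout": 60
}
```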
Dependency Mappings
Within the container definition, we will make it possible to declare
dependencies on other named containers. A container may have zero, one, or
multiple dependencies. The chain of dependencies within a task will be used to
determine both start and stop order.
When starting up, the agent will guarantee that a container will only start if
the containers it depends on have already been started. Internally, the agent
already respects order if a task uses links or volumes between containers, but
otherwise starts containers in parallel. This project will extend this existing
dependency logic and make it usable in more situations.
Currently, the agent does not enforce any ordering when a task is stopped, even
for links and volumes. We will amend the behavior of container stops to respect
the order provided via declared dependencies. The shutdown order will simply be
the inverse of start order. For example, let's say container A depends on
container B: B will start before A is started, but A will stop before B is
stopped. If a task is shut down due to an essential container failing in the
middle of the chain, we will adhere to the shutdown ordering where possible.
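Extending that example to a chain, a sketch of a task where A depends on B and B depends on C (the START condition is described in the next section; names and images are illustrative): startup proceeds C, B, A, and shutdown inverts to A, B, C.

```json
{
  "containerDefinitions": [
    {
      "name": "A",
      "image": "example/a:latest",
      "essential": true,
      "dependsOn": [ { "containerName": "B", "condition": "START" } ]
    },
    {
      "name": "B",
      "image": "example/b:latest",
      "essential": true,
      "dependsOn": [ { "containerName": "C", "condition": "START" } ]
    },
    {
      "name": "C",
      "image": "example/c:latest",
      "essential": true
    }
  ]
}
```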
Dependency Conditions
There is already an implicit dependency condition for containers using links or
volumes. However, for both of these cases it is only validated that the
required container has started before the dependent container may start.
Merely starting the container does not provide enough of a guarantee for
many application types. We will introduce dependency conditions as a way to
support these other kinds of applications.
A "condition" may be one of the three enumerated strings: "START", "COMPLETE",
"SUCCESS", or "HEALTHY". The behavior of these conditions follows:
"START" will emulate the behavior of links and volumes today. It will allow
customers to specify that a dependent container needs to only be started before
permitting other containers to start.
"COMPLETE" will validate that a dependent container runs to completion
(exits) before permitting other containers to start. This can be useful for
non-essential containers that run a script and then subsequently exit.
"SUCCESS" will be identical to "COMPLETE", but it will also require that the
container exits with status zero.
"HEALTHY" will validate that the dependent container passes its Docker
healthcheck before permitting other containers to start. This condition will
only be confirmed at task startup.
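As a sketch combining two of these conditions (all names and images here are hypothetical): an init container that must exit with status zero before the app starts, alongside an Envoy-style sidecar that must pass its Docker healthcheck first. The curl against Envoy's admin /ready endpoint is one common readiness pattern, not a requirement of this proposal.

```json
{
  "containerDefinitions": [
    {
      "name": "app",
      "image": "example/app:latest",
      "essential": true,
      "dependsOn": [
        { "containerName": "init", "condition": "SUCCESS" },
        { "containerName": "envoy", "condition": "HEALTHY" }
      ]
    },
    {
      "name": "init",
      "image": "example/init:latest",
      "essential": false
    },
    {
      "name": "envoy",
      "image": "example/envoy:latest",
      "essential": true,
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -fs http://localhost:9901/ready || exit 1"],
        "interval": 5,
        "retries": 3
      }
    }
  ]
}
```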
Granular Timeouts
Currently, the container start and stop timeouts are instance-level settings
configured within the ecs.config file. These timeouts are used strictly as part
of the Docker API timeout.
Container dependencies will introduce an additional set of potential failure
conditions for startup that extend beyond the Docker API timeout. For example,
waiting for a container to complete or reach 'healthy' won't use the Docker
timeout as is. We will need to implement timeouts for these conditions in order
to prevent tasks from getting stuck in 'starting' forever.
Additionally, a global option will not give customers enough flexibility,
since different containers will have different conditions. We will therefore
enhance the timeout feature in two ways:
- Provide start and stop timeouts on a per-container basis
- Allow the timeouts to apply to the "HEALTHY" and "COMPLETE" conditions described earlier