Address envFile resource naming defect #3554
Excellent writeup for the defect, helped me get a lot of context. Changes look good.
779b454 to 0a0d3ec
Previous push is just to rebase and apply newest commits from |
The following tests are known to currently hang in pending status: ecs/al2kernel5dot10/integ_test. @Yiyuanzzz is working on a fix, which will be deployed in the near future.
Summary
Address an EnvironmentFileResource naming defect that causes a race condition when a task has multiple containers that specify at least one unique environment file.
For further context, please refer to the "Additional Context" section below in this pull request's description.
Implementation details
Update the EnvironmentFileResource GetName() method so that EnvironmentFileResource objects can be easily differentiated from one another (i.e., add the container name to the return value, since the container name is unique).
Use resourcestatus.ResourceStatus(envFiles.EnvFileCreated) instead of resourcestatus.ResourceCreated when building a container's EnvironmentFileResource dependency in the initializeEnvfilesResource function.
Testing
Unit testing:
make test
The existing unit test TestInitializeAndGetEnvfilesResource has been expanded to cover the changes introduced in this pull request and operates as follows:
This test will always pass when the fix in this pull request is implemented, since the two resource dependencies will never be equal: each EnvironmentFileResource object’s call to its GetName() method returns a unique name. This test will also always fail under the current ECS Agent, because the two resource dependencies will always be considered equal: every call to the EnvironmentFileResource GetName() method returns the constant string “envfile”.
Manual testing:
An ECS customer who faced this issue was given a custom ECS Agent with the code changes from this PR built in and validated that they no longer ran into the issue after using that custom ECS Agent.
I was also able to reproduce the issue using the current ECS Agent by taking the following steps:
Upon using the custom ECS Agent mentioned previously, I could no longer reproduce the issue with the same steps.
New tests cover the changes: no (but the existing TestInitializeAndGetEnvfilesResource test has been modified to cover the changes)
Additional Context
Per the Amazon ECS Developer Guide, each container in a given task can specify up to 10 environment files. If a container specifies 1 or more environment files, then all of its environment files are grouped into a single EnvironmentFileResource. So for each container with 1 or more environment files, an EnvironmentFileResource is created.
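The grouping rule above can be sketched in a few lines of Go. This is a simplified illustration, not the ECS Agent's code: the container and envFileResource types and the buildEnvFileResources helper are hypothetical stand-ins, and only the one-resource-per-container rule is taken from the description above.

```go
package main

import "fmt"

// container is a hypothetical, simplified stand-in for a container in a task.
type container struct {
	name     string
	envFiles []string // locations of the container's environment files
}

// envFileResource is a hypothetical stand-in for EnvironmentFileResource:
// it groups all environment files that belong to one container.
type envFileResource struct {
	containerName string
	envFiles      []string
}

// buildEnvFileResources returns one resource per container that specifies
// at least one environment file; containers without any are skipped.
func buildEnvFileResources(containers []container) []envFileResource {
	var resources []envFileResource
	for _, c := range containers {
		if len(c.envFiles) == 0 {
			continue
		}
		resources = append(resources, envFileResource{
			containerName: c.name,
			envFiles:      append([]string(nil), c.envFiles...), // copy, do not alias
		})
	}
	return resources
}

func main() {
	resources := buildEnvFileResources([]container{
		{name: "A", envFiles: []string{"a1.env", "a2.env"}},
		{name: "B", envFiles: []string{"b.env"}},
		{name: "C"}, // no environment files, so no resource is created
	})
	for _, r := range resources {
		fmt.Printf("%s: %d environment file(s)\n", r.containerName, len(r.envFiles))
	}
}
```

In this sketch, a task with three containers, of which two specify environment files, ends up with two resources.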
A problem arises in the current ECS Agent code when a task has multiple (i.e., more than one) containers which specify at least one unique environment file.
When EnvironmentFileResource objects are initialized in the initializeEnvfilesResource function, each EnvironmentFileResource (tied to a unique container in the task) is:
The 2nd point above is problematic. Every EnvironmentFileResource object’s GetName() method always returns “envfile”, which means all EnvironmentFileResource objects have the same name.
Then later, when we check if dependencies for a container are resolved, we pass the slice of task resources returned by mtask.GetResources() to the DependenciesAreResolved function.
Diving deeper into mtask.GetResources() reveals that it calls task.getResourcesUnsafe(), which iterates through the task’s resource map. According to the Go Programming Language Specification, “the iteration order over maps is not specified and is not guaranteed to be the same from one iteration to the next”. Thus, there is no guarantee as to how the task resources will be ordered in the slice of task resources that we pass to the DependenciesAreResolved function.
Recall that all EnvironmentFileResource objects have the same name. Because of this, observe that in the DependenciesAreResolved function, only the last EnvironmentFileResource encountered while iterating through the slice of task resources returned by the previous mtask.GetResources() call (this EnvironmentFileResource could correspond to any container in the task) will be the one checked for KnownStatus “CREATED”, regardless of whether or not that EnvironmentFileResource is actually associated with the current container. That is, only the last EnvironmentFileResource encountered can occupy this map at key “envfile”.
Thus, even though we have declared EnvironmentFileResource dependencies for containers, there is no guarantee that a container will try to transition to “CREATED” only after its associated environment files have finished downloading and been written to disk.
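The overwrite described above can be illustrated with a short Go sketch. It is not the ECS Agent's DependenciesAreResolved implementation: the resource type, the indexByName helper, and the "envfile_<container>" name format are hypothetical and only show what happens when resources are keyed by a name that is not unique.

```go
package main

import "fmt"

// resource is a hypothetical stand-in for a task resource such as
// EnvironmentFileResource; it carries only what the illustration needs.
type resource struct {
	name          string
	containerName string
	created       bool
}

// indexByName keys resources by name, as a lookup map would.
// A later resource with the same name silently replaces an earlier one.
func indexByName(resources []resource) map[string]resource {
	byName := make(map[string]resource, len(resources))
	for _, r := range resources {
		byName[r.name] = r
	}
	return byName
}

func main() {
	// Current behavior: every environment file resource is named "envfile".
	a := resource{name: "envfile", containerName: "A", created: false}
	b := resource{name: "envfile", containerName: "B", created: true}

	// The slice order is not guaranteed, so both orders are possible.
	for _, order := range [][]resource{{a, b}, {b, a}} {
		got := indexByName(order)["envfile"]
		fmt.Printf("container A checks container %s's resource (created=%t)\n",
			got.containerName, got.created)
	}

	// With container-specific names (format assumed for illustration),
	// each container finds its own resource regardless of order.
	a.name, b.name = "envfile_A", "envfile_B"
	got := indexByName([]resource{b, a})["envfile_A"]
	fmt.Printf("container A checks container %s's resource (created=%t)\n",
		got.containerName, got.created)
}
```

With the shared name, which container's resource is checked depends purely on ordering; with distinct names, both resources remain in the map and the lookup is stable.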
Simplified Example:
Let’s say we have a task that specifies 2 containers, container A and container B, where each container specifies its own single unique environment file, and the following sequence of events happens:
Description for the changelog
Address envFile resource naming defect
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.