Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating Task Model and implementing docker flags --pid and --ipc #1584

Merged
merged 3 commits into from
Oct 3, 2018

Conversation

linkar-ec2
Copy link
Contributor

@linkar-ec2 linkar-ec2 commented Sep 27, 2018

Summary

Updates the Agent Task model to include PidMode and IpcMode for sharing PID namespace and IPC resource namespace, respectively.
Adding unit tests for Pid and Ipc resource namespace sharing

Implementation details

Agent Task struct is updated to support the PidMode and IpcMode that is in the Task struct received in the payload message from ACS.
In PostUnmarshalTask(), the task is used to determine if a Pause container for namespace sharing needs to be provisioned. If so, it is added as an internal container to the task.
In CreateContainer(), the DockerHostConfig() method determines if the current container's PidMode and IpcOption needs to be changed.

Testing

  • Builds on Linux (make release)
  • Builds on Windows (go build -out amazon-ecs-agent.exe ./agent)
  • Unit tests on Linux (make test) pass
  • Unit tests on Windows (go test -timeout=25s ./agent/...) pass
  • Integration tests on Linux (make run-integ-tests) pass
  • Integration tests on Windows (.\scripts\run-integ-tests.ps1) pass
  • Functional tests on Linux (make run-functional-tests) pass
  • Functional tests on Windows (.\scripts\run-functional-tests.ps1) pass

New tests cover the changes:
Unit Tests cover the added Pause container provisioning methods and DockerHostConfig updates

Description for the changelog

Adding ECS Agent support for the --pid and --ipc Docker flags

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@linkar-ec2 linkar-ec2 added this to the 1.21.0 milestone Sep 27, 2018
@haikuoliu haikuoliu requested a review from a team September 27, 2018 00:05
@samuelkarp
Copy link
Contributor

Is there any particular reason that you're not sharing the pause container between awsvpc mode and these new options? They serve the same purpose: to provide root namespaces that can be shared.

@linkar-ec2
Copy link
Contributor Author

linkar-ec2 commented Sep 27, 2018 via email

@samuelkarp
Copy link
Contributor

That's not exactly accurate. The purpose of the existing pause container is to create a network namespace that can be shared among the containers in a task. We don't attach the pause container to the ENI, but rather move the ENI into the network namespace that was created as a side-effect of running the pause container. PID and IPC namespaces are similar in that they're just namespaces of the container.

pauseContainer.BuildContainerDependency(container.Name, apicontainerstatus.ContainerStopped, apicontainerstatus.ContainerStopped)
}
}

func (task *Task) addNamespaceSharingProvisioningDependency(cfg *config.Config) {
// Pause container does not need to be created if no namespace sharing will be done
if task.getIpcMode() == "" && task.getPidMode() == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change it to if task.getIpcMode() != "task" && task.getPidMode() != "task" so you don't need to check if it's task mode below.

@@ -167,6 +171,14 @@ type Task struct {
terminalReason string
terminalReasonOnce sync.Once

// PidMode is used to determine how PID namespaces are organized between
// containers of the Task
PidMode string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://github.com/golang/go/wiki/CodeReviewComments#initialisms, Golang convention is a bit different from Java convention, so use PIDMode and IPCMode instead, same for other namings.

}

// Create a Pause container if either IPC resource or PID namespace needs to be shared
if task.getIpcMode() == "task" || task.getPidMode() == "task" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using "task" directly, you can declare constants like PIDModeNone="none", PIDModeHost="host", PIDModeTask="task" and use them here (and everywhere).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, will update in next commit with integ tests

}
}

shouldOverridePid, pidMode := task.shouldOverridePidMode(container, dockerContainerMap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, pidMode would be sufficient, see https://github.com/golang/go/wiki/CodeReviewComments#variable-names

Same below.

agent/api/task/task.go Show resolved Hide resolved
agent/api/container/container.go Outdated Show resolved Hide resolved
pidPauseContName = cont.Name
break
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic to find pause container within all of the containers appears three times (CNI, pid, ipc), you can abstract them into a method.

if strings.EqualFold(task.getPidMode(), "host") {
return true, "host"
} else if strings.EqualFold(task.getPidMode(), "task") {
//need to find the ContainerNamespacePause and assign
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: // need, this happens in several places.

hostConfig.NetworkMode = networkMode
// Override 'awsvpc' parameters if needed
if container.Type == apicontainer.ContainerCNIPause {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why extra line here?

namespacePauseContainer := apicontainer.NewContainerWithSteadyState(apicontainerstatus.ContainerResourcesProvisioned)
namespacePauseContainer.TransitionDependenciesMap = make(map[apicontainerstatus.ContainerStatus]apicontainer.TransitionDependencySet)
namespacePauseContainer.Name = NamespacePauseContainerName
namespacePauseContainer.Image = fmt.Sprintf("%s:%s", cfg.PauseContainerImageName, cfg.PauseContainerTag)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid PauseContainerImageName being overridden by Fargate, one option is you can use DefaultPauseContainerImageName here. And you will also need to make change here and pause_linux.go to make Agent always load default pause container tarball

@linkar-ec2
Copy link
Contributor Author

[After syncing with Sam and Derek]
For closure, it is better to use separate Pause containers for the ENI/Network sharing and the PID/IPC resources namespace sharing. The image for the Pause container for the ENI attachment is specific to the requirements of the ENI. The image for the PID/IPC sharing Pause will be the default image.

Copy link
Contributor

@yhlee-aws yhlee-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add functional test for these flags? is it blocked by backend work?

pauseContainer.BuildContainerDependency(container.Name, apicontainerstatus.ContainerStopped, apicontainerstatus.ContainerStopped)
}
}

func (task *Task) addNamespaceSharingProvisioningDependency(cfg *config.Config) {
// Pause container does not need to be created if no namespace sharing will be done at task level
if task.getIPCMode() != "task" && task.getPIDMode() != "task" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move "task" into const block?

return false, ""
}

if strings.EqualFold(task.getPIDMode(), "host") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move "host" to const block also.
and all the rest of magic strings you are using, like "none" and "shareable"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yunhee-l Functional tests need the Frontend changes to be complete before we can register task definitions. Integ tests are currently being made and will probably be in another PR

Copy link
Contributor

@haikuoliu haikuoliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there 😊

agent/api/task/task.go Show resolved Hide resolved
if pidPauseContName == "" {
seelog.Critical("Namespace Pause Container required, but not found in the task: %s", task.String())
return false, "" // cannot override the Pid without the Pause container
} else if strings.EqualFold(task.getPIDMode(), "task") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason using strings.EqualFold instead of ==? I think == should be sufficient here as we already know the result of task.getPIDMode(), i.e. it's "task" instead of "Task" or "tAsk", this is more like a contract between Agent and ACS.

agent/app/agent_unix.go Show resolved Hide resolved
@@ -401,8 +401,8 @@ func (c *Container) GetNextKnownStateProgression() apicontainerstatus.ContainerS
return c.GetKnownStatus() + 1
}

// IsInternal returns true if the container type is either `ContainerEmptyHostVolume`
// or `ContainerCNIPause`. It returns false otherwise
// IsInternal returns true if the container type is `ContainerEmptyHostVolume`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use the ContainerEmptyHostVolume anymore, it can be removed from the comment.

if task.getIPCMode() != ipcModeTask && task.getPIDMode() != pidModeTask {
return
}
namespacePauseContainer := apicontainer.NewContainerWithSteadyState(apicontainerstatus.ContainerResourcesProvisioned)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the definition of ContainerResourcesProvisioned state for namespacePauseContainer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ContainerResourcesProvisioned seems to be the norm for internal containers. Even though this state checks to see if ENI resources need to be added (for our namespacePauseContainer, nothing additional will be done). Because we don't need anything in this method anyways, I set the Dependency to use Running. I don't think there should be difference in runtime though.

return false, ""
}

if task.getPIDMode() == pidModeHost {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be switch statements instead of if, else. That would be easier to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first if block tests if the container is internal. This is unrelated to the Task's pidMode, whose logic is an if and else-if block. For separation of logic, I think this if-else works better than the switch. What do you think?

seelog.Critical("Namespace Pause container not found in the task: %s", task.String())
return false, ""
}
pauseToGiveDocker, ok := dockerContainerMap[pauseContName]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does pauseToGiveDocker mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the Docker container name that is contained in the map. We will 'give' this container to the Docker api for the --pid or --ipc flag

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. it's a little confusing. I would prefer just calling it pauseContainerID or something similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed!

}
pauseToGiveDocker, ok := dockerContainerMap[pauseContName]
if !ok || pauseToGiveDocker == nil {
// Docker container shouldn't be nill or not exist if the Container definition within task exists; implies code-bug
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: nil

pauseToGiveDocker, ok := dockerContainerMap[pauseContName]
if !ok || pauseToGiveDocker == nil {
// Docker container shouldn't be nill or not exist if the Container definition within task exists; implies code-bug
seelog.Criticalf("Namespace Pause container required, but not found in container map for container: [%s] in task: %s",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a failure case, which should be handled in a way that the task is not run. If pidMode is task and pause container is not found, we should stop the task imo. same for ipcMode.

Copy link
Contributor

@haikuoliu haikuoliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And please update the year in the header of the files you changed, thanks!

agent/api/task/task.go Show resolved Hide resolved
@@ -76,6 +76,16 @@ const (
// awslogsCredsEndpointOpt is the awslogs option that is used to pass in an
// http endpoint for authentication
awslogsCredsEndpointOpt = "awslogs-credentials-endpoint"

// ipcModeNone specifies the string used to define the `none` docker IPC resource namespace sharing mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason that ipcModeNone is not grouped with other pid/ipc constants?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to distinguish that None is an option for IPC but thinking again, it makes more sense to just group it and leave out the comment.

Copy link
Contributor

@haikuoliu haikuoliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎉! Make sure all tests pass and squash the commits

Copy link
Contributor

@sharanyad sharanyad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need new capabilities for the two flags that can be added in the next PR.

pauseContainer.BuildContainerDependency(container.Name, apicontainerstatus.ContainerStopped, apicontainerstatus.ContainerStopped)
}
}

func (task *Task) addNamespaceSharingProvisioningDependency(cfg *config.Config) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add unit tests for this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sharanyad The new capabilities for the two flags will be updated with the Agent FE model changes in an upcoming PR!

DesiredStatus: strptr("RUNNING"),
Family: strptr("myFamily"),
PidMode: strptr("host"),
IpcMode: strptr("task"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add test cases for the other modes as well? You could use table driven tests for this. same for pidMode.

case task.getIPCMode() == ipcModeTask:
pauseContName, ok := task.findPauseContainerName(apicontainer.ContainerNamespacePause)
if !ok {
seelog.Critical("Namespace Pause container not found in the task: %s", task.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: seelog.Criticalf
Logging just the task ARN is enough here

pauseContName, ok := task.findPauseContainerName(apicontainer.ContainerNamespacePause)
if !ok {
seelog.Critical("Namespace Pause container not found in the task: %s", task.String())
seelog.Criticalf("Setting Task Desired Status to Stopped. Cannot set up PID sharing.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log statement can be combined with previous one.
also *IPC sharing

return false, ""
}

switch {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be:

switch task.getIPCMode() {

   case ipcModeNone:
   ...

Copy link
Contributor

@sharanyad sharanyad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know what is the test coverage of the files that are changed?
This tool can help you find it.

@@ -635,6 +635,183 @@ func TestPostUnmarshalTaskWithLocalVolumes(t *testing.T) {

}

// Slice of structs for Table Driven testing for sharing PID and IPC resources
var namespaceTests = []struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests for IPCMode None and Shareable as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The customer will not be allowed to provide Shareable as an option. Shareable is internally set for the Pause container and is tested within a unit test.

agent/api/task/task_test.go Show resolved Hide resolved
@linkar-ec2
Copy link
Contributor Author

Code Coverage:
Task: 70.1%
Container: 57.3%

Adding unit tests for Pid and Ipc resource namespace sharing
@@ -1,4 +1,9 @@
# Changelog
## 1.20.4-dev
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this file should not be there in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rebased and squashed all commits. Awaiting merge.

Copy link
Contributor

@sharanyad sharanyad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

pauseDockerID, ok := dockerContainerMap[pauseCont.Name]
if !ok || pauseDockerID == nil {
// Docker container shouldn't be nil or not exist if the Container definition within task exists; implies code-bug
seelog.Criticalf("Namespace Pause container not found in the task: %s; Setting Task's Desired Status to Stopped", task.Arn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this error message is same as the previous condition.
Please change it to something like not found in container map, so its easier while debugging.

@linkar-ec2 linkar-ec2 merged commit b72bebf into aws:DockerFlagSupport Oct 3, 2018
@haikuoliu haikuoliu removed this from the 1.21.0 milestone Oct 4, 2018
@haikuoliu haikuoliu added this to the 1.22.0 milestone Oct 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants