-
Notifications
You must be signed in to change notification settings - Fork 618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix issue1043 #1217
fix issue1043 #1217
Conversation
agent/engine/task_manager.go
Outdated
// a stop | ||
seelog.Warnf("Managed task [%s]: Forcing container [%s] to stop", | ||
mtask.Arn, container.Name) | ||
go mtask.engine.transitionContainer(mtask.Task, container, api.ContainerStopped) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check the KnownStatus
of the container here? In case it has been stopped or already running, eg:
StartContainer -> inspect timeout -> stop the container
StartEvent -> inspect timeout -> stop the container
@@ -267,6 +274,13 @@ func (mtask *managedTask) waitSteady() { | |||
} | |||
} | |||
|
|||
// steadyState returns if the task is in a steady state. Steady state is when task's desired | |||
// and known status are both RUNNING | |||
func (mtask *managedTask) steadyState() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, and possibly my java-brain speak here: can this be named "isSteadyState" or "isInSteadyState"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, don't have strong opinions here. if task.steadyState()
conveys the 'is task in steady state?' meaning in my brain. Golang prefers less verbosity in names (though we don't always stick to that convention). If you feel strongly about it, I can change it.
This should hopefully fix failure seen in https://ci.appveyor.com/project/AmazonECS/amazon-ecs-agent/build/1620 This change ensures that the message is sent on the stopDone channel before marking the waitgroup as done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, one minor comment, also please make sure the header was updated to 2018 for the modified file.
CHANGELOG.md
Outdated
@@ -5,6 +5,8 @@ | |||
* Feature - Support Docker health check. | |||
* Bug - Fixed a bug where `-version` fails due to its dependency on docker client. [#1118](https://github.com/aws/amazon-ecs-agent/pull/1118) | |||
* Bug - Persist container exit code in agent state file. [#1125](https://github.com/aws/amazon-ecs-agent/pull/1125) | |||
* Bug - Persist container exit code in agent state file [#1125](https://github.com/aws/amazon-ecs-agent/pull/1125) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be consistent in the changelog, whether the .
is needed at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think convention has been to not add .
's at the end. I'll modify that.
@@ -518,6 +518,7 @@ func (dg *dockerGoClient) createContainer(ctx context.Context, | |||
if err != nil { | |||
return DockerContainerMetadata{Error: CannotCreateContainerError{err}} | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: blank line
agent/engine/docker_client.go
Outdated
switch event.Status { | ||
case "create": | ||
status = api.ContainerCreated | ||
// TODO no need to inspect containers here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im probably missing context, but what do you mean by "no need to inspect containers here"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to inspect containers after they are created once we adopt Docker's volume APIs. Today, that's the only information we need from the inspect
API. Once we start injecting that ourselves, there's no need to inspect
containers on Create
anymore. This will save us a lot of inspect
calls in the future. I can update the comment to have this context
agent/engine/docker_client.go
Outdated
default: | ||
// Because docker emits new events even when you use an old event api | ||
// version, it's not that big a deal | ||
seelog.Debugf("DockerGoClient: unknown status event from docker: %s", event.Status) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should we a tie an event.ID
to this debug line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably. I'll add more details here.
agent/engine/task_manager.go
Outdated
return | ||
} | ||
if containerKnownStatus != api.ContainerStopped { | ||
// Container's known status is STOPPED. Nothing to do here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you mean to put container's known status is NOT stopped here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. thanks for catching that.
agent/engine/task_manager.go
Outdated
taskKnownStatus := mtask.GetKnownStatus() | ||
return taskKnownStatus == api.TaskRunning && taskKnownStatus >= mtask.GetDesiredStatus() | ||
// handleContainerStoppedTransitionError handles an error when transitioning a container to | ||
// STOPPED. It returns a boolean indicating whethere the taks can continue with updating its |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: whether, tasks
for !mtask.waitEvent(othersStopped) { | ||
if mtask.GetDesiredStatus().Terminal() { | ||
// If we end up here, that means we received a start then stop for this | ||
// task before a task that was expected to stop before it could |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment is not readable. Can you please refactor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:) it's a right of initiation for anyone who's relatively new to this code base to read this and wonder what in the world is the meaning of this sentence. Since I've already merged this PR, we can pick this next time we touch this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this will help anyone who stumbles here: We want to process stop events (a task changes to desired=STOPPED
) prior to start events (desired=RUNNING
) so that any resources that are consumed by the desired=STOPPED
task can be freed for the desired=RUNNING
task. This logic helps to enforce the ordering.
@@ -418,7 +445,7 @@ func (mtask *managedTask) emitTaskEvent(task *api.Task, reason string) { | |||
func (mtask *managedTask) emitContainerEvent(task *api.Task, cont *api.Container, reason string) { | |||
event, err := api.NewContainerStateChangeEvent(task, cont, reason) | |||
if err != nil { | |||
seelog.Infof("Managed task [%s]: unable to create state change event for container [%s]: %v", | |||
seelog.Debugf("Managed task [%s]: unable to create state change event for container [%s]: %v", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not Infof
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
valid internal state change change events like NONE
and CREATED
end up resulting in an error here. Hence, reverted back to Debug
container.SetKnownStatus(currentKnownStatus) | ||
container.SetDesiredStatus(api.ContainerStopped) | ||
errorName := event.Error.ErrorName() | ||
if errorName == dockerTimeoutErrorName || errorName == cannotInspectContainerErrorName { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there other inspect error types? Is the error name check necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can either inspect types or use error names. Error names are easier as there's no type-casting involved. Instead of casting a wide net here, I'd prefer to selectively add error types as that impact to limit any potential unwanted impact of this
Summary
Implementation details
Testing
make release
)go build -out amazon-ecs-agent.exe ./agent
)make test
) passgo test -timeout=25s ./agent/...
) passmake run-integ-tests
) pass.\scripts\run-integ-tests.ps1
) passmake run-functional-tests
) pass.\scripts\run-functional-tests.ps1
) passNew tests cover the changes: yes!
Description for the changelog
Licensing
This contribution is under the terms of the Apache 2.0 License: