Fix container metadata information missing during agent restart #1183
@richardpen can you add more details about what …
agent/engine/docker_task_engine.go
if !container.Container.KnownTerminal() {
	container.Container.ApplyingError = api.NewNamedError(&ContainerVanishedError{})
	// If this is a Docker API error
	if metadata.Error.ErrorName() == "CannotDescribeContainerError" {
minor: Can you make "CannotDescribeContainerError" a const?
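A minimal sketch of the suggested change, assuming the error type exposes its name through an ErrorName() method as in the snippet above. The constant name cannotDescribeContainerErrorName, the namedError interface, and the describeError type are hypothetical stand-ins, not the agent's actual identifiers:

```go
package main

// cannotDescribeContainerErrorName is a hypothetical name for the constant
// the review asks for; the string value is the one used in the PR.
const cannotDescribeContainerErrorName = "CannotDescribeContainerError"

// namedError is a stand-in for an error that reports a stable name.
type namedError interface {
	error
	ErrorName() string
}

// describeError is a toy namedError used only for illustration.
type describeError struct{ msg string }

func (e describeError) Error() string     { return e.msg }
func (e describeError) ErrorName() string { return cannotDescribeContainerErrorName }

// isDescribeError compares against the constant rather than a string
// literal, so the name is spelled in exactly one place.
func isDescribeError(err namedError) bool {
	return err != nil && err.ErrorName() == cannotDescribeContainerErrorName
}
```

Any other comparison or constructor that needs the name can then reuse the same constant, and a misspelled identifier fails to compile instead of silently never matching.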
agent/engine/docker_task_engine.go
now := engine.time().Now()
ok := task.SetExecutionStoppedAt(now)
if ok {
	seelog.Infof("Recording execution stopped time for a task, essential container in task stopped, task %s, time: %s",
Once #1208 is merged, please rebase this on top of it, and don't forget to update the log string to follow the format used there:
- "Recording execution -> "Task engine [%s] Recording execution
- instead of task.String(), use task.Arn
@@ -332,6 +365,15 @@ func (engine *DockerTaskEngine) synchronizeContainerStatus(container *api.Docker
		// update the container known status
		container.Container.SetKnownStatus(currentState)
	}
	// Update task ExecutionStoppedAt timestamp
Please rebase this on dev once #1217 is merged; this logic has been moved into its own method there. Maybe it's better to review this after that PR is merged.
agent/engine/docker_task_engine.go
// Set Exitcode if it's not set
if metadata.ExitCode != nil && (container.GetKnownExitCode() == nil ||
	aws.IntValue(metadata.ExitCode) != aws.IntValue(container.GetKnownExitCode())) {
Do we need the condition checks (container.GetKnownExitCode() == nil || aws.IntValue(metadata.ExitCode) != aws.IntValue(container.GetKnownExitCode()))? Can't we just set it regardless?
It should have the same effect, but I'm not sure which way is better. I can change it if no one objects.
I vote for setting it anyway and getting rid of this check. Nice catch!
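To check that dropping the extra condition is safe, the sketch below compares the original guarded assignment with the unconditional one. The container type and the intValue helper are hypothetical stand-ins (intValue mimics aws.IntValue, which returns 0 for a nil pointer). Only the metadata.ExitCode != nil check is kept, so a missing exit code never overwrites a known one:

```go
package main

// container is a hypothetical stand-in; only the known exit code is modeled.
type container struct {
	knownExitCode *int
}

func (c *container) GetKnownExitCode() *int     { return c.knownExitCode }
func (c *container) SetKnownExitCode(code *int) { c.knownExitCode = code }

// intValue mimics aws.IntValue: it dereferences a *int, treating nil as 0.
func intValue(p *int) int {
	if p == nil {
		return 0
	}
	return *p
}

// setExitCodeGuarded reproduces the condition under review.
func setExitCodeGuarded(c *container, exitCode *int) {
	if exitCode != nil && (c.GetKnownExitCode() == nil ||
		intValue(exitCode) != intValue(c.GetKnownExitCode())) {
		c.SetKnownExitCode(exitCode)
	}
}

// setExitCode is the simplified form: whenever Docker reports an exit code,
// record it.
func setExitCode(c *container, exitCode *int) {
	if exitCode != nil {
		c.SetKnownExitCode(exitCode)
	}
}
```

Both versions leave the same known exit-code value in every case; the simplified one may replace the pointer with another pointer to an equal value, which is harmless here.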
agent/api/task.go
@@ -1103,3 +1103,21 @@ func (task *Task) GetID() (string, error) {

	return resourceSplit[1], nil
}

func (task *Task) RecordExecutionStoppedAt(container *Container) {
lint: add a doc comment starting with the method name: // RecordExecutionStoppedAt ...
@@ -1720,3 +1720,170 @@ func TestHandleDockerHealthEvent(t *testing.T) {
	})
	assert.Equal(t, testContainer.Health.Status, api.ContainerHealthy)
}

func TestCreatedContainerMetadataUpdateOnRestart(t *testing.T) {
I think you can combine all of these into a single test: either a table-based test or one task containing all of these containers.
Code LGTM, just minor comments about documentation. I also agree with @aaithal about merging those four *MetadataUpdateOnRestart tests into a single table-driven test.
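One way to follow the table-based suggestion, sketched with hypothetical stand-in types: each row names a scenario and its expected result, and a single loop runs them all. The reporter interface is the subset of *testing.T the loop uses, so the same function could be called from a TestMetadataUpdateOnRestart(t *testing.T) wrapper (hypothetical name) in a _test.go file, or from the small collector shown here. The three scenarios are illustrative only; they are not the PR's actual four cases:

```go
package main

import "fmt"

// reporter is the subset of *testing.T used by the table below.
type reporter interface {
	Errorf(format string, args ...interface{})
}

// collector records failures so the table can also run outside `go test`.
type collector struct{ failures []string }

func (r *collector) Errorf(format string, args ...interface{}) {
	r.failures = append(r.failures, fmt.Sprintf(format, args...))
}

// metadata and container are hypothetical stand-ins.
type metadata struct {
	ExitCode *int
	Ports    []uint16
}

type container struct {
	ExitCode *int
	Ports    []uint16
}

// apply is a toy version of the metadata update being tested.
func apply(m metadata, c *container) {
	if m.ExitCode != nil {
		c.ExitCode = m.ExitCode
	}
	if len(m.Ports) > 0 {
		c.Ports = m.Ports
	}
}

func intPtr(v int) *int { return &v }

// sameInt reports whether two *int hold the same value (or are both nil).
func sameInt(a, b *int) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

// runMetadataUpdateOnRestartCases runs every scenario from one table
// instead of one near-identical test function per scenario.
func runMetadataUpdateOnRestartCases(t reporter) {
	cases := []struct {
		name         string
		meta         metadata
		wantExitCode *int
		wantPorts    int
	}{
		{name: "created", meta: metadata{}},
		{name: "running", meta: metadata{Ports: []uint16{80}}, wantPorts: 1},
		{name: "stopped", meta: metadata{ExitCode: intPtr(1)}, wantExitCode: intPtr(1)},
	}
	for _, tc := range cases {
		c := &container{}
		apply(tc.meta, c)
		if !sameInt(c.ExitCode, tc.wantExitCode) {
			t.Errorf("%s: unexpected exit code", tc.name)
		}
		if len(c.Ports) != tc.wantPorts {
			t.Errorf("%s: got %d ports, want %d", tc.name, len(c.Ports), tc.wantPorts)
		}
	}
}
```

With a real *testing.T, wrapping each row in t.Run(tc.name, ...) would also give every scenario its own subtest name in the output.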
agent/engine/docker_task_engine.go
@@ -290,6 +291,36 @@ func (engine *DockerTaskEngine) synchronizeState() {
	engine.saver.Save()
}

func updateContainerMetadata(metadata *DockerContainerMetadata, container *api.Container, task *api.Task) {
nit: add a doc comment: // updateContainerMetadata ...
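A sketch of the function with a doc comment that starts with the function name, as the nit asks. The types are hypothetical simplifications of DockerContainerMetadata and api.Container. The task parameter of the real signature is left out because this sketch does not model what the function does with it:

```go
package main

// PortBinding, DockerContainerMetadata and Container are hypothetical
// simplifications of the agent's types.
type PortBinding struct {
	ContainerPort uint16
	HostPort      uint16
}

type DockerContainerMetadata struct {
	ExitCode     *int
	PortBindings []PortBinding
}

type Container struct {
	KnownExitCode *int
	KnownPorts    []PortBinding
}

// updateContainerMetadata copies the metadata Docker reports for a
// container (port bindings and exit code) onto the agent's view of that
// container, so information produced while the agent was not running is
// not lost. Fields that Docker did not report are left unchanged.
func updateContainerMetadata(metadata *DockerContainerMetadata, container *Container) {
	if len(metadata.PortBindings) > 0 {
		container.KnownPorts = metadata.PortBindings
	}
	if metadata.ExitCode != nil {
		container.KnownExitCode = metadata.ExitCode
	}
}
```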
agent/engine/docker_task_engine.go
@@ -301,51 +332,49 @@ func (engine *DockerTaskEngine) synchronizeContainerStatus(container *api.Docker
			seelog.Warnf("Task engine [%s]: could not find matching container for expected name [%s]: %v",
				task.Arn, container.DockerName, err)
		} else {
			// update the container metadata in case the container status/metadata changed during agent restart
So if I understand correctly, this code path is for containers that were potentially created while we were down (see the debug message above). Is that right? Maybe update the comment to reflect that.
Summary
Fix an issue where the agent could miss container metadata (port mappings, exit code) if a container was started or stopped while the agent was not running (e.g., during an agent restart).
Implementation details
Record the container metadata during reconciliation when the agent restarts.
Testing
- make release
- go build -out amazon-ecs-agent.exe ./agent
- make test (pass)
- go test -timeout=25s ./agent/... (pass)
- make run-integ-tests (pass)
- .\scripts\run-integ-tests.ps1 (pass)
- make run-functional-tests (pass)
- .\scripts\run-functional-tests.ps1 (pass)

New tests cover the changes: yes
Description for the changelog
Licensing
This contribution is under the terms of the Apache 2.0 License: yes