-
Notifications
You must be signed in to change notification settings - Fork 612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix an race condition in agent #737
Conversation
agent/engine/task_manager.go
Outdated
// Now remove ourselves from the global state and cleanup channels | ||
mtask.engine.processTasks.Lock() | ||
mtask.engine.state.RemoveTask(mtask.Task) | ||
seelog.Debug("Finished removing task data; removing from state no longer managing", "task", mtask.Task) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to update the format string when calling seelog directly. I think we should also update the content of the string itself; it seems a bit confusing to me. Maybe something like:
seelog.Debugf("Finished removing task data, no longer managing task %v", mtask.Task)
9978c20
to
1a35527
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make the commit message more descriptive? You can copy the summary from PR and use that for the commit message instead. It always helps to have as much context in the commit message as possible.
1a35527
to
42f7427
Compare
42f7427
to
573e1ff
Compare
The task state in dockerstate and managed task in task manager are modified when task is added and cleaned up, both are protected by processTasks.Lock. But in cleanup the task state is modified outside the lock which could cause inconsistent state of the task in rare case.
573e1ff
to
e6841f0
Compare
Summary
There is a race condition when task is cleaning up, the task state can get inconsistent with the managed task in some edge case. Because these two are updated one inside the lock and one outside the lock when clean up the task.
Implementation details
Testing
make release
)go build -out amazon-ecs-agent.exe ./agent
)make test
) passgo test -timeout=25s ./agent/...
) passmake run-integ-tests
) pass.\scripts\run-integ-tests.ps1
) passmake run-functional-tests
) pass.\scripts\run-functional-tests.ps1
) passNew tests cover the changes:
Description for the changelog
Fixed a potential race condition in agent where could cause agent in corrupted state.
Licensing
This contribution is under the terms of the Apache 2.0 License:
yes