Fix a race condition in agent #737

Merged
merged 1 commit into from
Apr 13, 2017

@richardpen richardpen commented Mar 22, 2017

Summary

There is a race condition during task cleanup: in some edge cases the task state can become inconsistent with the managed task, because during cleanup one of the two is updated inside the lock and the other outside it.
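
To illustrate the idea, here is a minimal sketch of a cleanup that updates both pieces of state inside one critical section. The `engine` type, its fields, and the `addTask`/`cleanupTask`/`consistent` helpers are hypothetical stand-ins for this example, not the agent's actual code; only the `processTasks` lock name is borrowed from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// engine is a hypothetical stand-in for the agent's task engine.
// The field names are illustrative, not the agent's real types.
type engine struct {
	processTasks sync.Mutex          // guards both maps below
	state        map[string]struct{} // stand-in for the task state
	managed      map[string]struct{} // stand-in for the managed tasks
}

func newEngine() *engine {
	return &engine{
		state:   map[string]struct{}{},
		managed: map[string]struct{}{},
	}
}

// addTask registers a task in both maps while holding the lock.
func (e *engine) addTask(arn string) {
	e.processTasks.Lock()
	defer e.processTasks.Unlock()
	e.state[arn] = struct{}{}
	e.managed[arn] = struct{}{}
}

// cleanupTask removes a task from both maps inside one critical section,
// so no goroutine can see the task in one map but not in the other.
func (e *engine) cleanupTask(arn string) {
	e.processTasks.Lock()
	defer e.processTasks.Unlock()
	delete(e.state, arn)
	delete(e.managed, arn)
}

// consistent reports whether both maps agree about the task.
func (e *engine) consistent(arn string) bool {
	e.processTasks.Lock()
	defer e.processTasks.Unlock()
	_, inState := e.state[arn]
	_, inManaged := e.managed[arn]
	return inState == inManaged
}

func main() {
	e := newEngine()
	var wg sync.WaitGroup
	// Add and clean up the same task concurrently many times.
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); e.addTask("task-1") }()
		go func() { defer wg.Done(); e.cleanupTask("task-1") }()
	}
	wg.Wait()
	fmt.Println("consistent:", e.consistent("task-1"))
}
```

If one of the two `delete` calls were moved outside the lock, a concurrent `addTask` could run between them and leave the task present in one map only.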

Implementation details

Testing

  • Builds on Linux (make release)
  • Builds on Windows (go build -o amazon-ecs-agent.exe ./agent)
  • Unit tests on Linux (make test) pass
  • Unit tests on Windows (go test -timeout=25s ./agent/...) pass
  • Integration tests on Linux (make run-integ-tests) pass
  • Integration tests on Windows (.\scripts\run-integ-tests.ps1) pass
  • Functional tests on Linux (make run-functional-tests) pass
  • Functional tests on Windows (.\scripts\run-functional-tests.ps1) pass
  • Manually modified the code to simulate the situation in which the race condition can occur: it reproduced without this change and did not reproduce with it.

New tests cover the changes:

Description for the changelog

Fixed a potential race condition in the agent that could leave it in a corrupted state.

Licensing

This contribution is under the terms of the Apache 2.0 License:
yes

// Now remove ourselves from the global state and cleanup channels
mtask.engine.processTasks.Lock()
mtask.engine.state.RemoveTask(mtask.Task)
seelog.Debug("Finished removing task data; removing from state no longer managing", "task", mtask.Task)

We need to update the format string when calling seelog directly. I think we should also update the content of the string itself; it seems a bit confusing to me. Maybe something like:
seelog.Debugf("Finished removing task data, no longer managing task %v", mtask.Task)
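
To show why the format string matters, here is a small sketch that uses only the standard `fmt` package. It assumes `seelog.Debug` joins its arguments the way `fmt.Sprint` does and that `seelog.Debugf` follows `fmt.Sprintf`; the `task` type and both helper functions are hypothetical and exist only for this example.

```go
package main

import "fmt"

// task is a hypothetical stand-in for the agent's task type.
type task struct{ Arn string }

// sprintStyle mimics a non-formatting log call: the arguments are simply
// concatenated, so key/value style arguments run together.
func sprintStyle(t task) string {
	return fmt.Sprint("no longer managing", "task", t)
}

// sprintfStyle mimics a formatting log call with an explicit %v verb.
func sprintfStyle(t task) string {
	return fmt.Sprintf("Finished removing task data, no longer managing task %v", t)
}

func main() {
	t := task{Arn: "arn-1"}
	fmt.Println(sprintStyle(t))  // no longer managingtask{arn-1}
	fmt.Println(sprintfStyle(t)) // Finished removing task data, no longer managing task {arn-1}
}
```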

@aaithal aaithal left a comment

Can you make the commit message more descriptive? You can copy the summary from the PR and use it as the commit message. It always helps to have as much context in the commit message as possible.

The task state in dockerstate and the managed task in the task manager are
modified when a task is added and when it is cleaned up; both should be
protected by processTasks.Lock. But in cleanup the task state is modified
outside the lock, which in rare cases could leave the task in an
inconsistent state.
@richardpen richardpen merged commit e6841f0 into aws:dev Apr 13, 2017
@samuelkarp samuelkarp added this to the 1.14.2 milestone May 25, 2017
@adnxn adnxn mentioned this pull request May 26, 2017
@richardpen richardpen deleted the race-condition branch November 21, 2017 01:03