Issue a stop when start container failed with EOF error #2245

Merged
merged 1 commit into from
Oct 24, 2019


fenxiong
Contributor

@fenxiong fenxiong commented Oct 22, 2019

Summary

Fix #1708. When starting a container fails with an EOF error, there's a chance that the container was started anyway, so issue a stop for the container in that case.

Implementation details

The stop is issued in handleEventError, in the same way as we issue a stop when starting a container times out. I couldn't find a typed error from Docker or Go for this, so the check is based on the error string.

Testing

Unit test added. Manually tested by injecting the error into the Docker client and verifying that the container is forced to stop.
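A minimal sketch of that kind of fault injection, assuming a hypothetical two-method `dockerClient` interface (the agent's real Docker client interface is larger and has different signatures). `eofInjectingClient` and `startOrForceStop` are illustrative names, not agent code: the fake marks the container as running and still returns an error ending in "EOF", and the helper responds by issuing a stop.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// dockerClient is a hypothetical, trimmed-down interface for illustration.
type dockerClient interface {
	StartContainer(id string) error
	StopContainer(id string) error
}

// eofInjectingClient simulates the failure: the daemon actually starts the
// container, but the caller still receives an error ending in "EOF".
type eofInjectingClient struct {
	running map[string]bool
	stopped []string
}

func (c *eofInjectingClient) StartContainer(id string) error {
	c.running[id] = true // the container really is running now
	return fmt.Errorf("cannot start container %s: %v", id, io.EOF)
}

func (c *eofInjectingClient) StopContainer(id string) error {
	c.running[id] = false
	c.stopped = append(c.stopped, id)
	return nil
}

// startOrForceStop is a hypothetical helper: on an EOF start error it issues a
// stop so a container that did start is not left running untracked.
func startOrForceStop(c dockerClient, id string) error {
	err := c.StartContainer(id)
	if err != nil && strings.HasSuffix(err.Error(), io.EOF.Error()) {
		if stopErr := c.StopContainer(id); stopErr != nil {
			return fmt.Errorf("start failed (%v) and stop failed: %w", err, stopErr)
		}
	}
	return err
}

func main() {
	c := &eofInjectingClient{running: map[string]bool{}}
	err := startOrForceStop(c, "c1")
	fmt.Println(err != nil, c.running["c1"], c.stopped) // true false [c1]
}
```

The start error is still returned to the caller, so the container is still reported as failed to start; the extra stop only makes sure nothing is left running behind the agent's back.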

Description for the changelog

Fixed a bug where the agent could lose track of running containers upon certain Docker API error #2245.

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

agent/engine/task_manager.go (outdated, resolved)
@fenxiong fenxiong force-pushed the test-startcontainer branch from 49d48ad to f8a68b7 Compare October 23, 2019 16:37
@fenxiong fenxiong marked this pull request as ready for review October 23, 2019 18:44
Contributor

@adnxn adnxn left a comment

change lgtm, nonblocking nit.

agent/engine/task_manager_test.go (outdated, resolved)
@adnxn adnxn added this to the 1.33.0 milestone Oct 23, 2019
When starting a container fails with EOF, there's a chance that the container was started anyway, so issue a stop for this case.
@fenxiong fenxiong force-pushed the test-startcontainer branch from f8a68b7 to adbf4a9 Compare October 23, 2019 18:58
@fenxiong fenxiong requested a review from a team October 23, 2019 18:59
Member

@fierlion fierlion left a comment

looks good.

```go
	// a stop. See #1043 for details
	shouldForceStop = true
} else if errorName == dockerapi.CannotStartContainerErrorName && strings.HasSuffix(errorStr, io.EOF.Error()) {
	// If we get an EOF error from Docker when starting the container, we don't really know whether the
```
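The branch in the snippet above can be read as a small predicate. This is a sketch, not the agent's code: the constant names and their string values are assumptions standing in for the `dockerapi` error-name constants, and `shouldForceStopAfterStartError` is a hypothetical helper that only considers errors returned while starting a container.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Hypothetical stand-ins for the agent's error-name constants; the values
// are assumptions for illustration only.
const (
	dockerTimeoutErrorName        = "DockerTimeoutError"
	cannotStartContainerErrorName = "CannotStartContainerError"
)

// shouldForceStopAfterStartError returns true when a failed start may still
// have left a running container: either the start timed out, or it failed
// with an error whose message ends in "EOF".
func shouldForceStopAfterStartError(errorName, errorStr string) bool {
	switch {
	case errorName == dockerTimeoutErrorName:
		return true
	case errorName == cannotStartContainerErrorName &&
		strings.HasSuffix(errorStr, io.EOF.Error()):
		return true
	default:
		return false
	}
}

func main() {
	// Made-up sample errors, not real Docker output.
	cases := []struct{ name, msg string }{
		{dockerTimeoutErrorName, "timed out"},
		{cannotStartContainerErrorName, "start failed: EOF"},
		{cannotStartContainerErrorName, "no such image"},
	}
	for _, c := range cases {
		fmt.Printf("%s %q -> %v\n", c.name, c.msg, shouldForceStopAfterStartError(c.name, c.msg))
	}
}
```

Keeping the EOF case next to the timeout case mirrors the PR description: both are situations where the start call's result is ambiguous, so stopping is the safe default.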
Contributor

do we know when an EOF error occurs?

Contributor Author

From my understanding, an EOF means the client reached the end of the stream without reading anything from the response, which can happen if the connection is closed before the server writes the response.

@fenxiong fenxiong merged commit a0feddb into aws:dev Oct 24, 2019
@fenxiong fenxiong deleted the test-startcontainer branch October 24, 2019 19:16
@shubham2892 shubham2892 modified the milestones: 1.33.0, 1.32.1 Oct 24, 2019

5 participants