Fix bugs around docker start retries #6326

Merged 3 commits on Sep 18, 2019


@notnoop (Contributor) commented Sep 13, 2019

I noticed that the Docker driver attempts to retry container creation and start.

This fixes a couple of bugs where we either don't retry transient errors, or we retry in such a way that the retried invocation fails.

Mahmood Ali added 2 commits September 13, 2019 13:02
This handles a bug where we may start a container successfully, yet still
fail because of the retries: startContainer is not an idempotent call.

Here, we ensure that when starting a container fails with a 500 error,
the retry succeeds if the container was in fact started successfully.
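A minimal sketch of the idea in this commit message, not the driver's actual code: after a failed start, check whether the container is running before counting the attempt as a failure. The `containerAPI` interface, `startWithRetry`, and the `flakyAPI` fake are hypothetical names introduced only for illustration, and the sketch assumes `maxAttempts` is at least 1.

```go
package main

import (
	"errors"
	"fmt"
)

// containerAPI is a hypothetical stand-in for the Docker client calls
// involved; it is not the real client interface.
type containerAPI interface {
	Start(id string) error
	IsRunning(id string) (bool, error)
}

// startWithRetry tries to start a container up to maxAttempts times.
// After each failed attempt it checks whether the container is running
// anyway, so an error reported for a start that actually succeeded is
// treated as success instead of triggering a doomed retry.
func startWithRetry(api containerAPI, id string, maxAttempts int) error {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err := api.Start(id)
		if err == nil {
			return nil
		}
		lastErr = err
		if running, inspectErr := api.IsRunning(id); inspectErr == nil && running {
			return nil
		}
	}
	return fmt.Errorf("start %s failed after %d attempts: %w", id, maxAttempts, lastErr)
}

// flakyAPI simulates a daemon that starts the container but still
// reports a 500 error, and rejects any second start call.
type flakyAPI struct {
	running bool
	calls   int
}

func (f *flakyAPI) Start(id string) error {
	f.calls++
	if f.running {
		return errors.New("container already started")
	}
	f.running = true
	return errors.New("API error (500): simulated failure after start")
}

func (f *flakyAPI) IsRunning(id string) (bool, error) { return f.running, nil }

func main() {
	api := &flakyAPI{}
	err := startWithRetry(api, "c1", 5)
	fmt.Println(err == nil, api.calls)
}
```

With the fake above, the first call reports an error, the running check finds the container up, and no second start call is issued.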
@notnoop added this to the 0.10.0 milestone Sep 13, 2019
	} else if isDockerTransientError(createErr) && attempted < 5 {
		attempted++
		time.Sleep(1 * time.Second)
		goto CREATE
	}

	return nil, recoverableErrTimeouts(createErr)
Contributor Author

It's not obvious to me why recoverableErrTimeouts is used. StartTask doesn't special-case these errors; it just wraps them before returning. I considered removing it completely and doing more retries locally, but decided to be conservative and double-check the goal before making further changes to the driver.

Member

I don't have much context to add here other than that it was brought over from the pre-0.9 drivers:

return nil, recoverableErrTimeouts(createErr)

Contributor Author

@notnoop notnoop Sep 18, 2019

Sounds good. A more aggressive refactor makes sense as a follow-up PR.

drivers/docker/driver.go (outdated, resolved)
@notnoop force-pushed the b-docker-start-failure-handling branch from 62d7fda to b5b445c on September 18, 2019 12:13
@notnoop merged commit 922663a into master on Sep 18, 2019
@notnoop deleted the b-docker-start-failure-handling branch on September 18, 2019 12:27
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 30, 2023