
Retry the image pull once after 5 seconds #792

Merged: @komish merged 1 commit into redhat-openshift-ecosystem:main from komish:retry-pull-once on Oct 14, 2022


@komish (Contributor) commented Oct 11, 2022

This PR should allow crane to retry an image pull after 5 seconds when a pull fails.

The one thing this may be missing is a RetryPredicate. I'm unsure whether we need to specify one, since the default RetryPredicate already covers a few error cases.

CC @tkrishtop to see whether something like this helps with the failure cases you've observed.

fixes #785

Signed-off-by: Jose R. Gonzalez <jose@flutes.dev>
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 11, 2022
@openshift-ci openshift-ci bot requested review from bcrochet and jomkz October 11, 2022 15:31
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 11, 2022
@coveralls commented:

Coverage increased (+0.04%) to 84.5% when pulling 8e436bc on komish:retry-pull-once into ae43bbe on redhat-openshift-ecosystem:main.

@tkrishtop (Contributor) commented:

check workload preflight-green

@tkrishtop (Contributor) commented:

check workload preflight-green

@tkrishtop (Contributor) commented:

Sorry, the job failure was not related to the PR content. It was caused by an operator-sdk rebuild that deprecated the bundle format used by the IBM operator. I removed this test from the list of mandatory tests.

@tkrishtop (Contributor) commented:

check workload preflight-green

Diff excerpt under review:

```go
Factor: 1.0,
Jitter: 0.1,
Steps: 2,
}))
```
A contributor commented:

Hi @komish, looks good to me. You change the default "0s -> 1s -> 4s" strategy to "0s -> 5s", which could help in the case of longer outages.

Maybe we could even keep an exponential strategy as long as the retry interval is large enough, something like "0s -> 5s -> 15s":

```go
remote.Backoff{
	Duration: 5 * time.Second, // Duration is a time.Duration, so a bare 5 would mean 5ns
	Factor:   2.0,
	Jitter:   0.1,
	Steps:    3,
}
```

@komish (Contributor Author) replied:

For the moment, I'll merge with 0s -> 5s. If that proves insufficient, we can consider a 3-step exponential backoff. Keep us posted.

openshift-ci bot commented Oct 12, 2022

@tkrishtop: changing LGTM is restricted to collaborators


@komish komish removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 12, 2022
@komish komish changed the title [WIP] Retry the image pull once after 5 seconds Retry the image pull once after 5 seconds Oct 12, 2022
@bcrochet (Contributor) left a comment:

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 12, 2022
openshift-ci bot commented Oct 13, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: acornett21, bcrochet, komish, tkrishtop


Needs approval from an approver in each of these files:
  • OWNERS [acornett21,bcrochet,komish]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@komish komish merged commit 5bdb893 into redhat-openshift-ecosystem:main Oct 14, 2022
@komish komish deleted the retry-pull-once branch October 14, 2022 14:45
Labels: approved, lgtm
Development: successfully merging this pull request may close the issue "Add a retry for the image pull during Preflight execution".
6 participants