
docker push retry loop #584

Closed
scones opened this issue Feb 24, 2019 · 13 comments · Fixed by #1578
Labels
  • area/registry: For all bugs having to do with pushing/pulling into registries
  • kind/feature-request
  • priority/p3: agreed that this would be good to have, but no one is available at the moment.

Comments

@scones
Contributor

scones commented Feb 24, 2019

Actual behavior
First off: this is a feature request rather than a bug report.
I am running a private cluster and occasionally the push to the (on-cloud) registry receives a 504.
I am aware that this is an issue with the cluster itself, but that doesn't make the feature request any less valid:

Please wrap the actual docker push command in a retry loop.
Additional thoughts on that:

  • exponential backoff
  • activated by a flag (e.g. --retries=0 as the default, up to --retries=10 or more); a sketch of what this could look like follows below

Expected behavior
Avoid restarting kaniko and recomputing everything from scratch.
Have it retry only the push.
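
A minimal sketch of the kind of retry wrapper requested here, written as a standalone Go example. The helper name pushWithRetry, the flag-like retries parameter, and the backoff values are illustrative assumptions, not kaniko's actual code:

```go
package main

import (
	"fmt"
	"time"
)

// pushWithRetry calls push up to retries+1 times, doubling the delay
// between attempts (exponential backoff). retries == 0 keeps today's
// behaviour of a single attempt.
func pushWithRetry(push func() error, retries int, initialDelay time.Duration) error {
	delay := initialDelay
	var err error
	for attempt := 0; attempt <= retries; attempt++ {
		if err = push(); err == nil {
			return nil
		}
		if attempt == retries {
			break
		}
		fmt.Printf("push failed (attempt %d/%d): %v; retrying in %s\n",
			attempt+1, retries+1, err, delay)
		time.Sleep(delay)
		delay *= 2
	}
	return fmt.Errorf("push failed after %d attempts: %w", retries+1, err)
}

func main() {
	// Stand-in for the real push: fails twice with a 504, then succeeds.
	calls := 0
	fakePush := func() error {
		calls++
		if calls < 3 {
			return fmt.Errorf("504 Gateway Timeout from registry")
		}
		return nil
	}
	if err := pushWithRetry(fakePush, 10, time.Second); err != nil {
		fmt.Println(err)
	}
}
```

The retries count would map onto the suggested --retries flag; a real implementation would wrap kaniko's push call instead of fakePush.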

To Reproduce
Steps to reproduce the behavior:
I would assume it should be possible to shut down the target repository for the first few retries to test/implement this.

Additional Information
Irrelevant, I suppose.

Best regards, and thanks for this marvelous project!

@balopat
Contributor

balopat commented Sep 12, 2019

I just ran into this in our integration tests for skaffold. It would be very good if we could retry temporary errors in kaniko; Docker also retries pushes:

error pushing image: failed to push to destination gcr.io/k8s-skaffold/skaffold-example:v0.37.1-223-gd2701f2: Patch https://gcr.io/v2/k8s-skaffold/skaffold-example/blobs/uploads/ABTmro6w_NXP3bfCN4xRTOpqbI8zPHZLwF0fTLX-NZaZUHaPliXwgjlx9nB31-z1prYDx0Rss-9fXdxowbL1vLs: io: read/write on closed pipe

@donmccasland added the area/registry and priority/p3 labels Sep 24, 2019
@aivarasko

Retry support already landed in google/go-containerregistry#459, but kaniko needs to upgrade this dependency.

@aivarasko

I did some extra checking, and I think the issue is that only the tryUpload part, not the checkExistingBlob part, of

func (w *writer) uploadOne(l v1.Layer) error

is wrapped in a retry.
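
Until the upstream retry covers both paths, one caller-side workaround is to retry the whole push, so the blob-existence check and the upload both run inside the retried operation. A rough sketch against go-containerregistry's public remote.Write API; the destination reference, tarball path, attempt count, and backoff values are made up for illustration and are not what kaniko does:

```go
package main

import (
	"log"
	"time"

	"github.com/google/go-containerregistry/pkg/authn"
	"github.com/google/go-containerregistry/pkg/name"
	"github.com/google/go-containerregistry/pkg/v1/remote"
	"github.com/google/go-containerregistry/pkg/v1/tarball"
)

func main() {
	ref, err := name.ParseReference("gcr.io/example/app:latest") // hypothetical destination
	if err != nil {
		log.Fatal(err)
	}
	img, err := tarball.ImageFromPath("app.tar", nil) // hypothetical locally built image
	if err != nil {
		log.Fatal(err)
	}

	// Retry the entire push (existence checks and uploads alike)
	// with exponential backoff between attempts.
	const attempts = 5
	delay := 2 * time.Second
	for i := 1; i <= attempts; i++ {
		err = remote.Write(ref, img, remote.WithAuthFromKeychain(authn.DefaultKeychain))
		if err == nil {
			return
		}
		if i == attempts {
			break
		}
		log.Printf("push attempt %d/%d failed: %v; retrying in %s", i, attempts, err, delay)
		time.Sleep(delay)
		delay *= 2
	}
	log.Fatalf("giving up after %d attempts: %v", attempts, err)
}
```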

@ptemmer

ptemmer commented Jan 21, 2020

+1 Very much desired functionality

@rrichardson

I have Kaniko jobs that run in parallel on my CI system. If the system is busy, I see these errors:

PUT https://registry.mydomain.com/myimage/manifests/latest: MANIFEST_BLOB_UNKNOWN: blob unknown to registry; sha256:7bbd61231447a971b972bf6e62b7e5aecc52c8edbdf1cf372d63aa1a3b1ed821

This appears to be a problem on my registry's side, but it only happens under high traffic. It would be great to have a retry, since the build succeeded but the push temporarily failed.

@EmilyEmily

+1 Still waiting for this functionality.

@romainourgorry

+1 Very much desired functionality

@FranciscoMSM

FranciscoMSM commented Dec 9, 2020

👍

@scones
Contributor Author

scones commented Dec 10, 2020

In my experience p3 is the "edge feature" / "nice to have" category. I would kindly request bumping it to p2.

Since the continued annoyance of devs and ops using this project seems not to be enough: please think of the excess CO2 produced by the unnecessary recompilation of already successfully compiled steps. (Yes, the "think of the trees!" argument.)

@crumohr

crumohr commented Jan 25, 2021

There seems to be a commit google/go-containerregistry@3b7741e from Jan 7. Could this solve our issue for good?

Addendum: this commit is part of github.com/google/go-containerregistry v0.4.0
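
If that is the change that made the retry behaviour configurable, then after a dependency bump kaniko could opt in when writing the image. A sketch of what the call site might look like, assuming the option is exposed as remote.WithRetryBackoff with a remote.Backoff value, as in recent go-containerregistry releases (I have not verified that this exact API is already in v0.4.0):

```go
package example

import (
	"time"

	"github.com/google/go-containerregistry/pkg/authn"
	"github.com/google/go-containerregistry/pkg/name"
	v1 "github.com/google/go-containerregistry/pkg/v1"
	"github.com/google/go-containerregistry/pkg/v1/remote"
)

// pushWithBackoff pushes img to ref and lets go-containerregistry retry
// transient registry errors with a custom exponential backoff.
func pushWithBackoff(ref name.Reference, img v1.Image) error {
	backoff := remote.Backoff{
		Duration: 1 * time.Second, // initial delay
		Factor:   2.0,             // multiply the delay after each failed attempt
		Jitter:   0.1,
		Steps:    5, // number of retry steps before giving up
	}
	return remote.Write(ref, img,
		remote.WithAuthFromKeychain(authn.DefaultKeychain),
		remote.WithRetryBackoff(backoff),
	)
}
```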

@FritzOscar

+1 Very much desired functionality

@ae-v

ae-v commented Jan 26, 2021

Users of registries with HA load balancing in front need this fix urgently. @priyawadhwa @tejal29 @dlorenc Thanks in advance.

@scones
Contributor Author

scones commented Feb 24, 2021

yay, 2 years later :)

no-reply pushed a commit to surfliner/surfliner-mirror that referenced this issue Mar 16, 2022
Setting the value to `3` initially, which we can adjust as needed.

This feature was introduced relatively recently, in the following PR:

GoogleContainerTools/kaniko#1578

Reason for feature:

> We are facing intermittent issues to push the image to the destination. Cause is as far as I can tell network flakeness. There is a long-standing issue asking for retries for the push operation, so I investigated this.

I recently noticed this error that happened to coincide with 5
dependency MRs from Renovate all coming in at once.

This user references similar experiences with Kaniko/Gitlab:

GoogleContainerTools/kaniko#584 (comment)

The hope is that using the `--push-retry` option will make kaniko try the push a few more
times and avoid the need for us to manually retry the `build` job to get it to pass.