-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve pull handling #1320
Improve pull handling #1320
Conversation
4c7f395
to
45eb617
Compare
core/src/main/java/org/testcontainers/images/LoggedPullImageResultCallback.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/testcontainers/images/LoggedPullImageResultCallback.java
Show resolved
Hide resolved
core/src/main/java/org/testcontainers/images/LoggedPullImageResultCallback.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/testcontainers/images/RemoteDockerImage.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/testcontainers/images/TimeLimitedLoggedPullImageResultCallback.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/testcontainers/images/TimeLimitedLoggedPullImageResultCallback.java
Outdated
Show resolved
Hide resolved
Wow, really great change! 👍 Seems to make our pulling process much more stable 💯 I just left a few comments, but I really like the idea in general |
logger.info("Pulling docker image: {}. Please be patient; this may take some time but only needs to be done once.", imageName); | ||
} | ||
|
||
if (attempts++ >= 3) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that you removed the retrying logic here. Is it really safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is deliberate. I don't think that we have any more reasons to retry pulls.
A pull timeout was one case where a retry actually used to be useful, in that we could expect huge/slow downloads to maybe fail on the first attempt but succeed after that due to cached layers.
With the new code, we can now tolerate extremely long downloads as long as progress is being made. We shouldn't need to retry downloads because of these any more.
Other reasons for failure, such as unavailable images or auth failures make no sense to retry anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave this comment unresolved for easy visiblity should we ever regret this change!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out I was wrong; I've noticed numerous failures in CI due to timeout errors at pull time.
Will reinstate retry logic for pulls!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #1712
core/src/main/java/org/testcontainers/images/TimeLimitedLoggedPullImageResultCallback.java
Show resolved
Hide resolved
core/src/main/java/org/testcontainers/containers/GenericContainer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! 👍
Left a few question about the retrying mechanism, but other than that - good to go 👍
02df4e1
to
315bc9d
Compare
1d40670
to
87a5d9f
Compare
@rnorth what was the cost? |
It's a shame you can't see the commit!
We could consider a broader refactoring, but I'm not sure that it's worth it. |
@rnorth ok 👍 |
Pulling of images is something that I feel we can generally improve. This PR attempts to:
Remove the current fixed timeout+retries for pulls, replacing it with a monitor of progress. If the download pauses for more than 30s, the pull will be aborted, but otherwise any duration of pull is allowed as long as progress is being made.
Make the logging more human friendly; we'll actually log the downloaded/extracted state of the layers - in a form that is informative but not too noisy in the logs.
An example of the logs from both of these changes together (where I cut the network connection during the pull):