Improve pull handling #1320

rnorth · 2019-03-18T20:10:49Z

Pulling of images is something that I feel we can generally improve. This PR attempts to:

- Remove the current fixed timeout+retries for pulls, replacing it with a monitor of progress. If the download pauses for more than 30s, the pull will be aborted, but otherwise any duration of pull is allowed as long as progress is being made.
- Make the logging more human friendly; we'll actually log the downloaded/extracted state of the layers - in a form that is informative but not too noisy in the logs.
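
To illustrate the progress-monitor idea in the first point, here is a minimal sketch of a stall detector. It is not the code in this PR: `PullStallDetector` and its methods are hypothetical names, and the injected clock is an assumption made so the logic can be exercised without waiting for real time to pass.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

/** Hypothetical helper (not from this PR): flags a pull as stalled when no progress arrives in time. */
final class PullStallDetector {
    private final long timeoutNanos;
    private final LongSupplier nanoClock;
    private long lastProgressNanos;

    PullStallDetector(Duration timeout, LongSupplier nanoClock) {
        this.timeoutNanos = timeout.toNanos();
        this.nanoClock = nanoClock;
        // The clock starts when the detector is created, i.e. when the pull begins.
        this.lastProgressNanos = nanoClock.getAsLong();
    }

    /** Call for every progress event received from the pull. */
    synchronized void recordProgress() {
        lastProgressNanos = nanoClock.getAsLong();
    }

    /** True once more than the timeout has elapsed since the last progress event. */
    synchronized boolean isStalled() {
        return nanoClock.getAsLong() - lastProgressNanos > timeoutNanos;
    }
}
```

In real use one would presumably pass `System::nanoTime` as the clock, call `recordProgress()` from the pull callback, and poll `isStalled()` from a scheduled task that aborts the pull. The total duration of the pull is never checked, only the gap since the last progress event.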

An example of the logs from both of these changes together (where I cut the network connection during the pull):

19:34:25.198 INFO  🐳 [ibmcom/db2express-c:latest] - Pulling image
19:34:25.198 INFO  🐳 [ibmcom/db2express-c:latest] - Pulling image layers:  0 pending,  0 downloaded,  0 extracted, (0 bytes/0 bytes)
19:34:25.967 INFO  🐳 [ibmcom/db2express-c:latest] - Pulling image layers: 12 pending,  1 downloaded,  0 extracted, (32 bytes/? MB)
19:34:27.363 INFO  🐳 [ibmcom/db2express-c:latest] - Pulling image layers: 11 pending,  2 downloaded,  0 extracted, (1 MB/? MB)
19:34:58.519 ERROR 🐳 [ibmcom/db2express-c:latest] - Docker image pull has not made progress in 30s - aborting pull
19:34:58.564 ERROR 🐳 [ibmcom/db2express-c:latest] - Failed to pull image: ibmcom/db2express-c:latest. Please check output of `docker pull ibmcom/db2express-c:latest`
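
A summary line like the ones above could be produced by tallying per-layer status updates. The following is only a sketch under assumptions: `LayerPullTally` is a hypothetical class, the status strings are modeled on what `docker pull` prints and are not taken from this PR, and each layer is counted in exactly one bucket, which may differ from how the real callback counts.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical tally of layer states, keyed by layer id (not the PR's implementation). */
final class LayerPullTally {
    enum State { PENDING, DOWNLOADED, EXTRACTED }

    private final Map<String, State> layers = new HashMap<>();

    /** Records a status update; status strings here are assumed, modeled on `docker pull` output. */
    void onStatus(String layerId, String status) {
        State current = layers.getOrDefault(layerId, State.PENDING);
        State next;
        switch (status) {
            case "Download complete": next = State.DOWNLOADED; break;
            case "Pull complete": next = State.EXTRACTED; break;
            default: next = current; // any other status keeps the layer where it is
        }
        // Never move a layer backwards if events arrive out of order.
        layers.put(layerId, next.compareTo(current) > 0 ? next : current);
    }

    /** Formats one compact log line instead of one line per progress event. */
    String summary() {
        int pending = 0, downloaded = 0, extracted = 0;
        for (State s : layers.values()) {
            switch (s) {
                case PENDING: pending++; break;
                case DOWNLOADED: downloaded++; break;
                default: extracted++;
            }
        }
        return String.format(Locale.ROOT,
            "Pulling image layers: %2d pending, %2d downloaded, %2d extracted",
            pending, downloaded, extracted);
    }
}
```

Logging only when the summary changes would keep the output quiet during long downloads; byte counts are left out of this sketch.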

core/src/main/java/org/testcontainers/images/LoggedPullImageResultCallback.java

core/src/main/java/org/testcontainers/images/RemoteDockerImage.java

core/src/main/java/org/testcontainers/images/TimeLimitedLoggedPullImageResultCallback.java

bsideup · 2019-04-28T22:30:07Z

Wow, really great change! 👍 Seems to make our pulling process much more stable 💯 I just left a few comments, but I really like the idea in general

bsideup · 2019-07-21T07:12:39Z

core/src/main/java/org/testcontainers/images/RemoteDockerImage.java

-                    logger.info("Pulling docker image: {}. Please be patient; this may take some time but only needs to be done once.", imageName);
-                }
-
-                if (attempts++ >= 3) {


I see that you removed the retrying logic here. Is it really safe?

Yes, this is deliberate. I don't think that we have any more reasons to retry pulls.

A pull timeout was one case where a retry actually used to be useful, in that we could expect huge/slow downloads to maybe fail on the first attempt but succeed after that due to cached layers.

With the new code, we can now tolerate extremely long downloads as long as progress is being made. We shouldn't need to retry downloads because of these any more.

Other reasons for failure, such as unavailable images or auth failures, make no sense to retry anyway.

I'll leave this comment unresolved for easy visibility should we ever regret this change!

It turns out I was wrong; I've noticed numerous failures in CI due to timeout errors at pull time.

Will reinstate retry logic for pulls!
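
For reference, a bounded retry around a pull might look roughly like the sketch below. This is an illustration only, not the reinstated code: `PullRetry.withRetries` is a hypothetical helper, and it retries on any `RuntimeException`, whereas a real implementation would probably skip retries for failures such as missing images or auth errors.

```java
import java.util.function.Supplier;

/** Hypothetical retry helper (not the PR's code): re-runs a pull up to maxAttempts times. */
final class PullRetry {
    static <T> T withRetries(int maxAttempts, Supplier<T> pull) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return pull.get();
            } catch (RuntimeException e) {
                // Remember the failure; layers already downloaded may make the next attempt faster.
                last = e;
            }
        }
        // All attempts failed: surface the most recent failure to the caller.
        throw last;
    }
}
```

Combined with a progress-based timeout, each attempt can still run for as long as it keeps making progress, and the retry only covers attempts that stalled or failed.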

core/src/main/java/org/testcontainers/images/TimeLimitedLoggedPullImageResultCallback.java

core/src/main/java/org/testcontainers/containers/GenericContainer.java

bsideup

Looks good! 👍
Left a few questions about the retrying mechanism, but other than that - good to go 👍

rnorth · 2019-07-22T08:15:02Z

I think I may revert 1d40670 and 315bc9d - I think the costs outweigh the benefit.

bsideup · 2019-07-22T08:49:44Z

@rnorth what was the cost?

rnorth · 2019-07-22T09:06:06Z

It's a shame you can't see the commit!

DockerClientFactory's runInsideDocker cannot use the singleton docker client, and needs to use a specific client instance for the pull. In order to make that work, I had to introduce a DockerClient parameter to RemoteDockerImage so that it could use that provided client. It 'worked' but is extending the public API somewhere else, and feels unpleasant.

We could consider a broader refactoring, but I'm not sure that it's worth it.

bsideup · 2019-07-22T09:07:46Z

@rnorth ok 👍

rnorth requested review from bsideup and kiview as code owners March 18, 2019 20:10

rnorth force-pushed the improve-pull-callback branch from 4c7f395 to 45eb617 March 18, 2019 20:12