Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-consider container caching strategy to use registries #303

Open
thomaseizinger opened this issue Sep 13, 2023 · 6 comments
Open

Re-consider container caching strategy to use registries #303

thomaseizinger opened this issue Sep 13, 2023 · 6 comments

Comments

@thomaseizinger
Copy link
Contributor

thomaseizinger commented Sep 13, 2023

Whilst working on the interop-tests and taking a closer look at our container caching in general, I had the following thoughts:

The current caching mechanism introduces a fair bit of complexity:

The benefit we are getting from this is that we can pretty much plug any commit of a repository into the Makefile, hit make and we end up with a container. The cache is thus purely a performance optimisation.

Is this worth the complexity? Yesterday, I discovered a subtle bug in our setup that made us not cache a particular Rust image, see #301.

The test runner is already designed to work in phases:

  1. Generate the permutations of test cases
  2. Generate a docker-compose.yml file
  3. Run a particular docker-compose.yml file

If we would use container registries instead, we could delete all of the above code by just referencing image IDs in the versions.ts file.

When debugging code, we could always generate the offending docker-compose.yml file first and swap the container reference out to a point at a Dockerfile instead which would build a local container instead. #282 already hints at this too.

Currently, the "contract" between libp2p/test-plans and the repositories is that the need to provide a Makefile which builds a container. The new contract would be that they need to provide a Dockerfile that builds a functioning version. This would likely be more useful because not every developer has all build-environments of other languages set up.

Have I missed anything? Opinions welcome.

@thomaseizinger
Copy link
Contributor Author

@mxinden
Copy link
Member

mxinden commented Oct 16, 2023

Sorry for the late reply.

Generally I am in favor of the move to container registries for their simplicity.

That said, I don't have the capacity / priority today to change the existing transport interop caching strategy, nor to review all the necessary changes.

Thus, unless we have two people owning this change, I suggest keeping it as is.

What are other people's thoughts?

@thomaseizinger
Copy link
Contributor Author

That said, I don't have the capacity / priority today to change the existing transport interop caching strategy, nor to review all the necessary changes.

Thus, unless we have two people owning this change, I suggest keeping it as is.

I am happy to implement the changes. Review should be trivial as we are just deleting code. Docker-compose implicitly pulls containers so we don't even have to do anything to make it work with registry-hosted images.

@thomaseizinger
Copy link
Contributor Author

When debugging code, we could always generate the offending docker-compose.yml file first and swap the container reference out to a point at a Dockerfile instead which would build a local container instead. #282 already hints at this too.

FWIW, in #304 I adopted this design and always generate the docker-compose.yml file as a build artifact. Thus, it is trivial to modify it to build a container instead.

@achingbrain
Copy link
Member

I think the caching strategy was implemented as-is because building the artefacts required to run the tests was previously very slow - if memory serves it was over 25 minutes on GH CI.

I’m not against any of the proposed changes as long as it doesn’t increase the time taken to run the test suite.

@thomaseizinger
Copy link
Contributor Author

I’m not against any of the proposed changes as long as it doesn’t increase the time taken to run the test suite.

The idea is to decrease the time even further. At the moment, all images are pulled from the cache in sequence whereas I am pretty sure, docker compose would pull images in parallel.

I think the caching strategy was implemented as-is

From memory, what was important to @MarcoPolo is that everything is reproducible from a given commit. See also the design paragraph at the top in https://github.com/libp2p/test-plans/tree/master/transport-interop#transport-interoperability-tests.

Whilst I think that is a novel goal, it isn't fully true anyway. For example, we still depend on all sorts of software being installed and we don't pin the version of these tools. Referencing docker images by hash gives us similarish guarantees. We can still build a docker container from the exact same Git hash. It might not be bit-for-bit the same container but we also don't have this guarantee at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants