Comments about rule 2: "Build upon existing images" #97

Open
sdettmer opened this issue Jul 6, 2022 · 3 comments

@sdettmer

sdettmer commented Jul 6, 2022

comments about rule 2: "Build upon existing images"

From my point of view this is a clear no-go. Normally we cannot know the state of an existing image: there have been cases where people manually intervened in build processes or even manually unpacked, patched, and repacked distributions.

The contents of an image cannot be reliably verified except by building it, which essentially means: do not use existing images, but build everything yourself from locally available data.

@vsoch
Collaborator

vsoch commented Jul 6, 2022

Again, I will kindly disagree. There is a core set of base images (e.g., centos, ubuntu) provided by the primary maintainers and updated with security patches. Using them is much better practice than "rolling your own", which at best would give you the same thing.

At least for Singularity recipes I have a small plot: https://singularityhub.github.io/singularity-catalog/bases/ and we can see this practice is followed.

@sdettmer
Author

sdettmer commented Jul 6, 2022

@vsoch Thank you for your quick reply.

I'm afraid you only believe that these images are reproducible. In fact they might have been changed (for example by adding security packages), or they were built using apt install and picked up whatever happened to be available that day. If you build the same Dockerfile again, you might get different results, such as a package with a security fix (for a flaw impossible to exploit in your environment) that also carries a small new bug (breaking your application). Either the input is guaranteed to be exactly the same, or the build is not reproducible.
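To make the "exactly the same input" point concrete, here is a minimal sketch in Python that lists base images in a Dockerfile that are referenced by a mutable tag rather than by content digest. The function name `unpinned_base_images` and the regular expressions are illustrative assumptions, not part of any tool mentioned in this thread, and the parsing is deliberately simplified (it ignores line continuations and build arguments). Note that pinning the base by digest only fixes the base layer; an `apt install` step in the same Dockerfile can still pull different package versions on a later build.

```python
import re

# A FROM instruction, with optional flags such as --platform=... before the image.
FROM_RE = re.compile(r"^\s*FROM\s+(?:--\S+\s+)*(\S+)", re.IGNORECASE)
# An image reference that ends in a full sha256 content digest.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")
# An optional stage alias: FROM image AS name
ALIAS_RE = re.compile(r"\s+AS\s+(\S+)\s*$", re.IGNORECASE)


def unpinned_base_images(dockerfile_text):
    """Return base image references that are not pinned by digest (hypothetical helper)."""
    unpinned = []
    stages = set()  # names of earlier build stages, which are not external images
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if match is None:
            continue
        image = match.group(1)
        is_external = image != "scratch" and image not in stages
        if is_external and not DIGEST_RE.search(image):
            unpinned.append(image)
        alias = ALIAS_RE.search(line)
        if alias:
            stages.add(alias.group(1))
    return unpinned
```

A tag such as `ubuntu:22.04` would be reported, because the maintainers can move it to a new build at any time, while a reference of the form `ubuntu@sha256:<64 hex digits>` would not.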

Of course there are other requirements, such as updating to include security fixes, and in many cases old results will never need to be reproduced. But when, for example, someone wants to verify in ten years why a result was incorrect, exactly the same content is needed: maybe a well-hidden bug somewhere led to the wrong result.

Of course reproducibility has a price, and often it is high. For example, when using images from maintainers, each must be stored locally.
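As a rough illustration of what storing images locally makes possible, the following Python sketch checks that a locally archived image file (for example an archive written by `docker save`) still has the SHA-256 digest that was recorded when the result was produced. The function names, and the idea of recording the digest next to the results, are assumptions made for this example.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_digest(path, recorded_hex):
    """True if the archive at `path` is byte-identical to what was recorded (hypothetical helper)."""
    return file_sha256(path) == recorded_hex.strip().lower()
```

If the check fails, the archive is not the content that produced the original result, so a rerun would not count as a reproduction.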

Let's assume an officially maintained image was attacked and contained a backdoor. This backdoor leads to a wrong result from the container and to an invalid conclusion in some piece of research. To analyze whether the invalid conclusion was caused by bad scientific practice or even data manipulation, someone could redo the processing. In the meantime the maintainers have surely removed the backdoor, as one would expect. With that, the cause of the wrong result is gone: the container now produces the correct result, which differs from the earlier one (i.e. it does not reproduce it), and the researcher may get into trouble because some may think the invalid conclusion was made up to look better in publications.

@vsoch
Collaborator

vsoch commented Jul 6, 2022

I don’t actually care if they are perfectly reproducible; it’s almost guaranteed they are slightly different. However, if my supply chain is secure (a work in progress, but registries will care soon with SBOMs etc.) and my container is tested and works as I need it to, this is a successful outcome.
