Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(executor): make pax tar builds reproducible again #2384

Merged

Conversation

BronzeDeer
Copy link
Contributor

@BronzeDeer BronzeDeer commented Feb 1, 2023

Fixes #2005

In v1.8.0 (commit 7410007) kaniko switched to using the pax tar header format for compressing image layers, since this format allows for greater precision in recording timestamps, however this inadvertendly broke the "--reproducible" functionality, due to an bug in the underlying go-containerregistry dependency which did not set the additional timestamps in the pax header when canonicalizing image layers. This oversight has since been fixed in the dependency.

This commit bumps the google/go-containerregistry dependency to the first commit which has fixed the bug
(v0.13.1-0.20230201183932-824efc7772b0). It also bumps the version of cloud.google.com/go/storage to v1.29.0 to be compatible with the higher transitive dependency.

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

  • Includes unit tests
  • Adds integration tests if needed.

(Since this is a fix in a dependency, no new integration or unit tests were added, do we need a regression or acceptance test for the --reproducible functionality?)

Reviewer Notes

  • The code flow looks good.
  • Unit tests and or integration tests added.

Release Notes

Describe any changes here so maintainer can include it in the release notes, or delete this block.

Examples of user facing changes:
- kaniko reproducible builds work correctly again

@imjasonh
Copy link
Collaborator

imjasonh commented Feb 2, 2023

Can you also update these lines to go-version: 1.20?


@BronzeDeer
Copy link
Contributor Author

I created the requested changes in the workflows and rebased both commits on top of main

@imjasonh
Copy link
Collaborator

imjasonh commented Feb 3, 2023

This error now seems to be actions/setup-go#328 -- I'll suggest and commit a fix on your branch.

.github/workflows/integration-tests.yaml Outdated Show resolved Hide resolved
.github/workflows/unit-tests.yaml Outdated Show resolved Hide resolved
@imjasonh imjasonh mentioned this pull request Feb 4, 2023
4 tasks
@imjasonh
Copy link
Collaborator

imjasonh commented Feb 4, 2023

Sorry about these failures. #2386 does the Go update and fixes the gofmt findings.

I'm not sure why the integration tests are failing.

@BronzeDeer
Copy link
Contributor Author

Sorry about these failures. #2386 does the Go update and fixes the gofmt findings.

I'm not sure why the integration tests are failing.

Sorry that I wasn't around, had a busy time at work, I can now also look back into this. In the mean time, would you like me to rebase this PR on top of #2386 ?

@BronzeDeer
Copy link
Contributor Author

BronzeDeer commented Feb 9, 2023

As a test, I've bumped the version of golangci-linter to the latest version, it can now handle the code for go 1.20, but it starts complaining about a lot of new unresolved linting issues (The jump was 24 minor versions, so new errors are to be expected). Maybe we should first spin out the fixes to the ci into a separate PR including fixing the linting issues?

@imjasonh
Copy link
Collaborator

imjasonh commented Feb 9, 2023

As a test, I've bumped the version of golangci-linter to the latest version, it can now handle the code for go 1.20, but it starts complaining about a lot of new unresolved linting issues (The jump was 24 minor versions, so new errors are to be expected). Maybe we should first spin out the fixes to the ci into a separate PR including fixing the linting issues?

Unfortunately golangci-lint adds and drops and changes lint requirements pretty aggressively sometimes, even within minor releases.

I'd say if there are any you can fix easily then go ahead, otherwise feel free to disable the noisy linters so this can merge. Nobody has the bandwidth to fix all of kaniko's many lint errors, and keep them up to date as golangci-lint finds more. Best to get the actual bugfix in.

#2386 indicates there might also be minikube errors lurking past the Go version bump.

Sorry for all these speed bumps.

@BronzeDeer BronzeDeer mentioned this pull request Feb 19, 2023
4 tasks
@BronzeDeer
Copy link
Contributor Author

I've created a separate PR in #2400 that bumps the linter version and fixes all linter errors and then rebased this PR on top of it. I also squashed the all test workflow go version bumps into a single commit

@BronzeDeer
Copy link
Contributor Author

@imjasonh Sorry to be here only so sporadicly, but could you unblock the workflows on the 2 PRs so that I can get some feedback on the integration tests, even if there is not enough time to review yet?

@BronzeDeer
Copy link
Contributor Author

I've rebased on top of the new version of the other PR, now the boilerplate test should also pass, the failing setup for minikube is still left blocking the integration tests, I'll see if I can find the problem in there later

@BronzeDeer
Copy link
Contributor Author

Update to the CI situation: I've played around with the setup script a bit here BronzeDeer#2

I've managed to restore minikube installation to a working state taking into account all the CNI and CRI changes that happened in k8s 1.24, currently I'm stuck on why the hostPort of the registry-proxy doesn't work. I'm at the point where I'm thinking about just setting up a more lightweight alternative using k3os or similar.

@imjasonh Is there a list of binaries/capabilities that the integration runner VM definitely needs? I might be able to explore simpler options

@imjasonh
Copy link
Collaborator

imjasonh commented Mar 2, 2023

I've managed to restore minikube installation to a working state taking into account all the CNI and CRI changes that happened in k8s 1.24, currently I'm stuck on why the hostPort of the registry-proxy doesn't work. I'm at the point where I'm thinking about just setting up a more lightweight alternative using k3os or similar.

Simplifying the integration test setup however you see fit would be very welcome, if you have the time and energy to undertake it. I think KinD could also be a good solution. AFAIK there's no reason to test k8s specifically at all. A test that uses run_in_docker.sh to build some Dockerfiles would likely be just as good as a test, and require much less infrastructure.

@imjasonh Is there a list of binaries/capabilities that the integration runner VM definitely needs? I might be able to explore simpler options

Not that I'm aware of.

@BronzeDeer
Copy link
Contributor Author

I've managed to restore minikube installation to a working state taking into account all the CNI and CRI changes that happened in k8s 1.24, currently I'm stuck on why the hostPort of the registry-proxy doesn't work. I'm at the point where I'm thinking about just setting up a more lightweight alternative using k3os or similar.

Simplifying the integration test setup however you see fit would be very welcome, if you have the time and energy to undertake it. I think KinD could also be a good solution. AFAIK there's no reason to test k8s specifically at all. A test that uses run_in_docker.sh to build some Dockerfiles would likely be just as good as a test, and require much less infrastructure.

@imjasonh Is there a list of binaries/capabilities that the integration runner VM definitely needs? I might be able to explore simpler options

Not that I'm aware of.

Just a quick update, since it's been a few days again. I've reworked most of the integration setup and got it working quicker and almost all green again, just struggling on one weird error around OCI support, will cut a separate PR once its ready. For Details: BronzeDeer#2

@imjasonh
Copy link
Collaborator

imjasonh commented Mar 8, 2023

You're a champion. 🤯

@BronzeDeer
Copy link
Contributor Author

I opened the integration test refactor PR (#2425) and rebased this PR on top of it. Currently the fix PR is still waiting on another fix in container-diff and once that is integrated into the integration fix PR, we should finally be able to validate and merge this as well

In v1.8.0 (commit 7410007) kaniko switched to using the pax tar header
format for compressing image layers, since this format allows for greater
precision in recording timestamps, however this inadvertendly broke the
"--reproducible" functionality, due to an bug in the underlying
go-containerregistry dependency which did not set the additional
timestamps in the pax header when canonicalizing image layers. This
oversight has since been fixed in the dependency.

This commit bumps the google/go-containerregistry dependency to the
first commit which has fixed the bug
(v0.13.1-0.20230201183932-824efc7772b0). It also bumps the version of
cloud.google.com/go/storage to v1.29.0 to be compatible with the higher
transitive dependency.
@BronzeDeer
Copy link
Contributor Author

Rebased on top of the CI fixed, although for some weird reason the k8s job and only the k8s job run pulled a corrupted container-diff. Forcing a rerun to rule out a one-in-a-thousand flake of a CI VM or similar

@imjasonh imjasonh merged commit 0cbb766 into GoogleContainerTools:main Mar 21, 2023
@isker
Copy link
Contributor

isker commented Mar 21, 2023

Thank you for your quite persistent efforts to get this fixed. I can finally update off 1.7 🌞.

@hrobertson
Copy link

🎉 Thank you @BronzeDeer!

@pstoellberger
Copy link

Thank you! Very much appreciated!

@hrobertson
Copy link

@imjasonh What's the policy regarding when to make a new tag, ie 1.9.2?

@imjasonh
Copy link
Collaborator

@imjasonh What's the policy regarding when to make a new tag, ie 1.9.2?

For that we beg someone at Google to cut a release. cc @chuangw6 🙏🥹🙏

@chuangw6
Copy link
Contributor

@imjasonh What's the policy regarding when to make a new tag, ie 1.9.2?

For that we beg someone at Google to cut a release. cc @chuangw6 🙏🥹🙏

Done! https://github.com/GoogleContainerTools/kaniko/releases/tag/v1.9.2. 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reproducible builds broken in 1.8.0
6 participants