Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci(fix): Switch test workflows to also use the buildx docker-container driver #3072

Merged
merged 3 commits into from
Feb 9, 2023

Conversation

polarathene
Copy link
Member

@polarathene polarathene commented Feb 8, 2023

Description

As the test workflow does not use the docker-container buildx driver (our build and publish workflows do), it uses the Docker Engine bundled BuildKit version (BuildKit release over 2 years old) which until the recent v23 release (Feb 2023) does not support attestations. Github should have v23 available in runners sometime this month.

This driver difference is causing the current CI failures for test workflow runs:

  • docker/build-push-action v4 will only add the --provenance option to buildx command, if explicitly configured in the action. By not adding a value we can avoid causing this failure happening in buildx.
  • The option is only valid in buildx 0.10.0 when there is a version of BuildKit that supports it. The option will cause buildx to fail early if the feature is not available the detected BuildKit (only when supported, does it acknowledge an opt-out). This appears to be resolved in newer buildx releases, at least 0.10.2.

As the test workflows don't do anything with Docker beyond re-use the build cache to import a local image for testing, the value of --provenance isn't relevant to this workflow regardless.

Alternative options considered were:

  • Also use a docker-container driver to always use the same BuildKit version (only during the rebuild of the image, not that relevant to this workflow).
  • Continue to allow CI failure until Github's updates to Docker Engine (expected to arrive within a week).
  • Reverting the action version won't make a difference, previously we used v3.3 for the past 3 weeks (It by default enabled provenance when supported). It was just the addition of --provenance option to buildx.

References

Type of change

  • Bug fix (non-breaking change which fixes an issue)

As the test workflow does not use the `docker-container` buildx driver, it uses the Docker Engine bundled BuildKit version which until v23 release does not support attestations.

Likewise the current buildx version in CI is `0.10.0` which does not respect `--provenance false`, the presence of the option appears to trigger a BuildKit version compatibility check and fail early before it considers the value of the option.
georglauterbach
georglauterbach previously approved these changes Feb 8, 2023
@wernerfred
Copy link
Member

I'm not sure if i 100% get the point? Wouldn't that mean that that all tests should always have failed? So if not, why would they fail only sometimes? And how will we make sure to enable it in the future again (what i want to say is: if it will be available sometime this month isn't it easier to wait that 20days instead of disabling and then thinking of re-adding it again)?

@casperklein
Copy link
Member

Alternative options considered were:
Also use a docker-container driver to always use the same BuildKit

I would opt for this, if that's easily possible.

@polarathene
Copy link
Member Author

polarathene commented Feb 9, 2023

@wernerfred tests only failed recently because of adding provenance: false and that not being compatible with current Docker Engine + buildx versions in Github CI (only when using the buildx docker driver instead of docker-container driver).

This only affected our test workflow. Obviously we don't want to wait 7-20 days for a fix if it means the test suite can't run in the CI 😅

The PR has been updated to use the docker-container driver instead. There is no need to worry now about later bringing back provenance: false, it'll just work now 👍

Original response

I'm not sure if i 100% get the point? Wouldn't that mean that that all tests should always have failed?

All tests will fail presently now since we added the provenance: false to our workflow.

As explained this adds the --provenance option to buildx command, which the current runners use buildx 0.10.0 and that fails early due to BuildKit in current Docker Engine being too old.

So if not, why would they fail only sometimes?

I have seen some runners use buildx 0.10.2 which has a fix to acknowledge --provenance false, and not do a BuildKit compatibility check when --provenance option is provided (regardless of value), those runs will pass.

This wasn't an issue prior to us adding provenance: false, as the --provenance option was not added to buildx command, which then AFAIK has buildx only enable attestations (--provenance true) if the BuildKit version supported it. Basically --provenance with buildx 0.10.0 was buggy, but since been fixed, just Github CI runners haven't all been updated to it yet.

how will we make sure to enable it in the future again
what i want to say is: if it will be available sometime this month isn't it easier to wait that 20days instead of disabling and then thinking of re-adding it again

We've chosen to opt-out intentionally already in build and publish. It's not actually relevant to the test workflow, so opt-in/out explicitly (or omit it for an implicit enable when supported) won't matter. It's only presently an issue due to the bug with explicit provenance setting and the current Docker Engine + buildx versions in the CI.

Having an explicit provenance: false in the test workflow later would only be to maintain consistency with the other workflows if maintainers prefer that (despite it not having any relevance to testing).

Leaving it as-is, means our CI will fail the test suite for that duration...I'm not sure why you'd be okay with 20 days of tests failing due to this? If you're concerned about someone forgetting, we can raise an issue as a reminder and the stale bot will ping any maintainer subscribed as a reminder.

Or we just leave it disabled / removed 🤷‍♂️

@polarathene
Copy link
Member Author

polarathene commented Feb 9, 2023

Alternative options considered were:
Also use a docker-container driver to always use the same BuildKit

I would opt for this, if that's easily possible.

@casperklein yeah no worries we can do that I think.

EDIT: Done. I looked into this a bit more and I think it affected our cache during re-builds of the image for test workflows. Could improve those each by up to 60 seconds from what I saw 🤔

Original response

The docker-container driver is only necessary for some other benefits like supporting ARM64 builds, and exporting cache features IIRC. COPY --link also needs at least BuildKit 0.10.0 (not to be confused with buildx version) from March 2022. Docker Engine v23 will provide that. For the tests it's not important though.


So to clarify:

  • test workflow does not need docker-container driver.
  • nor does it need to opt out or in for attestations (--provenance) as the image is discarded after tests, and this feature does not affect tests, only image builds to publish to registries.
  • Github CI versions of Docker Engine + buildx are the cause of failure with --provenance option, and that'll likely be resolved by Github within a week or so.

If you'd like more consistency, I can bring in the buildx action into the test workflow as well (the qemu one isn't necessary), which configures buildx with the docker-container driver by default. The only difference there is a newer version of BuildKit is used, but the build workflow has already built the image for us.

While it's not necessary, it may look easier to maintain that way if it's more consistent with the build and publish workflows. docker/build-push-action will still differ across all three though, as it should; but we can then have a consistent provenance: false if that is what other maintainers are comfortable with 👍

An alternative solution to omitting `provenance: false` (_not supported by buildx 0.10.0 with default `docker` driver when Docker Engine bundles BuildKit less than 0.10.0, which is the case prior to the Docker Engine v23 release_). 

This approach provides more consistency with the build and publish workflows by using the same buildx `docker-container` driver (_and thus newer BuildKit, enabling support for  `provenance: false`_).
@polarathene
Copy link
Member Author

polarathene commented Feb 9, 2023

@casperklein was right to prefer being consistent with the buildx docker-container driver.

The PR has been updated to use that solution instead. It looks like it will improve our caching in test workflow as a bonus 👍 (EDIT: The build step does seem to be completed faster with improved caching, but the added buildx step adds about 40s+ to the workflow run effectively cancelling out most of the gains 😅 )

Original comment

Docker buildx docs:

  • docker-container driver. Mentions --load being required. It's possible that using this driver to import the image build from the build container to the host Docker daemon will be slower due to a copy. CI time may increase.
  • docker driver (default), doesn't actually seem to require --load, thus we probably could have avoided load: true in the action config, or it may have been a no-op for this driver. I recall CI reading in the build cache and loading the image within 30 sec.

Switching to docker-container driver is probably an improvement actually.

I just reviewed some prior CI runs when they were working, and observed that while the build workflow cached the image and re-built withn 30 secs, the docker driver in the tests was not caching the same layers 😅 Instead taking 90+ sec to re-run package install scripts, while keeping the same cache key, thus not uploading a new cache. Whoops!

Buildx `docker-container` driver is not needed here, but it does seem like it improves cache-hit ratio when building from the retrieved build cache (from the earlier build workflow). Possibly due to building with the same BuildKit version.
@wernerfred
Copy link
Member

All tests will fail presently now since we added the provenance: false to our workflow.

@polarathene thanks for your detailed answer and clarification! I've read through the other PR as well and now my original comment doesn't make sense anymore 😆

@polarathene polarathene merged commit 646e010 into master Feb 9, 2023
@polarathene polarathene deleted the ci/fix-test-workflow-provenance branch February 9, 2023 09:33
@polarathene polarathene changed the title ci(fix): Temporarily avoid specifying provenance ci(fix): Switch test workflows to also use the buildx docker-container driver Feb 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants