[4.11] Backports for RHCOS CI (--transient option & test entrypoint) #2963

travier · 2022-07-06T12:34:14Z

Backports for 4.11:

We go to a lot of effort to create various caches of builds, and this is very useful for local incremental development to avoid re-downloading RPMs and rewriting data. It is useless for CI builds that discard everything when they're done. Add a `cosa init --transient` flag that for now just disables `fsync()` on `tmp/repo`. There's more we can do here in the future, for example we need to propagate this too into `cache/repo-build` and `cache/pkgcache-repo`, and actually in general we don't need the pkgcache repo at all here.

To make it clearer that things worked.

Followup to the addition of `--transient` for CI flows. If we had coreos/rpm-ostree#3719 this would mostly be unnecessary, but today in CI and prod builds, ostree is invoking `fsync()` inside the supermin VM, and then qemu in the container is going to `fsync()` all changes down to the host system (overlay)fs. For transient flows, switch to `cache=unsafe` so we stop doing that which should greatly help speed.

So we test it, and so we also gain the speed benefits.

Hopefully will fix the `ci/prow/rhcos` context here.

Use a COSA specifc test entry point to focus on tests relevant for COSA.

miabbott

LGTM

cgwalters · 2022-07-06T14:55:09Z

Looks like CI here hit #2946 - let's backport that too?

This is part of coreos/fedora-coreos-tracker#1151 Our generated disk images are largely just a "shell" around the egg of an ostree commit. There is almost nothing that lives in the disk image that isn't in the commit. (This is especially true now that a preparatory commit previous to this moved the *content* of our static `grub.cfg` into `image.json`) In the original coreos-assembler design I'd tried to cleanly separate builds of the ostree from disk image builds, but also support linking them together (with matching version numbers, etc.) The separate `image.yaml` was part of this. This...mostly worked. This change furthers that separation by having image builds input from *just the ostree commit*. Crucially we would no longer need the config git repository to perform an image build. And this in turn unlocks truly better separating ostree builds from disk image builds in the pipeline *and* supporting downstream tooling generating disk images from custom containers. One neat thing here is we will finally fix a longstanding issue where coreos-assembler fails when just the `image.yaml` changes: Closes: coreos#972

For now, we need to handle this. But I think we can switch back to hard requiring it later, which will happen naturally when we rework the build system to more cleanly separate ostree container builds and disk image builds. Closes: coreos#2885

If the cache was nuked, we should re-download everything even if nothing changed since the last build. Otherwise, it's impossible to do a `cosa build --force` afterwards.

Prep for future patch.

This inlines back `impl_rpmostree_compose` in the two places where it's called. Prep for future patch.

Prep for future patch.

Now that the variable is declared publicly, we can just add it to the list of rpm-ostree args as part of the compose helper. This has no functional change in the `cosa build` path. In the `cosa fetch` path, this will add the switch but rpm-ostree will ignore it since we're not actually writing a commit.

All the other overlays have the prefix `overlay/`. Do that here too for consistency.

Instead of having rpm-ostree effectively doing a `pull-local` under the hood over 9p, change things so we compose in a local repo, export the commit to an archive and only copy *that* over 9p. This should greatly help with pipelines hitting ENOMEM due to transferring many small files over 9p: openshift/os#594 An initial approach exported to OCI archive instead, but encapsulating and unencapsulating are more expensive operations. Unfortunately, we still need to unpack into `tmp/repo` given that many follow-up commands expect the commit to be available in `tmp/repo`. If we can drop that assumption (and get rid of `tmp/repo` entirely), we could refactor things further so that `compose.sh` creates the final chunked OCI artifact upfront. In local testing, this adds about 30s to `cosa build`. We still compress just once, and we get hardlinks pulling out of the tarball.

travier · 2022-07-07T11:45:15Z

I've added

This is starting to look like a big backport.

travier · 2022-07-07T11:47:15Z

Tagging @jlebon as I'm not sure this is complete

miabbott · 2022-07-07T14:59:16Z

This is starting to look like a big backport.

Maybe do the 9p changes as a separate backport?

cgwalters · 2022-07-07T15:46:54Z

We could also force reset the branch to current main...a similar debate in coreos/rpm-ostree#3819

Maybe do the 9p changes as a separate backport?

No objection from me, but it also seems to me we need the 9p fixes.

CI here failed on:

 --- FAIL: ext.config.shared.root-reprovision.raid1 (1073.08s)
        harness.go:103: TIMEOUT[15m0s]: ssh: journalctl -t kola-runext-test.sh
--- FAIL: coreos.boot-mirror (981.98s)
        boot-mirror.go:105: machine "3a25a671-62b5-4fa2-ba44-0f8f0880fdbc" failed to start: ssh journalctl failed: time limit exceeded

Hmm, though it seems like they succeeded in a retest?

/retest

travier · 2022-07-07T16:19:56Z

I'm leaning toward a force reset too to the first commit that has all the changes that we need.

miabbott · 2022-07-07T19:26:47Z

I'm leaning toward a force reset too to the first commit that has all the changes that we need.

I'm really gun shy about doing a force reset on a branch that should be in code freeze. We'd need to do incredibly thorough testing/analysis to ensure we don't introduce any unwanted changes.

cgwalters · 2022-07-07T21:11:59Z

Yeah, let's get this merged now since we have green CI. We can choose to do a force push later too.

cgwalters and others added 6 commits July 6, 2022 14:24

init: Add a success message

6d9956c

To make it clearer that things worked.

ci/prow-rhcos.sh: use --transient

8b8f99c

So we test it, and so we also gain the speed benefits.

ci/prow-rhcos: Adjust for recent changes in openshift/os

3c54104

Hopefully will fix the `ci/prow/rhcos` context here.

ci/prow-rhcos: Use COSA specific test entry point

37b5fc0

Use a COSA specifc test entry point to focus on tests relevant for COSA.

miabbott approved these changes Jul 6, 2022

View reviewed changes

cgwalters and others added 9 commits July 7, 2022 13:43

cmd-fetch: pass --force-nocache

e4e014d

If the cache was nuked, we should re-download everything even if nothing changed since the last build. Otherwise, it's impossible to do a `cosa build --force` afterwards.

cmdlib: factor out runnning supermin with cache

9399aaf

Prep for future patch.

cmdlib: drop impl_rpmostree_compose

4283768

This inlines back `impl_rpmostree_compose` in the two places where it's called. Prep for future patch.

cmd-build: globalize path to compose.json

83e9722

Prep for future patch.

cmdlib: rename cosa-image-json ostree overlay

71a294c

All the other overlays have the prefix `overlay/`. Do that here too for consistency.

cgwalters merged commit 8c6da5b into coreos:rhcos-4.11 Jul 7, 2022

travier deleted the rhcos-4.11-ci-bp branch July 11, 2022 12:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[4.11] Backports for RHCOS CI (--transient option & test entrypoint) #2963

[4.11] Backports for RHCOS CI (--transient option & test entrypoint) #2963

travier commented Jul 6, 2022 •

edited

Loading

miabbott left a comment

cgwalters commented Jul 6, 2022

travier commented Jul 7, 2022

travier commented Jul 7, 2022

miabbott commented Jul 7, 2022

cgwalters commented Jul 7, 2022

travier commented Jul 7, 2022

miabbott commented Jul 7, 2022

cgwalters commented Jul 7, 2022

[4.11] Backports for RHCOS CI (--transient option & test entrypoint) #2963

[4.11] Backports for RHCOS CI (--transient option & test entrypoint) #2963

Conversation

travier commented Jul 6, 2022 • edited Loading

miabbott left a comment

Choose a reason for hiding this comment

cgwalters commented Jul 6, 2022

travier commented Jul 7, 2022

travier commented Jul 7, 2022

miabbott commented Jul 7, 2022

cgwalters commented Jul 7, 2022

travier commented Jul 7, 2022

miabbott commented Jul 7, 2022

cgwalters commented Jul 7, 2022

travier commented Jul 6, 2022 •

edited

Loading