Builds are not reproducible #3332

Open
mtalexan opened this issue Jan 27, 2023 · 5 comments

Comments

@mtalexan
Contributor

Bug Report

This might be an ostree bug?

Builds run in the coreos-assembler container with -e SOURCE_DATE_EPOCH=${fixed_epoch}, a fixed config commit, and build --version=${fixed_buildid} on a pristine VM complete successfully, but every repetition of cosa clean && cosa build --version=${fixed_buildid} produces a different ostree-content-checksum value (per meta.json).

Environment

What operating system is being used to run coreos-assembler?
Fedora 37 (in a fresh VM)

What operating system is being assembled?
Customized minimal FCOS variant (fixed config commit that builds successfully)

Is coreos-assembler running in Podman or Docker?
podman 4.3.1

If Podman, is coreos-assembler running privileged or unprivileged?
privileged

Expected Behavior

Running cosa clean then cosa build --version=${fixed_version} on a clean config repo with SOURCE_DATE_EPOCH set to a fixed value should produce the same ostree-content-checksum as the prior identical build when the rpm-ostree-inputhash is the same.

Actual Behavior

Running cosa clean then cosa build --version=${fixed_version} on a clean config repo with SOURCE_DATE_EPOCH set to a fixed value produces a unique ostree-content-checksum on every build even when the rpm-ostree-inputhash is the same on all prior builds.

Reproduction Steps

Using cosa alias:

cosa() {
   env | grep COREOS_ASSEMBLER
   set -x
   podman run --rm -it --security-opt label=disable --privileged \
              --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536 \
              -v ${PWD}:/srv/ \
              --device /dev/kvm \
              --device /dev/fuse \
              --tmpfs /tmp -v /var/tmp:/var/tmp \
              -v /etc/ssl/certs:/etc/ssl/cert:ro \
              -v /etc/pki/:/etc/pki/:ro \
              -v /usr/share/pki/ca-trust-legacy/:/usr/share/pki/ca-trust-legacy/:ro \
              -v ${HOME}/.ssh:/home/builder/.ssh \
              -e SOURCE_DATE_EPOCH=1672531200 \
              private.registry/copy-of-quay-io-coreos-assembler-image/cosa:latest "$@"
   rc=$?; set +x; return $rc
}

  1. cosa init --branch=my-fixed-branch git@private-gitlab/my-private-coreos-config.git
  2. cosa fetch --strict
  3. cosa build --version=59ff54f
  4. cp builds/latest/x86_64/meta.json ~/
  5. cosa clean
  6. cosa fetch --strict # doesn't end up doing anything
  7. cosa build --version=59ff54f
  8. jq '."ostree-content-checksum"' ~/meta.json > ~/build1.hash
  9. jq '."ostree-content-checksum"' builds/latest/x86_64/meta.json > ~/build2.hash
  10. diff ~/build1.hash ~/build2.hash

Other Information

The CoreOS Config contains the following to make the RPM DB generation deterministic:

rpmdb: bdb
rpmdb-normalize: true

When comparing the entire meta.json files, the following are identical between the two builds:

  • ref
  • ostree-n-metadata-total
  • ostree-n-metadata-written
  • ostree-n-content-total
  • ostree-n-content-written
  • ostree-n-cache-hits
  • ostree-content-bytes-written
  • ostree-version (also matches buildid)
  • ostree-timestamp
  • rpm-ostree-inputhash
  • buildid (also matches ostree-version)
  • coreos-assembler.image-genver
  • name
  • summary
  • coreos-assembler.image-config-checksum
  • "coreos-assembler.code-source": "container"
  • coreos-assembler.container-config-git
{
  "commit": "59ff54f0574445ef2912d7ecf1ccda71f0eb3efb",
  "origin": "git@private-gitlab/my-private-coreos-config.git",
  "branch": "my-fixed-branch",
  "dirty": "false"
}
  • "coreos-assembler.delayed-meta-merge": false
  • coreos-assembler.container-image-git
{
  "commit": "d5f1623aad6d133b2c7c00e784c04ab6828450c1",
  "origin": "https://github.com/coreos/coreos-assembler.git",
  "branch": "main",
  "dirty": "true"
}
  • "coreos-assembler.config-gitrev": "59ff54f0574445ef2912d7ecf1ccda71f0eb3efb"
  • "coreos-assembler.config-dirty": "false",
  • "coreos-assembler.basearch": "x86_64"

Over many iterations, the problem appears to get worse as the image grows. Our configs with a handful of the larger RPMs commented out, which reduces the total RPM size from 6.8 GB to 500 MB, build reproducibly most of the time, though occasionally they suddenly start producing different results. The larger images never produce the same result twice.
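
When repeating the build many times, it can help to save each iteration's checksum to its own file and count how many distinct values show up. The following is a minimal sketch of that bookkeeping only; `count_distinct_hashes` is a hypothetical helper name, not part of coreos-assembler, and it assumes each file holds one checksum on its first line (as the hash files from the reproduction steps do).

```shell
# count_distinct_hashes: hypothetical helper, not part of coreos-assembler.
# Arguments: one file per build iteration, each holding that build's
# ostree-content-checksum on its first line.
# Prints the number of distinct checksums seen; 1 means every build matched.
count_distinct_hashes() {
    # FNR==1 selects the first line of each file; seen[] de-duplicates values.
    awk 'FNR == 1 && !seen[$0]++ { n++ } END { print n + 0 }' "$@"
}
```

For example, `count_distinct_hashes ~/build1.hash ~/build2.hash` would print 1 for a reproducible pair and 2 otherwise.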

@jlebon
Member

jlebon commented Jan 27, 2023

Bit-level reproducibility is currently not a goal of coreos-assembler. Given the work that went into rpm-ostree to make composes reproducible, it probably wouldn't be too hard to at least make cosa build ostree be fully reproducible. We already reuse the source config git timestamp for overlays today. I suspect there are other compose inputs that need tighter timestamp control.

But cosa is used to also build many other artifacts (disk images and container images). Trying to make those fully reproducible would be a very large endeavour.
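
As an illustration of reusing a git timestamp as a fixed build time, the sketch below derives a SOURCE_DATE_EPOCH value from the HEAD commit of a checkout. It is not how cosa handles timestamps internally; `epoch_from_git` is a hypothetical helper name, and pointing it at `src/config` assumes that is where the config repo was cloned.

```shell
# epoch_from_git: hypothetical helper, not part of coreos-assembler.
# Prints the committer timestamp of HEAD in the given git checkout,
# in seconds since the Unix epoch (the format SOURCE_DATE_EPOCH expects).
epoch_from_git() {
    git -C "$1" log -1 --format=%ct
}
```

A possible use would be `SOURCE_DATE_EPOCH=$(epoch_from_git src/config)`, with the value then passed to the container through `-e`, so that the epoch follows the config commit instead of being hard-coded.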

@mtalexan
Contributor Author

The issue I'm reporting here is that the ostree commits aren't reproducible either.
I wasn't aware of the image non-reproducibility, though that's good to know and somewhat understandable, but the ostree commits coreos-assembler is constructing also aren't reproducible from what I can tell. My understanding was that's one of the main intents, reproducible ostree commits.
Am I misunderstanding or missing a relevant setting maybe?

@jlebon
Member

jlebon commented Jan 27, 2023

> My understanding was that's one of the main intents, reproducible ostree commits.

No, it hasn't really been a focus, but it'd definitely be nice to support. That would also allow reproducible builds of the container image. I know you're using bdb, but for general information, note that for sqlite (which is what FCOS and el9 use), actually achieving this is also blocked on rpm-software-management/rpm#2219.

@jlebon
Member

jlebon commented Jan 27, 2023

I won't speak for the other maintainers but I'd accept patches to enable this at least for the ostree compose (assuming they're not invasive, which I don't think they would be).

@mtalexan
Contributor Author

For posterity, I traced this to 2 issues.

The first is that some Python libraries in the root file system that becomes part of the ostree commit get byte-compiled, but only when they are used by an RPM installation hook of one of the included RPMs, and the Python byte-compilation output is not reproducible by default. The only current fix for this is to manually wipe all Python byte-compilation caches before the final ostree compose using a custom postprocess hook. For example, add this to your treefile:

postprocess: 
  - |
    echo "Removing all module __pycache__ folders under /usr"
    find /usr -type d -name '__pycache__' -exec rm -r '{}' +
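
Before deleting anything, it may be worth listing what the hook would remove. The sketch below is a dry-run variant of the same `find` expression; `list_pycache` is a hypothetical helper name, and the root directory to scan is passed as an argument rather than hard-coded to /usr.

```shell
# list_pycache: hypothetical dry-run helper, not part of rpm-ostree or cosa.
# Prints every __pycache__ directory under the given root without deleting it.
# -prune stops find from descending into a directory it has already matched.
list_pycache() {
    find "$1" -type d -name '__pycache__' -prune -print | sort
}
```

Running `list_pycache /usr` inside the build root would show which cache directories exist, and therefore which packages had modules byte-compiled during installation.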

The second issue is that the boot image is included in the generated ostree commits, and the boot image is a binary blob. That blob is not reproducible (I haven't figured out exactly why yet), so any ostree commit that contains it differs from build to build.
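
One generic way to narrow down which files differ between two builds is to hash every file in both trees and diff the resulting manifests. The sketch below assumes the two root file systems are already available as plain directories (how they are extracted is left out), that `sha256sum` from GNU coreutils is installed, and that paths contain no whitespace. `tree_manifest` and `differing_files` are hypothetical helper names.

```shell
# tree_manifest: hypothetical helper.
# Prints "<sha256>  ./relative/path" for every regular file under the given
# root, sorted by path so that two manifests line up for diffing.
tree_manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2 )
}

# differing_files: hypothetical helper.
# Prints the relative paths whose content differs between two trees, or that
# exist in only one of them. Assumes paths without whitespace.
differing_files() {
    left=$(mktemp) right=$(mktemp)
    tree_manifest "$1" > "$left"
    tree_manifest "$2" > "$right"
    # Changed or unmatched manifest lines start with "<" or ">";
    # the third field of such a line is the path.
    diff "$left" "$right" | awk '/^[<>]/ { print $3 }' | LC_ALL=C sort -u
    rm -f "$left" "$right"
}
```

Calling `differing_files build1-root build2-root` (both directory names are placeholders) prints one path per differing file, which is the kind of output that would point at byte-compiled Python caches or a boot image blob.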
