Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalise various aspects of the RPM database that vary build-to-build #3165

Merged
merged 4 commits into from
Nov 4, 2021

Conversation

jeamland
Copy link
Contributor

@jeamland jeamland commented Oct 10, 2021

There are various parts of the compose process that result in build-to-build variations in the RPM database.

  1. The RPMTAG_INSTALLTIME and RPMTAG_INSTALLTID values in the RPM database are based on the time the compose process was run and do not respect SOURCE_DATE_EPOCH.
  2. The underlying Berkeley DB holding the RPM database (assuming this backend is in use) is... aggressively antagonistic towards reproducibility.

With regards to the second item, the ndb and sqlite backends have not been examined yet as all my stuff is based around Oracle Linux 8 which uses rpm 4.14 which doesn't have them.

Lastly I'm aware that what these patches do is rather... "vigorous"... and may not be to some people's taste. I'm happy to discuss putting these other flags or options or whatever solution is felt to be appropriate.

@openshift-ci
Copy link

openshift-ci bot commented Oct 10, 2021

Hi @jeamland. Thanks for your PR.

I'm waiting for a coreos member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

rust/src/normalization.rs Outdated Show resolved Hide resolved
rust/src/normalization.rs Outdated Show resolved Hide resolved
rust/src/normalization.rs Outdated Show resolved Hide resolved
rust/src/treefile.rs Outdated Show resolved Hide resolved
rust/src/passwd.rs Outdated Show resolved Hide resolved
@lucab
Copy link
Contributor

lucab commented Oct 11, 2021

Thanks for the PR! I've left a few comments in review.

I agree that some of those changes may be a bit controversial. Other developers are offline for a few days, so I'll leave this hanging for a bit so that they can chime in.

As it stands, I think the shadow normalization logic may be good to land anyway quickly (but see comments). Let's maybe split that one to its own PR?

@jlebon
Copy link
Member

jlebon commented Oct 12, 2021

Hi @jeamland, thanks for this PR!

The RPMTAG_INSTALLTIME and RPMTAG_INSTALLTID values in the RPM database are based on the time the compose process was run and do not respect SOURCE_DATE_EPOCH.

Feels like handing of SOURCE_DATE_EPOCH for rpmdb-related things would belong better in librpm, no? That way it would benefit any efforts for reproducible builds on dnf/yum-based systems as well.

Not completely against landing something in rpm-ostree either, though that's a lot of bit-flipping behind librpm's back to carry.

Re. Berkeley DB, support for writing it has been removed upstream, so I wouldn't care too much about that. You mentioned rpm 4.14 -- are you also trying to reproducibly build a system using that rpm version? In that case, it might be easier to carry the bdb tweaking in a separate utility that you run before rpm-ostree compose commit.

@jeamland
Copy link
Contributor Author

Thanks for the PR! I've left a few comments in review.

I agree that some of those changes may be a bit controversial. Other developers are offline for a few days, so I'll leave this hanging for a bit so that they can chime in.

As it stands, I think the shadow normalization logic may be good to land anyway quickly (but see comments). Let's maybe split that one to its own PR?

I've split the shadow stuff into its own PR (#3174) as requested. Happy to continue discussion on the other changes here.

@jeamland
Copy link
Contributor Author

Hi @jeamland, thanks for this PR!

The RPMTAG_INSTALLTIME and RPMTAG_INSTALLTID values in the RPM database are based on the time the compose process was run and do not respect SOURCE_DATE_EPOCH.

Feels like handing of SOURCE_DATE_EPOCH for rpmdb-related things would belong better in librpm, no? That way it would benefit any efforts for reproducible builds on dnf/yum-based systems as well.

Not completely against landing something in rpm-ostree either, though that's a lot of bit-flipping behind librpm's back to carry.

Re. Berkeley DB, support for writing it has been removed upstream, so I wouldn't care too much about that. You mentioned rpm 4.14 -- are you also trying to reproducibly build a system using that rpm version? In that case, it might be easier to carry the bdb tweaking in a separate utility that you run before rpm-ostree compose commit.

So the RPMTAG_INSTALLTID and RPMTAG_INSTALLTIME changes I'm happy to try and push upstream to rpm as well, although while an rpm-ostree compose run is, at least by my understanding, meant to be an atomic, singular installation of a bunch of packages in one hit (and thus rewriting all the timestamps is justifiable) I'm less certain about its use upstream. That said, another approach upstream would be to allow easier (or actually effective) overriding of the values via the API which would also make it easier. Landing this patch would be awesome for now, I'm more than happy to work upstream as well and then reimplement this functionality in terms of what lands there once it's landed.

With regards to Berkeley DB I suspect I'm not the only person who's going to be building images based on Oracle Linux 8 or RHEL 8 or the like or a while which means we have to care about 4.14 for a bit. If we had 4.15 I'd simply be looking at one of the other backends but unfortunately I don't really have that option unless I want to push hard in certain directions. I'm happy to make this purely an internal thing but for that to land I'll need the unified core argument changes in #3162 (or something equivalent) to land. Alternatively I'm also happy for these to land for now and get removed later once they're definitely not hanging around any more.

I'll do a pass over these to address the feedback above. Would it be best to force-push those into this PR or would it be better to drop this PR and start a new one, especially since some of the pieces have been moved to #3174?

Thanks for the discussion so far!

@cgwalters
Copy link
Member

Would it be best to force-push those into this PR

Force pushing is fine and expected from my PoV!

So the RPMTAG_INSTALLTID and RPMTAG_INSTALLTIME changes I'm happy to try and push upstream to rpm as well,

Ultimately though, we do want to head towards reproducibility for standard container images that have RPMs inside, for similar reasons.

And actually, that intersection becomes even more obvious with coreos/enhancements#7

I'd at least reach out to to rpm upstream, I am happy to champion this cause there.

@cgwalters
Copy link
Member

To clarify, even if rpm upstream accepts this, I am OK to carry normalization code here in the short term to unblock shipping it relatively soon. Particularly since you are in a position to e.g. provide custom build options, we can easily add something like --enable-header-normalization or whatever. Or gate it on an experimental treefile yaml option, etc. (If they don't accept, we can cross that when we come to it)

Also: snce we already write the rpmdb as a separate phase, perhaps we could fork that off into a separate process, and do a crude hack of an LD_PRELOAD that overrides gettimeofday() or whatever too.

It would actually make sense to do that forking unconditionally because it'd increase isolation versus the host daemon. We could even drop root privileges for doing it, just provide a writable tempdir that we take ownership of.

@jeamland jeamland changed the title Normalise various aspects of the composition process that vary build-to-build Normalise various aspects of the RPM database that vary build-to-build Oct 21, 2021
@jeamland
Copy link
Contributor Author

So I've retitled/redescribed this to focus on the RPM database aspects given those are the bits that remain.

I've reworked the RPM database backend detection to address @lucab's feedback above.

I've also started working with rpm upstream to try and get the necessary APIs in place so that we can remove the RPMTAG_INSTALLTID and RPMTAG_INSTALLTIME mucking around, although that will be dependent on having a version of librpm with those APIs in place (assuming they land):

rpm-software-management/rpm#1803

Lastly, even if those ones land the BDB normalisation will need to hang around for as long as people are using BDB-based installations. I'm going to dig in to what might need to be done to SQLite and/or NDB backends and those either may not need anything or may be amenable to accepting patches but either way I think keeping this stuff is worthwhile.

@jeamland jeamland force-pushed the normalisation branch 3 times, most recently from 52a926b to 3429534 Compare October 21, 2021 03:55
@jeamland
Copy link
Contributor Author

Oh, and to respond to the LD_PRELOAD idea: I tried that, and I needed to override time(2) and it caused a lot of things to break. 🙃

rust/src/normalization.rs Outdated Show resolved Hide resolved
rust/src/normalization.rs Outdated Show resolved Hide resolved
rust/src/normalization.rs Outdated Show resolved Hide resolved
rust/src/normalization.rs Outdated Show resolved Hide resolved
rust/src/normalization.rs Outdated Show resolved Hide resolved
@lucab
Copy link
Contributor

lucab commented Oct 21, 2021

/ok-to-test

@jeamland
Copy link
Contributor Author

I can't see exactly what failed in the test that didn't pass. If you can point me at what the issue is I can look in to it.

@lucab
Copy link
Contributor

lucab commented Oct 25, 2021

@jeamland the CI worker failed to spawn. I re-triggered the job and everything is green now.

@lucab
Copy link
Contributor

lucab commented Oct 25, 2021

The logic to normalise rpmdb looks fine to me now, thanks!
It is really missing any kind of testing though. I'd suggest adding at least a unit test with a small rpmdb fixture as a starting point to manipulate.

I haven't yet looked at the BDB logic, as I will need to learn more about internal details of the format first.
In the meanwhile, I guess we can split the rpmdb commit into its own PR and merge that one soonish.

@jeamland
Copy link
Contributor Author

jeamland commented Nov 3, 2021

So there's a basic smoke test in the Rust code now. I can have a go at a more comprehensive test but I'd need a better idea of how to get the tests running locally. Is there a stepwise guide anywhere?

@cgwalters
Copy link
Member

  1. Pages used to hold data to be written out to disk are not
    zeroed prior to use. This leads to arbitrary data from the
    current process being written out to disk.

Uh, wow...really? That's rather bad. Seems almost CVE worthy.

Cargo.toml Outdated Show resolved Hide resolved
rust/src/normalization.rs Outdated Show resolved Hide resolved
rust/src/normalization.rs Outdated Show resolved Hide resolved
rust/src/normalization.rs Outdated Show resolved Hide resolved
rust/src/normalization.rs Outdated Show resolved Hide resolved
rust/src/normalization.rs Outdated Show resolved Hide resolved
rust/src/normalization.rs Outdated Show resolved Hide resolved
rust/src/normalization.rs Show resolved Hide resolved
rust/src/normalization.rs Outdated Show resolved Hide resolved
rust/src/normalization.rs Show resolved Hide resolved
rust/src/composepost.rs Outdated Show resolved Hide resolved
@cgwalters
Copy link
Member

So there's a basic smoke test in the Rust code now. I can have a go at a more comprehensive test but I'd need a better idea of how to get the tests running locally. Is there a stepwise guide anywhere?

We all tend to use something like toolbox here but it's basically sudo ./tests/compose in a container. Or a VM if you prefer.

I'd probably add it to tests/compose/test-misc-tweaks.sh.

@jeamland
Copy link
Contributor Author

jeamland commented Nov 3, 2021

  1. Pages used to hold data to be written out to disk are not
    zeroed prior to use. This leads to arbitrary data from the
    current process being written out to disk.

Uh, wow...really? That's rather bad. Seems almost CVE worthy.

If I'm reading the source right (yes, that's how I worked all this out) then what ends up there is data that may have been in other database pages. Basically the code constructs a page cache and reuses no-longer-needed pages from that so I don't think you'll get random process data in there, just random other bits of database.

RPM package headers may contain several values that are either
timestamps or derived from timestamps. These introduce variation
into the RPM database.

This patch looks for the SOURCE_DATE_EPOCH environment variable
and, if that is present, rewrites these values to match the
value it contains.
We originally used (lazy) statics to hold the value of
SOURCE_DATE_EPOCH if we were using it but this can interfere with
unit testing.
Berkeley DB has several issues that cause unreproducible builds:

1) Upon creation each file is assigned a unique ID generated
   using a mixture of process ID, current time, and some
   randomness.
2) Pages used to hold data to be written out to disk are not
   zeroed prior to use. This leads to arbitrary data from the
   current process being written out to disk.
3) Unused fields in structures are not zeroed leading to arbitrary
   stack data being written out to disk.

Replacing the unique file ID causes no issues broadly but to ensure
"sufficient" uniqueness these are replaced with a value generated
by feeding the current time or the current value of
SOURCE_DATE_EPOCH along with a partial file path into sha256 and
using the first 20 bytes as the ID.

For the other problems, areas known to be unused are found and
zeroed out.

In order to ensure no change to data, the `db_dump` utility is
run prior to any changes and the output is hashed using sha256.
After changes the `db_verify` utility is run and, assuming this
is successful, `db_dump` is re-run and the hash of the contents
is compared. Any variation is considered a failure.

This change does not look at any potential reproducibility issues
in the ndb or sqlite backends.
@jeamland
Copy link
Contributor Author

jeamland commented Nov 4, 2021

Ok, this should address all the review feedback. I haven't done any further testing stuff yet.

Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome work!

I have some minor nits, but really, it can come after this merges.

Thanks so much for your contribution!

rust/src/normalization.rs Show resolved Hide resolved
@cgwalters cgwalters merged commit 7fec71c into coreos:main Nov 4, 2021
@cgwalters
Copy link
Member

Followup in #3203 - @jeamland can you review when you get a chance?

@travier
Copy link
Member

travier commented Nov 8, 2021

Nice work here! Thanks!

@cgwalters
Copy link
Member

@jeamland I just want you to know I've cited this PR so far at least 3-4 times as an exemplary pull request. One of the things I love about working in FOSS is when someone appears out of the blue (from my PoV) with a big, nice improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants