Package layering #107

cgwalters · 2015-02-12T18:46:17Z

This is a prototype patch series - it's taking https://github.com/cgwalters/atomic-pkglayer and moving it into the rpm-ostree core. However, as the last patch notes, upgrade and status (and pkg-remove) are not yet implemented.

cgwalters · 2015-02-12T18:48:03Z

This patch series depends on:

cgwalters · 2015-03-03T16:58:10Z

I've been thinking about this more recently...obviously having to detect unsafe %post makes things a lot more painful. What we could do is try using a custom FUSE filesystem that basically did "break hardlinks on write". Doing a re-commit of this layer would basically then just be doing the (device,inode) mapping that ostree commit --link-checkout-speedup does.

We wouldn't be using FUSE to run, only to build.

baude · 2015-03-03T17:47:33Z

@cgwalters sounds like a good idea; do you have experience with fuse filesystems?

dbnicholson · 2015-04-02T23:53:54Z

Another idea for handling this - write a LD_PRELOAD wrapper that catches open/openat. If the call is not O_RDONLY, then do the cp/mv trick and then run open on the newly created path.

http://www.catonmat.net/blog/simple-ld-preload-tutorial-part-2/

fakeroot does something similar - http://fakeroot.alioth.debian.org/.

cgwalters · 2015-04-07T13:24:01Z

@dbnicholson LD_PRELOAD is too fragile for use against anything except specific versions of applications you've tested. See the endless treadmill that https://github.com/wrpseudo/pseudo has been on as new filesystem calls (e.g. renameat2 https://lwn.net/Articles/569134/ ) have been added. That's why I'm leaning towards fuse.

dbnicholson · 2015-04-07T16:18:39Z

@cgwalters Definitely if you're trying to wrap any system call that can create a file, LD_PRELOAD is going to suck (been there before). However, this just has a single specific case to cover - don't write into files that have multiple hardlinks. I believe that could be covered just by catching open and openat.

That said, if you can come up with a fuse implementation, that would be better. I'll probably try the preload hack to see what it would look like, anyway.

cgwalters · 2015-04-10T19:10:23Z

Ok, I did this: https://github.com/cgwalters/rofiles-fuse

Was quite easy. I think it's correct. Now to stick it in...ostree? rpm-ostree? Debating.

dbnicholson · 2015-04-10T20:19:57Z

Nice. I'd like it if it was in ostree itself. Then it could be used in checkout to take a commit, safely update files within it, and then make a new commit. I don't think that's possible currently.

cgwalters · 2015-04-12T13:28:20Z

No major changes, just rebased on master.

cgwalters · 2015-10-14T02:05:54Z

Rebased on master. But still runs from the client - we need to have this code run from the daemon.

cgwalters · 2015-12-07T14:30:18Z

Rebased.
Moved the code inside the daemon

TODO:

Make status show this
Make upgrade understand packages

jlebon · 2015-12-07T15:08:44Z

Ahhh, looks like ostree has a new dependency on yacc. Will update the test container.

jlebon · 2015-12-07T15:57:29Z

bot, retest this please

cgwalters · 2015-12-14T00:01:24Z

Teaching upgrade about this would be relatively simple if we redownloaded the packages for each update. That probably wouldn't really be acceptable for most real-world use though.

So what we need to do is efficiently cache the unpacked RPM data we get from the tree itself. The interesting problem here is that RPMs only have SELinux labels applied at install time. If policy changes, we need to apply the new label. Which means we need a physical copy of the file.
This situation should be quite unusual, but we still need to solve it, because I really don't want to have "hysteresis" in the system.

Representing each RPM as a commit tree itself is the obvious thing to do - then we store metadata such as the checksum of the SELinux policy that was used for labeling? Alternatively, we do "trust but verfiy" by unpacking the commit again, but then verifying the labels vs the current policy.

cgwalters · 2016-01-05T15:53:42Z

Write a local commit rather than modifying the deployment in place
Some more tweaks to the Upgrader class (still needs work)

The next major item here is to move most of the logic into the Upgrader class so it can be shared with plain upgrade.

This builds upon the earlier prototype in https://github.com/cgwalters/atomic-pkglayer The `.origin` file says for a replicated installation: [origin] refspec=local:rhel-atomic-host/7/x86_64/standard If you then run `rpm-ostree pkg-add strace`, it will result in a new tree with: [origin] baserefspec=local:rhel-atomic-host/7/x86_64/standard [packages] requested=strace; Work still remaining here is to teach `rpm-ostree status` and `rpm-ostree upgrade` about this.

jlebon · 2016-03-15T14:50:49Z

OK, I pushed some more commits on my package-layering branch. Commits before 19233b9 are mostly focused on cleaning up the original package-layering branch, while the ones after start the work on migrating to the RPM import model.

Overview of the branch

Small cleanups to the container-builtins functions.
Adapt Treespec so that it fits our needs -- this is a bit abusive. We'll probably need a better solution down the line.
Move the package layering logic into RpmOstreeSysrootUpgrader. This means that now, any operation which does a deploy (e.g. upgrade, deploy, rebase) will automatically have the registered packages layered on top of the final deployment.

There's still a lot left to do (quoting from the TODO I added to track them all :) ):

check that the rpm is not already in the target deployment
proper selinux support
better UI feedback (laggy at the beginning)
support yum repos with vars
add pkg-delete
make upgrade also update the branches
support local rpm installations

Will update as I check off these items.

cgwalters · 2016-03-15T18:38:06Z

At a high level I am a fan of smaller commits...but I'm a little uncertain about this small. It feels like a lot of them could be rolled up as "libpriv: Rework core API to prepare for package layering merge"

Then the commit message could include the commit messages from the relevant commits there.

Maybe what I'm arguing for here is to split out internal changes not related to layering first, change the container bits to use those, then land layering... 😎 as a layer on top?

Not a hard requirement from my POV, just if you agree. Though maybe there are shades here, as
jlebon@f0d856f
is probably still worth calling out separately, but I think
jlebon@5c1d871
is just too trivial for a single commit. WDYT?

jlebon · 2016-03-15T18:46:28Z

Yup, that sounds good to me! Will squash the trivial ones. WDYT of the actual pkg layering commit in general?

jlebon · 2016-03-21T15:39:21Z

OK, pushed to the branch again. Rebased on master, squashed some commits, and made some progress. We now actually check for the final set of RPMs to install in the deployment. The logic for this was a bit trickier than expected.

E.g. when you do a rebase or upgrade, if you encounter some of the previously requested pkgs (i.e. they're now part of the compose), the "requested" set must shrink. OTOH, doing two pkg-add in a row, you want the "requested" set to grow, even though during the second pkg-add you've encountered all the same pkgs previously requested.

In the end, this means that what goes in the deployment's origin's "requested" set is not necessarily the same as the actual set of pkgs we need to layer, which was my assumption previously.

Next up is tackling the SELinux issues mentioned previously. Though it's definitely in a good enough state to test and play around with.

cgwalters · 2016-03-21T19:11:10Z

What do you think about submitting PRs for things that can basically be cherry-picked to master? Like jlebon@2949cbe ?

jlebon · 2016-03-22T18:20:02Z

Pushed some new commits to rework the package layering logic a bit and added a pkg-delete.

In my initial attempt at adding pkg-delete, I realized that it was not a trivial task since we keep layering on top of checkouts already containing layered packages. Thus, I changed the logic so that the layered commit has for parent the "base" commit. Now, whenever we want to modify the set of pkgs to layer, we re-apply it on top of the base commit.

The downside is that we now do slightly more work when doing multiple pkg-add (although unpacking is not really the bottleneck). The upside is that it becomes trivial to implement pkg-delete and it makes the pkg layering code much easier to follow.

jlebon · 2016-03-31T03:54:58Z

OK, took an initial stab at the SELinux code in the latest commit (jlebon/rpm-ostree@98b6747). We now pick up the sepolicy from the merge deployment and lookup the label as we unpack. We also record the policy csum in the commit metadata. It's still early, so the way these things are implemented might change (e.g. it'd be nice to push some of the work into ostree instead).

Next up is cache validation based on the policy csum.

jlebon · 2016-04-04T00:25:15Z

Another SELinux issue here is how we deal with packages which bring their own pps. These are normally installed in %post, which we don't run.

It seems like we'll have to be heuristicky here and assume that all pps the pkg adds to /usr/share/selinux were meant to be installed and then install them and recompile the policy in the tmprootfs. We will probably have to extend the OstreeSePolicy class for this.

There's also a sort of convergence issue with this since we label stuff with the new sepolicy as we import the pkg. Since we'll only find out about the new pps and recompile after we're done unpacking, we might have to re-unpack and relabel the pkg.

Maybe we can just do a quick scan of all the archive entries first to check for pps. Or maybe it would be easier to just unpack to a tmprootfs rather than directly committing from libarchive to ostree (maybe that will help us get rid of the libarchive-input-stream.c copy/paste as well?).

And of course, yet another possible issue with this is #27.

jlebon · 2016-04-04T21:42:29Z

OK pushed some more commits on my branch. I had also rebased on master, though I think it's already out of sync. To summarize, we now mark the csum of the policy during pkg import. Then during overlay, if we see a package that requires relabeling (i.e. target policy csum != import csum), we check it out, relabel, and add a commit to the package branch. More in-depth testing is definitely required, but it works well so far.

One interesting "issue" is that after writing the new deployment, since we call ostree_sysroot_cleanup(), only the HEAD of the RPM branches is kept, which means that we only ever keep one version ("version" here is wrt to the sepolicy, not RPM versioning) of each branch. In theory, this means that if the user ever deploys on a deployment that has an older sepolicy, we will not find a matching commit and thus recompute the labels. In practice, sepolicy versions will most likely be moving forward and may not always change on every new deployment. Though the code is already there to support this search-up-the-branch logic, so it might be worthwhile to at least keep e.g. 2 or 3 commits.

I added a couple of things in the TODO file to track pending items. Of those, these top three are probably hard reqs for a seamless user experience, so I'll be working on those now:

Proper labeling of parent directories during initial unpacking
Support pkgs which bring their own pps
Package layering #107 (comment)
Support yum repos with variables in them

That said, I'd like to get what's there merged (but maybe with the pkg-* commands hidden) so that (1) it will be less of a pain to rebase everytime, and (2) it will be easier for others to test (e.g. we could probably announce it at least on atomic-devel). I can clean up and squash some commits soon if we want to go that way.

jlebon · 2016-04-05T02:38:34Z

The issue above made me think about whether we should keep unpacked packages in a child repo so that we have better control over them. It seems like it just makes sense to have /sysroot/ostree/repo only contain things we want ostree to manage. There are of course solutions we could come up with for this specific pruning problem (e.g. maybe introducing a concept of ref ownership in ostree to say "don't touch!"), though philosophically keeping them separate might be better on the long term. @cgwalters WDYT?

jlebon · 2016-04-06T16:44:37Z

The issue above made me think about whether we should keep unpacked packages in a child repo so that we have better control over them.

Just to see what it would look like, I did this in jlebon@b914f30. We store the unpacked pkgs in a child repo at /ostree/repo/tmp/rpmostree-pkgcache. It was slightly complicated by the fact that the final assembled commit has to be "pushed up" to the parent repo so that it can be deployed. That should become less painful by adding a new option to the ostree pull API.

The advantages of this method are:

Doesn't litter the main ostree repo
ostree does not have access to it, so e.g. regular ref prunings or even ostree admin cleanup doesn't blow away the cache. This also means that we are now responsible for our own GC, which can be made smarter to only throw out things we know we won't need.
It reflects the "cache" aspect of it. The whole repo could be blown away and things would just work as usual.

It should be an easy patch to revert down the line, so I'll just try it out this way and see what it gives.

cgwalters · 2016-04-07T14:31:31Z

The packages with custom policies is indeed a nasty circular dependency. I don't even know how that works with rpm. Offhand I think librpm reloads its file context matching any time policy is loaded, but in this case it must have already laid out the files on disk.

Oh wait, librpm is probably relying on transaction ordering here. docker-selinux is a separate package, and docker Requires: %{repo}-selinux = %{epoch}:%{version}-%{release}.

This means we can't parallelize imports =(

cgwalters · 2016-04-07T14:32:42Z

BTW sorry I didn't respond here earlier, I missed all of these notifications recently I think due to a flood of other notifications (kubernetes/contrib changed, the ADB people got very active).

cgwalters · 2016-04-07T14:52:53Z

Child repo is OK by me. It's an interesting concept whether to model things like this as child repos or "prefixed branches" in a single repo. Offhand it seems the child repo is easier for management and avoids GC/locking issues, but reduces sharing.

My instinct here is sharing is mostly useful if you expect things to be shared. For example if one has both a Debian and Fedora trees, they probably share very little at an actual binary level. But, an imported Fedora Docker base image probably shares a lot with the host system (assuming same major version).

So that would argue for projectatomic/atomic#298 to use the primary repo, and not be a child repo.

Hmm, actually I just realized...GC in parent repos is unaware of child repos. It should probably work a bit more like git clone --local, not git clone --shared by default.

cgwalters · 2016-04-07T15:04:39Z

https://bugzilla.gnome.org/show_bug.cgi?id=764735

cgwalters · 2016-04-07T15:13:36Z

One hack that would probably get us pretty far is to require a marker on any packages that include selinux policy, where an initial definition of "marker" is "has selinux- as a prefix or -selinux as a suffix". So we special case unpacking those first into a "policy labeling root", load the regexps from there, then resume importing everything else.

cgwalters · 2016-04-07T15:24:57Z

There's two major pieces of work here. First is to allow layering "simple" packages like strace.

Let's take a value of "simple" = "no %post". That covers things like useradd as well as custom SELinux .pp files (since for traditional RPM one needs a %post to recompile the policy).

It seems very reasonable to land that in master now. We can try to change things to error out if we detect a %post - I used to have code to do this.

Then the branch continues for handling the fully general case of server-side treecompose, which has to handle everything. That fully general case also fixes things like installing a custom docker with new selinux policy.

jlebon · 2016-04-07T15:48:09Z

Oh wait, librpm is probably relying on transaction ordering here. docker-selinux is a separate package, and docker Requires: %{repo}-selinux = %{epoch}:%{version}-%{release}.

Hmm interesting, I didn't make that connection before, but that makes a lot of sense.

This means we can't parallelize imports =(

Yeah :(. The upside I guess is that it gives a way to handle this properly. We can use librpm to sort the pkgs to import (like we do during assembly), then during each unpacking use a flag to determine whether we have to recompile the policy before unpacking the next pkg.

Oh wait, since we're only importing a subset of all the pkgs in the transaction, I wonder if rpmtsOrder() can fail sometimes.

Hmm, actually I just realized...GC in parent repos is unaware of child repos. It should probably work a bit more like git clone --local, not git clone --shared by default.

Ouch, yeah that's an issue. Though I guess it doesn't affect us here since we never actually look up objects in the parent repo (other than the files we just checked out from it into the tmprootfs) and only really "push up".

Let's take a value of "simple" = "no %post". That covers things like useradd as well as custom SELinux .pp files (since for traditional RPM one needs a %post to recompile the policy).

It seems very reasonable to land that in master now. We can try to change things to error out if we detect a %post - I used to have code to do this.

OK that's a good way to break this up and get more early testers. Maybe we can allow but print a warning if we detect a %post?

Almost about to land a rework of the unpacker which does proper labeling of parent dirs. Afterwards I'll take a look at making libhif work with repo vars. I don't expect that to take too long, but if there's a blocking factor there, I'll just clean up my branch and get it ready for a merge.

jlebon · 2016-04-07T19:56:11Z

Pushed rpmostree-unpacker rework: jlebon@72ad56c

rpm-ostree internals unpack CLI:

Fix --to-ostree-repo option entry

Add --selinux flag

rpmostree_unpacker:

Make rpmfi_overrides instance member

Build rpmfi_overrides on init

Fix how we use rpmfiInit() & rpmfiNext()

Add overrides lookup helpers

Generalize write_directory_meta() to also take sepolicy

Label root dir and parent dirs as needed

cgwalters · 2016-04-07T20:14:47Z

WDYT about pushing this to the upstream repo as package-layering so we can both push to it?

jlebon · 2016-04-07T20:24:43Z

@cgwalters Sounds good, pushed it.

jlebon · 2016-04-12T21:26:31Z

Pushed a major rework of the unpacker: 8baffc0

This is a refactor of a big chunk of the unpacker. It adds proper
selinux labeling support to the unpack_to_dfd path and also solves a
number of subtle issues. The biggest issue solved is proper hardlink
handling. We previously assumed that the data payload was attached to
the first entry in a series of hardlinks. In fact, the payload is in the
*last* entry.  This resulted in any multilinked files all having size 0.

In the unpack_to_dfd path, this works great since we can just re-open
the file at the last hardlink entry to write the actual data. I also
tried implementing this for the unpack_to_ostree path, which was more
complicated since we have to defer imports until we hit the final
hardlink (which requires heuristics to detect).

In the end, I switch the unpack_to_ostree path to simply use a tmprootfs
and then make it call out to unpack_to_dfd. This is less efficient, but
greatly simplifies the whole unpacker code, as well as allowing us to
get rid of the copynpaste stuff.

Eventually, we should probably solve this properly upstream in
ostree_repo_write_archive_to_mtree() and then switch the
unpack_to_ostree codepath to use that.

cgwalters · 2016-04-13T13:56:20Z

re hardlinks, oops! Hmm...i'm not sure about unpacking to a tmprootfs. That has a few problems, such as the fact that for e.g. setuid binaries they'll exist on the host temporarily. Also, it's harder to run as unprivileged - it means we'd try to chown files to uid 0 as non-root, etc.

I think I'd be more in favor of dropping the unpack_to_dfd path, since it's mostly for debugging. Anyone who wants to unpack an rpm by hand already has rpm2cpio and things like that.

For hardlink support...it sounds like you're right this is a bug in the ostree-libarchive bits. I think we'd have to do something like hold a reference to mtrees for each hardlink target, and when we find the actual content, update the target mtrees?

jlebon · 2016-05-20T19:05:36Z

Closing in favour of #289.

cgwalters mentioned this pull request Mar 24, 2015

Support a hotfix process #119

Closed

cgwalters mentioned this pull request Apr 7, 2015

transaction: Add HIF_TRANSACTION_FLAG_OSTREE_MODE rpm-software-management/libdnf#39

Closed

cgwalters mentioned this pull request Apr 8, 2015

Failure to try other mirrors or retry package downloads #128

Closed

cgwalters force-pushed the package-layering branch from 7c1d7ae to da1df77 Compare April 12, 2015 13:28

cgwalters added the WIP label Apr 16, 2015

cgwalters mentioned this pull request May 4, 2015

WIP: Start of work on daemon #116

Closed

cgwalters force-pushed the package-layering branch from da1df77 to 01f13c0 Compare August 24, 2015 17:28

cgwalters force-pushed the package-layering branch from 01f13c0 to 654f33b Compare October 14, 2015 02:05

cgwalters force-pushed the package-layering branch from 654f33b to 39fe5ec Compare December 6, 2015 13:25

cgwalters force-pushed the package-layering branch 3 times, most recently from 0378099 to 4a9f20d Compare January 5, 2016 15:45

cgwalters added 3 commits January 10, 2016 09:31

WIP: port to daemon

eef3439

fixup! WIP: port to daemon

faa5fc4

cgwalters mentioned this pull request Apr 14, 2016

OstreeRepo: add extensions directory ostreedev/ostree#257

Closed

jlebon mentioned this pull request Apr 22, 2016

[merged] Rework libarchive import support ostreedev/ostree#275

Closed

jlebon mentioned this pull request May 20, 2016

[merged] Package layering #289

Closed

jlebon closed this May 20, 2016

jlebon mentioned this pull request Jun 2, 2016

Improving the rpm-ostree testsuite #300

Closed

cgwalters mentioned this pull request Aug 25, 2016

pruning tries to traverse into parent repos ostreedev/ostree#467

Closed

Package layering #107

Package layering #107

Conversation

cgwalters commented Feb 12, 2015

cgwalters commented Feb 12, 2015

cgwalters commented Mar 3, 2015

baude commented Mar 3, 2015

dbnicholson commented Apr 2, 2015

cgwalters commented Apr 7, 2015

dbnicholson commented Apr 7, 2015

cgwalters commented Apr 10, 2015

dbnicholson commented Apr 10, 2015

cgwalters commented Apr 12, 2015

cgwalters commented Oct 14, 2015

cgwalters commented Dec 7, 2015

jlebon commented Dec 7, 2015

jlebon commented Dec 7, 2015

cgwalters commented Dec 14, 2015

cgwalters commented Jan 5, 2016

jlebon commented Mar 15, 2016

cgwalters commented Mar 15, 2016

jlebon commented Mar 15, 2016

jlebon commented Mar 21, 2016

cgwalters commented Mar 21, 2016

jlebon commented Mar 22, 2016

jlebon commented Mar 31, 2016

jlebon commented Apr 4, 2016

jlebon commented Apr 4, 2016

jlebon commented Apr 5, 2016

jlebon commented Apr 6, 2016

cgwalters commented Apr 7, 2016

cgwalters commented Apr 7, 2016

cgwalters commented Apr 7, 2016

cgwalters commented Apr 7, 2016

cgwalters commented Apr 7, 2016

cgwalters commented Apr 7, 2016

jlebon commented Apr 7, 2016

jlebon commented Apr 7, 2016

cgwalters commented Apr 7, 2016

jlebon commented Apr 7, 2016

jlebon commented Apr 12, 2016

cgwalters commented Apr 13, 2016

jlebon commented May 20, 2016