Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Package layering #107

Closed
wants to merge 12 commits into from
Closed

Package layering #107

wants to merge 12 commits into from

Conversation

cgwalters
Copy link
Member

This is a prototype patch series - it's taking https://github.com/cgwalters/atomic-pkglayer and moving it into the rpm-ostree core. However, as the last patch notes, upgrade and status (and pkg-remove) are not yet implemented.

@cgwalters
Copy link
Member Author

I've been thinking about this more recently...obviously having to detect unsafe %post makes things a lot more painful. What we could do is try using a custom FUSE filesystem that basically did "break hardlinks on write". Doing a re-commit of this layer would basically then just be doing the (device,inode) mapping that ostree commit --link-checkout-speedup does.

We wouldn't be using FUSE to run, only to build.

@baude
Copy link

baude commented Mar 3, 2015

@cgwalters sounds like a good idea; do you have experience with fuse filesystems?

@dbnicholson
Copy link

Another idea for handling this - write a LD_PRELOAD wrapper that catches open/openat. If the call is not O_RDONLY, then do the cp/mv trick and then run open on the newly created path.

http://www.catonmat.net/blog/simple-ld-preload-tutorial-part-2/

fakeroot does something similar - http://fakeroot.alioth.debian.org/.

@cgwalters
Copy link
Member Author

@dbnicholson LD_PRELOAD is too fragile for use against anything except specific versions of applications you've tested. See the endless treadmill that https://github.com/wrpseudo/pseudo has been on as new filesystem calls (e.g. renameat2 https://lwn.net/Articles/569134/ ) have been added. That's why I'm leaning towards fuse.

@dbnicholson
Copy link

@cgwalters Definitely if you're trying to wrap any system call that can create a file, LD_PRELOAD is going to suck (been there before). However, this just has a single specific case to cover - don't write into files that have multiple hardlinks. I believe that could be covered just by catching open and openat.

That said, if you can come up with a fuse implementation, that would be better. I'll probably try the preload hack to see what it would look like, anyway.

@cgwalters
Copy link
Member Author

Ok, I did this: https://github.com/cgwalters/rofiles-fuse

Was quite easy. I think it's correct. Now to stick it in...ostree? rpm-ostree? Debating.

@dbnicholson
Copy link

Nice. I'd like it if it was in ostree itself. Then it could be used in checkout to take a commit, safely update files within it, and then make a new commit. I don't think that's possible currently.

@cgwalters
Copy link
Member Author

No major changes, just rebased on master.

@cgwalters
Copy link
Member Author

Rebased on master. But still runs from the client - we need to have this code run from the daemon.

@cgwalters
Copy link
Member Author

  • Rebased.
  • Moved the code inside the daemon

TODO:

  • Make status show this
  • Make upgrade understand packages

@jlebon
Copy link
Member

jlebon commented Dec 7, 2015

Ahhh, looks like ostree has a new dependency on yacc. Will update the test container.

@jlebon
Copy link
Member

jlebon commented Dec 7, 2015

bot, retest this please

@cgwalters
Copy link
Member Author

Teaching upgrade about this would be relatively simple if we redownloaded the packages for each update. That probably wouldn't really be acceptable for most real-world use though.

So what we need to do is efficiently cache the unpacked RPM data we get from the tree itself. The interesting problem here is that RPMs only have SELinux labels applied at install time. If policy changes, we need to apply the new label. Which means we need a physical copy of the file.
This situation should be quite unusual, but we still need to solve it, because I really don't want to have "hysteresis" in the system.

Representing each RPM as a commit tree itself is the obvious thing to do - then we store metadata such as the checksum of the SELinux policy that was used for labeling? Alternatively, we do "trust but verfiy" by unpacking the commit again, but then verifying the labels vs the current policy.

@cgwalters cgwalters force-pushed the package-layering branch 3 times, most recently from 0378099 to 4a9f20d Compare January 5, 2016 15:45
@cgwalters
Copy link
Member Author

  • Write a local commit rather than modifying the deployment in place
  • Some more tweaks to the Upgrader class (still needs work)

The next major item here is to move most of the logic into the Upgrader class so it can be shared with plain upgrade.

This builds upon the earlier prototype in
https://github.com/cgwalters/atomic-pkglayer

The `.origin` file says for a replicated installation:

    [origin]
    refspec=local:rhel-atomic-host/7/x86_64/standard

If you then run `rpm-ostree pkg-add strace`, it will result in a new tree with:

    [origin]
    baserefspec=local:rhel-atomic-host/7/x86_64/standard

    [packages]
    requested=strace;

Work still remaining here is to teach `rpm-ostree status` and
`rpm-ostree upgrade` about this.
@jlebon
Copy link
Member

jlebon commented Mar 15, 2016

OK, I pushed some more commits on my package-layering branch. Commits before 19233b9 are mostly focused on cleaning up the original package-layering branch, while the ones after start the work on migrating to the RPM import model.

Overview of the branch

  • Small cleanups to the container-builtins functions.
  • Adapt Treespec so that it fits our needs -- this is a bit abusive. We'll probably need a better solution down the line.
  • Move the package layering logic into RpmOstreeSysrootUpgrader. This means that now, any operation which does a deploy (e.g. upgrade, deploy, rebase) will automatically have the registered packages layered on top of the final deployment.

There's still a lot left to do (quoting from the TODO I added to track them all :) ):

  • check that the rpm is not already in the target deployment
  • proper selinux support
  • better UI feedback (laggy at the beginning)
  • support yum repos with vars
  • add pkg-delete
  • make upgrade also update the branches
  • support local rpm installations

Will update as I check off these items.

@cgwalters
Copy link
Member Author

At a high level I am a fan of smaller commits...but I'm a little uncertain about this small. It feels like a lot of them could be rolled up as "libpriv: Rework core API to prepare for package layering merge"

Then the commit message could include the commit messages from the relevant commits there.

Maybe what I'm arguing for here is to split out internal changes not related to layering first, change the container bits to use those, then land layering... 😎 as a layer on top?

Not a hard requirement from my POV, just if you agree. Though maybe there are shades here, as
jlebon@f0d856f
is probably still worth calling out separately, but I think
jlebon@5c1d871
is just too trivial for a single commit. WDYT?

@jlebon
Copy link
Member

jlebon commented Mar 15, 2016

Yup, that sounds good to me! Will squash the trivial ones. WDYT of the actual pkg layering commit in general?

@jlebon
Copy link
Member

jlebon commented Mar 21, 2016

OK, pushed to the branch again. Rebased on master, squashed some commits, and made some progress. We now actually check for the final set of RPMs to install in the deployment. The logic for this was a bit trickier than expected.

E.g. when you do a rebase or upgrade, if you encounter some of the previously requested pkgs (i.e. they're now part of the compose), the "requested" set must shrink. OTOH, doing two pkg-add in a row, you want the "requested" set to grow, even though during the second pkg-add you've encountered all the same pkgs previously requested.

In the end, this means that what goes in the deployment's origin's "requested" set is not necessarily the same as the actual set of pkgs we need to layer, which was my assumption previously.

Next up is tackling the SELinux issues mentioned previously. Though it's definitely in a good enough state to test and play around with.

@cgwalters
Copy link
Member Author

What do you think about submitting PRs for things that can basically be cherry-picked to master? Like jlebon@2949cbe ?

@jlebon
Copy link
Member

jlebon commented Mar 22, 2016

Pushed some new commits to rework the package layering logic a bit and added a pkg-delete.

In my initial attempt at adding pkg-delete, I realized that it was not a trivial task since we keep layering on top of checkouts already containing layered packages. Thus, I changed the logic so that the layered commit has for parent the "base" commit. Now, whenever we want to modify the set of pkgs to layer, we re-apply it on top of the base commit.

The downside is that we now do slightly more work when doing multiple pkg-add (although unpacking is not really the bottleneck). The upside is that it becomes trivial to implement pkg-delete and it makes the pkg layering code much easier to follow.

@jlebon
Copy link
Member

jlebon commented Mar 31, 2016

OK, took an initial stab at the SELinux code in the latest commit (jlebon/rpm-ostree@98b6747). We now pick up the sepolicy from the merge deployment and lookup the label as we unpack. We also record the policy csum in the commit metadata. It's still early, so the way these things are implemented might change (e.g. it'd be nice to push some of the work into ostree instead).

Next up is cache validation based on the policy csum.

@jlebon
Copy link
Member

jlebon commented Apr 4, 2016

Another SELinux issue here is how we deal with packages which bring their own pps. These are normally installed in %post, which we don't run.

It seems like we'll have to be heuristicky here and assume that all pps the pkg adds to /usr/share/selinux were meant to be installed and then install them and recompile the policy in the tmprootfs. We will probably have to extend the OstreeSePolicy class for this.

There's also a sort of convergence issue with this since we label stuff with the new sepolicy as we import the pkg. Since we'll only find out about the new pps and recompile after we're done unpacking, we might have to re-unpack and relabel the pkg.

Maybe we can just do a quick scan of all the archive entries first to check for pps. Or maybe it would be easier to just unpack to a tmprootfs rather than directly committing from libarchive to ostree (maybe that will help us get rid of the libarchive-input-stream.c copy/paste as well?).

And of course, yet another possible issue with this is #27.

@jlebon
Copy link
Member

jlebon commented Apr 4, 2016

OK pushed some more commits on my branch. I had also rebased on master, though I think it's already out of sync. To summarize, we now mark the csum of the policy during pkg import. Then during overlay, if we see a package that requires relabeling (i.e. target policy csum != import csum), we check it out, relabel, and add a commit to the package branch. More in-depth testing is definitely required, but it works well so far.

One interesting "issue" is that after writing the new deployment, since we call ostree_sysroot_cleanup(), only the HEAD of the RPM branches is kept, which means that we only ever keep one version ("version" here is wrt to the sepolicy, not RPM versioning) of each branch. In theory, this means that if the user ever deploys on a deployment that has an older sepolicy, we will not find a matching commit and thus recompute the labels. In practice, sepolicy versions will most likely be moving forward and may not always change on every new deployment. Though the code is already there to support this search-up-the-branch logic, so it might be worthwhile to at least keep e.g. 2 or 3 commits.

I added a couple of things in the TODO file to track pending items. Of those, these top three are probably hard reqs for a seamless user experience, so I'll be working on those now:

  • Proper labeling of parent directories during initial unpacking
  • Support pkgs which bring their own pps
    Package layering #107 (comment)
  • Support yum repos with variables in them

That said, I'd like to get what's there merged (but maybe with the pkg-* commands hidden) so that (1) it will be less of a pain to rebase everytime, and (2) it will be easier for others to test (e.g. we could probably announce it at least on atomic-devel). I can clean up and squash some commits soon if we want to go that way.

@jlebon
Copy link
Member

jlebon commented Apr 5, 2016

The issue above made me think about whether we should keep unpacked packages in a child repo so that we have better control over them. It seems like it just makes sense to have /sysroot/ostree/repo only contain things we want ostree to manage. There are of course solutions we could come up with for this specific pruning problem (e.g. maybe introducing a concept of ref ownership in ostree to say "don't touch!"), though philosophically keeping them separate might be better on the long term. @cgwalters WDYT?

@jlebon
Copy link
Member

jlebon commented Apr 6, 2016

The issue above made me think about whether we should keep unpacked packages in a child repo so that we have better control over them.

Just to see what it would look like, I did this in jlebon@b914f30. We store the unpacked pkgs in a child repo at /ostree/repo/tmp/rpmostree-pkgcache. It was slightly complicated by the fact that the final assembled commit has to be "pushed up" to the parent repo so that it can be deployed. That should become less painful by adding a new option to the ostree pull API.

The advantages of this method are:

  • Doesn't litter the main ostree repo
  • ostree does not have access to it, so e.g. regular ref prunings or even ostree admin cleanup doesn't blow away the cache. This also means that we are now responsible for our own GC, which can be made smarter to only throw out things we know we won't need.
  • It reflects the "cache" aspect of it. The whole repo could be blown away and things would just work as usual.

It should be an easy patch to revert down the line, so I'll just try it out this way and see what it gives.

@cgwalters
Copy link
Member Author

The packages with custom policies is indeed a nasty circular dependency. I don't even know how that works with rpm. Offhand I think librpm reloads its file context matching any time policy is loaded, but in this case it must have already laid out the files on disk.

Oh wait, librpm is probably relying on transaction ordering here. docker-selinux is a separate package, and docker Requires: %{repo}-selinux = %{epoch}:%{version}-%{release}.

This means we can't parallelize imports =(

@cgwalters
Copy link
Member Author

BTW sorry I didn't respond here earlier, I missed all of these notifications recently I think due to a flood of other notifications (kubernetes/contrib changed, the ADB people got very active).

@cgwalters
Copy link
Member Author

Child repo is OK by me. It's an interesting concept whether to model things like this as child repos or "prefixed branches" in a single repo. Offhand it seems the child repo is easier for management and avoids GC/locking issues, but reduces sharing.

My instinct here is sharing is mostly useful if you expect things to be shared. For example if one has both a Debian and Fedora trees, they probably share very little at an actual binary level. But, an imported Fedora Docker base image probably shares a lot with the host system (assuming same major version).

So that would argue for projectatomic/atomic#298 to use the primary repo, and not be a child repo.

Hmm, actually I just realized...GC in parent repos is unaware of child repos. It should probably work a bit more like git clone --local, not git clone --shared by default.

@cgwalters
Copy link
Member Author

@cgwalters
Copy link
Member Author

One hack that would probably get us pretty far is to require a marker on any packages that include selinux policy, where an initial definition of "marker" is "has selinux- as a prefix or -selinux as a suffix". So we special case unpacking those first into a "policy labeling root", load the regexps from there, then resume importing everything else.

@cgwalters
Copy link
Member Author

There's two major pieces of work here. First is to allow layering "simple" packages like strace.

Let's take a value of "simple" = "no %post". That covers things like useradd as well as custom SELinux .pp files (since for traditional RPM one needs a %post to recompile the policy).

It seems very reasonable to land that in master now. We can try to change things to error out if we detect a %post - I used to have code to do this.

Then the branch continues for handling the fully general case of server-side treecompose, which has to handle everything. That fully general case also fixes things like installing a custom docker with new selinux policy.

@jlebon
Copy link
Member

jlebon commented Apr 7, 2016

Oh wait, librpm is probably relying on transaction ordering here. docker-selinux is a separate package, and docker Requires: %{repo}-selinux = %{epoch}:%{version}-%{release}.

Hmm interesting, I didn't make that connection before, but that makes a lot of sense.

This means we can't parallelize imports =(

Yeah :(. The upside I guess is that it gives a way to handle this properly. We can use librpm to sort the pkgs to import (like we do during assembly), then during each unpacking use a flag to determine whether we have to recompile the policy before unpacking the next pkg.

Oh wait, since we're only importing a subset of all the pkgs in the transaction, I wonder if rpmtsOrder() can fail sometimes.

Hmm, actually I just realized...GC in parent repos is unaware of child repos. It should probably work a bit more like git clone --local, not git clone --shared by default.

Ouch, yeah that's an issue. Though I guess it doesn't affect us here since we never actually look up objects in the parent repo (other than the files we just checked out from it into the tmprootfs) and only really "push up".

Let's take a value of "simple" = "no %post". That covers things like useradd as well as custom SELinux .pp files (since for traditional RPM one needs a %post to recompile the policy).

It seems very reasonable to land that in master now. We can try to change things to error out if we detect a %post - I used to have code to do this.

OK that's a good way to break this up and get more early testers. Maybe we can allow but print a warning if we detect a %post?

Almost about to land a rework of the unpacker which does proper labeling of parent dirs. Afterwards I'll take a look at making libhif work with repo vars. I don't expect that to take too long, but if there's a blocking factor there, I'll just clean up my branch and get it ready for a merge.

@jlebon
Copy link
Member

jlebon commented Apr 7, 2016

Pushed rpmostree-unpacker rework: jlebon@72ad56c

rpm-ostree internals unpack CLI:

  • Fix --to-ostree-repo option entry
  • Add --selinux flag

rpmostree_unpacker:

  • Make rpmfi_overrides instance member
  • Build rpmfi_overrides on init
  • Fix how we use rpmfiInit() & rpmfiNext()
  • Add overrides lookup helpers
  • Generalize write_directory_meta() to also take sepolicy
  • Label root dir and parent dirs as needed

@cgwalters
Copy link
Member Author

WDYT about pushing this to the upstream repo as package-layering so we can both push to it?

@jlebon
Copy link
Member

jlebon commented Apr 7, 2016

@cgwalters Sounds good, pushed it.

@jlebon
Copy link
Member

jlebon commented Apr 12, 2016

Pushed a major rework of the unpacker: 8baffc0

This is a refactor of a big chunk of the unpacker. It adds proper
selinux labeling support to the unpack_to_dfd path and also solves a
number of subtle issues. The biggest issue solved is proper hardlink
handling. We previously assumed that the data payload was attached to
the first entry in a series of hardlinks. In fact, the payload is in the
*last* entry.  This resulted in any multilinked files all having size 0.

In the unpack_to_dfd path, this works great since we can just re-open
the file at the last hardlink entry to write the actual data. I also
tried implementing this for the unpack_to_ostree path, which was more
complicated since we have to defer imports until we hit the final
hardlink (which requires heuristics to detect).

In the end, I switch the unpack_to_ostree path to simply use a tmprootfs
and then make it call out to unpack_to_dfd. This is less efficient, but
greatly simplifies the whole unpacker code, as well as allowing us to
get rid of the copynpaste stuff.

Eventually, we should probably solve this properly upstream in
ostree_repo_write_archive_to_mtree() and then switch the
unpack_to_ostree codepath to use that.

@cgwalters
Copy link
Member Author

re hardlinks, oops! Hmm...i'm not sure about unpacking to a tmprootfs. That has a few problems, such as the fact that for e.g. setuid binaries they'll exist on the host temporarily. Also, it's harder to run as unprivileged - it means we'd try to chown files to uid 0 as non-root, etc.

I think I'd be more in favor of dropping the unpack_to_dfd path, since it's mostly for debugging. Anyone who wants to unpack an rpm by hand already has rpm2cpio and things like that.

For hardlink support...it sounds like you're right this is a bug in the ostree-libarchive bits. I think we'd have to do something like hold a reference to mtrees for each hardlink target, and when we find the actual content, update the target mtrees?

@jlebon
Copy link
Member

jlebon commented May 20, 2016

Closing in favour of #289.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants