Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix oci whiteout implementation #8

Open
allisonkarlitskaya opened this issue Oct 14, 2024 · 5 comments
Open

Fix oci whiteout implementation #8

allisonkarlitskaya opened this issue Oct 14, 2024 · 5 comments

Comments

@allisonkarlitskaya
Copy link
Collaborator

I implemented whiteouts in the image-merging code without having read the spec.

https://github.com/opencontainers/image-spec/blob/main/layer.md

There are two things that need fixing:

  • Whiteout files MUST only apply to resources in lower/parent layers.
  • Files that are present in the same layer as a whiteout file can only be hidden by whiteout files in subsequent layers.

When processing the second layer, a/.wh..wh..opq is applied first, before creating the new version of a/b, regardless of the ordering in which the whiteout file was encountered.

Implementations SHOULD generate layers such that the whiteout files appear before sibling directory entries.

If we encounter a whiteout file in a layer tarball after that file was added in the same tarball we're going to delete it again. We could argue that the tarball is not a valid layer because it attempts to do something that the spec says that it shouldn't, but at the very least we should try to detect this situation, and probably we do need to support it (since a SHOULD is only a SHOULD).

This is going to be very difficult. The good news: this isn't a problem as long as we're only dealing with layers produced in the way that they SHOULD have been.

  • In addition to expressing that a single entry should be removed from a lower layer, layers may remove all of the children using an opaque whiteout entry.
  • An opaque whiteout entry is a file with the name .wh..wh..opq indicating that all siblings are hidden in the lower layer.

We just need to add support for this (but need to be careful about the interaction with the above). Assuming we've managed to solve the above issue, this issue will be trivial.

@allisonkarlitskaya allisonkarlitskaya changed the title Fix oci without implementation Fix oci whiteout implementation Oct 14, 2024
@allisonkarlitskaya
Copy link
Collaborator Author

A simple idea would be to add generation counters to our DirEnt structs, where the generation is equal to the sequence of the layer that introduced the entry. Whiteouts would then only apply to DirEnts that aren't in the current generation.

I'd honestly prefer to find a better way to deal with this, because this approach inserts some weird kind of non-normative metadata into a data structure which we use as part of a chain to produce results that must be deterministic.

Another idea is that during the extraction of each layer, we could maintain a HashSet of addresses of the entries we created during that step, using it in a similar way to the generation counter. We have to be careful with this approach: if we used the addresses of the DirEnt structs for example we would be in trouble since those are subject to reallocation. The fact that we use Box<Directory> and Rc<Leaf> might help us here, since both of those have stable spots on the heap.

Similarly, we could also add the involved filenames to the set. It would be nice if we could avoid that one, though, since it's quite a lot "heavier".

@allisonkarlitskaya
Copy link
Collaborator Author

I just read a bit about how mutable modifies to the contents of Boxes and Rcs can potentially change their address. I think we don't do that, but in the case of Directory I'm less than 100% sure (since we modify the DirEnt vector a lot). Apparently std::pin::Pin is a thing.

@allisonkarlitskaya
Copy link
Collaborator Author

allisonkarlitskaya commented Oct 14, 2024

Okay, Pin<> is pretty easy to use. It also makes me feel better to have Pin<Rc<Leaf>> since we already take the addresses of those for hardlink computations...

@allisonkarlitskaya
Copy link
Collaborator Author

Annoyingly, not available in stable yet: rust-lang/rust#129090.

I'm sure it's possible to work around this issue, anyway...

@allisonkarlitskaya
Copy link
Collaborator Author

The .wh..wh.opq thing is implemented now, so all that's left is the "tarball in the wrong order" thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant