zstd:chunked metadata ambiguity #2014

Closed
mtrmac opened this issue Jul 11, 2024 · 7 comments · Fixed by #2035
Labels
area/zstd:chunked Issues relating to zstd:chunked

Comments

@mtrmac
Collaborator

mtrmac commented Jul 11, 2024

Filing separately; earlier discussion is around #1888 (comment):

zstd:chunked layers, when pulling, contain metadata in three places:

  • The ordinary uncompressed tar format
  • TOC metadata
  • tar-split metadata

When pulling, we ignore the uncompressed tar, build files from the TOC metadata, and record the tar-split.

When pushing, we ignore the metadata of individual files and use the tar-split metadata.

The net outcome is that a user can “pull; inspect; push”, and the pushed metadata will be different from what the user saw.

I think this _must_ be addressed.


Vaguely, I think that could either happen by having the tar-split “drive” the chunked pull, using the TOC only to look up data; or by overwriting the TOC metadata with the tar-split data.

The latter seems a bit easier, because the “push” compression code already has tar → TOC conversion code, so we would “only” need to read through the tar-split. The former is conceptually nicer: hypothetically, we could eventually have a single “apply tar header → filesystem metadata” code path for all of c/storage, instead of the current “tar → metadata” for ordinary overlay, “TOC → metadata” for chunked, and things like #1653 (comment). But it would probably be more disruptive, and not practical within the current short time limits.
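For concreteness, here is a rough sketch of the “overwrite from tar-split” direction (my illustration, not the eventual c/storage code): walk the tar-split JSON with github.com/vbatts/tar-split, rebuild a byte stream in which zeroes stand in for file contents, and let archive/tar re-parse the authoritative headers. It assumes the tar-split stream has already been uncompressed; error handling is minimal.

```go
package main

import (
	"archive/tar"
	"fmt"
	"io"
	"os"

	"github.com/vbatts/tar-split/tar/storage"
)

// zeroReader yields zero bytes; we only need file *lengths* to keep the tar
// framing intact, not the real contents.
type zeroReader struct{}

func (zeroReader) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = 0
	}
	return len(p), nil
}

// headersFromTarSplit re-derives the tar headers recorded in a tar-split
// stream; these are what a later push would serialize, so in the “tar-split
// wins” variant they would overwrite the TOC-derived metadata.
func headersFromTarSplit(tarSplit io.Reader) ([]*tar.Header, error) {
	pr, pw := io.Pipe()
	go func() {
		unpacker := storage.NewJSONUnpacker(tarSplit)
		for {
			entry, err := unpacker.Next()
			if err != nil {
				pw.CloseWithError(err) // io.EOF here just ends the stream
				return
			}
			switch entry.Type {
			case storage.SegmentType: // literal header/padding bytes
				if _, err := pw.Write(entry.Payload); err != nil {
					return
				}
			case storage.FileType: // file body: length matters, bytes do not
				if _, err := io.CopyN(pw, zeroReader{}, entry.Size); err != nil {
					return
				}
			}
		}
	}()

	var hdrs []*tar.Header
	tr := tar.NewReader(pr)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return hdrs, nil
		}
		if err != nil {
			return nil, err
		}
		hdrs = append(hdrs, hdr)
	}
}

func main() {
	hdrs, err := headersFromTarSplit(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, h := range hdrs {
		fmt.Printf("%s mode=%o uid=%d gid=%d\n", h.Name, h.Mode, h.Uid, h.Gid)
	}
}
```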

Cc: @giuseppe — I’ll be looking at this over the next few days, but I’d very much appreciate any insight, advice, or help.

@cgwalters added the area/zstd:chunked label Jul 11, 2024
@cgwalters
Contributor

cgwalters commented Jul 12, 2024

> The net outcome is that a user can “pull; inspect; push”, and the pushed metadata will be different from what the user saw.

In what scenarios though, more precisely?

> I think this _must_ be addressed.

Can you clarify the primary concern - is it something like e.g. somehow losing xattrs or capability bits accidentally?

In a previous issue we were thinking about the "representational ambiguity" problem from the perspective of container image scanners and the like, which, without a lot of care on their part, could easily be confused by actively malicious zstd:chunked images. But I'm not sure that's what you mean.

(Rereading your comment again, maybe it is?)

But can you then clarify more precisely: what does "inspect" mean here? Is it e.g. podman mount, or is it "parse the tar stream for a layer directly, potentially without awareness of zstd:chunked and simply treating it as zstd and skipping the skippable frames"?

(I actually don't know what e.g. clair does on this)


On the overall thing I just keep circling back to: Did we really need to ship an entirely distinct tar-split stream? A core issue here is of course: "what tar stream, precisely"? I believe today it just happens to be whatever the Go encoding/tar produces...but...we control both ends here today, and ISTM we could create a defined way to reproduce a "canonical tar header" dynamically from the TOC, and just entirely remove 1/3 of the duplicated data.

@giuseppe
Member

> On the overall thing I just keep circling back to: Did we really need to ship an entirely distinct tar-split stream? A core issue here is of course: "what tar stream, precisely"? I believe today it just happens to be whatever the Go encoding/tar produces...but...we control both ends here today, and ISTM we could create a defined way to reproduce a "canonical tar header" dynamically from the TOC, and just entirely remove 1/3 of the duplicated data.

I added the tar-split so we are able to rebuild the original tarball when we pull a partial layer; it was the easiest way to implement that. Longer term, I agree we should avoid it and add the missing information to the TOC file itself, so that the expected tarball could be recreated using only the TOC information.
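To illustrate what "recreate the tarball using only the TOC" would take, here is a sketch of a deterministic TOC entry → tar.Header mapping. The tocEntry struct is a hypothetical trimmed-down stand-in for the real one (the actual field set lives in c/storage's chunked internals); the point is that every encoding choice (typeflag, PAX vs. GNU extensions, xattr records, timestamps) must be pinned down for the result to be canonical.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"time"
)

// tocEntry is a hypothetical, trimmed-down TOC entry for illustration only.
type tocEntry struct {
	Type     string // "reg", "dir", "symlink", ...
	Name     string
	Linkname string
	Mode     int64
	Size     int64
	UID, GID int
	ModTime  time.Time
	Xattrs   map[string]string
}

// tocToTarHeader maps a TOC entry to a tar.Header using one fixed set of
// choices, so that serializing it always yields the same bytes.
func tocToTarHeader(e *tocEntry) *tar.Header {
	hdr := &tar.Header{
		Name:     e.Name,
		Linkname: e.Linkname,
		Mode:     e.Mode,
		Size:     e.Size,
		Uid:      e.UID,
		Gid:      e.GID,
		ModTime:  e.ModTime,
		Format:   tar.FormatPAX, // one fixed format keeps the bytes reproducible
	}
	switch e.Type {
	case "dir":
		hdr.Typeflag = tar.TypeDir
		hdr.Size = 0
	case "symlink":
		hdr.Typeflag = tar.TypeSymlink
		hdr.Size = 0
	default:
		hdr.Typeflag = tar.TypeReg
	}
	// Xattrs go into PAX records under the conventional SCHILY.xattr. prefix.
	for k, v := range e.Xattrs {
		if hdr.PAXRecords == nil {
			hdr.PAXRecords = map[string]string{}
		}
		hdr.PAXRecords["SCHILY.xattr."+k] = v
	}
	return hdr
}

func main() {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	hdr := tocToTarHeader(&tocEntry{
		Type: "reg", Name: "usr/bin/example", Mode: 0o755,
		ModTime: time.Unix(0, 0), // fixed timestamps also help canonicalization
	})
	if err := tw.WriteHeader(hdr); err != nil {
		panic(err)
	}
	tw.Close()
	fmt.Printf("serialized header bytes: %d\n", buf.Len())
}
```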

@giuseppe
Member

Another thing we could do when we create the tarball is to sort entries by their mtime; directories do not matter, since they are created using only the information in the TOC file anyway.

This could help with creating a "canonical tarball" and, even more importantly, it helps with image updates, since all the new files are grouped together and end up in the same range of the HTTP request. (A minimal sketch of the ordering follows below.)

I've started playing with it here: https://github.com/giuseppe/buildah/tree/sort-entries-tar
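The sketch mentioned above, assuming a writer that already has the full list of headers in hand (my illustration, not the code in the branch):

```go
package main

import (
	"archive/tar"
	"fmt"
	"sort"
	"time"
)

// sortByMtime orders layer entries by modification time before the tarball
// is written, so files touched by the same build step end up adjacent (and
// therefore in the same HTTP range on updates). Directory ordering in the
// stream does not matter for chunked consumers, which create directories
// from the TOC, as noted above.
func sortByMtime(entries []*tar.Header) {
	sort.SliceStable(entries, func(i, j int) bool {
		return entries[i].ModTime.Before(entries[j].ModTime)
	})
}

func main() {
	entries := []*tar.Header{
		{Name: "etc/new.conf", ModTime: time.Unix(200, 0)},
		{Name: "usr/bin/old", ModTime: time.Unix(100, 0)},
	}
	sortByMtime(entries)
	for _, e := range entries {
		fmt.Println(e.Name)
	}
}
```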

@mtrmac
Collaborator Author

mtrmac commented Jul 12, 2024

> The net outcome is that a user can “pull; inspect; push”, and the pushed metadata will be different from what the user saw.

> In what scenarios though, more precisely?

  • podman pull consumes the TOC, creates a non-setuid file.
  • podman mount / podman run … inspects the resulting image, sees a non-setuid file.
  • podman push (or, more likely, podman build on top and a push of the result) consumes the tar-split, and pushes a setuid file.

> Can you clarify the primary concern - is it something like e.g. somehow losing xattrs or capability bits accidentally?

Yes.

> But can you then clarify more precisely: what does "inspect" mean here? Is it e.g. podman mount,

This.

> or is it "parse the tar stream for a layer directly, potentially without awareness of zstd:chunked and simply treating it as zstd and skipping the skippable frames"?

That (an ambiguity between the primary tar stream and the two metadata-only locations) is a different can of worms; I think that one is less severe, because we typically always “create a tar stream” and then build a TOC/tar-split from that stream. That ensures that the outcome of a build or a push is consistent in this sense, unless the author is actively malicious. The TOC/tar-split ambiguity being discussed here can trigger creation of unexpected images by passive victims.


> On the overall thing I just keep circling back to: Did we really need to ship an entirely distinct tar-split stream?

[Off-topic:

As long as we are committed to transparent interoperability with non-chunked consumers, and we don’t want to issue range requests for the metadata of every single file (which would mean resetting the compression state for every 512-byte tar header, in addition to resetting it for the file contents starting after those 512 bytes; I didn’t measure that, but it seems to me it would be costly), we sadly seem to need two copies.

And as long as we don’t want to compute an extra sha256 sum over the full result of the chunked pull, the two copies might get out of sync. (Maybe the cost of this would actually be small enough that we should do it? Having layers identified only by DiffID, not by TOC digest, would simplify the model quite a bit. A sketch of what that checksum would involve follows this aside.)

If we dropped that restriction and decided to build an entirely new layer format, there’s a lot that could be done.

But that’s not this immediate project.]
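The sketch referenced in the aside: recompute the DiffID by reassembling exactly the stream a later push would serialize, using tar-split’s own output path, and hashing it. This is my illustration; the function and parameter names are not c/storage API.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"

	"github.com/vbatts/tar-split/tar/asm"
	"github.com/vbatts/tar-split/tar/storage"
)

// diffIDOfCheckout recomputes the uncompressed-layer digest ("DiffID") from
// the files checked out on disk plus the recorded tar-split, i.e. exactly
// the stream a later push would serialize.
func diffIDOfCheckout(layerDir string, tarSplit io.Reader) (string, error) {
	rc := asm.NewOutputTarStream(
		storage.NewPathFileGetter(layerDir), // file contents come from the checkout
		storage.NewJSONUnpacker(tarSplit),   // framing/headers come from tar-split
	)
	defer rc.Close()

	h := sha256.New()
	if _, err := io.Copy(h, rc); err != nil {
		return "", err
	}
	return fmt.Sprintf("sha256:%x", h.Sum(nil)), nil
}

func main() {
	// Usage: diffid <layer-dir> < tar-split.json
	d, err := diffIDOfCheckout(os.Args[1], os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(d)
}
```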


> A core issue here is of course: "what tar stream, precisely"? I believe today it just happens to be whatever the Go encoding/tar produces...but...we control both ends here today, and ISTM we could create a defined way to reproduce a "canonical tar header" dynamically from the TOC, and just entirely remove 1/3 of the duplicated data.

I think that requires a diversion into historical spelunking to answer “why do we have tar-split at all?” I don’t actually know that we have a written document.

  • The primary reason seems to be just that the tar headers might be non-canonical and we want to reproduce the DiffID exactly. Even if a layer is created by some non-Go author with a different tar implementation, users expect a podman build on top of a base image to only require storing/pushing/pulling the “new layers”, not duplicating the base ones.

    That’s not directly a concern for zstd:chunked layers (as you say, the format is new and we get to define the specifics), but it might be a concern for multi-compression images and round-trips: consume a gzip OS image, build an intermediate zstd:chunked image on top, build an application on top of that, publish as gzip: then users would expect the application’s OS layer to match the OS image.

    I can imagine such a request; I’m not sure how much to insist on fulfilling it. E.g. we are already storing the gzip-pulled and chunked-pulled layers separately and not deduplicating them, so some behavior changes happen anyway.

  • Another possible reason is that the tar-split is the only location storing the metadata in some cases (e.g. overlay has an ignore_chown_errors option, so the on-disk state of the layer is not sufficient). That’s not relevant to the zstd:chunked design; that’s a concern for how we store layers locally, not on registries.

  • Are there other functions performed by the tar-split?

To freeze / define a canonical tar format, we would want to actually document a specific bit stream in a public document to allow interoperability… plausible but some amount of work.

@cgwalters
Contributor

> (e.g. overlay has an ignore_chown_errors option, so the on-disk state of the layer is not sufficient).

Is there a reason, in that situation, why we don't always fall back to setting the user.containers.override_stat xattr?

@mtrmac
Collaborator Author

mtrmac commented Aug 5, 2024

I don’t know. On-disk cost? Also, I guess (I haven’t checked just now) that tar-split already existed by the time the override_stat attribute was added.

@giuseppe
Member

giuseppe commented Aug 6, 2024

user.containers.override_stat requires fuse-overlayfs, while the ignore_chown_errors flag can also be used in a situation where there is only one ID available (i.e. no additional IDs available to the user) while still using native overlay. If there is only one ID available, fuse-overlayfs is not useful anyway, because all the files with ID != 0 would be owned by the overflow UID inside the user namespace.
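For reference, the fallback being discussed would look roughly like this. The xattr name is the one from this thread, but the "uid:gid:mode" value encoding is my assumption for illustration; check the c/storage sources for the exact format consumers expect.

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// recordOverrideStat stores the intended ownership/mode in an xattr instead
// of chown()ing the file, so the metadata survives even when chown fails.
// NOTE: the "uid:gid:mode" encoding below is an assumption for illustration.
func recordOverrideStat(path string, uid, gid int, mode uint32) error {
	val := fmt.Sprintf("%d:%d:%o", uid, gid, mode)
	return unix.Setxattr(path, "user.containers.override_stat", []byte(val), 0)
}

func main() {
	if err := recordOverrideStat("./layerfile", 0, 0, 0o4755); err != nil {
		fmt.Println("falling back:", err) // e.g. user xattrs unsupported on the fs
	}
}
```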
