zstd:chunked metadata ambiguity #2014

Closed
mtrmac opened this issue Jul 11, 2024 · 7 comments · Fixed by #2035
Labels
area/zstd:chunked Issues relating to zstd:chunked

Comments

@mtrmac
Collaborator

mtrmac commented Jul 11, 2024

Filing separately; earlier discussion is around #1888 (comment):

zstd:chunked layers, when pulling, contain metadata in three places:

  • The ordinary uncompressed tar format
  • TOC metadata
  • tar-split metadata

When pulling, we ignore the uncompressed tar, build files from the TOC metadata, and record the tar-split.

When pushing, we ignore the metadata of individual files and use the tar-split metadata.

The net outcome is that a user can “pull; inspect; push”, and the pushed metadata will be different from what the user saw.

I think this _must_ be addressed.


Vaguely, I think that could either happen by having the tar-split “drive” the chunked pull, using the TOC only to look up data; or by overwriting the TOC metadata with the tar-split data.

The latter seems a bit easier, because the “push” compression code already has tar → TOC conversion code, so we would “only” need to read through the tar-split. The former is conceptually nicer: hypothetically, we could eventually have a single “apply tar header → filesystem metadata” code path for all of c/storage, instead of the current “tar → metadata” for ordinary overlay, “TOC → metadata” for chunked, and things like #1653 (comment). But it would probably be more disruptive, and not practical within the current short time limits.
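For concreteness, here is a rough sketch of the “overwrite from tar-split” direction (my illustration, not the eventual c/storage code): walk the tar-split JSON with github.com/vbatts/tar-split, rebuild a byte stream in which zeroes stand in for file contents, and let archive/tar re-parse the authoritative headers. It assumes the tar-split stream has already been uncompressed; error handling is minimal.

```go
package main

import (
	"archive/tar"
	"fmt"
	"io"
	"os"

	"github.com/vbatts/tar-split/tar/storage"
)

// zeroReader yields zero bytes; we only need file *lengths* to keep the tar
// framing intact, not the real contents.
type zeroReader struct{}

func (zeroReader) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = 0
	}
	return len(p), nil
}

// headersFromTarSplit re-derives the tar headers recorded in a tar-split
// stream; these are what a later push would serialize, so in the “tar-split
// wins” variant they would overwrite the TOC-derived metadata.
func headersFromTarSplit(tarSplit io.Reader) ([]*tar.Header, error) {
	pr, pw := io.Pipe()
	go func() {
		unpacker := storage.NewJSONUnpacker(tarSplit)
		for {
			entry, err := unpacker.Next()
			if err != nil {
				pw.CloseWithError(err) // io.EOF here just ends the stream
				return
			}
			switch entry.Type {
			case storage.SegmentType: // literal header/padding bytes
				if _, err := pw.Write(entry.Payload); err != nil {
					return
				}
			case storage.FileType: // file body: length matters, bytes do not
				if _, err := io.CopyN(pw, zeroReader{}, entry.Size); err != nil {
					return
				}
			}
		}
	}()

	var hdrs []*tar.Header
	tr := tar.NewReader(pr)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return hdrs, nil
		}
		if err != nil {
			return nil, err
		}
		hdrs = append(hdrs, hdr)
	}
}

func main() {
	hdrs, err := headersFromTarSplit(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, h := range hdrs {
		fmt.Printf("%s mode=%o uid=%d gid=%d\n", h.Name, h.Mode, h.Uid, h.Gid)
	}
}
```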

Cc: @giuseppe — I’ll be looking at this over the next few days, but I’d very much appreciate any insight, advice, or help.

@cgwalters added the area/zstd:chunked label Jul 11, 2024
@cgwalters
Contributor

cgwalters commented Jul 12, 2024

> The net outcome is that a user can “pull; inspect; push”, and the pushed metadata will be different from what the user saw.

In what scenarios though, more precisely?

> I think this _must_ be addressed.

Can you clarify the primary concern - is it something like e.g. somehow losing xattrs or capability bits accidentally?

In a previous issue we were thinking about the "representational ambiguity" problem from the perspective of container image scanners and the like, which, without a lot of care on their part, could easily be confused by actively malicious zstd:chunked images. But I'm not sure that's what you mean.

(Rereading your comment again, maybe it is?)

But can you then clarify more precisely: what does "inspect" mean here? Is it e.g. podman mount, or is it "parse the tar stream for a layer directly, potentially without awareness of zstd:chunked and simply treating it as zstd and skipping the skippable frames"?

(I actually don't know what e.g. clair does on this)


On the overall thing I just keep circling back to: Did we really need to ship an entirely distinct tar-split stream? A core issue here is of course: "what tar stream, precisely"? I believe today it just happens to be whatever the Go encoding/tar produces...but...we control both ends here today, and ISTM we could create a defined way to reproduce a "canonical tar header" dynamically from the TOC, and just entirely remove 1/3 of the duplicated data.

@giuseppe
Member

> On the overall thing I just keep circling back to: Did we really need to ship an entirely distinct tar-split stream? A core issue here is of course: "what tar stream, precisely"? I believe today it just happens to be whatever the Go encoding/tar produces...but...we control both ends here today, and ISTM we could create a defined way to reproduce a "canonical tar header" dynamically from the TOC, and just entirely remove 1/3 of the duplicated data.

I added the tar-split so we are able to rebuild the original tarball when we pull a partial layer; it was the easiest way to implement that. Longer term, I agree we should avoid it and add the missing information to the TOC file itself, so that the expected tarball could be recreated using only the TOC information.
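To illustrate what "recreate the tarball using only the TOC" would take, here is a sketch of a deterministic TOC entry → tar.Header mapping. The tocEntry struct is a hypothetical trimmed-down stand-in for the real one (the actual field set lives in c/storage's chunked internals); the point is that every encoding choice (typeflag, PAX vs. GNU extensions, xattr records, timestamps) must be pinned down for the result to be canonical.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"time"
)

// tocEntry is a hypothetical, trimmed-down TOC entry for illustration only.
type tocEntry struct {
	Type     string // "reg", "dir", "symlink", ...
	Name     string
	Linkname string
	Mode     int64
	Size     int64
	UID, GID int
	ModTime  time.Time
	Xattrs   map[string]string
}

// tocToTarHeader maps a TOC entry to a tar.Header using one fixed set of
// choices, so that serializing it always yields the same bytes.
func tocToTarHeader(e *tocEntry) *tar.Header {
	hdr := &tar.Header{
		Name:     e.Name,
		Linkname: e.Linkname,
		Mode:     e.Mode,
		Size:     e.Size,
		Uid:      e.UID,
		Gid:      e.GID,
		ModTime:  e.ModTime,
		Format:   tar.FormatPAX, // one fixed format keeps the bytes reproducible
	}
	switch e.Type {
	case "dir":
		hdr.Typeflag = tar.TypeDir
		hdr.Size = 0
	case "symlink":
		hdr.Typeflag = tar.TypeSymlink
		hdr.Size = 0
	default:
		hdr.Typeflag = tar.TypeReg
	}
	// Xattrs go into PAX records under the conventional SCHILY.xattr. prefix.
	for k, v := range e.Xattrs {
		if hdr.PAXRecords == nil {
			hdr.PAXRecords = map[string]string{}
		}
		hdr.PAXRecords["SCHILY.xattr."+k] = v
	}
	return hdr
}

func main() {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	hdr := tocToTarHeader(&tocEntry{
		Type: "reg", Name: "usr/bin/example", Mode: 0o755,
		ModTime: time.Unix(0, 0), // fixed timestamps also help canonicalization
	})
	if err := tw.WriteHeader(hdr); err != nil {
		panic(err)
	}
	tw.Close()
	fmt.Printf("serialized header bytes: %d\n", buf.Len())
}
```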

@giuseppe
Member

Another thing we could do when we create the tarball is to sort entries by their mtime; directories do not matter, since they are created using only the information in the TOC file anyway.

This could help with creating a "canonical tarball" and, even more importantly, it helps with image updates, since all the new files are grouped together and end up in the same range of the HTTP request. (A minimal sketch of the ordering follows below.)

I've started playing with it here: https://github.com/giuseppe/buildah/tree/sort-entries-tar
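The sketch mentioned above, assuming a writer that already has the full list of headers in hand (my illustration, not the code in the branch):

```go
package main

import (
	"archive/tar"
	"fmt"
	"sort"
	"time"
)

// sortByMtime orders layer entries by modification time before the tarball
// is written, so files touched by the same build step end up adjacent (and
// therefore in the same HTTP range on updates). Directory ordering in the
// stream does not matter for chunked consumers, which create directories
// from the TOC, as noted above.
func sortByMtime(entries []*tar.Header) {
	sort.SliceStable(entries, func(i, j int) bool {
		return entries[i].ModTime.Before(entries[j].ModTime)
	})
}

func main() {
	entries := []*tar.Header{
		{Name: "etc/new.conf", ModTime: time.Unix(200, 0)},
		{Name: "usr/bin/old", ModTime: time.Unix(100, 0)},
	}
	sortByMtime(entries)
	for _, e := range entries {
		fmt.Println(e.Name)
	}
}
```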

@mtrmac
Collaborator Author

mtrmac commented Jul 12, 2024

> The net outcome is that a user can “pull; inspect; push”, and the pushed metadata will be different from what the user saw.

> In what scenarios though, more precisely?

  • podman pull consumes the TOC, creates a non-setuid file.
  • podman mount / podman run … inspects the resulting image, sees a non-setuid file.
  • podman push (or, more likely, podman build on top and a push of the result) consumes the tar-split, and pushes a setuid file.

> Can you clarify the primary concern - is it something like e.g. somehow losing xattrs or capability bits accidentally?

Yes.

> But can you then clarify more precisely: what does "inspect" mean here? Is it e.g. podman mount,

This.

> or is it "parse the tar stream for a layer directly, potentially without awareness of zstd:chunked and simply treating it as zstd and skipping the skippable frames"?

That (an ambiguity between the primary tar stream and the two metadata-only locations) is a different can of worms; I think that one is less severe, because we typically always “create a tar stream” and then build a TOC/tar-split from that stream. That ensures that the outcome of a build or a push is consistent in this sense, unless the author is actively malicious. The TOC/tar-split ambiguity being discussed here can trigger creation of unexpected images by passive victims.


> On the overall thing I just keep circling back to: Did we really need to ship an entirely distinct tar-split stream?

[Off-topic:

As long as we are committed to transparent interoperability with non-chunked consumers, and we don’t want to issue range requests for the metadata of every single file (which would mean resetting the compression state for every 512-byte tar header, in addition to resetting it for the file contents starting after those 512 bytes; I didn’t measure that, but it seems to me it would be costly), we sadly seem to need two copies.

And as long as we don’t want to compute an extra sha256 sum over the full result of the chunked pull, the two copies might get out of sync. (Maybe the cost of this would actually be small enough that we should do it? Having layers identified only by DiffID, not by TOC digest, would simplify the model quite a bit. A sketch of what that checksum would involve follows this aside.)

If we dropped that restriction and decided to build an entirely new layer format, there’s a lot that could be done.

But that’s not this immediate project.]
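The sketch referenced in the aside: recompute the DiffID by reassembling exactly the stream a later push would serialize, using tar-split’s own output path, and hashing it. This is my illustration; the function and parameter names are not c/storage API.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"

	"github.com/vbatts/tar-split/tar/asm"
	"github.com/vbatts/tar-split/tar/storage"
)

// diffIDOfCheckout recomputes the uncompressed-layer digest ("DiffID") from
// the files checked out on disk plus the recorded tar-split, i.e. exactly
// the stream a later push would serialize.
func diffIDOfCheckout(layerDir string, tarSplit io.Reader) (string, error) {
	rc := asm.NewOutputTarStream(
		storage.NewPathFileGetter(layerDir), // file contents come from the checkout
		storage.NewJSONUnpacker(tarSplit),   // framing/headers come from tar-split
	)
	defer rc.Close()

	h := sha256.New()
	if _, err := io.Copy(h, rc); err != nil {
		return "", err
	}
	return fmt.Sprintf("sha256:%x", h.Sum(nil)), nil
}

func main() {
	// Usage: diffid <layer-dir> < tar-split.json
	d, err := diffIDOfCheckout(os.Args[1], os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(d)
}
```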


> A core issue here is of course: "what tar stream, precisely"? I believe today it just happens to be whatever the Go encoding/tar produces...but...we control both ends here today, and ISTM we could create a defined way to reproduce a "canonical tar header" dynamically from the TOC, and just entirely remove 1/3 of the duplicated data.

I think that requires a diversion into historical spelunking to answer “why do we have tar-split at all?” I don’t actually know that we have a written document.

  • The primary reason seems to be just that the tar headers might be non-canonical and we want to reproduce the DiffID exactly. Even if a layer is created by some non-Go author with a different tar implementation, users expect a podman build on top of a base image to only require storing/pushing/pulling the “new layers”, not duplicating the base ones.

    That’s not directly a concern for zstd:chunked layers (as you say, the format is new and we get to define the specifics), but it might be a concern for multi-compression images and round-trips: consume a gzip OS image, build an intermediate zstd:chunked image on top, build an application on top of that, publish as gzip: then users would expect the application’s OS layer to match the OS image.

    I can imagine such a request; I’m not sure how much to insist on fulfilling it. E.g. we are already storing the gzip-pulled and chunked-pulled layers separately and not deduplicating them, so some behavior changes happen anyway.

  • Another possible reason is that the tar-split is the only location storing the metadata in some cases (e.g. overlay has an ignore_chown_errors option, so the on-disk state of the layer is not sufficient). That’s not relevant to the zstd:chunked design; that’s a concern for how we store layers locally, not on registries.

  • Are there other functions performed by the tar-split?

To freeze / define a canonical tar format, we would want to actually document a specific bit stream in a public document to allow interoperability… plausible but some amount of work.

@cgwalters
Contributor

> (e.g. overlay has an ignore_chown_errors option, so the on-disk state of the layer is not sufficient).

Is there a reason, in that situation, why we don't always fall back to setting the user.containers.override_stat xattr?

@mtrmac
Collaborator Author

mtrmac commented Aug 5, 2024

I don’t know. On-disk cost? Also, I guess (I haven’t checked just now) that tar-split already existed by the time the override_stat attribute was added.

@giuseppe
Member

giuseppe commented Aug 6, 2024

user.containers.override_stat requires fuse-overlayfs, while the ignore_chown_errors flag can also be used in a situation where there is only one ID available (i.e. no additional IDs available to the user) while still using native overlay. If there is only one ID available, fuse-overlayfs is not useful anyway, because all the files with ID != 0 would be owned by the overflow UID inside the user namespace.
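For reference, the fallback being discussed would look roughly like this. The xattr name is the one from this thread, but the "uid:gid:mode" value encoding is my assumption for illustration; check the c/storage sources for the exact format consumers expect.

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// recordOverrideStat stores the intended ownership/mode in an xattr instead
// of chown()ing the file, so the metadata survives even when chown fails.
// NOTE: the "uid:gid:mode" encoding below is an assumption for illustration.
func recordOverrideStat(path string, uid, gid int, mode uint32) error {
	val := fmt.Sprintf("%d:%d:%o", uid, gid, mode)
	return unix.Setxattr(path, "user.containers.override_stat", []byte(val), 0)
}

func main() {
	if err := recordOverrideStat("./layerfile", 0, 0, 0o4755); err != nil {
		fmt.Println("falling back:", err) // e.g. user xattrs unsupported on the fs
	}
}
```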
