
Processing an image with podman throws away existing layer annotations #23463

Closed
antheas opened this issue Jul 31, 2024 · 14 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. stale-issue

Comments

@antheas

antheas commented Jul 31, 2024

Issue Description

If you upload an image using the compression format zstd:chunked, podman replaces its layer annotations with the TOC annotations. It should instead add the TOC annotations on top of the existing ones.

Steps to reproduce the issue


  1. Create an image with annotations, or pull one (for example, ghcr.io/hhd-dev/bazzite-automated-deck)
  2. Push the image using podman push with --compression-format=zstd:chunked

Describe the results you received

The uploaded image has its annotations stripped and replaced by a TOC:

(screenshot: each layer's annotations contain only the zstd:chunked TOC entries)

Describe the results you expected

Each layer should retain the existing annotations from the uploaded image:

(screenshot: layers keeping their original annotations)
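For concreteness, here is a sketch of what a layer descriptor looks like in the two cases. The `io.github.containers.zstd-chunked.*` annotation keys are the ones used by containers/storage for the TOC; the digests and values are placeholders, and `ostree.components` is the custom annotation mentioned later in this thread.

What gets pushed today (custom annotations gone, TOC only):

```json
{
  "mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
  "digest": "sha256:…",
  "size": 123456789,
  "annotations": {
    "io.github.containers.zstd-chunked.manifest-checksum": "…",
    "io.github.containers.zstd-chunked.manifest-position": "…"
  }
}
```

What would be expected (TOC added on top of the existing annotations):

```json
{
  "mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
  "digest": "sha256:…",
  "size": 123456789,
  "annotations": {
    "ostree.components": "…",
    "io.github.containers.zstd-chunked.manifest-checksum": "…",
    "io.github.containers.zstd-chunked.manifest-position": "…"
  }
}
```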

podman info output

Podman from the Ubuntu 24.04 LTS action runner. Run as part of `redhat-actions/push-to-registry@v2` with:

```yaml
extra-args: |
  --compression-format=zstd:chunked
```

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

No response

Additional information

To be honest, the error has not been 100% verified yet; we will run more testing over the following days. Opening this issue to keep track of it.

Tagging @KyleGospo as this action ran as part of bazzite CI.

I will download an image and replicate with the latest podman over the next 3 hours.

And yes, I think I am one of the first to find a use for layer annotations.

@antheas antheas added the kind/bug Categorizes issue or PR as related to a bug. label Jul 31, 2024
@antheas antheas changed the title Pushing an image with --compression-format=zstd:chunked throws away existing annotations Pushing an image with --compression-format=zstd:chunked throws away existing layer annotations Jul 31, 2024
@antheas antheas changed the title Pushing an image with --compression-format=zstd:chunked throws away existing layer annotations Processing an image with podman throws away existing layer annotations Aug 1, 2024
@antheas
Author

antheas commented Aug 1, 2024

After more testing, it was found that gzip compression also threw away the annotations, so the compression format is irrelevant.

I suspect that perhaps only skopeo has the concept of layer annotations, which causes podman to throw them away during e.g. podman push.

To avoid relying on those, I moved the information to an annotation/label key, which seems to always be preserved.

It is quite large (100-200 kB), which is unfortunate, but that seems to be the cost of doing business.

I will leave this issue open as you might want to look into this. I was reusing the ostree.components tag from ostree-rs-ext, so this will affect Kinoite/Silverblue as they move to an OCI format.

@cgwalters
Contributor

FWIW, I think issues like this are generally best filed against either c/storage or maybe buildah; podman only vendors that code.

But anyway, I think what's happening here is pretty simple: the process of pull, build, push always creates a new manifest, and what gets inherited from the base image is an explicit set. (I'd have to dig to find the relevant code, but I'm pretty sure that's what's happening.)

This topic also came up in containers/buildah#5586 (comment)

@antheas
Author

antheas commented Aug 2, 2024

In the example in this thread, I only did a pull from oci:, then a push. No build.

After the pull, skopeo could see the annotation; podman inspect did not show it.

Unfortunately, I could not test push locally due to connectivity issues.

Perhaps it's podman push? In the rechunk action I push with skopeo and the annotation is preserved; on bazzite:unstable podman push is used and the annotation is not.

@cgwalters
Contributor

If you do a podman pull oci:, it's going to unpack the image into containers-storage (i.e. overlay), and it's at that point that the original layer annotations are lost, because we basically unpack the tarballs. There's a tricky balance here: while it would probably make sense to preserve them if we just repush an image that was originally e.g. gzip back as gzip again, if we're converting from gzip to e.g. zstd:chunked (or vice versa), we shouldn't (right?). Now arguably we could learn to e.g. strip off zstd:chunked annotations only when repushing as gzip, etc.

I guess what probably would have made sense is two namespaces for annotations: one which applies to the compressed stream, and one for the tar stream? We could try to standardize such a thing, which would help probably.

@antheas
Author

antheas commented Aug 2, 2024

It's kind of unfortunate there's no space in the OCI standard to store metadata. Labels are a decent workaround for the time being.

160 kB is enough to store all package versions and all layer mappings as far as the rechunk project is concerned. It could also potentially be lowered by compressing the JSON and base64-encoding it.
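The compress-then-base64 idea mentioned above can be sketched with stdlib tools. This is purely illustrative: the `metadata` dict is a hypothetical stand-in for the rechunk manifest (package versions plus layer mappings), not its real format.

```python
import base64
import json
import zlib

# Hypothetical stand-in for the rechunk metadata: package versions
# plus a layer mapping per package.
metadata = {f"package-{i}": {"version": f"1.{i}.0", "layer": i % 8}
            for i in range(2000)}

raw = json.dumps(metadata).encode()
# Compress the JSON, then base64-encode so the result is a plain string
# that can live in an OCI label value.
packed = base64.b64encode(zlib.compress(raw, level=9)).decode()

def unpack(label_value: str) -> dict:
    """Reverse the transformation when reading the label back."""
    return json.loads(zlib.decompress(base64.b64decode(label_value)))

assert unpack(packed) == metadata
print(f"raw: {len(raw)} bytes, packed: {len(packed)} bytes")
```

Base64 adds one-third overhead, but for repetitive JSON the compression ratio more than makes up for it, so the label shrinks substantially.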

This is enough to use the previous manifest to generate e.g., package changelogs.

zstd:chunked optimization

But it would have been nice to be able to store an ordered list of all files in the image. That would allow, at least for the purposes of zstd:chunked, taking all new files that didn't exist in the previous version and placing them at the end of the layer. Then consumers would only need to do a range request for the end of the file.

It would also work across multiple updates if the ordering is preserved. Consider updates A, B, C, D. A user updating to D would get a layer that starts with the remaining files from A, then B, then C, and finally the files introduced in D.

This means that a user without the layer would download all of it, but a user coming from A would only do a single request for BCD, from B a single request for CD, and from C a single request for D.

Essentially, you'd get a single file that acts as a static delta for all previous container versions.

It's also layer-structure agnostic: if you sort the hashes, it doesn't matter how many layers you split them into; consumers only need to download the end of each layer. And if a layer has no new files, it will still generate the same hash.

But that means finding around 4-8 MB to store the ordered file hashes.
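The ordering scheme described above can be sketched in a few lines. This is purely illustrative (hypothetical helper names, nothing podman or zstd:chunked implements): files stay grouped by the release that introduced them, oldest first, so a client on any earlier release only needs one range request for the tail.

```python
# Sketch of the "static delta" layer ordering described above.

def order_layer(history: list[set[str]]) -> list[str]:
    """history[i] = the full file set of release i, oldest first.
    Returns the file order for the newest release's layer: survivors
    from each release, grouped in the order the releases introduced them."""
    newest = history[-1]
    ordered, seen = [], set()
    for release in history:
        # files first introduced in this release that still exist today
        for f in sorted(release - seen):
            if f in newest:
                ordered.append(f)
        seen |= release
    return ordered

def tail_needed(ordered: list[str], have: set[str]) -> list[str]:
    """Files a client must fetch. With the ordering above, everything
    missing sits in one contiguous suffix of the layer."""
    return [f for f in ordered if f not in have]

# Releases A..D, each a superset of the last (as in the example above).
A = {"a1", "a2"}
B = A | {"b1"}
C = B | {"c1"}
D = C | {"d1", "d2"}

ordered = order_layer([A, B, C, D])
need = tail_needed(ordered, B)          # client currently on release B
assert ordered[-len(need):] == need     # missing files form one suffix
```

A client on B therefore issues one range request covering the C and D groups at the end of the layer, instead of re-downloading everything.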

@cgwalters
Contributor

cgwalters commented Aug 2, 2024

zstd:chunked uses zstd skippable frames to store all sorts of file metadata hidden in the compression (but not tar) stream today; the annotations you see on the layers just tell the client where to find those frames.

(A recent discussion of some issues raised by that fact containers/storage#2014 )
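As a minimal sketch of the framing itself: a zstd skippable frame (per the Zstandard spec, RFC 8878) is any magic number in `0x184D2A50..0x184D2A5F`, a 4-byte little-endian length, then opaque data that every conforming decompressor silently skips. The TOC payload below is a stand-in; the real format is defined by containers/storage.

```python
import struct

SKIPPABLE_MAGIC = 0x184D2A50  # first of the 16 reserved magic numbers

def wrap_skippable(payload: bytes) -> bytes:
    """Frame arbitrary bytes so a zstd decompressor will skip them."""
    return struct.pack("<II", SKIPPABLE_MAGIC, len(payload)) + payload

def read_skippable(frame: bytes) -> bytes:
    """Parse a skippable frame back into its payload."""
    magic, size = struct.unpack_from("<II", frame)
    assert 0x184D2A50 <= magic <= 0x184D2A5F, "not a skippable frame"
    return frame[8:8 + size]

toc = b'{"version": 1, "entries": []}'  # stand-in for the chunked TOC
frame = wrap_skippable(toc)
assert read_skippable(frame) == toc
```

This is why a plain gzip (or even plain zstd) consumer can decompress a zstd:chunked layer unchanged: the metadata rides inside the compression stream, invisible to the tar stream.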

@cgwalters
Contributor

That would allow, at least for the purposes of zstd chunked, to take all new files that didn't exist on the previous version and place them at the end of the layer. Then consumers would only need to do a range request for the end of the file.

Right, see containers/storage#2014 (comment)

(I don't know if @giuseppe is actively working on it, but if someone chooses to dive into it I think we should track it as a dedicated issue to avoid duplicate work and also ensure the design has some bikeshedding)

@antheas
Author

antheas commented Aug 2, 2024

Sure, if I look into it I will open an issue.

Mtimes wouldn't work most of the time, though, as when building an image through GitHub Actions they would be wiped anyway.

And for an OSTree commit, we pin the timestamps to 1 to avoid layer hash changes.

It would only work if the image was built entirely from local files.

So you'd need the previous version's files, sorted. Then, assuming podman retains the tar stream ordering, modify the tar stream ordering before pushing to a registry. Then it would work even if an image was converted to gzip and back.
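The reordering step proposed above can be sketched against an in-memory tar stream. This is purely illustrative (not something podman does): files known from the previous release keep their previous relative order, and brand-new files are appended at the end.

```python
import io
import tarfile

def reorder_tar(data: bytes, previous_order: list[str]) -> bytes:
    """Rewrite a tar stream so previously-known files come first, in their
    old order, and new files sort to the end (alphabetically)."""
    rank = {name: i for i, name in enumerate(previous_order)}
    with tarfile.open(fileobj=io.BytesIO(data)) as src:
        members = src.getmembers()
        members.sort(key=lambda m: (rank.get(m.name, len(rank)), m.name))
        out = io.BytesIO()
        with tarfile.open(fileobj=out, mode="w") as dst:
            for m in members:
                dst.addfile(m, src.extractfile(m) if m.isfile() else None)
    return out.getvalue()

# Tiny demo: "new.txt" is introduced in this release; the rest existed before.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as t:
    for name in ("new.txt", "old_b.txt", "old_a.txt"):
        info = tarfile.TarInfo(name)
        payload = name.encode()
        info.size = len(payload)
        t.addfile(info, io.BytesIO(payload))

reordered = reorder_tar(buf.getvalue(), ["old_a.txt", "old_b.txt"])
with tarfile.open(fileobj=io.BytesIO(reordered)) as t:
    names = t.getnames()
assert names == ["old_a.txt", "old_b.txt", "new.txt"]
```

Because the reordering happens on the uncompressed tar stream, the result is independent of whether the layer is later compressed as gzip or zstd:chunked.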

@cgwalters
Contributor

Right, I meant to reply to giuseppe that I despise mtimes and defining tooling around them being meaningful is a Bad Idea.

The relevant thing (for zstd:chunked) is really "new chunks" I think...and obviously the tooling already knows how to check for which chunks are shared, it'd just be factoring that out as a static thing?

However, there's a caveat to note: doing things this way necessarily creates hysteresis. The state of the current image layout becomes a function of its previous layouts, which makes reproducible builds more complicated.

@antheas
Author

antheas commented Aug 3, 2024

Agreed. I am fine with a bit of hysteresis in the packaging format if it is well controlled and provides a better experience to my users.

It is too early to do that for zstd:chunked, though, as neither bootc nor rpm-ostree supports it yet (for deltas, that is).

I am happy with using a tag though, so this issue is not applicable to me anymore.

In addition, it seems that trying to retrofit container storage would be such a pain that it is probably easier for ostree-rs-ext to also move away from layer annotations to using a tag (for preventing layer shifting).

Close as not planned?

@rhatdan
Member

rhatdan commented Aug 12, 2024

@giuseppe @mtrmac PTAL

@mtrmac
Collaborator

mtrmac commented Aug 12, 2024

If you do a podman pull oci: it's going to unpack it into containers-storage i.e. overlay and it's at that point where the original layer annotations are lost because we unpack the tarballs basically.

We do actually preserve a manifest, typically (but not always) a byte-exact copy of the on-registry one. And, for some features like forensic signature checks, we would want to make preserving a byte-exact copy a promise. (The main case where we fail to do that is containers/image#1768 ).

So that makes most of the conceptual discussion here… true enough as far as it goes, but not directly relevant to the behavior in question.

(I agree that all bets are off on builds. And, in general, podman pull && podman push is almost always a bad way to do a copy; skopeo is much more efficient.)


As a guess, this is https://github.com/containers/image/blob/da938fa75e55f8cf57d3875ffc3946f011fb86ec/copy/compression.go#L246 and similar code in that file discarding existing annotations during the upload process.

I’d want to see at least detailed steps to reproduce, a precise Podman version number and a --log-level=debug log to confirm, though.


Assuming that guess is true: It’s plausible to call that a bug, and we can discuss what to do precisely.

There's a tricky balance here … preserve them [vs] … we shouldn't

We can flip from discarding everything to preserving everything; we can hard-code a database of behavior for specific annotations. That is never going to cover custom annotations, and initially/intuitively I find the idea of making that user-configurable for pulls/pushes/copies rather unattractive. I’d lean towards making annotation editing a Buildah feature, and then a push should just preserve everything.
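The "preserve everything" option can be sketched as a merge rather than a replace. This is a Python illustration only (the real code is Go, in containers/image's copy/compression.go); `TOC_PREFIX` is an assumed key prefix standing in for whatever keys the compression pass actually owns.

```python
# Assumed prefix for compression-owned annotation keys; the real
# zstd:chunked key names live in containers/storage.
TOC_PREFIX = "io.github.containers.zstd-chunked."

def merged_annotations(existing: dict[str, str],
                       toc: dict[str, str]) -> dict[str, str]:
    """Keep custom annotations, drop stale TOC keys from any previous
    compression pass, then layer the freshly computed TOC keys on top."""
    kept = {k: v for k, v in existing.items()
            if not k.startswith(TOC_PREFIX)}
    return {**kept, **toc}

existing = {
    "ostree.components": "base,kernel",              # custom: should survive
    TOC_PREFIX + "manifest-checksum": "sha256:old",  # stale: should be replaced
}
toc = {TOC_PREFIX + "manifest-checksum": "sha256:new"}

result = merged_annotations(existing, toc)
assert result["ostree.components"] == "base,kernel"
assert result[TOC_PREFIX + "manifest-checksum"] == "sha256:new"
```

This keeps custom annotations intact on push while still refreshing (or, when repushing as gzip, dropping) the compression-specific keys.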


A friendly reminder that this issue had no activity for 30 days.

@antheas
Author

antheas commented Sep 12, 2024

Since it seems the intention of layer annotations was to aid in the packaging of said layers, not to act as metadata for them (e.g., fixing this would interfere with zstd:chunked), and doing otherwise now would be too much work, I'm closing this issue.

@antheas antheas closed this as not planned Won't fix, can't repro, duplicate, stale Sep 12, 2024