Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve append command - ease the process to push manifests to remote registries from a set of images #891

Closed
wants to merge 1 commit into from

Conversation

vboulineau
Copy link

Early PR to to get feedback on the added feature: the merge command eases the following workflow:
CI -> one image / ARCH and/or OS -> build registry -> single manifest in remote registry

Command generates merged manifest from images or manifests. Currently platform information are always deduced and cannot be taken from user input.

Creating the PR to get feedbacks. If there's interest to get this feature in, I can polish/rework the PR as needed.

@google-cla
Copy link

google-cla bot commented Dec 29, 2020

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@google-cla
Copy link

google-cla bot commented Dec 29, 2020

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@codecov-io
Copy link

codecov-io commented Dec 29, 2020

Codecov Report

Merging #891 (491aca8) into main (4eb508c) will decrease coverage by 1.14%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #891      +/-   ##
==========================================
- Coverage   72.67%   71.53%   -1.15%     
==========================================
  Files         108      109       +1     
  Lines        4696     4735      +39     
==========================================
- Hits         3413     3387      -26     
- Misses        773      838      +65     
  Partials      510      510              
Impacted Files Coverage Δ
pkg/crane/merge.go 0.00% <0.00%> (ø)
pkg/v1/google/options.go 70.58% <0.00%> (-19.42%) ⬇️
pkg/crane/optimize.go 0.00% <0.00%> (-11.41%) ⬇️
pkg/v1/google/list.go 62.06% <0.00%> (-0.86%) ⬇️
pkg/v1/tarball/write.go 61.45% <0.00%> (-0.85%) ⬇️
pkg/v1/tarball/layer.go 71.17% <0.00%> (ø)
pkg/v1/remote/transport/error.go 100.00% <0.00%> (ø)
pkg/crane/digest.go 31.25% <0.00%> (+3.47%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4eb508c...826e81f. Read the comment docs.

@vboulineau
Copy link
Author

@googlebot I signed it!

@jonjohnsonjr
Copy link
Collaborator

Creating the PR to get feedbacks. If there's interest to get this feature in, I can polish/rework the PR as needed.

Thank you for doing this, it's a good starting point for discussion.

I'm +1 on the need for something like this but not sure about this exact behavior. I have some low-level feedback but let's get to that once we've settled on higher level stuff...

My initial reaction is "we could make append support most of this", but maybe it would get confusing if we start producing an index or an image depending on which flags you set (though I think it's doable). Currently, crane append understands a list of -f tarball layers for appending to an image. If we added e.g. -i to represent an image, we could append everything to an index instead of an image and push that.

Do you have a use case for flattening an index here? If not for that bit, we could just append each SRC descriptor to an index (with the platform stuff, if it's an image) and call it a day.

This also assumes that all children of an index will always be an image, which isn't necessarily true. Ideally you'd recurse on a child index (we might want to expose something like this to make that easier) if you did want to flatten everything down to one index..

If you want to punt on the flattening bits, we could simplify this a lot, but if that's something you think is useful, I'm open to understanding why. It might also make sense to pull that out into a separate thing, depending on why it's needed.

Currently platform information are always deduced and cannot be taken from user input.

This is reasonable for now, but we should figure out what we would want the interface to look like for overriding this. Going back to the possibility of adding this to crane append... I do have concerns about the flags getting messy if we want to support stuff like overriding platforms, but perhaps we could do something generic enough that it's useful for crane appended images as well, e.g. a flag for overriding descriptor values, something like:

crane append ... -i foo -i bar \
--override '.manifests[0].platform = {"os":"linux","architecture":"amd64"}' \
--override '.manifests[1].platform = {"os":"linux","architecture":"arm64"}' \
--override '.manifests[1].annotations["org.opencontainers.image.url"] = "example.com/foo"'
crane append ... -f foo.tar \
--override '.layers[0].URLs = {"os":"linux","architecture":"amd64"}'

Now... that would involve implementing a subset of jq as a little interpreter thing, which would be fun, but probably way outside the scope of this PR 😄 cc @imjasonh if we implemented this, it would be very useful for mutating image config as well? Also would be a nice generic tool for mutating indexes without having to merge my crazy interactive editor experiment.

We could also just borrow ideas from https://github.com/estesp/manifest-tool, e.g. --platforms linux/amd64,linux/arm/v5,linux/arm/v7 matches what we're doing elsewhere pretty well, but I'm not sure what value would represent "derive this from config" vs "don't set any platform", y'know? There's also os.version that we'd need to represent somehow... 🤔

Speaking of which, is there a reason you'd like to add this to crane instead of using manifest-tool? (I'm open to it, just curious.)

@vboulineau
Copy link
Author

vboulineau commented Jan 5, 2021

Hey @jonjohnsonjr,

Thanks for fast and in-depth review.

I think adding to append is a good idea, though we will need to prevent mixing image and tarballs and I think the code path will be almost entirely different 🤔 .

So, I think you're right on the flattening, it's probably not required, though I think it'd make the generated manifest more readable (I also to need to make sure it works properly everywhere).

About the platform information, I was leaning toward the manifest-tool syntax, however it lacks support for os.version, though I am not sure this part requires override.

We had two options to implement our workflow:

  • Generate a manifest with manifest-tool on source registry and then use crane to copy generated manifest
  • Copy source image to remote and use manifest-tool on remote to generate manifest.

Because manifest-tool does not handle source images and target to be in different registries.

The idea is to simplify this by adding the capability to crane and do everything in one go. Relying on crane to identify automatically which data needs to be copied based on generated manifest is very convenient.

@jonjohnsonjr
Copy link
Collaborator

jonjohnsonjr commented Jan 6, 2021

though we will need to prevent mixing image and tarballs

I'm not so sure about that. Technically, an index can reference anything, not just images, so it's totally fine to produce an index that references layers (or just arbitrary stuff, really). I guess it's not just "technically", because the buildkit folks do this, so it's definitely a valid use case.

My only concern is that I'm not sure if there's a way to easily control the ordering of different []string flags using cobra, so we'd have to parse that ourselves or just have the caveat that images come before layers in the resulting index's list of descriptors.

About the platform information, I was leaning toward the manifest-tool syntax, however it lacks support for os.version, though I am not sure this part requires override.

I'd be fine with us being "clever" with a --platform flag. Something like:

crane append \
  -i foo --platform "linux/amd64" \
  -i bar --platform "linux/arm64/v7" \
  -i baz --platform '{"os":"windows","architecture":"amd64","os.version":"10.0.14393.1066"}'

We can check to see if --platform starts with "{", and just parse it as JSON in that case, otherwise we try to parse the short form. This way you can supply whatever you want for the platform field. WDYT?

@vboulineau vboulineau force-pushed the vboulineau/merge branch 3 times, most recently from 2459624 to 50d5320 Compare January 20, 2021 15:19
@vboulineau vboulineau changed the title Add merge command - ease the process to push manifests to remote registries from a set of images Improve append command - ease the process to push manifests to remote registries from a set of images Jan 20, 2021
@vboulineau
Copy link
Author

Hello,

I've updated the PR with the implementation in Append function.
Some points I'd like to share that could be helpful in the review:

  • The crane.Append is taking string and []string as parameter, similar to Copy but unlike some other functions where v1 types are used in input. I've a mixed feeling about this. On one hand it makes it easier to use Crane as a library, but on the other hand, it prevents testing a bit (as we cannot inject arbitrary structure like random.Image). On top of that as base can be an Index or an Image, it pushes significant work to cmd (or to users) if Append does not do this job.
  • Due to the above, I haven't added tests for Append and I commented out the filesystem test. When we decide on the point above, I can update the tests.
  • The index flattening is required to make proper usage of the feature, nested indexes are not supported by Docker or DockerHub. Index is valid and can be pushed, but we cannot pull the arch referenced in nested index, so it kind of defeats the purpose of the feature.
  • I return Image/Index objects as after calling Append (as a library), one might want to sign these images using Notary, which requires digest and size.

About the inline platform, I think it's easier to have everything in -i option, i.e. -i foo,linux/amd64 or -i bar,{"os":"windows","architecture":"amd64","os.version":"10.0.14393.1066"}
I think it's easier to add and makes sure we don't depend on ordering, WDYT?

@jonjohnsonjr
Copy link
Collaborator

jonjohnsonjr commented Jan 21, 2021

I want to respond just to make sure you don't feel left in the dark, but I haven't had much time to think deeply about this PR, and I apologize for that.

The crane.Append is taking string and []string as parameter, similar to Copy but unlike some other functions where v1 types are used in input. I've a mixed feeling about this. On one hand it makes it easier to use Crane as a library, but on the other hand, it prevents testing a bit (as we cannot inject arbitrary structure like random.Image). On top of that as base can be an Index or an Image, it pushes significant work to cmd (or to users) if Append does not do this job.

This is a good point that I hadn't considered. Most of the stuff in pkg/crane was added as needed to simplify cmd/crane and lacks a cohesive design. Some things are composable, where other things aren't (and assume we are only interacting with remote registries). This is frustrating...

Due to the above, I haven't added tests for Append and I commented out the filesystem test. When we decide on the point above, I can update the tests.

Makes sense.

The index flattening is required to make proper usage of the feature, nested indexes are not supported by Docker or DockerHub. Index is valid and can be pushed, but we cannot pull the arch referenced in nested index, so it kind of defeats the purpose of the feature.

I wonder if it would be worth the effort to start pushing on this? The fact that there is no standardized mechanism for resolving an index to a manifest came up recently in the context of adding zstd compression to layers. I think it would move the whole ecosystem forward if we could agree on how clients should behave (including what to do when confronted with a nested index) and then update clients to match. See opencontainers/image-spec#803 (comment) for ~some context but there's a lot in that thread that is relevant.

I understand that you're probably not interested in blocking on this PR for years until we fix this, so let me think about an alternative for now.

I return Image/Index objects as after calling Append (as a library), one might want to sign these images using Notary, which requires digest and size.

This feels a bit strange. I think I'd rather see a v1.Descriptor returned here, which has all the metadata you need, and the client should have sufficient context to know where it was pushed if it needs that.

If we're going to have to change the signature of Append anyway, it might make sense to split this off into a separate function (please don't hate me for this whiplash 😄 I suspect there is still some more back and forth to come). We could have cmd/append know how to distinguish between crane.Append and crane.Merge (or whatever) based on the flags, or just split that off into a separate command too (again, sorry, I know this was your original proposal).

I am not entirely sure how to do this cleanly, so take most of this feedback as brainstorming -- hopefully we can arrive at something that feels elegant...

About the inline platform, I think it's easier to have everything in -i option, i.e. -i foo,linux/amd64 or -i bar,{"os":"windows","architecture":"amd64","os.version":"10.0.14393.1066"}
I think it's easier to add and makes sure we don't depend on ordering, WDYT?

I do see what you mean, but I'm not a huge fan of the syntax of trying to smash this all into a flag value. I can see why manifest-tool takes a config file. Using # as a separator for platform is probably fine, but I'm really reluctant to establish any new conventions like this because we'll be on the hook for maintaining that forever. This is one thing I do like about the --override flag I proposed earlier. It's order-independent and powerful enough to express ~anything you you might want to do; e.g. if you wanted to add annotations to one of these images, we'd need some other separator and then some string separator, etc etc.

One idea I've had (and haven't quite fleshed out, so please try to poke holes in it) would be to do this via multiple commands instead of just one. We could use an image layout to build up the index you care about (via layout.Append<whatever>) and have flags for overriding platform, annotations, etc. Then when you've done everything you want, just push that layout as an index to your target registry.

Of course, this loses some of the benefit of doing it in one shot (e.g. eliding layer downloads/uploads becomes trickier), but we could be able to split up some of these actions into separate commands to make them easier to use.

I am imagining something like:

# Create an empty directory with index.json
crane layout init output/

# Append each image referenced by gcr.io/distroless/static:latest
crane layout append -C output/ --index gcr.io/distroless/static:latest --flatten

# Also, append some windows images (this happens to be an index of length one)
crane layout append -C output/ --index mcr.microsoft.com/windows/nanoserver:20H2 --flatten

# If you want to manually manipulate this index, do something silly like:
jq '.manifests[0].annotations["foo"] = "bar"' < output/index.json > output/index.json

# Now we can push it wherever we want
crane push --format=layout output/ registry.example.com/whatever

For eliding layers, we could do something stupid like add something equivalent to RepoTags so that it's possible to skip downloading blobs (--sparse or something) to the layout, but still know where to fetch them when we need them.

If you're trying to actually get something merged soon, I'm sorry. Doing this kind of thing is sorely needed in a lot of contexts, and I think if we can nail the user experience, crane becomes a really, really nice tool for doing some stuff that is otherwise ~impossible right now.

Base automatically changed from master to main January 27, 2021 18:57
@vboulineau vboulineau force-pushed the vboulineau/merge branch 2 times, most recently from 1ee4300 to 8f2978e Compare January 29, 2021 17:46
…ies from a set of images

Signed-off-by: Vincent Boulineau <vincent.boulineau@datadoghq.com>
@vboulineau
Copy link
Author

I'm not in any rush to get the PR merged (thanks to Go modules replace), so PR is only here to propose things and help.

I think the crane layout command line is indeed a great idea, and should probably be the generic way to expose it through command line.

It'd also probably be a quite convenient API to play with when vendoring, though maybe there's a space for a bit higher level APIs for simple cases. So I've reworked the PR towards this goal. I moved it back to standalone, remove the command line part (though we could easily bring back crane merge - but was not sure if it's wise to add something that will be replaced later shortly after)

Implementation could later be re-written with the Layout API (probably some other high-level APIs could be too), so in the end this PR might not be that valuable, I think it highly depends on the direction for higher-level APIs and the timeframe for layout command.

@github-actions
Copy link

This Pull Request is stale because it has been open for 90 days with
no activity. It will automatically close after 30 more days of
inactivity. Reopen with /reopen. Mark as fresh by adding the
comment /remove-lifecycle stale.

@jonjohnsonjr
Copy link
Collaborator

Not sure if you're still interested in this, but I've had several people who want this now, and I think my dreams of making crane append more general might never happen. The name crane amend came to me and I kind of love it, if we just wanted to make an index version of append.

@vboulineau
Copy link
Author

Hey @jonjohnsonjr,

Yes very much. We're still using my crane fork to support this. I think amend could make sense but I am not sure limiting it to index provides a good user experience.

As noted in #1137, a practical use case is to aggregate images (or index) built on multiple OS (it's our use case as well). In this scenario I believe you should not have to know, as a end-user, if the source behind some tag is an image or an index.

@jonjohnsonjr
Copy link
Collaborator

In this scenario I believe you should not have to know, as a end-user, if the source behind some tag is an image or an index.

Yeah this is reasonable to me. We can probably mask some of this complexity and upgrade the target to an index as needed.

Another interesting use case is here for dropping some platforms from an existing index: #1143

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants