Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add remote.MultiWrite #798

Merged
merged 1 commit into from
Nov 1, 2020
Merged

Add remote.MultiWrite #798

merged 1 commit into from
Nov 1, 2020

Conversation

imjasonh
Copy link
Collaborator

@imjasonh imjasonh commented Oct 24, 2020

This method takes multiple images or manifest lists, and attempts to
efficiently upload all of them in parallel, deduplicating shared layer
and config blobs and avoiding unnecessary auth token exchanges.

Blobs are uploaded in parallel first, then manifests are tagged in order
with as much parallelism as we can support.

This method is currently limited in that:

  • all images must be getting uploaded to the same repository (to
    simplify auth and transports), and
  • no image may contain streaming layers (to simplify digests), but both
    of these can be removed in future changes.
  • blobs are uploaded and manifests are tagged in parallel with a
    constant max parallelism, which should be a user-controlled option.

@codecov-io
Copy link

codecov-io commented Oct 24, 2020

Codecov Report

Merging #798 into master will decrease coverage by 0.63%.
The diff coverage is 53.27%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #798      +/-   ##
==========================================
- Coverage   75.60%   74.96%   -0.64%     
==========================================
  Files         102      103       +1     
  Lines        4185     4306     +121     
==========================================
+ Hits         3164     3228      +64     
- Misses        566      601      +35     
- Partials      455      477      +22     
Impacted Files Coverage Δ
pkg/registry/manifest.go 86.20% <7.69%> (-13.80%) ⬇️
pkg/v1/remote/multi_write.go 57.54% <57.54%> (ø)
pkg/crane/push.go 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cd5ad27...5b1ea71. Read the comment docs.

return err
}
default:
return fmt.Errorf("unknown media type: %v", desc.MediaType)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need something like v1.ImageIndex.Blob to access non-manifest artifacts that are referenced by an index. Making the interface wider would be a breaking change, which is annoying, but I doubt it would affect much code outside of this repo...

pkg/v1/remote/multi_write.go Outdated Show resolved Hide resolved
pkg/v1/remote/multi_write.go Outdated Show resolved Hide resolved
pkg/v1/remote/multi_write_test.go Outdated Show resolved Hide resolved
pkg/v1/remote/multi_write.go Show resolved Hide resolved
pkg/v1/remote/multi_write.go Outdated Show resolved Hide resolved
@imjasonh
Copy link
Collaborator Author

Talked with Jon yesterday, some of this is more complex than we'd expected, since some registries (e.g., GCR) require that all of a manifest list's referenced manifests already be uploaded, so we can't just blindly copy manifests up at once, we have to do it in an order.

Since a manifest list can reference other manifest lists, etc., we get a tree of them that has to be uploaded, leafs before roots. Each level can be uploaded in parallel, but has to complete before the next level up starts.

I'm adding this check to our test registry in pkg/registry so we'll be able to test it's done right.

This method takes multiple images or manifest lists, and attempts to
efficiently upload all of them in parallel, deduplicating shared layer
and config blobs and avoiding unnecessary auth token exchanges.

Blobs are uploaded in parallel first, then manifests are tagged in order
with as much parallelism as we can support.

This method is currently limited in that:
- all images must be getting uploaded to the same repository (to
  simplify auth and transports), and
- no image may contain streaming layers (to simplify digests), but both
  of these can be removed in future changes.
- blobs are uploaded and manifests are tagged in parallel with a
  constant max parallelism, which should be a user-controlled option.
@@ -23,6 +23,9 @@ import (
"net/http"
"strings"
"sync"

v1 "github.com/google/go-containerregistry/pkg/v1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer it if we inlined some of this stuff to avoid taking a dependency outside of the stdlib for this package, but I might just fix it myself before we cut a new release.

// list's constituent manifests are already uploaded.
// This isn't strictly required by the registry API, but some
// registries require this.
if mf.contentType == string(types.OCIImageIndex) ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

86.20% <7.69%> (-13.80%)

This used to have 100% coverage 😄

// Current limitations:
// - All refs must share the same repository.
// - Images cannot consist of stream.Layers.
func MultiWrite(m map[name.Reference]Taggable, options ...Option) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API is good. I have some issues with the implementation and limitations, which we've discussed offline. Given that the interface is correct, I'm fine with merging this and improving it as we have time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants