Support WithProgress for remote Writes #967

imjasonh · 2021-03-15T20:31:22Z

Fixes #952

Shoutout to @DennisDenuto who paired with me on this 🎉

supported in remote.WriteLayer, remote.Write, remote.WriteIndex, remote.MultiWrite
given update chan is closed when write completes
an error is sent along the chan if there are any non-temporary errors uploading
if a layer already exists or is mounted, progress updates immediately to account for that
if layer upload fails and is retried, progress goes backward and goes back up

d7fe938 demonstrates how easy it is to consume this channel, by hacking in an upload progress bar for crane cp (not part of this PR)

Fixes #852 as well as a bonus ⏩

codecov-io · 2021-03-16T01:05:46Z

Codecov Report

Merging #967 (7429ced) into main (4a92f6c) will decrease coverage by 0.50%.
The diff coverage is 58.68%.

@@            Coverage Diff             @@
##             main     #967      +/-   ##
==========================================
- Coverage   75.19%   74.68%   -0.51%     
==========================================
  Files         107      107              
  Lines        4837     4994     +157     
==========================================
+ Hits         3637     3730      +93     
- Misses        669      710      +41     
- Partials      531      554      +23

Impacted Files	Coverage Δ
pkg/v1/remote/write.go	`62.87% <57.85%> (-1.44%)`	⬇️
pkg/v1/remote/multi_write.go	`55.97% <58.33%> (-1.05%)`	⬇️
pkg/v1/remote/options.go	`59.18% <100.00%> (+2.66%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4a92f6c...7429ced. Read the comment docs.

- supported in WriteLayer, Write, WriteIndex, MultiWrite - chan is closed when write completes - an error is sent along the chan if there are any non-temporary errors uploading - if a layer already exists or is mounted, progress updates immediately to account for that - if layer upload fails and is retried, progress goes backward and goes back up

deitch · 2021-04-23T13:41:26Z

I was just hunting for something like this (especially after having put together the tarball one at #675 :-) . I was stumped, as I did not see it in godoc or pkg.go.dev, and was going to start trying to think about doing it, when I found this PR. Apparently godoc only takes the latest semver release (if one exists).

Does this send v1.Update on the channel:

type Update struct {
	Total    int64
	Complete int64
	Error    error
}

so it has just those fields? Does it calculate all of the bytes before sending, across all of the manifests and layers (e.g. remote.Write() or remote.WriteIndex() (it looks like it from skimming the source), so Total really is the total target, and Complete is the amount sent?

I also saw discussion about how docker does its push, where it has a series of updates per blob sent (layers, config, etc.). Are we interested in going down that path?

I came across this with a remote.WriteIndex() that errors out once in a while, and it isn't clear how far it got or why. Having progress reports would go a long way.

imjasonh · 2021-04-23T13:52:11Z

I was just hunting for something like this (especially after having put together the tarball one at #675 :-) . I was stumped, as I did not see it in godoc or pkg.go.dev, and was going to start trying to think about doing it, when I found this PR. Apparently godoc only takes the latest semver release (if one exists).

I'm actually planning on doing a ggcr release today! In the meantime, https://pkg.go.dev/github.com/google/go-containerregistry@main/pkg/v1/remote should work.

Does this send v1.Update on the channel:
type Update struct {
	Total    int64
	Complete int64
	Error    error
}
so it has just those fields? Does it calculate all of the bytes before sending, across all of the manifests and layers (e.g. remote.Write() or remote.WriteIndex() (it looks like it from skimming the source), so Total really is the total target, and Complete is the amount sent?

That's correct. It sends v1.Update, having pre-calculated the Total, taking into account duplicate blobs and manifests.

I also saw discussion about how docker does its push, where it has a series of updates per blob sent (layers, config, etc.). Are we interested in going down that path?

I think that's something we could investigate and implement, yes. We could accomplish this by adding fields to the v1.Update type, but we haven't really looked into it much yet.

deitch · 2021-04-23T13:54:42Z

OK, I will start with that. I have been very bogged down (in a positive sense) with various deliverables the last two weeks, otherwise I would take a stab at it. If I can at some point, I will.

deitch · 2021-04-23T14:12:49Z

This question is slightly off topic, but triggered here. How do I get a v1.Layer from a file on disk? The reason I ask here is, I would normally do:

                        f, err := os.Open(somefile)
                        defer f.Close()
                        layer = stream.NewLayer(f)

But that then gives you a *stream.Layer, which cannot be used with WithProgress(). Is there a way to get a layer off an arbitrary local file that is compatible with this?

imjasonh · 2021-04-23T14:14:59Z

https://pkg.go.dev/github.com/google/go-containerregistry@main/pkg/v1/tarball#LayerFromFile is the best we have right now.

Ideally updates would be able to handle streaming layers by increasing the total as bytes are streamed, but this hasn't been done yet.

deitch · 2021-04-23T14:18:56Z

True, but in this case, I actually know the total size (it is just a file), so no reason not to have proper updates.

Would that actually work? Or only with a tarball?

imjasonh · 2021-04-23T14:21:04Z

Would it make sense to have some stream.NewLayerWithSize that asserted the size? 🤔

deitch · 2021-04-23T14:24:20Z

That certainly would make it easier to get WithProgress() working. I would think that the ability to just get a layer from a file - i.e. something that has a known size - would make sense. Then again, that is precisely what NewLayerWithSize does.

At some point, if/when we decide to have per-layer progress reports, we will need the size for each layer as we push it up, won't we? This problem becomes the same there.

Although, as I think about it, when we push an Image or Index, we have the sizes of each blob from the manifest parent, so WithSize actually should cover that use case.

imjasonh · 2021-04-23T15:06:19Z

A stream.Layer doesn't know its size or digest by default today, and so a v1.Image that contains a stream.Layer can't calculate its Manifest or Digest or basically anything until that stream has been consumed and its digest and size calculated (these are streamily calculated while the stream is being consumed).

Care is taken so that mutations are lazily applied, so you can mutate.Append a stream.Layer and remote.Write it, which will read and push the blob before committing the just-recently-calculatable manifest. It's complex and brittle and honestly kind of a headache for not much gain in a lot of cases.

The use case for stream.Layer is in being able to generate layer contents and stream them to a registry (etc.), not in being able to stream bytes from disk, where size and digest are pretty easily calculatable. tarball.LayerFromFile assumes the file contents are already tarred, but if that's not the case you can either tar it and write it to back disk temporarily, or tar+stream it through tarball.LayerFromOpener, or just implement v1.Layer however else you want. stream.Layer is probably not going to be a good fit if you already know the size and digest, at least not in its current form.

deitch · 2021-04-23T15:24:31Z

The use case for stream.Layer is in being able to generate layer contents and stream them to a registry (etc.), not in being able to stream bytes from disk, where size and digest are pretty easily calculatable. tarball.LayerFromFile assumes the file contents are already tarred, but if that's not the case you can either tar it and write it to back disk temporarily, or tar+stream it through tarball.LayerFromOpener, or just implement v1.Layer however else you want. stream.Layer is probably not going to be a good fit if you already know the size and digest, at least not in its current form.

So thinking about the bigger picture, we either need a concept of presetting size and digest on stream.Layer, or having a different kind of layer that represents one whose digest and size are known.

What do layout.Image and remote.Image and tarball.Image and such do, when returning a v1.Layer?

imjasonh · 2021-04-23T15:45:46Z

remote.Image never returns a stream.Layer, it only returns a remote.remoteLayer which calls fetchBlob lazily when you request the blob contents. layout.Image and tarball.Image already assume they has the full manifest (as bytes or struct), so it also doesn't have to deal with stream.Layers directly.

Really the only time you'd have to deal with them is when you mutate.AppendLayer a stream.Layer on top of them, in which case you're actually dealing with the mutate.image, which handles all the lazy calculation for you, and fails if an appended stream.Layer hasn't been computed yet.

stream.Layer is intended for cases where the contents are generated on-the-fly, so size and digest are unknowable. If you already know those things, you probably want something less annoying to deal with, such as a tarball.Layer or some other v1.Layer implementation built for your purpose.

jonjohnsonjr · 2021-04-23T15:58:11Z

remote.Image never returns a stream.Layer, it only returns a remote.remoteLayer which calls fetchBlob lazily when you request the blob contents.

This is a slight oversimplification. If you've accessed a layer via remote.Image, we return a remoteImageLayer, which cheaply computes the digest, size, diffid, and mediaType by looking at the image manifest or config file.

If you access a layer via remote.Layer, we return a remoteLayer, which can get the size via a HEAD request, knows the digest because it was an input, and has the mediaType hardcoded (best we can do, really).

@deitch do you have this layer inside a layout? Or is it just a file? Where do you have this metadata stored? We'd probably want to implement something that knows how to read/parse this metadata and can return a layer with all of this pre-populated, rather than you having to supply each piece individually.

imjasonh · 2021-04-23T16:18:59Z

This is a slight oversimplification. If you've accessed a layer via remote.Image, we return a remoteImageLayer, which cheaply computes the digest, size, diffid, and mediaType by looking at the image manifest or config file.

Thanks for this factcheck, you're right. In either case, stream.Layer is not involved when you fetch an image using pkg/remote.

jonjohnsonjr · 2021-04-23T16:28:49Z

Some maybe useful example of similar situations...

cosign uploading signatures:

https://github.com/sigstore/cosign/blob/7fee630550dc8939dcbdf4f68898944844794569/pkg/cosign/remote.go#L86-L120

It's a very basic static v1.Layer implementation that just returns some bytes and computes its metadata.

It does this itself because in this case we don't want to compress the layer, which stream and tarball assume you do.

rules_docker translating from bazel's representation to v1.Layer:

https://github.com/bazelbuild/rules_docker/blob/8c3a8110a0c519929a7e79c39ac345a0f8c74d04/container/go/pkg/compat/image.go#L35-L115

loadHashes reads the digest and diffid from rules_docker's internal representation of these images, and Size/UncompressedSize just Stat the relevant file.

crane uploading a layer from a file:

go-containerregistry/pkg/crane/append.go

Lines 42 to 52 in 37467b5

    
           func getLayer(path string) (v1.Layer, error) { 
        
           	f, err := streamFile(path) 
        
           	if err != nil { 
        
           		return nil, err 
        
           	} 
        
           	if f != nil { 
        
           		return stream.NewLayer(f), nil 
        
           	} 
        
           	return tarball.LayerFromFile(path) 
        
           }

If it's a streaming thing, we use stream.Layer, otherwise tarball.Layer. This doesn't have any metadata associated with it, so we just fall back to v1.Layer implementations that will compute all of this for you.

deitch · 2021-04-25T14:54:43Z

Ah all of that makes sense and clarifies a lot. I didn't think about the implications when I had a file, so I just wrapped it all with a stream.Layer. Clearly, that wasn't the best option.

do you have this layer inside a layout? Or is it just a file?

Both use cases, actually. I am guessing that you are hinting that in a layout, I can just use the v1/layout properties to get a layer? Or even a whole image?

What about for just a file?

deitch · 2021-04-25T14:55:58Z

As for the progress, I really would like to figure out a way to get WithProgress to report writing by blob and manifest (similar to docker push). I have had a number of cases where the write hangs, or sometimes returns an error, when I am writing an index with 3 manifests underneath it, each of which may have several decent-sized layers. I have no idea which one is causing the error. I imagine reporting by blob would help.

imjasonh requested a review from jonjohnsonjr March 15, 2021 20:31

DennisDenuto mentioned this pull request Mar 15, 2021

Provide copy progress feedback carvel-dev/imgpkg#96

Closed

imjasonh force-pushed the progress branch 2 times, most recently from 2809cb0 to bb071df Compare March 16, 2021 00:58

imjasonh force-pushed the progress branch from bb071df to 1985ff4 Compare March 16, 2021 17:36

DennisDenuto mentioned this pull request Mar 16, 2021

Improve UX for copy carvel-dev/imgpkg#105

Merged

imjasonh force-pushed the progress branch from 1985ff4 to 7314478 Compare March 16, 2021 20:41

jonjohnsonjr approved these changes Mar 19, 2021

View reviewed changes

imjasonh mentioned this pull request Mar 21, 2021

Allow progress also when saving to docker daemon #972

Closed

imjasonh force-pushed the progress branch from 7314478 to 7429ced Compare March 22, 2021 16:11

imjasonh merged commit 3678a26 into google:main Mar 22, 2021

imjasonh mentioned this pull request Apr 26, 2021

Report per-object progress through WithProgress #1000

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support WithProgress for remote Writes #967

Support WithProgress for remote Writes #967

imjasonh commented Mar 15, 2021

codecov-io commented Mar 16, 2021 •

edited

Loading

deitch commented Apr 23, 2021

imjasonh commented Apr 23, 2021

deitch commented Apr 23, 2021

deitch commented Apr 23, 2021

imjasonh commented Apr 23, 2021

deitch commented Apr 23, 2021

imjasonh commented Apr 23, 2021

deitch commented Apr 23, 2021

imjasonh commented Apr 23, 2021

deitch commented Apr 23, 2021

imjasonh commented Apr 23, 2021

jonjohnsonjr commented Apr 23, 2021

imjasonh commented Apr 23, 2021

jonjohnsonjr commented Apr 23, 2021

deitch commented Apr 25, 2021

deitch commented Apr 25, 2021

Support WithProgress for remote Writes #967

Support WithProgress for remote Writes #967

Conversation

imjasonh commented Mar 15, 2021

codecov-io commented Mar 16, 2021 • edited Loading

Codecov Report

deitch commented Apr 23, 2021

imjasonh commented Apr 23, 2021

deitch commented Apr 23, 2021

deitch commented Apr 23, 2021

imjasonh commented Apr 23, 2021

deitch commented Apr 23, 2021

imjasonh commented Apr 23, 2021

deitch commented Apr 23, 2021

imjasonh commented Apr 23, 2021

deitch commented Apr 23, 2021

imjasonh commented Apr 23, 2021

jonjohnsonjr commented Apr 23, 2021

imjasonh commented Apr 23, 2021

jonjohnsonjr commented Apr 23, 2021

cosign uploading signatures:

rules_docker translating from bazel's representation to v1.Layer:

crane uploading a layer from a file:

deitch commented Apr 25, 2021

deitch commented Apr 25, 2021

codecov-io commented Mar 16, 2021 •

edited

Loading