-
Notifications
You must be signed in to change notification settings - Fork 543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Temp File For layout.{Write,Append,Replace}
Image/Index Methods
#1226
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1226 +/- ##
==========================================
+ Coverage 73.80% 73.84% +0.04%
==========================================
Files 111 111
Lines 8191 8263 +72
==========================================
+ Hits 6045 6102 +57
- Misses 1549 1559 +10
- Partials 597 602 +5
Continue to review full report at Codecov.
|
Thanks @ben-krieger are you able to sign the CLA? |
@jonjohnsonjr I'm working on the process with my employer (Intel) now, but it's a holiday, so it might take a day or two. In the meantime, I'll fix the failing test and try to increase code coverage of tests. |
f753f3c
to
1764b92
Compare
Let's keep this unexported, but I'm not opposed to exporting it if/when you need to call it directly. |
layout.WriteLayer
layout.{Write,Append,Replace}
Image/Index Methods
1f2b55a
to
fc1eac8
Compare
pkg/v1/stream/layer.go
Outdated
l: l, | ||
// NOTE: Order matters! If zw is closed first, then it will panic, | ||
// because io.Copy in the goroutine below will still be copying the | ||
// contents of l.blob into it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this out when calling Close
on a stream.Layer
that had not been fully read. I can write a test, but it's getting a bit out of scope for this PR.
And this only stops a panic from occurring. It was still confusing to see a stream.ErrNotConsumed
after calling Close()
even though it technically makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really surprised nobody has caught this yet. Where do we panic with the other order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try this. Revert the change:
diff --git a/pkg/v1/stream/layer.go b/pkg/v1/stream/layer.go
index 9f42f630..3a5993f7 100644
--- a/pkg/v1/stream/layer.go
+++ b/pkg/v1/stream/layer.go
@@ -162,7 +162,8 @@ func newCompressedReader(l *Layer) (*compressedReader, error) {
// NOTE: Order matters! If zw is closed first, then it will panic,
// because io.Copy in the goroutine below will still be copying the
// contents of l.blob into it.
- closer: newMultiCloser(l.blob, zw),
+ closer: newMultiCloser(zw, l.blob),
+ // closer: newMultiCloser(l.blob, zw),
pr: pr,
bw: bw,
Add this test:
func TestCloseBeforeConsume(t *testing.T) {
// Write small tar to pipe
pr, pw := io.Pipe()
tw := tar.NewWriter(pw)
go func() {
pw.CloseWithError(func() error {
body := "test file"
if err := tw.WriteHeader(&tar.Header{
Name: "test.txt",
Mode: 0600,
Size: int64(len(body)),
Typeflag: tar.TypeReg,
}); err != nil {
return err
}
if _, err := tw.Write([]byte(body)); err != nil {
return err
}
return tw.Close()
}())
}()
// Create stream layer from tar pipe
l := NewLayer(pr)
rc, err := l.Compressed()
if err != nil {
t.Fatalf("Compressed: %v", err)
}
// Close stream layer before consuming
if err := rc.Close(); err != nil {
t.Errorf("Close: %v", err)
}
}
If you run it, you'll see that it panics maybe 15-30% of the time. Below, it panicked on the third run.
$ go test -count=20 -run CloseBeforeConsume ./pkg/v1/stream/
--- FAIL: TestCloseBeforeConsume (0.00s)
layer_test.go:257: Close: io: read/write on closed pipe
--- FAIL: TestCloseBeforeConsume (0.00s)
layer_test.go:257: Close: io: read/write on closed pipe
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4ebeb4]
goroutine 14 [running]:
compress/flate.(*Writer).Write(...)
/home/bkrieger/.gimme/versions/go1.17.2.linux.amd64/src/compress/flate/deflate.go:712
compress/gzip.(*Writer).Write(0xc0000b6840, {0xc0002c0000, 0x200, 0x8000})
/home/bkrieger/.gimme/versions/go1.17.2.linux.amd64/src/compress/gzip/gzip.go:196 +0x334
io.(*multiWriter).Write(0x40b54d, {0xc0002c0000, 0x200, 0x8000})
/home/bkrieger/.gimme/versions/go1.17.2.linux.amd64/src/io/multi.go:60 +0x86
io.copyBuffer({0x5d1b00, 0xc0002be000}, {0x5d1a60, 0xc0000100c0}, {0x0, 0x0, 0x0})
/home/bkrieger/.gimme/versions/go1.17.2.linux.amd64/src/io/io.go:425 +0x204
io.Copy(...)
/home/bkrieger/.gimme/versions/go1.17.2.linux.amd64/src/io/io.go:382
github.com/google/go-containerregistry/pkg/v1/stream.newCompressedReader.func1()
/home/bkrieger/go/src/github.com/google/go-containerregistry/pkg/v1/stream/layer.go:176 +0xe7
created by github.com/google/go-containerregistry/pkg/v1/stream.newCompressedReader
/home/bkrieger/go/src/github.com/google/go-containerregistry/pkg/v1/stream/layer.go:175 +0x575
FAIL github.com/google/go-containerregistry/pkg/v1/stream 0.007s
FAIL
Running the test with the race detector helps explain a bit further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting... this looks like it might be a race in the gzip package where concurrently calling Write and Close isn't safe. Nice find.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is working as intended but I disagree: golang/go#18883
IMO Write or Close should return an error and not panic here. I'll send a PR for compress/gzip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something I was wondering as I explored the code: why does Compressed()
and similar functions return an io.ReadCloser
rather than an io.Reader
? And why do functions like layout.WriteBlob
take an io.ReadCloser
? It doesn't seem like the Close()
method is ever or should ever be called by a user of this library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling Close()
allows us to clean up e.g. the goroutines that are gzipping stuff and writing into an io.PipeWriter
or an open file. Otherwise we'd just leak resources sometimes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I guess that's true for tarball and remote layers. It's just stream layers that close themselves automatically once fully read. And layout.WriteBlob
probably should have just taken a reader since cleanup duties are generally deferred to the opener, but now the API is set.
… incomplete downloads WriteLayer is an alternative to WriteBlob which can be used with streaming layers which have not yet calculated a digest or size. A temporary file is first written and then renamed when the reader has been fully consumed. Also, existing layers which do not have the correct size (and thus invalid hashes, but the hash is not checked) are overwritten. {Append,Replace,Write}{Image,Layer} functions utilize WriteLayer instead of WriteBlob to correctly overwrite undersized layers which were previously ignored without error. Undersized layers should no longer occur, since WriteLayer writes to a temporary file first, but when WriteBlob was used, these were common due to incomplete downloads.
…er blob content is truncated
…eLayers and temp file naming
27e9a32
to
5965a6c
Compare
5965a6c
to
1efe3ef
Compare
Thanks for fixing all these races, let me do another pass through everything but generally lgtm |
My pleasure! I didn't really intend for this PR to grow like it did, but it was a fun excursion from the norm to jump into a new codebase and really get into the weeds of concurrent processing of streams. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Thanks for this contribution! 👍 |
This pull request was prompted by linuxkit/linuxkit#3748.
Linuxkit, which uses an oci-layout cache, suffers the need for frequent cache clears, because interrupted downloads of layers results in images which fail
validate.Image
but cannot be repaired with a subsequent(layout.Path).ReplaceImage
. This is due to the underlying call to(layout.Path).WriteBlob
which does not overwrite existing files, even if they are smaller than the size the manifest claims they should be.I found these comments in
pkg/v1/layout/write.go
:I believe that this PR addresses both issues and will resolve the issue in Linuxkit, as well. However, I have no need for calling
WriteLayer
as library code, so I'm not opposed to making it unexported, as long asReplaceImage
makes use of its behaviors of overwriting undersized layers and/or writing blobs to a temporary location and renaming.Changes:
WriteLayer
is an alternative toWriteBlob
which can be used with streaming layers which have not yet calculated a digest or size. A temporary file is first written and then renamed when the reader has been fully consumed. Also, existing layers which do not have the correct size (and thus invalid hashes, but the hash is not checked) are overwritten.{Append,Replace,Write}{Image,Layer}
functions utilizeWriteLayer
instead ofWriteBlob
to correctly overwrite undersized layers which were previously ignored without error. Undersized layers should no longer occur, sinceWriteLayer
writes to a temporary file first, but whenWriteBlob
was used, these were common due to incomplete downloads.Improve error messages for
validate.Image
andvalidate.Index
when layer blob content is truncated.