Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Temp File For layout.{Write,Append,Replace} Image/Index Methods #1226

Merged
merged 9 commits into from
Jan 8, 2022

Conversation

ben-krieger
Copy link
Contributor

@ben-krieger ben-krieger commented Jan 1, 2022

This pull request was prompted by linuxkit/linuxkit#3748.

Linuxkit, which uses an oci-layout cache, suffers the need for frequent cache clears, because interrupted downloads of layers results in images which fail validate.Image but cannot be repaired with a subsequent (layout.Path).ReplaceImage. This is due to the underlying call to (layout.Path).WriteBlob which does not overwrite existing files, even if they are smaller than the size the manifest claims they should be.

I found these comments in pkg/v1/layout/write.go:

// TODO: A streaming version of WriteBlob so we don't have to know the hash
// before we write it.

// TODO: For streaming layers we should write to a tmp file then Rename to the
// final digest.

I believe that this PR addresses both issues and will resolve the issue in Linuxkit, as well. However, I have no need for calling WriteLayer as library code, so I'm not opposed to making it unexported, as long as ReplaceImage makes use of its behaviors of overwriting undersized layers and/or writing blobs to a temporary location and renaming.


Changes:

WriteLayer is an alternative to WriteBlob which can be used with streaming layers which have not yet calculated a digest or size. A temporary file is first written and then renamed when the reader has been fully consumed. Also, existing layers which do not have the correct size (and thus invalid hashes, but the hash is not checked) are overwritten.

{Append,Replace,Write}{Image,Layer} functions utilize WriteLayer instead of WriteBlob to correctly overwrite undersized layers which were previously ignored without error. Undersized layers should no longer occur, since WriteLayer writes to a temporary file first, but when WriteBlob was used, these were common due to incomplete downloads.

Improve error messages for validate.Image and validate.Index when layer blob content is truncated.

@codecov-commenter
Copy link

codecov-commenter commented Jan 2, 2022

Codecov Report

Merging #1226 (4c3cdb6) into main (d9bfbcb) will increase coverage by 0.04%.
The diff coverage is 73.07%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1226      +/-   ##
==========================================
+ Coverage   73.80%   73.84%   +0.04%     
==========================================
  Files         111      111              
  Lines        8191     8263      +72     
==========================================
+ Hits         6045     6102      +57     
- Misses       1549     1559      +10     
- Partials      597      602       +5     
Impacted Files Coverage Δ
pkg/v1/layout/write.go 49.35% <61.11%> (+1.43%) ⬆️
pkg/v1/stream/layer.go 87.66% <81.57%> (+7.49%) ⬆️
pkg/v1/mutate/index.go 78.99% <0.00%> (-0.85%) ⬇️
pkg/v1/mutate/image.go 69.23% <0.00%> (-0.50%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d9bfbcb...4c3cdb6. Read the comment docs.

@jonjohnsonjr
Copy link
Collaborator

Thanks @ben-krieger are you able to sign the CLA?

@ben-krieger
Copy link
Contributor Author

Thanks @ben-krieger are you able to sign the CLA?

@jonjohnsonjr I'm working on the process with my employer (Intel) now, but it's a holiday, so it might take a day or two. In the meantime, I'll fix the failing test and try to increase code coverage of tests.

@ben-krieger ben-krieger force-pushed the layout-write-layer branch 3 times, most recently from f753f3c to 1764b92 Compare January 3, 2022 20:21
pkg/v1/validate/image.go Outdated Show resolved Hide resolved
pkg/v1/layout/write.go Outdated Show resolved Hide resolved
@jonjohnsonjr
Copy link
Collaborator

However, I have no need for calling WriteLayer as library code, so I'm not opposed to making it unexported, as long as ReplaceImage makes use of its behaviors of overwriting undersized layers and/or writing blobs to a temporary location and renaming.

Let's keep this unexported, but I'm not opposed to exporting it if/when you need to call it directly.

@ben-krieger ben-krieger changed the title Add layout.WriteLayer Use Temp File For layout.{Write,Append,Replace} Image/Index Methods Jan 3, 2022
pkg/v1/layout/write.go Outdated Show resolved Hide resolved
l: l,
// NOTE: Order matters! If zw is closed first, then it will panic,
// because io.Copy in the goroutine below will still be copying the
// contents of l.blob into it.
Copy link
Contributor Author

@ben-krieger ben-krieger Jan 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this out when calling Close on a stream.Layer that had not been fully read. I can write a test, but it's getting a bit out of scope for this PR.

And this only stops a panic from occurring. It was still confusing to see a stream.ErrNotConsumed after calling Close() even though it technically makes sense.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really surprised nobody has caught this yet. Where do we panic with the other order?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try this. Revert the change:

diff --git a/pkg/v1/stream/layer.go b/pkg/v1/stream/layer.go
index 9f42f630..3a5993f7 100644
--- a/pkg/v1/stream/layer.go
+++ b/pkg/v1/stream/layer.go
@@ -162,7 +162,8 @@ func newCompressedReader(l *Layer) (*compressedReader, error) {
 		// NOTE: Order matters! If zw is closed first, then it will panic,
 		// because io.Copy in the goroutine below will still be copying the
 		// contents of l.blob into it.
-		closer: newMultiCloser(l.blob, zw),
+		closer: newMultiCloser(zw, l.blob),
+		// closer: newMultiCloser(l.blob, zw),
 
 		pr:    pr,
 		bw:    bw,

Add this test:

func TestCloseBeforeConsume(t *testing.T) {
	// Write small tar to pipe
	pr, pw := io.Pipe()
	tw := tar.NewWriter(pw)
	go func() {
		pw.CloseWithError(func() error {
			body := "test file"
			if err := tw.WriteHeader(&tar.Header{
				Name:     "test.txt",
				Mode:     0600,
				Size:     int64(len(body)),
				Typeflag: tar.TypeReg,
			}); err != nil {
				return err
			}
			if _, err := tw.Write([]byte(body)); err != nil {
				return err
			}
			return tw.Close()
		}())
	}()

	// Create stream layer from tar pipe
	l := NewLayer(pr)
	rc, err := l.Compressed()
	if err != nil {
		t.Fatalf("Compressed: %v", err)
	}

	// Close stream layer before consuming
	if err := rc.Close(); err != nil {
		t.Errorf("Close: %v", err)
	}
}

If you run it, you'll see that it panics maybe 15-30% of the time. Below, it panicked on the third run.

$ go test -count=20 -run CloseBeforeConsume ./pkg/v1/stream/
--- FAIL: TestCloseBeforeConsume (0.00s)
    layer_test.go:257: Close: io: read/write on closed pipe
--- FAIL: TestCloseBeforeConsume (0.00s)
    layer_test.go:257: Close: io: read/write on closed pipe
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4ebeb4]

goroutine 14 [running]:
compress/flate.(*Writer).Write(...)
        /home/bkrieger/.gimme/versions/go1.17.2.linux.amd64/src/compress/flate/deflate.go:712
compress/gzip.(*Writer).Write(0xc0000b6840, {0xc0002c0000, 0x200, 0x8000})
        /home/bkrieger/.gimme/versions/go1.17.2.linux.amd64/src/compress/gzip/gzip.go:196 +0x334
io.(*multiWriter).Write(0x40b54d, {0xc0002c0000, 0x200, 0x8000})
        /home/bkrieger/.gimme/versions/go1.17.2.linux.amd64/src/io/multi.go:60 +0x86
io.copyBuffer({0x5d1b00, 0xc0002be000}, {0x5d1a60, 0xc0000100c0}, {0x0, 0x0, 0x0})
        /home/bkrieger/.gimme/versions/go1.17.2.linux.amd64/src/io/io.go:425 +0x204
io.Copy(...)
        /home/bkrieger/.gimme/versions/go1.17.2.linux.amd64/src/io/io.go:382
github.com/google/go-containerregistry/pkg/v1/stream.newCompressedReader.func1()
        /home/bkrieger/go/src/github.com/google/go-containerregistry/pkg/v1/stream/layer.go:176 +0xe7
created by github.com/google/go-containerregistry/pkg/v1/stream.newCompressedReader
        /home/bkrieger/go/src/github.com/google/go-containerregistry/pkg/v1/stream/layer.go:175 +0x575
FAIL    github.com/google/go-containerregistry/pkg/v1/stream    0.007s
FAIL

Running the test with the race detector helps explain a bit further.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting... this looks like it might be a race in the gzip package where concurrently calling Write and Close isn't safe. Nice find.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is working as intended but I disagree: golang/go#18883

IMO Write or Close should return an error and not panic here. I'll send a PR for compress/gzip.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I was wondering as I explored the code: why does Compressed() and similar functions return an io.ReadCloser rather than an io.Reader? And why do functions like layout.WriteBlob take an io.ReadCloser? It doesn't seem like the Close() method is ever or should ever be called by a user of this library.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling Close() allows us to clean up e.g. the goroutines that are gzipping stuff and writing into an io.PipeWriter or an open file. Otherwise we'd just leak resources sometimes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I guess that's true for tarball and remote layers. It's just stream layers that close themselves automatically once fully read. And layout.WriteBlob probably should have just taken a reader since cleanup duties are generally deferred to the opener, but now the API is set.

pkg/v1/layout/write.go Outdated Show resolved Hide resolved
pkg/v1/stream/layer.go Outdated Show resolved Hide resolved
… incomplete downloads

WriteLayer is an alternative to WriteBlob which can be used with
streaming layers which have not yet calculated a digest or size. A
temporary file is first written and then renamed when the reader has
been fully consumed. Also, existing layers which do not have the correct
size (and thus invalid hashes, but the hash is not checked) are
overwritten.

{Append,Replace,Write}{Image,Layer} functions utilize WriteLayer instead
of WriteBlob to correctly overwrite undersized layers which were
previously ignored without error. Undersized layers should no longer
occur, since WriteLayer writes to a temporary file first, but when
WriteBlob was used, these were common due to incomplete downloads.
pkg/v1/layout/write.go Outdated Show resolved Hide resolved
@jonjohnsonjr
Copy link
Collaborator

Thanks for fixing all these races, let me do another pass through everything but generally lgtm

@ben-krieger
Copy link
Contributor Author

Thanks for fixing all these races, let me do another pass through everything but generally lgtm

My pleasure! I didn't really intend for this PR to grow like it did, but it was a fun excursion from the norm to jump into a new codebase and really get into the weeds of concurrent processing of streams.

Copy link
Collaborator

@jonjohnsonjr jonjohnsonjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

pkg/v1/layout/write.go Outdated Show resolved Hide resolved
pkg/v1/layout/write.go Show resolved Hide resolved
pkg/v1/layout/write_test.go Outdated Show resolved Hide resolved
@imjasonh imjasonh merged commit ca48523 into google:main Jan 8, 2022
@imjasonh
Copy link
Collaborator

imjasonh commented Jan 8, 2022

Thanks for this contribution! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants