Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crane export: Support reading tarball from a stream #1274

Merged
merged 7 commits into from
Mar 1, 2022

Conversation

abitrolly
Copy link
Contributor

@abitrolly abitrolly commented Jan 28, 2022

Implementation for #1273.

The goal is to read from stdin if the first argument is "-".

./crane export - ubuntu.tar < test.tar

I was not able to trace what is wrong with reading from os.Stdin, and my Go skills are not that good to understand it from the code.

@codecov-commenter
Copy link

codecov-commenter commented Jan 28, 2022

Codecov Report

Merging #1274 (c722e2d) into main (86e1c81) will increase coverage by 0.05%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1274      +/-   ##
==========================================
+ Coverage   74.01%   74.07%   +0.05%     
==========================================
  Files         112      112              
  Lines        8386     8386              
==========================================
+ Hits         6207     6212       +5     
+ Misses       1575     1570       -5     
  Partials      604      604              
Impacted Files Coverage Δ
pkg/v1/remote/write.go 64.82% <0.00%> (ø)
pkg/v1/remote/options.go 74.03% <0.00%> (+4.80%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 86e1c81...c722e2d. Read the comment docs.

Copy link
Collaborator

@jonjohnsonjr jonjohnsonjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this actually work? We need to read through the tarball multiple times (or buffer it) to effectively export the filesystem.

cmd/crane/cmd/export.go Outdated Show resolved Hide resolved
@abitrolly
Copy link
Contributor Author

This doesn't work correctly. Why the tar needs to be read multiple times? It is a stream format, so it should be possible to process it in one go, no?

@imjasonh
Copy link
Collaborator

imjasonh commented Feb 25, 2022

One reason is we can't assume the ordering of files in the archive. We go through a few times to a.) find the manifest.json in the archive, maybe b.) find the descriptor corresponding to the requested tag (or only image), then c.) find the config blob discovered from (b).

We can't guarantee the ordering of these files in the archive, so we don't (currently) attempt to do this in a single read. Since the (c) depends on (b), we'd have to rely on archives being well-ordered to be able to read them in one go, and I don't think tools currently enforce that ordering.

Aside from that, we also peek at a layer blob to tell if it's compressed.

There might be other places I'm forgetting about where we assume a tar file is reopenable.

It might be possible to refactor all this so we can read a tar archive without repeatedly restarting the read, but we don't today. Doing so might break reading some images that we support today.

A working concept that reads the image from "test.tar" if first
argument is specified as "-".

The goal is to read from a stdin, but that part (commented) fails with

    Error: reading tarball from stdin: archive/tar: invalid tar header
    ./crane export - ubuntu.tar < test.tar
    Error: reading tarball from stdin: archive/tar: invalid tar header
@abitrolly
Copy link
Contributor Author

abitrolly commented Feb 28, 2022

I chose to write stdin to temporary file. On Linux it would be equivalent of writing to memory, because /tmp is mouted as tmpfs. If one pass tar parser is implemented, well-ordered images can save this memory. Even non-ordered images can benefit from that if the manifest.json doesn't come at the end of stream.

This is probably doable for this PR, but the change surface might get too vast to handle.

Rebased as some test was failing for unclear reason.

Copy link
Collaborator

@imjasonh imjasonh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking, but any idea why this doesn't work?

$ ./crane pull ubuntu - | ./crane export - flat.tar
Error: reading tarball from stdin: file manifest.json not found in tar

...but this works:

./crane pull ubuntu ubuntu.tar
./crane export - - < ubuntu.tar | tar -Oxf - etc/passwd

I was hoping to be able to demonstrate this needless monstrosity:

$ ./crane pull ubuntu - | ./crane export - - | tar -Oxf - etc/passwd
Error: reading tarball from stdin: file manifest.json not found in tar
tar: etc/passwd: Not found in archive
tar: Error exit delayed from previous errors.

cmd/crane/cmd/export.go Outdated Show resolved Hide resolved
pkg/crane/export.go Outdated Show resolved Hide resolved
Co-authored-by: Jason Hall <jasonhall@redhat.com>
@abitrolly
Copy link
Contributor Author

Not blocking, but any idea why this doesn't work?

Because pull doesn't implement os.Stdout handling when - is supplied.
https://github.com/google/go-containerregistry/blob/main/cmd/crane/cmd/pull.go
It is not automatic.

@imjasonh
Copy link
Collaborator

imjasonh commented Mar 1, 2022

@jonjohnsonjr Any remaining concerns? If not, feel free to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants