Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

image: Refactor to use cas/ref engines instead of walkers #5

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

wking
Copy link
Contributor

@wking wking commented Sep 16, 2016

Abstract out the details of the image-layout format, since the validators and unpackers shouldn't care about that sort of backend stuff. This should make it easy to build backends based on zip files, HTTP, the Docker registry format, etc.

Carried from opencontainers/image-spec#159. The outstanding issue there was what to do about tar handling. I'm fine with this PR as it stands, but I've floated a few alternatives in case the maintainers want to pick an alternative path.

@runcom
Copy link
Member

runcom commented Sep 16, 2016

/cc @s-urbaniak

@wking
Copy link
Contributor Author

wking commented Sep 17, 2016

I've pushed 968748490c4fea with:

  • An ‘=’ → ‘:=’ fix in cmd/oci-refs/list.go's printName.
  • O_RDWR → O_WRONLY in image/layout/tar.go's CreateTarFile.
  • A new commit adding directory-backed engines (to compliment the
    previous tar-backed engines). The directory-backed engines are much
    nicer ;).

On Fri, Sep 16, 2016 at 01:53:40AM -0700, W. Trevor King wrote:

Abstract out the details of the image-layout format, since the validators and unpackers shouldn't care about that sort of backend stuff. This should make it easy to build backends based on zip files, HTTP, the Docker registry format, etc.

Carried from opencontainers/image-spec#159. The outstanding issue there was what to do about tar handling. I'm fine with this PR as it stands, but I've floated a few alternatives in case the maintainers want to pick an alternative path.

You can view, comment on, or merge this pull request online at:

#5

-- Commit Summary --

  • Makefile: Add a pattern rule for oci-* commands
  • image/cas: Add a generic CAS interface
  • image/refs: Add a generic name-based reference interface
  • specs-go: Add ImageLayoutVersion and check oci-layout in tar engines
  • image/layout/tar.go: Add TarEntryByName
  • image/cas/put: Add a PutJSON helper
  • image/cas: Implement Engine.Put
  • vendor: Bundle golang.org/x/net/context
  • image/*/interface: Add unstable warnings to Engines
  • image/layout/tar: Add a CreateTarFile helper
  • image/refs: Implement Engine.Put
  • image: Refactor to use cas/ref engines instead of walkers
  • cmd: Document the cas, refs, and init commands

-- File Changes --

M .gitignore (5)
M Makefile (20)
A cmd/oci-cas/get.go (93)
A cmd/oci-cas/main.go (38)
A cmd/oci-cas/oci-cas-get.1.md (27)
A cmd/oci-cas/oci-cas-put.1.md (27)
A cmd/oci-cas/oci-cas.1.md (47)
A cmd/oci-cas/put.go (83)
M cmd/oci-create-runtime-bundle/main.go (9)
M cmd/oci-create-runtime-bundle/oci-create-runtime-bundle.1.md (4)
A cmd/oci-image-init/image_layout.go (57)
A cmd/oci-image-init/main.go (37)
A cmd/oci-image-init/oci-image-init-image-layout.1.md (27)
A cmd/oci-image-init/oci-image-init.1.md (33)
A cmd/oci-image-tools.7.md (40)
M cmd/oci-image-validate/main.go (14)
M cmd/oci-image-validate/oci-image-validate.1.md (8)
A cmd/oci-refs/get.go (78)
A cmd/oci-refs/list.go (75)
A cmd/oci-refs/main.go (39)
A cmd/oci-refs/oci-refs-get.1.md (27)
A cmd/oci-refs/oci-refs-list.1.md (27)
A cmd/oci-refs/oci-refs-put.1.md (27)
A cmd/oci-refs/oci-refs.1.md (55)
A cmd/oci-refs/put.go (81)
M cmd/oci-unpack/main.go (13)
M cmd/oci-unpack/oci-unpack.1.md (4)
M glide.lock (10)
M image/autodetect.go (3)
A image/cas/interface.go (47)
A image/cas/layout/interface.go (25)
A image/cas/layout/main.go (37)
A image/cas/layout/tar.go (108)
A image/cas/put.go (47)
M image/config.go (69)
M image/descriptor.go (115)
M image/image.go (202)
A image/layout/layout.go (22)
A image/layout/tar.go (267)
M image/manifest.go (119)
M image/manifest_test.go (45)
A image/refs/interface.go (82)
A image/refs/layout/main.go (37)
A image/refs/layout/tar.go (135)
D image/walker.go (118)
A vendor/golang.org/x/net/LICENSE (27)
A vendor/golang.org/x/net/PATENTS (22)
A vendor/golang.org/x/net/context/context.go (156)
A vendor/golang.org/x/net/context/go17.go (72)
A vendor/golang.org/x/net/context/pre_go17.go (300)

-- Patch Links --

https://github.com/opencontainers/image-tools/pull/5.patch
https://github.com/opencontainers/image-tools/pull/5.diff

You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
#5

This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

@wking
Copy link
Contributor Author

wking commented Sep 17, 2016

Pushed 90c4fea7c9efa7 adding:

  • oci-cas delete …

  • oci-refs delete …

  • DirDelete to avoid 1:

    $ make lint
    checking lint
    image/cas/layout/dir.go:108::warning: duplicate of image/refs/layout/dir.go:156-167 (dupl)
    image/refs/layout/dir.go:156::warning: duplicate of image/cas/layout/dir.go:108-119 (dupl)

@xiekeyang
Copy link
Contributor

xiekeyang commented Sep 22, 2016

@wking

This patch seems to be too great to be merged sooner. Could you please split it and submit one by one commit? Now we need the method instead of walking, and go forward to some other functionality based on it. I'm working on my local side https://github.com/xiekeyang/image-tools/commit/ff0401330fdca591f67b82924c74fc84ce9d052f referring to your implementation, which is temporary solution and should wait for your patch merged. So I hope your commit can be merged. Thanks so much.

@wking
Copy link
Contributor Author

wking commented Sep 22, 2016

On Wed, Sep 21, 2016 at 05:27:46PM -0700, xiekeyang wrote:

Could you please split it and submit one by one commit?

First commit submitted separately as #28. If/when that lands, I'll
file a PR for the next commit, etc. (unless this PR has already landed
in its entirety ;).

@wking
Copy link
Contributor Author

wking commented Oct 11, 2016

On Wed, Sep 21, 2016 at 10:54:19PM -0700, W. Trevor King wrote:

On Wed, Sep 21, 2016 at 05:27:46PM -0700, xiekeyang wrote:

Could you please split it and submit one by one commit?

First commit submitted separately as #28. If/when that lands, I'll
file a PR for the next commit, etc. (unless this PR has already
landed in its entirety ;).

Rebased onto master with 7c9efa71926150 now that #28 has landed.
Bottom of that stack submitted separately as #40 in case that
continues to help review.

@wking
Copy link
Contributor Author

wking commented Oct 15, 2016

I've pushed 1926150ee44a09 which:

  • Checks the result of the DirEngine.Put cleanup Remove, and returns a wrapped error if both the Put and the cleanup Remove fail.
  • Adds a comment describing the public DirDelete function.

@wking wking force-pushed the cas-api branch 2 times, most recently from cb0b850 to 4dfe06f Compare October 15, 2016 05:51
@wking
Copy link
Contributor Author

wking commented Oct 15, 2016

I've pushed ee44a094dfe06f fixing a few more lint issues with the
tip commit:

  • Cleanup Remove error wrapping in the cas DirEngine.Put (previously
    I'd just fixed the refs DirEngine.Put).
  • Check the io.Copy error status in cas DirEngine.Put.
  • Don't mask err in cas TarEngine.Put now that it's using blobPath.

wking added 11 commits November 18, 2016 07:43
Making the CAS/refs Get implementations more DRY.

Signed-off-by: W. Trevor King <wking@tremily.us>
A fair amount of image setup involves pushing JSON objects to CAS, so
provide a convenient wrapper around that.  This implementation could
be improved, with at least:

* Consistent key sorts, etc. to increase the chances of matching an
  existing CAS blob.
* Streaming the marshaled JSON into the engine to avoid serializing it
  in memory before passing it into Engine.Put.

But the API is fine, and we can improve the implementation as we go.

Signed-off-by: W. Trevor King <wking@tremily.us>
Generated with:

  $ make install.tools
  $ make update-deps
  $ git checkout HEAD -- vendor/github.com/spf13/pflag
  $ git checkout HEAD -- vendor/github.com/runtime-spec

I rolled back the other changes because I haven't checked for
compatibility issues due to the upgrades.  I also rolled back the hash
bumps for those packages in glide.lock.

Signed-off-by: W. Trevor King <wking@tremily.us>
This is a bit awkward.  For writing a tar entry, we need to know both
the name and size of the file ahead of time.  The implementation in
this commit accomplishes that by reading the Put content into a
buffer, hashing and sizing the buffer, and then calling
WriteTarEntryByName to create the entry.  With a filesystem-backed CAS
engine, we could avoid the buffer by writing the file to a temporary
location with rolling hash and size tracking and then renaming the
temporary file to the appropriate path.

WriteTarEntryByName itself has awkward buffering to avoid dropping
anything onto disk.  It reads through its current file and writes the
new tar into a buffer, and then writes that buffer back back over its
current file.  There are a few issues with this:

* It's a lot more work than you need if you're just appending a new
  entry to the end of the tarball.  But writing the whole file into a
  buffer means we don't have to worry about the trailing blocks that
  mark the end of the tarball; that's all handled transparently for us
  by the Go implementation.  And this implementation doesn't have to
  be performant (folks should not be using tarballs to back
  write-heavy engines).

* It could leave you with a corrupted tarball if the caller dies
  mid-overwrite.  Again, I expect folks will only ever write to a
  tarball when building a tarball for publishing.  If the caller dies,
  you can just start over.  Folks looking for a more reliable
  implementation should use a filesystem-backed engine.

* It could leave you with dangling bytes at the end of the tarball.  I
  couldn't find a Go invocation to truncate the file.  Go does have an
  ftruncate(2) wrapper [1], but it doesn't seem to be exposed at the
  io.Reader/io.Writer/... level.  So if you write a shorter file with
  the same name as the original, you may end up with some dangling
  bytes.

cas.Engine.Put protects against excessive writes with a Get guard;
after hashing the new data, Put trys to Get it from the tarball and
only writes a new entry if it can't find an existing entry.  This also
protects the CAS engine from the dangling-bytes issue.

The 0666 file modes and 0777 directory modes rely on the caller's
umask to appropriately limit user/group/other permissions for the
tarball itself and any content extracted to the filesystem from the
tarball.

The trailing slash manipulation (stripping before comparison and
injecting before creation) is based on part of libarchive's
description of old-style archives [2]:

  name
    Pathname, stored as a null-terminated string.  Early tar
    implementations only stored regular files (including hardlinks to
    those files).  One common early convention used a trailing "/"
    character to indicate a directory name, allowing directory
    permissions and owner information to be archived and restored.

and POSIX ustar archives [3]:

  name, prefix
    ... The standard does not require a trailing / character on
    directory names, though most implementations still include this
    for compatibility reasons.

[1]: https://golang.org/pkg/syscall/#Ftruncate
[2]: https://github.com/libarchive/libarchive/wiki/ManPageTar5#old-style-archive-format
[3]: https://github.com/libarchive/libarchive/wiki/ManPageTar5#posix-ustar-archives

Signed-off-by: W. Trevor King <wking@tremily.us>
This is pretty straightforward with the new WriteTarEntryByName
helper.

I considered pulling the ref name -> path conversion (%s -> ./refs/%s)
out into a helper function to stay DRY, but the logic is simple enough
that it seemed more intuitive to leave it inline.

Signed-off-by: W. Trevor King <wking@tremily.us>
The NewEngine commands for the tar-backed image-layout engines (both
the CAS and refs engines) open files O_RDWR and expect image-layout
compatible content in the tarball.  That makes sense, but for folks
who *don't* have such a tarball, a helper like CreateTarFile makes it
easy to explicitly create an empty one.

The 0666 file modes and 0777 directory modes rely on the caller's
umask to appropriately limit user/group/other permissions for the
tarball itself and any content extracted to the filesystem from the
tarball.

The trailing slashes are based on part of libarchive's description of
old-style archives [1]:

  name
    Pathname, stored as a null-terminated string.  Early tar
    implementations only stored regular files (including hardlinks to
    those files).  One common early convention used a trailing "/"
    character to indicate a directory name, allowing directory
    permissions and owner information to be archived and restored.

and POSIX ustar archives [2]:

  name, prefix
    ... The standard does not require a trailing / character on
    directory names, though most implementations still include this
    for compatibility reasons.

Expose this new functionality on the command line as:

  $ oci-image-init image-layout PATH

where 'image-layout' is a separate level in case we support
initializing additional types of repositories in the future.

[1]: https://github.com/libarchive/libarchive/wiki/ManPageTar5#old-style-archive-format
[2]: https://github.com/libarchive/libarchive/wiki/ManPageTar5#posix-ustar-archives

Signed-off-by: W. Trevor King <wking@tremily.us>
The validation/unpacking code doesn't really care what the reference
and CAS implemenations are.  And the new generic interfaces in
image/refs and image/cas will scale better as we add new backends than
the walker interface.  This replaces the simpler interface from
image/reader.go with something more robust.

The old tar/directory distinction between image and imageLayout is
gone.  The new CAS/refs engines don't support directory backends yet
(I plan on adding them once the engine framework lands), but the new
framework will handle tar/directory/... detection inside
layout.NewEngine (and possibly inside a new (cas|refs).NewEngine when
we grow engine types that aren't based on image-layout).

Also replace the old methods like:

  func (d *descriptor) validateContent(r io.Reader) error

with functions like:

  validateContent(ctx context.Context, descriptor *specs.Descriptor, r io.Reader) error

to avoid local types that duplicate the image-spec types.  This saves
an extra instantiation for folks who want to validate (or whatever) a
specs.Descriptor they have obtained elsewhere.

I'd prefer casLayout and refsLayout for the imported packages, but
Stephen doesn't want camelCase for package names [1].

[1]: opencontainers/image-spec#159 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
Most of this is new boilerplate, but oci-image-tools.7.md is based on
the old oci-image-tool.1.md removed by fe363aa (*: move to
opencontainers/image-tools, 2016-09-15).  There's a lot going on in
this repo, and it's nice to have a page that outlines everything
provided by the project even though we're no longer providing a single
command.

Signed-off-by: W. Trevor King <wking@tremily.us>
Don't worry about:

  $ make lint
  checking lint
  image/cas/layout/dir.go:37::warning: duplicate of image/refs/layout/dir.go:37-54 (dupl)
  image/refs/layout/dir.go:37::warning: duplicate of image/cas/layout/dir.go:37-54 (dupl)
  cmd/oci-cas/delete.go:41::warning: duplicate of cmd/oci-cas/get.go:43-62 (dupl)
  cmd/oci-refs/delete.go:41::warning: duplicate of cmd/oci-refs/get.go:42-61 (dupl)
  cmd/oci-cas/delete.go:15::warning: duplicate of cmd/oci-refs/delete.go:15-72 (dupl)
  cmd/oci-refs/delete.go:15::warning: duplicate of cmd/oci-cas/delete.go:15-72 (dupl)
  cmd/oci-cas/get.go:43::warning: duplicate of cmd/oci-refs/get.go:42-61 (dupl)
  cmd/oci-refs/get.go:42::warning: duplicate of cmd/oci-cas/get.go:43-62 (dupl)
  make: *** [lint] Error 1

The commands are all similar (open an engine, perform some method,
print the result), but are short enough that extracting out helpers
would be more trouble and indirection than it's worth.

Oddly, dupl seems happy to print:

  "duplicate of oci-cas/get.go:..."

and

  "duplicate of get.go:..."

if I exclude:

  "duplicate of cmd/oci-cas/get.go:..."

or

  "duplicate of .*oci-cas/get.go:..."

I want to get "oci-cas" and "oci-refs" in the exclusion regular
expression somewhere to avoid accidentally skipping dupl checks for
other get.go and similar if they show up somewhere else in the
repository, so I'm matching on the initial filename.

Signed-off-by: W. Trevor King <wking@tremily.us>
These are much nicer than the tar engines (hooray atomic renames :),
so switch the manifest tests tests back to using the directory-backed
engines.  I also switched the man-page examples over to
directory-backed layouts, now that they're what oci-image-init
generates by default.  And I added command-line wrappers for the
delete methods, now that we have a backend that implements it.

I do with there was a paginated, callback-based directory lister we
could use instead of ioutils.ReadDir.  On the other hand, by the time
directories get big enough for that to matter we may be sharding them
anyway.

Signed-off-by: W. Trevor King <wking@tremily.us>
@stevvooe
Copy link
Contributor

@cyphar Sorry for the slow cycle, but it looks like that link is broken.

Shall we take this design into an issue and close this PR?

@wking
Copy link
Contributor Author

wking commented Nov 18, 2016

On Fri, Nov 18, 2016 at 12:11:25PM -0800, Stephen Day wrote:

@cyphar Sorry for the slow cycle, but it looks like that link is broken.

He removed it in cyphar/umoci@8d19f36. The last version before that
removal is in 1.

Shall we take this design into an issue and close this PR?

I'm happy to discuss any design differences between what I have here
and the choices @cyphar has made in umoci. I don't think ditching all
of this is the best way to have that discussion ;). Last night saw
some progress on #40, maybe you want to chime in there on any issues
you see with the CAS engine interface or the tar-get implementation?

@cyphar
Copy link
Member

cyphar commented Nov 19, 2016

@stevvooe I've removed the file because all of the stuff in the design document has since been implemented in umoci and the design document no longer accurately described how everything is implemented.

@zhouhao3
Copy link

SGTM, What is the current state of this patch?

@wking
Copy link
Contributor Author

wking commented Jul 27, 2017

SGTM, What is the current state of this patch?

Wait, is there really renewed interest in this over a year since I filed it as opencontainers/image-spec#159? If so, you should probably start with #40, which has just the CAS part of this PR. And getting a green light or concrete improvement suggestions from @stevvooe would be nice, since he's been pushing back against my proposed CAS interface, most recently here.

@stevvooe
Copy link
Contributor

@q384566678 We worked hard on defining a useful content api in https://godoc.org/github.com/containerd/containerd/content. It handles all the cases of resumption, keeps memory usage low and handles pre and post known digest entries, as well as concurrency issues. We've also been able to use context to wrap it in transaction managers.

There may be a few tweaks we still need to make, but it is the culmination of work around the old docker/distribution, docker and containerd, as well as some research on distributed content addressable storage systems.

I'd advocate for closing this PR.

Let me know if you have questions.

@wking
Copy link
Contributor Author

wking commented Jul 28, 2017

We worked hard on defining a useful content api in https://godoc.org/github.com/containerd/containerd/content.

Comparing that API's Provider.Reader:

type Provider interface {
  Reader(ctx context.Context, dgst digest.Digest) (io.ReadCloser, error)
  ReaderAt(ctx context.Context, dgst digest.Digest) (io.ReaderAt, error)
}

with my Engine.Get:

type Engine interface {
  Get(ctx context.Context, digest digest.Digest) (reader io.ReadCloser, err error)
}

You can see tht we have exactly the same API. And it is also shared by umoci, where it is GetBlob. So I don't see any problem with the part of this PR's API that I've covered in #40.

@zhouhao3
Copy link

@wking @stevvooe @xiekeyang @cyphar
Sorry I forgot to reply. So the main question right now is the API question?
I've seen similar functionality in umoci. And these features are important for image-tools propulsion, and I wonder if we can discuss and solve this?

@wking
Copy link
Contributor Author

wking commented Oct 25, 2017

In case folks are interested, I have more recent work on this front in the stand-alone casengine repo, and there's a good ref-engine reader in oci-discovery (although there is currently no ref-engine writer there). @xiekeyang and I would be happy to get feedback on those and, eventually if feedback is positive, transfer those to the OCI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants