CAS in the granularity of file #577

Closed
AkihiroSuda opened this issue Feb 17, 2017 · 8 comments

Comments

@AkihiroSuda
Member

AkihiroSuda commented Feb 17, 2017

EDIT: update: POC available: #577 (comment) (May 4, 2017)


I'd like to propose an alternative image layer format that is content-addressable in the granularity of file, rather than in the granularity of tar.
(Of course, I am not proposing this for image-spec v1.0.0 :-), just a baby step toward a future spec.)

This is a continuation of the previous discussion about lazy-pulling.

I have now refined my proposal to focus on CAS in the granularity of file, rather than sticking only to lazy-pulling.
CAS in the granularity of file also has a benefit of higher concurrency in pulling images.

For distribution, even though I prefer IPFS, I intentionally kept my idea agnostic to any particular distribution method (which is currently out of scope of the OCI mission).

Proposal

TL;DR: Just use @stevvooe's continuity instead of tar.
Regular files would be stored as OCI blobs and accessed via the digest value recorded in the continuity manifest.

For the whole idea, please refer to
https://github.com/AkihiroSuda/filegrain/tree/v20170217
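The idea of storing each regular file as a digest-addressed blob can be illustrated with a minimal sketch. This is not the continuity manifest format or the filegrain implementation: the function name `build_file_cas`, the path-to-digest dictionary, and the `blobs/sha256/` directory layout are assumptions made only for illustration.

```python
import hashlib
import json
import os
import tempfile


def build_file_cas(rootfs: str, image_dir: str) -> dict:
    """Store each regular file under rootfs as a blob named by its SHA-256 digest.

    Returns a toy manifest mapping relative paths to digests. This is NOT the
    continuity manifest format, only an illustration of file-granular CAS.
    """
    blob_dir = os.path.join(image_dir, "blobs", "sha256")
    os.makedirs(blob_dir, exist_ok=True)
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(rootfs):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # only regular files become blobs in this sketch
            with open(path, "rb") as f:
                data = f.read()
            hexdigest = hashlib.sha256(data).hexdigest()
            blob_path = os.path.join(blob_dir, hexdigest)
            if not os.path.exists(blob_path):  # identical content is stored once
                with open(blob_path, "wb") as out:
                    out.write(data)
            manifest[os.path.relpath(path, rootfs)] = "sha256:" + hexdigest
    return manifest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rootfs = os.path.join(tmp, "rootfs")
        os.makedirs(rootfs)
        with open(os.path.join(rootfs, "hello.txt"), "wb") as f:
            f.write(b"hello\n")
        print(json.dumps(build_file_cas(rootfs, os.path.join(tmp, "image")), indent=2))
```

Because blobs are keyed by content, two files with identical bytes share one blob, which is where the per-file deduplication comes from.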

@AkihiroSuda
Member Author

My answers to @cyphar's questions on the mailing list ( https://groups.google.com/a/opencontainers.org/d/msg/dev/ocsaii6-P0k/0QcwrF3wCgAJ )

Now I refined my proposal focusing on CAS in the granularity of file,
rather than just sticking to lazy-pulling.
CAS in the granularity of file also has a benefit of higher
concurrency in pulling images.

But also higher latency if you have many small files.

A continuity layer and traditional tar layer can be composed together.
So you can put many small files into a single traditional tar.

For distribution, even though I prefer IPFS, I intentionally kept my
idea agnostic to a certain distribution method (which is out of scope
of OCI mission currently).

As an aside, distribution is something that we're already having trouble
coming to a decision on (and it's something that is being held out of
scope, which is leading to fragmentation of distribution methods). So
note that if your proposal includes some distribution method, expect it
to be delayed for a while longer. ;)

So I intentionally omitted any distribution method from my proposal. 😄

The idea is interesting, and it would be nice to get away from tar
archives. However, my main concern with this idea is that it means you
can no longer easily be certain you have all of the blobs you need for
an image to be "complete" -- in other words the image is no longer
self-contained.

Please elaborate?
Isn't that achieved just by parsing the continuity manifest file? (Maybe its format should be JSON rather than protobuf, for ease of implementation?)

How do you envision this scheme working for usecases where network
access is not possible, or is otherwise restricted? What happens if the
blob store you're pulling from goes offline -- how will you get your
file blobs when ImageStore Inc. eventually goes bankrupt or gets acquired?

This is not a new issue w.r.t. non-lazy pulling.
For lazy-pulling, network failure would result in EIO.
If you fear that ImageStore Inc. could shut down, you can use IPFS or something similar.
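As a rough illustration of the lazy-pulling failure mode mentioned above, the sketch below fetches a blob on first read, caches it, and reports any fetch failure as EIO. The class name `LazyBlobReader` and the `fetch` callable are hypothetical; the actual POC mounts a filesystem, which this sketch does not attempt to reproduce.

```python
import errno


class LazyBlobReader:
    """Fetch blobs on first read and cache them; fetch errors surface as EIO."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable: digest -> bytes
        self._cache = {}

    def read(self, digest: str) -> bytes:
        if digest not in self._cache:
            try:
                self._cache[digest] = self._fetch(digest)
            except Exception as exc:
                # A reader of the lazily mounted file sees an I/O error,
                # whatever the underlying network problem was.
                raise OSError(errno.EIO, "cannot fetch blob " + digest) from exc
        return self._cache[digest]
```

Blobs that were already read stay available from the cache, so only files that have never been touched are affected when the backing store goes offline.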

There's also the question of how will you store the history of an image
-- are diff layers something that you would want?

You can just use continuity for diff layers (this requires the ongoing PR by @dmcgowan: containerd/continuity#39 (comment)).
However, since having many continuity layers is redundant, it may be better to have an annotation field that represents the history.

Another valid concern is that this will make "simple" distribution of
OCI images (something I've been working on recently, to see if you can
effectively package all of the blobs as separate .rpm or .deb packages)
quite difficult. On the SUSE side, getting maintenance to approve
several thousand packages containing a single file is not going to
go over well 😉.

Could you please elaborate on why you need to build RPMs for each of the blobs?

@cyphar
Member

cyphar commented Feb 17, 2017

@AkihiroSuda

Please elaborate?
Isn't that achieved just by parsing the continuity manifest file? (Maybe its format should be JSON rather than protobuf, for ease of implementation?)

Yes you're right. I misunderstood your initial proposal as being effectively "put links where blobs used to be so that you fetch everything". What you're actually proposing is that we have another level of manifest (continuity or whatever else) and then you still use the old blob store but store many more blobs inside it. For skopeo (or eventually umoci pull) this would involve some sort of recursive parsing to find all blob references and then pull everything necessary -- which is totally fine and to be expected.
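The recursive parsing described here can be sketched as a walk over blob references that reports every digest absent from the local store. The toy manifest layout (a JSON object with a `refs` list whose entries carry a `digest` and a `manifest` flag) and the helper names `put` and `find_missing` are assumptions for illustration; they are not the OCI or continuity formats, and this is not how skopeo or umoci implement pulling.

```python
import hashlib
import json


def put(store: dict, data: bytes) -> str:
    """Add data to an in-memory blob store and return its digest."""
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest


def find_missing(root: str, store: dict, is_manifest: bool = True) -> set:
    """Walk references starting at root and return every digest absent from store."""
    if root not in store:
        return {root}
    if not is_manifest:
        return set()  # a plain file blob references nothing else
    missing = set()
    for ref in json.loads(store[root])["refs"]:
        missing |= find_missing(ref["digest"], store, ref["manifest"])
    return missing
```

An image is "complete" in this sketch when `find_missing` returns an empty set for its top-level manifest digest.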

However, since having many continuity layers is redundant, it may be better to have an annotation field that represents the history.

I'd be a bit hesitant about using annotations for history representation, because of their limited scope (they can only be JSON strings). But if we adopt this sort of proposal in the future I would expect that it would be implemented with some sort of descriptor tag describing the history of the thing referenced -- sort of like Camlistore's history model (though without requiring an indexer).

Could you please elaborate on why you need to build RPMs for each of the blobs?

This is for the actual distribution of OCI images, not for the internal representation. The argument from my side is that distribution was solved ~20 years ago with the various distribution formats, and it would be great if we could just use the existing infrastructure for sending around OCI images. In order to achieve deduplication (the whole reason diff layers exist) you need to split up all of the blobs into separate RPMs so that each RPM is only installed once. And then using libsolv or whatever depsolver you like, your package manager will have all the information required.

I'm planning on hacking on this idea near the end of March. The idea would be to make this as agnostic to package format as possible so that everyone can use it. CoreOS already has something similar for their image distribution, but I would like to leverage existing package formats rather than coming up with my own.

@wking
Contributor

wking commented Feb 17, 2017 via email

@RobDolinMS
Collaborator

@AkihiroSuda Is this something that we need for milestone==v1.0.0 or could this be considered for milestone==post-v1.0.0 ?

@stevvooe
Contributor

stevvooe commented Apr 13, 2017

@RobDolinMS This is post 1.0.

@AkihiroSuda
Member Author

Update: I implemented an initial POC of this: https://github.com/AkihiroSuda/filegrain/tree/v20170504

Your feedback is welcome 😄

Usage

Build a FILEgrain image from a raw rootfs directory:

$ filegrain build /tmp/raw-rootfs /tmp/filegrain-image

Mount the image with lazy-pulling:
(This is not practically useful at the moment because the image already exists on the local filesystem. In the future, it should of course support remote registries. For now, using a local image seems sufficient for a POC.)

$ filegrain mount /tmp/filegrain-image /tmp/bundle/rootfs

You can run runC with this bundle, without waiting for all the blobs to be "pulled".

Experimental result (using docker.io/library/java:8)

  • Full image size (uncompressed): 633 MB
  • Blobs needed for running sh: 4.4 MB
  • For java -version: 87 MB
  • For javac Hello.java: 136 MB
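To put the figures above in proportion, the following small calculation expresses each workload's blob footprint as a share of the full image. Only the sizes come from the reported results; the variable and function names are illustrative.

```python
FULL_IMAGE_MB = 633.0  # uncompressed size of docker.io/library/java:8, as reported above

# Blobs that had to be fetched for each workload, in MB, as reported above.
needed_mb = {"sh": 4.4, "java -version": 87.0, "javac Hello.java": 136.0}


def fraction_needed(mb: float, full: float = FULL_IMAGE_MB) -> float:
    """Share of the full image that must be fetched for a workload."""
    return mb / full


for workload, mb in needed_mb.items():
    print(f"{workload}: {fraction_needed(mb):.1%} of the full image")
```

This works out to roughly 0.7%, 13.7%, and 21.5% of the image respectively.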

@vbatts
Member

vbatts commented May 3, 2017 via email

wking added a commit to wking/nmbug-oci that referenced this issue Jul 26, 2017
This discussion spans two dev@ threads (I've obsoleted the first).
And there's also a GitHub issue [1].

[1]: opencontainers/image-spec#577
@AkihiroSuda
Member Author

Originally, I designed this proposal to be agnostic to distribution protocols, because the standardization of distribution was out of scope of OCI's mission at that time.

But the situation has changed, and I think I have found an alternative way, although it is specific to the Docker/OCI registry API: AkihiroSuda/filegrain#21
