CAS in the granularity of file #577
My answers to @cyphar's questions on the mailing list (https://groups.google.com/a/opencontainers.org/d/msg/dev/ocsaii6-P0k/0QcwrF3wCgAJ):

- A continuity layer and a traditional tar layer can be composed together (see the sketch after this list).
- So, I intentionally omitted the distribution method from my proposal. 😄
- Please elaborate?
- This is not a new issue w.r.t. non-lazy pulling.
- You can just use continuity as diff layers (this requires the ongoing PR by @dmcgowan, containerd/continuity#39 (comment)).
- Could you please elaborate on why you need to build RPMs for each of the blobs?
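A minimal sketch of such a mixed manifest, written against the image-spec Go types; the continuity media type is invented here (nothing like it is standardized), and the digests and sizes are placeholders:

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/opencontainers/go-digest"
	v1 "github.com/opencontainers/image-spec/specs-go/v1"
)

func main() {
	// A manifest whose layer list mixes a conventional tar diff layer with
	// a continuity manifest referencing per-file blobs.
	m := v1.Manifest{
		Config: v1.Descriptor{
			MediaType: v1.MediaTypeImageConfig,
			Digest:    digest.Digest("sha256:aaaa..."), // placeholder
			Size:      1024,
		},
		Layers: []v1.Descriptor{
			{
				// An ordinary tar diff layer, applied first.
				MediaType: v1.MediaTypeImageLayerGzip,
				Digest:    digest.Digest("sha256:bbbb..."), // placeholder
				Size:      32 << 20,
			},
			{
				// A continuity manifest whose entries reference
				// per-file blobs; hypothetical media type.
				MediaType: "application/vnd.example.continuity.manifest.v0+json",
				Digest:    digest.Digest("sha256:cccc..."), // placeholder
				Size:      64 << 10,
			},
		},
	}
	b, _ := json.MarshalIndent(m, "", "  ")
	fmt.Println(string(b))
}
```

Applied in the listed order, the continuity layer's per-file entries would overlay the tar diff beneath it.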
Yes, you're right. I misunderstood your initial proposal as being effectively "put links where blobs used to be so that you fetch everything". What you're actually proposing is that we have another level of manifest (continuity or whatever else), and then you still use the old blob store but store many more blobs inside it.
I'd be a bit hesitant about using annotations for history representation, because of their limited scope (they can only be JSON strings). But if we adopt this sort of proposal in the future I would expect that it would be implemented with some sort of descriptor tag describing the history of the thing referenced -- sort of like Camlistore's history model (though without requiring an indexer).
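To make the contrast concrete, here is a hedged sketch against the image-spec Go types: history smuggled into string-valued annotations (the limitation noted above) versus history as a first-class Merkle link. The annotation key, the `Predecessor` field, and all digests are hypothetical.

```go
package main

import (
	"fmt"

	"github.com/opencontainers/go-digest"
	v1 "github.com/opencontainers/image-spec/specs-go/v1"
)

// historyDescriptor is a hypothetical extension: history as a first-class
// descriptor tag rather than an encoded annotation value, so generic graph
// walkers can follow it without parsing strings.
type historyDescriptor struct {
	v1.Descriptor
	Predecessor *v1.Descriptor `json:"predecessor,omitempty"` // hypothetical field
}

func main() {
	// Annotation-based history: values are plain strings, so structured
	// data must be smuggled in as encoded JSON under an invented key.
	annotated := v1.Descriptor{
		MediaType: v1.MediaTypeImageManifest,
		Digest:    digest.Digest("sha256:dddd..."), // placeholder
		Size:      2048,
		Annotations: map[string]string{
			"org.example.history": `{"predecessor":"sha256:eeee..."}`, // hypothetical key
		},
	}

	// Descriptor-tag history: the predecessor is itself a descriptor.
	tagged := historyDescriptor{
		Descriptor: v1.Descriptor{
			MediaType: v1.MediaTypeImageManifest,
			Digest:    digest.Digest("sha256:dddd..."), // placeholder
			Size:      2048,
		},
		Predecessor: &v1.Descriptor{
			MediaType: v1.MediaTypeImageManifest,
			Digest:    digest.Digest("sha256:eeee..."), // placeholder
			Size:      2048,
		},
	}
	fmt.Printf("%+v\n%+v\n", annotated, tagged)
}
```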
This is for the actual distribution of OCI images, not for the internal representation. The argument from my side is that distribution was solved ~20 years ago with the various distribution formats, and it would be great if we could just use the existing infrastructure for sending around OCI images. In order to achieve deduplication (the whole reason diff layers exist), you need to split up all of the blobs into separate RPMs so that each RPM is only installed once.

I'm planning on hacking on this idea near the end of March. The idea would be to make this as agnostic to the package format as possible so that everyone can use it. CoreOS already has something similar for their image distribution, but I would like to leverage existing package formats rather than coming up with my own.
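A rough sketch of that blob-to-package split, assuming an OCI image layout on disk; the package naming scheme is invented for illustration:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Walk an OCI layout's blob store and print one hypothetical RPM spec stub
// per blob, so each blob becomes its own package and is installed (and
// therefore deduplicated) exactly once.
func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: blob2spec <oci-layout-dir>")
		os.Exit(1)
	}
	blobDir := filepath.Join(os.Args[1], "blobs", "sha256")
	entries, err := os.ReadDir(blobDir)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		// Invented naming convention: one package per blob digest.
		fmt.Printf("Name: oci-blob-sha256-%s\nVersion: 1\nRelease: 1\n\n", e.Name())
	}
}
```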
On Fri, Feb 17, 2017 at 12:50:16AM -0800, Akihiro Suda wrote:

> For distribution, even though I prefer IPFS, I intentionally kept my
> idea agnostic to a certain distribution method (which is out of
> scope of OCI mission currently).
IPFS-the-ecosystem is built out of several layers [1]. The data
model, which is distribution agnostic, is only one of those layers
(and the IPFS folks are in the process of peeling it off into a
stand-alone spec with IPLD [2]). The (poorly documented) filesystem
model is *another* layer on top of the generic IPLD framework [3]. If
the current unixfs models don't cut it for you (e.g. poor metadata
support), you can recycle a lot of existing IPFS-ecosystem groundwork
by proposing a different/improved unixfs model on top of the existing
IPLD base, and easily share and transport those blobs via the existing
IPFS ecosystem or alternative naming/exchange/routing/network
implementations.
In the OCI ecosystem, there is also work towards supporting generic
transports (e.g. containers/image#125, containers/image#216). There's
no base data model that I'm aware of here, although SHOULDing
descriptors for references [4] is a step in that direction (e.g. you
can automate most graph walking if you trust things which look
descriptor-ish to be Merkle links [5]).
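For illustration, a minimal descriptor-sniffer in Go, in the spirit of [5]: it walks arbitrary decoded JSON and treats anything with mediaType, digest, and size as a Merkle link (a real implementation would also validate digests and understand media types):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sniffDescriptors walks arbitrary decoded JSON and collects anything
// that looks descriptor-ish, treating it as a Merkle link.
func sniffDescriptors(v interface{}, out *[]string) {
	switch t := v.(type) {
	case map[string]interface{}:
		_, hasMT := t["mediaType"]
		dig, hasDig := t["digest"].(string)
		_, hasSize := t["size"]
		if hasMT && hasDig && hasSize {
			*out = append(*out, dig)
		}
		for _, child := range t {
			sniffDescriptors(child, out)
		}
	case []interface{}:
		for _, child := range t {
			sniffDescriptors(child, out)
		}
	}
}

func main() {
	blob := []byte(`{"layers":[{"mediaType":"application/vnd.oci.image.layer.v1.tar","digest":"sha256:ffff...","size":3}]}`)
	var v interface{}
	if err := json.Unmarshal(blob, &v); err != nil {
		panic(err)
	}
	var links []string
	sniffDescriptors(v, &links)
	fmt.Println(links) // [sha256:ffff...]
}
```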
To me it seems like these are parallel efforts which would benefit
from more cross-pollination. But without agreement on the base model
(the thing that gets hashed for content addressability), clients can't
swap blobs back and forth between the two systems and stay content
addressable. So thinking the IPLD model is tied to a particular
distribution method is not a good reason to be dropping it, but “I
need something which I can distribute via existing OCI CAS
implementations and which can be walked by a descriptor-sniffer” may
be.
[1]: https://github.com/ipfs/specs/tree/143319a99a9f191c1dfd50814623b9b7a4752254/architecture#3-the-stack
[2]: https://github.com/ipld/specs
[3]: https://github.com/ipfs/specs/tree/143319a99a9f191c1dfd50814623b9b7a4752254/architecture#41-unixfs----representing-traditional-files
[4]: https://github.com/opencontainers/image-spec/blame/v1.0.0-rc4/descriptor.md#L10
[5]: https://github.com/openSUSE/umoci/blob/v0.1.0/oci/cas/gc.go#L35-L44
@AkihiroSuda Is this something that we need for milestone==v1.0.0, or could this be considered for milestone==post-v1.0.0?
@RobDolinMS This is post 1.0.
Update: I implemented an initial POC of this: https://github.com/AkihiroSuda/filegrain/tree/v20170504

Your feedback is welcome 😄

- general: https://github.com/AkihiroSuda/filegrain/issues would be preferred
- something that needs to be worked on in this image-spec repo: this GitHub issue

Usage

Build a FILEgrain image from a raw rootfs directory:

$ filegrain build /tmp/raw-rootfs /tmp/filegrain-image

Mount the image with lazy-pulling (this is not practically useful at the moment because the image already exists on the local filesystem; in the future it should of course support remote registries, but using a local image seems enough for a POC):

$ filegrain mount /tmp/filegrain-image /tmp/bundle/rootfs

You can run runC with this bundle without waiting for all the blobs to be "pulled".

Experimental result (using docker.io/library/java:8):

- Full image size (uncompressed): 633 MB
- Blobs needed for running sh: 4.4 MB
- For java -version: 87 MB
- For javac Hello.java: 136 MB
ohman
This discussion spans two dev@ threads (I've obsoleted the first), and there's also a GitHub issue [1].

[1]: opencontainers/image-spec#577
Originally, I designed this proposal to be agnostic to distribution protocols, because the standardization of distribution was outside the scope of OCI's mission at that time. But the situation has changed now, and I think I have found an alternative way, although it is specific to the Docker/OCI registry API: AkihiroSuda/filegrain#21
EDIT (May 4, 2017): POC available: #577 (comment)
I'd like to propose an alternative image layer format that is content-addressable in the granularity of file, rather than in the granularity of tar.
(Of course, I am not proposing this for image-spec v1.0.0 :-); this is just a baby step toward a future spec.)
This is a continuation of the previous discussion about lazy-pulling.
I have now refined my proposal to focus on CAS in the granularity of file, rather than just sticking to lazy-pulling.
CAS in the granularity of file also has the benefit of higher concurrency when pulling images.
For distribution, even though I prefer IPFS, I intentionally kept my idea agnostic to a certain distribution method (which is out of scope of OCI mission currently).
Proposal
TL;DR: Just use @stevvooe's continuity instead of tar.
Regular files would be stored as OCI blobs and accessed via the digest value recorded in the continuity manifest.
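A minimal sketch of that resolution step, assuming an OCI image layout on disk and a hypothetical file digest taken from the continuity manifest (a lazy-pulling implementation would fetch the blob on demand instead of reading it locally):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// readFileBlob resolves one regular file's content from an OCI layout,
// given the digest recorded for that file in a continuity-style manifest.
func readFileBlob(layoutDir, fileDigest string) ([]byte, error) {
	hex := strings.TrimPrefix(fileDigest, "sha256:")
	return os.ReadFile(filepath.Join(layoutDir, "blobs", "sha256", hex))
}

func main() {
	// Hypothetical digest recorded for one regular file in the manifest.
	data, err := readFileBlob("/tmp/filegrain-image", "sha256:ab12...")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Print(string(data))
}
```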
For the whole idea, please refer to https://github.com/AkihiroSuda/filegrain/tree/v20170217