This repository has been archived by the owner on Sep 16, 2020. It is now read-only.

AkihiroSuda/filegrain


⚠️ FILEgrain is abandoned in favor of stargz/CRFS. See containerd#3731 and https://github.com/ktock/remote-snapshotter


FILEgrain: transport-agnostic, fine-grained content-addressable container image layout


FILEgrain is a (long-term) proposal to extend the OCI Image Format to support content-addressable storage (CAS) at file granularity, in a transport-agnostic way.

Your feedback is welcome.

Talks

Pros and Cons

Pros:

  • Higher concurrency in pulling image, in a transport-agnostic way
  • Files can be lazy-pulled, i.e. files can appear in the filesystem before they are actually pulled.
  • Finer deduplication granularity

Cons:

  • The blobs directory in the image can contain a very large number of files, so readdir() on that directory is likely to become slow. This could be mitigated by using an external blob store, though.

Format

FILEgrain defines an image manifest that is almost identical to the OCI image manifest, but differs in the following points:

  • The FILEgrain image manifest supports a continuity manifest (application/vnd.continuity.manifest.v0+pb and ...+json) as an Image Layer Filesystem Changeset. Regular files in an image are stored as OCI blobs and accessed via the digest values recorded in the continuity manifest. FILEgrain still supports tar layers (application/vnd.oci.image.layer.v1.tar and its families), and it is even possible to put a continuity layer on top of tar layers, and vice versa. Tar layers might be useful for forcing many small files to be downloaded in a batch (as a single tar file).
  • FILEgrain image manifest SHOULD have an annotation filegrain.version=20170501, in both the manifest JSON itself and the image index JSON. This annotation WILL change in future versions.

It is possible and recommended to put both a FILEgrain manifest file and an OCI manifest file in a single image.

Example

image index: (The second entry is a FILEgrain manifest)

{
    "schemaVersion": 2,
    "manifests": [
	{
	    "mediaType": "application/vnd.oci.image.manifest.v1+json",
	    ...
	},
	{
	    "mediaType": "application/vnd.oci.image.manifest.v1+json",
	    ...,
	    "annotations": {
		"filegrain.version": "20170501"
	    }
	}
    ]
}
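A FILEgrain-aware client can pick its manifest out of such an index by the annotation, while other clients keep using the plain OCI entry. A minimal sketch, assuming only the index fields shown above; the structs are hand-written for illustration (not the opencontainers/image-spec Go types), the fallback-to-first-entry policy is an assumption, and the digests are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// descriptor holds only the index-entry fields this sketch needs.
type descriptor struct {
	MediaType   string            `json:"mediaType"`
	Digest      string            `json:"digest"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

type index struct {
	SchemaVersion int          `json:"schemaVersion"`
	Manifests     []descriptor `json:"manifests"`
}

const fileGrainVersion = "20170501"

// pickManifest prefers the entry annotated with the supported
// filegrain.version and otherwise falls back to the first entry.
// The bool result reports whether a FILEgrain manifest was found.
func pickManifest(raw []byte) (descriptor, bool, error) {
	var idx index
	if err := json.Unmarshal(raw, &idx); err != nil {
		return descriptor{}, false, err
	}
	if len(idx.Manifests) == 0 {
		return descriptor{}, false, fmt.Errorf("index has no manifests")
	}
	for _, m := range idx.Manifests {
		if m.Annotations["filegrain.version"] == fileGrainVersion {
			return m, true, nil
		}
	}
	return idx.Manifests[0], false, nil
}

func main() {
	raw := []byte(`{
  "schemaVersion": 2,
  "manifests": [
    {"mediaType": "application/vnd.oci.image.manifest.v1+json", "digest": "sha256:1111"},
    {"mediaType": "application/vnd.oci.image.manifest.v1+json", "digest": "sha256:2222",
     "annotations": {"filegrain.version": "20170501"}}
  ]
}`)
	m, isFILEgrain, err := pickManifest(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(m.Digest, isFILEgrain) // sha256:2222 true
}
```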

image manifest: (a tar layer on top of a continuity layer; as in OCI, layers are listed base-first)

{
    "schemaVersion": 2,
    "layers": [
	{
	    "mediaType": "application/vnd.continuity.manifest.v0+json",
	    ...
	},
	{
	    "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
	    ...
	}
    ],
    "annotations": {
	"filegrain.version": "20170501"
    }
}
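Because a FILEgrain manifest may mix continuity and tar layers, a consumer has to dispatch on each layer's media type while walking the layers in order. A minimal sketch, assuming the media types listed in this section; per the OCI image manifest spec, `layers[0]` is the base layer, so layers are applied in array order. The `layerKind` type is hypothetical, and non-distributable tar variants are deliberately left as "unknown" here.

```go
package main

import (
	"fmt"
	"strings"
)

type layerKind int

const (
	unknownLayer layerKind = iota
	continuityLayer
	tarLayer
)

func (k layerKind) String() string {
	switch k {
	case continuityLayer:
		return "continuity"
	case tarLayer:
		return "tar"
	default:
		return "unknown"
	}
}

// classify dispatches on a layer's mediaType.
func classify(mediaType string) layerKind {
	switch {
	case strings.HasPrefix(mediaType, "application/vnd.continuity.manifest.v0+"):
		return continuityLayer // +pb or +json
	case mediaType == "application/vnd.oci.image.layer.v1.tar",
		strings.HasPrefix(mediaType, "application/vnd.oci.image.layer.v1.tar+"):
		return tarLayer // plain tar and compressed variants such as +gzip
	default:
		return unknownLayer
	}
}

func main() {
	// Media types from the example manifest, base layer first.
	layers := []string{
		"application/vnd.continuity.manifest.v0+json",
		"application/vnd.oci.image.layer.v1.tar+gzip",
	}
	for i, mt := range layers {
		fmt.Printf("apply layer %d: %s\n", i, classify(mt))
	}
}
```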

Distribution

FILEgrain is designed to be transport-agnostic, and hence can be distributed in any way.

My personal recommendation is to just put the image directory onto IPFS. However, I intentionally designed FILEgrain not to use IPFS multiaddr/multihash.
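In practice, transport-agnostic means a client only needs some way to turn a digest into bytes. A minimal sketch of that abstraction, assuming a hypothetical `BlobFetcher` interface: `dirFetcher` reads from a local OCI image layout directory, and `mapFetcher` stands in for any other transport (an HTTP registry or IPFS gateway could implement the same interface but is not shown). The digest `sha256:abc` is a placeholder, not a real hash.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// BlobFetcher is a hypothetical transport-neutral interface:
// given a digest, return the blob's bytes.
type BlobFetcher interface {
	Fetch(digest string) ([]byte, error)
}

// dirFetcher serves blobs from a local OCI image layout (<root>/blobs/<alg>/<encoded>).
type dirFetcher struct{ root string }

func (f dirFetcher) Fetch(digest string) ([]byte, error) {
	alg, enc, ok := strings.Cut(digest, ":")
	// Reject anything that could escape the blobs directory.
	if !ok || alg == "" || enc == "" || strings.ContainsAny(digest, `/\`) || strings.Contains(digest, "..") {
		return nil, fmt.Errorf("malformed digest %q", digest)
	}
	return os.ReadFile(filepath.Join(f.root, "blobs", alg, enc))
}

// mapFetcher serves blobs from memory, standing in for any other transport.
type mapFetcher map[string][]byte

func (m mapFetcher) Fetch(digest string) ([]byte, error) {
	b, ok := m[digest]
	if !ok {
		return nil, fmt.Errorf("blob %s not found", digest)
	}
	return b, nil
}

func main() {
	root, err := os.MkdirTemp("", "filegrain-layout")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)
	// Lay out one blob by hand under a placeholder digest.
	if err := os.MkdirAll(filepath.Join(root, "blobs", "sha256"), 0o755); err != nil {
		panic(err)
	}
	if err := os.WriteFile(filepath.Join(root, "blobs", "sha256", "abc"), []byte("hi"), 0o644); err != nil {
		panic(err)
	}
	fetchers := []BlobFetcher{dirFetcher{root: root}, mapFetcher{"sha256:abc": []byte("hi")}}
	for _, f := range fetchers {
		b, err := f.Fetch("sha256:abc")
		fmt.Println(string(b), err)
	}
}
```

Keeping the interface keyed by plain OCI digests, rather than by IPFS multihashes, is what lets the same image be served over any of these transports.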

Future support for IPFS blob store

To avoid putting a lot of files into a single OCI blob directory, it might be worth considering IPFS as an additional blob store.

IPFS blob store support has not been implemented yet, but it would look like this:

{
    "schemaVersion": 2,
    "layers": [
	{
		"mediaType": "application/vnd.continuity.manifest.v0+json",
		...,
		"annotations": {
			"filegrain.ipfs": "QmFooBar"
		}
	}
    ],
    "annotations": {
	"filegrain.version": "2017XXXX"
    }
}

In this case, the layer SHOULD be fetched via the IPFS multihash rather than via the digest values specified in the continuity manifest. Also, the continuity manifest MAY omit digest values, since IPFS provides them redundantly.

Note that this is different from just putting the blobs directory onto IPFS, which would still create a lot of files in a single directory when pulled by a non-FILEgrain implementation.
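The proposed rule can be expressed as a small dispatch on the layer annotation. A minimal sketch, assuming the `filegrain.ipfs` annotation from the example above and, as an unconfirmed assumption, that the IPFS object mirrors the rootfs tree so files are addressed by path beneath it. The function returns a locator string only; no IPFS client is involved, and `fileEntry` and `locate` are hypothetical names.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// fileEntry is a regular file as recorded in a continuity manifest.
type fileEntry struct {
	Path   string // path inside the rootfs, e.g. "/etc/os-release"
	Digest string // MAY be empty when the layer is IPFS-backed
}

// locate decides where to fetch a file from. If the layer carries the
// proposed filegrain.ipfs annotation, the IPFS object is used (assumption:
// it mirrors the rootfs tree, so the file is addressed by path); otherwise
// the file's digest selects a blob in the OCI layout.
func locate(layerAnnotations map[string]string, f fileEntry) (string, error) {
	if h := layerAnnotations["filegrain.ipfs"]; h != "" {
		return path.Join("/ipfs", h, f.Path), nil
	}
	alg, enc, ok := strings.Cut(f.Digest, ":")
	if !ok || alg == "" || enc == "" {
		return "", fmt.Errorf("%s: no digest and no filegrain.ipfs annotation", f.Path)
	}
	return path.Join("blobs", alg, enc), nil
}

func main() {
	ipfsLayer := map[string]string{"filegrain.ipfs": "QmFooBar"}
	p, err := locate(ipfsLayer, fileEntry{Path: "/etc/os-release"})
	fmt.Println(p, err) // /ipfs/QmFooBar/etc/os-release <nil>
	p, err = locate(nil, fileEntry{Path: "/bin/sh", Digest: "sha256:abc"})
	fmt.Println(p, err) // blobs/sha256/abc <nil>
}
```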

POC

Builder:

  • Build a FILEgrain image from an existing OCI image (--source-type oci-image)
  • Build a FILEgrain image from an existing Docker image (--source-type docker-image)
  • Build a FILEgrain image from a raw rootfs directory (--source-type rootfs)

Lazy Puller:

Mounter:

  • Read-only mount using FUSE (Linux)

Writable mounting is not planned at the moment, as FILEgrain is optimized for "cattle" rather than "pets". Users should use bind mounts or a union filesystem for /tmp, /run, and /home.
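For example, the writable paths can be supplied as bind mounts in the bundle's config.json. A minimal sketch in Go that emits OCI runtime-spec `mounts` entries; the struct is hand-written rather than the runtime-spec Go package, the host directory `/var/lib/scratch/foo` is a placeholder, and the `rbind`/`rw` options are a common choice that should be checked against the runtime in use.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path/filepath"
)

// mount mirrors the fields of an OCI runtime-spec mount entry.
type mount struct {
	Destination string   `json:"destination"`
	Type        string   `json:"type"`
	Source      string   `json:"source"`
	Options     []string `json:"options,omitempty"`
}

// writableMounts bind-mounts a per-container host directory over each
// writable path, leaving the FUSE-mounted rootfs itself read-only.
func writableMounts(hostBase string, dests ...string) []mount {
	ms := make([]mount, 0, len(dests))
	for _, d := range dests {
		ms = append(ms, mount{
			Destination: d,
			Type:        "bind",
			Source:      filepath.Join(hostBase, filepath.Base(d)),
			Options:     []string{"rbind", "rw"},
		})
	}
	return ms
}

func main() {
	out, err := json.MarshalIndent(writableMounts("/var/lib/scratch/foo", "/tmp", "/run", "/home"), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The emitted entries would go into the "mounts" array of config.json, and the host source directories have to exist before the container is started.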

POC Usage

Install FILEgrain binary:

$ go get github.com/AkihiroSuda/filegrain

Convert a Docker image (e.g. java:8) to a FILEgrain image /tmp/filegrain-image:

# filegrain build -o /tmp/filegrain-image --source-type docker-image java:8

Prepare an OCI bundle /tmp/bundle from ./oci-runtime-bundle.template:

# cp -r ./oci-runtime-bundle.template /tmp/bundle
# cd /tmp/bundle
# ./prepare.sh

Mount the local FILEgrain image /tmp/filegrain-image on /tmp/bundle/rootfs:

# filegrain mount /tmp/filegrain-image /tmp/bundle/rootfs

In the future, filegrain mount should also support mounting remote images over the Docker Registry HTTP API.

Open another terminal, and start runC with the bundle /tmp/bundle:

# cd /tmp/bundle
# runc run foo

Instead of runc, you will be able to use docker run as well, once Docker supports running an arbitrary OCI runtime bundle.

The container starts without pulling all the blobs. Pulled blobs can be found in /tmp/filegrain-blobcacheXXXXX:

# du -hs /tmp/filegrain-blobcache*

This directory grows as you read(2) files within the container rootfs.

POC Benchmark

Please refer to #17.

For example, pulling 352MB of blobs is enough to use NLTK with the 8.3GB kaggle/python image.

Similar work

Lazy distribution

FAQ

Q. Why not just use an IPFS directory? It is already CAS at file granularity.

A. Because IPFS does not support file metadata. Also, that approach is not transport-agnostic.

Q. What are the use cases for lazy-pulling?

A. Here are some examples I can come up with:

  • Huge web application composed of a lot of static HTML and graphic files
  • Huge scientific data (a content-addressable image with full code and full data would be great for reproducible research)
  • Huge OS image (e.g. Windows Server, Linux with VNC desktop)
  • Huge runtime (e.g. Java, dotNET)
  • Huge image composed of multiple software stacks for integration testing

Please also refer to the list of similar work about lazy distribution.

Q. Isn't it a bad idea to put a lot of files into a single blobs directory?

A. This could be mitigated by not putting files into the OCI blob store and using an external blob store such as IPFS instead (go-ipfs supports sharding), although that is not transport-agnostic. See also the idea about future support for an IPFS blob store.

Also, there is an idea to implement sharding in the OCI native blob store: opencontainers/image-spec#449.