Clarify contentDigest meaning for docker/OCI images #287

Closed
glyn opened this issue Oct 14, 2019 · 7 comments

Comments

@glyn
Contributor

glyn commented Oct 14, 2019

Let's start with some definitions (based on the "OCI Image Format Specification").

Docker and OCI images have two types of digest: a repo digest and an image id. A repo digest is the SHA-256 digest of the image manifest, which references the compressed layers. Since compression depends on the implementation of the registry used to store the image, the repo digest doesn't logically exist until the image has been pushed. An image id, on the other hand, is the SHA-256 digest of the image configuration, which references the uncompressed layers and so is independent of the registry implementation.

Both these digests are content addresses of an image in the sense that each uniquely identifies the content (modulo SHA-256 collisions). Note that the docker registry spec refers to the repo digest as a "content digest".
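To make the two definitions concrete, here is a minimal sketch in Python. It uses a toy, heavily simplified manifest and configuration (not a real image), and assumes that the manifest lists digests of compressed layers while the configuration lists digests of uncompressed layers. The names `oci_digest` and `manifest_for` are hypothetical helpers, and `gzip` versus `zlib` merely stand in for two different compression choices.

```python
import gzip
import hashlib
import json
import zlib


def oci_digest(data: bytes) -> str:
    """Return an OCI-style digest string ("sha256:<hex>") for raw bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Toy uncompressed layer; a real layer would be a tar archive.
layer_tar = b"uncompressed layer contents"

# Simplified image configuration: it refers to the *uncompressed* layer.
config_bytes = json.dumps(
    {
        "architecture": "amd64",
        "os": "linux",
        "rootfs": {"type": "layers", "diff_ids": [oci_digest(layer_tar)]},
    },
    sort_keys=True,
).encode()

# The image id is the digest of the configuration bytes.
image_id = oci_digest(config_bytes)


def manifest_for(compressed_layer: bytes) -> bytes:
    """Build a toy manifest pointing at the config and one compressed layer."""
    return json.dumps(
        {
            "schemaVersion": 2,
            "config": {"digest": image_id, "size": len(config_bytes)},
            "layers": [
                {
                    "digest": oci_digest(compressed_layer),
                    "size": len(compressed_layer),
                }
            ],
        },
        sort_keys=True,
    ).encode()


# Two different compressions of the same layer give two different manifests,
# and therefore two different repo digests, for one and the same image id.
repo_digest_a = oci_digest(manifest_for(gzip.compress(layer_tar, mtime=0)))
repo_digest_b = oci_digest(manifest_for(zlib.compress(layer_tar)))

print("image id:     ", image_id)
print("repo digest A:", repo_digest_a)
print("repo digest B:", repo_digest_b)
print("repo digests differ:", repo_digest_a != repo_digest_b)  # True
```

In this sketch the image id is fixed by the configuration alone, whereas the repo digest changes whenever the compressed layer bytes change.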

The CNAB spec defines the contentDigest fields in bundle.json as follows, firstly for invocation images:

The contentDigest field MUST contain a digest, in OCI format, to be used to compute the integrity of the image. The calculation of how the image matches the contentDigest is dependent upon image type. (OCI, for example, uses a Merkle tree while VM images are checksums). During bundle development, it may be ideal to omit the contentDigest field and/or skip validation. Once a bundle is ready to be transmitted as a thick or thin bundle, it must have a contentDigest field. If a contentDigest field is present, a runtime MUST validate the image digest prior to executing an action. If the contentDigest is not present, the runtime SHOULD report an error so the user is aware that there is no contentDigest provided. Runtimes MAY allow users to override this behavior and perform actions on bundles that do not have contentDigest values populated.

and then for images other than invocation images:

contentDigest: MUST contain a digest of the contents of the image, in OCI format, to be used to compute the integrity of the image. The calculation of how the image matches the contentDigest is dependent upon image type. (OCI, for example, uses a Merkle tree while VM images use checksums.)

Since both repo digests and image ids are roots of Merkle trees, the CNAB spec doesn't actually prescribe whether repo digest or image id (or indeed some other Merkle tree root digest!) should be used for contentDigest fields of docker/OCI images. This needs clarifying so that CNAB runtimes know how to validate these fields.

@trishankatdatadog
Member

Great question! I think image id is the better idea, since it's registry-independent... @jlegrone

@jeremyrickard
Member

I think we discussed this way back in 2018 in the early days and this was a common view. I think we ended up going with the assumption that it was the repo digest. I’ll see if I can find an old issue in the Duffle repo!

I think the image id is attractive since it has no registry requirement.

@jeremyrickard
Member

jeremyrickard commented Jan 28, 2020

Some of this history was on Docker’s Slack, I think, so not all of it is going to be recoverable (unless Docker can get it). Some related comments and discussion:

cnabio/duffle#691 (comment)

#61 (comment)

A note I had from someone at docker (dcmg)


“containerd uses the digest, not the image id, to refer to images. If you pull with containerd, the image digest used is the manifest hash. Containerd itself does not create content, so you will always know the digest before pushing. If using buildkit, buildkit can create the content but it will create the full manifest, rather than just the image id with uncompressed layers.

The manifest digest refers to compressed layers, so Docker doesn't know that identifier until after push, since it calculates it on push. After we replace the image backend in Docker, that will work a little differently: we will be able to keep the compressed image hashes that were pulled or built.

Related to multiple identifiers, it is always possible to create images that are the "same" but only differ by metadata, compression, encryption, or anything else.

However, we are trying to move to a world where that original content is always used, so changes to the identifier actually represent a change to the image, rather than a side effect of pulling and pushing an image from a different docker version.

The image ID does not have the compressed hash, which tends to be what is needed to fetch the image from a repository or the byte size of the fetch-able artifacts”

@technosophos
Member

I don't think you can reasonably call a hashed file "a root of a merkle tree". That assumes an intent that is clearly not there in VM images (namely, that they are tree-structured).

I am not understanding, though, what particular change you are requesting in the spec. Is it a clarification of which SHA Docker considers to be the correct SHA? Or are you proposing an alternative?

@glyn
Contributor Author

glyn commented Feb 13, 2020

> I don't think you can reasonably call a hashed file "a root of a merkle tree". That assumes an intent that is clearly not there in VM images (namely, that they are tree-structured).

This issue is scoped to docker/OCI images.

> I am not understanding, though, what particular change you are requesting in the spec. Is it a clarification of which SHA Docker considers to be the correct SHA? Or are you proposing an alternative?

If I consume a bundle containing a docker/OCI image with contentDigest specified, I need to know whether that's the repo digest or the image id in order to verify it. I'm not asking which one is correct from Docker's perspective: they both have valid uses. It's merely a choice that the CNAB spec has to make.

Let's take a simple example to make this crystal clear. CNAB runtime A could assume the contentDigest of a docker/OCI image is its repo digest while CNAB runtime B could assume it's the image id. If a bundle created by runtime A was consumed by runtime B, then runtime B could say the contentDigest was invalid because it wasn't what runtime B was expecting.
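The mismatch can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the manifest and configuration bytes are toy placeholders, and `oci_digest`, `validate`, and the `"repo-digest"` / `"image-id"` mode names are hypothetical, not part of the CNAB spec or any runtime.

```python
import hashlib


def oci_digest(data: bytes) -> str:
    """Return an OCI-style digest string ("sha256:<hex>") for raw bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Toy stand-ins for one image's manifest and configuration documents.
manifest_bytes = b'{"toy": "manifest"}'
config_bytes = b'{"toy": "config"}'


def validate(content_digest: str, interpretation: str) -> bool:
    """Check contentDigest under one runtime's (hypothetical) interpretation."""
    if interpretation == "repo-digest":
        expected = oci_digest(manifest_bytes)
    elif interpretation == "image-id":
        expected = oci_digest(config_bytes)
    else:
        raise ValueError(f"unknown interpretation: {interpretation}")
    return content_digest == expected


# Runtime A writes the repo digest into the bundle's contentDigest field.
bundle_content_digest = oci_digest(manifest_bytes)

# Runtime A accepts its own bundle; runtime B, assuming image id, rejects it.
print(validate(bundle_content_digest, "repo-digest"))  # True
print(validate(bundle_content_digest, "image-id"))     # False
```

The image itself is unchanged in both calls; only the interpretation of the field differs, which is why the spec needs to pick one.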

@vdice
Member

vdice commented Sep 18, 2020

With #384 merged, I believe we can consider this issue closed.

@vdice vdice closed this as completed Sep 18, 2020
@glyn
Contributor Author

glyn commented Sep 21, 2020

LGTM, thanks.


5 participants