skopeo inspect digest differs from docker images --digest #469

Closed
thomasmckay opened this issue Jan 12, 2018 · 14 comments

Comments

@thomasmckay
Contributor

Inspecting an image with skopeo vs. docker gives different digest values

[vagrant@devel foreman]$ docker --version
Docker version 1.12.6, build ec8512b/1.12.6
[vagrant@devel foreman]$ skopeo --version
skopeo version 0.1.26

[vagrant@devel foreman]$ docker pull busybox:1.28
Trying to pull repository docker.io/library/busybox ... 
sha256:436bbf48aa1198ebca8eac0ad9a9c80c8929d9242e02608f76ce18334e0cfe6a: Pulling from docker.io/library/busybox
Digest: sha256:436bbf48aa1198ebca8eac0ad9a9c80c8929d9242e02608f76ce18334e0cfe6a
Status: Downloaded newer image for docker.io/busybox:1.28

[vagrant@devel foreman]$ skopeo inspect docker-daemon:docker.io/busybox:1.28
{
    "Digest": "sha256:113d150b5cabd9d72a0c9028d97e7069f698aad7dda76fc1110fd4b5c732cbdf",
    "RepoTags": [],
    "Created": "2018-01-09T02:06:57.143983197Z",
    "DockerVersion": "17.06.2-ce",
    "Labels": null,
    "Architecture": "amd64",
    "Os": "linux",
    "Layers": [
        "sha256:ae25bfd8dcdeab5892b4efa7364c0a1bba59e11491e10415893b8905e3467799"
    ]
}

[vagrant@devel foreman]$ docker images --digests
REPOSITORY                                                                TAG                 DIGEST                                                                    IMAGE ID            CREATED             SIZE
docker.io/busybox                                                         1.28                sha256:436bbf48aa1198ebca8eac0ad9a9c80c8929d9242e02608f76ce18334e0cfe6a   807fd4df40d1        3 days ago          1.143 MB

@runcom
Member

runcom commented Jan 12, 2018

@mtrmac do you happen to know why?

@mtrmac
Contributor

mtrmac commented Jan 12, 2018

Sure, 436bb… is the digest of the manifest list. docker pull throws away both the manifest list and the original manifest, so reading from docker-daemon: synthesizes a new manifest which does not match the ID.

The two images can probably be paired using their config digest (i.e. "Id" in docker inspect, and the "config"."digest" field in skopeo inspect --raw).
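A minimal sketch of that pairing, assuming the registry reference resolves to a single schema2 or OCI manifest (not a manifest list) and that the local daemon reports the config digest as the image "Id". The function names are hypothetical helpers, not part of docker or skopeo:

```shell
#!/bin/sh
# All function names here are hypothetical helpers for illustration.

# local_image_id IMAGE
# Image ID as stored by the Docker daemon, e.g. "sha256:807fd4df40d1...".
local_image_id() {
    docker inspect --type image --format '{{.Id}}' "$1"
}

# registry_config_digest REFERENCE
# Config digest recorded in a schema2/OCI manifest on the registry.
# Prints "null" if the reference resolves to a manifest list.
registry_config_digest() {
    skopeo inspect --raw "docker://$1" | jq -r '.config.digest'
}

# same_digest A B
# Succeeds when both digests are equal, ignoring an optional "sha256:" prefix.
same_digest() {
    [ "${1#sha256:}" = "${2#sha256:}" ]
}
```

Usage would look like `same_digest "$(local_image_id busybox:1.28)" "$(registry_config_digest docker.io/library/busybox:1.28)" && echo match`. For a multi-arch tag the per-platform manifest has to be selected first.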

containers/image#288 will allow us to record the original manifest (but not the higher-level manifest list), and only for images pulled via skopeo, not docker pull. I’m not sure whether showing the original digest of the manifest in skopeo inspect docker-daemon: in this restricted case would be more valuable; it might.

@thomasmckay
Contributor Author

@mtrmac Where is 'docker images --digests' getting the 436bb digest value? Is it recalculating it on the fly?

@mtrmac
Contributor

mtrmac commented Jan 12, 2018

docker images records the digest used for pulling the image in a locally stored data structure. On Fedora at least, docker inspect shows it in a RepoDigests array — and do note that it is an array, a single image can have zero, one, or multiple digests if it were pulled from multiple places/names.

skopeo inspect, OTOH, is recalculating the digest from manifest contents, in the docker-daemon: case from the contents of the synthesized manifest.
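To make "recalculating the digest from manifest contents" concrete: for schema2 and OCI manifests, the digest is the SHA-256 of the exact manifest bytes. A minimal sketch, assuming `sha256sum` from coreutils is available; `manifest_digest` is a made-up name for illustration:

```shell
#!/bin/sh
# manifest_digest: hypothetical helper, not part of skopeo or docker.
# Reads raw manifest bytes on stdin and prints "sha256:<hex>".
manifest_digest() {
    printf 'sha256:%s\n' "$(sha256sum | cut -d' ' -f1)"
}
```

For example, `skopeo inspect --raw docker://docker.io/library/busybox:1.28 | manifest_digest` hashes whatever the registry returns for that tag. For a multi-arch tag, `--raw` returns the manifest list, so the result is the digest of the list.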

What are you actually trying to do, or what question are the digests supposed to answer?

@thomasmckay
Contributor Author

I would like to match a docker-daemon (i.e. locally pulled) image to its exact match in a registry. The registry in my case is foreman/pulp/crane (Satellite-6). From a user perspective, I'd like to provide instructions along the lines of "Run $something on your host, then use $unique-id to look up the image in foreman." The $unique-id, I thought, was the digest.

This would be applicable to image signatures, or image scanning results, or history info, or whatever I manage to pack into foreman.

Is that realistic? Thoughts on how to approach this cross-reference through something besides digest?

@mtrmac
Contributor

mtrmac commented Jan 13, 2018

If the Docker version provides RepoDigests in the docker inspect output (which docker-1.13.1-44.git584d391.fc26.x86_64 does, but your output seems not to; I haven’t now checked what makes a difference, I can look into it on Monday if necessary), that is easiest to match against a registry from which the image was docker pulled.
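A rough sketch of reading those values, assuming the installed Docker exposes RepoDigests in its inspect output. The helper names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers; the names are not part of docker or skopeo.

# repo_digests IMAGE
# Prints one "name@sha256:..." entry per line; prints no entries if the
# image has no RepoDigests (for example after `docker load`).
repo_digests() {
    docker inspect --type image \
        --format '{{range .RepoDigests}}{{println .}}{{end}}' "$1"
}

# digest_of NAME@DIGEST
# Strips the repository name, leaving only the digest.
digest_of() {
    printf '%s\n' "${1##*@}"
}
```

Something like `repo_digests busybox:1.28 | while read -r ref; do [ -n "$ref" ] && digest_of "$ref"; done` then yields the digests to look up on the registry. An image can have several entries, so the lookup should accept any of them.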

If RepoDigests is not available, and the image uses the schema2 or OCI manifest format, the config digest can be used: Read "Id" from docker inspect, and then $somehow (there probably isn’t already an index for this if the registry isn’t already keeping almost everything in a relational database) find an image which has that value in the config.digest JSON field of the manifest.

If RepoDigests is not available, and the image on the registry uses schema1, I can’t think of a way. The layer list might help a bit (at the cost of uncompressing the registry layers), but there may be several different images with the same layer lists and different configs.

@NicolasT

Running into a similar issue:

$ export IMAGE=docker.io/library/registry:2.7.1
$ docker pull $IMAGE
2.7.1: Pulling from library/registry
Digest: sha256:870474507964d8e7d8c3b53bcfa738e3356d2747a42adad26d0d81ef4479eb1b
Status: Image is up to date for registry:2.7.1
$ skopeo inspect docker://$IMAGE | jq ".Digest"
"sha256:870474507964d8e7d8c3b53bcfa738e3356d2747a42adad26d0d81ef4479eb1b"
$ docker image inspect $IMAGE | jq ".[].RepoDigests"
[
  "registry@sha256:870474507964d8e7d8c3b53bcfa738e3356d2747a42adad26d0d81ef4479eb1b"
]

So, all tooling agrees $IMAGE has digest sha256:870474507964d8e7d8c3b53bcfa738e3356d2747a42adad26d0d81ef4479eb1b, right?

$ docker save $IMAGE > docker-export.tar
$ skopeo inspect docker-archive:docker-export.tar | jq ".Digest"
"sha256:3ec6f2e82a4142d71f6def7f3349c80688b32c7ebcdcdd03934f054044b8e5b2"

Ugh, what happened?

$ skopeo copy docker://$IMAGE docker-archive:skopeo-export.tar
Getting image source signatures
Copying blob sha256:169185f82c45a6eb72e0ca4ee66152626e7ace92a0cbc53624fb46d0a553f0bd
 2.10 MB / 2.10 MB [========================================================] 0s
Copying blob sha256:046e2d030894bf549c6d587edae9ec017659e4e32d08b82431245b3ae719bd95
 612.44 KB / 612.44 KB [====================================================] 0s
Copying blob sha256:188836fddeeb46dc9540ba4571bbac1e20021f31a0d0ed15d954c30edf450464
 6.51 MB / 6.51 MB [========================================================] 2s
Copying blob sha256:8327445377470a020be1cda0a9c0c3ee75a66cf28d2db41494db02651ecd50a7
 370 B / 370 B [============================================================] 0s
Copying blob sha256:7ceea07e80be4bb4941253cf1c886775de26101d016bd66176c8853bc00a46e1
 214 B / 214 B [============================================================] 0s
Copying config sha256:d0eed8dad114db55d81c870efb8c148026da4a0f61dc7710c053da55f9604849
 3.09 KB / 3.09 KB [========================================================] 0s
Writing manifest to image destination
Storing signatures
$ skopeo inspect docker-archive:skopeo-export.tar | jq ".Digest"
"sha256:3ec6f2e82a4142d71f6def7f3349c80688b32c7ebcdcdd03934f054044b8e5b2"

Looks like there's agreement the digest of the exported image is indeed sha256:3ec6f2e82a4142d71f6def7f3349c80688b32c7ebcdcdd03934f054044b8e5b2, which is not the source digest 😕

Oddly enough:

$ skopeo inspect docker-daemon:$IMAGE | jq ".Digest"
"sha256:3ec6f2e82a4142d71f6def7f3349c80688b32c7ebcdcdd03934f054044b8e5b2"

Now, after removing $IMAGE from Docker, and importing docker-export.tar using docker image load, the resulting image has... no RepoDigest in docker image inspect $IMAGE.

I guess there's basically no way to know for sure an export (using skopeo copy docker://... docker-archive:... or docker pull ...; docker save ...) of some image is the same as a specific image stored on the hub (RootFS Layers digests seem to change all the time as well, huh?!?).

(Rant: time and time again I get to experience that Docker image 'distribution' (the spec), digest management, integrity,... is a complete disaster).

@mtrmac
Contributor

mtrmac commented Feb 19, 2019

Before dealing with the intricacies of the implementations, what are you ultimately trying to do?


$ docker pull $IMAGE
…
Digest: sha256:870474507964d8e7d8c3b53bcfa738e3356d2747a42adad26d0d81ef4479eb1b
$ skopeo inspect docker://$IMAGE | jq ".Digest"
"sha256:870474507964d8e7d8c3b53bcfa738e3356d2747a42adad26d0d81ef4479eb1b"
$ docker image inspect $IMAGE | jq ".[].RepoDigests"
…
  "registry@sha256:870474507964d8e7d8c3b53bcfa738e3356d2747a42adad26d0d81ef4479eb1b"

So, all tooling agrees $IMAGE has digest sha256:870474507964d8e7d8c3b53bcfa738e3356d2747a42adad26d0d81ef4479eb1b, right?

$IMAGE on the registry has that digest. docker pull reads the manifest with that digest, but does not store it locally at all. The manifest is not recoverable from the local storage (i.e. without reading from the remote registry again). Also…

$ docker save $IMAGE > docker-export.tar

docker save uses a quite different image format, which does not have manifests of that kind, so it also does not have any manifest digests, and the question “what manifest digest does docker-export.tar have” does not make sense.

$ skopeo inspect docker-archive:docker-export.tar | jq ".Digest"
"sha256:3ec6f2e82a4142d71f6def7f3349c80688b32c7ebcdcdd03934f054044b8e5b2"

Ugh, what happened?

c/image, in the implementation of docker-archive:, synthesizes an in-memory schema2 manifest so that the archive can be processed as an ordinary schema2 image, but that’s not a manifest “in” the .tar archive, and it can’t be the original manifest of $IMAGE because it has been discarded during docker pull.

(FWIW c/storage, and podman pull, do record the original manifest in local storage, so the situation is not that hopeless — but a podman push still can’t just reuse the original manifest (it has to compress layers afresh, which likely enough changes the manifest). But, at least, the original manifest exists locally to support integrity, and eventually, signature checking.)


$ skopeo copy docker://$IMAGE docker-archive:skopeo-export.tar
…
$ skopeo inspect docker-archive:skopeo-export.tar | jq ".Digest"
"sha256:3ec6f2e82a4142d71f6def7f3349c80688b32c7ebcdcdd03934f054044b8e5b2"

Looks like there's agreement the digest of the exported image is indeed sha256:3ec6f2e82a4142d71f6def7f3349c80688b32c7ebcdcdd03934f054044b8e5b2, which is not the source digest 😕

Not really; it’s not at all guaranteed that the result of a schema2→archive→schema2 conversion will have a consistent digest over time (the conversion implementation can change), nor that it will be the same as the result of a registry→docker pull→docker save→schema2 process (especially when the source image uses schema1, when a config object must be created during the pull / archive conversion process).

The value happens to be the same in this case because none of the operations have changed the config object, and because you are doing all testing with a single implementation of skopeo that internally does the tar→schema2 conversion which happens before computing/reporting the digest.


Now, after removing $IMAGE from Docker, and importing docker-export.tar using docker image load, the resulting image has... no RepoDigest in docker image inspect $IMAGE.

Yes: RepoDigest values are not immutable properties of the image, they are attributes of how/where the images have been stored (e.g. depending on particular version of the compression implementation, {podman,docker} push can cause different RepoDigest values to be used, and RepoDigest obviously depends on the destination registry host and repository name). In docker pull, RepoDigest is set when pulling the image to record the source of the pull operation. (Multiple sources, possibly, when “the same” image has been pulled from multiple sources and deduplicated locally.)

docker load does not involve a pull from a registry, so there is no “source of the pull operation”, and it inherently can’t record anything in RepoDigests; at most, there could be a path name, or not even that when streaming the archive from a pipe, and a path name is not a valid / useful RepoDigests value anyway.

I guess there's basically no way to know for sure an export (using skopeo copy docker://... docker-archive:... or docker pull ...; docker save ...) of some image is the same as a specific image stored on the hub (RootFS Layers digests seem to change all the time as well, huh?!?).

Strictly speaking, that’s trivial: “it’s not the same”, an archive uses a different format. If you want to know whether it has been maliciously modified…

RootFS.DiffIDs values should be stable across push/pull/save/load, and, if it exists and no conversion is necessary, so should the config digests. As mentioned above, getting the config digests is not always trivial, especially getting the value if you only have a manifest digest and the image on the remote registry goes away in the meantime.

But if you want to preserve exactly the same image across copies over registries (or, well, whatever it is that you are really trying to do), I would strongly recommend staying completely away from docker {push,pull,save,load}, all of which inherently involve some kind of format conversion / compression, or something similar, and using skopeo copy instead, which can copy compressed images bit-for-bit identically if supported by the source and destination. Also, if you need local files, consider using the dir: transport instead of docker-archive:; dir: is designed not to require any format conversions.
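As a sketch of the dir: suggestion: assuming the directory layout is a `manifest.json` file next to blobs stored in files named by their bare SHA-256 hex digest, a copied directory can be checked offline. That file naming is an assumption worth confirming against your skopeo version, and `verify_dir_blobs` is a hypothetical helper:

```shell
#!/bin/sh
# verify_dir_blobs DIR: hypothetical helper, not part of skopeo.
# Assumes every file in DIR whose name is 64 hex characters is a blob named
# after its own SHA-256 digest. Returns non-zero on the first mismatch.
verify_dir_blobs() {
    for f in "$1"/*; do
        name=$(basename "$f")
        # Skip manifest.json, version, signatures and anything else.
        printf '%s' "$name" | grep -Eq '^[0-9a-f]{64}$' || continue
        actual=$(sha256sum "$f" | cut -d' ' -f1)
        if [ "$actual" != "$name" ]; then
            echo "digest mismatch: $f" >&2
            return 1
        fi
    done
    return 0
}
```

After something like `skopeo copy docker://$IMAGE dir:/tmp/image-copy`, running `verify_dir_blobs /tmp/image-copy` checks the blobs, and `sha256sum /tmp/image-copy/manifest.json` can be compared with the manifest digest expected from the registry.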

Again, it depends on what you are ultimately trying to do.

@rhatdan
Member

rhatdan commented Apr 25, 2019

@thomasmckay Could you answer @mtrmac's questions, or should I close this issue?

@NicolasT

@mtrmac I completely missed your reply, sorry. So, getting back to #469 (comment):

Indeed, as you guessed, the intent is to be able to retrieve -> store -> distribute -> ingest -> pull -> run images based on a static description of said image (i.e. some hash of it) in a repeatable way. I was using docker save because (at that point in time) there was no way to ingest/run a skopeo-exported image in the target (containerd, using ctr cri load ..., which seems to work only with Docker-generated archives, where skopeo seems to extract something slightly different). However, meanwhile I found a way to inject images in containerd's cache which does support skopeo exports.

Overall, it looks like I may want/need to give up on the original goal, given how (at least Docker) image management is not really embracing blob checksums as identifiers for images across their lifecycle/exports/pushes/...

I should spend some more time using skopeo again, indeed investigating the dir target as well (would that somehow allow to set up a plain HTTP server over it to act as a 'distribution spec'-compatible endpoint? One can dream, right...)

@mtrmac
Contributor

mtrmac commented Apr 25, 2019

I should spend some more time using skopeo again, indeed investigating the dir target as well (would that somehow allow to set up a plain HTTP server over it to act as a 'distribution spec'-compatible endpoint? One can dream, right...)

No. But, you can “easily enough” run a docker/distribution registry server container with a volume mounted into the container’s storage directory, copy images to that registry, then shut down the container, move the volume data elsewhere, and run a similar container elsewhere and pull from it.
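A sketch of that workflow, assuming the stock `registry:2` image (which keeps its data under `/var/lib/registry`) and a plain-HTTP registry on `localhost:5000`. The container name `mirror`, the port, and all function names are illustrative choices:

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# mirror_ref HOST IMAGE
# Rewrites "docker.io/library/registry:2.7.1" to
# "docker://HOST/library/registry:2.7.1" by replacing the registry host.
mirror_ref() {
    printf 'docker://%s/%s\n' "$1" "${2#*/}"
}

# start_registry DATA_DIR
# Runs a registry container that stores everything under DATA_DIR.
start_registry() {
    docker run -d --name mirror -p 5000:5000 \
        -v "$1:/var/lib/registry" registry:2
}

# mirror_image IMAGE
# Copies IMAGE into the local registry without going through the daemon.
mirror_image() {
    skopeo copy --dest-tls-verify=false \
        "docker://$1" "$(mirror_ref localhost:5000 "$1")"
}
```

After `start_registry /srv/registry-data` and one `mirror_image` call per image, the container can be stopped and the data directory moved to wherever the next registry container will run.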

(The docker/distribution protocol is pretty close to being able to be served using a plain HTTP server, but not quite there — e.g. it needs to return the correct Content-Type for manifest files, all of which are JSON but the Content-Type can differ from one to another, while the paths look the same. So, just a flat directory structure without server-specific options to serve the right headers isn’t possible in general.)

@NicolasT

I should spend some more time using skopeo again, indeed investigating the dir target as well (would that somehow allow to set up a plain HTTP server over it to act as a 'distribution spec'-compatible endpoint? One can dream, right...)

No. But, you can “easily enough” run a docker/distribution registry server container with a volume mounted into the container’s storage directory, copy images to that registry, then shut down the container, move the volume data elsewhere, and run a similar container elsewhere and pull from it.

I considered that at some point, but it doesn't fit our needs for a couple of reasons, one of them being that it is not very 'composable'.

(The docker/distribution protocol is pretty close to being able to be served using a plain HTTP server, but not quite there — e.g. it needs to return the correct Content-Type for manifest files, all of which are JSON but the Content-Type can differ from one to another, while the paths look the same. So, just a flat directory structure without server-specific options to serve the right headers isn’t possible in general.)

Well, thanks to your dir target hint, I was able to hack this together: https://github.com/NicolasT/static-container-registry

It basically allows you to

  • skopeo copy ... a bunch of images
  • Run the script
  • include the result in an nginx configuration
  • 💰 ?

CI then tests whether docker pull and skopeo inspect work, and whether crictl pull ... works using cri-o and containerd.

It is not meant to be a fully-compliant distribution-API implementation, but it works 'good enough' for those runtimes/clients.

@agners

agners commented Aug 27, 2019

The two images can probably be paired using their config digest (i.e. "Id" in docker inspect, and the "config"."digest" field in skopeo inspect --raw).

This works for me:

skopeo inspect --raw docker://debian:buster | jq '.manifests | map(select(.platform.architecture == "amd64" and .platform.os == "linux") )'
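That filter returns the matching manifest list entry, whose digest identifies the per-platform manifest. A possible next step, sketched under the assumption that the tag is a manifest list and the selected entry is a schema2 or OCI manifest, is to fetch that manifest by digest and read its `config.digest`. The function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of skopeo.

# platform_manifest_digest REPO TAG ARCH OS
# Prints the digest of the manifest list entry for the given platform.
platform_manifest_digest() {
    skopeo inspect --raw "docker://$1:$2" |
        jq -r --arg arch "$3" --arg os "$4" \
            '.manifests[]
             | select(.platform.architecture == $arch and .platform.os == $os)
             | .digest'
}

# pinned_ref REPO DIGEST
# Builds a digest-pinned reference such as docker://debian@sha256:...
pinned_ref() {
    printf 'docker://%s@%s\n' "$1" "$2"
}

# platform_config_digest REPO TAG ARCH OS
# Prints the config digest, comparable to "Id" in docker inspect.
platform_config_digest() {
    d=$(platform_manifest_digest "$1" "$2" "$3" "$4") || return 1
    skopeo inspect --raw "$(pinned_ref "$1" "$d")" | jq -r '.config.digest'
}
```

For example, `platform_config_digest debian buster amd64 linux`. Architectures with several variants (such as arm) can match more than one entry, so the filter may need a `.platform.variant` condition as well.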

@mtrmac
Contributor

mtrmac commented Aug 27, 2019

As much as this bug has been useful to discuss the various digest values and what happens during copies, it’s not clear what Skopeo can or should do about all the consequences of the various design decisions.

If anyone has specific needs not covered by the conversations above, please file separate issues.
