Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.Digest() gives wrong sha ! but Manifest() is correct #625

Closed
medyagh opened this issue Dec 9, 2019 · 11 comments
Closed

.Digest() gives wrong sha ! but Manifest() is correct #625

medyagh opened this issue Dec 9, 2019 · 11 comments

Comments

@medyagh
Copy link
Member

medyagh commented Dec 9, 2019

Digest() gives wrong sha

Example

	ref, _ := name.ParseReference("justforfun", name.WeakValidation)
	img, _ := daemon.Image(ref)
	d, _ := img.Digest()
	fmt.Printf("\nhash: %+v", d.Hex)
	m, _ := img.Manifest()
	fmt.Printf("\nmani: %+v", m.Config.Digest.Hex)

output

hash: 6f8f28c26076ece6a96780fbfba4ef2e93c3938d0ce97550ccaf94b104c520ca
mani: 965ea09ff2ebd2b9eeec88cd822ce156f6674c7e99be082c7efac3c62f3ff652

I expected them to be same !
for the record, the hash I get from Manifest() is the correct one (according to docker)

$ docker inspect --format='{{.Id}}' justforfun
sha256:965ea09ff2ebd2b9eeec88cd822ce156f6674c7e99be082c7efac3c62f3ff652
@medyagh medyagh changed the title .Digerst() gives wrong sha ! but Manifest() is correct .Digest() gives wrong sha ! but Manifest() is correct Dec 9, 2019
@imjasonh
Copy link
Collaborator

imjasonh commented Dec 9, 2019

This is expected. The image's digest is the manifest's digest, which points to the config, which has a different digest.

Container images are weird. 🤷‍♂️

@imjasonh imjasonh closed this as completed Dec 9, 2019
@jonjohnsonjr
Copy link
Collaborator

To elaborate a bit -- what you're looking at with docker inspect is usually referred to as the "Image ID", which is equivalent to the config file's digest (which, as far as I can tell, is an implementation detail of docker, and isn't guaranteed to be true).

By default, docker images doesn't show the manifest digest, just the Image ID. You can call docker images --digests to show the digests as well.

If you call ConfigName on img, it will print the Image ID, which should match what you get from docker.

@medyagh
Copy link
Member Author

medyagh commented Dec 10, 2019

Thanks @jonjohnsonjr @imjasonh for the response.
both podman and crio also give me the same SHA which is why I need that.

I am trying to find a cheap way to get the image SHA

if I do docker inspect it takes 0.7 s

$ time docker inspect --format='{{.Id}}' ubuntu
sha256:775349758637aff77bf85e2ff0597e86e3e859183ef0baba8b3e8fc8d3cba51c

real	0m0.073s
user	0m0.035s
sys	0m0.033s

however if I use the code above it will take 4s, is there any cheaper way to get the image ID using go-containerregistry ?

UPDATE:
I also tried

fmt.Println(crane.Digest("ubuntu"))

but that also still takes 2.1 seconds

@imjasonh
Copy link
Collaborator

imjasonh commented Dec 10, 2019

I believe Jon was saying that this should return the "image ID" of the image (i.e., the digest of the image's config file -- not the "image's digest", which is the digest of the image's manifest file)

img, _ := daemon.Image(ref)
cf, _ := img.ConfigName()
fmt.Println(cf)

Once you've got the image metadata this shouldn't need to do much work, so I wouldn't expect this to take more than a few hundred ms. If you're seeing something else, that's likely a bug.

@medyagh
Copy link
Member Author

medyagh commented Dec 10, 2019

@imjasonh that still takes 1.95ssec

code:

	ref, _ := name.ParseReference("ubuntu", name.WeakValidation)
	img, _ := daemon.Image(ref)
	cf, _ := img.ConfigName()
	fmt.Println(cf)
$ time ./example1 
sha256:775349758637aff77bf85e2ff0597e86e3e859183ef0baba8b3e8fc8d3cba51c

real    0m1.954s

@gravypod
Copy link

gravypod commented Jun 7, 2020

I just wanted to chime in here and mention some issues I was running into from some downstream effects of this decision. I wanted to leave a tombstone for anyone debugging bazelbuild/rules_docker's behavior at a later date. If there is some way to create a digest-like file, that will be a 1:1 mapping with what's going to be pushed to an implementation of a docker registry v2, it would be very valuable for downstream consumers of this library.

What I was attempting to do

I'm attempting to write a macro to help with the deployment/rollout of services that use rules_docker and deploy into k8s. Essentially you can take a bunch of json/yaml with a similar format to this:

apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
      containers:
        - image: //java/machinedb:machinedb
          ...

Where //java/machinedb:machinedb is a java_image container, the image: property is replaced with a pointer to the registry I'm using in that environment. In my case:
image: xxxx/java/machinedb@sha256:a3d1423818311c7f896dfe7f57edab52020f461a2fd303f0b0c9397698a69ae4
This sha comes from the //java/machinedb:machinedb.digest file that rules_docker creates. I'm using this sha as the image tag. Using a build system that guarantees reproducible builds (bazel) the same code will "always" produce the same sha. Using this property we can naively run kubectl apply -f ... and not need to worry about services doing a rollout if no code has changed even if the image tag has changed ("xxxx/pixiecore:{VERSION}" is changed every commit).

How this relates to go-containerregistry

I'm noticing an issue when using images from container_pull rules where the @IMAGE_NAME//image:digest file that is created seems to have no bearing on the image sha that will be set when it is pushed into a registry. This is very strange because none of the existing rules exhibit this behavior (all other <name>.digest or <name>/digest files share an exact match with what the registry will see when the image is pushed). This is because of the behavior in go-containerregistry which seems to create the digest file for the rules.

Behavior I was seeing downstream

  • @IMAGE_NAME//image:digest file contains: sha256:eca9e28be8ba1a4aa56ba45ea85b2e6f7ebe479569f282037cc1857bb7ae6eb8 (go-containerregistry)
  • printed when docker pull ... from registry: sha256:a9ea5a8450cf99bf037f4d3e7f036485055a4d4b75977b1c906ad7e70c0b2391

Reproducing this in bazel/rules_docker

Import a container:

container_pull(
    name = "pixiecore",
    digest = "sha256:eca9e28be8ba1a4aa56ba45ea85b2e6f7ebe479569f282037cc1857bb7ae6eb8",
    registry = "index.docker.io",
    repository = "danderson/pixiecore",
)

Push your container:

container_bundle(
    name = "images",
    images = {
        # VERSION is printed in the my workspace_status_command. 
        "xxxx/pixiecore:{VERSION}": "@pixiecore//image"
    },
)
container_push(
    name = "push_images",
    bundle = ":images",
    format = "Docker",
    insecure_repository = True,
)

The digest file produced by go-containerregistry does not match what's in the registry.

Temporary Hack (Fix)

In bazelbuild/rules_docker you can do the following:

container_image(
    name = "pixiecore",
    base = "@pixiecore//image",
    entrypoint = ["/usr/local/bin/pixiecore"],
)

This creates a derivative container that FROM ...s the container pulled with go-containerregistry and allows you to sidestep this unfortunate less-than-desirable defaults within this project.

@jonjohnsonjr
Copy link
Collaborator

@gravypod you might be hitting a bug in rules_docker (cc @smukherj1 for ideas). We expect Digest on an image to be at least stable, so you can know the digest of what you're pushing. That's how rules_k8s and ko both work, so I'd be surprised if it's an issue with go-containerregistry.

Possibly the rules_docker internal representation for images isn't roundtripping, but I'd be really surprised.

Maybe the "digest" file there isn't actually the manifest digest but something else, and this is just an issue of naming?

@gravypod
Copy link

gravypod commented Jun 8, 2020

@jonjohnsonjr I think you are correct. If this is a bug in rules_docker I'd be happy to reopen the issue there and submit a CL to fix the issue. From what I'm seeing it looks like this is a side effect behavior reported in this issue mixed with exactly a misunderstanding of the name .Digest(). The flow of a container_pull is seemingly:

If you look at that last link you can see the commenter thought that the .Digest() method was producing the manifest's digest:

digest        <-- sha256 digest of the image's manifest

We might be able to update rules_docker to write something like "sha256" + img.Manifest().Config.Digest.Hex but that would likely cause many issues:

  • Some people may be depending on the digest file to contain this information.
  • This could break everyone's WORKSPACEs since this file is what the container_pull uses to verify/identify the downloaded image.

From the perspective of go-containerregistry it might be a good idea to add something to identify the difference between Digest() and what most people with only a high level understanding of the docker API would think of when they think of an Image's Digest. I think Docker calls one the image id and one the repo digest. Do you think it would make sense to add a .RepoDigest() defined right next to the Digest() in the interface definition for v1.Image to make it clearer, and easier, for future users to identify what they want to use in the API this project provides?

@NathanHowell
Copy link

@gravypod adding some more detail here... I did some work on this for a very similar reason. images in kubernetes manifests were are referenced by label, e.g. //path/to/service.image and the build (via a kustomize overlay) rewrites these with the fully qualified path and repo digest.

existing efforts to get the repo digest do this by pushing to the registry and recording the digest. this not deterministic because the format of the repo digest is not standardized. when I checked earlier this year there was a small difference between index.docker.io and gcr.io: pretty printing and whitespace. of course this can break at any point in the future.

https://gist.github.com/NathanHowell/04a1e26cda9946acedee838aec75d28f has the relevant aspects I used to compute the repo digest for gcr.io.

@jonjohnsonjr
Copy link
Collaborator

From the perspective of go-containerregistry it might be .a good idea to add something to identify the difference between Digest() and what most people with only a high level understanding of the docker API would think of when they think of an Image's Digest. I think Docker calls one the image id and one the repo digest. Do you think it would make sense to add a .RepoDigest() defined right next to the Digest() in the interface definition for v1.Image to make it clearer, and easier, for future users to identify what they want to use in the API this project provides?

I think adding additional methods that do the same thing might actually make things more confusing -- the method names are from the perspective of a registry (hence go-containerregistry), which is a bit unfortunate if you're dealing with a v1.Image in a different context.

If you look at that last link you can see the commenter thought that the .Digest() method was producing the manifest's digest:

That is what's happening, so I still don't understand where this bug is.

existing efforts to get the repo digest do this by pushing to the registry and recording the digest. this not deterministic ...

There are a few reasons this might not be deterministic -- decompressing and recompressing the layers with different gzip implementations of compression levels is a common culprit.

... because the format of the repo digest is not standardized. when I checked earlier this year there was a small difference between index.docker.io and gcr.io: pretty printing and whitespace. of course this can break at any point in the future.

There shouldn't ever be whitespace differences between registries. As part of the protocol, clients assert the digest of the contents when uploading them, and registries return the exact bytes of what was uploaded. If there's any difference in whitespace, it was done on the client side. Note that GCR's web UI will show pretty-printed manifests if you click a checkbox, but the underlying data is still whatever clients uploaded.

@NathanHowell
Copy link

@jonjohnsonjr ah ok, so I had a poorly setup test. our build was pushing images with bazel to gcr.io, I pulled one with docker, tagged and pushed to index.docker.io and the only difference was manifest formatting. But the canonical manifest creation happens client side during docker push and (in some cases) server side if pulling by tag and not digest. containers/skopeo#469 (comment) had some useful info.

@gravypod ignore some of my comments :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants