-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
source: imageblob source implementation #4286
base: master
Are you sure you want to change the base?
Conversation
Relates to #4259 @AkihiroSuda Maybe we should call this artifact, although it is already overused term and may lead to even more confusion. Lmk if you have any opinions on the pURL etc. |
Image blob source in LLB allows addressing a single blob from a container image registry. The difference from the image source is that image source needs to point to a manifest that internally points to an array of layer blobs that are all extracted on top of each other to form a root FS. Contrary, image blob points to a single blob that is not extracted but downloaded as a single file into an empty snapshot, similarily how the HTTP source works. The main use case for this source is to pin snapshots of HTTP URLs, upload the downloaded blob into container registry, and then use a source policy to map a HTTP URL (whose content might be changed) to the copy of the source as image blob to ensure immutability. Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
935aaf7
to
278daa0
Compare
@@ -83,6 +83,22 @@ func slsaMaterials(srcs Sources) ([]slsa.ProvenanceMaterial, error) { | |||
out = append(out, material) | |||
} | |||
|
|||
for _, s := range srcs.ImageBlobs { | |||
uri, err := purl.RefToPURL("docker-blob", s.Ref, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surfacing from Slack:
Am I correct in that you’re inventing your own docker-blob PURL scheme?
I wouldn't invent a new PURL ecosystem; they're well-defined in https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst; pkg:docker/ with a specifier (e.g. ?buildkit_type=blob) makes more sense to me.
I don’t know what else to put there. I’m extending the docker scheme because it is the same protocol. But I already have a problem that I want to be able to point to a local OCI layout this way as well that would need another name.
I'd like to suggest some alternatives as I don't think inventing our own scheme outside the PURL spec is a good idea:
# Embedding the original source as a URL + checksum:
pkg:generic/openssl@1.1.10g?download_url=https://openssl.org/source/openssl-1.1.0g.tar.gz&checksum=sha256:de4d501267da&buildkit_blob=docker.io/mynamespace/myrepo@sha256:...
# Extending the existing docker scheme:
pkg:docker/mynamespace/myrepo@sha256%3A244fd47e07d10?ref_type=blob
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tonis and I discussed this further; what likely makes more sense is passing the provenance all the way through (so, use the existing URL and hash even when doing a "replay" from "registry mirrored inputs"; and have some BuildKit-specific sourcepolicy-map file that records what verifiable mirrored input was used for the specific build).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for the PURL, we should be good to just reference the blob as in the second example:
pkg:docker/mynamespace/myrepo@sha256%3A244fd47e07d10?ref_type=blob
I don't even think we would need to add a ref_type
, I can't see a good reason for why we'd need that. It doesn't seem strictly necessary to distinguish between the docker-blob
and docker
types, though maybe there's something I'm missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's mostly because "try a digest as a manifest and fall back to trying it as a blob" isn't an existing behavior and doesn't really make sense in the general case; there are separate APIs to get this content (for better and for worse) and having a hint makes it more clear what is actually expected/prevents fragmentation among tools that return a 404 because they didn't fall back to blob.
This is all mostly hypothetical as this is intended to be an internal implementation detail of "replayable" builds, but nonetheless PURLs are meant to be interoperable and commonly understood...
If we use the latter type, we should probably PR it back to the PURL spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docker
------
``docker`` for Docker images:
- The default repository is ``https://hub.docker.com``.
- The ``namespace`` is the registry/user/organization if present.
- The version should be the image id sha256 or a tag. Since tags can be moved,
a sha256 image id is preferred.
- Examples::
pkg:docker/cassandra@latest
pkg:docker/smartentry/debian@dc437cc87d10
pkg:docker/customer/dockerimage@sha256%3A244fd47e07d10?repository_url=gcr.io
oci
------------
``oci`` for all artifacts stored in registries that conform to the
`OCI Distribution Specification <https://github.com/opencontainers/distribution-spec>`_,
including container images built by Docker and others:
- There is no canonical package repository for OCI artifacts. Therefore
``oci`` purls must be registry agnostic by default. To specify the repository,
provide a ``repository_url`` value.
- OCI purls do not contain a ``namespace``, although, ``repository_url`` may
contain a namespace as part of the physical location of the package.
- The ``name`` is not case sensitive and must be lowercased. The name is the
last fragment of the repository name. For example if the repository
name is ``library/debian`` then the ``name`` is ``debian``.
- The ``version`` is the ``sha256:hex_encoded_lowercase_digest`` of the
artifact and is required to uniquely identify the artifact.
- Optional qualifiers may include:
- ``arch``: key for a package architecture, when relevant.
- ``repository_url``: A repository URL where the artifact may be found, but not
intended as the only location. This value is encouraged to identify a
location the content may be fetched.
- ``tag``: artifact tag that may have been associated with the digest at the time.
- Examples::
pkg:oci/debian@sha256%3A244fd47e07d10?repository_url=docker.io/library/debian&arch=amd64&tag=latest
pkg:oci/debian@sha256%3A244fd47e07d10?repository_url=ghcr.io/debian&tag=bullseye
pkg:oci/static@sha256%3A244fd47e07d10?repository_url=gcr.io/distroless/static&tag=latest
pkg:oci/hello-wasm@sha256%3A244fd47e07d10?tag=v1
Since a blob is not an "image", using oci
type may make more sense.
?ref_type=blob
doesn't seem needed.
HTTPSScheme = "https" | ||
OCIScheme = "oci-layout" | ||
DockerImageScheme = "docker-image" | ||
DockerImageBlobScheme = "docker-image-blob" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surfacing from Slack:
I think that docker-image-blob:// feels too generic; it seems like something more specific like buildkit-pined-artifact:// or something might be more straightforward, even if this is an implementation detail/not user facing, as you said.
This doesn’t say anything about the transport/storage. All the current source names define that.
But I already have a problem that I want to be able to point to a local OCI layout this way as well that would need another name.
Based on the above, what about docker-image+blob
and oci-layout+blob
for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oci-layout+blob
I'd definitely be interested in having this as well - making sure we can load them from disk makes this super useful for me. I don't think we should have both oci-layout
and docker-image
functionality merged into a new docker-image-blob
, so the +
sounds like a neat solution to me 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel like it's worth bikeshedding this too much: either docker-image+blob
/oci-layout+blob
or docker-image-blob
/oci-layout-blob
would work for me.
The +
implies to me shared code / similar functionality to the base, which isn't quite true though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is though, from the perspective of a user: both come from a registry/an OCI bundle.
type ImageBlobIdentifier struct { | ||
Reference reference.Spec | ||
RecordType client.UsageRecordType | ||
Filename string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity - might we want the ability to unpack a directory from an image blob?
For example, this works great for replacing sources that result in a single file, like HTTP, but what about for Git or Local? I guess we care less about Git, but there are some interesting use cases for capturing the Local directory, since it may not be reproducible between users (depending on the build process).
I suppose you would just be able to do this by having a single layered image with an empty config (though this would require an extra level of indirection).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe better would be for the policy to provide multi-vertex LLB instead of single op in this case. In that case, you could pair it with a FileOp
that does the extraction.
I don't really care about local much because there is no checksum that tracks locals anyway. Git is interesting but it would really be only valuable if we can validate the original checksum based on the commit sha. If we have Git source with sha1:abc
and then some targz blob that we claim is equivalent with targz checksum sha256:def
there is no way to validate it either visually or by buildkit. So it looks like if we want to store snapshot of git source outside of the real git repository we need some special way to import it that still validates the original commit sha (and makes sure there are no extra files etc).
Image blob source in LLB allows addressing a single blob from a container image registry. The difference from the image source is that image source needs to point to a manifest that internally points to an array of layer blobs that are all extracted on top of each other to form a root FS. Contrary, image blob points to a single blob that is not extracted but downloaded as a single file into an empty snapshot, similarily how the HTTP source works.
The main use case for this source is to pin snapshots of HTTP URLs, upload the downloaded blob into container registry, and then use a source policy to map a HTTP URL (whose content might be changed) to the copy of the source as image blob to ensure immutability.