BuildKit + SBOM integration #2773
As an example of how to push SBOMs into registries, ORAS Artifacts does just this:

```sh
oras push $REGISTRY/$REPO \
  --artifact-type 'sbom/example' \
  --subject $IMAGE \
  ./sbom.json:application/json
```

This is also available in Azure today (preview): https://aka.ms/acr/oras-artifacts
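For the discovery half, the ORAS Artifacts preview CLI also provides `oras discover`, which lists the artifacts that reference a given image; a usage sketch, using the same placeholder variables as above:

```sh
# List SBOMs and other artifacts attached to the subject image as a tree
oras discover -o tree --artifact-type 'sbom/example' $IMAGE
```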
This is the tricky part today. The OCI Reference Types Working Group is making quite a bit of progress on the new registry specs here though!
This was a weekend project a while back, but you can push signed JSON as part of the container build process using GitHub Container Registry today: https://github.com/transmute-industries/public-credential-registry-template/blob/main/docs/public-container-registry.md Here we include a JWT built from the previous steps in the container push call.
Hi @caarlos0, we should keep an eye on this regarding goreleaser. So in theory, goreleaser would need to provide the SBOM in the expected directory.
I want to be sure all of the source code dependencies are included. Whether that means running the generator before the compile, or using languages that include that metadata in the binaries for later retrieval, I'm flexible. But if the SBOM only includes Linux packages and misses that log4j is included in the jar, this won't be as useful to me.
That's the big question right now. STAG's Secure Supply Chain WG was discussing that yesterday, and I think it will be a focus of our next WG goals. OCI has their reference types WG trying to come up with a good spec for how to associate artifacts with images, and there's currently an OCI artifact definition that describes how to package data using the image manifest media type and a custom config media type. For portability, these are the two biggest areas I know we're working on to make this possible (I name them here only to give us a checklist of work that needs to be done, not intending any public shame):
I believe everything we're trying to create in these working groups will support what Docker and many others are trying to do. Let me know if we're missing a use case. Happy to chat more on these, either here or in one of the many Slacks.
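For reference, the OCI artifact definition mentioned above reuses the standard image manifest, with a custom config media type marking the artifact kind; a minimal sketch, where the media types, digests, and sizes are illustrative:

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.example.sbom.config.v1+json",
    "digest": "sha256:<config-digest>",
    "size": 2
  },
  "layers": [
    {
      "mediaType": "application/spdx+json",
      "digest": "sha256:<sbom-digest>",
      "size": 7023
    }
  ]
}
```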
Is there additional auth required to run this command on an image from a (private) registry?
@AuraSinis there is a known bug if you're using plain text credentials in your Docker config file. We'll ship the fix as part of the next Desktop release and when we ship the command in the Linux packages.
Is permission to pull an image from a registry enough to be able to generate its SBOM?
@AuraSinis, yes! Generating the SBOM involves pulling the image and then locally analyzing its contents. Once you have the image locally, no further network calls are needed.
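In practice the flow looks like this (the image name is just an example):

```sh
# Pulling requires only read access to the registry...
docker pull registry.example.com/team/app:1.0

# ...the analysis itself then runs entirely locally, with no further network access
docker sbom registry.example.com/team/app:1.0
```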
Is it possible to expose the "live" container filesystem via LLB? I would like to see something like `tern report --live $mnt -f spdxjson -o debian-sbom` to generate an SBOM for the base container image, and then `tern report --live $mnt -f spdxjson -ctx debian-sbom -o python-sbom` to generate an SBOM for the derived container image. This would allow us to generate SBOMs for multistage builds.
@sudo-bmitch From the proposals you listed, D seems to be the only one with some backward compatibility. I think that part is most important for us. But as you said, it probably still doesn't work with the current Docker Hub because of the invalid config mediatype. Just linking to the SBOM blob from the manifest list would in practice work almost everywhere, but I guess you don't like it based on the comments on some previous structures that have attempted that? I don't have any ETA on when Hub could support these config mediatypes, though. Given either of these two options, I think we are ready to start implementing it similarly to the D proposal.

I didn't see many comments about tracing points, other than that only tracing the build result is not enough in many cases. We could include the build context. @wagoodman Does something like Syft take multiple inputs in this case, or would we generate SBOMs for both and then try to merge them? Other than the build result and build context, I don't see content that could be scanned automatically. Possibly the best option is for the user to provide something in the Dockerfile if they expect an intermediate stage/step to be scanned as well (eg. with a code comment).

Regarding composition and invocation points: if we assume the constraint that we do not want to add any specific scanning logic to the Dockerfile frontend itself, I think it is best if the SBOM integration point is implemented as its own frontend. This is much more flexible than the Dockerfile frontend just running a container with a scanner process. Now we would need to figure out the protocol between the "builder frontend" (eg. Dockerfile) and the "sbom generator" frontend. Frontends can already call each other, so it is more a matter of making them understand that others exist. Options:

a) Dockerfile frontend does its build. Before the result is sent to the exporter, it is instead sent to the "next frontend". Assuming there are multiple scanning points, all of them would need to be sent separately via some array definition.

b) Dockerfile frontend calls a "sbom frontend" for each scanning point internally, then merges them all together (or forwards them all and the exporter merges them together). This seems most practical, but every custom frontend would also need to implement these internal calls.

c) We call the "sbom frontend" instead, and it calls the Dockerfile frontend. The problem with this one is that it can't really scan more than the build result and context, because the "sbom frontend" does not know how to parse the Dockerfile for extra scanning points.

Regarding the protocol for chaining frontends: are there more use cases than SBOM that could benefit from it? If there are, what would be the UX for how the user controls it? With only SBOM, I would imagine the UX could be something quite simple.
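Something as simple as, say, a single opt-in flag on the build command; a purely illustrative sketch, not a committed design:

```sh
# Hypothetical single-flag UX for requesting an SBOM attestation at build time
docker buildx build --sbom=true -t myimage:latest .
```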
@tonistiigi the working group appears to be consolidating behind proposal E, which also has a backwards-compatible component. There's still some work to be done before we're ready to propose changes to the various OCI specs. If you're building an example implementation of a proposal, we'd love to work with you to be sure we're proposing something that's usable by the community.
@sudo-bmitch Our goal is definitely to build something aligned with the broader community. What's the best way to catch up on how you got to Proposal E and to feed into the process? On a quick read of Proposal E, and without all the context of previous discussions, my initial take is that I do not like the idea of manifests referencing manifests. This breaks a core assumption of registry data models: manifests are linked together by manifest lists/indexes. The proposed reference manifest field adds another layer of dependencies that will need to be navigated. It doesn't only have GC implications but could impact other things too. I see the need to call out links between manifests, but I think that Proposal D succeeds in doing this.
Hey @chris-crone, the premise of reference types is that you push the SBOM with a pointer back to the specific ubuntu image (linked by its digest). Here's a video that might help: Artifact Reference Types. Proposal E has a means to support existing registries as well as registries that wish to natively support these reverse links.
@chris-crone I was looking at whether we could do all of this with an index or manifest list, pushing artifacts directly into that and shipping them alongside images. I think Proposal C was also going down that path. The challenges I ran into included race conditions, where two tools try to extend the same manifest and update the same index with the other change missing, and digests changing on the index with every update to an attached artifact. It felt like it was breaking too many existing workflows to try solving it at that level.

So we've mostly opted for solutions that leave the original manifest untouched and give the user a way to discover things that point to that manifest. Proposal D does that entirely client side by leveraging a tag syntax, and Proposal E extends that to allow some of the work to occur on registries that choose to support a new API. Both of these give a way to have a manifest reference another manifest; it's then a question of whether we want to allow registries to offload the query process so the client isn't pulling a long list of matching tags to discover the specific artifact they are looking for.

Thinking of why we want to offload that query to the registry: I've imagined a scenario where a vulnerability scanner updates a signature on an image, perhaps daily, to indicate it passed the security check and is allowed to run in production. After a month of scan updates, a client may have to pull 30 manifests to find the current one if all it has is the digest tags that require client-side processing. And for reasons, I tend to avoid solutions that result in a lot of round trips to pull manifests if there's a better way.

There are certainly questions we asked about GC, and because of that, I'm not proposing a reference type from an index to a manifest. The thing I've been avoiding is any object that creates a reference type pointing to one of its other child objects, since that reference type is a reverse pointer for most GC designs, and we don't want to create loops with GC algorithms.

One key part of proposal E is that if a registry doesn't want to upgrade, we aren't forcing that; it's the same as proposal D for them. But we do need the client to support the API if the producer half of the client could potentially be updated in the future. Otherwise the producer would create a reference type without a tag and the consumer would never see it.
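To illustrate the client-side half, a rough sketch of Proposal D-style digest-tag discovery; the registry hostname, repository, and exact tag syntax here are illustrative, not the final spec:

```sh
# Resolve the subject image's digest from the registry (anonymous read assumed)
DIGEST=$(curl -sI \
  -H "Accept: application/vnd.oci.image.manifest.v1+json" \
  "https://registry.example.com/v2/myrepo/manifests/latest" \
  | tr -d '\r' | awk -F': ' 'tolower($1)=="docker-content-digest" {print $2}')

# Client-side discovery: list all tags, keep those derived from that digest
curl -s "https://registry.example.com/v2/myrepo/tags/list" \
  | jq -r '.tags[]' \
  | grep "^sha256-${DIGEST#sha256:}"
```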
For getting involved with the process, issues and PRs are welcome on the working group repo, and details on our weekly meetings are over there too.
Right now Syft does not take multiple inputs or have a built-in merge capability yet, but we've been eyeing adding support for both of these features depending on the use case / guiding behavior. Here are those issues:
@tonistiigi TL;DR: to answer the question squarely, I think we'd need to understand: a) if the items in the merged SBOM will ever need to be separated out in downstream processes, and b) if the merged SBOM should capture descriptions of "what was analyzed" (the "source" block) from both of the original SBOMs or not (we pick one and drop the other). I think that separability is important for the use cases in this issue thread, but I would like to speculate on that further here in conversation.

Let's talk about the merge option first: the largest problem with the merge approach is losing information about what is being described. With today's SBOM formats you would: a) lose the ability to separate out which elements describe which source (unless you use relationships to do this, but it would be very verbose), and b) lose information about one of the sources with the current SBOM ontologies (unless both sources' information matches exactly, which would be odd, since you would want to distinguish the different perspectives somehow, I would think). I think to move forward with a merging approach we'd need to either build something that reconciles these problems or convince ourselves that they are not problems in this particular context.

Onto the multiple-input approach: this is just another way of looking at "merge", so it has all of the same pros/cons as the merged approach. That is, since the final document would need to describe multiple sources, and you want the information to be separable in some way in the future, you'll need a way to denote which things relate to which sources (again, the "separable-ness" may not be a requirement here, but I didn't want to assume that upfront), which is the same as merging two SBOMs in the first topic.

What are the ways forward with this? Assuming that we need to organize data in a way where it can be separated back out by source again in the future (so we can tell what aspects of the SBOM are from a Docker context vs what is in the final image itself):
Assuming that we don't need separability, the path forward is easier, needing only to solve a few things:
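As a sketch of the "use relationships" route mentioned above, a merged SPDX-style document could keep elements separable by source, at the cost of verbosity; all identifiers below are illustrative:

```json
{
  "spdxVersion": "SPDX-2.2",
  "documentDescribes": ["SPDXRef-final-image", "SPDXRef-build-context"],
  "relationships": [
    {
      "spdxElementId": "SPDXRef-final-image",
      "relationshipType": "CONTAINS",
      "relatedSpdxElement": "SPDXRef-Package-openssl"
    },
    {
      "spdxElementId": "SPDXRef-build-context",
      "relationshipType": "CONTAINS",
      "relatedSpdxElement": "SPDXRef-Package-log4j"
    }
  ]
}
```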
I'd imagine structure-wise the best would be that the different types of elements are merged together, and if multiple scanning points detect the same type (ID?) of elements, they are grouped into an array. That would, though, mean that we can only add scanning points to known locations, or the user needs to create a scanning point and assign a name to it (potentially we could use the Dockerfile stage name). Eg. for a multi-stage Rust project:
From the Dockerfile we could trigger the runtime scan and the build context scan automatically. For the "Alpine packages / build stage" entry, the user would need to add a scanning point to the Dockerfile stage called "build".
I'll move the conversation to that repo, thanks for the pointers @sudo-bmitch @SteveLasker
@tonistiigi could there be an option to allow users to generate their own SBOMs?
Then:
could extract the file from that stage and attach it as an SBOM. It's not as pretty as a prepackaged solution with a frontend, so it's still worth considering how to do that. But this gives flexibility to the power users that need something different from the prepackaged solution, while still using buildx to create the image and attach the SBOM.
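A minimal sketch of what such a Dockerfile could look like; the stage name, scanner image, and output path are all hypothetical:

```dockerfile
FROM compiler:latest as build
COPY . /src
RUN do install stuff

# Hypothetical stage that produces the user's own SBOM at a known path
FROM build as sbom
RUN --mount=type=bind,from=vendor/sbom:latest,target=/sbom \
    /sbom/scan /src > /sbom.json
```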
I think the option to let users generate their own SBOMs how they want is probably quite important. We're also looking at how in-toto attestations might be generated and collected in-build, so it might be worth having a similar generic interface for both of them.

I had an idea that instead of collecting SBOMs into a file in a stage, we could maybe collect them into a special mount type; that could avoid needing to specify a stage for the file, since we could just build all the stages that reference that mount type. Adapting the example from above:

```dockerfile
FROM compiler:latest as build
COPY . /src
RUN do install stuff

FROM build as sbom
RUN --mount=type=bind,from=vendor/sbom:latest,target=/sbom \
    --mount=type=sbom,name=myscan,target=/scanresults \
    /sbom/scan /src >/scanresults/sbom.json

FROM runtime:latest as release
COPY --from=build /src/bin/ /usr/local/bin/
CMD [ "/usr/local/bin/app" ]
```

Then the command line could be something like:

I think this might be a possibly better approach when custom frontends get involved, since the custom frontend doesn't then need to use the same stages abstraction, but can instead just use the common mounts, which are the same for all frontends. Aside from that, I don't think there are that many advantages to this approach; I just think it's worth presenting as a possible alternative.
Thinking through @jedevc's example, I really like the idea of a different kind of mount. I'd just make it slightly more generic, which would give buildx the ability to use those mounts for other things in addition to SBOMs:

```dockerfile
FROM compiler:latest as build
COPY . /src
RUN --mount=type=bind,from=vendor/intoto:latest,target=/intoto \
    --mount=type=output,name=intoto,target=/attest \
    /intoto attest --out /attest/result.json -- do install stuff
RUN --mount=type=bind,from=vendor/sbom:latest,target=/sbom \
    --mount=type=output,name=sbom,target=/scanresults \
    /sbom/scan /src >/scanresults/sbom.json

FROM runtime:latest as release
COPY --from=build /src/bin/ /usr/local/bin/
CMD [ "/usr/local/bin/app" ]
```

And then we could have something like:

That could also make it possible to have multiple outputs, like creating an image and generating binaries.
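A rough sketch of what the accompanying command line could look like; the `mount=` addressing in the `--output` flags is entirely hypothetical:

```sh
# Hypothetical: export the image plus the contents of each named output mount
docker buildx build \
  --output type=image,name=registry.example.com/app:latest,push=true \
  --output type=local,mount=sbom,dest=./sbom \
  --output type=local,mount=intoto,dest=./attestations \
  .
```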
Thanks @sudo-bmitch! I'd +1 the idea of an external mount to collect at-build data.
Thanks so much for working on this integration! As a start, I'd be really interested in seeing the full SBOM output per stage of a multi-stage Dockerfile.
Putting together some more concrete implementation steps for review:

### Step 1. Generic (unsigned) attestation support

Add the ability for the BuildKit client or frontend to add custom attestations to the build result in the gateway API. This will allow shipping additional objects, like an SBOM, with the image. Currently, the build result is defined as:
This should be extended with an attestations map. The key in the map identifies which build result the attestations apply to, and the value is an array of attestation definitions.
BuildKit will use this to attach the attestations to the build result. This manifest of attestations is linked with the exported image index using the Proposal F design for adding signatures and attestations to images. This proposal fits best with our requirements; should the WG make modifications or end up with a different design that also fits our requirements, we will move things around as needed. This part will initially be experimental until we are comfortable with the format.
Attestations are defined for a single-arch image manifest descriptor. When building a multi-arch image, each submanifest gets its own attestations array. As an example, the way a BuildKit client or a frontend could use this feature to export an image with an SBOM attestation is to run a container with an SBOM generator and attach its output.

### Step 2. Simple opt-in SBOM attestation

With generic attestation support, users can run a process as part of their build that generates an SBOM and then set it as an attestation on the build result. While this is powerful, we want the default experience of adding an SBOM to any user build to be much simpler.
The buildx level might also consider a special shorthand for this. The generator image provides the SBOM data by scanning the specified content. We don't want to maintain these scanners, as there are already plenty of options, but we should define a good default (the default generator image could also be defined at the buildx level if that is too opinionated for BuildKit).

When built with such exporter options, it is an indication to the BuildKit frontend that the user wishes an SBOM attestation to be added. A frontend can use its own logic to determine which build points should be used as input to the SBOM generator image. SBOMs can be generated at multiple build points: we don't just want to trace the final image, but also the build context and possibly some of the intermediate build stages. If multiple SBOMs are generated, each adds one attestation to the attestations array. Currently, there is no standard for a "merged SBOM", so it is up to the tooling that accepts SBOMs as input to determine how it wants to present multiple SBOMs to the user.

If the frontend has generated an SBOM attestation, it leaves a mark of that in the build result. If the frontend completes its build but did not generate an SBOM (eg. it was old, or a custom frontend), BuildKit will use its default flow to generate the SBOM itself by running the generator on the final build image and the build context. When the user has requested an SBOM attestation for the build, one is guaranteed to be created.

The SBOM generated with the simple opt-in will be in SPDX format using the "https://spdx.dev/Document" predicate (https://github.com/in-toto/in-toto-golang/blob/master/in_toto/model.go#L81), unless we find another format is superior. Custom generators could use whatever format they like if they are not worried about tools that read their SBOM not understanding custom formats.

### Step 3. Add Dockerfile syntax for custom attestation and SBOM generation steps

Complex multi-stage builds should allow marking specific points in the build where the files needed for generating the SBOM are located. This way, dependencies that cannot be determined from the final image or build context can still be captured in the SBOM. There should also be a way for the user to generate an SBOM as part of their own Dockerfile commands and mark it as a file that should be used as the contents of the SBOM. To implement this, the Dockerfile frontend will look for the
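To make the predicate format in Step 2 concrete, here is a minimal sketch of an in-toto attestation statement carrying an SPDX document; the subject name and digest are placeholders:

```json
{
  "_type": "https://in-toto.io/Statement/v0.1",
  "subject": [
    {
      "name": "docker.io/library/myimage",
      "digest": { "sha256": "<image-manifest-digest>" }
    }
  ],
  "predicateType": "https://spdx.dev/Document",
  "predicate": {
    "spdxVersion": "SPDX-2.2",
    "name": "sbom-for-myimage"
  }
}
```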
Have started making some progress towards step 1, and had a couple notes/questions.
Can we have a state-of-the-art summary on this topic? Edit: to avoid spamming too many people, I posted my thank-you for this link here!
See https://www.docker.com/blog/generate-sboms-with-buildkit/ 🎉 It's released and ready to use.
Docker has created an experimental `docker sbom` command (https://github.com/docker/sbom-cli-plugin) that is shipping in the Docker Desktop 4.7.0 release and provides visibility into the components that make up an image.

The current command works on existing images, pulling them down and analyzing the contents of their layers. In addition to scanning existing images, it could be useful to see if, in some cases, it would be better to capture the SBOM data at build time instead. Theoretically, this gives us access to more data points, because many dependencies that are used during a build don't end up in the final image, or their version information has been lost. We can also combine it with the information from the BuildKit build itself: https://github.com/moby/buildkit/blob/master/docs/build-repro.md
As a sample, we've created a simple POC frontend that you can use with an existing BuildKit/Buildx installation. To test it, you can put `# syntax=crazymax/dockerfile:sbom` as the first line of your Dockerfile, or set the `BUILDKIT_SYNTAX` build argument. For your existing build you can run:

That will run your build, generate an SBOM for it, and export it into the local "sbom" directory:
The current POC and the Docker CLI plugin use Syft as a backend. We don't plan to hardcode any application logic into BuildKit for detecting a specific language dependency etc., but to keep this part pluggable (either by the user or the frontend author) in containers, with good defaults.
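As a point of reference, Syft can also be invoked directly against an image; a minimal sketch, where the image name is just an example:

```sh
# Scan an image from the local Docker daemon and emit an SPDX JSON SBOM
syft packages docker:myimage:latest -o spdx-json > sbom.spdx.json
```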
There are many open questions that need to be figured out and for which we would like to get more feedback. For example:
Your comments, suggestions, feedback, use cases and help are welcome! We want to make images built with Docker better and we want to make the experience of developing features to do that great. We’re happy to collaborate here, elsewhere on GitHub and on our community Slack channel (#buildkit).