BuildKit + SBOM integration #2773
As an example of how to push SBOMs into registries, ORAS Artifacts does just this:

```sh
oras push $REGISTRY/$REPO \
  --artifact-type 'sbom/example' \
  --subject $IMAGE \
  ./sbom.json:application/json
```

This is also available in Azure today (preview): https://aka.ms/acr/oras-artifacts
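For the discovery half, the ORAS Artifacts preview CLI also provides `oras discover`, which lists the artifacts that reference a given image; a usage sketch, using the same placeholder variables as above:

```sh
# List SBOMs and other artifacts attached to the subject image as a tree
oras discover -o tree --artifact-type 'sbom/example' $IMAGE
```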
This is the tricky part today. The OCI Reference Types Working Group is making quite a bit of progress on the new registry specs here though!
This was a weekend project a while back, but you can push signed JSON as part of the container build process using GitHub Container Registry today: https://github.com/transmute-industries/public-credential-registry-template/blob/main/docs/public-container-registry.md Here we include a JWT built from the previous steps in the container push call.
Hi @caarlos0, we should keep an eye on this regarding goreleaser. So in theory, goreleaser would need to provide the SBOM in the expected directory.
I want to be sure all of the source code dependencies are included. Whether that means running the generator before the compile, or using languages that include that metadata in the binaries for later retrieval, I'm flexible. But if the SBOM only includes Linux packages and misses that log4j is included in the jar, this won't be as useful to me.
That's the big question right now. STAG's Secure Supply Chain WG was discussing that yesterday, and I think it will be a focus of our next WG goals. OCI has their reference types WG trying to come up with a good spec for how to associate artifacts with images, and there's currently an OCI artifact definition that describes how to package data using the image manifest media type and a custom config media type. For portability, these are the two biggest areas I know we're working on to make this possible (I name them here only to give us a checklist of work that needs to be done, not intending any public shame):
I believe everything we're trying to create in these working groups will support what Docker and many others are trying to do. Let me know if we're missing a use case. Happy to chat more on these, either here or in one of the many Slacks.
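For reference, the OCI artifact definition mentioned above reuses the standard image manifest, with a custom config media type marking the artifact kind; a minimal sketch, where the media types, digests, and sizes are illustrative:

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.example.sbom.config.v1+json",
    "digest": "sha256:<config-digest>",
    "size": 2
  },
  "layers": [
    {
      "mediaType": "application/spdx+json",
      "digest": "sha256:<sbom-digest>",
      "size": 7023
    }
  ]
}
```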
Is there additional auth required to run this command on an image from a (private) registry?
@AuraSinis there is a known bug if you're using plain text credentials in your Docker config file. We'll ship the fix as part of the next Desktop release and when we ship the command in the Linux packages.
Is permission to pull an image from a registry enough to be able to generate its SBOM?
@AuraSinis, yes! Generating the SBOM involves pulling the image and then locally analyzing its contents. Once you have the image locally, no further network calls are needed.
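In practice the flow looks like this (the image name is just an example):

```sh
# Pulling requires only read access to the registry...
docker pull registry.example.com/team/app:1.0

# ...the analysis itself then runs entirely locally, with no further network access
docker sbom registry.example.com/team/app:1.0
```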
Is it possible to expose the "live" container filesystem via LLB? I would like to see something like `tern report --live $mnt -f spdxjson -o debian-sbom` to generate an SBOM for the base container image, and then `tern report --live $mnt -f spdxjson -ctx debian-sbom -o python-sbom` to generate an SBOM for the derived container image. This would allow us to generate SBOMs for multistage builds.
@sudo-bmitch From the proposals you listed, D seems to be the only one with some backward compatibility. I think that part is most important for us. But as you said, it probably still doesn't work with the current Docker Hub because of the invalid config mediatype. Just linking to the SBOM blob from the manifest list would in practice work almost everywhere, but I guess you don't like it based on the comments on some previous structures that have attempted that? I don't have any ETA on when Hub could support these config mediatypes, though. Given either of these two options, I think we are ready to start implementing it similarly to the D proposal.

I didn't see many comments about tracing points, other than that only tracing the build result is not enough in many cases. We could include the build context. @wagoodman Does something like Syft take multiple inputs in this case, or would we generate SBOMs for both and then try to merge them? Other than the build result and build context, I don't see content that could be scanned automatically. Possibly the best option is for the user to provide something in the Dockerfile if they expect an intermediate stage/step to be scanned as well (eg. with a code comment).

Regarding composition and invocation points: if we assume the constraint that we do not want to add any specific scanning logic to the Dockerfile frontend itself, I think it is best if the SBOM integration point is implemented as its own frontend. This is much more flexible than the Dockerfile frontend just running a container with a scanner process. Now we would need to figure out the protocol between the "builder frontend" (eg. Dockerfile) and the "sbom generator" frontend. Frontends can already call each other, so it is more a matter of making them understand that others exist. Options:

a) Dockerfile frontend does its build. Before the result is sent to the exporter, it is instead sent to the "next frontend". Assuming there are multiple scanning points, all of them would need to be sent separately via some array definition.

b) Dockerfile frontend calls a "sbom frontend" for each scanning point internally, then merges them all together (or forwards them all and the exporter merges them together). This seems most practical, but every custom frontend would also need to implement these internal calls.

c) We call the "sbom frontend" instead, and it calls the Dockerfile frontend. The problem with this one is that it can't really scan more than the build result and context, because the "sbom frontend" does not know how to parse the Dockerfile for extra scanning points.

Regarding the protocol for chaining frontends: are there more use cases than SBOM that could benefit from it? If there are, what would be the UX for how the user controls it? With only SBOM, I would imagine the UX could be something quite simple.
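Something as simple as, say, a single opt-in flag on the build command; a purely illustrative sketch, not a committed design:

```sh
# Hypothetical single-flag UX for requesting an SBOM attestation at build time
docker buildx build --sbom=true -t myimage:latest .
```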
@tonistiigi the working group appears to be consolidating behind proposal E, which also has a backwards-compatible component. There's still some work to be done before we're ready to propose changes to the various OCI specs. If you're building an example implementation of a proposal, we'd love to work with you to be sure we're proposing something that's usable by the community.
@sudo-bmitch Our goal is definitely to build something aligned with the broader community. What's the best way to catch up on how you got to Proposal E and to feed into the process? On a quick read of Proposal E, and without all the context of previous discussions, my initial take is that I do not like the idea of manifests referencing manifests. This breaks a core assumption of registry data models: manifests are linked together by manifest lists/indexes. The proposed reference manifest field adds another layer of dependencies that will need to be navigated. It doesn't only have GC implications but could impact other things too. I see the need to call out links between manifests, but I think that Proposal D succeeds in doing this.
Hey @chris-crone, the premise of reference types is that you push the SBOM with a pointer back to the specific ubuntu image (linked by its digest). Here's a video that might help: Artifact Reference Types. Proposal E has a means to support existing registries as well as registries that wish to natively support these reverse links.
@chris-crone I was looking at whether we could do all of this with an index or manifest list, pushing artifacts directly into that and shipping them alongside images. I think Proposal C was also going down that path. The challenges I ran into included race conditions, where two tools try to extend the same manifest and update the same index with the other change missing, and digests changing on the index with every update to an attached artifact. It felt like it was breaking too many existing workflows to try solving it at that level.

So we've mostly opted for solutions that leave the original manifest untouched and give the user a way to discover things that point to that manifest. Proposal D does that entirely client side by leveraging a tag syntax, and Proposal E extends that to allow some of the work to occur on registries that choose to support a new API. Both of these give a way to have a manifest reference another manifest; it's then a question of whether we want to allow registries to offload the query process so the client isn't pulling a long list of matching tags to discover the specific artifact they are looking for.

Thinking of why we want to offload that query to the registry: I've imagined a scenario where a vulnerability scanner updates a signature on an image, perhaps daily, to indicate it passed the security check and is allowed to run in production. After a month of scan updates, a client may have to pull 30 manifests to find the current one if all it has is the digest tags that require client-side processing. And for reasons, I tend to avoid solutions that result in a lot of round trips to pull manifests if there's a better way.

There are certainly questions we asked about GC, and because of that, I'm not proposing a reference type from an index to a manifest. The thing I've been avoiding is any object that creates a reference type pointing to one of its other child objects, since that reference type is a reverse pointer for most GC designs, and we don't want to create loops with GC algorithms.

One key part of proposal E is that if a registry doesn't want to upgrade, we aren't forcing that; it's the same as proposal D for them. But we do need the client to support the API if the producer half of the client could potentially be updated in the future. Otherwise the producer would create a reference type without a tag and the consumer would never see it.
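To illustrate the client-side half, a rough sketch of Proposal D-style digest-tag discovery; the registry hostname, repository, and exact tag syntax here are illustrative, not the final spec:

```sh
# Resolve the subject image's digest from the registry (anonymous read assumed)
DIGEST=$(curl -sI \
  -H "Accept: application/vnd.oci.image.manifest.v1+json" \
  "https://registry.example.com/v2/myrepo/manifests/latest" \
  | tr -d '\r' | awk -F': ' 'tolower($1)=="docker-content-digest" {print $2}')

# Client-side discovery: list all tags, keep those derived from that digest
curl -s "https://registry.example.com/v2/myrepo/tags/list" \
  | jq -r '.tags[]' \
  | grep "^sha256-${DIGEST#sha256:}"
```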
For getting involved with the process, issues and PRs are welcome on the working group repo, and details on our weekly meetings are over there too.
Right now Syft does not take multiple inputs or have a built-in merge capability yet, but we've been eyeing adding support for both of these features depending on the use case / guiding behavior. Here are those issues:
@tonistiigi TL;DR: to answer the question squarely, I think we'd need to understand: a) if the items in the merged SBOM will ever need to be separated out in downstream processes, and b) if the merged SBOM should capture descriptions of "what was analyzed" (the "source" block) from both of the original SBOMs or not (we pick one and drop the other). I think that separability is important for the use cases in this issue thread, but I would like to speculate on that further here in conversation.

Let's talk about the merge option first: the largest problem with the merge approach is losing information about what is being described. With today's SBOM formats you would: a) lose the ability to separate out which elements describe which source (unless you use relationships to do this, but it would be very verbose), and b) lose information about one of the sources with the current SBOM ontologies (unless both sources' information matches exactly, which would be odd, since you would want to distinguish the different perspectives somehow, I would think). I think to move forward with a merging approach we'd need to either build something that reconciles these problems or convince ourselves that they are not problems in this particular context.

Onto the multiple-input approach: this is just another way of looking at "merge", so it has all of the same pros/cons as the merged approach. That is, since the final document would need to describe multiple sources, and you want the information to be separable in some way in the future, you'll need a way to denote which things relate to which sources (again, the "separable-ness" may not be a requirement here, but I didn't want to assume that upfront), which is the same as merging two SBOMs in the first topic.

What are the ways forward with this? Assuming that we need to organize data in a way where it can be separated back out by source again in the future (so we can tell what aspects of the SBOM are from a Docker context vs what is in the final image itself):
Assuming that we don't need separability, the path forward is easier, needing only to solve a few things:
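As a sketch of the "use relationships" route mentioned above, a merged SPDX-style document could keep elements separable by source, at the cost of verbosity; all identifiers below are illustrative:

```json
{
  "spdxVersion": "SPDX-2.2",
  "documentDescribes": ["SPDXRef-final-image", "SPDXRef-build-context"],
  "relationships": [
    {
      "spdxElementId": "SPDXRef-final-image",
      "relationshipType": "CONTAINS",
      "relatedSpdxElement": "SPDXRef-Package-openssl"
    },
    {
      "spdxElementId": "SPDXRef-build-context",
      "relationshipType": "CONTAINS",
      "relatedSpdxElement": "SPDXRef-Package-log4j"
    }
  ]
}
```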
I'd imagine structure-wise the best would be that the different types of elements are merged together, and if multiple scanning points detect the same type (ID?) of elements, they are grouped into an array. That would, though, mean that we can only add scanning points to known locations, or the user needs to create a scanning point and assign a name to it (potentially we could use the Dockerfile stage name). Eg. for a multi-stage Rust project:
From the Dockerfile we could trigger the runtime scan and the build context scan automatically. For the "Alpine packages / build stage" entry, the user would need to add a scanning point to the Dockerfile stage called "build".
I'll move the conversation to that repo, thanks for the pointers @sudo-bmitch @SteveLasker
@tonistiigi could there be an option to allow users to generate their own SBOMs?
Then:
could extract the file from that stage and attach it as an SBOM. It's not as pretty as a prepackaged solution with a frontend, so it's still worth considering how to do that. But this gives flexibility to the power users that need something different from the prepackaged solution, while still using buildx to create the image and attach the SBOM.
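A minimal sketch of what such a Dockerfile could look like; the stage name, scanner image, and output path are all hypothetical:

```dockerfile
FROM compiler:latest as build
COPY . /src
RUN do install stuff

# Hypothetical stage that produces the user's own SBOM at a known path
FROM build as sbom
RUN --mount=type=bind,from=vendor/sbom:latest,target=/sbom \
    /sbom/scan /src > /sbom.json
```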
I think the option to let users generate their own SBOMs how they want is probably quite important. We're also looking at how in-toto attestations might be generated and collected in-build, so it might be worth having a similar generic interface for both of them.

I had an idea that instead of collecting SBOMs into a file in a stage, we could maybe collect them into a special mount type; that could avoid needing to specify a stage for the file, since we could just build all the stages that reference that mount type. Adapting the example from above:

```dockerfile
FROM compiler:latest as build
COPY . /src
RUN do install stuff

FROM build as sbom
RUN --mount=type=bind,from=vendor/sbom:latest,target=/sbom \
    --mount=type=sbom,name=myscan,target=/scanresults \
    /sbom/scan /src >/scanresults/sbom.json

FROM runtime:latest as release
COPY --from=build /src/bin/ /usr/local/bin/
CMD [ "/usr/local/bin/app" ]
```

Then the command line could be something like:

I think this might be a possibly better approach when custom frontends get involved, since the custom frontend doesn't then need to use the same stages abstraction, but can instead just use the common mounts, which are the same for all frontends. Aside from that, I don't think there are that many advantages to this approach; I just think it's worth presenting as a possible alternative.
Thinking through @jedevc's example, I really like the idea of a different kind of mount. I'd just make it slightly more generic, which would give buildx the ability to use those mounts for other things in addition to SBOMs:

```dockerfile
FROM compiler:latest as build
COPY . /src
RUN --mount=type=bind,from=vendor/intoto:latest,target=/intoto \
    --mount=type=output,name=intoto,target=/attest \
    /intoto attest --out /attest/result.json -- do install stuff
RUN --mount=type=bind,from=vendor/sbom:latest,target=/sbom \
    --mount=type=output,name=sbom,target=/scanresults \
    /sbom/scan /src >/scanresults/sbom.json

FROM runtime:latest as release
COPY --from=build /src/bin/ /usr/local/bin/
CMD [ "/usr/local/bin/app" ]
```

And then we could have something like:

That could also make it possible to have multiple outputs, like creating an image and generating binaries.
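A rough sketch of what the accompanying command line could look like; the `mount=` addressing in the `--output` flags is entirely hypothetical:

```sh
# Hypothetical: export the image plus the contents of each named output mount
docker buildx build \
  --output type=image,name=registry.example.com/app:latest,push=true \
  --output type=local,mount=sbom,dest=./sbom \
  --output type=local,mount=intoto,dest=./attestations \
  .
```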
Thanks @sudo-bmitch! I'd +1 the idea of an external mount to collect at-build data.
Thanks so much for working on this integration! As a start, I'd be really interested in seeing the full SBOM output per stage of a multi-stage Dockerfile.
Putting together some more concrete implementation steps for review:

### Step 1. Generic (unsigned) attestation support

Add the ability for the BuildKit client or frontend to add custom attestations to the build result in the gateway API. This will allow shipping additional objects, like an SBOM, with the image. Currently, the build result is defined as:
This should be extended with an attestations map. The key in the map identifies which build result the attestations apply to, and the value is an array of attestation definitions.
BuildKit will use this to attach the attestations to the build result. This manifest of attestations is linked with the exported image index using the Proposal F design for adding signatures and attestations to images. This proposal fits best with our requirements; should the WG make modifications or end up with a different design that also fits our requirements, we will move things around as needed. This part will initially be experimental until we are comfortable with the format.
Attestations are defined for a single-arch image manifest descriptor. When building a multi-arch image, each submanifest gets its own attestations array. As an example, the way a BuildKit client or a frontend could use this feature to export an image with an SBOM attestation is to run a container with an SBOM generator and attach its output.

### Step 2. Simple opt-in SBOM attestation

With generic attestation support, users can run a process as part of their build that generates an SBOM and then set it as an attestation on the build result. While this is powerful, we want the default experience of adding an SBOM to any user build to be much simpler.
The buildx level might also consider a special shorthand for this. The generator image provides the SBOM data by scanning the specified content. We don't want to maintain these scanners, as there are already plenty of options, but we should define a good default (the default generator image could also be defined at the buildx level if that is too opinionated for BuildKit).

When built with such exporter options, it is an indication to the BuildKit frontend that the user wishes an SBOM attestation to be added. A frontend can use its own logic to determine which build points should be used as input to the SBOM generator image. SBOMs can be generated at multiple build points: we don't just want to trace the final image, but also the build context and possibly some of the intermediate build stages. If multiple SBOMs are generated, each adds one attestation to the attestations array. Currently, there is no standard for a "merged SBOM", so it is up to the tooling that accepts SBOMs as input to determine how it wants to present multiple SBOMs to the user.

If the frontend has generated an SBOM attestation, it leaves a mark of that in the build result. If the frontend completes its build but did not generate an SBOM (eg. it was old, or a custom frontend), BuildKit will use its default flow to generate the SBOM itself by running the generator on the final build image and the build context. When the user has requested an SBOM attestation for the build, one is guaranteed to be created.

The SBOM generated with the simple opt-in will be in SPDX format using the "https://spdx.dev/Document" predicate (https://github.com/in-toto/in-toto-golang/blob/master/in_toto/model.go#L81), unless we find another format is superior. Custom generators could use whatever format they like if they are not worried about tools that read their SBOM not understanding custom formats.

### Step 3. Add Dockerfile syntax for custom attestation and SBOM generation steps

Complex multi-stage builds should allow marking specific points in the build where the files needed for generating the SBOM are located. This way, dependencies that cannot be determined from the final image or build context can still be captured in the SBOM. There should also be a way for the user to generate an SBOM as part of their own Dockerfile commands and mark it as a file that should be used as the contents of the SBOM. To implement this, the Dockerfile frontend will look for the
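To make the predicate format in Step 2 concrete, here is a minimal sketch of an in-toto attestation statement carrying an SPDX document; the subject name and digest are placeholders:

```json
{
  "_type": "https://in-toto.io/Statement/v0.1",
  "subject": [
    {
      "name": "docker.io/library/myimage",
      "digest": { "sha256": "<image-manifest-digest>" }
    }
  ],
  "predicateType": "https://spdx.dev/Document",
  "predicate": {
    "spdxVersion": "SPDX-2.2",
    "name": "sbom-for-myimage"
  }
}
```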
Have started making some progress towards step 1, and had a couple notes/questions.
Can we have a state-of-the-art summary on this topic? Edit: to avoid spamming too many people, I posted my thank-you for this link here!
See https://www.docker.com/blog/generate-sboms-with-buildkit/ 🎉 It's released and ready to use.
Docker has created an experimental `docker sbom` command (https://github.com/docker/sbom-cli-plugin) that is shipping in the Docker Desktop 4.7.0 release and provides visibility into the components that make up an image.

The current command works on existing images, pulling them down and analyzing the contents of their layers. In addition to scanning existing images, it could be useful to see if, in some cases, it would be better to capture the SBOM data at build time instead. Theoretically, this gives us access to more data points, because many dependencies that are used during a build don't end up in the final image, or their version information has been lost. We can also combine it with the information from the BuildKit build itself: https://github.com/moby/buildkit/blob/master/docs/build-repro.md
As a sample, we've created a simple POC frontend that you can use with an existing BuildKit/Buildx installation. To test it, you can put `# syntax=crazymax/dockerfile:sbom` as the first line of your Dockerfile, or set the `BUILDKIT_SYNTAX` build argument. For your existing build you can run:

That will run your build, generate an SBOM for it, and export it into the local "sbom" directory:
The current POC and the Docker CLI plugin use Syft as a backend. We don't plan to hardcode any application logic into BuildKit for detecting a specific language dependency etc., but to keep this part pluggable (either by the user or the frontend author) in containers, with good defaults.
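As a point of reference, Syft can also be invoked directly against an image; a minimal sketch, where the image name is just an example:

```sh
# Scan an image from the local Docker daemon and emit an SPDX JSON SBOM
syft packages docker:myimage:latest -o spdx-json > sbom.spdx.json
```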
There are many open questions that need to be figured out and for which we would like to get more feedback. For example:
Your comments, suggestions, feedback, use cases and help are welcome! We want to make images built with Docker better and we want to make the experience of developing features to do that great. We’re happy to collaborate here, elsewhere on GitHub and on our community Slack channel (#buildkit).