-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(sbom)!: overhaul SBOM generation logic #1474
Conversation
"originator": "Organization: Wolfi",
"supplier": "Organization: Wolfi", |
Nitpicks: {
"SPDXID": "SPDXRef-Package-go-containerregistry-v0.20.2",
"name": "go-containerregistry",
"versionInfo": "v0.20.2",
"filesAnalyzed": false,
"licenseConcluded": "NOASSERTION",
"licenseDeclared": "Apache-2.0",
"downloadLocation": "NOASSERTION",
"originator": "Organization: Google",
"supplier": "Organization: Google",
"externalRefs": [
{
"referenceCategory": "PACKAGE-MANAGER",
"referenceLocator": "pkg:github/google/go-containerregistry@v0.20.2",
"referenceType": "purl"
}
]
} Previously multiple PURLs were generated for pkg:github style references, because tag is not fixed in time; and lots of upstreams delete tags or rewrite them. Given we have versionInfo as a top level information on this spdx package, can we please switch externalref to using git commit hash, rather than version string. As that would be more precise. Separately, I actually originally wanted to simply use downloadLocation - but couldn't, as I only had a single downloadLocation and needed to encode multiple locations. Now that there is 1:1 downloadLocation, i wonder if we can shove a git hurl into that, and not even emit externalRefs. Separate I hate boilerplate that we emit which is useless. I ponder if we can omit filesAnalyzed, licenseConcluded copyrightText keys, as we never fill them in, so they have no value. I guess that's unrelated to the above reorg and can be submitted stand alone. Have you attempted an image build with these packages? are there changes needed in apko which "merges" things? |
Expanding on the above, given we now have packages for things, I do wonder if we should remove externalref PURL alltogether and instead fill in: https://spdx.github.io/spdx-spec/v2.3/package-information/#77-package-download-location-field with direct url to plain yaml file; tarball; git+https url with commit encoded And for tarball downloads also emit https://spdx.github.io/spdx-spec/v2.3/package-information/#710-package-checksum-field for the expected hash. Possibly for git checkouts too. Because purls are ugly, and difficult to decode, and are not "click to download" experience. |
Awesome feedback ❤️ — talked more w/ @xnox about some of the nuanced points. Some of these beyond the intended scope of this PR, but we'll go ahead and tackle any that are easy enough to include now, and we'll open an issue for anything left. Capturing a list of TODOs:
|
I also dream a dream, to be able to encode/include 'libfoo-static' as a package in sbom, when it is used in build-depends, or some other field declaring that. I.e. such that boringssl-fips-static does make it into SBOMs of like envoy-fips package. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just various comments. next is for me to build this against a few packages and check if it all is sensible.
// and the values are the checksums. | ||
// | ||
// TODO: We're not currently using this field, consider removing it. | ||
Checksums map[string]string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SPDX says do not include checksum for package that includes the SPDX itself. So we can start using this, if we move sbom from data tarball to control tarball and set this to like data tarball sha256sum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, we can remove it for now and add it back when we know what we end up doing. This part of the code is a little awkward too, so it's probably good to approach it fresh with a clear use case.
Overall thoughts — these are all great comments, and there's a few I should address ASAP. Also this PR is pretty huge already, so what I'm thinking is we should focus on getting tests in place to assert the SBOMs we expect to produce, and then keep iterating on this as fast follows. Thoughts? |
Yes that. Plan was for me to build a few things. And then see how they look. And yeah instrument small thing that is a useful demonstrator of all the things expected to still work. Like tiny packages with all the expected features. Cause we need good small test cases of all the things. |
FYI — I plan to return to this next week after my vacation to finish this up. |
491450a
to
f039274
Compare
- Introduce new API for creating SBOMs during builds - Make SBOMs a first class concept throughout the entire build - Separate out distinct packages for upstream source and build configuration Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
939a52c
to
edf7a87
Compare
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
I believe these are brittle and inferior to the newer integration testing for SBOMs, which checks external refs along with every other field value for our SBOMs. Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
f9baebe
to
af2d7b0
Compare
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
The new melange integration tests are working great. It's already been so nice to have debugging as an option when things go unexpectedly. 🎉 I've also tried to add an integration test that uses apko to build images from melange's packages (and their SBOMs), but I'm hitting enough frictions that I'm going to stop and come back to that later. In the meantime, here's an example of an apko SBOM using these new changes. I've built `sbom-aarch64.spdx.json`{
"SPDXID": "SPDXRef-DOCUMENT",
"name": "sbom-sha256:0ba74df91115b4c378c31d29d449ad1fef18f5c2b29fbc806d0183fc202a276e",
"spdxVersion": "SPDX-2.3",
"creationInfo": {
"created": "2024-10-10T21:49:59Z",
"creators": [
"Tool: apko (devel)",
"Organization: Chainguard, Inc"
],
"licenseListVersion": "3.16"
},
"dataLicense": "CC0-1.0",
"documentNamespace": "https://spdx.org/spdxdocs/apko/",
"documentDescribes": [
"SPDXRef-Package-sha256-248efa08bf88cba44c471bc969160798fb70ddfc852ad528cd4382f952d3cc5d"
],
"packages": [
{
"SPDXID": "SPDXRef-Package-sha256-248efa08bf88cba44c471bc969160798fb70ddfc852ad528cd4382f952d3cc5d",
"name": "sha256:248efa08bf88cba44c471bc969160798fb70ddfc852ad528cd4382f952d3cc5d",
"versionInfo": "sha256:248efa08bf88cba44c471bc969160798fb70ddfc852ad528cd4382f952d3cc5d",
"filesAnalyzed": false,
"description": "apko container image",
"downloadLocation": "NOASSERTION",
"supplier": "Organization: apko-generated image",
"primaryPackagePurpose": "CONTAINER",
"checksums": [
{
"algorithm": "SHA256",
"checksumValue": "248efa08bf88cba44c471bc969160798fb70ddfc852ad528cd4382f952d3cc5d"
}
],
"externalRefs": [
{
"referenceCategory": "PACKAGE-MANAGER",
"referenceLocator": "pkg:oci/test@sha256%3A248efa08bf88cba44c471bc969160798fb70ddfc852ad528cd4382f952d3cc5d?arch=arm64\u0026mediaType=application%2Fvnd.oci.image.manifest.v1%2Bjson\u0026os=linux",
"referenceType": "purl"
}
]
},
{
"SPDXID": "SPDXRef-Package-sha256-0ba74df91115b4c378c31d29d449ad1fef18f5c2b29fbc806d0183fc202a276e",
"name": "sha256:0ba74df91115b4c378c31d29d449ad1fef18f5c2b29fbc806d0183fc202a276e",
"versionInfo": "unknown",
"filesAnalyzed": false,
"description": "apko operating system layer",
"downloadLocation": "NOASSERTION",
"supplier": "Organization: apko-generated image",
"externalRefs": [
{
"referenceCategory": "PACKAGE-MANAGER",
"referenceLocator": "pkg:oci/test@sha256%3A0ba74df91115b4c378c31d29d449ad1fef18f5c2b29fbc806d0183fc202a276e?arch=arm64\u0026mediaType=application%2Fvnd.oci.image.layer.v1.tar%2Bgzip\u0026os=linux",
"referenceType": "purl"
}
]
},
{
"SPDXID": "SPDXRef-Package-ca-certificates-bundle-20241010-r0",
"name": "ca-certificates-bundle",
"versionInfo": "20241010-r0",
"filesAnalyzed": false,
"licenseConcluded": "NOASSERTION",
"licenseDeclared": "MPL-2.0 AND MIT",
"downloadLocation": "NOASSERTION",
"originator": "Organization: Wolfi",
"supplier": "Organization: Wolfi",
"copyrightText": "\n",
"externalRefs": [
{
"referenceCategory": "PACKAGE-MANAGER",
"referenceLocator": "pkg:apk/wolfi/ca-certificates-bundle@20241010-r0?arch=aarch64",
"referenceType": "purl"
},
{
"referenceCategory": "PACKAGE-MANAGER",
"referenceLocator": "pkg:generic/ca-certificates@20241010?vcs_url=git%2Bhttps%3A%2F%2Fgitlab.alpinelinux.org%2Falpine%2Fca-certificates%40153107a3beeff73331fd6ac8fa40312f3e92f4f0",
"referenceType": "purl"
}
]
},
{
"SPDXID": "SPDXRef-Package-crane-0.20.2-r1",
"name": "crane",
"versionInfo": "0.20.2-r1",
"filesAnalyzed": false,
"licenseConcluded": "NOASSERTION",
"licenseDeclared": "Apache-2.0",
"downloadLocation": "NOASSERTION",
"originator": "Organization: Wolfi",
"supplier": "Organization: Wolfi",
"copyrightText": "\n",
"externalRefs": [
{
"referenceCategory": "PACKAGE-MANAGER",
"referenceLocator": "pkg:apk/wolfi/crane@0.20.2-r1?arch=aarch64",
"referenceType": "purl"
}
]
},
{
"SPDXID": "SPDXRef-Package-crane.yaml-f90fa1593a6b8e20cc29047659e5a6e4c95ed94a",
"name": "crane.yaml",
"versionInfo": "f90fa1593a6b8e20cc29047659e5a6e4c95ed94a",
"filesAnalyzed": false,
"licenseConcluded": "NOASSERTION",
"licenseDeclared": "NOASSERTION",
"downloadLocation": "NOASSERTION",
"originator": "Organization: Wolfi",
"supplier": "Organization: Wolfi",
"externalRefs": [
{
"referenceCategory": "PACKAGE-MANAGER",
"referenceLocator": "pkg:github/wolfi-dev/os@f90fa1593a6b8e20cc29047659e5a6e4c95ed94a#crane.yaml",
"referenceType": "purl"
}
]
},
{
"SPDXID": "SPDXRef-Package-git.luolix.top-google-go-containerregistry-v0.20.2-c195f151efe3369874c72662cd69ad43ee485128",
"name": "go-containerregistry",
"versionInfo": "v0.20.2",
"filesAnalyzed": false,
"licenseConcluded": "NOASSERTION",
"licenseDeclared": "Apache-2.0",
"downloadLocation": "NOASSERTION",
"originator": "Organization: Google",
"supplier": "Organization: Google",
"externalRefs": [
{
"referenceCategory": "PACKAGE-MANAGER",
"referenceLocator": "pkg:github/google/go-containerregistry@v0.20.2",
"referenceType": "purl"
}
]
}
],
"relationships": [
{
"spdxElementId": "SPDXRef-Package-sha256-248efa08bf88cba44c471bc969160798fb70ddfc852ad528cd4382f952d3cc5d",
"relationshipType": "CONTAINS",
"relatedSpdxElement": "SPDXRef-Package-sha256-0ba74df91115b4c378c31d29d449ad1fef18f5c2b29fbc806d0183fc202a276e"
},
{
"spdxElementId": "SPDXRef-Package-crane-0.20.2-r1",
"relationshipType": "DESCRIBED_BY",
"relatedSpdxElement": "SPDXRef-Package-crane.yaml-f90fa1593a6b8e20cc29047659e5a6e4c95ed94a"
},
{
"spdxElementId": "SPDXRef-Package-crane-0.20.2-r1",
"relationshipType": "GENERATED_FROM",
"relatedSpdxElement": "SPDXRef-Package-git.luolix.top-google-go-containerregistry-v0.20.2-c195f151efe3369874c72662cd69ad43ee485128"
}
]
} |
{ | ||
name: "crane", | ||
expectedVersion: "0.20.2-r1", | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What other packages would be worth adding here?
(For later, a branch with additional WIP work to add coverage for apko builds: main...luhring:melange:spdx-rewiring-apko-int-test) |
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
In chainguard-dev/melange#1474, we updated our SBOMs to have distinct SPDX packages for the APK itself, vs. the build config used to produce it, vs. the upstream source(s) pulled in for the build. This helps us produce more idiomatic and descriptive SPDX data. But this caused a problem downstream in apko — when we aggregate APKs' SBOMs, and specifically their SPDX packages, into a single list of SPDX packages for the image, we end up with duplicate package IDs when multiple APKs were build from the same build config or the same upstream source. This PR adds tests to catch this case, and it fixes the duplicate package issue by dropping subsequent instances of a given package ID. It also further improves on the existing SBOM-related tests. Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
This is substantial overhaul of how Melange produces SPDX SBOMs for the APK packages it builds.
What's changing
Forging a path to better "build-time" SBOMs
The main change here is that the concept of an "SBOM" has been promoted to an explicitly managed entity, starting at the beginning of the build process and lasting until the end. Before, some SBOM-related data was tracked, but the connection to the ultimate SBOM wasn't apparent until the last minute.
This means we can now add more logic to the build process over time that contributes software composition data to the SBOM during any pipeline steps, such as to describe language ecosystem packages and relate them to our APKs using first-hand knowledge about the build.
Using distinct packages for distinct software entities
This also addresses an issue where we had multiple PURLs associated with a single package. The PURL values were helpful, but a more idiomatic SPDX representation tracks distinct concepts like the upstream source and the build configuration as their own packages. This lets us be more explicit about how each PURL maps to one of these concepts. We then add relationships to the SBOM to describe how these concepts interact with one another to produce the APK artifact.
Better API for SBOMs
There are now explicit types for packages and SBOM documents in the
sbom
package, which let consumers create and iterate on an SBOM throughout the build.The build package has a new
SBOMGroup
type, whose purpose is to centralize interactions with the SBOM package to minimize the SBOM-related work in the build logic itself. This lets us create SBOMs tailored to the origin package or any subpackage, where some packages are specific to one SBOM, while other packages should be added to all SBOMs in the whole build group.More explicit supply chain information
Rather than detect things like the current git commit as a side effect deep into the build steps, this information has been pushed out to the CLI entrypoint, where it can still be detected OR be explicitly specified using CLI flags. Then, the build itself is going off of explicit information and can operate more deterministically.
One such change is that we add a
--license
string flag, so that we can know which license to attach to the build configuration package — e.g. for Wolfi this would be Apache-2.0. I'm interested in folks' thoughts on the best way to wire this up for our builds. We could specify it in the distro'sMakefile
, attempt to autodetect it from the local filesystem, etc. Left unset, it defaults toNOASSERTION
.General cleanup
This PR also cleans up existing code, including to fix #1460.
This PR also fixes an issue where the SPDX document namespace uses a different hash now, because the previous code didn't factor in epoch (which was wrong). This means that the SPDX document namespace will show a different hash with this PR merged than it did before, even with no changes to the package itself.
Follow-up work
Full example of SBOMs before and after
Click here!
Before
After