Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sbom)!: overhaul SBOM generation logic #1474

Merged
merged 13 commits into from
Oct 21, 2024

Conversation

luhring
Copy link
Contributor

@luhring luhring commented Sep 4, 2024

This is substantial overhaul of how Melange produces SPDX SBOMs for the APK packages it builds.

What's changing

Forging a path to better "build-time" SBOMs

The main change here is that the concept of an "SBOM" has been promoted to an explicitly managed entity, starting at the beginning of the build process and lasting until the end. Before, some SBOM-related data was tracked, but the connection to the ultimate SBOM wasn't apparent until the last minute.

This means we can now add more logic to the build process over time that contributes software composition data to the SBOM during any pipeline steps, such as to describe language ecosystem packages and relate them to our APKs using first-hand knowledge about the build.

Using distinct packages for distinct software entities

This also addresses an issue where we had multiple PURLs associated with a single package. The PURL values were helpful, but a more idiomatic SPDX representation tracks distinct concepts like the upstream source and the build configuration as their own packages. This lets us be more explicit about how each PURL maps to one of these concepts. We then add relationships to the SBOM to describe how these concepts interact with one another to produce the APK artifact.

Better API for SBOMs

There are now explicit types for packages and SBOM documents in the sbom package, which let consumers create and iterate on an SBOM throughout the build.

The build package has a new SBOMGroup type, whose purpose is to centralize interactions with the SBOM package to minimize the SBOM-related work in the build logic itself. This lets us create SBOMs tailored to the origin package or any subpackage, where some packages are specific to one SBOM, while other packages should be added to all SBOMs in the whole build group.

More explicit supply chain information

Rather than detect things like the current git commit as a side effect deep into the build steps, this information has been pushed out to the CLI entrypoint, where it can still be detected OR be explicitly specified using CLI flags. Then, the build itself is going off of explicit information and can operate more deterministically.

One such change is that we add a --license string flag, so that we can know which license to attach to the build configuration package — e.g. for Wolfi this would be Apache-2.0. I'm interested in folks' thoughts on the best way to wire this up for our builds. We could specify it in the distro's Makefile, attempt to autodetect it from the local filesystem, etc. Left unset, it defaults to NOASSERTION.

General cleanup

This PR also cleans up existing code, including to fix #1460.

This PR also fixes an issue where the SPDX document namespace uses a different hash now, because the previous code didn't factor in epoch (which was wrong). This means that the SPDX document namespace will show a different hash with this PR merged than it did before, even with no changes to the package itself.

Follow-up work

  • Tests: While some SBOM tests exist already in GitHub Actions workflows (using YAML and bash), it'd be ideal to write tests in Go to ensure we're getting the expected data in our SBOMs.

Full example of SBOMs before and after

Click here!

Before

{
  "SPDXID": "SPDXRef-DOCUMENT",
  "name": "apk-crane-0.20.2-r0",
  "spdxVersion": "SPDX-2.3",
  "creationInfo": {
    "created": "2024-08-06T17:00:47Z",
    "creators": [
      "Tool: melange (devel)",
      "Organization: Chainguard, Inc"
    ],
    "licenseListVersion": "3.22"
  },
  "dataLicense": "CC0-1.0",
  "documentNamespace": "https://spdx.org/spdxdocs/chainguard/melange/3fe50e511d389d87aa83cfe4aff488dcff156e09",
  "documentDescribes": [
    "SPDXRef-Package-crane-0.20.2-r0"
  ],
  "packages": [
    {
      "SPDXID": "SPDXRef-Package-crane-0.20.2-r0",
      "name": "crane",
      "versionInfo": "0.20.2-r0",
      "filesAnalyzed": false,
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "Apache-2.0",
      "downloadLocation": "NOASSERTION",
      "originator": "Organization: Wolfi",
      "supplier": "Organization: Wolfi",
      "copyrightText": "\n",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE-MANAGER",
          "referenceLocator": "pkg:apk/wolfi/crane@0.20.2-r0?arch=aarch64",
          "referenceType": "purl"
        },
        {
          "referenceCategory": "PACKAGE-MANAGER",
          "referenceLocator": "pkg:github/google/go-containerregistry@v0.20.2",
          "referenceType": "purl"
        },
        {
          "referenceCategory": "PACKAGE-MANAGER",
          "referenceLocator": "pkg:github/google/go-containerregistry@c195f151efe3369874c72662cd69ad43ee485128",
          "referenceType": "purl"
        }
      ]
    }
  ],
  "relationships": []
}

After

{
  "SPDXID": "SPDXRef-DOCUMENT",
  "name": "apk-crane-0.20.2-r0",
  "spdxVersion": "SPDX-2.3",
  "creationInfo": {
    "created": "2024-08-06T17:00:47Z",
    "creators": [
      "Tool: melange (devel)",
      "Organization: Chainguard, Inc"
    ],
    "licenseListVersion": "3.22"
  },
  "dataLicense": "CC0-1.0",
  "documentNamespace": "https://spdx.org/spdxdocs/chainguard/melange/6bc62744084d5c2695c9e44af7abbc83b386da6d",
  "documentDescribes": [
    "SPDXRef-Package-crane-0.20.2-r0"
  ],
  "packages": [
    {
      "SPDXID": "SPDXRef-Package-crane-0.20.2-r0",
      "name": "crane",
      "versionInfo": "0.20.2-r0",
      "filesAnalyzed": false,
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "Apache-2.0",
      "downloadLocation": "NOASSERTION",
      "originator": "Organization: Wolfi",
      "supplier": "Organization: Wolfi",
      "copyrightText": "\n",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE-MANAGER",
          "referenceLocator": "pkg:apk/wolfi/crane@0.20.2-r0?arch=aarch64",
          "referenceType": "purl"
        }
      ]
    },
    {
      "SPDXID": "SPDXRef-Package-crane.yaml-1ac0df6651232eec66d1bba8f95f495b61b52507",
      "name": "crane.yaml",
      "versionInfo": "1ac0df6651232eec66d1bba8f95f495b61b52507",
      "filesAnalyzed": false,
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "NOASSERTION",
      "downloadLocation": "NOASSERTION",
      "originator": "Organization: Wolfi",
      "supplier": "Organization: Wolfi",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE-MANAGER",
          "referenceLocator": "pkg:github/wolfi-dev/os@1ac0df6651232eec66d1bba8f95f495b61b52507#crane.yaml",
          "referenceType": "purl"
        }
      ]
    },
    {
      "SPDXID": "SPDXRef-Package-go-containerregistry-v0.20.2",
      "name": "go-containerregistry",
      "versionInfo": "v0.20.2",
      "filesAnalyzed": false,
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "Apache-2.0",
      "downloadLocation": "NOASSERTION",
      "originator": "Organization: Google",
      "supplier": "Organization: Google",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE-MANAGER",
          "referenceLocator": "pkg:github/google/go-containerregistry@v0.20.2",
          "referenceType": "purl"
        }
      ]
    }
  ],
  "relationships": [
    {
      "spdxElementId": "SPDXRef-Package-crane-0.20.2-r0",
      "relationshipType": "DESCRIBED_BY",
      "relatedSpdxElement": "SPDXRef-Package-crane.yaml-1ac0df6651232eec66d1bba8f95f495b61b52507"
    },
    {
      "spdxElementId": "SPDXRef-Package-crane-0.20.2-r0",
      "relationshipType": "GENERATED_FROM",
      "relatedSpdxElement": "SPDXRef-Package-go-containerregistry-v0.20.2"
    }
  ]
}

@luhring
Copy link
Contributor Author

luhring commented Sep 4, 2024

  • Need to fix: the upstream source package claims "Wolfi" as the originator/supplier
      "originator": "Organization: Wolfi",
      "supplier": "Organization: Wolfi",

@xnox
Copy link
Contributor

xnox commented Sep 4, 2024

Nitpicks:

    {
      "SPDXID": "SPDXRef-Package-go-containerregistry-v0.20.2",
      "name": "go-containerregistry",
      "versionInfo": "v0.20.2",
      "filesAnalyzed": false,
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "Apache-2.0",
      "downloadLocation": "NOASSERTION",
      "originator": "Organization: Google",
      "supplier": "Organization: Google",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE-MANAGER",
          "referenceLocator": "pkg:github/google/go-containerregistry@v0.20.2",
          "referenceType": "purl"
        }
      ]
    }

Previously multiple PURLs were generated for pkg:github style references, because tag is not fixed in time; and lots of upstreams delete tags or rewrite them. Given we have versionInfo as a top level information on this spdx package, can we please switch externalref to using git commit hash, rather than version string. As that would be more precise.

Separately, I actually originally wanted to simply use downloadLocation - but couldn't, as I only had a single downloadLocation and needed to encode multiple locations. Now that there is 1:1 downloadLocation, i wonder if we can shove a git hurl into that, and not even emit externalRefs.

Separate I hate boilerplate that we emit which is useless. I ponder if we can omit filesAnalyzed, licenseConcluded copyrightText keys, as we never fill them in, so they have no value. I guess that's unrelated to the above reorg and can be submitted stand alone.

Have you attempted an image build with these packages? are there changes needed in apko which "merges" things?

@xnox
Copy link
Contributor

xnox commented Sep 4, 2024

Expanding on the above, given we now have packages for things, I do wonder if we should remove externalref PURL alltogether and instead fill in:

https://spdx.github.io/spdx-spec/v2.3/package-information/#77-package-download-location-field with direct url to plain yaml file; tarball; git+https url with commit encoded

And for tarball downloads also emit https://spdx.github.io/spdx-spec/v2.3/package-information/#710-package-checksum-field for the expected hash. Possibly for git checkouts too.

Because purls are ugly, and difficult to decode, and are not "click to download" experience.

@luhring
Copy link
Contributor Author

luhring commented Sep 4, 2024

Awesome feedback ❤️ — talked more w/ @xnox about some of the nuanced points. Some of these beyond the intended scope of this PR, but we'll go ahead and tackle any that are easy enough to include now, and we'll open an issue for anything left.

Capturing a list of TODOs:

  • For upstream source packages, attach a PURL only for github-based git-checkout pipelines. (The pkg:generic/... PURL doesn't seem useful for now. We'll continue to use tags over commits when available, to make PURL-based vulnerability matching easier.)
  • Don't attach PURLs to the build config (YAML) package.
  • Use downloadLocation for upstream source package. We can use https URLs for fetch, and git+https for git-checkout, and here we can prefer commits over tags.
  • Also use downloadLocation for the build config package, using git+https with a commit and subpath for the config file.
  • Omit non-required fields where we have no usable information, like with licenseConcluded. (Be careful omitting filesAnalyzed, because the assumed default is true.)
  • Test how these new SBOMs interact with apko's SBOM aggregration process.

@xnox
Copy link
Contributor

xnox commented Sep 11, 2024

I also dream a dream, to be able to encode/include 'libfoo-static' as a package in sbom, when it is used in build-depends, or some other field declaring that.

I.e. such that boringssl-fips-static does make it into SBOMs of like envoy-fips package.

Copy link
Contributor

@xnox xnox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just various comments. next is for me to build this against a few packages and check if it all is sensible.

pkg/build/build.go Show resolved Hide resolved
pkg/cli/build.go Show resolved Hide resolved
pkg/cli/build.go Show resolved Hide resolved
pkg/sbom/document.go Show resolved Hide resolved
// and the values are the checksums.
//
// TODO: We're not currently using this field, consider removing it.
Checksums map[string]string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SPDX says do not include checksum for package that includes the SPDX itself. So we can start using this, if we move sbom from data tarball to control tarball and set this to like data tarball sha256sum.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, we can remove it for now and add it back when we know what we end up doing. This part of the code is a little awkward too, so it's probably good to approach it fresh with a clear use case.

pkg/sbom/package.go Show resolved Hide resolved
@luhring
Copy link
Contributor Author

luhring commented Oct 5, 2024

Overall thoughts — these are all great comments, and there's a few I should address ASAP.

Also this PR is pretty huge already, so what I'm thinking is we should focus on getting tests in place to assert the SBOMs we expect to produce, and then keep iterating on this as fast follows. Thoughts?

@xnox
Copy link
Contributor

xnox commented Oct 5, 2024

Yes that. Plan was for me to build a few things. And then see how they look. And yeah instrument small thing that is a useful demonstrator of all the things expected to still work. Like tiny packages with all the expected features. Cause we need good small test cases of all the things.

@luhring
Copy link
Contributor Author

luhring commented Oct 9, 2024

FYI — I plan to return to this next week after my vacation to finish this up.

- Introduce new API for creating SBOMs during builds
- Make SBOMs a first class concept throughout the entire build
- Separate out distinct packages for upstream source and build configuration

Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
@luhring luhring force-pushed the spdx-rewiring branch 2 times, most recently from 939a52c to edf7a87 Compare October 18, 2024 17:56
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
I believe these are brittle and inferior to the newer integration testing for SBOMs, which checks external refs along with every other field value for our SBOMs.

Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
@luhring
Copy link
Contributor Author

luhring commented Oct 21, 2024

The new melange integration tests are working great. It's already been so nice to have debugging as an option when things go unexpectedly. 🎉

I've also tried to add an integration test that uses apko to build images from melange's packages (and their SBOMs), but I'm hitting enough frictions that I'm going to stop and come back to that later.

In the meantime, here's an example of an apko SBOM using these new changes. I've built crane locally, which pulls in ca-certificates-bundle from the repo since it's a runtime dependency of crane.

`sbom-aarch64.spdx.json`
{
  "SPDXID": "SPDXRef-DOCUMENT",
  "name": "sbom-sha256:0ba74df91115b4c378c31d29d449ad1fef18f5c2b29fbc806d0183fc202a276e",
  "spdxVersion": "SPDX-2.3",
  "creationInfo": {
    "created": "2024-10-10T21:49:59Z",
    "creators": [
      "Tool: apko (devel)",
      "Organization: Chainguard, Inc"
    ],
    "licenseListVersion": "3.16"
  },
  "dataLicense": "CC0-1.0",
  "documentNamespace": "https://spdx.org/spdxdocs/apko/",
  "documentDescribes": [
    "SPDXRef-Package-sha256-248efa08bf88cba44c471bc969160798fb70ddfc852ad528cd4382f952d3cc5d"
  ],
  "packages": [
    {
      "SPDXID": "SPDXRef-Package-sha256-248efa08bf88cba44c471bc969160798fb70ddfc852ad528cd4382f952d3cc5d",
      "name": "sha256:248efa08bf88cba44c471bc969160798fb70ddfc852ad528cd4382f952d3cc5d",
      "versionInfo": "sha256:248efa08bf88cba44c471bc969160798fb70ddfc852ad528cd4382f952d3cc5d",
      "filesAnalyzed": false,
      "description": "apko container image",
      "downloadLocation": "NOASSERTION",
      "supplier": "Organization: apko-generated image",
      "primaryPackagePurpose": "CONTAINER",
      "checksums": [
        {
          "algorithm": "SHA256",
          "checksumValue": "248efa08bf88cba44c471bc969160798fb70ddfc852ad528cd4382f952d3cc5d"
        }
      ],
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE-MANAGER",
          "referenceLocator": "pkg:oci/test@sha256%3A248efa08bf88cba44c471bc969160798fb70ddfc852ad528cd4382f952d3cc5d?arch=arm64\u0026mediaType=application%2Fvnd.oci.image.manifest.v1%2Bjson\u0026os=linux",
          "referenceType": "purl"
        }
      ]
    },
    {
      "SPDXID": "SPDXRef-Package-sha256-0ba74df91115b4c378c31d29d449ad1fef18f5c2b29fbc806d0183fc202a276e",
      "name": "sha256:0ba74df91115b4c378c31d29d449ad1fef18f5c2b29fbc806d0183fc202a276e",
      "versionInfo": "unknown",
      "filesAnalyzed": false,
      "description": "apko operating system layer",
      "downloadLocation": "NOASSERTION",
      "supplier": "Organization: apko-generated image",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE-MANAGER",
          "referenceLocator": "pkg:oci/test@sha256%3A0ba74df91115b4c378c31d29d449ad1fef18f5c2b29fbc806d0183fc202a276e?arch=arm64\u0026mediaType=application%2Fvnd.oci.image.layer.v1.tar%2Bgzip\u0026os=linux",
          "referenceType": "purl"
        }
      ]
    },
    {
      "SPDXID": "SPDXRef-Package-ca-certificates-bundle-20241010-r0",
      "name": "ca-certificates-bundle",
      "versionInfo": "20241010-r0",
      "filesAnalyzed": false,
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "MPL-2.0 AND MIT",
      "downloadLocation": "NOASSERTION",
      "originator": "Organization: Wolfi",
      "supplier": "Organization: Wolfi",
      "copyrightText": "\n",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE-MANAGER",
          "referenceLocator": "pkg:apk/wolfi/ca-certificates-bundle@20241010-r0?arch=aarch64",
          "referenceType": "purl"
        },
        {
          "referenceCategory": "PACKAGE-MANAGER",
          "referenceLocator": "pkg:generic/ca-certificates@20241010?vcs_url=git%2Bhttps%3A%2F%2Fgitlab.alpinelinux.org%2Falpine%2Fca-certificates%40153107a3beeff73331fd6ac8fa40312f3e92f4f0",
          "referenceType": "purl"
        }
      ]
    },
    {
      "SPDXID": "SPDXRef-Package-crane-0.20.2-r1",
      "name": "crane",
      "versionInfo": "0.20.2-r1",
      "filesAnalyzed": false,
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "Apache-2.0",
      "downloadLocation": "NOASSERTION",
      "originator": "Organization: Wolfi",
      "supplier": "Organization: Wolfi",
      "copyrightText": "\n",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE-MANAGER",
          "referenceLocator": "pkg:apk/wolfi/crane@0.20.2-r1?arch=aarch64",
          "referenceType": "purl"
        }
      ]
    },
    {
      "SPDXID": "SPDXRef-Package-crane.yaml-f90fa1593a6b8e20cc29047659e5a6e4c95ed94a",
      "name": "crane.yaml",
      "versionInfo": "f90fa1593a6b8e20cc29047659e5a6e4c95ed94a",
      "filesAnalyzed": false,
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "NOASSERTION",
      "downloadLocation": "NOASSERTION",
      "originator": "Organization: Wolfi",
      "supplier": "Organization: Wolfi",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE-MANAGER",
          "referenceLocator": "pkg:github/wolfi-dev/os@f90fa1593a6b8e20cc29047659e5a6e4c95ed94a#crane.yaml",
          "referenceType": "purl"
        }
      ]
    },
    {
      "SPDXID": "SPDXRef-Package-git.luolix.top-google-go-containerregistry-v0.20.2-c195f151efe3369874c72662cd69ad43ee485128",
      "name": "go-containerregistry",
      "versionInfo": "v0.20.2",
      "filesAnalyzed": false,
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "Apache-2.0",
      "downloadLocation": "NOASSERTION",
      "originator": "Organization: Google",
      "supplier": "Organization: Google",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE-MANAGER",
          "referenceLocator": "pkg:github/google/go-containerregistry@v0.20.2",
          "referenceType": "purl"
        }
      ]
    }
  ],
  "relationships": [
    {
      "spdxElementId": "SPDXRef-Package-sha256-248efa08bf88cba44c471bc969160798fb70ddfc852ad528cd4382f952d3cc5d",
      "relationshipType": "CONTAINS",
      "relatedSpdxElement": "SPDXRef-Package-sha256-0ba74df91115b4c378c31d29d449ad1fef18f5c2b29fbc806d0183fc202a276e"
    },
    {
      "spdxElementId": "SPDXRef-Package-crane-0.20.2-r1",
      "relationshipType": "DESCRIBED_BY",
      "relatedSpdxElement": "SPDXRef-Package-crane.yaml-f90fa1593a6b8e20cc29047659e5a6e4c95ed94a"
    },
    {
      "spdxElementId": "SPDXRef-Package-crane-0.20.2-r1",
      "relationshipType": "GENERATED_FROM",
      "relatedSpdxElement": "SPDXRef-Package-git.luolix.top-google-go-containerregistry-v0.20.2-c195f151efe3369874c72662cd69ad43ee485128"
    }
  ]
}

@luhring luhring marked this pull request as ready for review October 21, 2024 14:11
Comment on lines +27 to +30
{
name: "crane",
expectedVersion: "0.20.2-r1",
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What other packages would be worth adding here?

imjasonh
imjasonh previously approved these changes Oct 21, 2024
@luhring
Copy link
Contributor Author

luhring commented Oct 21, 2024

(For later, a branch with additional WIP work to add coverage for apko builds: main...luhring:melange:spdx-rewiring-apko-int-test)

Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
@luhring luhring merged commit d30e397 into chainguard-dev:main Oct 21, 2024
36 checks passed
luhring added a commit to chainguard-dev/apko that referenced this pull request Oct 24, 2024
In chainguard-dev/melange#1474, we updated our
SBOMs to have distinct SPDX packages for the APK itself, vs. the build
config used to produce it, vs. the upstream source(s) pulled in for the
build. This helps us produce more idiomatic and descriptive SPDX data.

But this caused a problem downstream in apko — when we aggregate APKs'
SBOMs, and specifically their SPDX packages, into a single list of SPDX
packages for the image, we end up with duplicate package IDs when
multiple APKs were build from the same build config or the same upstream
source.

This PR adds tests to catch this case, and it fixes the duplicate
package issue by dropping subsequent instances of a given package ID.

It also further improves on the existing SBOM-related tests.

Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unify git state detection mechanisms
3 participants