Investigate pre-existing formats for storing dependency info #31

Open
Shnatsel opened this issue Feb 24, 2022 · 8 comments

@Shnatsel
Member

Apparently there are a number of formats designed to encode package info already: https://gitbom.dev/glossary/sbom/

We need to check if any of them are suitable for our use case. Notably, we redact some fields such as git repo URLs, and also include information about enabled features, so it might not be 100% compatible.

Also, the degree of adoption of these formats needs to be understood; perhaps we should provide conversion utilities, even if we don't end up using the format internally.

@Shnatsel
Member Author

Specifically, we need to understand:

  1. Does anyone actually use those SBOM formats?
  2. Are any of those formats a good fit for storing our data - perhaps we won't have to invent a custom format after all?

@tofay
Contributor

tofay commented Aug 3, 2022

Dumping my notes on formats and SPDX here.


Suggested requirements for data format.

  1. Needs to be able to convey Rust crate runtime and build dependencies.
  2. Needs to be extensible, so that we can add extra information in the future, e.g. statically linked C libraries, or build tool versions such as rustc.
  3. Needs to be easily interoperable with other tools: parsable in Rust and other languages (in particular Go, as used by the syft/trivy SCA tools), and easy for tools to correlate with vulnerability databases (e.g. RustSec).

The Trivy creator asked about this in "Embed CPE names into binaries · Issue #76 · ossf/wg-vulnerability-disclosures" (github.com).
The discussion loosely points to SBOM formats being more appropriate as a data format than package identification formats (SWID/PURL). In particular, SBOM formats allow expressing the nature of relationships (e.g. build/runtime dependency).

It was suggested on Zulip that SPDX is the likeliest SBOM format to reach wider adoption, given its backing by OpenSSF and industry.

There's currently no standardized way to embed SPDX SBOMs into binaries; see "Embedding SPDX into binaries · Issue #739 · spdx/spdx-spec" (github.com).

Some concerns over embedding SPDX SBOMs are:

  • Size, as SBOMs can be very large with e.g. license information. It's not clear that's required for the vulnerability use case: in SPDX SBOMs, NOASSERTION could be used as the value for the various license fields (or the SPDX license identifier instead of the full license text). The SBOM could also be compressed prior to embedding (ELF natively supports compressed sections too; unsure about PE/Mach-O).
  • Impact on reproducibility. The SPDX format includes creation timestamps. If the binary is represented in the SPDX SBOM as a File, then it would need a SHA1 checksum, which wouldn't be accurate, since embedding the SBOM changes the contents of the binary. This could be mitigated by representing the binary as a (Root?) Package of the SBOM, and not including file information for the binary itself.
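
To make the two concerns above concrete, here is a minimal Python sketch of one possible mitigation: serialize the SBOM deterministically (sorted keys, fixed separators, no timestamp) and compress it before embedding. The helper names `encode_sbom`, `decode_sbom` and `package` are hypothetical, and the document below deliberately omits `creationInfo`, so it illustrates the idea rather than being a valid SPDX 2.2 document.

```python
import json
import zlib


def encode_sbom(doc):
    """Serialize deterministically, then compress (hypothetical helper)."""
    # sort_keys and fixed separators make the bytes independent of
    # dict insertion order, which helps reproducible builds.
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return zlib.compress(text.encode("utf-8"), 9)


def decode_sbom(blob):
    """Inverse of encode_sbom."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def package(name, version):
    """Build a package entry with NOASSERTION in the license fields."""
    return {
        "SPDXID": f"SPDXRef-{name}-{version}",
        "name": name,
        "versionInfo": version,
        "copyrightText": "NOASSERTION",
        "downloadLocation": "NOASSERTION",
        "licenseConcluded": "NOASSERTION",
        "licenseDeclared": "NOASSERTION",
    }


# Assumption: no creation timestamp and no file checksums are recorded.
doc = {
    "spdxVersion": "SPDX-2.2",
    "SPDXID": "SPDXRef-DOCUMENT",
    "packages": [package(n, "0.1.0") for n in ("bar", "baz", "foo")],
}

blob = encode_sbom(doc)
raw_len = len(json.dumps(doc, sort_keys=True, separators=(",", ":")))
print(raw_len, "bytes of JSON ->", len(blob), "bytes compressed")
```

Because the repeated NOASSERTION fields compress well, the size concern is smaller than the raw JSON suggests, though real dependency trees would need to be measured.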

An example representing a binary as an SPDX File looks like:


{
  "spdxVersion": "SPDX-2.2",
  "dataLicense": "CC0-1.0",
  "SPDXID": "SPDXRef-DOCUMENT",
  "name": "baz.spdx.json",
  "documentNamespace": "https://foo.bar/",
  "creationInfo": {
    "created": "2022-08-01T18:44:38Z",
    "creators": [
      "Tool: cargo-spdx 0.1.0"
    ]
  },
  "packages": [
    {
      "copyrightText": "NOASSERTION",
      "downloadLocation": "NOASSERTION",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE_MANAGER",
          "referenceLocator": "pkg:cargo/bar@0.1.0",
          "referenceType": "purl"
        }
      ],
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "NOASSERTION",
      "name": "bar",
      "SPDXID": "SPDXRef-bar-0.1.0",
      "versionInfo": "0.1.0"
    },
    {
      "copyrightText": "NOASSERTION",
      "downloadLocation": "NOASSERTION",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE_MANAGER",
          "referenceLocator": "pkg:cargo/baz@0.1.0",
          "referenceType": "purl"
        }
      ],
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "NOASSERTION",
      "name": "baz",
      "SPDXID": "SPDXRef-baz-0.1.0",
      "versionInfo": "0.1.0"
    },
    {
      "copyrightText": "NOASSERTION",
      "downloadLocation": "NOASSERTION",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE_MANAGER",
          "referenceLocator": "pkg:cargo/foo@0.1.0",
          "referenceType": "purl"
        }
      ],
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "NOASSERTION",
      "name": "foo",
      "SPDXID": "SPDXRef-foo-0.1.0",
      "versionInfo": "0.1.0"
    }
  ],
  "files": [
    {
      "checksums": [
        {
          "algorithm": "SHA1",
          "checksumValue": "da39a3ee5e6b4b0d3255bfef95601890afd80709"
        }
      ],
      "copyrightText": "NOASSERTION",
      "fileName": "baz",
      "fileTypes": [
        "BINARY"
      ],
      "licenseConcluded": "NOASSERTION",
      "SPDXID": "SPDXRef-File-baz"
    }
  ],
  "relationships": [
    {
      "relatedSpdxElement": "SPDXRef-baz-0.1.0",
      "relationshipType": "GENERATED_FROM",
      "spdxElementId": "SPDXRef-File-baz"
    },
    {
      "relatedSpdxElement": "SPDXRef-bar-0.1.0",
      "relationshipType": "DEPENDS_ON",
      "spdxElementId": "SPDXRef-File-baz"
    },
    {
      "relatedSpdxElement": "SPDXRef-baz-0.1.0",
      "relationshipType": "DEPENDS_ON",
      "spdxElementId": "SPDXRef-File-baz"
    },
    {
      "relatedSpdxElement": "SPDXRef-foo-0.1.0",
      "relationshipType": "DEPENDS_ON",
      "spdxElementId": "SPDXRef-File-baz"
    }
  ]
}
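
For illustration, a consumer such as an SCA tool could walk a document shaped like the one above and collect the Package URLs of everything the binary depends on, ready to be matched against a vulnerability database. This is only a sketch: the function name `dependency_purls` is hypothetical, and the input is a trimmed copy of the example.

```python
import json


def dependency_purls(doc, root_id):
    """Return sorted purls of the packages that root_id DEPENDS_ON."""
    # Map each package's SPDXID to its purl external reference, if any.
    purl_by_id = {}
    for pkg in doc.get("packages", []):
        for ref in pkg.get("externalRefs", []):
            if ref.get("referenceType") == "purl":
                purl_by_id[pkg["SPDXID"]] = ref["referenceLocator"]

    # Collect the targets of DEPENDS_ON relationships from the root element.
    deps = {
        rel["relatedSpdxElement"]
        for rel in doc.get("relationships", [])
        if rel["spdxElementId"] == root_id
        and rel["relationshipType"] == "DEPENDS_ON"
    }
    return sorted(purl_by_id[d] for d in deps if d in purl_by_id)


# Trimmed copy of the example above: two packages, one binary file.
example = json.loads("""
{
  "packages": [
    {"SPDXID": "SPDXRef-bar-0.1.0", "name": "bar", "versionInfo": "0.1.0",
     "externalRefs": [{"referenceCategory": "PACKAGE_MANAGER",
                       "referenceLocator": "pkg:cargo/bar@0.1.0",
                       "referenceType": "purl"}]},
    {"SPDXID": "SPDXRef-foo-0.1.0", "name": "foo", "versionInfo": "0.1.0",
     "externalRefs": [{"referenceCategory": "PACKAGE_MANAGER",
                       "referenceLocator": "pkg:cargo/foo@0.1.0",
                       "referenceType": "purl"}]}
  ],
  "relationships": [
    {"spdxElementId": "SPDXRef-File-baz", "relationshipType": "DEPENDS_ON",
     "relatedSpdxElement": "SPDXRef-bar-0.1.0"},
    {"spdxElementId": "SPDXRef-File-baz", "relationshipType": "DEPENDS_ON",
     "relatedSpdxElement": "SPDXRef-foo-0.1.0"}
  ]
}
""")

print(dependency_purls(example, "SPDXRef-File-baz"))
```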

Rust support for SPDX SBOM format:

More questions to consider regarding use of SPDX in cargo-auditable:

Does it actually make it easier to use the embedded data?

  • Considering both Rust tooling (cargo audit) and external tools (go-rustaudit/syft)

Is it worth using a different format at all without a resolution to "Embed CPE names into binaries · Issue #76 · ossf/wg-vulnerability-disclosures" (github.com)?

  • The win from that would be interoperability. If we used the SPDX format but under a non-standardized section name, then we'd still have to teach SCA tools to look in that location.
  • The existing format conveys similar information to Cargo.lock. An advantage of this is that SCA tools are generally capable of reading Cargo.lock files, so the existing format is likely to be easy to integrate with SCA tools' existing Rust support (and this was the case when integrating with syft). It's unclear whether that would apply to non-crate information (e.g. rustc version/statically linked C libraries).

@tofay
Contributor

tofay commented Aug 3, 2022

Re "does anyone actually use these formats": both trivy and grype (the vulnerability scanning tool that works with/uses syft) are capable of reading SBOMs in multiple formats, e.g. SPDX/CycloneDX.

If there was a standardized section name for embedding SBOMs then cargo-auditable could use that and these tools could be updated to detect that. And without section name standardization, cargo-auditable could use SPDX, and go-rustaudit could extract the SBOM and expose the JSON for these tools to parse with their existing parsers.

@orangecms

Hi, I just heard from you on the Rustacean Station podcast - really cool stuff here! :-)

I've been thinking about and discussing this whole topic for a while now, so let me add some references:

When I asked who else would be interested in the topic, I was invited to the CycloneDX Slack, where people discuss the entire SBoM topic very broadly. Maybe that's also for you. :-)

Finally, I am quite involved in the oreboot firmware project, where I'm seeking to introduce SBoM as well, likely based on CycloneDX, for which there is also a Rust implementation.

That shall be it for now; feel free to poke me should you have any further questions. 🥳

@Shnatsel
Member Author

Thanks for the links! Having SBOMs in firmware would certainly be cool!

So far I've found everything not specifically designed for inclusion into binaries unsuitable, for two reasons:

  1. Inclusion of dates messes up reproducible builds
  2. The formats are very verbose and/or require including lots of information that is not relevant for the purposes of a security audit, increasing the binary size considerably.

I'm looking to talk to some people who have worked on the SBOM embedded in Go binaries by default. They also rolled their own JSON-based format, and perhaps we could collaborate on something more generic or at least that could be shared between the two.

FWIW Syft can already convert from the cargo auditable data format to CycloneDX.

@jayvdb

jayvdb commented May 12, 2023

https://github.com/google/osv-scanner supports "SPDX and CycloneDX SBOMs using Package URLs" - https://google.github.io/osv-scanner/usage/#specify-sbom

As an alternative or precursor to storing the dependency info in those SBOM formats, perhaps rust-audit-info could extract the existing format and do a "rough" conversion to these SBOM formats. That way integration with these other tools can be explored, to determine what (if any) extra fields need to be stored in the Rust binaries in order to get reasonable compatibility with these tools.

@Shnatsel
Member Author

Syft can already perform such a conversion today.

@Shnatsel
Member Author

I've prototyped recording CycloneDX in the binaries directly; you can find the code in this branch: https://github.com/rust-secure-code/cargo-auditable/tree/record-cyclonedx

This was made possible by newer CycloneDX versions that no longer require a date and serial number to be present, which enables them to be made reproducible.

Recording CycloneDX results in 2x the overhead compared to the custom format. But the overhead is still consistently below 1/1000th of the size of the binary across a wide range of projects, so this is probably acceptable.

I've also built a pure-Rust converter from the custom format to CycloneDX, so that anyone who needs the conversion would not need to pull in the entirety of Syft: https://github.com/rust-secure-code/cargo-auditable/tree/master/auditable2cdx
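
As a rough illustration of what such a conversion involves (this is not the auditable2cdx implementation), here is a minimal Python sketch. It assumes the custom format is JSON with a `packages` list whose entries carry `name`, `version`, `source` and an optional `root` flag; those field names, the `specVersion` value and the function name `auditable_to_cyclonedx` are assumptions made for this example.

```python
import json


def auditable_to_cyclonedx(audit):
    """Map an assumed custom-format document to a minimal CycloneDX BOM."""
    components = []
    for pkg in audit["packages"]:
        component = {
            # Assumption: the root package is the application itself.
            "type": "application" if pkg.get("root") else "library",
            "name": pkg["name"],
            "version": pkg["version"],
        }
        # Only registry packages get a purl; other sources are left
        # without one rather than guessing an identifier.
        if pkg.get("source") == "crates.io":
            component["purl"] = f"pkg:cargo/{pkg['name']}@{pkg['version']}"
        components.append(component)

    # No timestamp or serial number is emitted, so the output stays
    # reproducible for a given input.
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "version": 1,
        "components": components,
    }


audit = {
    "packages": [
        {"name": "bar", "version": "0.1.0", "source": "crates.io"},
        {"name": "baz", "version": "0.1.0", "source": "local", "root": True},
    ]
}

bom = auditable_to_cyclonedx(audit)
print(json.dumps(bom, indent=2, sort_keys=True))
```

Dependency edges and enabled features are not mapped here; carrying those over is where most of the compatibility questions raised earlier in this thread would come up.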
