Add support for a structured BOM format #166

Merged
merged 11 commits into from
Oct 4, 2021


samj1912
Member

@samj1912 samj1912 commented Jun 2, 2021

Readable

Note: This RFC only changes the BOM format for the existing BOM tables. It should be followed by RFCs that propose:

  • BOMs for stacks
  • How to merge stack BOMs with buildpack BOMs during build and rebase

@samj1912 samj1912 changed the title Add initial draft Add support for a structured BOM format Jun 2, 2021
@samj1912
Member Author

samj1912 commented Jun 4, 2021

Some example SBOMs created using CycloneDX tools, and what they look like after they are converted to SPDX:

Sample node npm project from paketo
https://gist.github.com/samj1912/37b8454ec882fe2dd20133774ff0e283

Sample go mod project from paketo
https://gist.github.com/samj1912/7a5f0e39ce0eff6d0dc3835581d6581a

@samj1912
Member Author

samj1912 commented Jun 4, 2021

I tried to use https://github.com/spdx/spdx-sbom-generator.

This requires me to run go mod vendor before I can generate the sbom but the SPDX sbom it produces for go looks like -
https://gist.github.com/63f36de13740d1b23702dab69af9b072

For npm it looks like -
https://gist.github.com/a78b9b9e219c22b3af5f5bf8e7123640 (seems mostly absent but looks like this is actively being worked upon)

@samj1912
Member Author

samj1912 commented Jun 4, 2021

I could use some help trying this out with other sample repositories and other tools (on both the CDX and SPDX side) so that we have a clear picture and can recommend an appropriate direction for CNB.

@dmikusa-pivotal @ForestEckhardt @sophiewigmore if any of you from the paketo side would like to help collaborate on this RFC please lmk :) that would be a great help.

@coderpatros @nishakm your reviews are extremely valuable 🙏 . Thank you for taking a look at this RFC. I think with both of you we should have a good coverage across both CDX and SPDX :)


cc: @buildpacks/platform-maintainers and @matthewmcnew for any platform related comments around pack inspect-image --bom --format <sbom-format> and similarly for kp cli

@samj1912 samj1912 changed the title Add support for a structured BOM format [WIP] Add support for a structured BOM format Jun 4, 2021
@nishakm

nishakm commented Jun 7, 2021

I tried to use https://github.com/spdx/spdx-sbom-generator.

This requires me to run go mod vendor before I can generate the sbom but the SPDX sbom it produces for go looks like

Referencing opensbom-generator/spdx-sbom-generator#1, go mod vendor doesn't give you the dependencies of a go project. I am not sure what "lifecycle" does to collect SBOM data, but if it can get the mount point of the container, Tern will be able to generate an SBOM for it.

@samj1912
Member Author

samj1912 commented Jun 7, 2021

I tried to use https://github.com/spdx/spdx-sbom-generator.
This requires me to run go mod vendor before I can generate the sbom but the SPDX sbom it produces for go looks like

Referencing spdx/spdx-sbom-generator#1, go mod vendor doesn't give you the dependencies of a go project. I am not sure what "lifecycle" does to collect SBOM data, but if it can get the mount point of the container, Tern will be able to generate an SBOM for it.

I may be wrong but AFAICT tern can only scan container images once they are created, right? The tools we need around sbom generation for buildpacks need to work on a normal directory/filesystem during the container build itself. Is tern meant for such use cases?

The other issue, although not a blocker/requirement is presence of go tooling for SBOM generation since a lot of the buildpacks are currently written in go and it would be ideal for them to use these SBOM generators as libraries instead of shelling out. (For buildpacks written in bash this is not a concern)

@nishakm

nishakm commented Jun 7, 2021

I may be wrong but AFAICT tern can only scan container images once they are created, right? The tools we need around sbom generation for buildpacks need to work on a normal directory/filesystem during the container build itself. Is tern meant for such use cases?

This is actually new. So my apologies for not communicating that better. Tern can now generate an SBOM at container build time using tern report --live /path/to/mounted/filesystem.

The other issue, although not a blocker/requirement is presence of go tooling for SBOM generation since a lot of the buildpacks are currently written in go and it would be ideal for them to use these SBOM generators as libraries instead of shelling out. (For buildpacks written in bash this is not a concern)

This is something we'd love help with! Would you be able to join our upcoming community meeting (https://github.com/tern-tools/tern#community-meetings)? We can discuss there.

cc: @rnjudge

@samj1912
Member Author

samj1912 commented Jun 7, 2021

I may be wrong but AFAICT tern can only scan container images once they are created, right? The tools we need around sbom generation for buildpacks need to work on a normal directory/filesystem during the container build itself. Is tern meant for such use cases?

This is actually new. So my apologies for not communicating that better. Tern can now generate an SBOM at container build time using tern report --live /path/to/mounted/filesystem.

I tried replicating my above experiments with tern but I wasn't getting any output. Maybe I am doing something wrong here as it seems to be expecting layers? Should I reach out to you at a separate slack channel or other communication medium?

$ tern --version
Tern at commit bd359780316d146de4434998b1c99757d21be86e
   python version = 3.7.10 (default, Apr 27 2021, 08:48:55)

$ git clone https://github.com/paketo-buildpacks/samples
$ cd samples/nodejs/npm
$ tern report --live . 
2021-06-07 19:25:42,849 - DEBUG - __main__ - Starting...
2021-06-07 19:25:42,849 - DEBUG - prep - Setting up...
2021-06-07 19:25:42,849 - DEBUG - run - Starting analysis...
2021-06-07 19:25:42,874 - DEBUG - generator - Generating summary report for layer...
This report was generated by the Tern Project
https://github.com/tern-tools/tern/commit/b23ad86d3a2eca8a9249869bc67ce46f7e544bfa

	Layer :
	File licenses found in Layer:  None
	Packages found in Layer: None

2021-06-07 19:25:42,875 - DEBUG - prep - Tearing down...
2021-06-07 19:25:42,875 - DEBUG - __main__ - Finished

We would also need the SBOM generation utility to be a standalone binary ideally which can work on various linux/windows operating systems. It looks like that might be tough with tern since it is a python tool and requires a python interpreter.

@rnjudge

rnjudge commented Jun 7, 2021

I tried replicating my above experiments with tern but I wasn't getting any output. Maybe I am doing something wrong here as it seems to be expecting layers? Should I reach out to you at a separate slack channel or other communication medium?

Yes, Tern expects a mounted filesystem layer. We have a community meeting tomorrow (Tuesday, June 8th) at 3PM UTC/8AM PST where it might be easiest to discuss this live if the timing works for you. If not, there's a slack channel for ongoing discussions.

We would also need the SBOM generation utility to be a standalone binary ideally which can work on various linux/windows operating systems. It looks like that might be tough with tern since it is a python tool and requires a python interpreter.

This is not the first time we have heard this feedback, and we are working on packaging Tern as a Debian package.

@nishakm

nishakm commented Jun 7, 2021

We would also need the SBOM generation utility to be a standalone binary ideally which can work on various linux/windows operating systems. It looks like that might be tough with tern since it is a python tool and requires a python interpreter.

It's quite straightforward to build the Docker image for tern if you are concerned with portability. Even if we were to build a go library, it won't work natively on Mac or Windows because it uses linux syscalls.

@samj1912
Member Author

samj1912 commented Jun 7, 2021

Sorry for any confusion around our use case. Buildpacks generate the SBOM during the build process, completely independent of the Docker daemon or Dockerfiles. This is independent of any post-build container scanning that a tool might do (which is where I think Tern fits in).

Most of the SBOM tools I tested above do some file system parsing, and I don't think they need privileged syscalls. If they did, they wouldn't work during the build process, since builds happen in an unprivileged environment.

The SBOM generation binary or library will be needed inside the build environment, and in some cases the build environment can be pretty minimal, which is why a standalone binary produced by Go is an attractive option. (Our build environments are just Linux/Windows, with the majority of the buildpacks currently targeting Linux.)

The build-time user also doesn't have root privileges inside the build environment, so apt or dpkg installations don't work either.

Either way, from what I can tell tern is meant for scanning containers and not files/directories. (See my above use cases around simply cloning the source repo and running the sbom generation tool)

Edit - Will catch up with the tern team on slack and post the final conclusion here.

@jabrown85
Contributor

This may be controversial, but I am not convinced this should be baked into buildpacks as a first-party thing. I am not saying the SBOM isn't useful or that this won't increase the security posture of some buildpack built application images.

I am hesitant to add a new format that lifecycle/pack has to know about and interact with. I would much rather give buildpacks the ability to add BOM/Licenses/SBOM/Cyclone/CodeCov/Whatever output they want into a report.toml-like file. That way paketo buildpacks could use cyclone while another builder uses SPDX.

I'm also not convinced that each buildpack is going to properly report everything it did. Or that another buildpack didn't modify the contents of a previous buildpack's layers. Will security tools end up having to scan the image anyway?

I'm +1 for giving buildpacks and platforms the low level API tooling to allow them to produce these reports...but I am -1 on baking this in.

@cjnosal

cjnosal commented Jun 9, 2021

Freely available scanners like Trivy and Grype look for package managers, lockfiles, and manifests inside the image (e.g. Gemfile, dpkg, apt, jar manifests). When a buildpack layer drops a binary into the filesystem, e.g. the Java runtime environment, these tools can't identify it.

Enterprise binary analysis scanners like Blackduck can identify common binaries like the JRE, but don't support other languages like node.js.

For both types of tools, there's a level of fuzzy matching to turn the filepaths, api groups, package names into a CPE's vendor, name, and target fields, which leads to both false negatives and false positives.

BoMs included in a buildpack may have errors, but that's also true of the existing tools. Combining a provided BoM with an extracted BoM could mitigate the shortcomings of both.
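
As a rough illustration of combining a provided BoM with an extracted one, here is a minimal Python sketch. It assumes both BoMs have already been flattened into lists of dicts with at least a `name` and an optional `version`; that shape, the `merge_boms` helper, and the rule that provided fields win on conflict are all assumptions for illustration, not something the RFC or any scanner defines.

```python
from typing import Dict, List, Tuple

Component = Dict[str, str]


def _key(component: Component) -> Tuple[str, str]:
    # Match components on case-insensitive name plus exact version.
    return (component["name"].lower(), component.get("version", ""))


def merge_boms(provided: List[Component], extracted: List[Component]) -> List[Component]:
    """Union two component lists keyed on (name, version).

    Fields from the buildpack-provided BoM win on conflict; entries and
    fields found only by the scanner are kept to fill the gaps.
    """
    merged: Dict[Tuple[str, str], Component] = {}
    for component in extracted:
        merged[_key(component)] = dict(component)
    for component in provided:
        # Start from whatever the scanner found, then overlay provided fields.
        merged[_key(component)] = {**merged.get(_key(component), {}), **component}
    return sorted(merged.values(), key=_key)
```

A real merge would likely need fuzzier matching than an exact (name, version) key, which is where the false positives and negatives mentioned above come from.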

@cjnosal

cjnosal commented Jun 11, 2021

To expand on the CVE scanning use case (focused on the content of the BoM rather than the schema or location):

CVE lookups based on a BoM will get better results with more CPE fields properly filled out, specifically vendor (which in some cases may be the API Group), language, and target_sw.

As a concrete example, the java buildpack BoM includes:

{
  "name": "jre",
  "metadata": {
    "layer": "jre",
    "licenses": [
      {
        "type": "GPL-2.0 WITH Classpath-exception-2.0",
        "uri": "https://openjdk.java.net/legal/gplv2+ce.html"
      }
    ],
    "name": "BellSoft Liberica JRE",
    "sha256": "5bbb7b867ab797ace54aa98a76b7abcca6c5fa01338ee2907e97adb21150c414",
    "stacks": [
      "io.buildpacks.stacks.bionic",
      "org.cloudfoundry.stacks.cflinuxfs3"
    ],
    "uri": "https://github.com/bell-sw/Liberica/releases/download/11.0.11+9/bellsoft-jre11.0.11+9-linux-amd64.tar.gz",
    "version": "11.0.11"
  },
  "buildpacks": {
    "id": "paketo-buildpacks/bellsoft-liberica",
    "version": "8.0.0"
  }
}

The vendor strings (BellSoft, bell-sw), package string (liberica), and publisher (paketo-buildpacks) would all be useful strings for a CVE search, ideally without extracting them from urls.

It would also be helpful if these fields match the upstream package names (e.g. bellsoft-jdk11.0.9+12-linux-amd64.deb installs the package bellsoft-java11). If Bellsoft reports bellsoft-java11 as impacted by a CVE, a scanner searching for "jre" based on the buildpack BoM won't find the CVE, unless the buildpack maintainer also reports their own CPE string as impacted.

Additionally, scanners may need hints about the distro to determine what vulnerability feed to query (the ubuntu version used by io.buildpacks.stacks.bionic)

TL;DR: users of CVE scanners need to be able to retrieve (or assemble) accurate and complete CPEs and the OS version from the BoM.
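
To make the "assemble a CPE" step concrete, here is a minimal Python sketch that builds a CPE 2.3 formatted string from explicit vendor and product fields plus the version found in a BoM entry shaped like the one above. The `cpe23` and `cpe_from_entry` helpers are hypothetical, and the vendor/product values used with them are illustrative guesses that may not match the actual NVD dictionary entries. The sketch only lowercases values and replaces spaces; it does not implement the full CPE escaping rules.

```python
from typing import Any, Dict, Optional


def _norm(value: Optional[str]) -> str:
    # "*" is the CPE wildcard for "any value".
    if not value:
        return "*"
    return value.strip().lower().replace(" ", "_")


def cpe23(vendor: str, product: str, version: str, part: str = "a",
          language: Optional[str] = None, target_sw: Optional[str] = None) -> str:
    """Build a CPE 2.3 formatted string.

    Field order: part, vendor, product, version, update, edition,
    language, sw_edition, target_sw, target_hw, other.
    """
    fields = [part, _norm(vendor), _norm(product), _norm(version),
              "*", "*", _norm(language), "*", _norm(target_sw), "*", "*"]
    return "cpe:2.3:" + ":".join(fields)


def cpe_from_entry(entry: Dict[str, Any], vendor: str, product: str) -> str:
    # Assumption: the version lives at entry["metadata"]["version"], as in
    # the example BoM entry; vendor and product must be supplied explicitly
    # because the entry does not carry them as separate fields.
    return cpe23(vendor, product, entry["metadata"]["version"])
```

The fact that `vendor` and `product` have to be passed in by hand is the gap described above: today they can only be guessed from names and URLs.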

@coderpatros

CVE lookups based on a BoM will get better results with more CPE fields properly filled out, specifically vendor (which in some cases may be the API Group), language, and target_sw.

Depending on the component, vulnerability lookups are better served using package URLs https://github.com/package-url/purl-spec.

It's probably some time away, but the NVD will deprecate CPEs and likely replace them with SWID tags. So it may be worth considering the work being done on software identification here too https://github.com/usnistgov/swid-reg

@ForestEckhardt
Contributor

Depending on the component, vulnerability lookups are better served using package URLs https://github.com/package-url/purl-spec.

Would you mind elaborating on how the purl can be used to identify vulnerabilities? I understand CPE and SWID, but I am falling short on finding the integration point for purl. As far as I can understand, the purl can help identify the package, but when it comes to actually using the purl for lookup I can't find anything.

@samj1912
Member Author

samj1912 commented Jun 16, 2021

Depending on the component, vulnerability lookups are better served using package URLs https://github.com/package-url/purl-spec.

Would you mind elaborating on how the purl can be used to identify vulnerabilities? I understand CPE and SWID, but I am falling short on finding the integration point for purl. As far as I can understand, the purl can help identify the package, but when it comes to actually using the purl for lookup I can't find anything.

I believe there are some integrations with purl listed here: https://github.com/package-url/purl-spec#users-adopters-and-links

https://ossindex.sonatype.org/doc/coordinates
https://docs.dependencytrack.org/datasources/routing/

@coderpatros

coderpatros commented Jun 17, 2021

Depending on the component, vulnerability lookups are better served using package URLs https://github.com/package-url/purl-spec.

Would you mind elaborating on how the purl can be used to identify vulnerabilities? I understand CPE and SWID, but I am falling short on finding the integration point for purl. As far as I can understand, the purl can help identify the package, but when it comes to actually using the purl for lookup I can't find anything.

For software components, the two main sources of vulnerability information I know of are OSS Index and VulnDB. Both support purl.

OSS Index is provided by Sonatype and is free to use.

VulnDB is provided by Risk Based Security and is paid only.

But both sources go beyond the CVEs in the NVD.

Additionally, the centralised nature of CPEs can make them less than ideal to identify a lot of OSS components.

I'm not suggesting to leave them out if they can be accurately represented. Just that purl should be considered too.
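
For anyone unfamiliar with the format, here is a minimal Python sketch of how a package URL is put together, following the `pkg:type/namespace/name@version` layout from the purl spec. The `build_purl` helper is hypothetical; it leaves out qualifiers, subpaths, and the type-specific normalization rules the spec defines, so treat it as a rough illustration rather than a conforming implementation.

```python
from typing import Optional
from urllib.parse import quote


def build_purl(pkg_type: str, name: str, version: Optional[str] = None,
               namespace: Optional[str] = None) -> str:
    """Assemble a package URL of the form pkg:type/namespace/name@version."""
    parts = [pkg_type.lower()]
    if namespace:
        # Each namespace segment is percent-encoded on its own so that the
        # "/" separators between segments are preserved.
        parts.extend(quote(segment, safe="") for segment in namespace.strip("/").split("/"))
    parts.append(quote(name, safe=""))
    purl = "pkg:" + "/".join(parts)
    if version:
        purl += "@" + quote(version, safe="")
    return purl
```

For example, `build_purl("npm", "lodash", "4.17.21")` gives `pkg:npm/lodash@4.17.21`. Because the string is derived from the ecosystem's own package coordinates, it can be built without consulting a central dictionary, which is the main contrast with CPEs.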

@sclevine
Member

sclevine commented Jul 1, 2021

Re: support for multiple formats: since buildpacks that output different formats would be incompatible with each other (i.e., spoil the SBoM), I think we should start with a single standardized format. The proposed design includes .cdx. as part of the file extension for every SBoM file. This gives us the ability to support additional formats in a controlled way (e.g., only when it's possible to convert between them, or such that one format is mandatory) in the future without making breaking changes. I support this approach, and suggest that we move forward with CycloneDX unless there is additional feedback.
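
To illustrate why putting the format in the file extension leaves room for later additions, here is a small Python sketch that reads the format marker out of a file name. The file names used with it (such as `bom.cdx.json`), the `sbom_format` helper, and the `KNOWN_FORMATS` table are all hypothetical; the only thing taken from the proposal is that `.cdx.` appears in the extension.

```python
from typing import Optional

# Only CycloneDX is recognized for now; supporting another format later
# would mean adding an entry here rather than changing existing file names.
KNOWN_FORMATS = {"cdx": "CycloneDX"}


def sbom_format(filename: str) -> Optional[str]:
    """Return the SBoM format encoded in a "<name>.<format>.<encoding>" file name."""
    parts = filename.split(".")
    if len(parts) < 3:
        # No format marker between the base name and the encoding suffix.
        return None
    return KNOWN_FORMATS.get(parts[-2])
```

A file without a recognized marker returns `None`, so a consumer can reject or ignore it instead of guessing the format from its contents.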

Signed-off-by: Sambhav Kothari <skothari44@bloomberg.net>
Signed-off-by: Sambhav Kothari <skothari44@bloomberg.net>
@samj1912
Member Author

samj1912 commented Oct 1, 2021

/queue-issue buildpacks/lifecycle "Builder should warn if newer buildpacks write a bom in *.toml"

@samj1912
Member Author

samj1912 commented Oct 1, 2021

/queue-issue buildpacks/lifecycle "Restorer should restore bom files from app and cache" type/enhancement epic/sbom

@samj1912
Member Author

samj1912 commented Oct 1, 2021

/queue-issue buildpacks/lifecycle "Lifecycle should inject io.buildpacks.bom.* metadata when merging SBOMs" type/enhancement epic/sbom

/queue-issue buildpacks/lifecycle "Exporter should export bom files for launch layers" type/enhancement epic/sbom

/queue-issue buildpacks/lifecycle "Exporter should cache bom files for cached layers" type/enhancement epic/sbom

/queue-issue buildpacks/lifecycle "Lifecycle should merge CycloneDX bom files" type/enhancement epic/sbom

/queue-issue buildpacks/lifecycle "Builder should copy bom files to /layers/config/sbom" type/enhancement epic/sbom
