
How to use SPDX as a package manifest #439

Closed
tsteenbe opened this issue Jun 23, 2020 · 26 comments
Labels
question Request for information or clarification


@tsteenbe
Member

tsteenbe commented Jun 23, 2020

In several use cases, OSS is copied into a code repository instead of being included via a package manager. As the OSS Review Toolkit project, we would like to offer users a way to define package metadata as SPDX, for example for a C/C++ package that was copied into a project.

We prefer to use SPDX over, say, DOAP or AboutCode. We came up with the minimal SPDX files below. Are these files correct/valid, or is there a better way to do it?

Note: This ticket's description has been updated multiple times based on people's feedback and went from a simple question to a mini specification/cookbook.

Specification

The SPDX package manifest file must:

  • Be valid SPDX-2.2
  • End with the extension .spdx.yml, .spdx.yaml or .spdx.json.

We recommend using:

  • package or the name of the package as the name of the manifest file, e.g. package.spdx.yml or [name of package].spdx.yml, if you are describing a single root package, or
  • project or the name of the project as the name of the manifest file, e.g. project.spdx.yml or [name of project].spdx.yml, if you are describing a project that holds multiple other packages.
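The naming rules above make manifests easy to discover mechanically. A minimal sketch of how a tool might find them in a repository, assuming manifests can live in any directory and that .git can be skipped; the names `is_manifest_name` and `find_spdx_manifests` are hypothetical and not part of ORT or any SPDX tool.

```python
import os

# Extensions required by the specification above.
MANIFEST_EXTENSIONS = (".spdx.yml", ".spdx.yaml", ".spdx.json")


def is_manifest_name(filename):
    """Return True if the file name ends with one of the allowed extensions."""
    return filename.endswith(MANIFEST_EXTENSIONS)


def find_spdx_manifests(root):
    """Walk the tree below root and return sorted relative paths of manifests."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git metadata.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if is_manifest_name(name):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                found.append(rel.replace(os.sep, "/"))
    return sorted(found)
```

For scenario B) below, this would report libs/curl/package.spdx.yml, regardless of whether the file is called package.spdx.yml or curl.spdx.yml.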

Examples

  1. SPDX manifest describes a single package

A) Project is a private fork of curl 7.70.0 in its own repository
B) Project is a private fork of curl 7.70.0 in directory libs/curl within the project's repository.

Idea: For A) and B), add a file named package.spdx.yml or curl.spdx.yml either to the root of the repository for A) or to the root of the libs/curl directory for B).

Directory layout for scenario B):

.
├── LICENSE
├── README.md
├── libs
│   └── curl
│       ├── CMakeLists.txt
│       ├── package.spdx.yml
│       └── src/...
└── main.c

package.spdx.yml:

SPDXID: "SPDXRef-DOCUMENT"
spdxVersion: "SPDX-2.2"
creationInfo:
  created: "2020-07-23T18:30:22Z"
  creators:
  - "Organization: Example Inc."
  - "Person: Thomas Steenbergen"
  licenseListVersion: "3.9"
name: "curl-7.70.0"
dataLicense: "CC0-1.0"
documentNamespace: "http://spdx.org/spdxdocs/spdx-document-curl"
documentDescribes:
- "SPDXRef-Package-curl"
packages:
- SPDXID: "SPDXRef-Package-curl"
  description: "A command line tool and library for transferring data with URL syntax, supporting \
     HTTP, HTTPS, FTP, FTPS, GOPHER, TFTP, SCP, SFTP, SMB, TELNET, DICT, LDAP, LDAPS, MQTT, FILE, \
     IMAP, SMTP, POP3, RTSP and RTMP. libcurl offers a myriad of powerful features."
  copyrightText: "Copyright (c) 1996 - 2020, Daniel Stenberg, <daniel@haxx.se>, and many
    contributors, see the THANKS file."
  downloadLocation: "git+https://github.com/curl/curl.git@53cdc2c963e33bc0cc1a51ad2df79396202e07f8"
  filesAnalyzed: false
  homepage: "https://curl.haxx.se/"
  licenseConcluded: "NOASSERTION"
  licenseDeclared: "curl"
  name: "curl"
  versionInfo: "7.70.0"
  packageFileName: "./"

  2. SPDX manifest describes a project that includes multiple packages

The code repository of project XYZ contains two subdirectories with private forks of curl 7.70.0 in directory ./libs/curl and openssl 1.1.1g in directory ./libs/openssl.

Idea: Add a file named project.spdx.yml or xyz.spdx.yml to the root of the code repository containing project XYZ, e.g.:

.
├── LICENSE
├── README.md
├── libs
│   ├── curl/..
│   └── openssl/..
├── main.c
└── project.spdx.yml

project.spdx.yml:

SPDXID: "SPDXRef-DOCUMENT"
spdxVersion: "SPDX-2.2"
creationInfo:
  created: "2020-07-23T18:30:22Z"
  creators:
  - "Organization: Example Inc."
  - "Person: Thomas Steenbergen"
  licenseListVersion: "3.9"
name: "xyz-0.1.0"
dataLicense: "CC0-1.0"
documentNamespace: "http://spdx.org/spdxdocs/spdx-document-xyz"
documentDescribes:
- "SPDXRef-Package-xyz"
packages:
- SPDXID: "SPDXRef-Package-xyz"
  summary: "Awesome product created by Example Inc."
  copyrightText: "copyright 2004-2020 Example Inc. All Rights Reserved."
  downloadLocation: "git+ssh://gitlab.example.com:3389/products/xyz.git@b2c358080011af6a366d2512a25a379fbe7b1f78"
  filesAnalyzed: false
  homepage: "https://example.com/products/xyz"
  licenseConcluded: "NOASSERTION"
  licenseDeclared: "Apache-2.0 AND curl AND LicenseRef-Proprietary-ExampleInc"
  name: "xyz"
  versionInfo: "0.1.0"
- SPDXID: "SPDXRef-Package-curl"
  description: "A command line tool and library for transferring data with URL syntax, supporting \
     HTTP, HTTPS, FTP, FTPS, GOPHER, TFTP, SCP, SFTP, SMB, TELNET, DICT, LDAP, LDAPS, MQTT, FILE, \
     IMAP, SMTP, POP3, RTSP and RTMP. libcurl offers a myriad of powerful features."
  copyrightText: "Copyright (c) 1996 - 2020, Daniel Stenberg, <daniel@haxx.se>, and many
    contributors, see the THANKS file."
  downloadLocation: "https://github.com/curl/curl/releases/download/curl-7_70_0/curl-7.70.0.tar.gz"
  filesAnalyzed: false
  homepage: "https://curl.haxx.se/"
  licenseConcluded: "NOASSERTION"
  licenseDeclared: "curl"
  name: "curl"
  packageFileName: "./libs/curl"
  versionInfo: "7.70.0"
- SPDXID: "SPDXRef-Package-openssl"
  description: "OpenSSL is a robust, commercial-grade, full-featured Open Source Toolkit for the Transport Layer Security (TLS) protocol formerly known as the Secure Sockets Layer (SSL) protocol. The protocol implementation is based on a full-strength general purpose cryptographic library, which can also be used stand-alone."
  copyrightText: "copyright 2004-2020 The OpenSSL Project Authors. All Rights Reserved."
  downloadLocation: "git+ssh://github.com/openssl/openssl.git@e2e09d9fba1187f8d6aafaa34d4172f56f1ffb72"
  filesAnalyzed: false
  homepage: "https://www.openssl.org/"
  licenseConcluded: "NOASSERTION"
  licenseDeclared: "OpenSSL"
  packageFileName: "./libs/openssl"
  name: "openssl"
  versionInfo: "1.1.1g"
relationships:
- spdxElementId: "SPDXRef-Package-xyz"
  relatedSpdxElement: "SPDXRef-Package-curl"
  relationshipType: "CONTAINS"
- spdxElementId: "SPDXRef-Package-xyz"
  relatedSpdxElement: "SPDXRef-Package-openssl"
  relationshipType: "CONTAINS"
@tsteenbe tsteenbe added the question Request for information or clarification label Jun 23, 2020
@swinslow
Member

+1 from me on the general approach! I haven't held it up yet against the SPDX 2.2 example file for YAML (which I suspect is the closest we currently have to specifying how this should look) but I like this approach.

A couple of minor comments from a first glance:

  • I like the single consistent file name of package.spdx.yml across all directories / subdirectories. Makes it easier to walk through a directory tree and find all SPDX documents like this that are present.
  • For YAML, is it more typical to omit quotation marks around strings? I don't know offhand if this is permitted / preferred.
  • In the creators array, I suspect the second one should have Person: added: `- "Person: Thomas Steenbergen"`
  • For the documentNamespace field, I'd welcome input from @goneall @zvr and/or others who might have opinions on how best to recommend people to use this field in this context. In a perfect world I'd love for it to be able to reference this version of this SPDX file in this particular commit, but I assume that's recursive in a way that doesn't work :)

And an example / question:

  • Assume that repo github.com/swinslow/foo has a declared license of MIT.
  • In a subdirectory /bar/ there is a component bar with a license of Apache-2.0.
  • In the top-level package.spdx.yml file for foo, what should the licenseConcluded field be?
    • MIT
    • MIT AND Apache-2.0

Really I'm asking if licenseConcluded should "roll up" all of the licenses for what is contained within it, even if those are in different sub-packages.

@tsteenbe
Member Author

@swinslow Thank you for your feedback, I fixed the creators array.

@tsteenbe
Member Author

tsteenbe commented Jun 23, 2020

Noticed I forgot to add the required package verification code.

Am I correct that I should simply add the SHA1 sum for https://github.com/curl/curl/releases/download/curl-7_70_0/curl-7.70.0.tar.gz? E.g.:

checksums:
  - algorithm: "SHA1"
    checksumValue: "cfa63c38800d7fcf90328d10f61a191dba475762"
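The checksumValue in such an entry is just the SHA1 digest of the release archive. A minimal sketch for computing it locally, assuming the tarball has already been downloaded to a local path; `sha1_checksum` and `checksum_entry` are hypothetical helper names.

```python
import hashlib


def sha1_checksum(path, chunk_size=65536):
    """Return the lowercase hex SHA1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_entry(path):
    """Build one checksums list entry in the shape used in the YAML above."""
    return {"algorithm": "SHA1", "checksumValue": sha1_checksum(path)}
```

Usage would be something like checksum_entry("curl-7.70.0.tar.gz") after downloading the release tarball.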

What should the package verification code value be if the package is a specific commit (SHA1) taken from a GitHub repo?
Should the Git revision be used as checksumValue?

@swinslow
Member

@tsteenbe I don't think package verification code is required when filesAnalyzed is false...

@swinslow
Member

https://spdx.github.io/spdx-spec/v2-draft/package-information/#79-package-verification-code-field

Cardinality | 0..1 if FilesAnalyzed (7.8) is true or omitted, 0..0 (must be omitted) if FilesAnalyzed is false.
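This cardinality rule is easy to check mechanically. A sketch, assuming the manifest has already been parsed into a Python dict (e.g. with a YAML or JSON loader); `check_verification_code` is a hypothetical name.

```python
def check_verification_code(package):
    """Return a list of problems with packageVerificationCode for one package dict.

    Per the cardinality quoted above, the field must be omitted when
    filesAnalyzed is false; it is optional (0..1) when filesAnalyzed is
    true or omitted.
    """
    problems = []
    # An omitted filesAnalyzed is treated as true.
    files_analyzed = package.get("filesAnalyzed", True)
    if not files_analyzed and "packageVerificationCode" in package:
        problems.append(
            "%s: packageVerificationCode must be omitted when filesAnalyzed is false"
            % package.get("SPDXID", "<unknown>")
        )
    return problems
```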

@tsteenbe
Member Author

@swinslow You're right, I forgot about that. Still, I'm interested: if filesAnalyzed were true, what would the value be?

@swinslow
Member

@tsteenbe that's a good question and I don't know the answer :) If I understand the question, I'd assume the process is:

  • pull the repo
  • check out that particular SHA1 commit
  • adjust as needed:
    • if doing the repo as a whole, remove or ignore the top-level .git directory
    • or if doing a subdirectory, change into that subdirectory
  • apply the code algorithm to all the files in or below that directory
  • presumably also put an (excludes: package.spdx.yml) after the code in the manifest (see here)
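The steps above can be sketched in code. The SPDX 2.2 verification code algorithm takes the SHA1 of every file, sorts those hex digests, concatenates them without separators and takes the SHA1 of the result. A minimal sketch, assuming excluded files are given as paths relative to the package root and that .git directories are skipped; the function names are hypothetical.

```python
import hashlib
import os


def _sha1_file(path):
    """Lowercase hex SHA1 of a single file's contents."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_verification_code(root, excludes=("package.spdx.yml",)):
    """Compute an SPDX-style package verification code for the tree below root.

    excludes holds paths relative to root (using "/") that are skipped, e.g.
    the manifest itself, which cannot contain its own hash.
    """
    file_hashes = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Ignore the top-level (and any nested) .git directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, root).replace(os.sep, "/")
            if rel_path in excludes:
                continue
            file_hashes.append(_sha1_file(full_path))
    # Sort the per-file hashes, concatenate without separators, hash again.
    file_hashes.sort()
    return hashlib.sha1("".join(file_hashes).encode("ascii")).hexdigest()
```

Since any commit that touches a file changes the result, this also illustrates why filesAnalyzed: false is the practical default for hand-written manifests.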

Let me know if there's something I'm missing in your hypo...

(but I assume that for this manifest format, the presumption should be that people will want to use filesAnalyzed: false, otherwise they'd need to recalculate the hashes / verification code on every commit, which sounds unrealistic in practice.)

@swinslow
Member

One other simplification to consider: If the document will have only one package in it, then I don't think you need a documentDescribes field. The DESCRIBES relationship I believe is only required when the document contains multiple packages.

@goneall
Member

goneall commented Jun 23, 2020

One other simplification to consider: If the document will have only one package in it, then I don't think you need a documentDescribes field. The DESCRIBES relationship I believe is only required when the document contains multiple packages.

That's true for tag/value but not true for RDF/XML and undefined for YAML/JSON/XML.

I personally would like to make documentDescribes required even if there is only one package. This makes it easier for tooling and humans alike and avoids possible human error when creating or reading the files.
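If documentDescribes is required, a tool can validate it together with the relationship references. A sketch, assuming a parsed manifest dict; `check_document_references` is a hypothetical name, and external DocumentRef-... references as well as NONE/NOASSERTION are deliberately not handled.

```python
def check_document_references(doc):
    """Return a list of problems found in a parsed SPDX manifest dict.

    documentDescribes is treated as required, and every ID it lists, plus
    both ends of every relationship, must be defined in this document.
    External DocumentRef-... references and NONE/NOASSERTION are not
    handled by this sketch.
    """
    problems = []
    known_ids = {pkg.get("SPDXID") for pkg in doc.get("packages", [])}
    known_ids.add(doc.get("SPDXID"))  # the document itself, e.g. SPDXRef-DOCUMENT
    known_ids.discard(None)

    described = doc.get("documentDescribes", [])
    if not described:
        problems.append("documentDescribes is missing or empty")
    for spdx_id in described:
        if spdx_id not in known_ids:
            problems.append("documentDescribes references unknown ID %s" % spdx_id)

    for rel in doc.get("relationships", []):
        for key in ("spdxElementId", "relatedSpdxElement"):
            if rel.get(key) not in known_ids:
                problems.append("relationship %s references unknown ID %s" % (key, rel.get(key)))
    return problems
```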

@goneall
Member

goneall commented Jun 23, 2020

On the tech call today, it was suggested that any of the serialization formats be allowed.

I don't know if all formats need to be supported; we could restrict it to the more human-readable ones. Since tag/value also has some structural issues in representing nested objects, we could start with YAML and JSON. Once XML is solid, we may want to add it, since many of the current package managers use XML (e.g. Maven).

@tsteenbe
Member Author

@goneall My idea was to start with package.spdx.yaml and package.spdx.json, but I am already looking at XML. I have it on my to-do list to file a ticket with some .spdx.xml improvements.

@tsteenbe
Member Author

tsteenbe commented Jun 24, 2020

@goneall @zvr What do you think would be a good recommendation to use as documentNamespace in this context?

I know the spec says http://[CreatorWebsite]/[pathToSpdx]/[DocumentName]-[UUID], so my recommendation would be to use the URL from which the SPDX file can be directly downloaded. For the SPDX 2.2 example, it would be https://raw.githubusercontent.com/spdx/spdx-spec/development/v2.2.1/examples/SPDXYAMLExample-2.2.spdx.yaml. Alternatively, if one does not have a website, use http://spdx.org/spdxdocs/[DocumentName].

@goneall
Member

goneall commented Jun 24, 2020

What do you think would be a good recommendation to use as documentNamespace in this context?

I like having the URL where the SPDX file is located. The only thing that may be an issue is if the SPDX file is ever modified at the same URL, it really should have a different namespace. I was thinking if we could have the commit hash in the namespace, but that wouldn't work since you could only get the commit hash after you committed the file containing the namespace - a bit of a recursive problem.

Adding a generated UUID at the end of the URL would help make sure it is unique. Not easy to generate by hand, but there are websites that can generate UUID (e.g. https://www.uuidgenerator.net/version4).
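Generating the UUID with a tool avoids doing it by hand. A minimal sketch of the [DocumentName]-[UUID] pattern from the spec, assuming the base URL is the location of the SPDX file (or http://spdx.org/spdxdocs as a fallback); `document_namespace` is a hypothetical name.

```python
import uuid


def document_namespace(base_url, document_name, namespace_uuid=None):
    """Build http://[CreatorWebsite]/[pathToSpdx]/[DocumentName]-[UUID].

    A random version 4 UUID is generated unless one is passed in, so that a
    modified file at the same URL can get a fresh namespace.
    """
    if namespace_uuid is None:
        namespace_uuid = uuid.uuid4()
    return "%s/%s-%s" % (base_url.rstrip("/"), document_name, namespace_uuid)
```

Passing a fixed UUID keeps the namespace stable for an unchanged file; omitting it produces a new one each time the manifest is regenerated.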

@zvr
Member

zvr commented Jun 26, 2020

Have I ever mentioned SWHIDs? ;-)
You know, the SoftWare Heritage IDentifiers, that you can think of as "hey, a hash that also works fine for directory trees".
Which makes it a very nice index in a collection (database) of components, with immediately knowing whether something has been modified from an identically-named other component... ("is your curl the same as my curl?")

Well, it turns out that these marvelous SWHIDs can be used not only for values of external reference fields, but also incorporated into strings like document names and namespaces... (e.g. http://corp.com/spdx/swh:1:dir:7a6bcf6db04fd1984f15750b9317d505c9d476d2).
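SWHIDs for files and directories are computed like Git blob and tree object IDs, which is what makes them "a hash that also works fine for directory trees". A simplified sketch, assuming only regular and executable files (no symlinks or submodules) and ignoring .git; for real use, the Software Heritage `swh.model` tooling would be the authoritative implementation.

```python
import hashlib
import os
import stat


def _git_object_id(obj_type, data):
    """Raw SHA1 of a Git object header plus payload, as Git names objects."""
    header = ("%s %d\0" % (obj_type, len(data))).encode("ascii")
    return hashlib.sha1(header + data).digest()


def content_swhid(data):
    """swh:1:cnt: identifier for a file's bytes (same hash as a Git blob)."""
    return "swh:1:cnt:" + _git_object_id("blob", data).hex()


def _tree_id(path):
    """Raw Git-style tree hash of a directory (files and subdirectories only)."""
    entries = []
    for name in os.listdir(path):
        if name == ".git":
            continue
        full = os.path.join(path, name)
        raw_name = os.fsencode(name)
        if os.path.isdir(full) and not os.path.islink(full):
            # Git sorts directory entries as if their name ended with "/".
            entries.append((raw_name + b"/", b"40000", raw_name, _tree_id(full)))
        elif os.path.isfile(full) and not os.path.islink(full):
            executable = os.stat(full).st_mode & stat.S_IXUSR
            mode = b"100755" if executable else b"100644"
            with open(full, "rb") as f:
                blob_id = _git_object_id("blob", f.read())
            entries.append((raw_name, mode, raw_name, blob_id))
        # Symlinks and other file types are skipped in this sketch.
    entries.sort(key=lambda e: e[0])
    payload = b"".join(mode + b" " + name + b"\0" + oid for _, mode, name, oid in entries)
    return _git_object_id("tree", payload)


def directory_swhid(path):
    """swh:1:dir: identifier for a directory tree."""
    return "swh:1:dir:" + _tree_id(path).hex()
```

Comparing directory_swhid("libs/curl") against the SWHID of the upstream release directory would answer "is your curl the same as my curl?" for the simplified case above.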

@sschuberth
Member

For A) and B add a file name package.spdx.yml to either the root of the repository for A) or to the root of the curl directory for B).

Nit: Shouldn't the file be called packages.spdx.yml (plural) as it seems to support multiple entries below packages:?

@blaumeiser-at-bosch

+1 from my side, sounds like the best approach I heard in this regard so far.

@sschuberth
Member

Noticed I am forgot to add the required the package verification code.

@tsteenbe, would you mind updating your original post accordingly so we have one place which contains the latest / complete example?

@tsteenbe
Member Author

@sschuberth Example updated. Note that the package verification code is not needed as FilesAnalyzed is false; see also https://spdx.github.io/spdx-spec/3-package-information/#39-package-verification-code.

@tsteenbe
Member Author

And an example / question:

* Assume that repo `github.com/swinslow/foo` has a declared license of MIT.

* In a subdirectory `/bar/` there is a component `bar` with a license of Apache-2.0.

* In the top-level `package.spdx.yml` file for `foo`, what should the `licenseConcluded` field be?
  
  * MIT
  * MIT AND Apache-2.0

Really I'm asking if licenseConcluded should "roll up" all of the licenses for what is contained within it, even if those are in different sub-packages.

@swinslow Yes, that is how we have always interpreted licenseConcluded, per the SPDX spec: "Contain the license the SPDX file creator has concluded as governing the package". In the above example for project.spdx.yml, I used the CONTAINS relationship because both curl and openssl are part of the xyz package; therefore, their combined licenseConcluded expression should be in the licenseConcluded of xyz.

I used "NOASSERTION" as the value of licenseConcluded in the above examples for package.spdx.yml and project.spdx.yml to:

  1. Keep things simple for people who will be writing these files by hand
  2. Expect tooling like the OSS Review Toolkit to be used to determine licenseConcluded based on scan results, excludes and license finding curations

However, creators of package.spdx.yml and project.spdx.yml files are free to set licenseConcluded based on their own conclusions, and tools should be free to decide whether they use licenseConcluded.

@zvr
Member

zvr commented Jun 30, 2020

I don't think we can prescribe how a package-level concluded license is constructed by the (concluded) licenses of the different contents. It might be License-1, or License-2, or License-1 AND License-2, or ...

In the specific case that @tsteenbe describes, where the package CONTAINS the two independent components curl and openssl, I agree that "L1 AND L2" is a reasonable choice.
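The "L1 AND L2" roll-up for independent components can be sketched as below. This is only the one reasonable choice discussed here, not something the SPDX spec prescribes; the name `combine_and` and the rule that any NOASSERTION makes the whole result NOASSERTION are assumptions of this sketch.

```python
def combine_and(expressions):
    """Join license expressions of independent components with AND.

    Compound expressions are parenthesized so operator precedence is kept,
    duplicates are dropped, and if any component is NOASSERTION the result
    is NOASSERTION as well. NONE is not treated specially in this sketch.
    """
    parts = []
    for expr in expressions:
        expr = expr.strip()
        if expr == "NOASSERTION":
            return "NOASSERTION"
        if expr not in parts:
            parts.append(expr)
    if not parts:
        return "NOASSERTION"
    if len(parts) == 1:
        return parts[0]
    wrapped = ["(%s)" % p if " " in p else p for p in parts]
    return " AND ".join(wrapped)
```

For swinslow's foo/bar question, combine_and(["MIT", "Apache-2.0"]) yields "MIT AND Apache-2.0".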

sschuberth added a commit to oss-review-toolkit/ort that referenced this issue Jul 6, 2020
See spdx/spdx-spec#439 for details.

Signed-off-by: Sebastian Schuberth <sebastian.schuberth@bosch.io>
@sschuberth
Member

sschuberth commented Jul 7, 2020

@tsteenbe, shouldn't this bit

relationships:
- spdxElementId: "SPDXRef-Package-xyz"
  relatedSpdxElement: "SPDXRef-Package-curl"
  relationshipType: "CONTAINS"
- spdxElementId: "SPDXRef-Package-xyz"
  relatedSpdxElement: "SPDXRef-Package-openssl"
  relationshipType: "CONTAINS"

rather say

relationships:
- spdxElementId: "SPDXRef-Package-xyz"
  relatedSpdxElement: "SPDXRef-Package-curl"
  relationshipType: "DEPENDS_ON"
- spdxElementId: "SPDXRef-Package-xyz"
  relatedSpdxElement: "SPDXRef-Package-openssl"
  relationshipType: "DEPENDS_ON"

I.e. use DEPENDS_ON instead of CONTAINS?

Edit: More findings:

  • SPDXRef-Package-xyz lacks the mandatory copyrightText field, i.e. copyrightText: "Copyright (C) 2020 Example Inc.".
  • packageFileName needs to start with a lowercase letter.

sschuberth added a commit to oss-review-toolkit/ort that referenced this issue Jul 7, 2020
See spdx/spdx-spec#439 for details.

Signed-off-by: Sebastian Schuberth <sebastian.schuberth@bosch.io>
@tsteenbe
Member Author

tsteenbe commented Jul 7, 2020

@sschuberth Fixed copyrightText and packageFileName in the example.

DEPENDS_ON is "Package A depends on the presence of package B in order to build and run" - whilst in the example I assume openssl and curl to be part of package XYZ. You would use DEPENDS_ON for example if package XYZ depends on system library curl on say Android.
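Because CONTAINS describes composition rather than a build or runtime dependency, tools can query it separately from DEPENDS_ON. A sketch that finds related elements even when a relationship is written from the other side, assuming only the CONTAINED_BY/CONTAINS and DESCRIBED_BY/DESCRIBES pairs are treated as inverses (DEPENDS_ON and DEPENDENCY_OF are left alone, since whether they are symmetric is exactly what is debated below); `related_elements` is a hypothetical name.

```python
# Inverse pairs used to normalize relationships to one direction.
INVERSES = {
    "CONTAINED_BY": "CONTAINS",
    "DESCRIBED_BY": "DESCRIBES",
}


def related_elements(relationships, element_id, relationship_type):
    """Return IDs that element_id relates to via relationship_type.

    Relationships written from the other side (e.g. "B CONTAINED_BY A")
    are flipped using INVERSES before matching.
    """
    result = []
    for rel in relationships:
        source = rel["spdxElementId"]
        target = rel["relatedSpdxElement"]
        rel_type = rel["relationshipType"]
        if rel_type in INVERSES:
            source, target, rel_type = target, source, INVERSES[rel_type]
        if source == element_id and rel_type == relationship_type and target not in result:
            result.append(target)
    return result
```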

@sschuberth
Member

DEPENDS_ON is "Package A depends on the presence of package B in order to build and run" - whilst in the example I assume openssl and curl to be part of package XYZ. You would use DEPENDS_ON for example if package XYZ depends on system library curl on say Android.

Hmm. And what relation would you use for a dependency on a dynamic library that is not a system library, DEPENDENCY_OF? In other words, are DEPENDS_ON and DEPENDENCY_OF not symmetric, i.e. the same relation but once viewed from the source, and once viewed from the target?

sschuberth added a commit to oss-review-toolkit/ort that referenced this issue Jul 7, 2020
See spdx/spdx-spec#439 for details.

Signed-off-by: Sebastian Schuberth <sebastian.schuberth@bosch.io>
sschuberth added a commit to oss-review-toolkit/ort that referenced this issue Jul 7, 2020
See the discussion at spdx/spdx-spec#439 for
details.

Signed-off-by: Sebastian Schuberth <sebastian.schuberth@bosch.io>
sschuberth added a commit to oss-review-toolkit/ort that referenced this issue Jul 8, 2020
See the discussion at spdx/spdx-spec#439 for
details.

Signed-off-by: Sebastian Schuberth <sebastian.schuberth@bosch.io>
@tsteenbe
Member Author

tsteenbe commented Jul 9, 2020

And what relation would you use for a dependency on a dynamic library that is not a system library

@sschuberth The difference between DEPENDS_ON and DEPENDENCY_OF is that the former is the generic dependency relationship and the latter is a specific DEPENDS_ON relationship, primarily for package-manager dependencies introduced via, say, a pom.xml, build.gradle, or package.json file.

To make things even more complex, there is also PREREQUISITE_OF. We talked about deprecating PREREQUISITE_OF in SPDX 3.0 in favor of DEPENDENCY_OF, as the latter is more powerful in expressing the type of dependency relationship. Maybe we need to consider dropping DEPENDS_ON as well to make things easier.

See also #154, in which DEPENDENCY_OF was introduced, and the examples in https://spdx.github.io/spdx-spec/v2-draft/relationships-between-SPDX-elements/

  • DEPENDS_ON Package A depends on the presence of package B in order to build and run
  • DEPENDENCY_OF A is explicitly stated as a dependency of B in a machine-readable file.
  • PREREQUISITE_OF Is to be used when SPDXRef-A is a prerequisite for SPDXRef-B.

sschuberth added a commit to oss-review-toolkit/ort that referenced this issue Jul 9, 2020
See the discussion at spdx/spdx-spec#439 for
details.

Signed-off-by: Sebastian Schuberth <sebastian.schuberth@bosch.io>
@sschuberth
Member

maybe we need to consider dropping DEPENDS_ON as well to make things easier.

👍 on that one, as having both DEPENDS_ON and DEPENDENCY_OF is just confusing and seems unnecessary. Actually, it sounds like DEPENDS_ON should have been what PREREQUISITE_OF is. But in terms of prerequisites, I'd clearly distinguish between "build prerequisites" and "run prerequisites".

sschuberth added a commit to oss-review-toolkit/ort that referenced this issue Jul 9, 2020
See the discussion at spdx/spdx-spec#439 for
details.

Signed-off-by: Sebastian Schuberth <sebastian.schuberth@bosch.io>
sschuberth added a commit to oss-review-toolkit/ort that referenced this issue Jul 10, 2020
See the discussion at spdx/spdx-spec#439 for
details.

Signed-off-by: Sebastian Schuberth <sebastian.schuberth@bosch.io>
tardyp added a commit to tardyp/tools-python that referenced this issue Jul 29, 2021
example taken from spdx/spdx-spec#439
coming from ART people

Signed-off-by: Pierre Tardy <pierre.tardy@renault.com>
@kestewart
Contributor

Closing this, as I believe the original point has been answered. Please reopen if you disagree.

7 participants