Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SPDX generation using spdx-tools #1233

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

armintaenzertng
Copy link

This is set up to produce the same SPDX output as the current spdx generation module while utilising the spdx-tools library. The goal is to replace the current module with this new one, which will allow easy migration to more SPDX formats as well as SPDXv3.
I tried to stay close to the structure of the original implementation.

I tested this using the following commands:
tern report -i golang:1.12-alpine -f spdxjson -o spdx_test.json
and
tern report -i golang:1.12-alpine -f spdxjson_new -o new_spdx_test.json
I compared the resulting json files using jd, treating arrays as unordered sets. The only differences were in timestamps, UUIDs, and two differences in how json output is generated:

  • The enum value in spdx-tools is PACKAGE_MANAGER, not PACKAGE-MANAGER
  • The optional key documentDescribes has been deprecated and is therefore not used by the spdx-tools. Only the corresponding DESCRIBES relationship is serialized.

'report-{version}-{image}-{uuid}'
LICENSE_LIST_VERSION = Version(3, 20)
CREATOR_NAME = 'tern-{version}'
DOCUMENT_NAME_SNAPSHOT = 'Tern SPDX JSON SBoM' # TODO: different name here that is not specific to JSON
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps just drop the JSON from the name?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nit: SBoM should be SBOM

tern/formats/spdx_new/file_helpers.py Outdated Show resolved Hide resolved
@armintaenzertng
Copy link
Author

I added support for YAML, XML and RDF-XML formats.

@armintaenzertng armintaenzertng force-pushed the add_spdx_tools branch 2 times, most recently from b0f076b to 9f80484 Compare June 28, 2023 05:58
@rnjudge
Copy link
Contributor

rnjudge commented Jul 10, 2023

@armintaenzertng Thank you very much for all your work on this! How does one denote which version of SPDX documents they want using this PR? I am assuming this PR, by default, generates SPDX 2.3 documents. However, we can't drop support for SPDX 2.2 since we have users who want it because it is the ISO standard version. There needs to be a way to denote SPDX version to generate on the command line before we can merge this.

@armintaenzertng
Copy link
Author

This PR currently replicates the behavior of the current state. That is, SPDX 2.2 is hardcoded into the output.
I will gladly add support for multiple SPDX versions, but we should clarify how this would work in the current workflow.

  • Should there be an optional parameter that denotes the SPDX version (that gets ignored if no SPDX output is required)? If yes, how do we call that?
  • Or will we add two (and more in the future) parameters for the -f flag, like spdxjson2.2, spdxjson2.3, spdxjson3.0 etc.? This is probably easier to implement, but will lead to much code duplication, probably not the best idea.

@armintaenzertng
Copy link
Author

After some further consideration and having a deeper look at the code, point 2 from above might be the better alternative after all. I'll try to implement that.

@armintaenzertng
Copy link
Author

I added the versioning I described above.
@rnjudge, please have a look if that is OK with you. :)

@rnjudge
Copy link
Contributor

rnjudge commented Jul 11, 2023

@armintaenzertng do you want to schedule a zoom call about this? I would like to avoid mass code duplication as that was the whole point of using the SPDX tools library.

@armintaenzertng
Copy link
Author

Yes, certainly! :)
Do you have my email address?

@armintaenzertng
Copy link
Author

I added a CLI version parameter for the output format. Formats that don't support this (i.e. everything except for SPDX) will raise an error if this is set.

@armintaenzertng
Copy link
Author

I added SPDX-2.3 functionality.

In particular, this means that if the SPDX version is 2.3, we set the primary_package_purpose of the container package to CONTAINER and omit concluded license, declared license and copyright text in SpdxPackages if possible.

tern/__main__.py Outdated
@@ -216,6 +216,9 @@ def main():
"available formats: "
"spdxtagvalue, spdxjson, cyclonedxjson, json, "
"yaml, html")
parser_report.add_argument('-fv', '--format-version',
Copy link
Contributor

@rnjudge rnjudge Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is only for SPDX, suggest making that more clear in the wording. Maybe -sv for spdx-version? Also clarify in the help that it is only for the SPDX version and will otherwise be ignored.

Copy link
Contributor

@rnjudge rnjudge Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also probably want to indicate what the valid input is since 3.3 is not supported yet and also what it defaults to (2.2) if not specified.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks for the input! :)

This is set up to produce the same output as the current
spdx generation module while utilising the spdx-tools
library. The goal is to replace the current module with
this new one, which will allow easy migration to more SPDX
formats as well as SPDXv3.

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
This adds support for the other three SPDX formats:
XML, YAML and RDF-XML

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
This is to ensure that both the new and old versions
of the SPDX writers satisfy the same tests. This uses an
Image instance that was generated during the call of
"tern report -i golang:1.12-alpine"

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
This adds new options to the -f parameter like "spdxjson22",
"spdxjson23", "spdxrdf22" etc.
Also extracts common code from the generators
of all the different formats.

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
This adds new a new -fv option to specify the
version of the output format if it supports versions.

Removes the old SPDX version handling.

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
was blindly copied from the previous implementation without updating the year

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
this ensures compatibility with scancode

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
If spdx_version==2.3, set container package
primary package purpose to CONTAINER and
omit concluded license, declared license
and copyright text in SpdxPackages if possible

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
These functions have been adapted from the previous implementation
and have not been renamed so far.

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
this fixes validation issues in the spdx-tools

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
@armintaenzertng
Copy link
Author

@rnjudge: tern report -i photon:3.0 -f spdxjson -sv 2.3 -o output.json followed by pyspdxtools -i output.json yields no errors or invalidations, so I'll mark this "Ready for review" now.
There are still some open issues regarding Spdx documents with file information, which I collected in #1240. As these existed in the old SPDX implementation already, I'd propose to fix them in a separate PR after this one to keep things clearer (this one is already quite large).

@armintaenzertng armintaenzertng marked this pull request as ready for review July 20, 2023 09:12
@vargenau
Copy link
Contributor

This would fix #1211

@vargenau
Copy link
Contributor

@rnjudge @armintaenzertng
For specifying the SPDX version of the output, I would propose the following:

tern report -f spdxtagvalue@2.3 -i golang:1.12-alpine -o alpine.spdx

and

tern report -f spdxjson@2.3 -i golang:1.12-alpine -o alpine.spdx.json

Advantages would be that we have one less argument and it is similar to the Syft syntax:
https://github.com/anchore/syft/

spdx-tag-value: A tag-value formatted report conforming to the SPDX 2.3 specification
spdx-tag-value@2.2: A tag-value formatted report conforming to the SPDX 2.2 specification
spdx-json: A JSON report conforming to the SPDX 2.3 JSON Schema
spdx-json@2.2: A JSON report conforming to the SPDX 2.2 JSON Schema

That is what I had prototyped in #1228

What do you think?

@armintaenzertng
Copy link
Author

@vargenau: I believe the current implementation suggestion is a little more flexible in supporting more versions/formats in the future. Your proposed solution would result in five additional entrypoints (one per spdx format) per version.

Still, if necessary, the spdxjson@2.2 usecase can be easily implemented by adding a new entrypoint with the generator just calling get_spdx_from_image_list with the right parameters.

@vargenau
Copy link
Contributor

Hi @armintaenzertng
My proposal aim was:

  • to have one less option
  • to have a syntax similar to Syft
    If it is more complex to implement, no problem.

@vargenau
Copy link
Contributor

@rnjudge: tern report -i photon:3.0 -f spdxjson -sv 2.3 -o output.json followed by pyspdxtools -i output.json yields no errors or invalidations, so I'll mark this "Ready for review" now. There are still some open issues regarding Spdx documents with file information, which I collected in #1240. As these existed in the old SPDX implementation already, I'd propose to fix them in a separate PR after this one to keep things clearer (this one is already quite large).

Is is really -sv 2.3 ?
In Rose message above, it was parser_report.add_argument('-fv', '--format-version',

@armintaenzertng
Copy link
Author

I meant -sv, this was changed according to this comment.

@vargenau
Copy link
Contributor

I meant -sv, this was changed according to this comment.

Thank you, I had missed that comment.

'tern report -f spdxjson -i photon:3.0 -o spdx.json && ' \
'java -jar tools-java/target/tools-java-*-jar-with-dependencies.jar '\
'Verify spdx.json'],
'tern report -f spdxjson -i photon:3.0 -o spdx.json && '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@armintaenzertng Do these lines need the continuation like the lines that were removed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The continuation lines were not required in the old code, so I removed them in the new one.
See for example here.

setup.cfg Show resolved Hide resolved
spdxrdf = tern.formats.spdx.spdxrdf.generator:SpdxRDF
spdxtagvalue = tern.formats.spdx.spdxtagvalue.generator:SpdxTagValue
spdxtagvalue_legacy = tern.formats.spdx_legacy.spdxtagvalue.generator:SpdxTagValue
spdxjson_legacy = tern.formats.spdx_legacy.spdxjson.generator:SpdxJSON
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look into this today... but I think we may need to rename the SpdxJSON class here to SpdxJSONLegacy in the actual file at tern/formats/spdx_legacy/spdxjson/generator.py. Same with the SpdxTagValue class in the legacy code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you say that?
I just tried running tern report -f spdxjson_legacy -i photon:3.0 and it worked without errors for me.

We can of course rename them to make it clearer that this code will be deprecated.

@@ -154,8 +157,11 @@ def generate(self, image_obj_list, print_inclusive=False):
return report
return report + print_licenses_only(image_obj_list)

def generate_layer(self, layer):
def generate_layer(self, layer, version: str):
Copy link
Contributor

@rnjudge rnjudge Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the version reference here spdx_version? May want to update to spdx_version for clarity and consistency. There are lots of versions throughout the code and without clarity it may seem like this is referring to layer version.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, that was an oversight!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

report_dict = get_report_dict(image_obj_list)
report = create_html_report(report_dict, image_obj_list)
return report

def generate_layer(self, layer):
def generate_layer(self, layer, version: str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above. Update to spdx_version.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@armintaenzertng
Copy link
Author

Small update: I changed the spdx-tools dependency to the new 0.8.1 version.

-file types are now correctly initialized
-license_concluded and copyright_text are omitted
when spdx_version == "2.2"

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
Some file info from tern does not come with SHA1 checksums.
This is invalid and SPDX documents can't be built without them.
This adds a workaround that resorts to the empty string SHA1
in the case that a file doesn't have a checksum.

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
The test and test data were mainly meant for easier debugging,
not for full testing purposes.

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
this uses the official release now

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
this was only implemented for spdxjson so far

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
- lazy % formatting in log strings
- imports only on top-level
- suppress unused argument warning in legacy code

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
all tern entrypoints should be tested,
even if they share most of the code

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
…functions

This was an oversight in a previous commit.

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
Main feature of that update is much faster validation
of large SBOMs, so that validation via spdx-tools
becomes viable.

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
@armintaenzertng
Copy link
Author

small fix: I changed the check for the SPDX version to use the official string "SPDX-2.2" consistently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants