In SPDX reports, include licenseInfoFromFiles and file-level information for the scanned project itself as well #8485

Open
daniel-kr opened this issue Apr 4, 2024 · 10 comments
Labels
enhancement Issues that are considered to be enhancements reporter About the reporter tool spdx-utils About the SPDX utility library

Comments

@daniel-kr

An ORT scan is applied on downloaded source code of external dependencies and on the scanned project itself. The latter is necessary to also cover OSS code that has been copied to the code base of a project. So far, so good. 👍

In an SPDX report, the information about the project itself is converted to an SPDX package entity, just as it is for external dependencies. However, this project entity contains neither the attribute licenseInfoFromFiles nor file-level information, even though the scanner found licenses (and, e.g., the web app report contains them). Only detected copyright statements are included, in the field copyrightText.

I suggest including licenseInfoFromFiles and, if the property file.information.enabled is set, file-level information for the scanned project as well. Looking at the code, this should not be too difficult to achieve.
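
To make the request concrete, here is a minimal sketch of what a project's package entry could carry. The field names `licenseInfoFromFiles`, `copyrightText`, `hasFiles`, `fileName`, and `licenseInfoInFiles` follow the SPDX 2.x JSON serialization. Everything else is an assumption for illustration and not ORT's actual code: the `findings` input shape, the helper `build_project_package`, the `include_files` flag (standing in for file.information.enabled), and the SPDXID scheme.

```python
def build_project_package(name, findings, include_files=False):
    """Build an SPDX-2.x-style package dict for a scanned project.

    `findings` is a hypothetical, simplified stand-in for scanner results:
    it maps a file path to {"licenses": [...], "copyrights": [...]}.
    Returns the package dict and the list of file dicts (empty unless
    `include_files` is set).
    """
    # Aggregate all findings across files, de-duplicated and sorted.
    licenses = sorted({lic for f in findings.values() for lic in f["licenses"]})
    copyrights = sorted({c for f in findings.values() for c in f["copyrights"]})

    package = {
        "SPDXID": f"SPDXRef-Project-{name}",  # hypothetical ID scheme
        "name": name,
        "copyrightText": "\n".join(copyrights) or "NONE",
        "licenseInfoFromFiles": licenses or ["NOASSERTION"],
    }

    files = []
    if include_files:
        for index, path in enumerate(sorted(findings)):
            finding = findings[path]
            files.append({
                "SPDXID": f"SPDXRef-File-{name}-{index}",  # hypothetical ID scheme
                "fileName": path,
                "licenseInfoInFiles": sorted(finding["licenses"]) or ["NONE"],
                "copyrightText": "\n".join(finding["copyrights"]) or "NONE",
            })
        # Link the package to its files by SPDX identifier.
        package["hasFiles"] = [f["SPDXID"] for f in files]

    return package, files
```

With `include_files=False` the project entry would gain only the aggregated licenseInfoFromFiles next to the copyrightText it already has today.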

I don't know if this is a bug or a missing feature. In any case, I would volunteer to implement this change. But before I start I would like to know if you consider this a good idea and if my PR has a chance to be merged.

@sschuberth sschuberth added enhancement Issues that are considered to be enhancements reporter About the reporter tool labels Apr 4, 2024
@sschuberth
Member

I don't know if this is a bug or a missing feature.

Maybe @fviernau could comment on that?

In any case, I would volunteer to implement this change.

That would be highly appreciated, thank you for the willingness to help!

@daniel-kr
Author

Do you consider this to be a sensible change?

@sschuberth
Member

Personally, I believe it makes sense to treat projects and packages consistently here, yes.

Note though that SPDX does not explicitly distinguish between what ORT calls projects and packages, but just has (SPDX) packages and relations between them.

@fviernau
Member

fviernau commented Apr 12, 2024

I don't know if this is a bug or a missing feature. In any case, I would volunteer to implement this change.

What I recall (long ago, I'm not certain anymore if that's correct) is the following:

  1. To fulfill the requirement at that time, it was sufficient to implement this for packages,
    so skipping projects saved some time.
  2. The value of adding projects was, at least to me personally, questionable:
    • For projects, there may be different and maybe more fine-grained requirements regarding what information
      should / should not be exposed.
    • If the project is closed source, parts of the information, e.g. file paths or the submodule structure,
      can be meaningless, as these are just links to source code that cannot be looked up.
    • Exposing multiple projects:
      • the submodule structure often is fine-grained and can be considered an implementation detail
      • for proprietary software, I can imagine cases where one does not want to expose that structure
      • it can lead to duplication, e.g. of file findings
  3. There is a way to turn a project's directory into a dependency entry in the NOTICE_BY_PACKAGE report.
    (It would be nice if this also worked for SPDX reports; I'm not sure whether it does.)

Given that, I believe (without thinking it through again more deeply) that if something were implemented for projects:

  1. There should be a toggle for enabling / disabling it
  2. I'd tend to only report a single (merged) project, instead of the submodule structure.
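
As a rough illustration of the second point, the sketch below merges the findings of several submodule projects into one project-level entry and de-duplicates file findings that are reported by more than one project. The input shape and the function `merge_projects` are hypothetical and only meant to show the idea; they do not reflect ORT's data model.

```python
import posixpath


def merge_projects(projects, merged_name):
    """Merge per-submodule findings into a single project-level entry.

    `projects` (hypothetical shape) maps a submodule directory, relative to
    the repository root, to its findings: {path relative to that directory:
    set of license IDs}.
    """
    merged = {}
    for base_dir, findings in projects.items():
        for rel_path, licenses in findings.items():
            # Normalize to a root-relative path so that the same file seen
            # via two projects collapses into one entry.
            full_path = posixpath.normpath(posixpath.join(base_dir, rel_path))
            merged.setdefault(full_path, set()).update(licenses)
    return {"name": merged_name, "files": merged}
```

Keying on the root-relative path is what avoids the duplicated file findings mentioned above; the submodule structure itself no longer appears in the merged entry.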

@tsteenbe do you maybe remember further things, or have thoughts on this?

@daniel-kr
Author

Thank you for the outline.

Making it configurable would be fine for me. However, I wonder what exactly should be configured. Currently, the project entries contain the copyright statements found by the scanner, but not the license statements found by it. This is inconsistent IMO, isn't it? Other report formats like the PDF report contain both for the project. So I tend toward all or nothing in that regard. The new configuration option could control whether project entities are created at all. On top of that, there could be another option controlling whether file-level details are provided for project entities. That is, the options would be: omit the project, include a project summary, or include the project with file-level details.
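
A minimal sketch of how these three levels could be modeled follows. The enum `ProjectDetail`, its values, and the helper `project_fields` are hypothetical names for illustration; no such option exists in ORT as far as this thread shows. Only the attribute names `copyrightText`, `licenseInfoFromFiles`, and `hasFiles` come from SPDX.

```python
from enum import Enum


class ProjectDetail(Enum):
    """Hypothetical option values for how a project appears in the report."""
    NONE = "none"        # do not create a project entity at all
    SUMMARY = "summary"  # project entity with aggregated findings only
    FILES = "files"      # project entity plus file-level details


def project_fields(detail):
    """Return the SPDX package attributes a project entity would carry."""
    if detail is ProjectDetail.NONE:
        return []
    # "All or nothing": copyrights and licenses are always reported together.
    fields = ["copyrightText", "licenseInfoFromFiles"]
    if detail is ProjectDetail.FILES:
        fields.append("hasFiles")
    return fields
```

A single three-valued option avoids the contradictory combination that two independent booleans would allow, namely file-level details enabled while project entities are disabled.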

Having just one merged entry for the whole root project would be sufficient for me although it would be a bit more difficult to implement and questions would arise like what to put into the attribute versionInfo for the merged entry.

@kikofernandez

For what it is worth, I think that adding this file-level information makes sense for packages that want to create a source SBOM.
That is exactly what the Python SBOM contains here (AFAIK).

For the Erlang programming language, it would be great if we could create SBOMs similar to those of Python, since we only distribute source code and not binaries (modulo one exception). From what I saw, the scan contains all the information needed, and it is pretty accurate!

This is just one more reason to encourage the author of the issue to work on this :)

@kikofernandez

In the Erlang/OTP team, we would like to use ORT to produce an SBOM with file-level information. We need to produce an SBOM similar to the Python SBOM here. This is because we do not produce binaries (modulo some exceptions), so we simply need to list all source files that will be included in a release, each with its corresponding license.

I think we should avoid duplicating efforts, so I ask:

  • Is it ok to take over this task? @sschuberth @fviernau @daniel-kr
  • @daniel-kr did you end up with some half-baked things that we can improve or base our work on?
  • If there is a clear guideline for the upcoming PR, such as which flags have been discussed, please let us know so that we can take as much information as possible into account from the beginning :)

@sschuberth
Member

Is it ok to take over this task?

From my perspective, yes. We'd like to encourage more external contributions, and AFAIK no one is actively working on the issue. You should, however, wait for #9182 to get merged, as it will change the way reporters are configured.

guideline for the expected coming PR, such as which flags have been discussed

Actually, I'm not convinced yet that adding licenseInfoFromFiles for project-packages needs to be configurable. I agree with @daniel-kr here that other reporters "expose" that information as well, and that the way we currently do it is just inconsistent.

@daniel-kr
Author

  • @kikofernandez Sorry, I got distracted and have not started with a PR yet. Feel free to do it. I only tried it out. See this and that commit.

@fviernau
Member

Actually, I'm not convinced yet that adding licenseInfoFromFiles for project-packages needs to be configurable.

I still think configurability is necessary for the reasons I outlined above, see #8485 (comment)
