Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need a way to configure additional Copyright holders #3400

Closed
sschuberth opened this issue Nov 30, 2020 · 14 comments
Closed

Need a way to configure additional Copyright holders #3400

sschuberth opened this issue Nov 30, 2020 · 14 comments
Assignees
Labels
new feature Issues that are considered to be new features scanner About the scanner tool

Comments

@sschuberth
Copy link
Member

sschuberth commented Nov 30, 2020

While ORT has a way to add custom license texts, the license text alone is of little value if you cannot also add custom Copyright holders.

Let's start a discussion where Copyright holders could be added / curated. My current thinking is to somehow allow to put them into curations.yml as they would then nicely pair with e.g. concluded_license. However, curations.yml is not designed to curate data that is not known to the analyzer.

As I'd really like to avoid adding yet another separate configuration file for only that purpose, maybe we could extend the curation concept also to the scanner, call the current curations.yml file analyzer-curations.yml, and create a new scanner-curations.yml file that can curate multiple aspects of a scan summary. This would then also nicely allow to curate e.g. false-positive license findings from the scanner.

@mnonnenmacher @fviernau any opinion here?

@sschuberth sschuberth added new feature Issues that are considered to be new features scanner About the scanner tool labels Nov 30, 2020
@bs-ondem bs-ondem self-assigned this Jan 19, 2021
@tsteenbe
Copy link
Member

tsteenbe commented Jan 21, 2021

As copyright holders normally associated with a license for it makes sense to make adding or curating copyright holders similar to how we do things for licenses e.g. license finding curations where we enable ORT users to add / curate license found by the scanner.

My first thoughts below :

** Fixing a copyright holder finding in .ort.yml / package configuration**

curations:
  license_findings:
  - path: "src/main.cpp"
    start_lines: "3"
    line_count: 11
    detected_license: "GPL-2.0-only"
    reason: "CODE"
    comment: "The scanner matches a variable named `gpl`."
    concluded_license: "Apache-2.0"
  copyright_findings:
  - path: "src/main.cpp"
    start_lines: "2"
    line_count: 1
    detected_copyright: "Copyright 2000-$ build.year a http://www.apache.org/ Apache Software Foundation"
    reason: "VARIABLE"
    comment: "Copyright line contains variable."
    concluded_copyrights:
    - "Copyright 2000-2016 Apache Software Foundation"

Adding one or more copyright holder via ort.yml / package configuration

curations:
  copyright_findings:
  - path: "src/main.cpp"
    start_lines: "2"
    reason: "MISSING"
    comment: "Maintainer forgot to add copyright statement, found the applicable holders on the project website at http://..."
    concluded_copyrights:
    - "Copyright 2000-2016 Apache Software Foundation"
    - "Copyright 2015 HERE Europe B.V."

Remove a copyright holder in .ort.yml / package configuration copyright
Allows ORT users to remove a false finding for single review/package instead of using copyright-garbage.yml to fix it for all scans

curations:
  copyright_findings:
  - path: "src/main.cpp"
    start_lines: "2"
    detected_copyright: "(c), Ao a Au"
    reason: "INCORRECT"
    comment: "Garbage copyright finding"
    concluded_copyrights:
    - ""

@fviernau
Copy link
Member

In my experience 'concluded*' something always comes at the cost of risk and effort as it overwrites something.
Reviewing it is unnecessary complex as it requires not only the review of the (wrong) copyrights in question, but all.

We had a similar things for the declared_license curation where we implemented a declared_license_mapping.
We could solve this problem using a similar mapping. Note that the mapping as we have it now does not yet allow for adding new entries but only for correcting existing entries. To fix that I propose to allow mapping the empty string to something, and assign the semantics of adding.

Disreagarding a potential concrete implementation, I believe we need (1) map and (2) add rather than (3) overwrite all by.

@fviernau
Copy link
Member

Regarding scanner curations mentioned in the ticket.
I believe that we should always try to apply curations in the latest-possible stage (analyzer, scanner, ... you name it) of ort.
Applying curations in the scan step requires to redo the scan step in order to test whether a curation actually works before committing it.

In short, making such curation affect the scan step increases the turn-around time of support work by an order of magnitude and imo should be avoided.

@bs-ondem
Copy link
Member

@tsteenbe, @fviernau Thank you for your ideas/opinions on this issue.

I had a discussion with @oheger-bosch about this ticket and we came up with the idea of ​​treating the copyright holder curations similar to the copyright-garbage.yml file, i.e. the copyright holder curations are applied in the evaluator/reporter stage and not in the scanner stage. The copyright information is only required in the evaluator/reporter stage anyway, so there is no reason to apply them in the scanner command.

Since the copyright-garbage.yml file is also a special kind of curation, the suggestion is to establish a new generic model to curate copyright holders globally (i.e. copyright-garbage.yml is no longer necessary) and in projects/packages.

The copyright-curations.yml file could look like this (inspired by @tsteenbe):

curations:
  # Fixing detected copyrights
  - id: "Maven:org.hamcrest:hamcrest-core:1.3"
    path: "src/main.cpp"
    start_lines: "2"
    line_count: 1
    detected_copyright: "Copyright 2000-$ build.year a http://www.apache.org/ Apache Software Foundation"
    reason: "VARIABLE"
    comment: "Copyright line contains variable."
    concluded_copyrights:
      - "Copyright 2000-2016 Apache Software Foundation"
  # Add new copyright holder
  - id: "Maven:org.hamcrest:hamcrest-core:1.3"
    path: "src/main.cpp"
    start_lines: "2"
    reason: "MISSING"
    comment: "Maintainer forgot to add copyright statement, found the applicable holders on the project website at http://..."
    concluded_copyrights:
      - "Copyright 2000-2016 Apache Software Foundation"
      - "Copyright 2015 HERE Europe B.V."
  # Remove copyright holder
  - id: "Maven:org.hamcrest:hamcrest-core:1.3"
    path: "src/main.cpp"
    start_lines: "2"
    detected_copyright: "(c), Ao a Au"
    reason: "INCORRECT"
    comment: "Garbage copyright finding"
    concluded_copyrights:
      - ""
  - detected_copyright: "Global copyright garbage"
    reason: "INCORRECT"
    comment: "Garbage copyright finding"
      - ""
  - detected_copyright: "(c) , SemanticDirection.INCOMING, None"
    reason: "INCORRECT"
    comment: "Garbage copyright finding"
      - ""

What do you think about this suggestion @sschuberth, @tsteenbe, @fviernau ?

@tsteenbe
Copy link
Member

tsteenbe commented Jan 27, 2021

@bs-ondem I prefer not to have this as a separate file but to have this part of .ort.yml and package configurations as it the same paradigm for me as license finding curations. copyright-garbage.yml is special as it applies on single strings finding regardless where found which is not the case here.

Think from the view point of the user, likely a developer, that need to fix it scan. If we follow the same concepts within the same files the more easier it is for him/her to use ORT. The easier users find ORT, the more people will adopt it, the more contributions we will get.

@sschuberth
Copy link
Member Author

I must admit that the concept of LicenseFindingCuration slipped my mind. I'm having some general questions about it that I'll probably bring up in the TSC meeting.

@tsteenbe
Copy link
Member

tsteenbe commented Feb 9, 2021

Updating this issue with the outcome of the latest discussion - @sschuberth feel free to jump if corrections to the below are needed.

We thinking of adding the following new features to ORT:

  1. Add declared copyright holders to curations.yml
  2. Update the ORT's Analyzer to collect copyright holder related information in various package manager see note below
  3. Add detected copyright holder finding curations to .ort.yml file and package configurations

To make it easier for user to add, correct or remove copyright holder we thinking of a very simple spec that would only require user to specify to license for which they want to change the copyright holders. Further discussions needed to determine the exact API signature - for example what do do in cases that only a copyright is found and not license?

Note: various package managers have author/name fields that may contain copyright holder see below list

@neubs-bsi
Copy link
Contributor

The SPDX document also has a field for the packages copyrightText

@sschuberth
Copy link
Member Author

Thanks for the write-up @tsteenbe.

When starting to discuss the implementation of 1. internally, we found that most package managers call the respective field "authors" or "contributors". These fields also just contain names, no "Copyright" prefix. Thus, when compared to CopyrightFinding, it would probably be misleading to name the new field declaredCopyrights. Instead, we currently favor declaredContriburors.

Another consequence of not having a "Copyright" prefix in those package manager fields is that we have no copyright year (range). So the question if and how we should determine the copyright year. And for now we simply came to the conclusion that we shouldn't try to be too smart and simply not use a year at all, but just prefix each entry in declaredContriburors with "Copyright (C) " to create a copyright statement.

bs-ondem added a commit to boschglobal/oss-review-toolkit that referenced this issue Feb 11, 2021
This is a preparation for solving oss-review-toolkit#3400 and making copyright holder
curations possible. A new field declaredContributors is added to
Package, PackageCuration and Project in order to be able to track
fields like authors/developers/contributrs from various package
managers. These changes don't include the adaptation of the package
managers.

Signed-off-by: Onur Demirci <onur.demirci@bosch.io>
bs-ondem added a commit to boschglobal/oss-review-toolkit that referenced this issue Feb 11, 2021
This is a preparation for solving oss-review-toolkit#3400 and making copyright holder
curations possible. A new field declaredContributors is added to
Package, PackageCuration and Project in order to be able to track
fields like authors/developers/contributrs from various package
managers. These changes don't include the adaptation of the package
managers.

Signed-off-by: Onur Demirci <onur.demirci@bosch.io>
bs-ondem added a commit to boschglobal/oss-review-toolkit that referenced this issue Feb 11, 2021
This is a preparation for solving oss-review-toolkit#3400 and making copyright holder
curations possible. A new field declaredContributors is added to
Package, PackageCuration and Project in order to be able to track
fields like authors/developers/contributors from various package
managers. These changes don't include the adaptation of the package
managers.

Signed-off-by: Onur Demirci <onur.demirci@bosch.io>
bs-ondem added a commit to boschglobal/oss-review-toolkit that referenced this issue Feb 11, 2021
This is a preparation for solving oss-review-toolkit#3400 and making copyright holder
curations possible. A new field declaredContributors is added to
Package, PackageCuration and Project in order to be able to track
fields like authors/developers/contributors from various package
managers. These changes don't include the adaptation of the package
managers.

Signed-off-by: Onur Demirci <onur.demirci@bosch.io>
bs-ondem added a commit to boschglobal/oss-review-toolkit that referenced this issue Feb 11, 2021
This is a preparation for solving oss-review-toolkit#3400 and making copyright holder
curations possible. A new field declaredContributors is added to
Package, PackageCuration and Project in order to be able to track
fields like authors/developers/contributors from various package
managers. These changes don't include the adaptation of the package
managers.

Signed-off-by: Onur Demirci <onur.demirci@bosch.io>
@tsteenbe
Copy link
Member

tsteenbe commented Feb 11, 2021

@sschuberth @bs-ondem We collect this information to do notices generation to show who owns the code and under which license they distributed it. Contributors are often not from a legal point of view authors/code owner as they signed a copyright assignment e.g. this is why most ORT code is 'HERE Europe B.V.' and/or 'Bosch.IO GmbH'.

Author is a term used in copyright law see the copyright law of the united states, June 2020 - it contains 622 matches on 'author' and 0 matches on 'contributor' which is exactly why most package managers I mention in above my comment use author as a field name

Propose to rename declaredContributors to declaredAuthors if your intent is to store the under copyright law recognized authors/owner of the code for use in notice generation.

@sschuberth
Copy link
Member Author

As @tsteenbe has commented the above also here, but also added more details to there, let's continue that discussion over there.

bs-ondem added a commit to boschglobal/oss-review-toolkit that referenced this issue Feb 12, 2021
This is a preparation for solving oss-review-toolkit#3400 and making copyright holder
curations possible. A new field declaredContributors is added to
Package, PackageCuration and Project in order to be able to track
fields like authors/developers/contributors from various package
managers. These changes don't include the adaptation of the package
managers.

Signed-off-by: Onur Demirci <onur.demirci@bosch.io>
bs-ondem added a commit to boschglobal/oss-review-toolkit that referenced this issue Feb 12, 2021
This is a preparation for solving oss-review-toolkit#3400 and making copyright holder
curations possible. A new field declaredAuthors is added to
Package, PackageCuration and Project in order to be able to track
fields like authors from various package managers. These changes
don't include the adaptation of the package managers.

Signed-off-by: Onur Demirci <onur.demirci@bosch.io>
bs-ondem added a commit to boschglobal/oss-review-toolkit that referenced this issue Feb 15, 2021
This is a preparation for solving oss-review-toolkit#3400 and making copyright holder
curations possible. A new field declaredAuthors is added to
Package, PackageCuration and Project in order to be able to track
fields like authors from various package managers. These changes
don't include the adaptation of the package managers.

Signed-off-by: Onur Demirci <onur.demirci@bosch.io>
bs-ondem added a commit to boschglobal/oss-review-toolkit that referenced this issue Feb 16, 2021
This is a preparation for solving oss-review-toolkit#3400 and making copyright holder
curations possible. A new field declaredAuthors is added to
Package, PackageCuration and Project in order to be able to track
fields like authors from various package managers. These changes
don't include the adaptation of the package managers.

Signed-off-by: Onur Demirci <onur.demirci@bosch.io>
bs-ondem added a commit to boschglobal/oss-review-toolkit that referenced this issue Feb 18, 2021
This is a preparation for solving oss-review-toolkit#3400 and making copyright holder
curations possible. A new field authors is added to Package,
PackageCuration and Project in order to be able to track fields like
authors from various package managers. These changes don't include the
adaptation of the package managers.

Signed-off-by: Onur Demirci <onur.demirci@bosch.io>
sschuberth pushed a commit that referenced this issue Feb 18, 2021
This is a preparation for solving #3400 and making copyright holder
curations possible. A new field authors is added to Package,
PackageCuration and Project in order to be able to track fields like
authors from various package managers. These changes don't include the
adaptation of the package managers.

Signed-off-by: Onur Demirci <onur.demirci@bosch.io>
@neubs-bsi
Copy link
Contributor

@tsteenbe For the SpdxDocumentFile would you recommend using the field originator or copyrightText?

@bs-ondem
Copy link
Member

@sschuberth I think this issue can be closed, because it's solved with the PRs #3627, #3726 and all the other PRs in which we add support for the authors field.

@sschuberth
Copy link
Member Author

Agreed. While the existing approach might need to be extended in the future, we now have an indirect way via authors to configure additional copyright holders.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature Issues that are considered to be new features scanner About the scanner tool
Projects
None yet
Development

No branches or pull requests

5 participants