Feature Request: Equalizer Function for license operators #6095

Closed
pepper-jk opened this issue Nov 18, 2022 · 10 comments
Labels
needs info An issue where further information is required spdx-utils About the SPDX utility library

Comments

@pepper-jk
Contributor

Context

I was trying to figure out how exceptions are detected, since Double Open lists licenses redundantly in its license classifications, which we adopted.

  • openvpn-openssl-exception
  • GPL-2.0-with-openvpn-openssl-exception
  • and so on

Since this specific exception is used by OpenVPN, I just scanned their repository.

And the license detected was GPL-2.0 WITH openvpn-openssl-exception. That is yet another format for one and the same license, which I would need to add to my classifications file, making the classifications redundant.

Feature Request

I propose a license operation equalizer function, which detects -with-, WITH, OR, or-later, etc. and brings them into one format, e.g. lowercase with dashes, or whatever you fancy.

OR even better split the exception findings from the license.

This way we only need one classification per license and one per exception and can properly detect any combination.
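A minimal Python sketch of the splitting idea, purely for illustration: the function name `split_exception` and the regular expression are assumptions and not part of ORT or ScanCode. It maps the legacy `-with-` spelling and the SPDX `WITH` operator to the same (license, exception) pair, so that each part could be classified once. It only handles a single license with at most one exception, not compound expressions using AND or OR.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: matches both "X WITH Y" (SPDX operator) and the
# legacy single-identifier spelling "X-with-Y".
_WITH_PATTERN = re.compile(
    r"^(?P<license>.+?)(?:\s+WITH\s+|-with-)(?P<exception>.+)$"
)


def split_exception(expression: str) -> Tuple[str, Optional[str]]:
    """Return (license, exception); exception is None if there is no WITH part."""
    text = expression.strip()
    match = _WITH_PATTERN.match(text)
    if match is None:
        return text, None
    return match.group("license"), match.group("exception")


# Both spellings from this issue yield the same pair.
print(split_exception("GPL-2.0 WITH openvpn-openssl-exception"))
print(split_exception("GPL-2.0-with-openvpn-openssl-exception"))
```

With this shape, a classifications file would need one entry for the license and one for the exception, regardless of how the combination was spelled.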

I realize this might be related to the scancode results. However, they might be limited by the spdx ids.
But maybe it should be implemented in scancode instead? Let me know.

Reproduce

Scan the OpenVPN repo linked above.

You will get something like this:

- license: "GPL-2.0-only WITH openvpn-openssl-exception"
  location:
    path: "COPYING"
    start_line: 20
    end_line: 32
  score: 98.31
@pepper-jk
Contributor Author

@sschuberth
Member

sschuberth commented Nov 18, 2022

I propose a license operation equalizer function, which detects -with-, WITH, OR, or-later, etc. and brings them into one format, e.g. lowercase with dashes, or whatever you fancy.

I guess we have something similar already, however we normalize on current SPDX expressions with operators and non-deprecated license IDs, which may be something different than what you have in mind.

OR even better split the exception findings from the license.

That's basically how it's already done on the ScanCode level: In a raw ScanCode result, licenses and exception are completely independent license finding entries, and actually ORT invests quite some effort to associate such stand-alone exception findings with their belonging license findings in order to produce correct SPDX expressions with the WITH operator.
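To illustrate the association step, here is a rough Python sketch. It is not ORT's implementation: the `Finding` type, the `associate` function, and the `tolerance` of 5 lines are all assumptions made for this example, as is the idea of matching purely by line proximity within one file. It pairs each stand-alone exception finding with the nearest license finding and builds a `WITH` expression.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    """Hypothetical, simplified scanner finding."""
    license: str
    path: str
    start_line: int
    end_line: int


def line_distance(a: Finding, b: Finding) -> int:
    """Gap in lines between two findings; 0 if their line ranges overlap."""
    return max(0, max(a.start_line, b.start_line) - min(a.end_line, b.end_line))


def associate(
    licenses: List[Finding], exceptions: List[Finding], tolerance: int = 5
) -> List[str]:
    """Combine each exception with the nearest license in the same file."""
    used = set()
    expressions = []
    for exc in exceptions:
        candidates = [
            lic for lic in licenses
            if lic.path == exc.path and line_distance(lic, exc) <= tolerance
        ]
        if not candidates:
            # Assumption: an unmatched exception stays a finding of its own.
            expressions.append(exc.license)
            continue
        nearest = min(candidates, key=lambda lic: line_distance(lic, exc))
        used.add(nearest)
        expressions.append(f"{nearest.license} WITH {exc.license}")
    # Licenses that no exception was attached to are reported unchanged.
    expressions.extend(lic.license for lic in licenses if lic not in used)
    return expressions


# The line numbers below are made up for the example.
gpl = Finding("GPL-2.0-only", "COPYING", 1, 19)
exc = Finding("openvpn-openssl-exception", "COPYING", 20, 32)
print(associate([gpl], [exc]))
```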

However, they might be limited by the spdx ids.

Indeed. See spdx/spdx-spec#153 which is also somewhat related.

@sschuberth
Member

Thinking about these again:

  1. openvpn-openssl-exception
  2. GPL-2.0-with-openvpn-openssl-exception
  3. GPL-2.0 WITH openvpn-openssl-exception

Point 1 is, strictly speaking, when treated as a whole expression, something different from 2. or 3., as the license is missing. And between 2. and 3., only 3. is actually a valid SPDX expression. So IMO it's correct that ORT reports it that way. Actually, when working solely with ORT, you should never see 2., so there's also no need to classify that finding, and thus no duplication.

Can you shed some light on how you were able to see a finding like 2. in ORT?

@sschuberth sschuberth added the spdx-utils About the SPDX utility library label Nov 18, 2022
@pepper-jk
Contributor Author

I guess we have something similar already, however we normalize on current SPDX expressions with operators and non-deprecated license IDs, which may be something different than what you have in mind.

Great, so the feature actually exists already. I believe your approach would be the more desirable anyway.

That's basically how it's already done on the ScanCode level: In a raw ScanCode result, licenses and exception are completely independent license finding entries, and actually ORT invests quite some effort to associate such stand-alone exception findings with their belonging license findings in order to produce correct SPDX expressions with the WITH operator.

Okay, so we actually only need the 3. notation GPL-2.0 WITH openvpn-openssl-exception and would be good? That is great news.

Going a little off topic here, but we also had trouble with this regarding the or-later portion of GPL-style licenses (#5967, aboutcode-org/scancode-toolkit#3128). ScanCode seems to handle them the same way as exceptions. Could we fix that with the associateLicensesWithExceptions function as well?

Can you shed some light on how you were able to see a finding like 2. in ORT?

We did not actually get that finding ourselves. We adopted the license classifications from the DoubleOpen project a few months back, as they had more licenses classified and more categories including license properties than the official ort-config repo.

It seems like we just need to update our license expressions according to the SPDX expressions and we are good to go. Thanks a lot for the help.

@sschuberth
Member

Okay, so we actually only need the 3. notation GPL-2.0 WITH openvpn-openssl-exception and would be good?

I believe so, yes.

Could we fix that with the associateLicensesWithExceptions function as well?

Could you briefly recap what exactly you're looking for to get fixed?

It seems like we just need to update our license expressions according to the SPDX expressions and we are good to go.

Exactly 👍🏻

@sschuberth sschuberth added the needs info An issue where further information is required label Nov 22, 2022
@pepper-jk
Contributor Author

pepper-jk commented Nov 22, 2022

Could we fix that with the associateLicensesWithExceptions function as well?

Could you briefly recap what exactly you're looking for to get fixed?

Sure. When scanning a repository with GPL-2.0-or-later, the or-later notice gets detected separately from the GPL-2.0 license text. This results in 2 findings: GPL-2.0-only and GPL-2.0-or-later.

This is problematic if you want to create rules for upward compatibility of licenses, which we did: we now get, for example, compatibility between GPL-2.0-or-later and GPL-3.0-only, but also incompatibility between GPL-2.0-only (which is not actually present) and GPL-3.0-only.
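The effect can be reproduced with a tiny Python sketch. The rule table and the function `incompatible_findings` are hypothetical and only encode the two cases named above; they are not taken from ORT's rule set or from any real compatibility matrix.

```python
from typing import Iterable, List

# Hypothetical rule table, encoding only the example from this comment:
# is the detected license compatible with GPL-3.0-only?
COMPATIBLE_WITH_GPL_3_0_ONLY = {
    "GPL-2.0-or-later": True,
    "GPL-2.0-only": False,
}


def incompatible_findings(findings: Iterable[str]) -> List[str]:
    """Return the detected licenses that the rule table flags as incompatible.

    Assumption: licenses missing from the table are flagged as well.
    """
    return sorted(
        lic for lic in findings
        if not COMPATIBLE_WITH_GPL_3_0_ONLY.get(lic, False)
    )


# What the scan reports: the spurious GPL-2.0-only finding triggers a violation.
print(incompatible_findings({"GPL-2.0-only", "GPL-2.0-or-later"}))
# What is actually present in the repository: no violation.
print(incompatible_findings({"GPL-2.0-or-later"}))
```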

The 2 separate findings by ScanCode, and subsequently ORT, in this case seem to match ScanCode's handling of exceptions, which your associateLicensesWithExceptions function attempts to address. Maybe it or a similar function could do the same for or-later licenses.

@sschuberth
Member

Maybe it or a similar function could do the same for or-later licenses.

I currently don't see how / cannot think of a safe heuristic right now. For example, IMO we can't simply say something like "if both GPL-2.0-only and GPL-2.0-or-later were detected in the same line of a file (or very close), always assume it's just GPL-2.0-or-later". Would you have a heuristic in mind?

@pepper-jk
Contributor Author

What do you use as a heuristic for the exceptions? Would that be the "if both [...] were detected in the same line of a file (or very close)"?

Honestly, I don't really see why that would not apply for the GPL's or-later notice. As this is what ScanCode picks up as GPL-2.0-or-later:

licensed under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

(see diagnostics.yml in aboutcode-org/scancode-toolkit#3128 (comment))

@sschuberth
Member

Would that be the "if both [...] were detected in the same line of a file (or very close)"?

Yes.

As this is what ScanCode picks up as GPL-2.0-or-later:

The problem is that doing this with confidence requires knowledge about how the scanner internally operates. ScanCode is not the only scanner ORT supports, and this code is shared by all scanners. So can we really safely assume that whenever any scanner finds GPL-2.0-only and GPL-2.0-or-later close to each other, always just GPL-2.0-or-later is correct?

You could have probably convinced me to implement this heuristic only for the case when both licenses are detected in the exact same line. But that seems to not be the case here (start_line: 3 vs. start_line: 6).
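The stricter same-line variant could look roughly like the following Python sketch. Nothing here exists in ORT: `collapse_or_later`, the tuple layout of a finding, and the suffix handling are assumptions for illustration. It drops an `-only` finding only when the matching `-or-later` finding starts on the exact same line of the same file, so a case like the one in this issue (start_line 3 vs. 6) is left untouched.

```python
from typing import List, Tuple

# (license id, path, start line): a hypothetical, simplified finding shape.
Finding = Tuple[str, str, int]


def collapse_or_later(findings: List[Finding]) -> List[Finding]:
    """Drop X-only when X-or-later starts on the same line of the same file."""
    or_later_locations = {
        (lic[: -len("-or-later")], path, line)
        for lic, path, line in findings
        if lic.endswith("-or-later")
    }
    return [
        (lic, path, line)
        for lic, path, line in findings
        if not (
            lic.endswith("-only")
            and (lic[: -len("-only")], path, line) in or_later_locations
        )
    ]


# Same line: only the or-later finding survives.
print(collapse_or_later([
    ("GPL-2.0-only", "example.c", 3),
    ("GPL-2.0-or-later", "example.c", 3),
]))
# Different lines: both findings are kept, since the heuristic does not apply.
print(collapse_or_later([
    ("GPL-2.0-only", "example.c", 3),
    ("GPL-2.0-or-later", "example.c", 6),
]))
```

Because the check is scanner-agnostic but deliberately narrow, it avoids relying on how any particular scanner splits a notice from the license text, at the cost of not covering findings that are merely close to each other.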

@pepper-jk
Contributor Author

I understand. So we would need to investigate the behavior of FossID as well, and provide more conclusive evidence and an accompanying heuristic.

I might come back to this, once we find the time to investigate the or-later problem further.

But for now let's close this issue, as this was off topic anyway and the original feature request is already implemented in ORT. Thanks so much for your time and help.
