[RFC] Use new license key undetected-license #3021

Closed
AyanSinhaMahapatra opened this issue Jul 13, 2022 · 12 comments · Fixed by #3023

@AyanSinhaMahapatra
Member

We have some discussion here wrt unknown licenses and how in some cases it could be confusing.

In license detection there are a few unknown cases, which tend to be persistent:

  1. When there is a private (proprietary/commercial or others) license
  2. We have an incorrect detection, as the detected text is very different from existing ScanCode rules
  3. We know there is a license but we cannot detect any (from legalese files/package manifests with a non-empty extracted license statement)
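
To make the three cases concrete, here is a rough sketch of how a result could be sorted into them. This is only an illustration: `UnknownCase`, `classify_unknown`, the `is_private` flag, and the `min_score` threshold are all hypothetical names and values, not part of ScanCode's API.

```python
from enum import Enum
from typing import Optional


class UnknownCase(Enum):
    """The three unknown cases listed above (hypothetical labels)."""
    PRIVATE_LICENSE = "private (proprietary/commercial or other) license"
    POOR_MATCH = "detected text very different from existing rules"
    NO_DETECTION = "license statement present but nothing detected"


def classify_unknown(
    statement: str,
    best_match_score: Optional[float],
    is_private: bool = False,
    min_score: float = 50.0,  # assumed threshold, for illustration only
) -> Optional[UnknownCase]:
    """Return which unknown case applies, or None if none does."""
    if not statement.strip():
        # No extracted license statement, so there is nothing to classify.
        return None
    if best_match_score is None:
        # Case 3: a non-empty license statement with no detection at all.
        return UnknownCase.NO_DETECTION
    if is_private:
        # Case 1: text known to be a private license.
        return UnknownCase.PRIVATE_LICENSE
    if best_match_score < min_score:
        # Case 2: a match exists, but the text is far from existing rules.
        return UnknownCase.POOR_MATCH
    return None


print(classify_unknown("See LICENSE.txt for terms", None))
```

Only the third branch corresponds to the situation the proposed key is mainly meant for: we know a license statement exists, but nothing was detected.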

Some part of the solution is:

  1. adding support for custom licenses to be installed (for the first case) [WIP]
  2. adding more license rules (for the second case)
  3. reporting LicenseDetections for detections we are sure about and reporting other matches as license_clues separately [WIP]

But some confusion remains: unknown might be read as licenses we don't know about, when it actually means licenses that weren't detected by ScanCode.

So the new license-key being proposed is undetected-license.

It seems important in the context of case 3 (We know there is a license but we cannot detect any), but could also be a replacement for unknown-license?

@DennisClark @pombredanne @mjherzog what's your opinion on this?

@DennisClark
Member

@AyanSinhaMahapatra thanks for articulating this issue. As a solution, I would prefer something a bit different, and would like to suggest:

custom-license
A license statement that identifies license terms but is not recognizable as any standard license text with a published name and version.

the word "custom" communicates here that yes, this thing really is different and might possibly be a very specific one-time-only thing,

but the word "undetected" suggests (unfairly) that there might be a limitation of the license detection process, even though we are likely looking at totally unique license terms. So better not to use that I think.

@DennisClark
Member

Another candidate key could be
unpublished-license

making it clear that the license text has never been formalized as a standard text with its own name and version.

@DennisClark
Member

DennisClark commented Jul 13, 2022

@AyanSinhaMahapatra and @pombredanne After further reflection on this topic, I think I prefer

unpublished-license
A license statement that states license terms but is not recognizable as any standard license text with a published name and version.

@DennisClark
Member

one final refinement:

unpublished-license
A license statement that states license terms but is not recognizable as any standard license text with a published name and version, although there could be plans to publish it and submit it to the community as a new, unique license.

@pombredanne
Member

@DennisClark I kinda prefer the custom-license over the unpublished-license which feels a bit too narrowly defined.

@DennisClark
Member

@pombredanne OK, I can go with custom-license -- np.

@DennisClark
Member

here we are:

custom-license
A license notice that states license terms but is not recognizable as any standard license text with a published name and version, although there could be plans to publish a final version of it and submit it to the community as a new, unique license.

@mjherzog
Member

It seems that this use case is for license text that we detect as license text but cannot match to an existing license in the SC LicenseDB. (I am assuming that this is not for the case where we have a reference to a license but cannot find the corresponding license text.) So this seems to be a transitory license identifier where we would often add the license to LicenseDB sooner rather than later.
If this is correct then the most descriptive term may be "unmatched-license" meaning that it is not matched to current LicenseDB data. It is detected, but not matched. I think that unpublished and custom are too generic for this use case.

@DennisClark
Member

my problem with "undetected", "unknown" and "unmatched" is that those adjectives strongly imply that there is a license out there but we could not find it, which is often simply not the case. We need to recognize that a text with license-like terms may not exist anywhere else, and that is a different matter, because the problem is not an inability to find/detect/match a license, but instead finding yet-another-license-notice that has no standard version.

@mjherzog
Member

Since we do not have the license text in LicenseDB at the time of running ScanCode, "unmatched" seems most descriptive of the situation. If we research and find some standard license text that we add to LicenseDB, then this text will have a match in LicenseDB in the future. Until then, this is a placeholder license key for use during scanning.

@DennisClark
Member

We need to discuss this issue in context of the analysis provided already in this issue:
#2827

where multiple "unstated" licenses are identified with proposed descriptions, which do not yet exist in the licensedb. If we are going to create yet another "un-" license we also need to revisit those others.

unstated-license-analysis-2022-01-31.xlsx

AyanSinhaMahapatra added a commit that referenced this issue Jul 20, 2022
Update the notes for license data files with `unknown` license
keys with text authored by @DennisClark

Reference: #3021
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
AyanSinhaMahapatra added a commit that referenced this issue Jul 20, 2022
Update the notes for license data files with `unknown` license
keys with text authored by @DennisClark. See also:
#3021

Reference: #2827
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
@pombredanne
Member

@DennisClark thanks! So the key thing to do is use the excellent definitions you provided in the XLSX of #3021 (comment) and add them to the notes of the license here to remove any ambiguity.

For the rare case where there is no license, we can create a no-license key, but we would never have detection rules and never detect this. This would only be something to use for a manually added conclusion when there is nothing at all we could find.

@AyanSinhaMahapatra AyanSinhaMahapatra linked a pull request Jul 20, 2022 that will close this issue
4 tasks