scancode-toolkit-31.0.2 is not recognizing a declared license choice #3082

Open
DennisClark opened this issue Sep 1, 2022 · 1 comment

@DennisClark
Member

DennisClark commented Sep 1, 2022

I scanned doris-1.1.1-rc03 (available at https://github.com/apache/doris/archive/refs/tags/1.1.1-rc03.tar.gz)
using scancode-toolkit-31.0.2.
Although it detected most of the licenses in the rather complex notice (attached) in
doris-1.1.1-rc03/dist/LICENSE-dist.txt,
it returns unknown-license-reference twice for portions of the following text:

be/src/env (some portions): 3-clause BSD

Some portions of this module are derived from code from RocksDB
( https://github.com/facebook/rocksdb ). RocksDB is dual-licensed
 under both the GPLv2 and Apache 2.0 License. We select Apache 2.0
License.

--------------------------------------------------------------------------------

be/src/util/coding.*: this code is licensed under both GPLv2 and Apache 2.0 License.
                      Doris chooses Apache 2.0 License.

  Copyright (c) 2011-present, Facebook, Inc.  All rights reserved.
  This source code is licensed under both the GPLv2 (found in the
  COPYING file in the root directory) and Apache 2.0 License
  (found in the LICENSE.Apache file in the root directory).

  Copyright (c) 2011 The LevelDB Authors. All rights reserved.
  Use of this source code is governed by a BSD-style license that can be
  found in the LICENSE file. See the AUTHORS file for names of contributors.

See line 9463 onward in the scan results for multiple cases where SCTK both returns unknown-license-reference and identifies the license choices.

In these cases the authors declare the choice of Apache 2.0, using slightly different language in each case. It would be great if SCTK could be improved to interpret these instances of a dual-license choice being declared in the license notice and not to return an unknown-license-reference.

LICENSE-dist.txt.zip

doris-1.1.1-rc03-results.json.zip

@AyanSinhaMahapatra
Member

@DennisClark this is already fixed (in the sense that we no longer return an unknown) in the LicenseDetection branch for the upcoming release: https://github.com/nexB/scancode-toolkit/tree/add-license-detection

This is similar to Issue 2 in #3069 (comment) and to the issue reported by the Eclipse Foundation in #2878 (comment), and it is solved the same way.

Here the detection rule is "unknown-intro-followed-by-match": an unknown intro is followed by a proper detection, so the unknown can be removed. This is achieved by tagging specific rules with is_license_intro set to True.
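
As a rough illustration of the "unknown-intro-followed-by-match" idea, here is a minimal Python sketch. It is not the actual scancode-toolkit implementation: the function names are hypothetical, and matches are modelled as plain dicts that reuse only the license_expression and is_license_intro keys visible in the scan output.

```python
def is_proper_match(match):
    """A match counts as 'proper' here if it is not an intro and not an unknown."""
    return (
        not match["is_license_intro"]
        and "unknown" not in match["license_expression"]
    )


def drop_intros_followed_by_match(matches):
    """Drop each license intro that is immediately followed by a proper match.

    Hypothetical helper: an intro with no proper match after it is kept,
    so that it still shows up as something to review.
    """
    kept = []
    for index, match in enumerate(matches):
        next_match = matches[index + 1] if index + 1 < len(matches) else None
        if (
            match["is_license_intro"]
            and next_match is not None
            and is_proper_match(next_match)
        ):
            continue
        kept.append(match)
    return kept


matches = [
    {"license_expression": "unknown-license-reference", "is_license_intro": True},
    {"license_expression": "gpl-2.0 OR apache-2.0", "is_license_intro": False},
    {"license_expression": "bsd-new", "is_license_intro": False},
]

remaining = drop_intros_followed_by_match(matches)
print([m["license_expression"] for m in remaining])
```

The sketch only looks at the immediately following match; a real implementation may well use different criteria for what counts as a follow-up detection.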

> It would be great if SCTK could be improved to interpret these instances of a dual-license choice being declared

However, the new detection still reports (gpl-2.0 OR apache-2.0) rather than just apache-2.0, which is what "Doris chooses Apache 2.0 License" and similar statements declare.

Solving that part requires adding rules; it cannot be handled by generalized processing.

New license detection looks like this:

      "detected_license_expression": "bsd-new AND ((gpl-2.0 OR apache-2.0) AND bsd-new)",
      "detected_license_expression_spdx": "BSD-3-Clause AND ((GPL-2.0-only OR Apache-2.0) AND BSD-3-Clause)",
      "license_detections": [
        {
          "license_expression": "bsd-new",
          "detection_rules": [
            "not-combined"
          ],
          "matches": [
            {
              "score": 100.0,
              "start_line": 1,
              "end_line": 1,
              "matched_length": 3,
              "match_coverage": 100.0,
              "matcher": "2-aho",
              "license_expression": "bsd-new",
              "rule_identifier": "bsd-new_308.RULE",
              "referenced_filenames": [],
              "is_license_text": false,
              "is_license_notice": false,
              "is_license_reference": true,
              "is_license_tag": false,
              "is_license_intro": false,
              "rule_length": 3,
              "rule_relevance": 100,
              "matched_text": "3-clause BSD",
              "licenses": [
                {
                  "key": "bsd-new",
                  "name": "BSD-3-Clause",
                  "short_name": "BSD-3-Clause",
                  "category": "Permissive",
                  "is_exception": false,
                  "is_unknown": false,
                  "owner": "Regents of the University of California",
                  "homepage_url": "http://www.opensource.org/licenses/BSD-3-Clause",
                  "text_url": "http://www.opensource.org/licenses/BSD-3-Clause",
                  "reference_url": "https://scancode-licensedb.aboutcode.org/bsd-new",
                  "scancode_text_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/bsd-new.LICENSE",
                  "scancode_data_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/bsd-new.yml",
                  "spdx_license_key": "BSD-3-Clause",
                  "spdx_url": "https://spdx.org/licenses/BSD-3-Clause"
                }
              ]
            }
          ]
        },
        {
          "license_expression": "(gpl-2.0 OR apache-2.0) AND bsd-new",
          "detection_rules": [
            "unknown-intro-followed-by-match"
          ],
          "matches": [
            {
              "score": 100.0,
              "start_line": 4,
              "end_line": 5,
              "matched_length": 3,
              "match_coverage": 100.0,
              "matcher": "2-aho",
              "license_expression": "unknown-license-reference",
              "rule_identifier": "lead-in_unknown_30.RULE",
              "referenced_filenames": [],
              "is_license_text": false,
              "is_license_notice": false,
              "is_license_reference": false,
              "is_license_tag": false,
              "is_license_intro": true,
              "rule_length": 3,
              "rule_relevance": 100,
              "matched_text": "dual-licensed\n under",
              "licenses": [
                {
                  "key": "unknown-license-reference",
                  "name": "Unknown License file reference",
                  "short_name": "Unknown License reference",
                  "category": "Unstated License",
                  "is_exception": false,
                  "is_unknown": true,
                  "owner": "Unspecified",
                  "homepage_url": null,
                  "text_url": "",
                  "reference_url": "https://scancode-licensedb.aboutcode.org/unknown-license-reference",
                  "scancode_text_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/unknown-license-reference.LICENSE",
                  "scancode_data_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/unknown-license-reference.yml",
                  "spdx_license_key": "LicenseRef-scancode-unknown-license-reference",
                  "spdx_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/unknown-license-reference.LICENSE"
                }
              ]
            },
            {
              "score": 33.33,
              "start_line": 4,
              "end_line": 6,
              "matched_length": 11,
              "match_coverage": 33.33,
              "matcher": "3-seq",
              "license_expression": "gpl-2.0 OR apache-2.0",
              "rule_identifier": "gpl-2.0_or_apache-2.0_2.RULE",
              "referenced_filenames": [
                "COPYING",
                "LICENSE.Apache"
              ],
              "is_license_text": false,
              "is_license_notice": true,
              "is_license_reference": false,
              "is_license_tag": false,
              "is_license_intro": false,
              "rule_length": 33,
              "rule_relevance": 100,
              "matched_text": "licensed\n under both the GPLv2 and Apache 2.0 License. We select Apache 2.0\nLicense.",
              "licenses": [
                {
                  "key": "gpl-2.0",
                  "name": "GNU General Public License 2.0",
                  "short_name": "GPL 2.0",
                  "category": "Copyleft",
                  "is_exception": false,
                  "is_unknown": false,
                  "owner": "Free Software Foundation (FSF)",
                  "homepage_url": "http://www.gnu.org/licenses/gpl-2.0.html",
                  "text_url": "http://www.gnu.org/licenses/gpl-2.0.txt",
                  "reference_url": "https://scancode-licensedb.aboutcode.org/gpl-2.0",
                  "scancode_text_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/gpl-2.0.LICENSE",
                  "scancode_data_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/gpl-2.0.yml",
                  "spdx_license_key": "GPL-2.0-only",
                  "spdx_url": "https://spdx.org/licenses/GPL-2.0-only"
                },
                {
                  "key": "apache-2.0",
                  "name": "Apache License 2.0",
                  "short_name": "Apache 2.0",
                  "category": "Permissive",
                  "is_exception": false,
                  "is_unknown": false,
                  "owner": "Apache Software Foundation",
                  "homepage_url": "http://www.apache.org/licenses/",
                  "text_url": "http://www.apache.org/licenses/LICENSE-2.0",
                  "reference_url": "https://scancode-licensedb.aboutcode.org/apache-2.0",
                  "scancode_text_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/apache-2.0.LICENSE",
                  "scancode_data_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/apache-2.0.yml",
                  "spdx_license_key": "Apache-2.0",
                  "spdx_url": "https://spdx.org/licenses/Apache-2.0"
                }
              ]
            },
            {
              "score": 36.36,
              "start_line": 10,
              "end_line": 11,
              "matched_length": 12,
              "match_coverage": 36.36,
              "matcher": "3-seq",
              "license_expression": "gpl-2.0 OR apache-2.0",
              "rule_identifier": "gpl-2.0_or_apache-2.0_2.RULE",
              "referenced_filenames": [
                "COPYING",
                "LICENSE.Apache"
              ],
              "is_license_text": false,
              "is_license_notice": true,
              "is_license_reference": false,
              "is_license_tag": false,
              "is_license_intro": false,
              "rule_length": 33,
              "rule_relevance": 100,
              "matched_text": "code is licensed under both GPLv2 and Apache 2.0 License.\n                      Doris chooses Apache 2.0 License.",
              "licenses": [
                {
                  "key": "gpl-2.0",
                  "name": "GNU General Public License 2.0",
                  "short_name": "GPL 2.0",
                  "category": "Copyleft",
                  "is_exception": false,
                  "is_unknown": false,
                  "owner": "Free Software Foundation (FSF)",
                  "homepage_url": "http://www.gnu.org/licenses/gpl-2.0.html",
                  "text_url": "http://www.gnu.org/licenses/gpl-2.0.txt",
                  "reference_url": "https://scancode-licensedb.aboutcode.org/gpl-2.0",
                  "scancode_text_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/gpl-2.0.LICENSE",
                  "scancode_data_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/gpl-2.0.yml",
                  "spdx_license_key": "GPL-2.0-only",
                  "spdx_url": "https://spdx.org/licenses/GPL-2.0-only"
                },
                {
                  "key": "apache-2.0",
                  "name": "Apache License 2.0",
                  "short_name": "Apache 2.0",
                  "category": "Permissive",
                  "is_exception": false,
                  "is_unknown": false,
                  "owner": "Apache Software Foundation",
                  "homepage_url": "http://www.apache.org/licenses/",
                  "text_url": "http://www.apache.org/licenses/LICENSE-2.0",
                  "reference_url": "https://scancode-licensedb.aboutcode.org/apache-2.0",
                  "scancode_text_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/apache-2.0.LICENSE",
                  "scancode_data_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/apache-2.0.yml",
                  "spdx_license_key": "Apache-2.0",
                  "spdx_url": "https://spdx.org/licenses/Apache-2.0"
                }
              ]
            },
            {
              "score": 100.0,
              "start_line": 14,
              "end_line": 16,
              "matched_length": 33,
              "match_coverage": 100.0,
              "matcher": "2-aho",
              "license_expression": "gpl-2.0 OR apache-2.0",
              "rule_identifier": "gpl-2.0_or_apache-2.0_2.RULE",
              "referenced_filenames": [
                "COPYING",
                "LICENSE.Apache"
              ],
              "is_license_text": false,
              "is_license_notice": true,
              "is_license_reference": false,
              "is_license_tag": false,
              "is_license_intro": false,
              "rule_length": 33,
              "rule_relevance": 100,
              "matched_text": "This source code is licensed under both the GPLv2 (found in the\n  COPYING file in the root directory) and Apache 2.0 License\n  (found in the LICENSE.Apache file in the root directory).",
              "licenses": [
                {
                  "key": "gpl-2.0",
                  "name": "GNU General Public License 2.0",
                  "short_name": "GPL 2.0",
                  "category": "Copyleft",
                  "is_exception": false,
                  "is_unknown": false,
                  "owner": "Free Software Foundation (FSF)",
                  "homepage_url": "http://www.gnu.org/licenses/gpl-2.0.html",
                  "text_url": "http://www.gnu.org/licenses/gpl-2.0.txt",
                  "reference_url": "https://scancode-licensedb.aboutcode.org/gpl-2.0",
                  "scancode_text_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/gpl-2.0.LICENSE",
                  "scancode_data_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/gpl-2.0.yml",
                  "spdx_license_key": "GPL-2.0-only",
                  "spdx_url": "https://spdx.org/licenses/GPL-2.0-only"
                },
                {
                  "key": "apache-2.0",
                  "name": "Apache License 2.0",
                  "short_name": "Apache 2.0",
                  "category": "Permissive",
                  "is_exception": false,
                  "is_unknown": false,
                  "owner": "Apache Software Foundation",
                  "homepage_url": "http://www.apache.org/licenses/",
                  "text_url": "http://www.apache.org/licenses/LICENSE-2.0",
                  "reference_url": "https://scancode-licensedb.aboutcode.org/apache-2.0",
                  "scancode_text_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/apache-2.0.LICENSE",
                  "scancode_data_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/apache-2.0.yml",
                  "spdx_license_key": "Apache-2.0",
                  "spdx_url": "https://spdx.org/licenses/Apache-2.0"
                }
              ]
            },
            {
              "score": 95.0,
              "start_line": 19,
              "end_line": 20,
              "matched_length": 19,
              "match_coverage": 100.0,
              "matcher": "2-aho",
              "license_expression": "bsd-new",
              "rule_identifier": "bsd-new_1169.RULE",
              "referenced_filenames": [
                "LICENSE"
              ],
              "is_license_text": false,
              "is_license_notice": true,
              "is_license_reference": false,
              "is_license_tag": false,
              "is_license_intro": false,
              "rule_length": 19,
              "rule_relevance": 95,
              "matched_text": "Use of this source code is governed by a BSD-style license that can be\n  found in the LICENSE file.",
              "licenses": [
                {
                  "key": "bsd-new",
                  "name": "BSD-3-Clause",
                  "short_name": "BSD-3-Clause",
                  "category": "Permissive",
                  "is_exception": false,
                  "is_unknown": false,
                  "owner": "Regents of the University of California",
                  "homepage_url": "http://www.opensource.org/licenses/BSD-3-Clause",
                  "text_url": "http://www.opensource.org/licenses/BSD-3-Clause",
                  "reference_url": "https://scancode-licensedb.aboutcode.org/bsd-new",
                  "scancode_text_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/bsd-new.LICENSE",
                  "scancode_data_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/bsd-new.yml",
                  "spdx_license_key": "BSD-3-Clause",
                  "spdx_url": "https://spdx.org/licenses/BSD-3-Clause"
                }
              ]
            }
          ]
        }
      ],
      "license_clues": [],
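
To check results like the ones above for leftover unknowns, something along these lines may help. It is only a sketch: it assumes a per-file entry shaped like the excerpt (a license_detections list whose items carry a matches list), the helper name unknown_matches is made up for illustration, and the embedded sample is a trimmed copy of two matches from the excerpt.

```python
import json

# Trimmed sample with the same keys as the excerpt above.
sample = json.loads("""
{
  "license_detections": [
    {
      "license_expression": "(gpl-2.0 OR apache-2.0) AND bsd-new",
      "detection_rules": ["unknown-intro-followed-by-match"],
      "matches": [
        {
          "license_expression": "unknown-license-reference",
          "rule_identifier": "lead-in_unknown_30.RULE",
          "is_license_intro": true,
          "start_line": 4,
          "end_line": 5
        },
        {
          "license_expression": "gpl-2.0 OR apache-2.0",
          "rule_identifier": "gpl-2.0_or_apache-2.0_2.RULE",
          "is_license_intro": false,
          "start_line": 4,
          "end_line": 6
        }
      ]
    }
  ]
}
""")


def unknown_matches(file_entry):
    """Yield (detection expression, match) for matches whose expression mentions 'unknown'."""
    for detection in file_entry.get("license_detections", []):
        for match in detection.get("matches", []):
            if "unknown" in match["license_expression"]:
                yield detection["license_expression"], match


for expression, match in unknown_matches(sample):
    print(
        f"{match['rule_identifier']} "
        f"(lines {match['start_line']}-{match['end_line']}, "
        f"intro={match['is_license_intro']}) in detection: {expression}"
    )
```

Note that the unknown intro match is still listed inside the detection even though the detection-level expression no longer contains it.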

There was also a bug in how we group matches into a LicenseDetection; I have fixed it so that the grouping factors in license intros.
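
One way to picture grouping that factors in license intros is the sketch below. It is an assumption-laden illustration, not the code from the branch: the function name, the line-distance threshold of 4, and the exact ordering of the checks are all hypothetical. The line numbers and intro flags are taken from the six matches in the output above.

```python
LINES_THRESHOLD = 4  # assumed value, for illustration only


def group_matches(matches, lines_threshold=LINES_THRESHOLD):
    """Group matches (sorted by start_line) into lists, one per detection.

    Assumed rules for this sketch:
    - a match right after an intro stays with that intro;
    - an intro always starts a new group;
    - otherwise matches are grouped when they are close enough in lines.
    """
    groups = []
    for match in matches:
        if not groups:
            groups.append([match])
            continue
        previous = groups[-1][-1]
        if previous["is_license_intro"]:
            groups[-1].append(match)
        elif match["is_license_intro"]:
            groups.append([match])
        elif match["start_line"] <= previous["end_line"] + lines_threshold:
            groups[-1].append(match)
        else:
            groups.append([match])
    return groups


# Line ranges and intro flags copied from the six matches shown above.
example_matches = [
    {"start_line": 1, "end_line": 1, "is_license_intro": False},
    {"start_line": 4, "end_line": 5, "is_license_intro": True},
    {"start_line": 4, "end_line": 6, "is_license_intro": False},
    {"start_line": 10, "end_line": 11, "is_license_intro": False},
    {"start_line": 14, "end_line": 16, "is_license_intro": False},
    {"start_line": 19, "end_line": 20, "is_license_intro": False},
]

grouped = group_matches(example_matches)
print([len(group) for group in grouped])
```

With these assumptions the intro on lines 4-5 opens a second group instead of being attached to the "3-clause BSD" reference on line 1, which mirrors the two detections in the output above.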

Here are the scan results for you to look at:

Old scan just this issue:
doris-issue-3082.json.txt

New scan just this issue:
doris-add-license-detection-issue-3082.json.txt

Old scan entire file:
doris-v31.1.1-LICENSE-dist.json.txt

New scan entire file:
doris-add-license-detection-LICENSE-dist.json.txt

A few unknown-license-reference detections still remain after this; they result from lines like these:

    * ANTLR 4 Runtime -- licenses/LICENSE-antlr4.txt
        - org.antlr:antlr4-runtime:4.7 (http://www.antlr.org/antlr4-runtime)
    * Automaton -- licenses/LICENSE-automaton.txt
        - dk.brics.automaton:automaton:1.11-8 (http://www.brics.dk/automaton/)
    * JLine -- licenses/LICENSE-jline.txt
        - jline:jline:2.12 (http://nexus.sonatype.org/oss-repository-hosting.html/jline)

These are also beyond what generalized processing can achieve, but they can be solved by rules.
