False positive AGPL detection from a mere URL #2877
Comments
@pombredanne I have a couple of possible cases here that could be clues, as opposed to detections, i.e. they would have is_clue set to True. What do you think?
Cases where is_clue = True:
Cases where is_clue = False, i.e. these are valid detections:
Also attaching a CSV file with a subset of the rules (is_license_reference = True and relevance < 100):
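For readers who want to reproduce that kind of subset, here is a minimal sketch of the filter described above (is_license_reference = True and relevance < 100). It does not use the scanner's own API: the `RuleInfo` class, the rule identifiers, and the relevance values below are all made up for illustration, and only the two attribute names come from the comment.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RuleInfo:
    """Hypothetical stand-in for a rule's metadata, not the real rule class."""
    identifier: str
    is_license_reference: bool
    relevance: int


def clue_candidates(rules: Iterable[RuleInfo]) -> List[RuleInfo]:
    """Return rules that are license references with relevance below 100."""
    return [r for r in rules if r.is_license_reference and r.relevance < 100]


# Made-up sample data: identifiers and values are illustrative only.
SAMPLE_RULES = [
    RuleInfo("example_url_only.RULE", is_license_reference=True, relevance=50),
    RuleInfo("example_full_notice.RULE", is_license_reference=False, relevance=100),
    RuleInfo("example_strong_reference.RULE", is_license_reference=True, relevance=100),
]

for rule in clue_candidates(SAMPLE_RULES):
    print(rule.identifier, rule.relevance)
```

Whether every rule in such a subset should actually become a clue is the open question in this thread; the filter only narrows down what needs review.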
This makes 100% sense... we have to tread lightly though.
This seems like a recipe for noise in the output. Or possibly the need for more granular levels of clue. (strong clue / weak clue). But I would probably lean towards "if it isn't actually useful signal, it's not interesting". What are you going to do with the clues once you have them? One of the challenges with these heuristics is context, or the lack thereof. I had a case a few weeks ago where https://github.com/svaarala/duktape/blob/master/website/index/index.html got scanned. It contains
Triggering off license name results in false positives for Duktape, even though this section is actually talking about other products. This particular example is more complicated/subtle than most of the other examples in this bug, so might be a distraction, but it's still interesting.
We are detecting an AGPL with agpl-3.0-plus_152.RULE and this text: http://www.ghostscript.com
... for instance from https://github.com/ReactiveX/rxjs/blob/6.x/README.md
This is noisy.
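One way to find rules of this kind is to check whether a rule's text is nothing but a URL or bare domain. The sketch below is an assumption-laden illustration, not part of the scanner: `is_bare_url` and its regular expression are hypothetical, and the pattern is deliberately simple rather than a full URL grammar.

```python
import re

# Hypothetical heuristic: the whole text is a single URL or bare domain,
# with an optional scheme and an optional path, and nothing else.
_URL_ONLY = re.compile(
    r"^\s*(?:https?://)?[\w.-]+\.[a-z]{2,}(?:/\S*)?\s*$",
    re.IGNORECASE,
)


def is_bare_url(rule_text: str) -> bool:
    """Return True when the rule text consists solely of one URL or domain."""
    return bool(_URL_ONLY.match(rule_text))


print(is_bare_url("http://www.ghostscript.com"))
print(is_bare_url("Licensed under the AGPL, see http://www.ghostscript.com"))
```

A rule flagged this way carries no license wording of its own, which is what makes a match on it weak evidence.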
There are two ways out: