-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for lower bound exclusive version ranges #31
Comments
Thanks for filing this issue! The meaning of ">" seems overloaded in this case, in that there are two potential definitions? Either as a "last known good" (i.e. the definition of an exclusive lower bound), or meaning that the given version is affected but it potentially goes back further into a pre-release (so not really an "exclusive" lower bound) ? We could potentially add a "introduced_after" event to capture an exclusive lower bound, but this could potentially introduce incompatibilities and prevent interop with other standards such as the CVE 5.0 schema which does not have a similar operator. Would it be feasible at all to just encode {
"ranges": {
"events": [
{"introduced": "2.3.3"},
],
}
"database_specific": {
"needs_investigation": ["2.3.3"]
}
} @rsc for thoughts too |
I can see how overloading the use here is problematic. Let me think about how we might make a case for both interpretations though, as both are legitimate scenarios that deserve representation. In the LKG scenario, the trouble with determining "next_available_version" from the LKG version is it requires interacting with each ecosystem to determine what comes next and that's not always possible. If we could do that, then this wouldn't be an issue at all because In the case where we need to cover I will go back to my curation team about |
@oliverchang I've revised the OP to try to remove the operator overloading problem. I think in many cases the argument can be made that one could simply locate the first-known-bad from the last-known-good, but I think there are reasonable use cases for supporting a lower bound exclusive for instances where that's not possible or practical. I want to call out that this isn't a blocker for us at at the moment, not specifically, but I think it would be helpful to be able to support it going forward. |
Thanks for updating the examples! To clarify example 1 -- do you mean specifying Assuming access to the ecosystem by checking the list of released versions (or just bumping the version number for well defined schemes such as SemVer), the second example could also be encoded as
But I can definitely see the ease of allowing >! I'm just concerned about introducing an incompatibility with CVE 5.0 (which I believe has no analogue to > either). This was also discussed here as part of the 5.0 schema discussions where it was decided against to have "versionAfter" due to the small number of uses: CVEProject/cve-schema#87 (comment) @rsc do you have any thoughts here? |
The reasons against adding > seem to be:
Concrete examples are always essential for thinking through these issues, so thank you for pointing at Log4j. The GHSA description says:
I see that the GHSA encoded this text as:
But that seems wrong to me. Is the vuln fixed in 2.3.2 but not in 2.3.3? Surely not. It seems that 2.3.3 simply doesn't exist right now. Writing it down that way is problematic because when the next Log4j vuln is found and they do issue 2.3.3 and people upgrade, vuln checkers will start reporting that 2.3.3 is vulnerable to the original vuln. Same issue for the 2.12.x branch starting at 2.12.5. So in this case at least, making > available would lead to an inaccurate encoding of versions that would lead automated version decisions to incorrect results. This seems like a good reason not to add >. Also in this case, log4j's source repo is available (as it will be for essentially any open source vulnerability in OSV), and we can inspect the tags in the repo to identify precise start points. It's worth noting that they use Maven ordering and not SemVer ordering, although it doesn't affect the outcome too much in this case. Reading through the tags after cloning the repo, there were no 2.4 prereleases, so >= 2.4 is more accurate than > 2.3.2, and the first 2.13 sequence was 2.13.0-rc1, so >= 2.13.0-rc1 is more accurate than > 2.12.4. So in this case the non-buggy way to write the versions for Log4j is:
My conclusion from this is that Log4j by itself does not justify adding >. Going back to the more hypothetical case from Example 1:
For an open source project, I don't see why this would be the case. First, there should be a commit that introduced the vuln, and 'git tag --contains' should tell us which prereleases are vulnerable. If we don't know the commit, we really should find out to improve the precision. Second, even if we don't know the commit and cannot take the time to find out, we can certainly fetch a list of all the prereleases and identify the very first one in semver order and use that. Also, in this scenario, using > last-known-good, when last-known-good is on a different minor version branch, produces exactly the same future false positive as in the Log4j case. That's a significant downside, because once again it gets in the way of accurate machine-generated reports of vulnerabilities. My conclusion from this is that the hypothetical Example 1 does not justify adding > either. Trying to generalize, what these two examples seem to show is:
Are there other examples we should consider that would contradict these conclusions? |
Closing this topic for now. Our reliance on this operator is primarily due to a limitation in our internal curation process and I don't believe we can make a compelling enough argument for adding support for it externally. |
When defining vulnerable version ranges, the
>
operator, aka lower bound exclusive, represents the last-known-good release before a vulnerability is introduced. It might be argued that if one knows the last-known-good, one should also know the first-known-bad. However, this isn't always possible, and in some cases there are legitimate reasons for using>
even if one has access to the first-known-bad version.Example 1 - prereleases
In SemVer syntax,
1.2.3.alpha1
is less than1.2.3
because the former is a prerelease of the latter. If a vulnerability is known to have been introduced in a particular prerelease then one could simply say>= 1.2.3.rc2
which can be converted to anintroduced
range event type to cover all latter prereleases plus the release itself, assuming the order of the prereleases is alphabetical and machine parseable. However, sometimes there may be so many prereleases that it's not possible to immediately identify which ones are affected. In this case, one way to ensure that all prereleases are covered is to specify> last-known-good
.Example 2 - interlaced affected versions
If a package has multiple affected versions with interlaced fixes, it's reasonable to list them like so:
That example is from the recent log4j vulnerability.
Proposal
One suggestion for handling this is to extend
affected[].ranges[].events[]
to accept an event with a "last known good" event, and change the validation so that at least oneintroduced
OR "last known good" field is present. This can still be programmatically evaluated by comparing a given version to see if it's>
the last known good, just as you could compare it to see if it's>=
andintroduced
event. This could be expressed in the evaluation example in the schema docs by slightly altering theIncludedInRanges
method:The text was updated successfully, but these errors were encountered: