Order CPEs deterministically for SBOM reproducibility #2967
Comments
I observed a similar problem: when running the same scan directly with the Syft CLI twice in a row, I get a nondeterministic selection of the CPE in the CycloneDX SBOM:
{
"bom-ref": "pkg:pypi/cryptography@3.2.1?package-id=a4e081620662b87d",
"type": "library",
"author": "The cryptography developers <cryptography-dev@python.org>",
"name": "cryptography",
"version": "3.2.1",
"licenses": [
{
"license": {
"name": "BSD or Apache License, Version 2.0"
}
}
],
- "cpe": "cpe:2.3:a:python-cryptography_project:python-cryptography:3.2.1:*:*:*:*:*:*:*",
+ "cpe": "cpe:2.3:a:cryptography_project:cryptography:3.2.1:*:*:*:*:python:*:*",
"purl": "pkg:pypi/cryptography@3.2.1",
"properties": [
{
"name": "syft:package:foundBy",
"value": "python-installed-package-cataloger"
},
{
"name": "syft:package:language",
"value": "python"
},
{
"name": "syft:package:type",
"value": "python"
},
{
"name": "syft:package:metadataType",
"value": "python-package"
},
{
"name": "syft:cpe23",
- "value": "cpe:2.3:a:cryptography_project:cryptography:3.2.1:*:*:*:*:python:*:*"
+ "value": "cpe:2.3:a:python-cryptography_project:python-cryptography:3.2.1:*:*:*:*:*:*:*"
},
{
"name": "syft:location:0:path",
"value": "usr/lib64/python3.6/site-packages/cryptography-3.2.1-py3.6.egg-info/PKG-INFO"
},
{
"name": "syft:location:1:path",
"value": "usr/lib64/python3.6/site-packages/cryptography-3.2.1-py3.6.egg-info/top_level.txt"
}
]
}
I get that CPEs are not an exact science, but I think the selection of CPE candidates should be done in a deterministic manner to reduce noise. |
I think the sorting of the CPEs would just need to be extended to use lexicographical order in case of "ties": syft/syft/cpe/by_source_then_specificity.go Lines 18 to 24 in 573440b
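The tie-breaking idea suggested above can be sketched roughly as follows. This is a minimal illustration, not the code in by_source_then_specificity.go: the `candidate` type, its `Specificity` score, and `sortCandidates` are hypothetical names made up for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical stand-in for a CPE candidate; it is not Syft's type.
type candidate struct {
	CPE         string // CPE 2.3 formatted string
	Specificity int    // hypothetical score: higher means more specific
}

// sortCandidates orders by specificity (descending) and breaks ties by
// comparing the CPE strings lexicographically. As long as the CPE strings
// are distinct this is a total order, so the result does not depend on
// the order in which candidates were produced.
func sortCandidates(cs []candidate) {
	sort.Slice(cs, func(i, j int) bool {
		if cs[i].Specificity != cs[j].Specificity {
			return cs[i].Specificity > cs[j].Specificity
		}
		return cs[i].CPE < cs[j].CPE
	})
}

func main() {
	cs := []candidate{
		{"cpe:2.3:a:python-cryptography_project:python-cryptography:3.2.1:*:*:*:*:*:*:*", 1},
		{"cpe:2.3:a:cryptography_project:cryptography:3.2.1:*:*:*:*:python:*:*", 1},
	}
	sortCandidates(cs)
	// With equal scores, the lexicographically smaller CPE comes first.
	fmt.Println(cs[0].CPE)
}
```

With both candidates tied on the assumed score, the first element is always the `cryptography_project` CPE, regardless of input order.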
|
Thanks @luhring for the issue here. I've first focused on the CPE ordering. While I wasn't able to reproduce your exact example, I've found nondeterminism in the sorting here:
I added your case and the one I found when reproducing to our tests found here (branch not yet pushed): After running them without cache, to see whether I could isolate where the reordering was happening and whether it was indeed nondeterministic at the CPE sorting level, I got a 100% pass rate.
This tells me that the sort we have written is stable. Given your hypothesis that it only affects Java packages, I'm going to check there and make sure we're using the correct CPE sorting when assembling those packages as a first step to resolve this. |
@luhring check out the draft branch I created here and let me know if that seems to work for the jenkins package you mentioned. I'm working on the jruby case now, but wanted to see whether the comment I made at the bottom of that PR, about certain Java packages "winning" nondeterministically for the final spot in the SBOM, was similar to the JRuby case you posted. The package case I found that caused me issues for my sample image that reproduces the issue was |
Thanks @spiffcs! I just tried out the repro steps using Syft at f7ffcc5 and I'm still seeing the nondeterminism, unfortunately. Do you want to share which part of the repro didn't work for you and we can go from there? I know it'd be easier to fix this if you could see what I'm seeing, so let me know how I can help. 🙇 |
@luhring I think it makes sense that you might still be seeing it. As of today I have an example I'm working with that has nondeterministic order for Java packages. Work is in progress to make that the same for every run. The issue arises when Java packages are discovered in a different order and one "wins" against another during deduplication. The winner during dedupe is not always consistent. I'm trying to track down all cases right now. I've got an example working with these two packages that shows the inconsistency, so I think we're in good shape and on the same sheet of music
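The dedupe problem described above can be illustrated with a small sketch. It is not Syft's deduplication logic: the `pkg` type, the `Location` field, and the "smallest location wins" rule are assumptions chosen only to show how a winner can be picked independently of discovery order.

```go
package main

import (
	"fmt"
	"sort"
)

// pkg is a hypothetical package record; it is not Syft's package type.
type pkg struct {
	Name, Version string
	Location      string // hypothetical: path where the package was found
}

func key(p pkg) string { return p.Name + "@" + p.Version }

// dedupe keeps one package per name@version. Instead of "first seen wins",
// the winner is the entry with the lexicographically smallest Location,
// so the result does not depend on discovery order (assuming entries that
// share a key and a Location are otherwise identical).
func dedupe(pkgs []pkg) []pkg {
	winners := make(map[string]pkg)
	for _, p := range pkgs {
		k := key(p)
		cur, seen := winners[k]
		if !seen || p.Location < cur.Location {
			winners[k] = p
		}
	}
	out := make([]pkg, 0, len(winners))
	for _, p := range winners {
		out = append(out, p)
	}
	// Go randomizes map iteration order, so sort before returning.
	sort.Slice(out, func(i, j int) bool { return key(out[i]) < key(out[j]) })
	return out
}

func main() {
	a := pkg{"example-lib", "1.0.0", "/app/a.jar"}
	b := pkg{"example-lib", "1.0.0", "/app/b.jar"}
	// Both discovery orders yield the same winner.
	fmt.Println(dedupe([]pkg{a, b}))
	fmt.Println(dedupe([]pkg{b, a}))
}
```

Any rule works as long as it defines a total order over the competing entries; the final sort matters too, because iterating a Go map is itself a source of run-to-run variation.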
|
Okay great! It sounds like you're off and running. Let me know if you think of anything I can do here. 🙇 |
Hey @luhring, I think everything reported in this issue has been taken care of, but please do let us know if you see any further instances of nondeterminism! |
I've just run the tests several times with Syft 1.10.0 and it appears to be stable. 🎉 Thanks so much!! I'll report back if I find anything unexpected |
What happened:
I'm seeing nondeterministic behavior when using Syft as a library (in wolfictl) to generate SBOMs. I noticed this via new golden-file style tests we've introduced, to ensure we get the same output for the same input. For a couple of the test targets (which are each APK files), a test will fail on the next run immediately following that test's golden file update.
I'm not 100% sure this is Syft's fault yet, since there's wrapping code in wolfictl involved, too. But wanted to flag the issue here at least so we can discuss!
Here are some example diffs from two consecutive runs of the SBOM generation code under this test:
For jenkins-2.461-r0.apk:
For jruby-9.4-9.4.7.0-r0.apk:
What you expected to happen:
Same exact output given same input!
Steps to reproduce the issue:
Check out https://github.com/wolfi-dev/wolfictl and run the test linked above. Note that you may have to run the test multiple times in order to get a complete sense of the results that code can produce. Also note that the first run of the test is doing a fetch of several APKs, so it will take considerably more time than subsequent test runs.
Anything else we need to know?:
So far the only test cases exhibiting this behavior are Java-based packages... 🤔
cc: @wagoodman, this is the thing we talked about briefly last week.
Environment:
syft version:
OS (cat /etc/os-release or similar): latest macOS