-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add binary version parsing for nodejs, php and python #6524
feat: Add binary version parsing for nodejs, php and python #6524
Conversation
Levente Kovacs seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
I signed the cla... Not sure why it's not triggering as signed. On my forked branch, every test passed, so if you trigger the tests it should be good. |
Hmm, while all tests are passing, I do not see the new binary results when I build the trivy from source and run the same experiment as in #6457, I'm not sure why that is the case. EDIT: Guess I realized that I not only need to implement the parsing, but the analyzer logic as part of fanal. I'll try that. |
I have spent some time with this. Now python version is detected correctly by trivy and added into the sbom in a way that xeol detects it properly according to my use case. I looked into implementing vulnerability scanning as well for standalone binaries, but I think at first only having it part of sboms is a step forward. There are a few problems regarding implementing this with standalone binaries:
The only way I see to implement this nicely is by adding CPE support and also putting together a Get function for the NVD vulnsrc linked in 3rd point. Other way: just hack together an NVD get interface since it'd be used only for the scanner as a fallback in case of generic binary is detected, not for regular vulnerability scanning. Now that I know how to implement these things, I think I can put together the remaining parts (Java, PHP, NodeJS binary as part of package detection and SBOMs) in the upcoming days and the PR would be ready. |
I have added support for PHP and Nodejs standalone binaries and it works like a charm with the sboms. I'm dropping support for Java because I found that it's harder to correctly determine the package type (JRE vs. JDK), and I won't go down the rabbithole since it's not part of my usecase. Only issue I discovered yesterday is that this will break lots of integration tests which I'll have to adjust those accordingly. After I'll adjust integration tests, I think this PR can be merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @kovacs-levent
Thanks for this PR.
I left some comments. Some these comments same for nodejs, php and python.
Also parsers are very similar.
I think we can merge them to 1 package. Something like this:
parser:
├── executable
│ ├── exe.go
│ ├── nodejs
│ ├── parse.go
│ ├── parse_test.go
│ └── testdata
│ ├── python
│ ├── parse.go
│ ├── parse_test.go
│ └── testdata
│ └── php
│ ├── parse.go
│ ├── parse_test.go
│ └── testdata
about analyzers:
looks like we can add new logic here - https://github.com/aquasecurity/trivy/blob/main/pkg/fanal/analyzer/executable/executable.go
If a php/python/npm binary is found, use its parser and add the library.
wdyt?
go.mod
Outdated
@@ -124,6 +124,7 @@ require ( | |||
github.com/alecthomas/chroma v0.10.0 | |||
github.com/antchfx/htmlquery v1.3.0 | |||
github.com/apparentlymart/go-cidr v1.1.0 | |||
github.com/aquasecurity/go-dep-parser v0.0.0-20240213093706-423cd04548a5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
go-dep-parser
in Trivy now - https://github.com/aquasecurity/trivy/tree/main/pkg/dependency/parser
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the go-dep-parser as a dependency, it only uses the internal package now
return nil, err | ||
} | ||
|
||
if bytes.HasPrefix(data, []byte("\x7FELF")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about windows
and macos
formats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't thought about that yet. Will have to check on those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried testing with Windows PEs, the interface can be implemented for exe
and OpenExe
, but the regex based detection methods don't work on the PE files I tried (I did manual testing to confirm). I honestly don't have any experience with macos binaries 🫤 So not sure if I can implement the interface for those. I also tried to obtain files to test with, but failed, since now I don't have access to a macos system. Any idea if regex based detection would work on those?
If not, I'd rather not bother with implementing this for those binaries, even though you could implement the interface and executableAnalyzer
would still run on those, the binary version detection won't give package in the AnalysisResult
.
} | ||
|
||
// Python's version pattern is [NUL]3.11.2[NUL] | ||
re := regexp.MustCompile(`^\d{1,4}\.\d{1,4}\.\d{1,4}[-._a-zA-Z0-9]*$`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment with a link where you get this regex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's from syft... Should I link that? All the regexes I used, I mostly copied them from syft. For example python regex:
https://github.com/anchore/syft/blob/b04bc0fbfe5742a970aa3f06594b1e82c87eb4a5/syft/pkg/cataloger/binary/classifiers.go#L495
|
||
var libs []types.Library | ||
libs = append(libs, types.Library{ | ||
ID: packageID(name, vers), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You use dependent.ID
once. So you can just use it here.
ID: packageID(name, vers), | |
ID: dependency.ID(ftypes.PythonGeneric, name, version), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the packageID function from all parsers and replaced with dependency.ID call only
} | ||
} | ||
|
||
return "python", vers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to determine package name?
Looks like it might be confusing if all detected binaries are named "python"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so unfortunately, if there was additional package information, the standalone binary detection would not be needed. But I will look into it if it's obtainable from some metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a way to try detect it with OS package managers, but they don't necessarily yield results unfortunately. I think if these were in package manager, then OS pkg analyzer would be scanning these, so probably also not the right way to go about it.
I wasn't able to find any way to figure out more concrete package name from the standalone binaries.
for _, tt := range tests { | ||
t.Run(tt.name, func(t *testing.T) { | ||
a := pythonBinaryAnalyzer{} | ||
fileInfo, err := os.Lstat(tt.filePath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we can use 1 file and change names, permissions, etc. in test.
Then we can reduce the number of copies of test files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'll rework the tests for fanal in an upcoming commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fanal test cases were reworked. It uses the common logic of executable tests. Added test cases for Python/Node/PHP binaries.
pkg/scanner/langpkg/scan.go
Outdated
@@ -65,7 +65,6 @@ func (s *scanner) Scan(target types.ScanTarget, _ types.ScanOptions) (types.Resu | |||
} | |||
|
|||
logger := log.WithPrefix(string(app.Type)) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we don't need this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, not sure why I did it, but I'll remove it in next commit, it's just some "automatic" line deletion I did I guess :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
@@ -0,0 +1,82 @@ | |||
// Ported from https://github.com/golang/go/blob/e9c96835971044aa4ace37c7787de231bbde05d9/src/cmd/go/internal/version/exe.go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like Go 1.22 doesn't have this file. Can you please update the link and double check - we may need to add some changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I unfortunately didn't find this in Go 1.22 either. I'll take a look into this in next commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the link and also the implementation was changed.
It's not a 1on1 port, since they repurposed it for capturing Go main module version, but we need to read whole executable file, but I used the same ideas. It also contains interfaces for other binaries (didn't put those in, only elf).
pythonLibNameRegex := regexp.MustCompile("^libpython[0-9]+(?:[.0-9]+)+[a-z]?[.]so.*$") | ||
pythonBinaryNameRegex := regexp.MustCompile("(?:.*/|^)python(?P<version>[0-9]+(?:[.0-9]+)+)?$") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will be great if you add some comment about these regex (e.g. link to docs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as for the other regexes, it's from Syft, should I link that?
https://github.com/anchore/syft/blob/b04bc0fbfe5742a970aa3f06594b1e82c87eb4a5/syft/pkg/cataloger/binary/classifiers.go#L495
pkg/fanal/analyzer/const.go
Outdated
TypePip Type = "pip" | ||
TypePipenv Type = "pipenv" | ||
TypePoetry Type = "poetry" | ||
TypePythonGeneric Type = "python" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like we can use executable for new binaries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change all the types to become "TypeXExecutable".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand what you meant here now, that all the new binaries could be TypeExecutable.
I use the language specific types in fanal executable.go, to figure out which binary type was detected in case of "detectable" binary. It can be localized to executable.go instead and use the common type for the "outside" return types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added them as a separate, new group of variables instead in this same file.
Thanks for the feedback, I will make the changes when I have the time, this week’s been busy, but I’ll try to make progress when I have the time. |
As you can see, I finally got to finish up this PR. I'm going to work some more on this in the upcoming days, so that all the comments are reflected in the changes. I did refactor the code to have the proposed structure. |
39c24d0
to
e6e3165
Compare
…y into parse-binary-versions
I had some troubles with rebasing, probably shouldn't have waited so long with that 😓 I had to force push, because other people's commits started appearing here, so I did a hard reset. Sorry for the ugly commit history 🤷 |
I have encountered a weird issue now... I found that if Rekor sbom source is not specificed, then ExecutableAnalyzer is getting disabled. https://github.com/aquasecurity/trivy/blob/main/pkg/commands/artifact/run.go#L514 Can someone shed some light on why is that? The commit message contains a PR... Seems like completely unrelated issue: #6163 For some reason, the whole executable analyzer was disabled, instead of only disabling Rekor queries based on the executables. Can you please provide some insight on whether it's fine to remove this, and how could I go about it to disable Rekor queries instead? @DmitriyLewen |
Problem is that, with Rekor enabled, it works fine for me, but it takes significantly longer to scan with Rekor, even though I'm not interested in that information. |
I fixed this issue in the following commit. Now if the Rekor flag isn't enabled, it only skips the "unpackaged" hook, so Rekor queries are only made if the Rekor is actually specified, but executable analysis still happens regardless. |
Hello @kovacs-levent I'm working on other priority tasks. Regards, Dmitriy |
@DmitriyLewen Any update regarding when this PR can get reviewed? |
@kovacs-levent Binray identification is already implemented in the commercial scanner of Aqua as described here: https://github.com/aquasecurity/resources/blob/main/trivy-aqua.md#vulnerability-scanning (this doc is linked from the contribution guide, but regardless we should have let you know earlier). I'm really sorry it took us long time to realize this, it is always best to start a discussion/issue before starting to work. Thanks again for your spirit of contribution. |
@itaysk Thanks for the clarification! @kovacs-levent Sorry to waste your time, I missed that a commercial scanner supports this... |
Let me get this straight, so just to make sure I'm not getting punked right now... I've created a discussion thread months ago about this topic #6457, in which @DmitriyLewen explicitly told me to "We are always happy to receive new contributions! If you have time and understanding, it will be very cool if you make a PR with the addition of support for Python binaries.". I've created the PR, then you @itaysk now randomly come after 3 months of 0 contribution to this the topic. Now you claim that I did not do proper discussion before starting work on this feature, and that you are now gatekeeping this feature from the open source project, because you built this into your commercial offering. To be frankly honest, I don't care about the commercial offering of aquasec, I'd like this feature to be present in trivy, the open source version, hence I started contributing to this open source project, which I believed to be community-driven. Not just a tool for aquasec to use and gatekeep features as they see fit. I have never seen such practices in open source projects. |
If anyone is actually interested in using such a feature in trivy, feel free to pull the changes from my forked repository: https://github.com/kovacs-levent/trivy/tree/parse-binary-versions I'm going to stop contributing I guess then, because you don't like open source community contributing to your project... |
Yes Trivy is an open source project but I wouldn't say it's "community-driven" like you put it - Aqua invests heavility into this project and Aqua employees do the overwhelming majority of the work on it, for the benefit of the community. We do encourage community contribution and we are proud to have had many many examples of this, and also community-driven initiatives as long as ithey fall within the project's scope as defined in the contribution guidelines (which every mature oss project should have). I hope this clarifies the situation and guidelines. |
@itaysk If your contribution guidelines are so mature, and explicitly mention what you're talking about, then why did it take you 3 months to realize that my feature request is out of scope? @DmitriyLewen also seemed to be completely unaware of such constraints, otherwise, I'm pretty sure he'd have mentioned it... You're just trying to save face now by saying all of that about your guidelines, but your contribution guidelines do not say anything about such features being out of scope... https://aquasecurity.github.io/trivy/v0.53/community/principles/ You only mention that runtime security, intentional attacks, and user interface is out of scope. My feature which I implemented does not fall into any of these categories. So your "mature oss project", has such a great contribution guidelines documented that not only me, but also your maintainers are unaware of what's in it then 😅 Also, whatever you linked previously, does not mention the fact that "do not work on such features", it's just a table to mention which feature is in which offering... You only mention in the guidelines that Which is all good, but nowhere do you actually say that these features CANNOT BE IMPLEMENTED in the open source version. Even if it's under "Out of scope features" section, this is not meaning what you're saying... You just mention categories which are not included, nowhere is it coherent enough to mention that such features are all out of scope. It is very vague... |
I don't think I need to spell it out how bad Trivy looks when you're scanning a distroless python image with it for example, and Trivy is completely unaware that Python binary is actually part of the image and it's not part of SBOM, vuln scans, or anything. You'd rather keep this behavior in Trivy than to mitigate a serious blindspot, causing (probably) thousands of missed vulnerabilties over time. You value your business over making the tool better, and mitigating serious blindspots in it. |
The table I referenced is linked from the trivy documentation under "Out of Scope Features". The table includes a section titled "Compiled binaries" which puts "identifies popular applications by huristics" as a commercial feature. We could improve the wording to be more clear about it. Even though this case is pretty obvious, the doc is not meant to be a definitive and comprehansive policy. We can't forsee all potential future suggestions and decide in advance what will be in/out of scope. It's possible that a feature suggestion will be rejected based on maintainer's discretion. You have made your point clear about the necessity of this feature, thanks for the feedback. We will consider it in the future as we re-evaluate the scope guidelines. |
This PR is stale because it has been labeled with inactivity. |
Description
Implement binary version parsing for at least key binaries. The feature should be extended upon, but first round, I'm going to be focusing stuff which matters for my use-case (PHP, NodeJS, Python)... I'm keeping Java binary parsing since it's already been implemented by laurentdelosieresmano in the related PR👍 Since go-dep-parser was moved to this Repo, I'm reopening the PR and adding Python dep parsing.
Related issues
Related PRs
Checklist