Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add binary version parsing for nodejs, php and python #6524

Closed

Conversation

kovacs-levent
Copy link

@kovacs-levent kovacs-levent commented Apr 19, 2024

Description

Implement binary version parsing for at least key binaries. The feature should be extended upon, but first round, I'm going to be focusing stuff which matters for my use-case (PHP, NodeJS, Python)... I'm keeping Java binary parsing since it's already been implemented by laurentdelosieresmano in the related PR👍 Since go-dep-parser was moved to this Repo, I'm reopening the PR and adding Python dep parsing.

Related issues

Related PRs

Checklist

  • I've read the guidelines for contributing to this repository.
  • I've followed the conventions in the PR title.
  • I've added tests that prove my fix is effective or that my feature works.
  • I've updated the documentation with the relevant information (if needed).
  • I've added usage information (if the PR introduces new options)
  • I've included a "before" and "after" example to the description (if the PR is a user interface change).

@CLAassistant
Copy link

CLAassistant commented Apr 19, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Levente Kovacs seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@kovacs-levent
Copy link
Author

I signed the cla... Not sure why it's not triggering as signed. On my forked branch, every test passed, so if you trigger the tests it should be good.

@kovacs-levent
Copy link
Author

kovacs-levent commented Apr 19, 2024

Hmm, while all tests are passing, I do not see the new binary results when I build the trivy from source and run the same experiment as in #6457, I'm not sure why that is the case.

EDIT: Guess I realized that I not only need to implement the parsing, but the analyzer logic as part of fanal. I'll try that.

@kovacs-levent
Copy link
Author

kovacs-levent commented Apr 20, 2024

I have spent some time with this. Now python version is detected correctly by trivy and added into the sbom in a way that xeol detects it properly according to my use case.

I looked into implementing vulnerability scanning as well for standalone binaries, but I think at first only having it part of sboms is a step forward.

There are a few problems regarding implementing this with standalone binaries:

  1. It's not an OS package, hence the whole PR. But that also means I can't implement the vuln scanning as part of the ospkg detector library. When I tried that, trivy started to correctly detect the vulns in the binaries, but it also discarded everything else due to conflicting OS versions (if I implement it as ospkg detector driver, then I just lost all dpkg vulns for example).
  2. It's also not really a langpkg. but this could be solved with some extra code as part of the driver logic and without messing more with the core. However, none of the ecosystems in trivy-db are really fit for querying vulnerabilities in these standalone binaries. The best bet would be NVD I thought.
  3. BUT NVD does not have an interface to query it in trivy-db. This means that I can't query NVD's information about the standalone binary, neither OS package sources.
  4. A Get function could be implemented for NVD, but I suspect that the best way to query it is through CPE2.3 vectors. For standalone-binaries, this is easy to generate correctly I think, but afaik, trivy does not support CPE for complexity reasons (it's also not in the SBOMs).

The only way I see to implement this nicely is by adding CPE support and also putting together a Get function for the NVD vulnsrc linked in 3rd point. Other way: just hack together an NVD get interface since it'd be used only for the scanner as a fallback in case of generic binary is detected, not for regular vulnerability scanning.

Now that I know how to implement these things, I think I can put together the remaining parts (Java, PHP, NodeJS binary as part of package detection and SBOMs) in the upcoming days and the PR would be ready.

@kovacs-levent kovacs-levent marked this pull request as draft April 20, 2024 08:37
@kovacs-levent kovacs-levent changed the title feat: Add binary version parsing for java, nodejs, php and python feat: Add binary version parsing for nodejs, php and python Apr 20, 2024
@kovacs-levent
Copy link
Author

I have added support for PHP and Nodejs standalone binaries and it works like a charm with the sboms. I'm dropping support for Java because I found that it's harder to correctly determine the package type (JRE vs. JDK), and I won't go down the rabbithole since it's not part of my usecase.

Only issue I discovered yesterday is that this will break lots of integration tests which I'll have to adjust those accordingly. After I'll adjust integration tests, I think this PR can be merged.

Copy link
Contributor

@DmitriyLewen DmitriyLewen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @kovacs-levent
Thanks for this PR.

I left some comments. Some these comments same for nodejs, php and python.

Also parsers are very similar.
I think we can merge them to 1 package. Something like this:
parser:

├── executable
│   ├── exe.go
│   ├── nodejs
│       ├── parse.go
│       ├── parse_test.go
│       └── testdata
│   ├── python
│       ├── parse.go
│       ├── parse_test.go
│       └── testdata
│   └── php
│       ├── parse.go
│       ├── parse_test.go
│       └── testdata

about analyzers:
looks like we can add new logic here - https://github.com/aquasecurity/trivy/blob/main/pkg/fanal/analyzer/executable/executable.go

If a php/python/npm binary is found, use its parser and add the library.

wdyt?

go.mod Outdated
@@ -124,6 +124,7 @@ require (
github.com/alecthomas/chroma v0.10.0
github.com/antchfx/htmlquery v1.3.0
github.com/apparentlymart/go-cidr v1.1.0
github.com/aquasecurity/go-dep-parser v0.0.0-20240213093706-423cd04548a5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the go-dep-parser as a dependency, it only uses the internal package now

return nil, err
}

if bytes.HasPrefix(data, []byte("\x7FELF")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about windows and macos formats?

Copy link
Author

@kovacs-levent kovacs-levent Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't thought about that yet. Will have to check on those.

Copy link
Author

@kovacs-levent kovacs-levent Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried testing with Windows PEs, the interface can be implemented for exe and OpenExe, but the regex based detection methods don't work on the PE files I tried (I did manual testing to confirm). I honestly don't have any experience with macos binaries 🫤 So not sure if I can implement the interface for those. I also tried to obtain files to test with, but failed, since now I don't have access to a macos system. Any idea if regex based detection would work on those?

If not, I'd rather not bother with implementing this for those binaries, even though you could implement the interface and executableAnalyzer would still run on those, the binary version detection won't give package in the AnalysisResult.

}

// Python's version pattern is [NUL]3.11.2[NUL]
re := regexp.MustCompile(`^\d{1,4}\.\d{1,4}\.\d{1,4}[-._a-zA-Z0-9]*$`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment with a link where you get this regex?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's from syft... Should I link that? All the regexes I used, I mostly copied them from syft. For example python regex:
https://github.com/anchore/syft/blob/b04bc0fbfe5742a970aa3f06594b1e82c87eb4a5/syft/pkg/cataloger/binary/classifiers.go#L495


var libs []types.Library
libs = append(libs, types.Library{
ID: packageID(name, vers),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You use dependent.ID once. So you can just use it here.

Suggested change
ID: packageID(name, vers),
ID: dependency.ID(ftypes.PythonGeneric, name, version),

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the packageID function from all parsers and replaced with dependency.ID call only

}
}

return "python", vers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to determine package name?
Looks like it might be confusing if all detected binaries are named "python"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so unfortunately, if there was additional package information, the standalone binary detection would not be needed. But I will look into it if it's obtainable from some metadata.

Copy link
Author

@kovacs-levent kovacs-levent Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a way to try detect it with OS package managers, but they don't necessarily yield results unfortunately. I think if these were in package manager, then OS pkg analyzer would be scanning these, so probably also not the right way to go about it.

I wasn't able to find any way to figure out more concrete package name from the standalone binaries.

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
a := pythonBinaryAnalyzer{}
fileInfo, err := os.Lstat(tt.filePath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we can use 1 file and change names, permissions, etc. in test.
Then we can reduce the number of copies of test files.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'll rework the tests for fanal in an upcoming commit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fanal test cases were reworked. It uses the common logic of executable tests. Added test cases for Python/Node/PHP binaries.

@@ -65,7 +65,6 @@ func (s *scanner) Scan(target types.ScanTarget, _ types.ScanOptions) (types.Resu
}

logger := log.WithPrefix(string(app.Type))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we don't need this change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, not sure why I did it, but I'll remove it in next commit, it's just some "automatic" line deletion I did I guess :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@@ -0,0 +1,82 @@
// Ported from https://github.com/golang/go/blob/e9c96835971044aa4ace37c7787de231bbde05d9/src/cmd/go/internal/version/exe.go
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like Go 1.22 doesn't have this file. Can you please update the link and double check - we may need to add some changes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I unfortunately didn't find this in Go 1.22 either. I'll take a look into this in next commit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the link and also the implementation was changed.

It's not a 1on1 port, since they repurposed it for capturing Go main module version, but we need to read whole executable file, but I used the same ideas. It also contains interfaces for other binaries (didn't put those in, only elf).

Comment on lines 41 to 42
pythonLibNameRegex := regexp.MustCompile("^libpython[0-9]+(?:[.0-9]+)+[a-z]?[.]so.*$")
pythonBinaryNameRegex := regexp.MustCompile("(?:.*/|^)python(?P<version>[0-9]+(?:[.0-9]+)+)?$")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be great if you add some comment about these regex (e.g. link to docs)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypePip Type = "pip"
TypePipenv Type = "pipenv"
TypePoetry Type = "poetry"
TypePythonGeneric Type = "python"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like we can use executable for new binaries

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change all the types to become "TypeXExecutable".

Copy link
Author

@kovacs-levent kovacs-levent Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what you meant here now, that all the new binaries could be TypeExecutable.

I use the language specific types in fanal executable.go, to figure out which binary type was detected in case of "detectable" binary. It can be localized to executable.go instead and use the common type for the "outside" return types.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added them as a separate, new group of variables instead in this same file.

@kovacs-levent
Copy link
Author

Thanks for the feedback, I will make the changes when I have the time, this week’s been busy, but I’ll try to make progress when I have the time.

@kovacs-levent
Copy link
Author

kovacs-levent commented Jun 3, 2024

As you can see, I finally got to finish up this PR. I'm going to work some more on this in the upcoming days, so that all the comments are reflected in the changes.

I did refactor the code to have the proposed structure.

@kovacs-levent
Copy link
Author

I had some troubles with rebasing, probably shouldn't have waited so long with that 😓 I had to force push, because other people's commits started appearing here, so I did a hard reset. Sorry for the ugly commit history 🤷

@kovacs-levent
Copy link
Author

kovacs-levent commented Jun 4, 2024

I have encountered a weird issue now... I found that if Rekor sbom source is not specificed, then ExecutableAnalyzer is getting disabled. https://github.com/aquasecurity/trivy/blob/main/pkg/commands/artifact/run.go#L514 Can someone shed some light on why is that? The commit message contains a PR... Seems like completely unrelated issue: #6163

For some reason, the whole executable analyzer was disabled, instead of only disabling Rekor queries based on the executables. Can you please provide some insight on whether it's fine to remove this, and how could I go about it to disable Rekor queries instead? @DmitriyLewen

@kovacs-levent
Copy link
Author

Problem is that, with Rekor enabled, it works fine for me, but it takes significantly longer to scan with Rekor, even though I'm not interested in that information.

@kovacs-levent
Copy link
Author

I have encountered a weird issue now... I found that if Rekor sbom source is not specificed, then ExecutableAnalyzer is getting disabled. https://github.com/aquasecurity/trivy/blob/main/pkg/commands/artifact/run.go#L514 Can someone shed some light on why is that? The commit message contains a PR... Seems like completely unrelated issue: #6163

For some reason, the whole executable analyzer was disabled, instead of only disabling Rekor queries based on the executables. Can you please provide some insight on whether it's fine to remove this, and how could I go about it to disable Rekor queries instead? @DmitriyLewen

I fixed this issue in the following commit. Now if the Rekor flag isn't enabled, it only skips the "unpackaged" hook, so Rekor queries are only made if the Rekor is actually specified, but executable analysis still happens regardless.

@DmitriyLewen
Copy link
Contributor

Hello @kovacs-levent
Thanks for your work!

I'm working on other priority tasks.
I will consider this PR when I have time for it

Regards, Dmitriy

@kovacs-levent
Copy link
Author

@DmitriyLewen Any update regarding when this PR can get reviewed?

@itaysk
Copy link
Contributor

itaysk commented Jul 4, 2024

@kovacs-levent Binray identification is already implemented in the commercial scanner of Aqua as described here: https://github.com/aquasecurity/resources/blob/main/trivy-aqua.md#vulnerability-scanning (this doc is linked from the contribution guide, but regardless we should have let you know earlier). I'm really sorry it took us long time to realize this, it is always best to start a discussion/issue before starting to work. Thanks again for your spirit of contribution.

@DmitriyLewen

@DmitriyLewen
Copy link
Contributor

@itaysk Thanks for the clarification!

@kovacs-levent Sorry to waste your time, I missed that a commercial scanner supports this...

@kovacs-levent
Copy link
Author

Let me get this straight, so just to make sure I'm not getting punked right now...

I've created a discussion thread months ago about this topic #6457, in which @DmitriyLewen explicitly told me to "We are always happy to receive new contributions! If you have time and understanding, it will be very cool if you make a PR with the addition of support for Python binaries.".

I've created the PR, then you @itaysk now randomly come after 3 months of 0 contribution to this the topic. Now you claim that I did not do proper discussion before starting work on this feature, and that you are now gatekeeping this feature from the open source project, because you built this into your commercial offering.

To be frankly honest, I don't care about the commercial offering of aquasec, I'd like this feature to be present in trivy, the open source version, hence I started contributing to this open source project, which I believed to be community-driven. Not just a tool for aquasec to use and gatekeep features as they see fit. I have never seen such practices in open source projects.

@kovacs-levent
Copy link
Author

If anyone is actually interested in using such a feature in trivy, feel free to pull the changes from my forked repository:

https://github.com/kovacs-levent/trivy/tree/parse-binary-versions

I'm going to stop contributing I guess then, because you don't like open source community contributing to your project...

@itaysk
Copy link
Contributor

itaysk commented Jul 9, 2024

Yes Trivy is an open source project but I wouldn't say it's "community-driven" like you put it - Aqua invests heavility into this project and Aqua employees do the overwhelming majority of the work on it, for the benefit of the community. We do encourage community contribution and we are proud to have had many many examples of this, and also community-driven initiatives as long as ithey fall within the project's scope as defined in the contribution guidelines (which every mature oss project should have). I hope this clarifies the situation and guidelines.

@kovacs-levent
Copy link
Author

kovacs-levent commented Jul 10, 2024

@itaysk If your contribution guidelines are so mature, and explicitly mention what you're talking about, then why did it take you 3 months to realize that my feature request is out of scope? @DmitriyLewen also seemed to be completely unaware of such constraints, otherwise, I'm pretty sure he'd have mentioned it...

You're just trying to save face now by saying all of that about your guidelines, but your contribution guidelines do not say anything about such features being out of scope... https://aquasecurity.github.io/trivy/v0.53/community/principles/

You only mention that runtime security, intentional attacks, and user interface is out of scope. My feature which I implemented does not fall into any of these categories. So your "mature oss project", has such a great contribution guidelines documented that not only me, but also your maintainers are unaware of what's in it then 😅

Also, whatever you linked previously, does not mention the fact that "do not work on such features", it's just a table to mention which feature is in which offering... You only mention in the guidelines that
"Aqua Security offers a premium version with several features not available in the open-source Trivy project."

Which is all good, but nowhere do you actually say that these features CANNOT BE IMPLEMENTED in the open source version. Even if it's under "Out of scope features" section, this is not meaning what you're saying... You just mention categories which are not included, nowhere is it coherent enough to mention that such features are all out of scope. It is very vague...

@kovacs-levent
Copy link
Author

I don't think I need to spell it out how bad Trivy looks when you're scanning a distroless python image with it for example, and Trivy is completely unaware that Python binary is actually part of the image and it's not part of SBOM, vuln scans, or anything.

You'd rather keep this behavior in Trivy than to mitigate a serious blindspot, causing (probably) thousands of missed vulnerabilties over time. You value your business over making the tool better, and mitigating serious blindspots in it.

@itaysk
Copy link
Contributor

itaysk commented Jul 10, 2024

The table I referenced is linked from the trivy documentation under "Out of Scope Features". The table includes a section titled "Compiled binaries" which puts "identifies popular applications by huristics" as a commercial feature. We could improve the wording to be more clear about it.

Even though this case is pretty obvious, the doc is not meant to be a definitive and comprehansive policy. We can't forsee all potential future suggestions and decide in advance what will be in/out of scope. It's possible that a feature suggestion will be rejected based on maintainer's discretion.

You have made your point clear about the necessity of this feature, thanks for the feedback. We will consider it in the future as we re-evaluate the scope guidelines.

Copy link

github-actions bot commented Sep 9, 2024

This PR is stale because it has been labeled with inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and will be auto-closed. label Sep 9, 2024
@github-actions github-actions bot closed this Sep 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and will be auto-closed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants