Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scan and report dependency groups of vulnerabilities #655

Merged
merged 20 commits into from
Dec 11, 2023
Merged

Conversation

cuixq
Copy link
Contributor

@cuixq cuixq commented Nov 15, 2023

Issue #332

Non-default dependency groups are recorded in strings as per eco-system:

  • Composer: development dependencies in packages-dev
  • Conan: dependencies in build-requires and python-requires
  • Maven: <scope/> in <dependency/>
  • npm: dev and optional dependencies
  • pipenv: development dependencies in develop
  • pnpm: development dependencies with dev as true
  • Poetry: optional dependencies with optional = true
  • Pubspec: development dependencies marked with dev
  • requirements.txt: group of a dependency is the file name without the extension

Reporters:

  • table: non-default groups are appended to the end of package name, for example: abc (development)
  • json: non-default group information in dependencyGroups

@codecov-commenter
Copy link

codecov-commenter commented Nov 15, 2023

Codecov Report

Attention: 16 lines in your changes are missing coverage. Please review.

Comparison is base (cfe3604) 78.86% compared to head (11d7a28) 78.92%.

Files Patch % Lines
pkg/lockfile/types.go 55.55% 7 Missing and 1 partial ⚠️
internal/output/table.go 40.00% 2 Missing and 1 partial ⚠️
pkg/lockfile/parse-maven-lock.go 70.00% 2 Missing and 1 partial ⚠️
pkg/osv/osv.go 60.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #655      +/-   ##
==========================================
+ Coverage   78.86%   78.92%   +0.06%     
==========================================
  Files          84       85       +1     
  Lines        5943     6032      +89     
==========================================
+ Hits         4687     4761      +74     
- Misses       1054     1066      +12     
- Partials      202      205       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@G-Rath
Copy link
Collaborator

G-Rath commented Nov 15, 2023

I've not had a full look over this, but fwiw I think it would be better to make groups an array rather than a single string because some ecosystems support packages belonging to multiple groups i.e. at least Bundler and Poetry.

While I don't think that's expressed in pretty much any of the lockfiles, given the accuracy goals of the scanner I don't think it's unreasonable to expect (/hope) that one day the scanner will be able to determine that (such as by looking at related files) and so having an array now should make it easier to support (otherwise it'll be a breaking change).

I'm also not sure if it's worth making public constants for all the different types of groups, but I'm less fussed about that 🤷

@cuixq cuixq requested a review from G-Rath November 15, 2023 00:58
@cuixq
Copy link
Contributor Author

cuixq commented Nov 15, 2023

We actually had a discussion about if we should make the group info an array (my first proposal was to make it an array). Considering the purpose is to help users prfioritize what vulnerabilities to handle, we may only need to know if a dependency is of the default group or not. Then I tried to just use a bool value to represent that, but this makes it harder to provide more information to users in the reporters, so I finally made this a string. However I think if we do care about the exact group names, we should make it an array.

Regarding the constants, it is of the similar consideration - I would like to display the information as close as to the ecosystem terminology since this dependency group information is quite different among ecosystems. I am happy to remove them and just use the strings.

Also when I was researching the dependency groups for different ecosystems, I realised that some ecosystems only have this information available in manifest but not lockfile, so if we want to have a better support on this, we should also support parsing manifests.

@G-Rath
Copy link
Collaborator

G-Rath commented Nov 15, 2023

Yeah it comes down to what your priorities are - personally I think at least in the lockfile/library space the scanner should be trying to capture info in a generally neutral manner i.e. it should not decide which group of dependency takes priority, it should just capture all groups a package is part of.

Instead deciding how to treat the different groups should be a downstream responsibility (e.g. being handled by the scanner CLI component, scorecard who are just using the library, etc).

Case and point, NPM records if a package is a dev dependency, peer dependency, and optional dependency, and you demonstrate in your test cases its possible for a dependency to be both dev and optional - you have then had to choose one to favor because you cannot record both (incidentally I believe you should be favoring "dev" not "optional" because the latter will always be installed if possible, where the former might be excluded in production type builds).

You're also not capturing the groupings for requirements.txt correctly: if a dependency is in dev.txt, it gets marked as such even though it could be in production.txt too (or similar file)

@cuixq
Copy link
Contributor Author

cuixq commented Nov 15, 2023

Agree with the point that the parser should capture all the relevant information and let the downstream to decide.

This reminds me of one thing about making groups as an array: how we record the default group?

  • Nothing: then this may cause problem if the package is of the default group and another group since only the non-default one is recorded
  • An empty string: sounds like a better option, but this means the array will be of length one with an empty string very often (default group packages). will this cause confusion?
  • Still string, but with the default group name: similar to above, in addition with the information that makes more sense. Will this cause any problem of using too much space somewhere? I feel is no, since this is not stored somewhere, but just pass to the users

@cuixq
Copy link
Contributor Author

cuixq commented Nov 15, 2023

Also regarding the parsers, feel free to point out anything that I am not doing it correctly!

I am not an expert on all the ecosystems, and some of the implementation is just based on my brief research.

There is also a TODO for Gradle: I saw developmentOnly in the lockfile, but not sure if this indicates the development dependencies - I cannot find any official documentation on this.

@G-Rath
Copy link
Collaborator

G-Rath commented Nov 15, 2023

For now I think it should be fine to just capture groups as they are - so in the case of e.g. NPM which doesn't record production: true, nothing gets added to the array.

I'm pretty sure the only ecosystem currently where that might cause confusion is with requirements.txt but that would be resolvable by saying the rule there is the filename sans the extension is the group, because at best you'd end up with base, production, etc and at worse you'll have requirements.txt but because the downstream knows the name of the original lockfile it should be able to easily filter that out if needed.

Also regarding the parsers, feel free to point out anything that I am not doing it correctly!

You're doing fine 🙂 while I wrote most of them and all of semantic, that was done more by being comfortable hacking with the package managers than knowing each of them in depth - especially ones like for Java and C++ I'm pretty happy to be just be told if something is wrong and if it should be done another way for the ecosystem 🤷

@cuixq cuixq marked this pull request as draft November 16, 2023 03:16
@cuixq cuixq marked this pull request as ready for review November 16, 2023 23:55
internal/ci/vulnerability_result_diff.go Outdated Show resolved Hide resolved
outputRow = append(outputRow, pkg.Package.Ecosystem, pkg.Package.Name, pkg.Package.Version)
name := pkg.Package.Name
if len(pkg.DepGroups) > 0 {
name += fmt.Sprintf(" (%s)", strings.Join(pkg.DepGroups, ","))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, can we perhaps be more opinionated here in the table output, and just output e.g. "(dev)" in the cases where it is a dev dependency?

Users may not care about optional etc most of the time, and it would just clutter this table. If they do want the exact group details, they can use the JSON output.

We can define perhaps some kind of per-ecosystem function which just does:

type Ecosystem interface {
  IsDevGroup(groups []string) bool
}

or something similar per ecosystem?

@another-rex @G-Rath wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ecosystem is now a string can we still make it an interface?

Or, we can have something like this?

func (s Ecosystem) IsDevGroup(groups []string) bool {
  switch s {
  case CargoEcosystem: 
     ...
  case NpmEcocystem:
    ...
  }
  ...
  return false;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree we should just stick to a single output in the table output, but I think we should do it with a separate column.

I'm thinking "prod" column with a checkmark if it's a production dependency, and empty if not a production dep. This way it makes it applicable to all the other not "dev" named dependencies (e.g. optional in npm).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re column, I think the reason we went with a "(dev)" marker for now is that it because the column becomes redundant whenever we do/support any kind of filtering, and makes things inconsistent.

This is also somethign we can change at any time, given that there's no stability guarantees for the table/human readable output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a method on Ecosystem to check if there is dev group in the provided groups.

@@ -122,7 +124,7 @@ func (e DpkgStatusExtractor) Extract(f DepFile) ([]PackageDetails, error) {

// PackageDetails does not contain any field that represent a "not installed" state
// To manage this state and avoid false positives, empty struct means "not installed" so skip it
if (PackageDetails{}) == pkg {
if reflect.DeepEqual(PackageDetails{}, pkg) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding is that pulling in reflect can be a big deal in Go, and at at least in the past with some methods (which tbh probably don't include DeepEqual) could trigger sizeable deoptimizations.

I don't think this check is anything special and very internal so could be easily replaced with something like returning a pointer (so that we can then return nil), returning an error that we can eat here, or setting the Name to something like <not installed>.

I think that's probably worth doing either way so that we're not as dependent on the whole structure of PackageDetails

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the code here to check each field in PackageDetails and if they are all "empty". wdty?

@@ -23,6 +23,7 @@ type PnpmLockPackage struct {
Resolution PnpmLockPackageResolution `yaml:"resolution"`
Name string `yaml:"name"`
Version string `yaml:"version"`
Dev string `yaml:"dev"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we be able to parse this as a bool directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch - I remember bool didn't work so I put string here but I just tried this again and bool is perfectly fine.

josieang and others added 6 commits November 27, 2023 20:00
this just makes it easier for our users to use.
Note that the `SBOMReader` refactor wasn't required for this, but my IDE
flagged it so I just included it here 🤷
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [deps.dev/api/v3alpha](https://github.com/google/deps.dev) | require
| digest | `a2ccd03` -> `e40c4d5` |
| golang.org/x/exp | require | digest | `7918f67` -> `9a3e603` |
| golang.org/x/term | require | minor | `v0.13.0` -> `v0.14.0` |

---

### Configuration

📅 **Schedule**: Branch creation - "before 6am on monday" in timezone
Australia/Sydney, Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

👻 **Immortal**: This PR will be recreated if closed unmerged. Get
[config help](https://github.com/renovatebot/renovate/discussions) if
that's undesired.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/google/osv-scanner).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy40Ni4wIiwidXBkYXRlZEluVmVyIjoiMzcuNDYuMCIsInRhcmdldEJyYW5jaCI6Im1haW4ifQ==-->
pkg/lockfile/dpkg-status.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@another-rex another-rex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@cuixq cuixq removed the request for review from michaelkedar December 8, 2023 02:36
@cuixq cuixq merged commit 64f5c2d into google:main Dec 11, 2023
9 checks passed
@cuixq cuixq deleted the group branch December 11, 2023 04:10
Copy link
Collaborator

@oliverchang oliverchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cuixq!! This looks awesome. Sorry for the delay in reviewing this.

@spencerschrock
Copy link
Contributor

Out of curiosity, what's the difference between PackageVulns and VulnerabilityFlattened?

I see Licenses and LicenseViolations exist in both, but DepGroups was only added to PackageVulns. Is this intentional?

@cuixq
Copy link
Contributor Author

cuixq commented Jan 18, 2024

good find - we should add DepGroups to VulnerabilityFlattened as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants