Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add "vertical" output format #889

Merged
merged 13 commits into from
Jul 26, 2024
Merged

Conversation

G-Rath
Copy link
Collaborator

@G-Rath G-Rath commented Mar 26, 2024

This adds a new "vertical" output format that is designed for humans and based on the output of osv-detector, which effectively aims to group the output relating to each entity being scanned in vertical slices:

image

Unfortunately I think it suffers significantly due to the assumptions made by the rest of the codebase for outputting that made sense when the final output was a table i.e. we dump a lot of information as we go about scanning, config files, vulnerability filtering, and so on that really should be grouped but currently cannot because they're all outputted at different stages - I think a way to address that could be using some sort of event-emitter type pattern so that the reporters could be responsible for deciding what they actually do (e.g. r.Emit("filtered-vulnerability", ...) and then most reporters could choose to just print immediately, and ones like "vertical" could choose to add it to an internal struct), but I think that'll involve a lot more work; for now I'm just going to ignore the pre-results output.

Resolves #85

@G-Rath G-Rath force-pushed the add-vertical-reporter branch 3 times, most recently from 505d532 to bae0efe Compare March 28, 2024 02:59
@codecov-commenter
Copy link

codecov-commenter commented Mar 28, 2024

Codecov Report

Attention: Patch coverage is 93.33333% with 14 lines in your changes missing coverage. Please review.

Project coverage is 66.55%. Comparing base (1ea785e) to head (347ca61).

Files Patch % Lines
pkg/reporter/vertical_reporter.go 70.27% 11 Missing ⚠️
internal/output/vertical.go 98.24% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #889      +/-   ##
==========================================
+ Coverage   66.11%   66.55%   +0.44%     
==========================================
  Files         160      162       +2     
  Lines       12792    13002     +210     
==========================================
+ Hits         8457     8654     +197     
- Misses       3877     3889      +12     
- Partials      458      459       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@G-Rath

This comment was marked as resolved.

another-rex pushed a commit that referenced this pull request May 24, 2024
This introduces a set of crafted scanner results that each supported
`output` format is run through to showcase how they look across all the
different results possible from a scanner run - it originally started
life as the tests for #889 but I realised they could base used more
generally for testing and reviewing all the outputters, so here we are.

~It looks like this has also revealed the SARIF output is unstable in
its ordering, which I'll aim to address in a dedicated PR~
@G-Rath G-Rath force-pushed the add-vertical-reporter branch 2 times, most recently from d5c8f4d to b266097 Compare May 24, 2024 19:47
@G-Rath

This comment was marked as resolved.

@G-Rath G-Rath force-pushed the add-vertical-reporter branch 3 times, most recently from 1ecd8e0 to 14e97cd Compare May 28, 2024 01:08
@G-Rath G-Rath marked this pull request as ready for review May 28, 2024 03:51
@cuixq
Copy link
Contributor

cuixq commented May 29, 2024

LGTM - however I am not sure if we want the output in this color schema. @another-rex is there any discussion/agreement on this?

josieang pushed a commit to josieang/osv-scanner that referenced this pull request Jun 6, 2024
This introduces a set of crafted scanner results that each supported
`output` format is run through to showcase how they look across all the
different results possible from a scanner run - it originally started
life as the tests for google#889 but I realised they could base used more
generally for testing and reviewing all the outputters, so here we are.

~It looks like this has also revealed the SARIF output is unstable in
its ordering, which I'll aim to address in a dedicated PR~
@hogo6002
Copy link
Contributor

Thanks @G-Rath! It looks really nice! I tested it on container scanning, and the vertical results are much clearer than table ones. Adding some suggestions to further improve the display of container scanning results:

  • The vertical output doesn't work when pipe results to a text file. (osv-scanner scan ... > result.txt)
  • Shall we add a summary at the top to Include the image name, OS version, and total vulnerability count. (OS version and total vulnerability count may need to wait for v2).
  • We should probably remove Docker names from each source and add ecosystem names (e.g., change "go-tester.tar:/var/lib/dpkg/status" to "/var/lib/dpkg/status (Debian:12)").
  • To maintain consistency, should we also display the vulnerability and license issue count at the top of each source section ("2 known vulnerabilities found").
  • We could use CVE IDs as the default identifiers, along with their severity scores. For each vulnerability, we can include all associated group aliases, along with their descriptions and fixed versions. (Group aliases should not be counted as separate vulnerabilities in the total count.)
  • It would be better to place results from the same ecosystem together. For example, all language packages could be listed first, followed by OS related packages.
  • Add uncalled vulnerabilities. (Maybe we can consider excluding uncalled vulnerabilities from the total count, or hide them by default with an option to show them)

Example (Feel free to modify):

Scanning image go-tester.tar
170 unimportant vulnerabilities have been filtered out.
Filtered 170 vulnerabilities from output

Image go-tester.tar (Debian 12.5) result
Total 7 (4 High, ...)

/artifact/main-built-with-1-21-0 (Go): found 2 package with issues
  2 known vulnerabilities found
  x license issue found
  stdlib@1.21.0 is affected by the following vulnerabilities:
    CVE-0000-0001: 8.0 (HIGH)
      GO-0000-0001: description here
        Fixed version: 1.22.0
      GHSA-0000-0001: description here

  test@1.0 is affected by the following vulnerabilities:
    CVE-0000-0002: 8.1 (HIGH)
      GO-0000-0002: description here
        Fixed version: 1.1
      GHSA-0000-0002: description here
        Fixed version: 1.1

  Uncalled vulnerabilities
    We found 1 potential vulnerabilities, but it is currently not being called or executed in the code
    test2@1.2.0 may be affected by the following vulnerabilities:
      GO-0000-0003:   (when no CVE ID)
        GO-0000-0003: description here       (when no fixed version available)
        GHSA-0000-0003: description here


/src/main (Go): found 3 packages with issues         (Place other sources from the same ecosystem here)
  3 known vulnerabilities found 
  .....


/var/lib/dpkg/status (Debian:12): found 8 packages with issues
  2 known vulnerabilities found                   (Even though there is only one unique vulnerability, it is detected in multiple packages, so we count it multiple times in the report.)

  libprd@9.1-1 is affected by the following vulnerabilities:
    CVE-0000-2781: 5.0 (Medium)
      CVE-0000-2781:
        description: description here

  libdev@9.1-1 is affected by the following vulnerabilities:
    CVE-0000-2781: 5.0 (Medium)
      CVE-0000-2781:
        description: description here

@G-Rath
Copy link
Collaborator Author

G-Rath commented Jun 28, 2024

@hogo6002 thanks for the review and detailed writeup! It would be really useful if you could provide some reproducations for each of your points as I'm not across all of the different aspects of the scanner like I think you are, so I'm less setup to e.g. produce a result that has a uncalled vulnerability, multiple ecosystems, image scanner results, etc.

Alternatively feel free to provide test cases since then you can just craft the situation you're thinking of rather than having to track down a real-world image/lockfile, and since we'll need those anyway to reflect the changes in the snapshots.

I think the main concern I have (if I'm understanding things right) is that your proposals would likely greatly increase the output in both directions, which is potentially a bad thing

e.g.

We could use CVE IDs as the default identifiers, along with their severity scores. For each vulnerability, we can include all associated group aliases, along with their descriptions and fixed versions. (Group aliases should not be counted as separate vulnerabilities in the total count.)

In my experience most advisories have at least three aliases: a CVE, a GHSA, and an ID from the original source (Go DB, Rust sec DB, Ruby sec DB, Python sec DB, etc) - so this would mean for every advisory you'd be looking at four lines min + each of those will have a summary that is probably very similar but not always the same that the user has to look through.

We'll also want to include a link to make it easy to access, so that's a link per advisory; and the representation of a "fix version" is typically different and multiple per advisory - some have commits, some have version ranges, and some have both. Plus there's determining which fixed version to show (either we show just the latest fix version which might not be in the same major version, or we try to determine the best fit in which case this outputter is suddenly doing a lot more complex work...).

In doing this, we'd have also added at least 2-3 levels of indenting, pushing the content further right, and this gets even worse in ecosystems like NPM that support multiple versions of the same package 😅

I definitely think there's more exploring to do with the outputting, but it feels like it could be too much to be try and tackle in this initial PR - maybe also because currently the outputters don't have any real concept of context, and that that might play a role? (i.e. they don't know when you're scanning a container vs a host vs an sbom, and I think things like summaries and os versions have different usages depending on what scanning mode you're actually in)


re your other comments:

The vertical output doesn't work when pipe results to a text file. (osv-scanner scan ... > result.txt)

I can't reproduce this and is very unexpected as we're explicitly printing to stdout + matching the table reporter - can you confirm the exact command you're running and in what terminal?

Shall we add a summary at the top to Include the image name, OS version, and total vulnerability count. (OS version and total vulnerability count may need to wait for v2).

To maintain consistency, should we also display the vulnerability and license issue count at the top of each source section ("2 known vulnerabilities found").

I'll skip this until we've discussed my opener, but just fyi in case you missed it, we do already put the summary count at the end of each section which I had there are a form of block-end-marker.

It would be better to place results from the same ecosystem together. For example, all language packages could be listed first, followed by OS related packages.

I don't think we should do this at least for now because it makes an assumption about the classification and priority of ecosystems that afaik isn't captured in the osv spec

@hogo6002
Copy link
Contributor

hogo6002 commented Jul 1, 2024

@G-Rath you are right, this is just an initial PR, we should probably not add too many new features to this. I will address the container scanning output format in a separate PR with some screenshots for better understanding.

Adding some clarification for other comments:

I can't reproduce this and is very unexpected as we're explicitly printing to stdout + matching the table reporter - can you confirm the exact command you're running and in what terminal?

I am using the command go run ./cmd/osv-scanner/ scan -r --format vertical ./project > result.txt The output file seems to have some bugs:
image

Another comment about uncalled vulnerabilities: when I use the command to scan a Go project, the vertical output doesn't show which vulnerabilities are uncalled.
image

@G-Rath
Copy link
Collaborator Author

G-Rath commented Jul 2, 2024

@hogo6002 that output is correct - the weirdness you're seeing is the color characters.

It seems that the color library we're using doesn't do any special checks on when to include colors - I don't know the full details, but my understanding is that there are ways libraries can detect that they're being piped and so often these libraries have that built in.

I'll dig further into that and see what we can do

@another-rex
Copy link
Collaborator

FYI we do a similar check with the table reporter to avoid adding styling when stdout isn't the terminal by passing in a terminalWidth of 0. (cmd/osv-scanner/main.go:155)

@G-Rath
Copy link
Collaborator Author

G-Rath commented Jul 2, 2024

@hogo6002 color outputting should now be fixed - while looking into that, I discovered that we could simplify the other outputters a bit too so have opened #1087 doing just that; weirdly, calling DisableColor in reporter seems to only work for the vertical output so it's a little inconsistent but I'll revisit that once we've landed this since we've talked about making changes to the table output anyway (@another-rex I can't remember if we were going to completely remove the table output or just make vertical the default?)

Per our catch-up today, that just leaves call analysis to implement as the rest we'll tackle as followups - I should have mentioned it at the time, but fact that call analysis isn't in any of our output snapshots means we're missing them entirely from the output fixtures; it'd be great if someone could do a PR adding even just a basic sample of what that looks like in the scanner results struct, to save me some time - but if everyone's busy I can try to figure it out from the existing fixtures

@hogo6002
Copy link
Contributor

hogo6002 commented Jul 3, 2024

@G-Rath Is this the call analysis output that you are looking for:

| Uncalled vulnerabilities | | | | | |

@another-rex
Copy link
Collaborator

The first step will be making the vertical output default. I'm still not sure about whether we want to remove table output or not yet.

@G-Rath G-Rath force-pushed the add-vertical-reporter branch 2 times, most recently from c8358d4 to 36f708c Compare July 10, 2024 20:06
@G-Rath
Copy link
Collaborator Author

G-Rath commented Jul 10, 2024

@another-rex @hogo6002 called vs uncalled is now being separated in the vertical output so this should be good for re-review - I've intentionally stuck with a minimal presentation for now rather than introduce a new layer of intending and headers that more explicitly call out "there are uncalled" like in @hogo6002's original sample, as I think it's serviceable and figure that'd be better to start with then over-implementing something very wordy that we might not like anyway

@hogo6002 hogo6002 self-requested a review July 11, 2024 04:14
Copy link
Contributor

@hogo6002 hogo6002 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM with some minor comments.

continue
}

for _, id := range group.IDs {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's a bit indented, maybe we can refactor this into a separate function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the awkward name of such a function would offset the gain in reducing indenting here

for _, id := range group.IDs {
for _, v := range pkg.Vulnerabilities {
if v.ID == id {
if group.IsCalled() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's also kind of indented. Should we use some extra data structures to track vulnerability count?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to have this less intended but I don't really think there's a way to do that which wouldn't just be smearing the indenting around (/ replacing it with different complexity).

Not saying we should do this everywhere, but personally I view it as a natural downside to the currently complex-but-necessary shape of the data structure, and something to review as/when there's more need for this kind of stuff

i.e. v2 might involve some reshaping to this structure that lets us improve this, or a new property might come along that makes it useful to have some functions that could also be used here

Copy link
Collaborator

@another-rex another-rex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@hogo6002 hogo6002 merged commit 42514fc into google:main Jul 26, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimise human readable output for narrow terminals
5 participants