feat: add "vertical" output format #889

G-Rath · 2024-03-26T20:39:19Z

This adds a new "vertical" output format that is designed for humans and based on the output of osv-detector, which effectively aims to group the output relating to each entity being scanned in vertical slices:

Unfortunately I think it suffers significantly due to the assumptions made by the rest of the codebase for outputting that made sense when the final output was a table i.e. we dump a lot of information as we go about scanning, config files, vulnerability filtering, and so on that really should be grouped but currently cannot because they're all outputted at different stages - I think a way to address that could be using some sort of event-emitter type pattern so that the reporters could be responsible for deciding what they actually do (e.g. r.Emit("filtered-vulnerability", ...) and then most reporters could choose to just print immediately, and ones like "vertical" could choose to add it to an internal struct), but I think that'll involve a lot more work; for now I'm just going to ignore the pre-results output.

Resolves #85

codecov-commenter · 2024-03-28T03:03:53Z

Codecov Report

Attention: Patch coverage is 93.33333% with 14 lines in your changes missing coverage. Please review.

Project coverage is 66.55%. Comparing base (1ea785e) to head (347ca61).

Files	Patch %	Lines
pkg/reporter/vertical_reporter.go	70.27%	11 Missing ⚠️
internal/output/vertical.go	98.24%	2 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #889      +/-   ##
==========================================
+ Coverage   66.11%   66.55%   +0.44%     
==========================================
  Files         160      162       +2     
  Lines       12792    13002     +210     
==========================================
+ Hits         8457     8654     +197     
- Misses       3877     3889      +12     
- Partials      458      459       +1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

This introduces a set of crafted scanner results that each supported `output` format is run through to showcase how they look across all the different results possible from a scanner run - it originally started life as the tests for #889 but I realised they could base used more generally for testing and reviewing all the outputters, so here we are. ~It looks like this has also revealed the SARIF output is unstable in its ordering, which I'll aim to address in a dedicated PR~

cuixq · 2024-05-29T03:34:20Z

LGTM - however I am not sure if we want the output in this color schema. @another-rex is there any discussion/agreement on this?

This introduces a set of crafted scanner results that each supported `output` format is run through to showcase how they look across all the different results possible from a scanner run - it originally started life as the tests for google#889 but I realised they could base used more generally for testing and reviewing all the outputters, so here we are. ~It looks like this has also revealed the SARIF output is unstable in its ordering, which I'll aim to address in a dedicated PR~

hogo6002 · 2024-06-26T07:50:41Z

Thanks @G-Rath! It looks really nice! I tested it on container scanning, and the vertical results are much clearer than table ones. Adding some suggestions to further improve the display of container scanning results:

The vertical output doesn't work when pipe results to a text file. (osv-scanner scan ... > result.txt)
Shall we add a summary at the top to Include the image name, OS version, and total vulnerability count. (OS version and total vulnerability count may need to wait for v2).
We should probably remove Docker names from each source and add ecosystem names (e.g., change "go-tester.tar:/var/lib/dpkg/status" to "/var/lib/dpkg/status (Debian:12)").
To maintain consistency, should we also display the vulnerability and license issue count at the top of each source section ("2 known vulnerabilities found").
We could use CVE IDs as the default identifiers, along with their severity scores. For each vulnerability, we can include all associated group aliases, along with their descriptions and fixed versions. (Group aliases should not be counted as separate vulnerabilities in the total count.)
It would be better to place results from the same ecosystem together. For example, all language packages could be listed first, followed by OS related packages.
Add uncalled vulnerabilities. (Maybe we can consider excluding uncalled vulnerabilities from the total count, or hide them by default with an option to show them)

Example (Feel free to modify):

Scanning image go-tester.tar
170 unimportant vulnerabilities have been filtered out.
Filtered 170 vulnerabilities from output

Image go-tester.tar (Debian 12.5) result
Total 7 (4 High, ...)

/artifact/main-built-with-1-21-0 (Go): found 2 package with issues
  2 known vulnerabilities found
  x license issue found
  stdlib@1.21.0 is affected by the following vulnerabilities:
    CVE-0000-0001: 8.0 (HIGH)
      GO-0000-0001: description here
        Fixed version: 1.22.0
      GHSA-0000-0001: description here

  test@1.0 is affected by the following vulnerabilities:
    CVE-0000-0002: 8.1 (HIGH)
      GO-0000-0002: description here
        Fixed version: 1.1
      GHSA-0000-0002: description here
        Fixed version: 1.1

  Uncalled vulnerabilities
    We found 1 potential vulnerabilities, but it is currently not being called or executed in the code
    test2@1.2.0 may be affected by the following vulnerabilities:
      GO-0000-0003:   (when no CVE ID)
        GO-0000-0003: description here       (when no fixed version available)
        GHSA-0000-0003: description here


/src/main (Go): found 3 packages with issues         (Place other sources from the same ecosystem here)
  3 known vulnerabilities found 
  .....


/var/lib/dpkg/status (Debian:12): found 8 packages with issues
  2 known vulnerabilities found                   (Even though there is only one unique vulnerability, it is detected in multiple packages, so we count it multiple times in the report.)

  libprd@9.1-1 is affected by the following vulnerabilities:
    CVE-0000-2781: 5.0 (Medium)
      CVE-0000-2781:
        description: description here

  libdev@9.1-1 is affected by the following vulnerabilities:
    CVE-0000-2781: 5.0 (Medium)
      CVE-0000-2781:
        description: description here

G-Rath · 2024-06-28T01:29:42Z

@hogo6002 thanks for the review and detailed writeup! It would be really useful if you could provide some reproducations for each of your points as I'm not across all of the different aspects of the scanner like I think you are, so I'm less setup to e.g. produce a result that has a uncalled vulnerability, multiple ecosystems, image scanner results, etc.

Alternatively feel free to provide test cases since then you can just craft the situation you're thinking of rather than having to track down a real-world image/lockfile, and since we'll need those anyway to reflect the changes in the snapshots.

I think the main concern I have (if I'm understanding things right) is that your proposals would likely greatly increase the output in both directions, which is potentially a bad thing

e.g.

We could use CVE IDs as the default identifiers, along with their severity scores. For each vulnerability, we can include all associated group aliases, along with their descriptions and fixed versions. (Group aliases should not be counted as separate vulnerabilities in the total count.)

In my experience most advisories have at least three aliases: a CVE, a GHSA, and an ID from the original source (Go DB, Rust sec DB, Ruby sec DB, Python sec DB, etc) - so this would mean for every advisory you'd be looking at four lines min + each of those will have a summary that is probably very similar but not always the same that the user has to look through.

We'll also want to include a link to make it easy to access, so that's a link per advisory; and the representation of a "fix version" is typically different and multiple per advisory - some have commits, some have version ranges, and some have both. Plus there's determining which fixed version to show (either we show just the latest fix version which might not be in the same major version, or we try to determine the best fit in which case this outputter is suddenly doing a lot more complex work...).

In doing this, we'd have also added at least 2-3 levels of indenting, pushing the content further right, and this gets even worse in ecosystems like NPM that support multiple versions of the same package 😅

I definitely think there's more exploring to do with the outputting, but it feels like it could be too much to be try and tackle in this initial PR - maybe also because currently the outputters don't have any real concept of context, and that that might play a role? (i.e. they don't know when you're scanning a container vs a host vs an sbom, and I think things like summaries and os versions have different usages depending on what scanning mode you're actually in)

re your other comments:

The vertical output doesn't work when pipe results to a text file. (osv-scanner scan ... > result.txt)

I can't reproduce this and is very unexpected as we're explicitly printing to stdout + matching the table reporter - can you confirm the exact command you're running and in what terminal?

Shall we add a summary at the top to Include the image name, OS version, and total vulnerability count. (OS version and total vulnerability count may need to wait for v2).

To maintain consistency, should we also display the vulnerability and license issue count at the top of each source section ("2 known vulnerabilities found").

I'll skip this until we've discussed my opener, but just fyi in case you missed it, we do already put the summary count at the end of each section which I had there are a form of block-end-marker.

It would be better to place results from the same ecosystem together. For example, all language packages could be listed first, followed by OS related packages.

I don't think we should do this at least for now because it makes an assumption about the classification and priority of ecosystems that afaik isn't captured in the osv spec

hogo6002 · 2024-07-01T05:32:51Z

@G-Rath you are right, this is just an initial PR, we should probably not add too many new features to this. I will address the container scanning output format in a separate PR with some screenshots for better understanding.

Adding some clarification for other comments:

I can't reproduce this and is very unexpected as we're explicitly printing to stdout + matching the table reporter - can you confirm the exact command you're running and in what terminal?

I am using the command go run ./cmd/osv-scanner/ scan -r --format vertical ./project > result.txt The output file seems to have some bugs:

Another comment about uncalled vulnerabilities: when I use the command to scan a Go project, the vertical output doesn't show which vulnerabilities are uncalled.

G-Rath · 2024-07-02T03:14:28Z

@hogo6002 that output is correct - the weirdness you're seeing is the color characters.

It seems that the color library we're using doesn't do any special checks on when to include colors - I don't know the full details, but my understanding is that there are ways libraries can detect that they're being piped and so often these libraries have that built in.

I'll dig further into that and see what we can do

another-rex · 2024-07-02T03:18:07Z

FYI we do a similar check with the table reporter to avoid adding styling when stdout isn't the terminal by passing in a terminalWidth of 0. (cmd/osv-scanner/main.go:155)

G-Rath · 2024-07-02T04:39:20Z

@hogo6002 color outputting should now be fixed - while looking into that, I discovered that we could simplify the other outputters a bit too so have opened #1087 doing just that; weirdly, calling DisableColor in reporter seems to only work for the vertical output so it's a little inconsistent but I'll revisit that once we've landed this since we've talked about making changes to the table output anyway (@another-rex I can't remember if we were going to completely remove the table output or just make vertical the default?)

Per our catch-up today, that just leaves call analysis to implement as the rest we'll tackle as followups - I should have mentioned it at the time, but fact that call analysis isn't in any of our output snapshots means we're missing them entirely from the output fixtures; it'd be great if someone could do a PR adding even just a basic sample of what that looks like in the scanner results struct, to save me some time - but if everyone's busy I can try to figure it out from the existing fixtures

hogo6002 · 2024-07-03T05:57:54Z

@G-Rath Is this the call analysis output that you are looking for:

osv-scanner/cmd/osv-scanner/__snapshots__/main_test.snap

Line 479 in 4a3375f

    
           | Uncalled vulnerabilities            |      |           |                             |         |                                          |

another-rex · 2024-07-04T01:29:01Z

The first step will be making the vertical output default. I'm still not sure about whether we want to remove table output or not yet.

G-Rath · 2024-07-10T20:25:12Z

@another-rex @hogo6002 called vs uncalled is now being separated in the vertical output so this should be good for re-review - I've intentionally stuck with a minimal presentation for now rather than introduce a new layer of intending and headers that more explicitly call out "there are uncalled" like in @hogo6002's original sample, as I think it's serviceable and figure that'd be better to start with then over-implementing something very wordy that we might not like anyway

…stent

…ted function

hogo6002

Thanks! LGTM with some minor comments.

hogo6002 · 2024-07-11T04:18:10Z

internal/output/vertical.go

+			continue
+		}
+
+		for _, id := range group.IDs {


nit: it's a bit indented, maybe we can refactor this into a separate function.

I feel like the awkward name of such a function would offset the gain in reducing indenting here

hogo6002 · 2024-07-11T04:26:02Z

internal/output/vertical.go

+			for _, id := range group.IDs {
+				for _, v := range pkg.Vulnerabilities {
+					if v.ID == id {
+						if group.IsCalled() {


it's also kind of indented. Should we use some extra data structures to track vulnerability count?

I'd love to have this less intended but I don't really think there's a way to do that which wouldn't just be smearing the indenting around (/ replacing it with different complexity).

Not saying we should do this everywhere, but personally I view it as a natural downside to the currently complex-but-necessary shape of the data structure, and something to review as/when there's more need for this kind of stuff

i.e. v2 might involve some reshaping to this structure that lets us improve this, or a new property might come along that makes it useful to have some functions that could also be used here

another-rex

LGTM!

G-Rath force-pushed the add-vertical-reporter branch 3 times, most recently from 505d532 to bae0efe Compare March 28, 2024 02:59

G-Rath force-pushed the add-vertical-reporter branch from b32bc50 to 9d20ff7 Compare March 28, 2024 22:37

G-Rath mentioned this pull request Apr 1, 2024

[Bug]: race condition with cleanup and parallel gkampitakis/go-snaps#96

Closed

G-Rath force-pushed the add-vertical-reporter branch from 1e47759 to 181148d Compare April 5, 2024 19:27

G-Rath mentioned this pull request Apr 5, 2024

test: cleanup duplicate snapshots #902

Closed

This comment was marked as resolved.

Sign in to view

G-Rath force-pushed the add-vertical-reporter branch from 181148d to e2e8479 Compare April 18, 2024 19:49

G-Rath mentioned this pull request Apr 23, 2024

test: add cases for output functions #937

Merged

G-Rath force-pushed the add-vertical-reporter branch from e2e8479 to b8ac63e Compare May 3, 2024 04:02

G-Rath force-pushed the add-vertical-reporter branch 2 times, most recently from d5c8f4d to b266097 Compare May 24, 2024 19:47

This comment was marked as resolved.

Sign in to view

G-Rath force-pushed the add-vertical-reporter branch 3 times, most recently from 1ecd8e0 to 14e97cd Compare May 28, 2024 01:08

G-Rath marked this pull request as ready for review May 28, 2024 03:51

another-rex requested review from another-rex and cuixq May 28, 2024 04:37

G-Rath force-pushed the add-vertical-reporter branch from 14e97cd to ac14ce7 Compare June 28, 2024 19:36

G-Rath force-pushed the add-vertical-reporter branch from ac14ce7 to 7f8bd9d Compare July 2, 2024 04:05

G-Rath mentioned this pull request Jul 2, 2024

test: add output fixtures for call analysis #1093

Merged

G-Rath force-pushed the add-vertical-reporter branch from 2518d79 to 443a202 Compare July 2, 2024 21:20

G-Rath force-pushed the add-vertical-reporter branch 2 times, most recently from c8358d4 to 36f708c Compare July 10, 2024 20:06

G-Rath added 13 commits July 11, 2024 08:32

feat: add "vertical" output format

111ca05

refactor: change how vertical printing is done

a50d50e

feat: print license violations

c0ec374

fix: support multiple license violations

6160715

fix: add some spacing to vertical output blocks

83257cf

fix: specify the count of packages is of those "with issues"

a21df50

feat: restructure how license violations are printed to be more consi…

82381c5

…stent

refactor: replace fatih/color with go-pretty

0a615f2

fix: disable color output if we're not being called from a terminal

85f44e3

fix: omit uncalled vulnerabilities

76f25fa

fix: include uncalled vulnerabilities in vertical output

d89b096

refactor: pull out vulnerabilities count summary printing into dedica…

d1b1b86

…ted function

fix: adjust phrasing of "called" vs "uncalled" vulnerabilities

347ca61

G-Rath force-pushed the add-vertical-reporter branch from 36f708c to 347ca61 Compare July 10, 2024 20:33

hogo6002 self-requested a review July 11, 2024 04:14

hogo6002 approved these changes Jul 11, 2024

View reviewed changes

another-rex approved these changes Jul 25, 2024

View reviewed changes

hogo6002 merged commit 42514fc into google:main Jul 26, 2024
13 checks passed

BrewTestBot mentioned this pull request Aug 7, 2024

osv-scanner 1.8.3 Homebrew/homebrew-core#180352

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add "vertical" output format #889

feat: add "vertical" output format #889

G-Rath commented Mar 26, 2024

codecov-commenter commented Mar 28, 2024 •

edited

Loading

This comment was marked as resolved.

This comment was marked as resolved.

cuixq commented May 29, 2024

hogo6002 commented Jun 26, 2024

G-Rath commented Jun 28, 2024

hogo6002 commented Jul 1, 2024

G-Rath commented Jul 2, 2024

another-rex commented Jul 2, 2024

G-Rath commented Jul 2, 2024

hogo6002 commented Jul 3, 2024 •

edited

Loading

another-rex commented Jul 4, 2024

G-Rath commented Jul 10, 2024

hogo6002 left a comment

hogo6002 Jul 11, 2024

G-Rath Jul 11, 2024

hogo6002 Jul 11, 2024

G-Rath Jul 11, 2024

another-rex left a comment

feat: add "vertical" output format #889

feat: add "vertical" output format #889

Conversation

G-Rath commented Mar 26, 2024

codecov-commenter commented Mar 28, 2024 • edited Loading

Codecov Report

This comment was marked as resolved.

This comment was marked as resolved.

cuixq commented May 29, 2024

hogo6002 commented Jun 26, 2024

G-Rath commented Jun 28, 2024

hogo6002 commented Jul 1, 2024

G-Rath commented Jul 2, 2024

another-rex commented Jul 2, 2024

G-Rath commented Jul 2, 2024

hogo6002 commented Jul 3, 2024 • edited Loading

another-rex commented Jul 4, 2024

G-Rath commented Jul 10, 2024

hogo6002 left a comment

Choose a reason for hiding this comment

hogo6002 Jul 11, 2024

Choose a reason for hiding this comment

G-Rath Jul 11, 2024

Choose a reason for hiding this comment

hogo6002 Jul 11, 2024

Choose a reason for hiding this comment

G-Rath Jul 11, 2024

Choose a reason for hiding this comment

another-rex left a comment

Choose a reason for hiding this comment

codecov-commenter commented Mar 28, 2024 •

edited

Loading

hogo6002 commented Jul 3, 2024 •

edited

Loading