Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for SARIF output #216

Closed
Dentrax opened this issue Feb 17, 2023 · 20 comments · Fixed by #432 or #534
Closed

Add support for SARIF output #216

Dentrax opened this issue Feb 17, 2023 · 20 comments · Fixed by #432 or #534
Labels
enhancement New feature or request

Comments

@Dentrax
Copy link

Dentrax commented Feb 17, 2023

It'd be great to have a SARIF output format to upload the results to GitHub. (i.e. with github/codeql-action/upload-sarif action)

Blocked by #217

@marcinguy
Copy link

@Dentrax You can get SARIF output and scanning GH action using betterscan https://github.com/marcinguy/betterscan-ce

It has OSV scanner as one the scanners

@Dentrax
Copy link
Author

Dentrax commented Feb 17, 2023

Hey @marcinguy, since this is slightly out-of-context of what I'm proposing here; but I definitely check that project, seems interesting! Thanks for dropping that.

@marcinguy
Copy link

@Dentrax No worries. Was trying to help. Yea, different outputs from OSV scanner would be great. +1

@oliverchang oliverchang added the enhancement New feature or request label Feb 19, 2023
@oliverchang
Copy link
Collaborator

Thanks for the suggestion! This is indeed something we've been thinking about. This is something that may be blocked on #150.

Can you also speak a bit about how you intend to consume the SARIF output? Is this meant to be fed to e.g. GitHub's code scanning alerts? Are there other use cases for this?

@Dentrax
Copy link
Author

Dentrax commented Feb 20, 2023

Is this meant to be fed to e.g. GitHub's code scanning alerts?

Actually, yes. This is my only use-case.

@jaskaransinghdr6j
Copy link

Hey guys, any update on SARIF support?

@pwa-tapptic
Copy link

I'm also interested in this feature. It makes integrations with CI/CD tools a way nicer!

@oliverchang
Copy link
Collaborator

oliverchang commented Jul 3, 2023

This is already being worked on by @another-rex in #420.

This was referenced Jul 4, 2023
another-rex added a commit that referenced this issue Jul 31, 2023
This PR features:

- Refactors the format flag's internal logic so that we can don't need
to repeat the format types so much, and we can test when we add a new
format entry if we forgot anything.
- Adds a new format "sarif", which returns a SARIF report (closes #216 )
- Adds a Github Action `action.yaml` and it's specialized dockerfile
`action.dockerfile`. This docker image runs a bash script wrapping
osv-scanner, first by preprocessing the input so the last argument will
be split by new line, allowing the workflow user to pass in multiple
directories/files they wish to scan. The script also changes exit codes
127 and 128 to 0 as they contain errors that the user can't really do
anything about.
- Adds two reusable workflows using this new github action for this repo
- Reusable PR workflow, for using to check if PRs introduce new
vulnerabilities.
- Reusable Scheduled workflow, for use to regularly check for new vulns
applying to your existing vulns.
- Adds an experimental flag: `--experimental-diff`, which will only
output the difference between a previous run and this run of the
osv-scanner. This is for use in the PR workflow.
- Sorts the grouped ID output.

Closes #57 

Currently the reusable workflow has to point to a specific action which
cannot be relative (otherwise it would point to the wrong action when
reused in another repo). This means right now it's pointed to this
fork/branch instead of the master branch, this will need to be updated
once this PR is merged.

Example of what workflow sarif output looks like:

![image](https://github.com/google/osv-scanner/assets/106129829/fc7a0ac4-f3d8-4524-93ba-7b03dd0313cd)

Here is an example of the PR reusable workflow working:

another-rex/scorecard-check-osv-e2e#1

That PR adds an additional vulnerability, which causes it to fail. You
can see that only the new vuln is showing up in the code scanning
report:
https://github.com/another-rex/scorecard-check-osv-e2e/security/code-scanning/1


TODO after this PR is merged:
- Change links that point to this PR branch to point to main (and/or a
tagged commit of main)
- Add support for annotations
- Add documentation (this is for later, as we want to dogfood it in our
own repos first before broadcasting this widely)

---------

Signed-off-by: Rex P <rexpan@google.com>
@Ma1tobiose
Copy link

I tested this and it outputs to sarif format great, but probably not quite what I expected.
I'm going through osv which will check for say 10 issues, and I'm expecting all 10 to be expanded and displayed (in order to be put into results), but at the moment it looks like it's just writing a result and printing a table in the message. This should not be the desired result for this requirement.

@oliverchang
Copy link
Collaborator

I tested this and it outputs to sarif format great, but probably not quite what I expected. I'm going through osv which will check for say 10 issues, and I'm expecting all 10 to be expanded and displayed (in order to be put into results), but at the moment it looks like it's just writing a result and printing a table in the message. This should not be the desired result for this requirement.

Thanks for the feedback @Ma1tobiose. Can you please give a sample output that you'd like to see?

Also curious if there's feedback from any others in this issue. e.g. @pwa-tapptic , @jaskaransinghdr6j @Dentrax

@pwa-tapptic
Copy link

I tested it on my side and I'd have same remark as @Ma1tobiose .
https://github.com/anchore/grype creates SCA results in SARIF format quite nicely (syft . -o json | grype -q -o sarif).
As @Ma1tobiose said, OSV compresses everything into single finding in SARIF, but findings should be rather separate objects in results array:

{
  "version": "2.1.0",
  "$schema": "https://json.schemastore.org/sarif-2.1.0-rtm.5.json",
  "runs": [
    {
      "tool": { ...
      },
      "results": [
        {
          "ruleId": "GHSA-72xf-g2v4-qvf3-tough-cookie",
          "message": {
            "text": "The path /yarn.lock reports tough-cookie at version 4.1.2  which would result in a vulnerable (npm) package installed"
          },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "/yarn.lock"
                },
                "region": {
                  "startLine": 1,
                  "startColumn": 1,
                  "endLine": 1,
                  "endColumn": 1
                }
              }
            }
          ]
        },
        {
          "ruleId": "GHSA-776f-qx25-q3cc-xml2js",
          "message": {
            "text": "The path /yarn.lock reports xml2js at version 0.4.23  which would result in a vulnerable (npm) package installed"
          },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "/yarn.lock"
                },
                "region": {
                  "startLine": 1,
                  "startColumn": 1,
                  "endLine": 1,
                  "endColumn": 1
                }
              }
            }
          ]
        },
...

just to give you an example.

@oliverchang
Copy link
Collaborator

Thanks!

Our motivation behind a single aggregated report, is to better connect this to #352 via a single remediation command included in the result (e.g. "to fix these interactively, run osv-scanner fix ..."), where we provide a tool that can help with interactively (or automatically) addressing multiple vulnerabilities at once in the same lockfile/manifest file.

Could we understand a bit better how having separate results helps your use cases? Does this provide better UX for you to look at vulnerability results individually as opposed to a single one?

@Ma1tobiose
Copy link

Ma1tobiose commented Aug 22, 2023

Thanks! @oliverchang
I can understand that osv will store the data structure in an aggregated form in order to be able to fix it in one click.
But if the output is sarif or json, my side hope is to access the CICD process, osv as one of the tools in the output standardized report, so that it is easy to do the aggregated display with the results of other tools. So I would need a report in a format similar to the one provided by @pwa-tapptic (results in a separate form).

@pwa-tapptic
Copy link

@oliverchang one-click fix is an awesome feature and great motivation behind tool as OSV, it must be seen as something valuable from CLI user perspective.
But SARIF (i.e. Static Analysis Results Interchange Format) has slightly different motivations behind (details here), as the name of the format itself suggests. CI/CD tools are one of potential beneficiaries of the standardized interchange format.

@oliverchang oliverchang reopened this Aug 22, 2023
@oliverchang
Copy link
Collaborator

@another-rex let's investigate if there's a good middle ground here.

@jaskaransinghdr6j
Copy link

+1 SARIF should ideally represent each "finding" as it's own line item. Being an interchange format, it would be right to use it as a 1 CVE per item template.

@another-rex
Copy link
Collaborator

another-rex commented Sep 11, 2023

Here is the current format I am thinking of merging, would love any feedback (fyi: @pwa-tapptic @jaskaransinghdr6j @Ma1tobiose)

{
  "version": "2.1.0",
  "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
  "runs": [
    {
      "tool": {
        "driver": {
          "informationUri": "https://github.com/google/osv-scanner",
          "name": "osv-scanner",
          "rules": [
            {
              "id": "RUSTSEC-2022-0041",
              "shortDescription": {
                "text": "OSV.Summary"
              },
              "fullDescription": {
                "text": "OSV.Details",
                "markdown": "OSV.Details"
              },
              "help": {
                "text": "...",
                "markdown": "Markdown table of occurrences in the repo (i.e. If this vuln affects multiple packages/lockfiles)"
              }
            }
          ]
        }
      },
      "artifacts": [
        {
          "location": {
            "uri": "path/to/Cargo.lock"
          },
          "length": -1
        }
      ],
      "results": [
        {
          "ruleId": "RUSTSEC-2023-0045",
          "ruleIndex": 0,
          "level": "warning",
          "message": {
            "text": "Package 'memoffset@0.6.1' is vulnerable to 'RUSTSEC-2023-0045', please upgrade to versions '0.6.2' to fix this vulnerability"
          },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "path/to/Cargo.lock"
                }
              }
            }
          ]
        }
        // ... One result per vulnerable package per vulnerability
      ]
    }
  ]
}

+1 SARIF should ideally represent each "finding" as it's own line item. Being an interchange format, it would be right to use it as a 1 CVE per item template.

This is complicated by the fact that not every OSV entry has a corresponding CVE (e.g. RUSTSEC-2023-0045), and we can have multiple advisories from different sources referring to the exact same vulnerability.

This is common for vulnerabilities published by github advisories, where for some languages there will be security advisories published by the language maintainers as well.

One solution to this we are thinking is to fill "ruleId" with all the aliased vulnerabilities joined together (e.g. id: GHSA-wfg4-322g-9vqv,RUSTSEC-2023-0045). SARIF provides a 3.49.4 deprecatedIds property which is an array of unique IDs, in which we can separate those IDs out into separate array entries.

Tell us if this will work for your workflows!

@jaskaransinghdr6j
Copy link

jaskaransinghdr6j commented Sep 11, 2023

This is great! However, I feel as many findings aggregators will be using these findings to autofill fields like "Finding Title", the joined ruleId might be a bit too onerous.

There seems to be no straightforward way to deal with this according to the SARIF spec. What if we default the ruleId to "CVE-XXXX" if available, and fallback to package specific id "RUSTSEC-YYYY" if the former is not available? (or vice-versa)

The other approach could be to list all the aliases in the message text field as : "Also Reported as RUSTSEC-YYYY and GHSA-ZZZZ"

@Ma1tobiose
Copy link

The formatting is great, and I agree with @jaskaransinghdr6j's solution (if it's feasible), being able to easily query the cve and refer sources would make it much quicker to find vulnerability details and fixes.

another-rex added a commit that referenced this issue Sep 20, 2023
Fixes #216 with a new format that separates out individual
vulnerabilities.

Each vulnerability is now it's own rule violation. The aliased
vulnerabilities are grouped together as one rule violation, with an ID
picked in this priority (CVE -> [Eco Specific] -> GHSA).
@another-rex
Copy link
Collaborator

There seems to be no straightforward way to deal with this according to the SARIF spec. What if we default the ruleId to "CVE-XXXX" if available, and fallback to package specific id "RUSTSEC-YYYY" if the former is not available? (or vice-versa)

I implemented this in the SARIF PR and it's now been merged, so please try it out in latest main branch of osv-scanner (this change is not in 1.4.0) and see if this fits your use cases!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
7 participants