Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issues from scanSBOMFile reading every file multiple times #257

Closed
spencerschrock opened this issue Mar 2, 2023 · 0 comments · Fixed by #258
Closed

Performance issues from scanSBOMFile reading every file multiple times #257

spencerschrock opened this issue Mar 2, 2023 · 0 comments · Fixed by #258
Assignees
Labels
bug Something isn't working

Comments

@spencerschrock
Copy link
Contributor

spencerschrock commented Mar 2, 2023

Scorecard utilizes osvscanner.DoScan when performing its Vulnerabilities check. The time to complete the check is more than an order of magnitude higher than other checks. Running pprof shows a hot spot in scanSBOMFile.

It looks like when walking a directory, every file is potentially parsed as an SBOM:

// No need to check for error
// If scan fails, it means it isn't a valid SBOM file,
// so just move onto the next file
_ = scanSBOMFile(r, query, path)

There are currently two providers, SPDX and CycloneDX. While SPDX checks for a filename, there's no such check for CycloneDX:

for _, provider := range sbom.Providers {
if provider.Name() == "SPDX" &&
!strings.Contains(strings.ToLower(filepath.Base(path)), ".spdx") {
// All spdx files should have the .spdx in the filename, even if
// it's not the extension: https://spdx.github.io/spdx-spec/v2.3/conformance/
// Skip if this isn't the case to avoid panics
continue
}

I believe this means attempting to parse every file as a CycloneDX SBOM, twice. In large repositories, this adds up ( longest observed is 5 minutes) :

func (c *CycloneDX) GetPackages(r io.ReadSeeker, callback func(Identifier) error) error {
var bom cyclonedx.BOM
for _, formatType := range cycloneDXTypes {
_, err := r.Seek(0, io.SeekStart)
if err != nil {
return fmt.Errorf("failed to seek to start of file: %w", err)
}
decoder := cyclonedx.NewBOMDecoder(r, formatType)
err = decoder.Decode(&bom)

8sg3cyNvvMYG5HH

@another-rex another-rex self-assigned this Mar 2, 2023
@another-rex another-rex added the bug Something isn't working label Mar 2, 2023
another-rex added a commit that referenced this issue Mar 2, 2023
Make sure the file name follows the recognized file names here:
https://cyclonedx.org/specification/overview/#recognized-file-patterns

Resolves #257 

Also 
- adds tests for sboms,
- makes SBOM scan logging output consistent with lockfile scan logging
output
- Minor refactor of SBOMs

I believe we will also want something similar to parse-as in lockfiles
for SBOMs as well in the future to allow file names that doesn't conform
to the standard to be scanned.
hayleycd pushed a commit that referenced this issue Mar 9, 2023
Make sure the file name follows the recognized file names here:
https://cyclonedx.org/specification/overview/#recognized-file-patterns

Resolves #257 

Also 
- adds tests for sboms,
- makes SBOM scan logging output consistent with lockfile scan logging
output
- Minor refactor of SBOMs

I believe we will also want something similar to parse-as in lockfiles
for SBOMs as well in the future to allow file names that doesn't conform
to the standard to be scanned.
julieqiu pushed a commit to julieqiu/osv-scanner that referenced this issue May 2, 2023
Make sure the file name follows the recognized file names here:
https://cyclonedx.org/specification/overview/#recognized-file-patterns

Resolves google#257 

Also 
- adds tests for sboms,
- makes SBOM scan logging output consistent with lockfile scan logging
output
- Minor refactor of SBOMs

I believe we will also want something similar to parse-as in lockfiles
for SBOMs as well in the future to allow file names that doesn't conform
to the standard to be scanned.
julieqiu pushed a commit to julieqiu/osv-scanner that referenced this issue May 2, 2023
Make sure the file name follows the recognized file names here:
https://cyclonedx.org/specification/overview/#recognized-file-patterns

Resolves google#257 

Also 
- adds tests for sboms,
- makes SBOM scan logging output consistent with lockfile scan logging
output
- Minor refactor of SBOMs

I believe we will also want something similar to parse-as in lockfiles
for SBOMs as well in the future to allow file names that doesn't conform
to the standard to be scanned.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants