Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock in HydrateWithClient #1077

Closed
spencerschrock opened this issue Jun 27, 2024 · 0 comments · Fixed by #1078
Closed

Deadlock in HydrateWithClient #1077

spencerschrock opened this issue Jun 27, 2024 · 0 comments · Fixed by #1078

Comments

@spencerschrock
Copy link
Contributor

spencerschrock commented Jun 27, 2024

The OpenSSF Scorecard project had a few of our Kubernetes pods get stuck while running osv-scanner. Our binaries had a few profiling/monitoring routines running, so while the executable never fatally deadlocked, no work was being performed.

Debugging with pprof showed the following goroutine stack dump:

goroutine profile: total 46
25 @ 0x4401ae 0x4099ad 0x409617 0xd1febb 0x4774c1
#	0xd1feba	github.com/google/osv-scanner/pkg/osv.HydrateWithClient.func1+0x11a	github.com/google/osv-scanner@v1.7.4/pkg/osv/osv.go:292

1 @ 0x4401ae 0x451e65 0xabec94 0xd1fb7c 0xf89cdc 0xf89cd7 0xf88cb9 0xf8c59f 0x122f02a 0x1237c18 0x1128eb1 0x1250550 0x4774c1
#	0xabec93	golang.org/x/sync/semaphore.(*Weighted).Acquire+0x333				golang.org/x/sync@v0.7.0/semaphore/semaphore.go:74
#	0xd1fb7b	github.com/google/osv-scanner/pkg/osv.HydrateWithClient+0x31b			github.com/google/osv-scanner@v1.7.4/pkg/osv/osv.go:285
#	0xf89cdb	github.com/google/osv-scanner/pkg/osv.Hydrate+0x6fb				github.com/google/osv-scanner@v1.7.4/pkg/osv/osv.go:265
#	0xf89cd6	github.com/google/osv-scanner/pkg/osvscanner.makeRequest+0x6f6			github.com/google/osv-scanner@v1.7.4/pkg/osvscanner/osvscanner.go:982
#	0xf88cb8	github.com/google/osv-scanner/pkg/osvscanner.DoScan+0x1278			github.com/google/osv-scanner@v1.7.4/pkg/osvscanner/osvscanner.go:837
#	0xf8c59e	github.com/ossf/scorecard/v5/clients.osvClient.ListUnfixedVulnerabilities+0x29e	github.com/ossf/scorecard/v5/clients/osv.go:55
#	0x122f029	github.com/ossf/scorecard/v5/checks/raw.Vulnerabilities+0x149			github.com/ossf/scorecard/v5/checks/raw/vulnerabilities.go:36
#	0x1237c17	github.com/ossf/scorecard/v5/checks.Vulnerabilities+0x57			github.com/ossf/scorecard/v5/checks/vulnerabilities.go:43
#	0x1128eb0	github.com/ossf/scorecard/v5/checker.(*Runner).Run+0x8f0			github.com/ossf/scorecard/v5/checker/check_runner.go:118
#	0x125054f	github.com/ossf/scorecard/v5/pkg.runEnabledChecks.func1+0x1af			github.com/ossf/scorecard/v5/pkg/scorecard.go:68

1 @ 0x4401ae 0x452ea5 0x452e74 0x4734a5 0x4842c8 0x1250332 0x4774c1
#	0x4734a4	sync.runtime_Semacquire+0x24				runtime/sema.go:62
#	0x4842c7	sync.(*WaitGroup).Wait+0x47				sync/waitgroup.go:116
#	0x1250331	github.com/ossf/scorecard/v5/pkg.runEnabledChecks+0x231	github.com/ossf/scorecard/v5/pkg/scorecard.go:71

## some unrelated monitoring and pprof routines I've omitted. I can provide it if needed.

Observations:

  • The projects that we were scanning when the deadlock occurred seem to have many open vulnerabilities, indicating there were a maximal number of concurrent requests (25+).
  • The frozen goroutines seem to be stuck on line 285 and 292 (of v1.7.4)

With all that in mind, I think the following occurred:

  1. There were 25+ vulnerabilities which needed hydrating causing the semaphore to be depleted, and the "main" goroutine couldn't proceed

    osv-scanner/pkg/osv/osv.go

    Lines 283 to 285 in d4657bf

    for batchIdx, response := range resp.Results {
    for resultIdx, vuln := range response.Vulns {
    if err := rateLimiter.Acquire(ctx, 1); err != nil {

  2. All 25 of the hydrating goroutines had an error calling GetWithClient but cannot send their error on the errChan because the "main" goroutine does not read from errChan until after starting a hydration goroutine for all vulns. This seems like an uncommon occurrence, but I think Failed to hydrate an OSV response due to an unexpected severity type format osv.dev#2335 caused an increased error rate this week and made this happen.

    osv-scanner/pkg/osv/osv.go

    Lines 289 to 292 in d4657bf

    go func(id string, batchIdx int, resultIdx int) {
    vuln, err := GetWithClient(id, client)
    if err != nil {
    errChan <- err

    osv-scanner/pkg/osv/osv.go

    Lines 313 to 315 in d4657bf

    for err := range errChan {
    return nil, err
    }

  3. Deadlock. All 25 requests were frozen until the "main" goroutine continued, which could not happen due to concurrency limits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant