The OpenSSF Scorecard project had a few of our Kubernetes pods get stuck while running osv-scanner. Our binaries had a few profiling/monitoring routines running, so while the executable never fatally deadlocked, no work was being performed.
Debugging with pprof showed the following goroutine stack dump:
The projects we were scanning when the deadlock occurred seem to have many open vulnerabilities (25+), which suggests the maximum number of concurrent requests was in flight.
The frozen goroutines seem to be stuck on lines 285 and 292 of osv-scanner/pkg/osv/osv.go (v1.7.4).
With all that in mind, I think the following occurred:
There were 25+ vulnerabilities that needed hydrating, which depleted the semaphore, so the "main" goroutine could not proceed.
All 25 of the hydrating goroutines hit an error calling GetWithClient but could not send it on errChan, because the "main" goroutine does not read from errChan until after it has started a hydration goroutine for every vuln. This seems like an uncommon occurrence, but I think osv.dev#2335 ("Failed to hydrate an OSV response due to an unexpected severity type format") caused an increased error rate this week and made this happen.
Deadlock. All 25 requests were frozen until the "main" goroutine continued, which could not happen because of the concurrency limit.
Code referenced above, all in osv-scanner/pkg/osv/osv.go:
Lines 283 to 285 in d4657bf (referenced for the depleted semaphore)
Lines 289 to 292 in d4657bf (referenced for the hydrating goroutines that cannot send on errChan)
Lines 313 to 315 in d4657bf (referenced for the "main" goroutine reading from errChan)