Describe the bug
When using a persistent queue backed by file storage, the persisted state can cause the collector to get stuck midway during start-up. By stuck I mean that the collector process doesn't exit but some components are never initialized. Disabling the persistent queue fixes the issue.
From dumping the stack traces of all goroutines using the pprof extension, it looks like the below goroutine is blocking start-up from continuing. I confirmed this stack trace was always present the few times this bug happened, and never present on collector instances that start cleanly:
Line 154 is trying to write to a buffered channel. If I'm reading it correctly, the goroutine would get blocked if persisted write index - persisted read index > queue capacity?
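To illustrate the mechanism I suspect, here is a minimal standalone sketch. It is not the collector's actual code: `restore` and `blockTimeout` are hypothetical names, and the timeout exists only so the demo terminates. A plain `ch <- i` with no reader would block forever once the buffer is full, which would match the stuck start-up described above if the persisted backlog exceeds the channel capacity.

```go
package main

import (
	"fmt"
	"time"
)

// blockTimeout is how long the demo waits before declaring a send "stuck".
// A real unguarded send would simply never return.
const blockTimeout = 100 * time.Millisecond

// restore simulates re-enqueueing persisted items at start-up into a
// buffered channel that nothing is reading from yet. It returns how many
// items were enqueued and whether a send blocked.
func restore(writeIndex, readIndex uint64, capacity int) (enqueued int, blocked bool) {
	ch := make(chan uint64, capacity) // buffered, no consumer started yet
	for i := readIndex; i < writeIndex; i++ {
		select {
		case ch <- i: // succeeds while the buffer has free slots
			enqueued++
		case <-time.After(blockTimeout): // buffer full: the send cannot proceed
			return enqueued, true
		}
	}
	return enqueued, false
}

func main() {
	// Backlog (write - read) larger than capacity: the send blocks.
	n, blocked := restore(10, 0, 5)
	fmt.Printf("backlog 10, capacity 5: enqueued=%d blocked=%v\n", n, blocked)

	// Backlog within capacity: every item fits and start-up would continue.
	n, blocked = restore(3, 0, 5)
	fmt.Printf("backlog 3, capacity 5: enqueued=%d blocked=%v\n", n, blocked)
}
```

If this reading is right, whether a given instance gets stuck depends only on how large the persisted backlog is relative to the configured queue size at the moment of restart.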
Steps to reproduce
I've since disabled file storage and didn't have time to find consistent steps to reproduce the issue, but can revisit later if it'd help.
Here are some observations though:
Our exporter configuration was left broken for a while with many failures happening
I remember seeing error messages about the exporter queue being full
The issue never resolves itself, even after restarts or letting the instance sit for hours. I think we did fix it by deleting the persistent volume claims at one point, before realizing they were probably at fault, but it eventually happened again
Some of the collectors in our Kubernetes StatefulSet never ran into the issue even when restarting, but I did not test this thoroughly
What did you expect to see?
Collector starts up cleanly
What did you see instead?
Collector is up and running but not doing anything, i.e. no new log statements and receivers are not accepting data.
What version did you use?
v0.92.0 of the collector-contrib image
What config did you use?
Environment