[SDK] BatchSpanProcessor::ForceFlush appears to be hanging forever #2574
Comments
Thanks, this may happen when more than one thread calls
Thanks for the quick diagnosis. I can say our problem definitely persists in 1.13.0 (so it's just a coincidence that we started noticing it after the 1.14.1 upgrade), and a lock does appear to fix the problem.
Thanks. There should be only one thread calling this function internally, but it's still a problem when users call it from another thread.
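For anyone hitting the same thing, the lock workaround mentioned above could look roughly like the sketch below. `SerializedFlusher` is a hypothetical application-side wrapper, not part of opentelemetry-cpp; it assumes the flush entry point can be passed in as a callable that takes a timeout and returns `bool`.

```cpp
#include <chrono>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical wrapper (not an opentelemetry-cpp class): serializes flush
// calls so that at most one thread is inside the wrapped flush at a time.
class SerializedFlusher
{
public:
  using FlushFn = std::function<bool(std::chrono::microseconds)>;

  explicit SerializedFlusher(FlushFn flush) : flush_(std::move(flush)) {}

  // Forwards to the wrapped callable while holding the mutex.
  bool ForceFlush(std::chrono::microseconds timeout)
  {
    std::lock_guard<std::mutex> guard(mutex_);
    return flush_(timeout);
  }

private:
  std::mutex mutex_;
  FlushFn flush_;
};
```

In an application, the callable would typically be a lambda that forwards to whatever object the application flushes through. Note that this only prevents concurrent callers; a caller waiting on the mutex can still block longer than its own timeout if the flush in progress hangs.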
This issue was marked as stale due to lack of activity.
Describe your environment
Platform: Linux/x86_64
Build system: CMake
Package manager: conda/conda-forge
OpenTelemetry version: 1.14.1
Steps to reproduce
Unfortunately, this is entwined into a large proprietary application and I've not been able to extricate it. We are using the OTLP exporter, BatchSpanProcessor, and we are calling Flush periodically with a timeout.
What is the expected behavior?
ForceFlush with a timeout should reliably complete within (roughly) the timeout.
What is the actual behavior?
ForceFlush appears to get stuck inside the spinlock:
or
These traces appear to me to correspond with the spinlock portion. It appears to me that the spinlock portion does not respect the timeout:
opentelemetry-cpp/sdk/src/trace/batch_span_processor.cc
Lines 120 to 137 in 07f6cb5
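To illustrate what I mean by a spin wait that respects the timeout, here is a minimal sketch. It is not the SDK's code: `SpinWaitWithDeadline` and the plain `std::atomic<bool>` flag are hypothetical stand-ins for whatever the processor actually spins on.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper (not part of opentelemetry-cpp): spins until `flag`
// becomes true or `timeout` elapses. Returns true only if the flag was
// observed set before the deadline.
inline bool SpinWaitWithDeadline(const std::atomic<bool> &flag,
                                 std::chrono::microseconds timeout)
{
  using Clock = std::chrono::steady_clock;
  const auto start = Clock::now();

  // Clamp the timeout so that start + timeout cannot overflow, for example
  // when the caller passes std::chrono::microseconds::max().
  const auto max_wait = std::chrono::duration_cast<std::chrono::microseconds>(
      Clock::time_point::max() - start);
  if (timeout > max_wait)
  {
    timeout = max_wait;
  }
  const auto deadline = start + timeout;

  while (!flag.load(std::memory_order_acquire))
  {
    if (Clock::now() >= deadline)
    {
      return false;  // give up instead of spinning forever
    }
    std::this_thread::yield();
  }
  return true;
}
```

The point is only that the deadline is checked on every iteration of the spin, so the caller gets control back even if the flag is never set.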
Additional context
We think this only started occurring once we upgraded from 1.13.0 to 1.14.1. Note that we did not previously use a timeout, since we did not feel a need for one. When we upgraded, we noticed it was getting stuck exporting and added the timeout to try to avoid this (and as general good practice), but there seems to be some deeper issue as to why the export never appears to complete.
I did set a breakpoint on the actual Export method and it does appear to be called regularly, but the main thread that is blocking in ForceFlush isn't getting unblocked.