[Metrics SDK] Segfault when short export period is used for metrics #1676
Comments
@ahadnagy Thanks for reporting. Looking at the stack trace, this seems different from #1663. Are you using the example code from https://github.com/open-telemetry/opentelemetry-cpp/blob/main/examples/otlp/grpc_metric_main.cc? I am not able to repro it using the given export periods. If possible, can you share your code (it needn't be self-contained for now)? Also, for the observable instrument, ensure that the object is valid for the lifetime of metric collection.
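The lifetime advice above can be illustrated with a minimal sketch. It does not use the opentelemetry-cpp API: `PeriodicCollector`, `QueueDepth`, and `demo_observations` are hypothetical names, and the collector is only a stand-in for a periodic reader that invokes a callback with a raw state pointer from a background thread. The point it shows is ordering: the observed object is declared before the collector, so the collector's thread is joined while the object is still alive.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical stand-in for a periodic metric reader (NOT the opentelemetry-cpp API).
// It calls `cb(state)` from a background thread once per period until shut down.
class PeriodicCollector {
public:
  using Callback = std::function<void(void *)>;

  PeriodicCollector(std::chrono::milliseconds period, Callback cb, void *state)
      : period_(period), cb_(std::move(cb)), state_(state),
        worker_([this] { Run(); }) {}

  // Joining in the destructor guarantees no callback runs after this object is gone.
  ~PeriodicCollector() { Shutdown(); }

  void Shutdown() {
    running_.store(false);
    if (worker_.joinable()) {
      worker_.join();
    }
  }

private:
  void Run() {
    do {
      cb_(state_);  // dereferences the observed object: it must still be alive here
      std::this_thread::sleep_for(period_);
    } while (running_.load());
  }

  std::chrono::milliseconds period_;
  Callback cb_;
  void *state_;
  std::atomic<bool> running_{true};
  std::thread worker_;  // declared last so every other member is initialized before the thread starts
};

// Hypothetical observed object that an async instrument callback would read.
struct QueueDepth {
  std::atomic<long> observations{0};
};

long demo_observations() {
  QueueDepth depth;  // declared first, so it is destroyed last
  {
    PeriodicCollector collector(
        std::chrono::milliseconds(5),
        [](void *s) { static_cast<QueueDepth *>(s)->observations++; },
        &depth);
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
  }  // collector joins its thread here, while `depth` is still valid
  return depth.observations.load();
}
```

If the two declarations were swapped, or if both lived in static storage with an unspecified destruction order, the background thread could invoke the callback on an already destroyed object. Calling `demo_observations()` from `main` returns the number of callback invocations, which is at least one.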
@lalitb Thanks for looking into this! The code I'm using closely follows the mentioned example with some utility abstractions. Before discovering this issue we were using commit 9e87a6eb5997bd923c3c3742727bd6bceff483e5 with the default values for period and timeout, and that worked. There are two differences in our use case worth mentioning: the registration of the instruments might happen a bit later than in the example (by a few hundred ms), and we're using more than 10 async instruments. Tomorrow I'll try to reproduce this in the context of the example application, fingers crossed.
@lalitb I did some more debugging today, and it seems that the issues we're having happen when we terminate the application and the static objects are being destructed. I was able to capture the failure with Valgrind as well, which gives a bit more context into where and when things were allocated and freed: After figuring out the timing of the failure I was able to come up with a stand-alone reproducer: This fails with SIGABRT, but the stack trace is almost identical.
Thanks @ahadnagy, I can reproduce the problem with your code, should be good to troubleshoot further :)
We're experiencing segfaults when using async instruments with a relatively short (e.g. OTEL_METRICS_EXPORT_INTERVAL_MILLIS=100 OTEL_METRICS_EXPORT_TIMEOUT_MILLIS=50) export period.
For me, this looks like a race condition issue and strongly resembles #1663, but it's still present after bumping opentelemetry-cpp to the last commit.
I was able to catch this with GDB, here's the stack trace:
(gdb)_backtrace_full.txt
If necessary, I'll try to put together a self-contained reproducer, but I'm hoping that it's something simple. :)