-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prevent panics when used with per-layer filters #86
Conversation
821b978
to
f9eab2f
Compare
let follows_data = follows_extensions | ||
.get_mut::<OtelData>() | ||
.expect("Missing otel data span extensions"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its unclear to me if this also needs to be if let Some
...my understanding is that if Context::span
returned span data for the follows span, then this lay did in fact see the follows span and added this data in on_new_span
. This would mean that the parent_context
function could also .expect
getting the OtelData
from the parent span, but Im not sure...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not aware of cases where the contextual spans would not have their data in the registry so I think you are correct here that it's ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change makes sense, especially since the only behavioral change is "don't panic".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks @guswynn, sorry about the delay
let follows_data = follows_extensions | ||
.get_mut::<OtelData>() | ||
.expect("Missing otel data span extensions"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not aware of cases where the contextual spans would not have their data in the registry so I think you are correct here that it's ok.
@jtescher thanks for the review! |
This change incorporates an upstream fix in `tracing-opentelemetry` to prevent parent and "follows-from" span association from panicking under certain scenarios. Additionally, the strategy to determine the appropriate span for PostgreSQL and NATS transactions/subscriptions has been updated slightly. The macros present prior to this change attempted to prevent another "Mutex poisoned" issue encountered a year ago. The updated `tx_span()` and `sub_span()` functions check to see if `Span::current()` is disabled, in which case a `Span::none()` is returned, otherwise the current Span is returned. The former macros were slightly less flexible in that they checked `target` and `level` span metadata--which happened to be correct in these two cases. References: tokio-rs/tracing-opentelemetry#14 References: tokio-rs/tracing-opentelemetry#86 References: tokio-rs/tracing#2399 References: tokio-rs/tracing#1766 Signed-off-by: Fletcher Nichol <fletcher@systeminit.com>
This change incorporates an upstream fix in `tracing-opentelemetry` to prevent parent and "follows-from" span association from panicking under certain scenarios. Additionally, the strategy to determine the appropriate span for PostgreSQL and NATS transactions/subscriptions has been updated slightly. The macros present prior to this change attempted to prevent another "Mutex poisoned" issue encountered a year ago. The updated `tx_span()` and `sub_span()` functions check to see if `Span::current()` is disabled, in which case a `Span::none()` is returned, otherwise the current Span is returned. The former macros were slightly less flexible in that they checked `target` and `level` span metadata--which happened to be correct in these two cases. References: tokio-rs/tracing-opentelemetry#14 References: tokio-rs/tracing-opentelemetry#86 References: tokio-rs/tracing#2399 References: tokio-rs/tracing#1766 Signed-off-by: Fletcher Nichol <fletcher@systeminit.com>
3222: chore(si-data-pg,si-data-nats): compute appropriate Span for txns/subs r=fnichol a=fnichol This change incorporates an upstream fix in `tracing-opentelemetry` to prevent parent and "follows-from" span association from panicking under certain scenarios. Additionally, the strategy to determine the appropriate span for PostgreSQL and NATS transactions/subscriptions has been updated slightly. The macros present prior to this change attempted to prevent another "Mutex poisoned" issue encountered a year ago. The updated `tx_span()` and `sub_span()` functions check to see if `Span::current()` is disabled, in which case a `Span::none()` is returned, otherwise the current Span is returned. The former macros were slightly less flexible in that they checked `target` and `level` span metadata--which happened to be correct in these two cases. References: tokio-rs/tracing-opentelemetry#14 References: tokio-rs/tracing-opentelemetry#86 References: tokio-rs/tracing#2399 References: tokio-rs/tracing#1766 Co-authored-by: Fletcher Nichol <fletcher@systeminit.com>
tokio-rs/tracing-opentelemetry#86 fixes some Mutex poisoining issues we're seeing (only in our test harness). Upgrading the various opentelemetry crates so we no longer have to deal with this panic.
3459: fix(*): upgrade opentelemetry crates r=zacharyhamm a=zacharyhamm tokio-rs/tracing-opentelemetry#86 fixes some Mutex poisoining issues we're seeing (only in our test harness). Upgrading the various opentelemetry crates so we no longer have to deal with this panic. Co-authored-by: Zachary Hamm <zack@systeminit.com>
…820) ## Summary Update tracing-opentelemetry to 0.23.0 to resolve panics ## Background We are seeing panics with backtraces pointing to either code provides json formatted fields, or to tracing-opentelemetry layers. This patch upgrades `astria-telemetry` to use tracing-opentelemetry 0.23 (and bumps related crates) to try and resolve that. ## Changes Bump the following crates - tracing-opentelemetry from 0.22 to 0.23 - opentelemetry from 0.21 to 0.22 - opentelemetry-otlp from 0.13 to 0.15 - opentelemetry-semantic-conventions from 0.13 to 0.14 - openelemetry-stdout from 0.2 to 0.3 - openelemetry_sdk from 0.21.1 to 0.22.1 ## Related Issues tokio-rs/tracing-opentelemetry#86
tokio-rs/tracing-opentelemetry#86 fixes some Mutex poisoining issues we're seeing (only in our test harness). Upgrading the various opentelemetry crates so we no longer have to deal with this panic.
Motivation
Currently, if the
tracing-opentelemetry
Layer
has a per-layer filter (in these examples, ones that allow only>=DEBUG
) the following two cases behave differently:The former panics, and the latter successfully creates a single-span trace. Note that this ONLY happens if there is another layer interested in the
TRACE
level (otherwise, the tracing current-level-filter fast-path filters the root spans out at their callsites).This may seem rare, but it becomes more common when the per-layer filter is a reload filter, which means the parent and child spans can be differently filtered if the filter is reloaded between their creation (example: https://github.com/MaterializeInc/database-issues/issues/4358)
Solution
Handle this case gracefully. I also did the same in
on_follows_from