Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prevent panics when used with per-layer filters #86

Merged
merged 2 commits into from
Jan 23, 2024

Conversation

guswynn
Copy link
Contributor

@guswynn guswynn commented Jan 9, 2024

Motivation

Currently, if the tracing-opentelemetry Layer has a per-layer filter (in these examples, ones that allow only >=DEBUG) the following two cases behave differently:

let root = tracing::trace_span!("root");
tracing::debug_span!(parent: &root, "child");
let root = tracing::trace_span!("root");
oot.in_scope(|| tracing::debug_span!("child"));

The former panics, and the latter successfully creates a single-span trace. Note that this ONLY happens if there is another layer interested in the TRACE level (otherwise, the tracing current-level-filter fast-path filters the root spans out at their callsites).

This may seem rare, but it becomes more common when the per-layer filter is a reload filter, which means the parent and child spans can be differently filtered if the filter is reloaded between their creation (example: https://github.com/MaterializeInc/database-issues/issues/4358)

Solution

Handle this case gracefully. I also did the same in on_follows_from

@guswynn guswynn requested a review from jtescher as a code owner January 9, 2024 18:55
@guswynn guswynn changed the title prevent pancis with per-layer filters prevent panics with per-layer filters Jan 9, 2024
Comment on lines +970 to +972
let follows_data = follows_extensions
.get_mut::<OtelData>()
.expect("Missing otel data span extensions");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its unclear to me if this also needs to be if let Some...my understanding is that if Context::span returned span data for the follows span, then this lay did in fact see the follows span and added this data in on_new_span. This would mean that the parent_context function could also .expect getting the OtelData from the parent span, but Im not sure...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not aware of cases where the contextual spans would not have their data in the registry so I think you are correct here that it's ok.

@guswynn guswynn changed the title prevent panics with per-layer filters prevent panics when used with per-layer filters Jan 9, 2024
Copy link
Member

@davidbarsky davidbarsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change makes sense, especially since the only behavioral change is "don't panic".

Copy link
Collaborator

@jtescher jtescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @guswynn, sorry about the delay

Comment on lines +970 to +972
let follows_data = follows_extensions
.get_mut::<OtelData>()
.expect("Missing otel data span extensions");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not aware of cases where the contextual spans would not have their data in the registry so I think you are correct here that it's ok.

@jtescher jtescher merged commit bd90c86 into tokio-rs:v0.1.x Jan 23, 2024
13 checks passed
@guswynn
Copy link
Contributor Author

guswynn commented Jan 23, 2024

@jtescher thanks for the review!

fnichol added a commit to systeminit/si that referenced this pull request Jan 26, 2024
This change incorporates an upstream fix in `tracing-opentelemetry` to
prevent parent and "follows-from" span association from panicking under
certain scenarios.

Additionally, the strategy to determine the appropriate span for
PostgreSQL and NATS transactions/subscriptions has been updated
slightly. The macros present prior to this change attempted to prevent
another "Mutex poisoned" issue encountered a year ago. The updated
`tx_span()` and `sub_span()` functions check to see if `Span::current()`
is disabled, in which case a `Span::none()` is returned, otherwise the
current Span is returned. The former macros were slightly less flexible
in that they checked `target` and `level` span metadata--which happened
to be correct in these two cases.

References: tokio-rs/tracing-opentelemetry#14
References: tokio-rs/tracing-opentelemetry#86
References: tokio-rs/tracing#2399
References: tokio-rs/tracing#1766

Signed-off-by: Fletcher Nichol <fletcher@systeminit.com>
fnichol added a commit to systeminit/si that referenced this pull request Jan 26, 2024
This change incorporates an upstream fix in `tracing-opentelemetry` to
prevent parent and "follows-from" span association from panicking under
certain scenarios.

Additionally, the strategy to determine the appropriate span for
PostgreSQL and NATS transactions/subscriptions has been updated
slightly. The macros present prior to this change attempted to prevent
another "Mutex poisoned" issue encountered a year ago. The updated
`tx_span()` and `sub_span()` functions check to see if `Span::current()`
is disabled, in which case a `Span::none()` is returned, otherwise the
current Span is returned. The former macros were slightly less flexible
in that they checked `target` and `level` span metadata--which happened
to be correct in these two cases.

References: tokio-rs/tracing-opentelemetry#14
References: tokio-rs/tracing-opentelemetry#86
References: tokio-rs/tracing#2399
References: tokio-rs/tracing#1766

Signed-off-by: Fletcher Nichol <fletcher@systeminit.com>
si-bors-ng bot added a commit to systeminit/si that referenced this pull request Jan 26, 2024
3222: chore(si-data-pg,si-data-nats): compute appropriate Span for txns/subs r=fnichol a=fnichol

This change incorporates an upstream fix in `tracing-opentelemetry` to prevent parent and "follows-from" span association from panicking under certain scenarios.

Additionally, the strategy to determine the appropriate span for PostgreSQL and NATS transactions/subscriptions has been updated slightly. The macros present prior to this change attempted to prevent another "Mutex poisoned" issue encountered a year ago. The updated `tx_span()` and `sub_span()` functions check to see if `Span::current()` is disabled, in which case a `Span::none()` is returned, otherwise the current Span is returned. The former macros were slightly less flexible in that they checked `target` and `level` span metadata--which happened to be correct in these two cases.

References: tokio-rs/tracing-opentelemetry#14
References: tokio-rs/tracing-opentelemetry#86
References: tokio-rs/tracing#2399
References: tokio-rs/tracing#1766

Co-authored-by: Fletcher Nichol <fletcher@systeminit.com>
zacharyhamm added a commit to systeminit/si that referenced this pull request Mar 19, 2024
tokio-rs/tracing-opentelemetry#86 fixes some
Mutex poisoining issues we're seeing (only in our test harness).
Upgrading the various opentelemetry crates so we no longer have to deal
with this panic.
si-bors-ng bot added a commit to systeminit/si that referenced this pull request Mar 19, 2024
3459: fix(*): upgrade opentelemetry crates r=zacharyhamm a=zacharyhamm

tokio-rs/tracing-opentelemetry#86 fixes some Mutex poisoining issues we're seeing (only in our test harness). Upgrading the various opentelemetry crates so we no longer have to deal with this panic.

Co-authored-by: Zachary Hamm <zack@systeminit.com>
SuperFluffy added a commit to astriaorg/astria that referenced this pull request Mar 20, 2024
…820)

## Summary
Update tracing-opentelemetry to 0.23.0 to resolve panics

## Background
We are seeing panics with backtraces pointing to either code provides
json formatted fields, or to tracing-opentelemetry layers. This patch
upgrades `astria-telemetry` to use tracing-opentelemetry 0.23 (and bumps
related crates) to try and resolve that.


## Changes
Bump the following crates
- tracing-opentelemetry from 0.22 to 0.23
- opentelemetry from 0.21 to 0.22
- opentelemetry-otlp from 0.13 to 0.15
- opentelemetry-semantic-conventions from 0.13 to 0.14
- openelemetry-stdout from 0.2 to 0.3
- openelemetry_sdk from 0.21.1 to 0.22.1

## Related Issues
tokio-rs/tracing-opentelemetry#86
johnrwatson pushed a commit to systeminit/si that referenced this pull request Mar 26, 2024
tokio-rs/tracing-opentelemetry#86 fixes some
Mutex poisoining issues we're seeing (only in our test harness).
Upgrading the various opentelemetry crates so we no longer have to deal
with this panic.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants