Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad interaction between tracing-error and tracing #1565

Open
asonix opened this issue Sep 14, 2021 · 6 comments
Open

Bad interaction between tracing-error and tracing #1565

asonix opened this issue Sep 14, 2021 · 6 comments
Labels
kind/bug Something isn't working

Comments

@asonix
Copy link

asonix commented Sep 14, 2021

Bug Report

Version

├── tracing v0.1.27
│   ├── tracing-attributes v0.1.16 (proc-macro)
│   └── tracing-core v0.1.20
├── tracing-error v0.1.2
│   ├── tracing v0.1.27 (*)
│   └── tracing-subscriber v0.2.22
│       ├── tracing v0.1.27 (*)
│       ├── tracing-core v0.1.20 (*)
│       ├── tracing-log v0.1.2
│       │   └── tracing-core v0.1.20 (*)
│       └── tracing-serde v0.1.2
│           └── tracing-core v0.1.20 (*)
└── tracing-subscriber v0.2.22 (*)

Platform

Linux firestar 5.11.0-34-generic #36~20.04.1-Ubuntu SMP Fri Aug 27 08:06:32 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Description

When passing a type containing tracing-error's SpanTrace to a span's record method, A rwlock is attemped to lock while it's already held, which would result in a deadlock if it didn't panic instead.

Repro:

[dependencies]
tracing = "0.1"
tracing-error = "0.1"
tracing-subscriber = { version = "0.2", features = ["fmt"] }
use tracing_error::{ErrorLayer, SpanTrace};
use tracing_subscriber::{layer::SubscriberExt, EnvFilter, Registry};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let env_filter = EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info"));
    let subscriber = Registry::default()
        .with(env_filter)
        .with(ErrorLayer::default())
        .with(tracing_subscriber::fmt::layer());

    tracing::subscriber::set_global_default(subscriber)?;

    let span = tracing::info_span!("Blows up!", exception = tracing::field::Empty);
    let entered = span.enter();

    let context = SpanTrace::capture();

    // this line goes boom
    span.record("exception", &tracing::field::debug(&context));

    drop(entered);
    Ok(())
}

Originally reported here: LukeMathWalker/tracing-actix-web#36

asonix added a commit to asonix/tracing-actix-web that referenced this issue Sep 18, 2021
@hawkw hawkw added the kind/bug Something isn't working label Sep 18, 2021
LukeMathWalker pushed a commit to LukeMathWalker/tracing-actix-web that referenced this issue Sep 21, 2021
* Allow root_span to wrap the full middlware chain, not just the output future

* Build error string outside of span.record

This is a workaround for tokio-rs/tracing#1565

* Reference issue being worked-around as comment
bobbymcr pushed a commit to bobbymcr/tracing-actix-web that referenced this issue Oct 7, 2021
* Allow root_span to wrap the full middlware chain, not just the output future

* Build error string outside of span.record

This is a workaround for tokio-rs/tracing#1565

* Reference issue being worked-around as comment
@asonix
Copy link
Author

asonix commented Dec 14, 2022

On the current version of the tracing libraries this no longer panics, and instead deadlocks

├── tracing v0.1.37
│   ├── tracing-attributes v0.1.23 (proc-macro)
│   └── tracing-core v0.1.30
├── tracing-error v0.2.0
│   ├── tracing v0.1.37 (*)
│   └── tracing-subscriber v0.3.16
│       ├── tracing v0.1.37 (*)
│       ├── tracing-core v0.1.30 (*)
│       └── tracing-log v0.1.3
│           └── tracing-core v0.1.30 (*)
└── tracing-subscriber v0.3.16 (*)

@thehabbos007
Copy link

thehabbos007 commented Jun 2, 2023

Just chiming in to say that we're also experiencing this behavior when using tracing and OTLP. Deadlocks happen when tracing the captured span within a span
https://github.com/hasharchives/otlp-tracing-problem-repro/blob/main/src/main.rs#L60
This repro was taken from our Axum webserver which would fully stall and never serve requests after the deadlock was initiated.

@vilkinsons
Copy link

Thanks, @thehabbos007. cc @CiaranMn @TimDiekmann

@AsmPrgmC3
Copy link

This is caused by the fact that the fields are stored/cached in formatted form inside a Span extension (FormattedFields). While new fields are recorded on a span, the lock for the extensions is held:

let mut extensions = span.extensions_mut();
if let Some(fields) = extensions.get_mut::<FormattedFields<N>>() {
let _ = self.fmt_fields.add_fields(fields, values);
return;

Later, when the SpanTrace is formatted, it also tries to lock the Span's extensions to read these formatted fields:

let cont = if let Some(fields) = span.extensions().get::<FormattedFields<F>>() {
f(span.metadata(), fields.fields.as_str())
} else {
f(span.metadata(), "")
};

I can see 5 possible solutions:

  • Don't change library code, the user needs to be careful to avoid the deadlock. One possible workaround is to format the SpanTrace before recording it on the Span:
let trace = format!("{trace:?}");
span.record("trace", trace);
  • Extend the tracing-error API to allow for a try_lock of the extensions and use that when formatting a SpanTrace. This would mean that no fields of the current Span appear in the span trace.
  • Temporarily remove the FormattedFields extension from the Span while adding new fields to it. This allows the release of the lock while the fileds are formatted. This would mean that no fileds of the current Span appear in the span trace iff the ErrorLayer uses the same FormatFields implementation as the fmt::Layer (by default both use DefaultFields).
  • Clone the FormattedFields and add the fields to the clone. With this solution all old fields do appear in the printed span trace, at the cost of an additional String clone every time Span::record is called.
  • Record the new fields onto a new FormattedFields and combine them afterwards. This is essentially the same as the last one (but might be slightly more or less efficient).

@thehabbos007 While very similar, that issue is slightly different, as the extensions are locked for a different reason. The first two solutions however are the same (see tokio-rs/tracing-opentelemetry#59).

@realsama
Copy link

realsama commented Nov 8, 2023

I've had this issue also. looks like it has been fixed by "tracing-opentelemetry v0.22.0". Thanks!.

@mladedav
Copy link
Contributor

Some thoughts on the possible solutions:

Extend the tracing-error API to allow for a try_lock of the extensions and use that when formatting a SpanTrace. This would mean that no fields of the current Span appear in the span trace.

This would work, although someone might still try use the extensions in their own implementation. I think this is a reasonable risk to take.

Temporarily remove the FormattedFields extension from the Span while adding new fields to it. This allows the release of the lock while the fileds are formatted. This would mean that no fileds of the current Span appear in the span trace iff the ErrorLayer uses the same FormatFields implementation as the fmt::Layer (by default both use DefaultFields).

This has a problem when someone uses the same span from multiple threads at the same time. When there are no extensions, the other thread may conclude it has to create new empty ones and then it will effectively overwrite all fields that have been collected up to this point. Or it could detect this when it tried to insert the fields and try again or merge the two fields while it holds the lock since now it doesn't need to call the visitors again.

Or we can add a fallible try_insert to ExtensionsMut and any write that did not observe the original fields will effectively drop their update, but at least it won't overwrite the current state which would be most likely richer. So we run the risk of losing data, but probably less.

Clone the FormattedFields and add the fields to the clone. With this solution all old fields do appear in the printed span trace, at the cost of an additional String clone every time Span::record is called.

This runs to the same problem with parallel writes as the last option, but now it's harder to notice that the content has changed or merge the two versions.

Record the new fields onto a new FormattedFields and combine them afterwards. This is essentially the same as the last one (but might be slightly more or less efficient).

Here multi-threaded spans work, but we need a way to combine the two FormattedFields which is a subtly breaking change. However, this will maybe be needed as a fallback for the other methods anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

7 participants