SimpleSpanProcessor is delaying traces (and sometimes crashing) #1143

Closed
pksunkara opened this issue Jul 6, 2023 · 13 comments
Labels
A-trace (Area: issues related to tracing), question (Further information is requested)

Comments

@pksunkara

pksunkara commented Jul 6, 2023

Given the following server program, which uses install_simple() to export traces:

use std::net::SocketAddr;

use axum::{
    routing::{get, post},
    Router, Server,
};
use color_eyre::{install, Result};
use opentelemetry::global;
use opentelemetry::sdk::trace;
use opentelemetry_otlp::{new_exporter, new_pipeline, WithExportConfig};
use tracing_subscriber::{prelude::*, registry};

#[tokio::main]
async fn main() -> Result<()> {
    install()?;

    let tracer = new_pipeline()
        .tracing()
        .with_trace_config(trace::config())
        .with_exporter(
            new_exporter().tonic(),
        )
        .install_simple()?;

    registry()
        .with(tracing_opentelemetry::layer().with_tracer(tracer))
        .init();

    let result = run().await;

    global::shutdown_tracer_provider();

    result
}

async fn run() -> Result<()> {
    // build our application with a route
    let app = Router::new()
        // `GET /` goes to `root`
        .route("/", get(root))
        // `POST /users` goes to `create_user`
        .route("/users", post(create_user));

    let addr = SocketAddr::from(([127, 0, 0, 1], 3000));

    println!("listening on {addr}");

    Server::bind(&addr).serve(app.into_make_service()).await?;

    Ok(())
}

#[tracing::instrument]
pub async fn root() -> &'static str {
    hello()
}

#[tracing::instrument]
pub async fn create_user() -> &'static str {
    "User created!"
}

#[tracing::instrument]
fn hello() -> &'static str {
    "Hello, World!"
}

I have the Jaeger all-in-one Docker container running; it receives the traces from the above program and shows me what's happening. These are my observations:

  1. The first time (or during the first minute) I hit an endpoint and generate traces, they are sent immediately and are visible in Jaeger.
  2. After that, for any endpoint I hit, the traces take up to 2 minutes to show up.

I have reproduced this with both tonic and http.

When I use tonic, the program sometimes crashes with SIGKILL, which could be caused by #1048.

When I use tonic, sometimes the traces are dropped. I am not sure what's causing it.

@pksunkara pksunkara changed the title SimpleSpanProcessor is delaying traces (and sometimes crashing) SimpleSpanProcessor is delaying/dropping traces (and sometimes crashing) Jul 6, 2023
@pksunkara pksunkara changed the title SimpleSpanProcessor is delaying/dropping traces (and sometimes crashing) SimpleSpanProcessor is delaying traces (and sometimes crashing) Jul 6, 2023
@shaun-cox
Contributor

Try replacing the call to global::shutdown_tracer_provider(); with this instead:

tokio::task::spawn_blocking(opentelemetry::global::shutdown_tracer_provider)
        .await
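
For context, a sketch of why this helps (the function name below is illustrative, not from this thread): shutdown_tracer_provider() blocks while pending spans are flushed to the exporter, and spawn_blocking moves that work onto Tokio's blocking thread pool so the async worker threads are not stalled.

// Sketch only: wrap the blocking shutdown so it runs off the async runtime.
// Assumes a running Tokio runtime, as in the program above.
async fn shutdown_telemetry() {
    let _ = tokio::task::spawn_blocking(opentelemetry::global::shutdown_tracer_provider).await;
}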

@TommyCpp added the question (Further information is requested) and A-trace (Area: issues related to tracing) labels on Sep 26, 2023
@stormshield-fabs
Contributor

I've updated the dependencies in the provided code and was not able to reproduce this issue (the program did not crash, and no traces were dropped). I'm using Jaeger as the OTLP backend (docker run --rm -it -p4317:4317 -p16686:16686 jaegertracing/all-in-one:latest).

I suggest that we close this issue, especially since it hasn't seen any updates in more than 6 months.

Cargo.toml
[package]
name = "otel-rust-1143"
version = "0.2.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
axum = "0.7.4"
color-eyre = "0.6.2"
opentelemetry = "0.21"
opentelemetry-otlp = "0.14"
opentelemetry_sdk = "0.21"
tokio = { version = "1.36.0", features = ["rt-multi-thread", "net"] }
tracing = "0.1.40"
tracing-opentelemetry = "0.22.0"
tracing-subscriber = { version = "0.3.18", features = ["env-filter"] }
main.rs
use std::net::SocketAddr;

use axum::{
    routing::{get, post},
    Router,
};
use color_eyre::{install, Result};
use opentelemetry::global;
use opentelemetry_otlp::{new_exporter, new_pipeline};
use opentelemetry_sdk::trace;
use tokio::net::TcpListener;
use tracing_subscriber::{prelude::*, registry, EnvFilter};

#[tokio::main]
async fn main() -> Result<()> {
    install()?;

    let tracer = new_pipeline()
        .tracing()
        .with_trace_config(trace::config())
        .with_exporter(new_exporter().tonic())
        .install_simple()?;

    registry()
        .with(
            tracing_opentelemetry::layer()
                .with_tracer(tracer)
                .with_filter(EnvFilter::from_default_env()),
        )
        .init();

    let result = run().await;

    global::shutdown_tracer_provider();

    result
}

async fn run() -> Result<()> {
    let app = Router::new()
        .route("/", get(root))
        .route("/users", post(create_user));

    let addr = SocketAddr::from(([127, 0, 0, 1], 3000));
    let listener = TcpListener::bind(addr).await?;
    println!("listening on {addr}");
    axum::serve(listener, app.into_make_service()).await?;

    Ok(())
}

#[tracing::instrument]
pub async fn root() -> &'static str {
    hello()
}

#[tracing::instrument]
pub async fn create_user() -> &'static str {
    "User created!"
}

#[tracing::instrument]
fn hello() -> &'static str {
    "Hello, World!"
}

@cijothomas
Member

#1409 is related. SimpleSpanProcessor should not cause any delay, as it does not store anything in a queue or buffer; each span is exported as soon as it ends.
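
For contrast, a minimal sketch of the batch pipeline (not part of the reporter's code), which does buffer spans and exports them from a background task; it assumes opentelemetry-otlp 0.14 / opentelemetry_sdk 0.21 with the rt-tokio feature enabled:

// Sketch: install_batch() uses BatchSpanProcessor, which queues finished spans
// and exports them periodically on the Tokio runtime, whereas install_simple()
// exports each span inline as it ends.
let tracer = opentelemetry_otlp::new_pipeline()
    .tracing()
    .with_exporter(opentelemetry_otlp::new_exporter().tonic())
    .install_batch(opentelemetry_sdk::runtime::Tokio)?;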

@cijothomas
Member

@stormshield-fabs I used the same code you shared, but my collector/Jaeger did not show any spans coming from the app. Could you share the full setup you used for the repro? The EnvFilter makes me wonder whether I have filtered out all spans.

@stormshield-fabs
Contributor

Yes, you need RUST_LOG=info because the instrument macro defaults to the INFO level.

-                 .with_filter(EnvFilter::from_default_env()),
+                 .with_filter(EnvFilter::from("info")),
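
A related pattern, sketched here rather than taken from the code above, is to honor RUST_LOG when it is set and fall back to info otherwise:

// Sketch: use the RUST_LOG directives if the variable is set, otherwise
// default to the `info` level so the instrumented handlers are recorded.
use tracing_subscriber::EnvFilter;

let filter = EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info"));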

@pksunkara
Author

.with_filter(EnvFilter::from_default_env())

But in my original code in the first comment, I don't have this filter because I want to send all the traces. Am I thinking about this correctly?

@stormshield-fabs
Contributor

Yes, I just added the filter because h2 emits a ton of TRACE-level spans.

@pksunkara
Author

pksunkara commented Mar 4, 2024

How do I get my app's TRACE spans into the system, then?

@stormshield-fabs
Contributor

I'd suggest taking a look at the EnvFilter directives documentation. When RUST_LOG is not set, the filter defaults to the ERROR level.

You could do either of the following (see the sketch after this list):

  • set a level for your crate (in the example above, otel_rust_1143=trace, mind the dash-to-underscore conversion)
  • set a global level and override it for verbose crates (trace,h2=info)
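
A small sketch of both options as EnvFilter directives (otel_rust_1143 is just the example crate name used earlier in this thread):

use tracing_subscriber::EnvFilter;

// Option 1: record only this crate's spans and events, at TRACE.
let per_crate = EnvFilter::new("otel_rust_1143=trace");

// Option 2: TRACE for everything, but quiet the verbose h2 crate.
let global_with_override = EnvFilter::new("trace,h2=info");

The same directives can be passed through the environment instead, e.g. RUST_LOG=trace,h2=info.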

@pksunkara
Author

pksunkara commented Mar 4, 2024

Okay, thanks. To confirm: is the original issue happening because too many h2 traces are being added to the system?

If yes, feel free to close the issue.

@stormshield-fabs
Contributor

I did not realize it initially, but the problem is indeed related to h2 traces. tonic uses h2 under the hood, so collecting h2 traces means that every export emits more h2 spans, which are exported in turn, indefinitely.

That's certainly the reason for both the delay and the SIGKILLs you mentioned.
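
One way to break that loop, sketched here as an illustration rather than taken from this thread, is to filter the OpenTelemetry layer so spans from the exporter's own HTTP/2 stack are never recorded:

use tracing::level_filters::LevelFilter;
use tracing_subscriber::filter::Targets;

// Sketch: keep the app's spans at TRACE, but drop spans from the crates the
// OTLP/tonic exporter itself relies on, so exporting a span cannot generate
// new spans to export.
let otel_filter = Targets::new()
    .with_default(LevelFilter::TRACE)
    .with_target("h2", LevelFilter::OFF)
    .with_target("hyper", LevelFilter::OFF)
    .with_target("tonic", LevelFilter::OFF);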

@stormshield-fabs
Contributor

Related to #761

@cijothomas
Member

I'll close this issue now, as the live-loop issue is tracked via #761, and we'll be changing SimpleSpanProcessor to match Logs in #1409.

This is a high-priority issue that needs to be resolved; #761 can be used for further discussion.
