Errors not properly reported to Datadog #3516

Open
Meemaw opened this issue Jul 30, 2023 · 5 comments
Labels
component/infra · component/open-telemetry · raised by user

Comments


Meemaw commented Jul 30, 2023

Describe the bug
When using the OTel Datadog exporter, errors are not properly sent to Datadog. In the UI you can see that there is an error, but it's missing the stack trace, description, etc.

This is how it looks in the router:
[screenshot: error in Datadog without stack trace or description]

This is how it looks with a service that properly reports errors:
[screenshot: error in Datadog with full details]

Expected behavior
Errors should be sent to Datadog in the format Datadog expects.


BrynCooke commented Jul 31, 2023

Possibly related to #3226. We may not be creating events at the correct place right now, but we need to investigate.


sbehrends commented Sep 21, 2023

Not an exact match, but we are also facing some OpenTelemetry issues here.

Temporarily, @MatthiasLangGrover resolved it by creating a Rhai script to log the errors:

// Hook into the subgraph service to inspect every subgraph response.
fn subgraph_service(service, subgraph) {
    let response_callback = |response| {
        // Log each GraphQL error returned by this subgraph.
        if !response.body.errors.is_empty() {
            for error in response.body.errors {
                log_error(`Error received from subgraph (${subgraph}): ${error.message}`);
            }
        }
    };
    service.map_response(response_callback);
}
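
For anyone wanting to try the same workaround, a minimal router.yaml sketch for loading the script, assuming it is saved as main.rhai in a ./rhai directory next to the config (the paths are illustrative; adjust them to your setup):

rhai:
  scripts: ./rhai
  main: main.rhai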

@BrynCooke

It looks like there is an issue with events that was introduced in #2999.

In particular, that optimization assumes that the otel layer is not interested in events, which is not the case.

@BrynCooke

After removing the prefiltering, I can see that events are transmitted to Datadog via OTLP:

[screenshot: span events visible in Datadog via OTLP]

But it's not clear how to link that to the Datadog UI.

The Datadog exporter fared even worse, with no events sent at all.

I think we need to do the following:

  1. Reinstate events being sent to the otel layer
  2. Then make a sustained effort to ensure the Datadog integration works well and that we have appropriate instructions for users to set everything up.

@BrynCooke removed their assignment Nov 27, 2023
@abernix added the component/infra and component/open-telemetry labels Nov 30, 2023

Geal commented Jan 12, 2024

@Meemaw since #4102 it is possible to set the dd.trace_id option on span attributes to get the correlation ID: https://www.apollographql.com/docs/router/configuration/telemetry/instrumentation/spans#span-configuration-example
Could you check whether that is enough to get the log/span correlation working now?
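
For reference, a rough sketch of what that configuration might look like, based on the linked docs (keys may vary by router version, so treat this as an assumption rather than a verified config):

telemetry:
  instrumentation:
    spans:
      router:
        attributes:
          dd.trace_id: true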
