
Datadog Connector Memory Issues #29755

Closed
dineshg13 opened this issue Dec 12, 2023 · 8 comments
Labels: bug, connector/datadog, priority:p1 (High)


@dineshg13
Member

Component(s)

connector/datadog

What happened?

Description

Customers using the Datadog connector at scale have reported Collector memory issues. We were able to reproduce the issue using a trace dump. In a Collector running the Datadog connector, memory usage keeps increasing until the process runs out of memory (OOM) within a few minutes of starting.

Steps to Reproduce

Run the Collector with the configuration below and send traces through the pipelines.

Expected Result

Collector shouldn't OOM.

Actual Result

Collector memory and CPU usage spike, which makes the Datadog connector unusable at scale.

Collector version

v0.91.0

Environment information

Environment

Latest GKE cluster.

OpenTelemetry Collector configuration

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "otelcol"
          scrape_interval: 10s
          static_configs:
            - targets: ["0.0.0.0:8888"]
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: ".*grpc_io.*"
              action: drop
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
  groupbyattrs:
    keys:
      - service.name
      - environment
  attributes/env:
    actions:
      - action: upsert
        key: deployment.environment
        value: "${env:DD_SERVICE}"
  attributes/drop:
    include:
      match_type: strict
      resources:
        - key: service.name
        - key: environment
    exclude:
      match_type: regexp
      resources:
        key: ".*"
    actions:
      - action: insert
        key: deployment.environment
        from_attribute: environment
  probabilistic_sampler:
    hash_seed: 22
    sampling_percentage: 1
  resourcedetection:
    detectors: [env, gcp]
    timeout: 2s
    override: false
extensions:
  health_check:
connectors:
  datadog/connector:
    trace_buffer: 500
exporters:
  datadog:
    sending_queue:
      queue_size: 10000
    traces:
      trace_buffer: 500
    metrics:
      resource_attributes_as_tags: true
      histograms:
        mode: "counters"
        send_count_sum_metrics: true
    api:
      key: "${env:DD_API_KEY}"
service:
  extensions:
    - health_check
  telemetry:
    logs:
      initial_fields:
        - service: "otel-collector"
  pipelines:
    metrics:
      receivers: [otlp, datadog/connector, prometheus]
      processors: [resourcedetection, attributes/env, batch]
      exporters: [datadog]
    traces/1:
      receivers: [otlp]
      processors: [attributes/env, groupbyattrs, resourcedetection]
      exporters: [datadog/connector]
    traces/2:
      receivers: [otlp]
      processors: [probabilistic_sampler, attributes/env, resourcedetection, batch]
      exporters: [datadog]
```
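In this configuration the connector sits between two pipelines: `traces/1` exports spans to `datadog/connector`, and the `metrics` pipeline receives the trace metrics the connector computes. The sketch below keeps only that hand-off, which might help check whether the memory growth follows the connector rather than the sampler or the `datadog` exporter. It is an illustrative reduction, not the reporter's configuration: it assumes the `debug` exporter (bundled with the contrib distribution) as a stand-in sink so no API key is needed, and leaves the connector at its defaults.

```yaml
# Hypothetical minimal repro: only the traces -> connector -> metrics path.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
connectors:
  # Same component ID as in the report, left at default settings.
  datadog/connector:
exporters:
  # Assumption: debug stands in for the datadog exporter, so nothing leaves the Collector.
  debug:
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [datadog/connector]   # the connector consumes spans here...
    metrics:
      receivers: [datadog/connector]   # ...and emits the computed metrics here
      exporters: [debug]
```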

Log output

No response

Additional context

No response

dineshg13 added the bug and needs triage labels on Dec 12, 2023
mx-psi added the priority:p1 and connector/datadog labels and removed the needs triage label on Dec 12, 2023
dineshg13 changed the title from "Datadog Connector" to "Datadog Connector Memory Issues" on Dec 12, 2023
mx-psi pushed a commit that referenced this issue Jan 4, 2024
…30085)

**Description:**
Adds an optional feature gate, `connector.datadogconnector.performance`, that reduces the memory footprint of the Datadog connector.

**Link to tracking Issue:**
#29755

**Testing:**
- Tested internally using client data. 

cparkins pushed a commit to AmadeusITGroup/opentelemetry-collector-contrib that referenced this issue Jan 10, 2024
@dineshg13
Member Author

This is resolved by the `connector.datadogconnector.performance` feature gate. See the Datadog connector README.

@grzn
Contributor

grzn commented Jan 30, 2024

Hi,

We're still seeing memory issues, even with the feature gate enabled.

@mariohdoz

Hi @grzn,

Can you please give me an example of how you enable the feature gate? I looked for one but couldn't find anything.

@arielvalentin
Contributor

> Hi,
>
> We're still seeing memory issues, even with the feature gate enabled.

Same for us. We've reported our issue directly to Datadog.

@grzn
Contributor

grzn commented Jan 30, 2024

It's a command-line parameter to the binary.

We were on v0.7x and everything was fine. Now we're trying v0.92 and it's leaking. We're going to try v0.82, the last version before the processor refactor.
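For anyone else looking for a concrete example: Collector feature gates are enabled with the `--feature-gates` command-line flag, which takes a comma-separated list of gate IDs. A sketch of how that might look in a Kubernetes container spec (the reporter runs on GKE) follows. The container name, image tag, and config path are illustrative assumptions, not taken from this thread.

```yaml
# Hypothetical excerpt from a Deployment's pod template; name, image tag,
# and config path are illustrative assumptions.
containers:
  - name: otel-collector
    image: otel/opentelemetry-collector-contrib:0.92.0
    args:
      # Setting args replaces the image's default arguments,
      # so the config path has to be passed explicitly as well.
      - --config=/etc/otelcol-contrib/config.yaml
      # Opt-in gate intended to reduce the Datadog connector's memory footprint.
      - --feature-gates=connector.datadogconnector.performance
```

Outside Kubernetes, the same two flags go directly on the `otelcol-contrib` command line.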

@arielvalentin
Contributor

@grzn We had no success with the deprecated processor because it does not support computing stats by peer service and span kind.

Once we enabled it, we lost the ability to see metrics for inferred services.

@grzn
Contributor

grzn commented Jan 30, 2024

I'll push this through our Datadog channels as well.

@sirianni
Contributor

Cross-referencing #30828.

6 participants