
Issue with prometheus exporter while using delta temporality. [Throwing error - panic: invalid access to shared data] #29574

Closed
bhupeshpadiyar opened this issue Nov 30, 2023 · 6 comments · Fixed by #29608
Labels: bug (Something isn't working), exporter/prometheus

Comments

@bhupeshpadiyar

Component(s)

exporter/prometheus

What happened?

Description

I am getting the following error in the otel-collector while sending metrics to the VictoriaMetrics database with delta temporality.

panic: invalid access to shared data

Steps to Reproduce

I have configured delta temporality in the SDK configuration as follows:

SdkMeterProvider meterProvider = SdkMeterProvider.builder()
        .registerMetricReader(PeriodicMetricReader.builder(OtlpGrpcMetricExporter.builder()
                .setAggregationTemporalitySelector(AggregationTemporalitySelector.deltaPreferred())
                .setEndpoint(OTEL_EXPORTER_OTLP_ENDPOINT).build()).setInterval(Duration.ofMillis(1000)).build())
        .build();

I am using the Prometheus exporter in the otel-collector, and VictoriaMetrics is my datasource. I use vmagent to scrape the metrics and push them to the VictoriaMetrics database.
Here is my collector exporter config:

exporters:
  debug:
    verbosity: detailed

  logging:
    loglevel: debug

  prometheus:
    endpoint: "0.0.0.0:4319"
    send_timestamps: true
    metric_expiration: "120m"

Expected Result

The Prometheus exporter should process the metrics without errors.

Actual Result

Getting the following error stack trace:

metrics-poc-otel-collector-1  |         {"kind": "exporter", "data_type": "metrics", "name": "logging"}
metrics-poc-otel-collector-1  | panic: invalid access to shared data
metrics-poc-otel-collector-1  |
metrics-poc-otel-collector-1  | goroutine 110 [running]:
metrics-poc-otel-collector-1  | go.opentelemetry.io/collector/pdata/internal.(*State).AssertMutable(...)
metrics-poc-otel-collector-1  |         go.opentelemetry.io/collector/pdata@v1.0.0-rcv0018/internal/state.go:20
metrics-poc-otel-collector-1  | go.opentelemetry.io/collector/pdata/pmetric.NumberDataPoint.SetStartTimestamp(...)
metrics-poc-otel-collector-1  |         go.opentelemetry.io/collector/pdata@v1.0.0-rcv0018/pmetric/generated_numberdatapoint.go:61
metrics-poc-otel-collector-1  | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*lastValueAccumulator).accumulateSum(0xc002b46780, {0xc002d701c0?, 0xc0021fe8f0?}, {0xc002b66460?, 0xc0021fe8f0?}, {0xc002d5c138?, 0xc0021fe8f0?}, {0xc15187eb06ebdf7a, 0x2436bf76f, 0xdb89900})
metrics-poc-otel-collector-1  |         github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.89.0/accumulator.go:205 +0xe65
metrics-poc-otel-collector-1  | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*lastValueAccumulator).addMetric(0xc002b46780, {0xc002d701c0?, 0xc0021fe8f0?}, {0xc002b66460?, 0xc0021fe8f0?}, {0xc002d5c138?, 0xc0021fe8f0?}, {0xc15187eb06ebdf7a, 0x2436bf76f, 0xdb89900})
metrics-poc-otel-collector-1  |         github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.89.0/accumulator.go:85 +0x1fc
metrics-poc-otel-collector-1  | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*lastValueAccumulator).Accumulate(0x36d6ac567?, {0xc002d5c120?, 0xc0021fe8f0?})
metrics-poc-otel-collector-1  |         github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.89.0/accumulator.go:71 +0x10f
metrics-poc-otel-collector-1  | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*collector).processMetrics(...)
metrics-poc-otel-collector-1  |         github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.89.0/collector.go:92
metrics-poc-otel-collector-1  | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*prometheusExporter).ConsumeMetrics(0xc002ad3400, {0xc0005b76e0?, 0xc002869a90?}, {0xc002202a68?, 0xc0021fe8f0?})
metrics-poc-otel-collector-1  |         github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.89.0/prometheus.go:85 +0x5d
metrics-poc-otel-collector-1  | go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsRequest).Export(0x8fb5890?, {0x8fb5858?, 0xc002ed8de0?})
metrics-poc-otel-collector-1  |         go.opentelemetry.io/collector/exporter@v0.89.0/exporterhelper/metrics.go:58 +0x37
metrics-poc-otel-collector-1  | go.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send(0xc0021d75a0?, {0x8fb5890?, 0xc002ed8db0?}, {0x8f83720?, 0xc002202c60?})
metrics-poc-otel-collector-1  |         go.opentelemetry.io/collector/exporter@v0.89.0/exporterhelper/timeout_sender.go:38 +0x90
metrics-poc-otel-collector-1  | go.opentelemetry.io/collector/exporter/exporterhelper.(*baseRequestSender).send(0xc002ad3500?, {0x8fb5890?, 0xc002ed8db0?}, {0x8f83720?, 0xc002202c60?})
metrics-poc-otel-collector-1  |         go.opentelemetry.io/collector/exporter@v0.89.0/exporterhelper/common.go:33 +0x36
metrics-poc-otel-collector-1  | go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send(0xc002b468d0, {0x8fb5890?, 0xc002b46b10?}, {0x8f83720?, 0xc002202c60?})
metrics-poc-otel-collector-1  |         go.opentelemetry.io/collector/exporter@v0.89.0/exporterhelper/metrics.go:173 +0x84
metrics-poc-otel-collector-1  | go.opentelemetry.io/collector/exporter/exporterhelper.(*baseRequestSender).send(0xc002b466f0?, {0x8fb5890?, 0xc002b46b10?}, {0x8f83720?, 0xc002202c60?})
metrics-poc-otel-collector-1  |         go.opentelemetry.io/collector/exporter@v0.89.0/exporterhelper/common.go:33 +0x36
metrics-poc-otel-collector-1  | go.opentelemetry.io/collector/exporter/exporterhelper.(*baseExporter).send(...)
metrics-poc-otel-collector-1  |         go.opentelemetry.io/collector/exporter@v0.89.0/exporterhelper/common.go:189
metrics-poc-otel-collector-1  | go.opentelemetry.io/collector/exporter/exporterhelper.NewMetricsExporter.func1({0x8fb5890, 0xc002b46b10}, {0xc002202a68?, 0xc0021fe8f0?})
metrics-poc-otel-collector-1  |         go.opentelemetry.io/collector/exporter@v0.89.0/exporterhelper/metrics.go:98 +0xed
metrics-poc-otel-collector-1  | go.opentelemetry.io/collector/consumer.ConsumeMetricsFunc.ConsumeMetrics(...)
metrics-poc-otel-collector-1  |         go.opentelemetry.io/collector/consumer@v0.89.0/metrics.go:25
metrics-poc-otel-collector-1  | go.opentelemetry.io/collector/internal/fanoutconsumer.(*metricsConsumer).ConsumeMetrics(0xc002869f80?, {0x8fb5890, 0xc002b46b10}, {0xc002202a68?, 0xc0021fe8f0?})

Collector version

0.89.0

Environment information

Environment

OS: (e.g., "MacOS 13.5 (22G74)")

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  debug:
    verbosity: detailed

  logging:
    loglevel: debug

  prometheus:
    endpoint: "0.0.0.0:4319"
    send_timestamps: true
    metric_expiration: "120m"

processors:
  batch:

extensions:
  health_check:

service:
  extensions: [health_check]
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging, prometheus, debug]

Log output

(Same panic and stack trace as shown under Actual Result above.)

Additional context

No response

@bhupeshpadiyar added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Nov 30, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@crobert-1
Member

Hello @bhupeshpadiyar, thanks for filing this issue and including all of the information!

Context

When metrics are ingested into collector components (the Prometheus exporter in this case), they're created in the pdata format with their State set depending on the value of the MutatesData capability. Since the prometheus exporter doesn't explicitly set MutatesData: true, the metrics are created as read-only. The logic of converting the delta sum to cumulative modifies the datapoints' timestamp, which then hits the panic you're seeing.

Side note: Metrics are essentially copied by reference to each exporter consumer, so the same underlying data may be referenced by multiple components or consumers at once. This is why we want to ensure MutatesData: true is set whenever a component modifies data; otherwise we'd run into all sorts of race and parallelism issues, with data being modified by different consumers in the background. This is what I meant by metrics being "created" when being passed to the Prometheus exporter.
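
To see the mechanism in isolation, here is a minimal sketch (my own illustration, not collector code) that reproduces the same panic with the pdata API, assuming MarkReadOnly is what the fanout consumer applies for exporters that don't declare MutatesData: true:

package main

import (
	"time"

	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/pmetric"
)

func main() {
	md := pmetric.NewMetrics()
	dp := md.ResourceMetrics().AppendEmpty().
		ScopeMetrics().AppendEmpty().
		Metrics().AppendEmpty().
		SetEmptySum().DataPoints().AppendEmpty()
	dp.SetIntValue(1)

	// Simulate what the fanout consumer does before handing the batch to a
	// consumer that does not declare MutatesData: true.
	md.MarkReadOnly()

	// Any setter on the shared data now trips State.AssertMutable and
	// panics with "invalid access to shared data".
	dp.SetStartTimestamp(pcommon.NewTimestampFromTime(time.Now()))
}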

Solution

I believe we need to explicitly set the MutatesData capability to true for this exporter; the fix should be as simple as adding that one capability option when creating the exporter.
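
For illustration, here is a hedged sketch of what declaring that capability could look like when building the exporter with exporterhelper; the factory and constructor names below (createMetricsExporter, newPrometheusExporter) are placeholders rather than the exact code changed in the PR:

package prometheusexporter

import (
	"context"

	"go.opentelemetry.io/collector/component"
	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/exporter"
	"go.opentelemetry.io/collector/exporter/exporterhelper"
)

// createMetricsExporter is a sketch of a factory function; the real factory
// wires up additional options (start/shutdown hooks, etc.).
func createMetricsExporter(
	ctx context.Context,
	set exporter.CreateSettings,
	cfg component.Config,
) (exporter.Metrics, error) {
	prometheusExp, err := newPrometheusExporter(cfg.(*Config), set) // assumed constructor
	if err != nil {
		return nil, err
	}
	return exporterhelper.NewMetricsExporter(
		ctx, set, cfg,
		prometheusExp.ConsumeMetrics,
		// Declare that this exporter mutates incoming pdata, so the fanout
		// consumer gives it mutable data instead of a shared, read-only view.
		exporterhelper.WithCapabilities(consumer.Capabilities{MutatesData: true}),
	)
}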

Related issues for background

open-telemetry/opentelemetry-collector#6794
open-telemetry/opentelemetry-collector#8634 - This may be why it's just now being hit and wasn't earlier, but I haven't been able to 100% confirm.
#29111 - Same panic and solution, just a different component

@crobert-1
Member

crobert-1 commented Dec 1, 2023

Another note, I learned in testing that this will only be hit if there are multiple exporters in your data pipeline. Metrics are set to Mutable automatically if there's only one exporter. The capability is only checked and impacts behavior with multiple exporters in your pipeline.

@bhupeshpadiyar
Author

Hi @crobert-1, Thanks for the updates.

Is it possible to fix this for multiple exporters, since we need both logging/debug and prometheus at the same time?

Thanks

@crobert-1
Member

Hello @bhupeshpadiyar, the fix that I've submitted in PR #29608 will solve this issue for multiple exporters in the same pipeline.

(My previous comment was pointing out that the bug was only hit when multiple exporters were in the data pipeline, so the bug and fix are only relevant for that situation. Sorry if it caused some confusion there!)

@bhupeshpadiyar
Author

Understood. Thanks @crobert-1

codeboten pushed a commit that referenced this issue Dec 8, 2023
The prometheus exporter hit a panic when accumulating `Delta` metrics
into `Cumulative` sums. This is because the exporter does not enable
mutating data in its capability. This change enables the exporter to
mutate data in a safe and supported way.

Fixed #29574

**Testing**
There are existing tests that hit the logic that was panicking, but the
metrics are set to `StateMutable` in testing (which is the only way they
can be created and set up for testing). I believe that means that before
this change the tests were invalid (didn't represent reality), but after
this change they'll properly represent the exporter's functionality.
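
As a side note on why the existing tests didn't trip the panic: metrics built directly with pmetric.NewMetrics() start out mutable, so test data never exercises the read-only path unless it is explicitly marked. A quick sketch of my own, assuming the IsReadOnly helper:

package prometheusexporter

import (
	"testing"

	"go.opentelemetry.io/collector/pdata/pmetric"
)

// Illustrative only: metrics created in tests are mutable by default, so the
// accumulator's SetStartTimestamp call never panicked in unit tests even
// before the capability fix.
func TestMetricsAreMutableByDefault(t *testing.T) {
	md := pmetric.NewMetrics()
	if md.IsReadOnly() {
		t.Fatal("freshly created metrics should be mutable")
	}
}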