[prometheus] Invalid exemplar structure in open metrics format http response #27813

Closed
CodeBlanch opened this issue Oct 17, 2023 · 2 comments
Labels: bug, exporter/prometheus

CodeBlanch (Member) commented Oct 17, 2023

Component(s)

exporter/prometheus

Collector configuration:

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  logging:
    loglevel: debug
  prometheus:
    endpoint: ":9201"
    send_timestamps: true
    metric_expiration: 180m
    enable_open_metrics: true
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging,otlp]
    metrics:
      receivers: [otlp]
      exporters: [logging,prometheus]
    logs:
      receivers: [otlp]
      exporters: [logging]

What happened?

[Originally reported here: https://github.com/prometheus/prometheus/issues/12975]

Description

I sent the OpenTelemetry Collector a histogram containing an exemplar that does NOT have a TraceId or SpanId:

exemplars-otel-collector-1  | 2023-10-12T20:16:21.499Z  info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "logging", "resource metrics": 1, "metrics": 1, "data points": 1}
exemplars-otel-collector-1  | 2023-10-12T20:16:21.499Z  info    ResourceMetrics #0
exemplars-otel-collector-1  | Resource SchemaURL:
exemplars-otel-collector-1  | Resource attributes:
exemplars-otel-collector-1  |      -> service.name: Str(OpenTelemetryDemo)
exemplars-otel-collector-1  |      -> telemetry.sdk.name: Str(opentelemetry)
exemplars-otel-collector-1  |      -> telemetry.sdk.language: Str(dotnet)
exemplars-otel-collector-1  |      -> telemetry.sdk.version: Str(1.6.1-alpha.0.55)
exemplars-otel-collector-1  | ScopeMetrics #0
exemplars-otel-collector-1  | ScopeMetrics SchemaURL:
exemplars-otel-collector-1  | InstrumentationScope Microsoft.AspNetCore.Server.Kestrel
exemplars-otel-collector-1  | Metric #0
exemplars-otel-collector-1  | Descriptor:
exemplars-otel-collector-1  |      -> Name: kestrel.connection.duration
exemplars-otel-collector-1  |      -> Description: The duration of connections on the server.
exemplars-otel-collector-1  |      -> Unit: s
exemplars-otel-collector-1  |      -> DataType: Histogram
exemplars-otel-collector-1  |      -> AggregationTemporality: Cumulative
exemplars-otel-collector-1  | HistogramDataPoints #0
exemplars-otel-collector-1  | StartTimestamp: 2023-10-12 20:16:08.523585 +0000 UTC
exemplars-otel-collector-1  | Timestamp: 2023-10-12 20:16:21.4626029 +0000 UTC
exemplars-otel-collector-1  | Count: 3
exemplars-otel-collector-1  | Sum: 11.745288
exemplars-otel-collector-1  | Min: 0.004035
exemplars-otel-collector-1  | Max: 5.906036
exemplars-otel-collector-1  | ExplicitBounds #0: 5000.000000
exemplars-otel-collector-1  | Buckets #0, Count: 3
exemplars-otel-collector-1  | Buckets #1, Count: 0
exemplars-otel-collector-1  | Exemplars:
exemplars-otel-collector-1  | Exemplar #0
exemplars-otel-collector-1  |      -> Trace ID:
exemplars-otel-collector-1  |      -> Span ID:
exemplars-otel-collector-1  |      -> Timestamp: 2023-10-12 20:16:21.4410366 +0000 UTC
exemplars-otel-collector-1  |      -> Value: 5.906036
exemplars-otel-collector-1  |      -> FilteredAttributes:
exemplars-otel-collector-1  |           -> server.address: Str(::1)
exemplars-otel-collector-1  |           -> server.port: Int(5291)
exemplars-otel-collector-1  |           -> network.type: Str(ipv6)
exemplars-otel-collector-1  |           -> network.transport: Str(tcp)
exemplars-otel-collector-1  |           -> network.protocol.name: Str(http)
exemplars-otel-collector-1  |           -> network.protocol.version: Str(1.1)
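
The FilteredAttributes above do not show up as exemplar labels in the scrape output, and later in the thread the exemplar labels are described as an empty map. A minimal sketch of how that can happen, assuming (not confirmed in this thread) that the exporter copies only a non-empty trace ID and span ID into the exemplar labels. The helper `exemplarLabels` and the `trace_id`/`span_id` label names are illustrative assumptions.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// exemplarLabels is a hypothetical helper: it builds the label set for a
// Prometheus exemplar from an OTLP exemplar's trace and span IDs only.
// An all-zero ID is treated as "not set" and contributes no label.
// FilteredAttributes are deliberately ignored (an assumption).
func exemplarLabels(traceID [16]byte, spanID [8]byte) map[string]string {
	labels := make(map[string]string) // never nil, but may stay empty
	if traceID != ([16]byte{}) {
		labels["trace_id"] = hex.EncodeToString(traceID[:])
	}
	if spanID != ([8]byte{}) {
		labels["span_id"] = hex.EncodeToString(spanID[:])
	}
	return labels
}

func main() {
	// The exemplar in the log above has an empty Trace ID and Span ID.
	var noTrace [16]byte
	var noSpan [8]byte
	labels := exemplarLabels(noTrace, noSpan)
	fmt.Println(labels == nil, len(labels)) // false 0
}
```

The result is an empty but non-nil map, so "no labels" is passed downstream as a valid, empty label set rather than as an error.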

The Prometheus exporter inside the collector responds to the scrape request like this:

# HELP kestrel_connection_duration_seconds The duration of connections on the server.
# TYPE kestrel_connection_duration_seconds histogram
kestrel_connection_duration_seconds_bucket{job="OpenTelemetryDemo",le="5000.0"} 3 1.697141781462e+09 #  5.9060365 1.6971417814410365e+09
kestrel_connection_duration_seconds_bucket{job="OpenTelemetryDemo",le="+Inf"} 3 1.697141781462e+09
kestrel_connection_duration_seconds_sum{job="OpenTelemetryDemo"} 11.745288200000001 1.697141781462e+09
kestrel_connection_duration_seconds_count{job="OpenTelemetryDemo"} 3 1.697141781462e+09

This line...

kestrel_connection_duration_seconds_bucket{job="OpenTelemetryDemo",le="5000.0"} 3 1.697141781462e+09 #  5.9060365 1.6971417814410365e+09

...blows up the Prometheus scraper:

2023-10-12 13:16:25 ts=2023-10-12T20:16:25.845Z caller=scrape.go:1399 level=debug component="scrape manager" scrape_pool=otel target=http://otel-collector:9201/metrics msg="Append failed" err="expected next entry after timestamp, got \" #  \" (\"INVALID\") while parsing: \"kestrel_connection_duration_seconds_bucket{job=\\\"OpenTelemetryDemo\\\",le=\\\"5000.0\\\"} 3 1.697141781462e+09 #  \""
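
For context, in the OpenMetrics text format an exemplar follows the sample as ` # `, then a brace-delimited label set (which may be empty, `{}`), then the exemplar value and an optional timestamp. The sketch below is a deliberately simplified, hypothetical check (not Prometheus's actual parser, and it ignores quoting inside label values) showing why the line above is rejected: after ` # ` a `{` is expected, but a second space is found.

```go
package main

import (
	"fmt"
	"strings"
)

// hasWellFormedExemplar is a simplified, hypothetical check of the exemplar
// part of an OpenMetrics sample line. It only verifies that the text after
// " # " starts with a brace-delimited label set, which the OpenMetrics
// grammar requires even when the set is empty ("{}").
func hasWellFormedExemplar(line string) bool {
	i := strings.Index(line, " # ")
	if i < 0 {
		return true // no exemplar on this line, nothing to check
	}
	rest := line[i+len(" # "):]
	if !strings.HasPrefix(rest, "{") {
		return false // e.g. " #  5.9060365 ...": the label set is missing
	}
	end := strings.Index(rest, "}")
	// The label set must be closed and followed by a space and the value.
	return end >= 0 && strings.HasPrefix(rest[end+1:], " ")
}

func main() {
	bad := `kestrel_connection_duration_seconds_bucket{job="OpenTelemetryDemo",le="5000.0"} 3 1.697141781462e+09 #  5.9060365 1.6971417814410365e+09`
	good := `kestrel_connection_duration_seconds_bucket{job="OpenTelemetryDemo",le="5000.0"} 3 1.697141781462e+09 # {} 5.9060365 1.6971417814410365e+09`
	fmt.Println(hasWellFormedExemplar(bad), hasWellFormedExemplar(good)) // false true
}
```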

Collector version

37e7f494a600

Additional context

The Prometheus team looked at this (prometheus/prometheus#12975 (comment)) and said that...

kestrel_connection_duration_seconds_bucket{job="OpenTelemetryDemo",le="5000.0"} 3 1.697141781462e+09 # 5.9060365 1.6971417814410365e+09

...should write out an empty label set ({}) for the exemplar:

kestrel_connection_duration_seconds_bucket{job="OpenTelemetryDemo",le="5000.0"} 3 1.697141781462e+09 # {} 5.9060365 1.6971417814410365e+09
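
A minimal writer-side sketch of the suggested fix: always emit the braces, so an exemplar with no labels is written as `# {}` rather than `#  `. `formatExemplar` is a hypothetical stand-in, not client_golang's implementation; it sorts label names for deterministic output, applies the OpenMetrics label-value escapes (backslash, double quote, line feed), and omits the optional exemplar timestamp for brevity.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// labelValueEscaper applies the OpenMetrics label-value escapes:
// backslash, double quote and line feed.
var labelValueEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// formatExemplar is a hypothetical sketch of the exemplar suffix of an
// OpenMetrics sample line. The label set is always wrapped in braces,
// so an empty (or nil) set becomes "{}" instead of nothing.
func formatExemplar(labels map[string]string, value float64) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic output order

	var b strings.Builder
	b.WriteString(" # {")
	for i, name := range names {
		if i > 0 {
			b.WriteByte(',')
		}
		b.WriteString(name)
		b.WriteString(`="`)
		b.WriteString(labelValueEscaper.Replace(labels[name]))
		b.WriteByte('"')
	}
	b.WriteString("} ")
	b.WriteString(strconv.FormatFloat(value, 'g', -1, 64))
	return b.String()
}

func main() {
	// Empty label set, as produced for the exemplar in this issue.
	fmt.Println(formatExemplar(map[string]string{}, 5.9060365))
	// prints " # {} 5.9060365"
	fmt.Println(formatExemplar(map[string]string{"trace_id": "abc"}, 5.9060365))
	// prints ` # {trace_id="abc"} 5.9060365`
}
```

Writing the braces unconditionally keeps the grammar uniform, so the parser never has to special-case a missing label set.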

CodeBlanch added the bug and needs triage labels Oct 17, 2023
github-actions (Contributor)

Pinging code owners for exporter/prometheus: @Aneurysm9. See Adding Labels via Comments if you do not have permissions to add labels yourself.

crobert-1 (Member)

Hello @CodeBlanch, I looked into this, and it appears to actually be an issue with the prometheus/client_golang package: you're hitting a variant of prometheus/client_golang#1333. The code path looks different, but investigation shows the same method causing the issue. Let me know if anything here doesn't make sense, happy to help. Sorry for another redirection!

On the OpenTelemetry Collector side of things, here's the code path:
Inside convertDoubleHistogram:

	// Here prometheus is "github.com/prometheus/client_golang/prometheus".
	if len(exemplars) > 0 {
		// The exemplars currently have their labels set to an empty map.
		m, err = prometheus.NewMetricWithExemplars(m, exemplars...)
		if err != nil {
			return nil, err
		}
	}

prometheus.NewMetricWithExemplars:

		exs[i], err = newExemplar(e.Value, ts, e.Labels)

newExemplar:

func newExemplar(value float64, ts time.Time, l Labels) (*dto.Exemplar, error) {
	...
	labelPairs := make([]*dto.LabelPair, 0, len(l))
	...
	// This loop is skipped since there are no labels
	for name, value := range l {
		...
	}
	e.Label = labelPairs
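
A small sketch of what this means for the output, under an assumption not stated in this thread: that the text writer emits nothing at all for a zero-length label slice, which would account for the double space in ` #  ` from the Prometheus error. `labelPair`, `toLabelPairs`, and `writeLabelsSkippingEmpty` are hypothetical stand-ins for `dto.LabelPair`, the loop above, and the writer.

```go
package main

import (
	"fmt"
	"strings"
)

// labelPair is a hypothetical stand-in for dto.LabelPair.
type labelPair struct{ Name, Value string }

// toLabelPairs mirrors the loop in newExemplar: one pair per map entry.
// For an empty map the loop body never runs and the result is an empty,
// non-nil slice. Map iteration order is random, so pair order may vary.
func toLabelPairs(l map[string]string) []labelPair {
	pairs := make([]labelPair, 0, len(l))
	for name, value := range l {
		pairs = append(pairs, labelPair{Name: name, Value: value})
	}
	return pairs
}

// writeLabelsSkippingEmpty models (as an assumption) a writer that emits
// nothing for an empty label set instead of "{}". %q uses Go quoting,
// which only approximates OpenMetrics escaping.
func writeLabelsSkippingEmpty(pairs []labelPair) string {
	if len(pairs) == 0 {
		return ""
	}
	parts := make([]string, len(pairs))
	for i, p := range pairs {
		parts[i] = fmt.Sprintf("%s=%q", p.Name, p.Value)
	}
	return "{" + strings.Join(parts, ",") + "}"
}

func main() {
	pairs := toLabelPairs(map[string]string{})
	suffix := " # " + writeLabelsSkippingEmpty(pairs) + " 5.9060365"
	fmt.Printf("%q\n", suffix) // " #  5.9060365"
}
```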

Issue prometheus/client_golang#1333 appears to take a different route, but it ends up in the exact same method:

// From the example in the issue description:
withExemplar.ObserveWithExemplar(time.Since(t).Seconds(), prometheus.Labels{})

ObserveWithExemplar:

func (h *histogram) ObserveWithExemplar(v float64, e Labels) {
	...
	h.updateExemplar(v, i, e)

updateExemplar:

func (h *histogram) updateExemplar(v float64, bucket int, l Labels) {
	...
	e, err := newExemplar(v, h.now(), l)

crobert-1 closed this as not planned Oct 27, 2023
crobert-1 removed the needs triage label Oct 27, 2023