Synchronous gauges should get exposed at ongoing current timestamps, not the timestamp of the last Record() call #5510

Closed
juliusv opened this issue Jun 13, 2024 · 0 comments · Fixed by #5517


juliusv commented Jun 13, 2024

Chatted about this with @gouthamve and he said @dashpole recommended filing an issue:

Description

When I create a synchronous gauge and call Record(1) on it once, the gauge gets exposed repeatedly (over time) at the timestamp of the initial Record() call. My expectation was that the gauge would be exposed with ongoing (current) timestamps whenever metrics are pushed/read. From what I can gather from the metrics SDK spec, my expectation is also how it should behave:

> The ending timestamp (i.e. TimeUnixNano) MUST always be equal to time the metric data point took effect, which is equal to when MetricReader.Collect was invoked.

The current behavior breaks any use of synchronous gauges with Prometheus where you don't set the value regularly: because the data point is always ingested at the same initial timestamp, you first get a single-point (non-continuous) time series, and after a while Prometheus rejects the samples with msg="Out of order sample from remote write" err="out of bounds" style errors, since OTel keeps sending data points that are too old for Prometheus to ingest.

Steps To Reproduce

To see the whole issue, including Prometheus raising errors about out-of-bounds samples, run a Prometheus server with OTLP ingestion enabled:

./prometheus --enable-feature=otlp-write-receiver

Then run the following OTel example to push gauge samples into it (and also pretty-print them to stdout):

package main

import (
	"context"
	"log"
	"os"
	"os/signal"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp"
	"go.opentelemetry.io/otel/exporters/stdout/stdoutmetric"
	sdk_metric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	// Handle SIGINT (CTRL+C) gracefully.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)

	otlpMetricExporter, err := otlpmetrichttp.New(ctx, otlpmetrichttp.WithEndpointURL("http://localhost:9090/api/v1/otlp/v1/metrics"))
	if err != nil {
		log.Fatalf("failed to create OTLP metric exporter: %v", err)
	}

	stdoutMetricExporter, err := stdoutmetric.New(stdoutmetric.WithPrettyPrint(), stdoutmetric.WithTemporalitySelector(sdk_metric.DefaultTemporalitySelector))
	if err != nil {
		log.Fatalf("failed to create stdout metric exporter: %v", err)
	}

	meterProvider := sdk_metric.NewMeterProvider(
		// Send metrics via OTLP.
		sdk_metric.WithReader(sdk_metric.NewPeriodicReader(otlpMetricExporter, sdk_metric.WithInterval(5*time.Second))),
		// Also log metrics to stdout.
		sdk_metric.WithReader(sdk_metric.NewPeriodicReader(stdoutMetricExporter, sdk_metric.WithInterval(5*time.Second))),
	)
	otel.SetMeterProvider(meterProvider)
	// Use a fresh context for shutdown: ctx is canceled once the interrupt arrives.
	defer func() { _ = meterProvider.Shutdown(context.Background()) }()

	meter := meterProvider.Meter("broken-gauge-example")

	// Synchronous gauge.
	gauge, err := meter.Float64Gauge("my_gauge")
	if err != nil {
		log.Fatalf("failed to create gauge: %v", err)
	}
	gauge.Record(ctx, 42)

	// Wait for interruption / first CTRL+C.
	<-ctx.Done()
	// Stop receiving signal notifications as soon as possible.
	stop()
}
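
For a quicker check without running Prometheus, the stale timestamp can also be observed directly with a manual reader. This is only a minimal sketch assuming the same SDK version as above; the meter and instrument names are arbitrary. On affected versions I'd expect both collections to report the same Time (the Record() time), whereas per the spec it should advance to the time of each Collect:

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	sdk_metric "go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/metric/metricdata"
)

func main() {
	ctx := context.Background()

	// In-memory reader so we can inspect the collected data points directly.
	reader := sdk_metric.NewManualReader()
	meterProvider := sdk_metric.NewMeterProvider(sdk_metric.WithReader(reader))
	defer meterProvider.Shutdown(ctx)

	gauge, err := meterProvider.Meter("gauge-timestamp-check").Float64Gauge("my_gauge")
	if err != nil {
		log.Fatalf("failed to create gauge: %v", err)
	}
	gauge.Record(ctx, 42)

	// Collect twice, a second apart, and compare the data point timestamps.
	for i := 1; i <= 2; i++ {
		var rm metricdata.ResourceMetrics
		if err := reader.Collect(ctx, &rm); err != nil {
			log.Fatalf("failed to collect metrics: %v", err)
		}
		for _, sm := range rm.ScopeMetrics {
			for _, m := range sm.Metrics {
				if g, ok := m.Data.(metricdata.Gauge[float64]); ok {
					for _, dp := range g.DataPoints {
						fmt.Printf("collection %d: value=%v time=%v\n", i, dp.Value, dp.Time)
					}
				}
			}
		}
		time.Sleep(time.Second)
	}
}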
@juliusv added the bug label Jun 13, 2024
@dashpole added the area:metrics and pkg:SDK labels Jun 13, 2024
@dashpole dashpole added this to the v1.28.0 milestone Jun 13, 2024
@dashpole dashpole self-assigned this Jun 13, 2024
pellared pushed a commit that referenced this issue Jun 19, 2024
Fixes #5510

From https://opentelemetry.io/docs/specs/otel/metrics/sdk/#metricreader:

> The ending timestamp (i.e. TimeUnixNano) MUST always be equal to time
the metric data point took effect, which is equal to when
[MetricReader.Collect](https://opentelemetry.io/docs/specs/otel/metrics/sdk/#collect)
was invoked. These rules apply to all metrics, not just those whose
[point
kinds](https://opentelemetry.io/docs/specs/otel/metrics/data-model/#point-kinds)
includes an aggregation temporality field.

Before #5305, we
used the measurement time as the timestamp, but it didn't matter because
the collection time is always the same as the measurement time for
asynchronous instruments. We didn't catch the issue when we implemented
synchronous instruments.

This PR changes the (end) timestamp handling for the Last Value
aggregation to match the (end) timestamp handling for sum and histogram
aggregations. As described in the spec above, we take the timestamp when
computing the aggregation during Collect.
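
To illustrate the difference, here is a conceptual sketch only (a toy aggregator, not the SDK's actual internal code): a last-value aggregation keeps the most recent measurement, and the fix changes which timestamp gets stamped onto the exported data point.

package main

import (
	"fmt"
	"sync"
	"time"
)

// lastValue is a toy last-value aggregator used only to illustrate the
// timestamp change described above; it is not the SDK implementation.
type lastValue struct {
	mu       sync.Mutex
	value    float64
	recorded time.Time // when Record() was last called
}

func (lv *lastValue) Record(v float64) {
	lv.mu.Lock()
	defer lv.mu.Unlock()
	lv.value = v
	lv.recorded = time.Now()
}

type dataPoint struct {
	Value float64
	Time  time.Time
}

// collectOld mimics the behavior before the fix: the exported data point
// reuses the Record() timestamp, so repeated collections keep exporting the
// same, increasingly stale time.
func (lv *lastValue) collectOld() dataPoint {
	lv.mu.Lock()
	defer lv.mu.Unlock()
	return dataPoint{Value: lv.value, Time: lv.recorded}
}

// collectNew mimics the behavior after the fix: the timestamp is taken when
// the aggregation is computed during Collect, matching the sum and histogram
// aggregations.
func (lv *lastValue) collectNew() dataPoint {
	lv.mu.Lock()
	defer lv.mu.Unlock()
	return dataPoint{Value: lv.value, Time: time.Now()}
}

func main() {
	var lv lastValue
	lv.Record(42)
	time.Sleep(2 * time.Second)
	fmt.Println("old behavior:", lv.collectOld()) // stuck at the Record() time
	fmt.Println("new behavior:", lv.collectNew()) // current collection time
}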
OrHayat pushed a commit to OrHayat/opentelemetry-go that referenced this issue Jun 23, 2024