-
Notifications
You must be signed in to change notification settings - Fork 673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #3089 'histograms are dissapearing' #3196
Fix #3089 'histograms are dissapearing' #3196
Conversation
|
@@ -275,7 +275,7 @@ def collect( | |||
""" | |||
with self._lock: | |||
if not any(self._bucket_counts): | |||
return None | |||
return self._previous_point |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect; we can't return the previous point always.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please suggest what cases should I consider to make the correct change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will look closely at the linked issue to see the correct fix, but the issue with returning self._previous_point
is when the temporality is delta and no measurement is recorded, it will return an incorrect result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can try to make it return a new HistogramDataPoint with value 0 or something like that, but I've tested the solution with AggregationTemporality.Delta and it seems to work
Here's the code I used for test:
import time
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
from opentelemetry.metrics import get_meter_provider, set_meter_provider
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.metrics import (
Histogram,
)
from opentelemetry.sdk.metrics.export import AggregationTemporality
exporter = OTLPMetricExporter(endpoint="http://localhost:4445/v1/metrics",
preferred_temporality={
Histogram: AggregationTemporality.DELTA,
})
reader = PeriodicExportingMetricReader(exporter, export_interval_millis=1000)
set_meter_provider(MeterProvider(metric_readers=[reader]))
meter = get_meter_provider().get_meter("myapp", "0.1.2")
histogram = meter.create_histogram(
"py_histogram",
unit="ms",
description="Test histogram",
)
counter = meter.create_counter(
f"py_counter",
unit=f"items",
description=f"Test counter"
)
histogram.record(12)
time.sleep(1)
counter.add(1)
time.sleep(1)
counter.add(1)
time.sleep(1)
counter.add(1)
time.sleep(1)
counter.add(1)
time.sleep(1)
Here's logs from running this code:
otel-log.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this helps, but I'm currently trying this approach, so the metrics are always present, but possibly with zero values:
diff --git a/opentelemetry-sdk/src/opentelemetry/sdk/metrics/_internal/aggregation.py b/opentelemetry-sdk/src/opentelemetry/sdk/metrics/_internal/aggregation.py
index ae21db90..25f03f69 100644
--- a/opentelemetry-sdk/src/opentelemetry/sdk/metrics/_internal/aggregation.py
+++ b/opentelemetry-sdk/src/opentelemetry/sdk/metrics/_internal/aggregation.py
@@ -289,9 +289,6 @@ class _ExplicitBucketHistogramAggregation(_Aggregation[HistogramPoint]):
Atomically return a point for the current value of the metric.
"""
with self._lock:
- if not any(self._bucket_counts):
- return None
-
bucket_counts = self._bucket_counts
start_time_unix_nano = self._start_time_unix_nano
sum_ = self._sum
@@ -316,6 +313,9 @@ class _ExplicitBucketHistogramAggregation(_Aggregation[HistogramPoint]):
max=max_,
)
+ if not any(bucket_counts):
+ self._previous_point = None
+ return current_point
+
if self._previous_point is None or (
self._instrument_temporality is aggregation_temporality
):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, never mind, returning zero values like this does not fix my issue, that the see next commentcount
in Prometheus shows lots of wrong, way too large values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recently came back to this, and have edited my above Diff: Also resetting _previous_point
seems to do the right thing for me.
Any update on this? |
@DraggonFantasy |
Description
This PR fixes #3089 - the problem with histogram metric type when the exporting the metrics with unchanged histogram value crashes the OTEL Collector.
Fixes #3089
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Setup
Have a running OTEL Collector + Prometheus.
OTEL Collector config:
Prometheus config:
I've ran the following Python snippet (it's the minimal example I figured out to reproduce the issue):
And checked the logs on OTEL collector.
Before the fix:
After the fix:
Also ran unit tests to make sure that the change doesn't break anything.
Does This PR Require a Contrib Repo Change?
Checklist: