-
Notifications
You must be signed in to change notification settings - Fork 79
Stop including artificial 0 bound when existing lowest bound is negative #262
Conversation
Codecov Report
@@ Coverage Diff @@
## master #262 +/- ##
=======================================
Coverage 72.02% 72.02%
=======================================
Files 17 17
Lines 1691 1691
=======================================
Hits 1218 1218
Misses 397 397
Partials 76 76
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change itself LGTM, as long as negative bounds are actually supported / allowed.
Please confirm with @rghetia if that's a legit change.
Also note that we will need to make a release and update the dependency in https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/master/exporter/stackdriverexporter to pick up this change in the OpenTelemetry Collector.
@@ -536,7 +536,7 @@ func newTypedValue(vd *view.View, r *view.Row) *monitoringpb.TypedValue { | |||
} | |||
|
|||
func shouldInsertZeroBound(bounds ...float64) bool { | |||
if len(bounds) > 0 && bounds[0] != 0.0 { | |||
if len(bounds) > 0 && bounds[0] > 0.0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I digged a bit into the history behind this condition, and it seems like in OpenCensus the negative bounds were explicitly forbidden, see census-instrumentation/opencensus-go#963
You supposedly have encountered this issue in the OpenTelemetry Collector pipeline? Are negative bounds allowed there?
This specific condition was discussed in the original PR, see #89 (comment)
@rghetia @mayurkale22 do you happen to know more context on this, and possible difference between OpenCensus vs OpenTelemetry in that regard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I came across this when trying to export Prometheus metrics (using OpenTelemetry Prometheus receiver) to Stackdriver (using Stackdriver exporter). The error I got came from Stackdriver Monitoring API, essentially saying that bounds[0] > bounds[1]. So that distribution came all the way through OpenTelemetry just fine:
rpc error: code = InvalidArgument desc = Field timeSeries[35].points[0].distributionValue had an invalid value: Distribution |explicit_buckets.bounds| entry 1 has a value of -4.94065645841247e-324 which is less than the value of entry 0 which is 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. @mayurkale22 claimed in census-instrumentation/opencensus-go#963 that both Prometheus and Stackdriver don't support negative bounds... Has that changed?
Either way, this seems like a harmless change.
If it turns out that Stackdriver indeed doesn't support negative values, we should explicitly validate the buckets in the exporter rather than relying on the OpenCensus / OpenTelemetry code being correct, i.e. fail earlier before invoking this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So from what I have seen I'm pretty sure Prometheus client libs have no problem exposing negative bounds. I'm not actually sure about Stackdriver, it is possible that this change will only turn one error into another. I'll check that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs only seem to require that bucket bounds are increasing, nothing about forbidding negative buckets: https://cloud.google.com/monitoring/api/ref_v3/rest/v3/TypedValue#explicit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually opentelemetry-proto also doesn't support negative bounds.
/cc @bogdandrutu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. So, next culprit: Prometheus receiver. I saw negative bounds on Prometheus endpoint and that was ingested into OpenTelemetry Collector via Prometheus receiver just fine. Should the negative bound be dropped there instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the -ve bound is a legitimate use case then we have to address that in all relevant libraries. If it is not then it is okay to drop it OR with an optional flag allow -ve bounds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the right thing to do is to allow negative bounds, they are not really useful for things like request latencies, but may have other use cases (e.g. Prometheus docs give temperatures as an example).
I just created open-telemetry/opentelemetry-proto#156 to extend the histogram support to negative bucket bounds.
Can this change be merged though? If OT won't end up extending histogram support to cover negative values, this change should be a no-op. If it does, this change will make that compatible with the updated model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with merging this change. However it will still be incorrect in the following case.
"If the distribution has lowest bound >0 AND if it includes -ve values then this change distorts the distribution by adding additional '0' bound."
May be add this to the comment and link to the above issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I will wait for @rghetia and / or @mayurkale22 to approve this change before merging.
@@ -536,7 +536,7 @@ func newTypedValue(vd *view.View, r *view.Row) *monitoringpb.TypedValue { | |||
} | |||
|
|||
func shouldInsertZeroBound(bounds ...float64) bool { | |||
if len(bounds) > 0 && bounds[0] != 0.0 { | |||
if len(bounds) > 0 && bounds[0] > 0.0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with merging this change. However it will still be incorrect in the following case.
"If the distribution has lowest bound >0 AND if it includes -ve values then this change distorts the distribution by adding additional '0' bound."
May be add this to the comment and link to the above issue.
@nilebox