From 5f6c7fc5dc3f6f74a39d44503c076497c5e95967 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald
Date: Fri, 13 May 2022 10:18:41 -0700
Subject: [PATCH] Specify optional Exponential Histogram Aggregation, add example code in the data model (#2252)

---
 CHANGELOG.md                       |  11 +++
 specification/metrics/datamodel.md |  77 +++++++++++++++++--
 specification/metrics/sdk.md       | 115 +++++++++++++++++++++++++++--
 3 files changed, 191 insertions(+), 12 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 04fba7118a3..cd90f664ef8 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -13,6 +13,17 @@ release.
 
 ### Metrics
 
+- Clarify that API support for multi-instrument callbacks is permitted.
+  ([#2263](https://github.com/open-telemetry/opentelemetry-specification/pull/2263)).
+- Clarify SDK behavior when view conflicts are present
+  ([#2462](https://github.com/open-telemetry/opentelemetry-specification/pull/2462)).
+- Clarify MetricReader.Collect result
+  ([#2495](https://github.com/open-telemetry/opentelemetry-specification/pull/2495)).
+- Add database connection pool metrics semantic conventions
+  ([#2273](https://github.com/open-telemetry/opentelemetry-specification/pull/2273)).
+- Specify optional support for an Exponential Histogram Aggregation.
+  ([#2252](https://github.com/open-telemetry/opentelemetry-specification/pull/2252))
+
 ### Logs
 
 ### Resource
diff --git a/specification/metrics/datamodel.md b/specification/metrics/datamodel.md
index ef78116816a..db664490783 100644
--- a/specification/metrics/datamodel.md
+++ b/specification/metrics/datamodel.md
@@ -22,6 +22,7 @@
   * [Sums](#sums)
   * [Gauge](#gauge)
   * [Histogram](#histogram)
+    + [Histogram: Bucket inclusivity](#histogram-bucket-inclusivity)
   * [ExponentialHistogram](#exponentialhistogram)
     + [Exponential Scale](#exponential-scale)
     + [Exponential Buckets](#exponential-buckets)
@@ -33,6 +34,7 @@
       - [Positive Scale: Use a Lookup Table](#positive-scale-use-a-lookup-table)
     + [ExponentialHistogram: Producer Recommendations](#exponentialhistogram-producer-recommendations)
     + [ExponentialHistogram: Consumer Recommendations](#exponentialhistogram-consumer-recommendations)
+    + [ExponentialHistogram: Bucket inclusivity](#exponentialhistogram-bucket-inclusivity)
   * [Summary (Legacy)](#summary-legacy)
 - [Exemplars](#exemplars)
 - [Single-Writer](#single-writer)
@@ -522,6 +524,8 @@ Bucket counts are optional. A Histogram without buckets conveys a
 population in terms of only the sum and count, and may be interpreted
 as a histogram with single bucket covering `(-Inf, +Inf)`.
 
+#### Histogram: Bucket inclusivity
+
 Bucket upper-bounds are inclusive (except for the case where the
 upper-bound is +Inf) while bucket lower-bounds are exclusive.
 That is, buckets express the number of values that are greater than their lower
@@ -716,6 +720,21 @@ func GetExponent(value float64) int32 {
 }
 ```
 
+Implementations are permitted to round subnormal values up to the
+smallest normal value, which may permit the use of a built-in function:
+
+```golang
+
+func GetExponent(value float64) int {
+    // Note: Frexp() rounds subnormal values to the smallest normal
+    // value and returns an exponent corresponding to fractions in the
+    // range [0.5, 1), whereas we want [1, 2), so subtract 1 from the
+    // exponent.
+    _, exp := math.Frexp(value)
+    return exp - 1
+}
+```
+
 ##### Negative Scale: Extract and Shift the Exponent
 
 For negative scales, the index of a value equals the normalized
@@ -727,19 +746,59 @@ correct rounding for the negative indices. This may be written as:
   return GetExponent(value) >> -scale
 ```
 
+The reverse mapping function is:
+
+```golang
+  return math.Ldexp(1, index << -scale)
+```
+
+Note that the reverse mapping function is expected to produce
+subnormal values even when the mapping function rounds them into
+normal values, since the lower boundary of the bucket containing the
+smallest normal value may be subnormal. For example, at scale -4 the
+smallest normal value `0x1p-1022` falls into a bucket with lower
+boundary `0x1p-1024`.
+
 ##### All Scales: Use the Logarithm Function
 
-For any scale, use of the built-in natural logarithm
-function. A multiplicative factor equal to `2**scale / ln(2)`
-proves useful (where `ln()` is the natural logarithm), for example:
+For any scale, the built-in natural logarithm function can be used to
+compute the bucket index.
+A multiplicative factor equal to `2**scale
+/ ln(2)` proves useful (where `ln()` is the natural logarithm), for
+example:
 
 ```golang
-    scaleFactor := math.Log2E * math.Exp2(scale)
-    return int64(math.Floor(math.Log(value) * scaleFactor))
+    scaleFactor := math.Ldexp(math.Log2E, scale)
+    return math.Floor(math.Log(value) * scaleFactor)
 ```
 
 Note that in the example Golang code above, the built-in `math.Log2E`
-is defined as `1 / ln(2)`.
+is defined as the inverse of the natural logarithm of 2, i.e., `1 / ln(2)`.
+
+The reverse mapping function is:
+
+```golang
+    inverseFactor := math.Ldexp(math.Ln2, -scale)
+    return math.Exp(index * inverseFactor), nil
+```
+
+Implementations are expected to verify that their mapping function and
+inverse mapping function are correct near the lowest and highest IEEE
+floating point values. A mathematically correct formula may produce
+wrong result, because of accumulated floating point calculation error
+or underflow/overflow of intermediate results. In the Golang
+reference implementation, for example, the above formula computes
+`+Inf` for the maximum-index bucket. In this case, it is appropriate
+to subtract `1<
The Default Value represents the following buckets:
(-∞, 0], (0, 5.0], (5.0, 10.0], (10.0, 25.0], (25.0, 50.0], (50.0, 75.0], (75.0, 100.0], (100.0, 250.0], (250.0, 500.0], (500.0, 1000.0], (1000.0, +∞) |
 | RecordMinMax | true, false | true | Whether to record min and max. |
 
-This Aggregation informs the SDK to collect:
+Explicit buckets are stated in terms of their upper boundary. Buckets
+are exclusive of their lower boundary and inclusive of their upper
+bound (except at positive infinity). A measurement is defined to fall
+into the greatest-numbered bucket with boundary that is greater than
+or equal to the measurement.
 
-- Count of `Measurement` values falling within explicit bucket boundaries.
-- Arithmetic sum of `Measurement` values in population. This SHOULD NOT be collected when used with
-instruments that record negative measurements, e.g. `UpDownCounter` or `ObservableGauge`.
-- Min (optional) `Measurement` value in population.
-- Max (optional) `Measurement` value in population.
+#### Exponential Histogram Aggregation
+
+The Exponential Histogram Aggregation informs the SDK to collect data
+for the [Exponential Histogram Metric
+Point](./datamodel.md#exponentialhistogram), which uses an exponential
+formula to determine bucket boundaries and an integer `scale`
+parameter to control resolution.
+
+Scale is not a configurable property of this Aggregation, the
+implementation will adjust it as necessary given the data. This
+Aggregation honors the following configuration parameter:
+
+| Key | Value | Default Value | Description |
+|---------|---------|---------------|--------------------------------------------------------------------------------------------------------------|
+| MaxSize | integer | 160 | Maximum number of buckets in each of the positive and negative ranges, not counting the special zero bucket.
|
+
+The default of 160 buckets is selected to establish default support
+for a high-resolution histogram able to cover a long-tail latency
+distribution from 1ms to 100s with less than 5% relative error.
+Because 160 can be factored into `10 * 2**K`, maximum contrast is
+relatively simple to derive for scale `K`:
+
+| Scale | Maximum data contrast at 10 * 2**K buckets |
+|-------|--------------------------------------------|
+| K+2 | 5.657 (2**(10/4)) |
+| K+1 | 32 (2**(10/2)) |
+| K | 1024 (2**10) |
+| K-1 | 1048576 (2**20) |
+
+The following table shows how the ideal scale for 160 buckets is
+calculated as a function of the input range:
+
+| Input range | Contrast | Ideal Scale | Base | Relative error |
+|-------------|----------|-------------|----------|----------------|
+| 1ms - 4ms | 4 | 6 | 1.010889 | 0.542% |
+| 1ms - 20ms | 20 | 5 | 1.021897 | 1.083% |
+| 1ms - 1s | 10**3 | 4 | 1.044274 | 2.166% |
+| 1ms - 100s | 10**5 | 3 | 1.090508 | 4.329% |
+| 1μs - 10s | 10**7 | 2 | 1.189207 | 8.643% |
+
+Note that relative error is calculated as half of the bucket width
+divided by the bucket midpoint, which is the same in every bucket.
+Using the bucket from [1, base), we have `(bucketWidth / 2) /
+bucketMidpoint = ((base - 1) / 2) / ((base + 1) / 2) = (base - 1) /
+(base + 1)`.
+
+This Aggregation uses the notion of "ideal" scale. The ideal scale is
+either:
+
+1. The maximum supported scale, generally used for single-value histogram Aggregations where scale is not otherwise constrained
+2. The largest value of scale such that no more than the maximum number of buckets are needed to represent the full range of input data in either of the positive or negative ranges.
+
+##### Exponential Histogram Aggregation: Handle all normal values
+
+Implementations are REQUIRED to accept the entire normal range of IEEE
+floating point values (i.e., all values except for +Inf, -Inf and NaN
+values).
+
+Implementations SHOULD NOT incorporate non-normal values (i.e., +Inf,
+-Inf, and NaNs) into the `sum`, `min`, and `max` fields, because these
+values do not map into a valid bucket.
+
+Implementations MAY round subnormal values away from zero to the
+nearest normal value.
+
+##### Exponential Histogram Aggregation: Support a minimum and maximum scale
+
+The implementation MUST maintain reasonable minimum and maximum scale
+parameters that the automatic scale parameter will not exceed.
+
+##### Exponential Histogram Aggregation: Use the maximum scale for single measurements
+
+When the histogram contains not more than one value in either of the
+positive or negative ranges, the implementation SHOULD use the maximum
+scale.
+
+##### Exponential Histogram Aggregation: Maintain the ideal scale
+
+Implementations SHOULD adjust the histogram scale as necessary to
+maintain the best resolution possible, within the constraint of
+maximum size (max number of buckets). Best resolution (highest scale)
+is achieved when the number of positive or negative range buckets
+exceeds half the maximum size, such that increasing scale by one would
+not be possible given the size constraint.
 
 ### Observations inside asynchronous callbacks