
Define Exemplar requirements in the Metrics SDK spec #1797

Closed
reyang opened this issue Jul 6, 2021 · 7 comments · Fixed by #1828
@reyang
Member

reyang commented Jul 6, 2021

What are you trying to achieve?

The metrics data model specification has covered Exemplar here.

The goal is to have the SDK specification support exemplars.

Related to #1260.

@reyang reyang added spec:metrics Related to the specification/metrics directory area:sdk Related to the SDK labels Jul 6, 2021
@reyang reyang assigned jsuereth and unassigned bogdandrutu Jul 6, 2021
@jsuereth
Contributor

jsuereth commented Jul 8, 2021

Requirements for SDK + Metric Exemplars

Here's a set of requirements for Metric Exemplars, based on some prototype exemplar sampling work I've done as well as a look at existing Exemplar implementations. This is for discussion (for now); I'll formalize it into a PR once the aggregator section in the SDK spec is a bit more fleshed out, as this relies on aggregators.

Basics

  • MeasurementProcessor should be able to sample incoming measurements as exemplars
  • Sampled exemplars are NOT cumulative. The list of sampled exemplars may change during every metric stream export.
  • Exemplars should pull a recording timestamp with the measurement. This can be done when the sampling decision is made, if that decision is done "synchronously".
  • Exemplars should automatically pull TraceId/SpanId information from associated context on a measurement.
  • When configuring an SDK (or MeterProvider), the user MUST be able to configure exemplar sampling. See the sampling header for more details.
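
The basics above can be sketched roughly as follows. This is a minimal illustration with hypothetical names (`SpanContext`, `Exemplar`, `record_exemplar`), not the actual SDK API:

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpanContext:
    """Hypothetical stand-in for the trace context associated with a measurement."""
    trace_id: str
    span_id: str
    sampled: bool

@dataclass
class Exemplar:
    value: float
    time_unix_nano: int
    trace_id: Optional[str] = None
    span_id: Optional[str] = None

def record_exemplar(value: float, ctx: Optional[SpanContext]) -> Exemplar:
    # The recording timestamp is captured at the moment the (synchronous)
    # sampling decision is made.
    ex = Exemplar(value=value, time_unix_nano=time.time_ns())
    # TraceId/SpanId are pulled automatically from the associated context.
    if ctx is not None:
        ex.trace_id = ctx.trace_id
        ex.span_id = ctx.span_id
    return ex
```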

Sampling

  • Aggregators should be able to influence exemplar sampling, e.g. histogram leveraging bucket boundaries for exemplar selection, and attempting to keep exemplars per-bucket.
  • Exemplar samplers need access to context (Span/Trace), and can leverage Span information in sampling decisions.
  • Exemplar sampling can't simply reuse Trace sampling (the memory overhead would be too high).
  • Exemplar samplers should be able to leverage trace sampling decisions.

Built-in Implementations

The following built-in samplers SHOULD be provided with easy configuration:

  • No-Sampling - This sampler never selects exemplars.
  • Preserve-latest-with-sampled-trace
    • Only samples measurements that are recorded in a Context with a sampled Span.
    • Only keeps "latest" exemplar and drops any past history.
    • For histogram aggregation, this should keep "latest exemplar per bucket".
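
A minimal sketch of the "preserve latest per bucket" behavior, assuming a hypothetical `LatestPerBucketReservoir` (the class and method names here are illustrative, not from any SDK):

```python
import bisect
from typing import Dict, List

class LatestPerBucketReservoir:
    """Keeps only the most recent exemplar value per histogram bucket,
    and only for measurements recorded under a sampled span (sketch)."""

    def __init__(self, boundaries: List[float]):
        self._boundaries = boundaries
        self._latest: Dict[int, float] = {}

    def offer(self, value: float, span_sampled: bool) -> None:
        # Only sample measurements recorded in a Context with a sampled Span.
        if not span_sampled:
            return
        bucket = bisect.bisect_right(self._boundaries, value)
        # Keep only the "latest" exemplar; past history for the bucket is dropped.
        self._latest[bucket] = value

    def collect(self) -> Dict[int, float]:
        # Exemplars are NOT cumulative: the set is reset on every export.
        out, self._latest = self._latest, {}
        return out
```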

Prometheus Exporter

When exporting to Prometheus, the following should happen:

  • The latest sampled exemplar should be reported with any "Sample" point.
  • For Histograms, this should additionally be restricted based on the bucket being reported such that the exemplar chosen is unique to the currently reported metric sample. While histograms are reported in cumulative "less than or equal" count sums, the exemplar for a particular bucket should not be one that could be included in previous buckets.

A prototype implementation can be found here.

@reyang
Member Author

reyang commented Jul 9, 2021

@jsuereth nice summary! Here is my feedback:

No-Sampling - consider Always-off Sampling. When people hear "no-sampling", they might have different interpretations - 1) "there is no sampler, so I will get everything" 2) "there is a sampler and it is taking nothing" 3) "there is a sampler and it is not filtering out anything, so I will get everything".

I think, similar to Span Limits, we will have some limits on how many samples we allow at maximum (per bucket, per time series data point, etc.). This could be useful for sync instruments where users are taking too many samples, or for the pull exporter scenario where we don't want to hold the samples for too long (e.g. if the scraper stops pulling for hours).

When we need to "merge" histograms based on interpolation (whenever lossless merge is not available), samples can actually go to the new buckets with 100% confidence (because we have the raw information such as the duration).
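
The point about merging can be illustrated with a small sketch: because each exemplar retains its raw measurement value, exemplars can be assigned to new bucket boundaries exactly, even when the bucket counts themselves require interpolation. The function name here is hypothetical:

```python
import bisect
from typing import Dict, List

def rebucket_exemplars(exemplar_values: List[float],
                       new_boundaries: List[float]) -> Dict[int, List[float]]:
    """Place raw exemplar values into new bucket indices with 100% confidence.
    Unlike merged bucket counts, no interpolation is needed, since each
    exemplar carries its original measurement value (sketch only)."""
    buckets: Dict[int, List[float]] = {}
    for v in exemplar_values:
        idx = bisect.bisect_right(new_boundaries, v)
        buckets.setdefault(idx, []).append(v)
    return buckets
```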

@jmacd
Contributor

jmacd commented Jul 12, 2021

I prefer when "Sampling" means something statistical is taking place, and the word "Exemplar" explicitly suggests a selection technique that is not sampling. Thus, Sampling should be an option and instead of "Always-off sampling" or "No-sampling", maybe just "No exemplars".

Instead of "Preserve-latest-with-sampled-trace", maybe "Latest exemplars".

When it comes to sampling, open-telemetry/oteps#148 has recommendations for using exemplars to convey sample events with a sampling.adjusted_count attribute. To compute a sample (i.e., Exemplars with Probabilities) probably means using a reservoir sampling algorithm and picking more than 1 exemplar per stream point per period, and there are simple algorithmic options available.
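
One simple algorithmic option is Algorithm R. The sketch below is an illustration of reservoir sampling with an adjusted-count calculation in the spirit of OTEP 148; the class and attribute names are hypothetical, not from any SDK:

```python
import random
from typing import List, Optional

class FixedSizeReservoir:
    """Algorithm R: a uniform random sample of up to k exemplars per
    stream point per collection period (sketch)."""

    def __init__(self, k: int, rng: Optional[random.Random] = None):
        self.k = k
        self.n = 0                      # measurements seen this period
        self.sample: List[float] = []   # the reservoir
        self._rng = rng or random.Random()

    def offer(self, value: float) -> None:
        self.n += 1
        if len(self.sample) < self.k:
            self.sample.append(value)
        else:
            # Replace an existing entry with probability k/n.
            j = self._rng.randrange(self.n)
            if j < self.k:
                self.sample[j] = value

    def adjusted_count(self) -> float:
        # Each kept exemplar statistically represents n/k measurements;
        # this would be conveyed as a sampling.adjusted_count attribute.
        return self.n / len(self.sample) if self.sample else 0.0
```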

@reyang
Member Author

reyang commented Jul 13, 2021

Do we prefer to model MIN/MAX as exemplars (e.g. a cumulative sum of 100, with MAX 5 and MIN 2)?
Or do we think there are many cases where people just want to know MIN/MAX without all the other details (e.g. trace id, span id, all the attributes, etc.), so they should be modeled as a separate aggregation?

@reyang
Member Author

reyang commented Jul 13, 2021

Do we allow users to control what data to report with exemplars (e.g. I want the trace id / span id and all the items in the baggage vs. I just need trace id / span id)?

@jsuereth
Contributor

jsuereth commented Jul 13, 2021

@jmacd

I prefer when "Sampling" means something statistical is taking place, and the word "Exemplar" explicitly suggests a selection technique that is not sampling.

I like this phrasing. When proposing defaults I'll use this.

To compute a sample (i.e., Exemplars with Probabilities) probably means using a reservoir sampling algorithm and picking more than 1 exemplar per stream point per period

Yes, I'm working on reservoir sampling in the Java Metrics prototype right now so we can see how well it does in practice. Specifically, right now Prometheus (and OpenCensus) sample with a "take-latest-per-histogram-bucket" approach (for histogram aggregation). I like the idea of reservoir sampling, and I like the idea of it being the default. The only question in my mind is if we should have a "sample like OpenCensus/Prometheus" hook here.

@reyang

Do we allow users to control what data to report with exemplars (e.g. I want the trace id / span id and all the items in the baggage vs. I just need trace id / span id)?

This is a good point. Want to call out a few things:

  1. Views can specify which baggage attributes to preserve in a metric, so THAT does exist if necessary.
  2. Exemplars only display "difference" attributes (i.e. those the aggregator removed), so in the base case they will not report anything.

So, I don't think baggage-labels on Exemplar is initially important here, but it's a good use case to follow up with. From my view, that's some kind of Measurement => Exemplar function, likely something we should specify on the MeasurementProcessor interface / API.

Do we prefer to model MIN/MAX as exemplars (e.g. a cumulative sum of 100, with MAX 5 and MIN 2)?
Or do we think there are many cases where people just want to know MIN/MAX without all the other details (e.g. trace id, span id, all the attributes, etc.), so they should be modeled as a separate aggregation?

I think MIN/MAX could be exemplars (possibly where we add labels denoting this). However, I don't think that should be the default behavior, and it makes consuming the data a bit harder. I'd prefer reservoir sampling and knowing your min/max are min/max, BUT we could encode min/max into exemplars.
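
The "separate aggregation" alternative can be sketched as plain aggregation state alongside the sum, with no trace context attached. The class name is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class SumWithMinMax:
    """Tracks MIN/MAX as ordinary aggregation fields next to a cumulative
    sum, rather than encoding them as exemplars (illustrative sketch)."""
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def record(self, value: float) -> None:
        # No trace id, span id, or attributes are captured here:
        # min/max are just summary statistics of the stream.
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
```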
