Add metrics view API #89

c24t · 2020-03-05T19:06:25Z

This OTEP addresses the Future Work: Configurable Aggregations / View API section of api-metrics.md. It proposes a specification for a view API that allows users to configure aggregations for individual metric instruments.

It covers some of the same ground as open-telemetry/opentelemetry-specification#347, and summarizes a few weeks of conversation in metrics SIG meetings.

See also the Google Doc draft.

text/0089-metrics-views-api.md

tsloughter · 2020-03-06T17:22:05Z

I will do my best to make the metrics sig meeting on the 11th where this might be easier to discuss, but my first issue with this proposal is simply allowing Counter to be used for anything but Sum.

Having an explicit Counter instrument type allows for optimizations that are lost if the Counter has to be recorded the same as a Measure, where within a collection interval all values are recorded and then aggregated. If the user wants this I'd expect them to use a Measure and if they want a Sum they'd use a Counter, something that can rely on high performance counters on some runtimes.

c24t · 2020-03-06T19:50:08Z

recorded the same as a Measure, where within a collection interval all values are recorded and then aggregated

The values should actually be continuously aggregated, so we update the aggregate sum with each new measurement. Other aggregations would work this way too, e.g. a histogram aggregation would update a bucket count for each new recorded value, and wouldn't keep a reference to the measurement. Only the "exact" aggregation would store measurements.
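A minimal sketch of the continuous aggregation described above, assuming hypothetical `SumAggregator`, `HistogramAggregator`, and `ExactAggregator` classes with an `update` method (none of these names come from the OTEP). Only the "exact" aggregator keeps the measurements themselves.

```python
import bisect


class SumAggregator:
    """Keeps only a running total; individual measurements are discarded."""

    def __init__(self):
        self.total = 0

    def update(self, value):
        self.total += value


class HistogramAggregator:
    """Keeps one counter per bucket (hypothetical layout: upper-bound boundaries)."""

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)
        # One extra bucket catches values above the last boundary.
        self.counts = [0] * (len(self.boundaries) + 1)

    def update(self, value):
        # bisect_left returns the index of the first boundary >= value.
        self.counts[bisect.bisect_left(self.boundaries, value)] += 1


class ExactAggregator:
    """The only aggregator here that stores every measurement."""

    def __init__(self):
        self.values = []

    def update(self, value):
        self.values.append(value)


aggregators = [SumAggregator(), HistogramAggregator([1, 10, 100]), ExactAggregator()]
for measurement in [0.5, 7, 7, 250]:
    for aggregator in aggregators:
        aggregator.update(measurement)
```

In this sketch the memory used by the sum and histogram aggregators is fixed regardless of how many measurements arrive between collections.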

It is odd that the view API lets you register a non-sum aggregation for a counter instrument though. What do you recommend here?

tsloughter · 2020-03-06T21:18:35Z

I would just remove the non-sum aggregation from the view API for counter instrument.

jkwatson · 2020-03-06T21:20:23Z

text/0089-metrics-views-api.md

+
+The potentially contentious changes this OTEP proposes include:
+
+- Require views to be registered: don't record measurements from API for metric instruments that don't appear in any view.


If views are registered by the operator/application developer, does this mean that that person will get no metric data if they don't add additional configuration? How can they know a priori what metrics they need to register views for?

The way I've done it is each library exports a function with predefined views. In the simplest case the application developer either enables them or not, while still having the ability to instead define their own, or take a subset of the default views a library exports.
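One way this pattern could look, as a sketch only: the `View` record, the `default_views()` function, the `ViewRegistry` class, and the instrument names are all hypothetical and are not taken from any OpenTelemetry or OpenCensus package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    name: str
    instrument: str
    aggregation: str
    label_keys: tuple = ()


# --- inside a hypothetical instrumented library ---
def default_views():
    """Views the library author recommends for the library's instruments."""
    return [
        View("http.requests", "http.server.requests", "sum", ("method",)),
        View("http.latency", "http.server.duration", "histogram", ("method", "status")),
    ]


# --- inside the application ---
class ViewRegistry:
    def __init__(self):
        self._views = {}

    def register(self, view):
        # Duplicate view names are rejected in this sketch.
        if view.name in self._views:
            raise ValueError(f"view {view.name!r} is already registered")
        self._views[view.name] = view

    def names(self):
        return sorted(self._views)


registry = ViewRegistry()
# Enable a subset of the library defaults, then add an application-defined view.
for view in default_views():
    if view.aggregation == "sum":
        registry.register(view)
registry.register(View("http.latency.exact", "http.server.duration", "exact"))
```

The application could equally call `registry.register` on every entry of `default_views()` to accept the library's recommendations wholesale.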

So the operator/application developer has to allow metrics from each library, instead of them being automatic and having to have a way to disable what you don't want.

I'm now thinking of the auto-instrumentation projects, where there may be dozens of libraries being instrumented that the operator may or may not be familiar with. This is going to be a very surprising thing for "normal" auto-instrumentation agent users to have to do, as they are used to getting things OOTB without extra configuration. Would the auto-instrumentation agent then be expected to just register all the defaults provided with the instrumented libraries?

does this mean that that person will get no metric data if they don't add additional configuration?

It does!

Would the auto-instrumentation agent then be expected to just register all the defaults provided with the instrumented libraries?

That's what I would expect, and IIRC it's what we did in OC. For example, the predefined default views for gRPC are here: https://github.com/census-instrumentation/opencensus-java/blob/8b1fd5bbf98b27d0ad27394891e0c64c1171cb2b/contrib/grpc_metrics/src/main/java/io/opencensus/contrib/grpc/metrics/RpcViewConstants.java.

Auto-instrumentation could enable all these views when it loads the gRPC integration.

@trask @prydin @tylerbenson thoughts on this? It definitely seems on the surface like it would make the auto-instrumentation job significantly more difficult. Especially since auto-instrumentation still has to support multiple backends with potentially different aggregations/metric types.

@c24t This only seems viable to me if there's a way for an exporter author to say (for example) "for everything that's a Measure, create a view with a MinMaxSumCount aggregation". Otherwise, I feel like this is a recipe for users thinking that nothing is working, causing a support nightmare.

That sounds like a good solution to me, and since measures already have unique names I don't see any reason in principle that we couldn't do this.
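A rough sketch of that fallback, assuming a hypothetical `resolve_aggregation` helper and a per-kind default table. The Counter-to-Sum and Measure-to-MinMaxSumCount pairings follow the examples in this thread; they are not defined by the OTEP.

```python
# Assumed defaults: these pairings mirror the discussion, not a specification.
DEFAULT_AGGREGATION_BY_KIND = {
    "counter": "sum",
    "measure": "minmaxsumcount",
}


def resolve_aggregation(name, kind, explicit_views):
    """Pick the aggregation for an instrument.

    An explicitly registered view wins; otherwise fall back to the default
    for the instrument kind. Returns None when nothing applies.
    """
    if name in explicit_views:
        return explicit_views[name]
    return DEFAULT_AGGREGATION_BY_KIND.get(kind)


explicit_views = {"rpc.latency": "histogram"}
instruments = [
    ("rpc.latency", "measure"),
    ("rpc.bytes_sent", "measure"),
    ("rpc.calls", "counter"),
]
resolved = {
    name: resolve_aggregation(name, kind, explicit_views)
    for name, kind in instruments
}
```

With a table like this, an instrument that nobody registered a view for still produces data, which addresses the "nothing is working" concern.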

@bogdandrutu might be able to shed some light on the design of OC views.

I added a note about this in 46948ac.

jkwatson · 2020-03-06T21:28:51Z

Does this proposed Views API live at the same level as the other APIs, or is it a default-SDK-only thing? I ask because this would be the first "API" that doesn't target instrumentors and instead targets operators/app developers. It feels like that will lead to confusion (it confuses me!) among the user base, especially the instrumentation-building user.

c24t · 2020-03-06T22:51:13Z

Does this proposed Views API live at the same level as the other APIs, or is it a default-SDK-only thing?

Views would live in the API package. Developers (as I understand the persona) would use metric instruments in application code, operators (same caveat) would specify views in either static config or code. Aggregators are SDK objects, and a view might rely on some third party package to define a particular aggregator class.

lzchen · 2020-04-30T17:32:42Z

Does this proposed Views API live at the same level as the other APIs, or is it a default-SDK-only thing?

Views would live in the API package. Developers (as I understand the persona) would use metric instruments in application code, operators (same caveat) would specify views in either static config or code. Aggregators are SDK objects, and a view might rely on some third party package to define a particular aggregator class.

I vaguely remember a discussion about this in a SIG meeting: the Views API would be part of the SDK, not the API. It makes sense to me to keep the Views API in the SDK only, as the intended users are application writers/operators, not library authors.

Yes I believe the comment was made before the discussion in the SIG meeting. You are correct.

cijothomas · 2020-04-30T21:57:06Z

text/0089-metrics-views-api.md

+
+Each view describes a relationship between a metric instrument and an aggregation, and specifies the set of label keys to preserve in the aggregated data.
+
+Views are tracked in a registry, which exporters may use to get the current aggregate values for all registered aggregations at export time.


Are views global, and hence registered with the MeterProvider, so that all meters created by that MeterProvider get access to them?

Additional thought around this: would it be valuable to enable specific aggregations to be tied to specific exporters?

I feel like this is another reason why aggregations should be tied to a MeterProvider. Then you can choose different meterprovider sets for different metrics, as necessary.

Something like:

MeterProvider - exporters - aggregators - metrics (this may be unnecessary)
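To make the layering concrete, here is a sketch in which each provider owns its own views and exporters. The `MeterProvider` class below is a hypothetical stand-in, not the SDK class, and exporters are modeled as plain callables. It also shows a view keeping only its label keys and leaving missing labels undefined (`None`).

```python
class MeterProvider:
    """Hypothetical provider that owns its own views and exporters."""

    def __init__(self, exporters):
        self.exporters = list(exporters)
        self.views = {}  # view name -> (instrument name, label keys)
        self.sums = {}   # (view name, projected label values) -> running sum

    def register_view(self, name, instrument, label_keys):
        if name in self.views:
            raise ValueError(f"view {name!r} is already registered")
        self.views[name] = (instrument, tuple(label_keys))

    def record(self, instrument, value, labels):
        for view_name, (view_instrument, label_keys) in self.views.items():
            if view_instrument != instrument:
                continue
            # Keep only the view's label keys; missing labels stay None.
            key = (view_name, tuple(labels.get(k) for k in label_keys))
            self.sums[key] = self.sums.get(key, 0) + value

    def export(self):
        for exporter in self.exporters:
            exporter(dict(self.sums))


frontend_out, billing_out = [], []
frontend = MeterProvider([frontend_out.append])
billing = MeterProvider([billing_out.append])
frontend.register_view("requests_by_method", "requests", ["method"])
billing.register_view("requests_by_tenant", "requests", ["tenant"])

for provider in (frontend, billing):
    provider.record("requests", 1, {"method": "GET", "tenant": "a"})
    provider.record("requests", 1, {"method": "GET"})  # tenant is undefined
    provider.export()
```

Each exporter only ever sees the aggregations registered on its own provider, which is one way to tie specific aggregations to specific exporters.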

cijothomas · 2020-04-30T21:59:03Z

text/0089-metrics-views-api.md

+Because we don't require the user to specify the set of label keys up front, and because we don't prevent users from recording measurements with missing labels in the API, some label values may be undefined.
+Aggregators should preserve undefined label values, and exporters may convert them as required by the backend.
+
+For example, consider a Sum-aggregated Counter instrument that captures four consecutive measurements:


Can you also add a note about bound instruments and views? Bound instruments are supposed to be the fastest way to record a value, as they avoid a lookup for the time series. With views, bound instruments should still be the fastest, and should not be doing lookups.

This may be an implementation detail, but it is worth calling out in the spec as well.
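A sketch of how a bound instrument could stay lookup-free under a view, assuming hypothetical `Counter` and `BoundCounter` classes: the view's label projection and the time-series lookup both happen once, at bind time.

```python
class Counter:
    """Hypothetical counter whose view keeps only `label_keys`."""

    def __init__(self, name, label_keys):
        self.name = name
        self.label_keys = tuple(label_keys)
        self._sums = {}   # projected label values -> one-element list (the sum)
        self.lookups = 0  # counts time-series lookups, for illustration only

    def _cell(self, labels):
        self.lookups += 1
        key = tuple(labels.get(k) for k in self.label_keys)
        return self._sums.setdefault(key, [0])

    def add(self, value, labels):
        # Direct call: pays for a lookup on every measurement.
        self._cell(labels)[0] += value

    def bind(self, labels):
        # Bound call: the projection and lookup happen once, here.
        return BoundCounter(self._cell(labels))


class BoundCounter:
    def __init__(self, cell):
        self._cell = cell

    def add(self, value):
        # No label handling and no dictionary access on the hot path.
        self._cell[0] += value


counter = Counter("requests", label_keys=("method",))
bound = counter.bind({"method": "GET", "path": "/"})
for _ in range(1000):
    bound.add(1)
counter.add(1, {"method": "GET"})
```

Here the thousand bound calls and the single direct call update the same time series, but only two lookups are performed in total.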

cijothomas

Thanks @c24t for getting this started!
I have left some comments, mostly after looking at the Python view prototype and the .NET prototype I am working on.

Most important things I'd like to see addressed:

- Where are views registered? On the MeterProvider? And should we allow views to be added or removed after the MeterProvider is initialized?
- A recommendation about views being configured through static files, JSON, etc.
- Default behavior if no views are registered. I believe the consensus is to apply a default aggregation and retain all labels unless a user-configured view overrides this.
- Should views be part of the SDK only?

toumorokoshi · 2020-05-03T04:54:25Z

text/0089-metrics-views-api.md

+
+The aggregation may be configured with options specific to its type.
+For example, a _Histogram_ aggregation may be configured with bucket boundaries, e.g. `{0, 1, 10, 100, 200, 1000, inf}`.
+A _Sketch_ aggregation that estimates order statistics (i.e. quantiles), may be configured with a set of predetermined quantiles, e.g. `{.5, .95, .99, 1.00}`.


Could this be called "quantile" aggregation? A sketch doesn't immediately make me think of quantiles.

toumorokoshi · 2020-05-03T04:57:06Z

text/0089-metrics-views-api.md

+Implementations should refuse to register two views with the same name.
+
+Note that a view does not describe the type (e.g. `int`, `float`) or unit of measurement (e.g. "bytes", "milliseconds") of the metric to be exported.
+The unit is determined by the metric instrument, and the aggregation and exporter may preserve or change the unit.


It seems like there are only two operations that are appropriate for modifying the unit of a metric instrument:

- preserve the unit
- change the unit entirely (e.g. to a count)

I feel like it may be valuable to call out that there won't be a modification such as "seconds" to "milliseconds". But maybe that's just obvious.

toumorokoshi · 2020-05-03T04:59:17Z

text/0089-metrics-views-api.md

+- A **float**-valued _mean_ metric with "ms" units
+- An **int** valued, unitless (i.e. unit "1") _count_ metric
+
+This OTEP doesn't propose a particular API for Aggregators, just that the API is sufficient for exporters to get all this information, including:


Should the OTEP recommend an API? Reading the examples, it seems like (name, value, unit) might work; I believe it satisfies the examples below.
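As a sketch of what a (name, value, unit) shape could look like, assuming a hypothetical `MetricPoint` tuple and `MeanCountAggregator` class (neither is proposed by the OTEP): the mean keeps the instrument's unit, while the count is unitless ("1").

```python
from typing import NamedTuple, Union


class MetricPoint(NamedTuple):
    name: str
    value: Union[int, float]
    unit: str


class MeanCountAggregator:
    """Hypothetical aggregator that reports a mean and a count."""

    def __init__(self, instrument_name, instrument_unit):
        self.instrument_name = instrument_name
        self.instrument_unit = instrument_unit
        self._sum = 0
        self._count = 0

    def update(self, value):
        self._sum += value
        self._count += 1

    def points(self):
        # The count changes the unit entirely; it is always unitless.
        points = [MetricPoint(f"{self.instrument_name}.count", self._count, "1")]
        if self._count:
            # The mean preserves the unit of the underlying instrument.
            mean = self._sum / self._count
            points.append(
                MetricPoint(f"{self.instrument_name}.mean", mean, self.instrument_unit)
            )
        return points


latency = MeanCountAggregator("latency", "ms")
for value in (10, 20, 60):
    latency.update(value)
exported = latency.points()
```

An exporter reading `exported` gets the type from the Python value and the unit from the tuple, without needing to know which aggregation produced it.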

toumorokoshi · 2020-05-03T05:26:03Z

text/0089-metrics-views-api.md

+min([1, 2, 3, 4, 5, 6]) = min([min([1]), min([2, 3]), min([4, 5, 6])])
+```
+
+but quantile aggregations, e.g. _p95_ and _p99_ are not.


Does this mean that quantiles are not supported as an aggregation? There's a strong need for quantiles, so I assume there's some way to ensure that they are measured and exported, but it's not clear how.

toumorokoshi · 2020-05-03T05:27:56Z

text/0089-metrics-views-api.md

+but quantile aggregations, e.g. _p95_ and _p99_ are not.
+Applications that export quantile metrics should use a mergeable aggregations such as [DDSketch](https://arxiv.org/abs/1908.10693), which estimates quantile values with bounded errors, or export raw measurements without aggregation and compute exact quantiles on the backend.
+
+We require aggregations to be mergeable so that they produce the same results regardless of the collection interval, or the number of collection events per export interval.


Again looking at quantiles as the edge case: why do the produced results need to be the same regardless of collection interval? Is there some caveat where the collection interval cannot be guaranteed? I guess in the current API design that lives on the pushController side of things.
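A small worked example of the mergeability property under discussion, using the same three collection intervals as the `min` example in the OTEP text. The median (p50) stands in for p95/p99 here because it is the simplest quantile to compute by hand.

```python
import statistics

# Measurements captured in three consecutive collection intervals.
intervals = [[1], [2, 3], [4, 5, 6]]
all_values = [value for interval in intervals for value in interval]

# Mergeable: combining per-interval results equals aggregating everything.
min_exact = min(all_values)
min_merged = min(min(interval) for interval in intervals)
sum_exact = sum(all_values)
sum_merged = sum(sum(interval) for interval in intervals)

# Not mergeable: the median of per-interval medians is not the true median.
median_exact = statistics.median(all_values)
median_of_medians = statistics.median(
    [statistics.median(interval) for interval in intervals]
)
```

Here `median_exact` is 3.5 while `median_of_medians` is 2.5, so the exported quantile would depend on how the measurements happened to be split across collection intervals.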

toumorokoshi · 2020-05-03T05:29:51Z

text/0089-metrics-views-api.md

+
+Every measurement is associated with a _LabelSet_, a set of key-value pairs that describes the environment in which the measurement was captured.
+Label keys and values may be extracted from the [correlation context](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/api-correlationcontext.md) at the time of measurement.
+They may also be set by the user [at the time of measurement](https://github.com/open-telemetry/opentelemetry-specification/blob/f4aeb318a5b77c9c39132a8cbc5d995e222d124f/specification/api-metrics-user.md#direct-instrument-calling-convention) or at the time at which [the metric instrument is bound](https://github.com/open-telemetry/opentelemetry-specification/blob/f4aeb318a5b77c9c39132a8cbc5d995e222d124f/specification/api-metrics-user.md#bound-instrument-calling-convention).


How specifically are values from the correlation context piped into the LabelSet? I figured that the user would still be responsible for that, but since this calls out label extraction as a separate line, it sounds like that's not the case.

jkwatson · 2020-07-13T21:35:18Z

My opinions on this:

- I think we absolutely should have default aggregations, and I think the metrics SIG did agree to that.
- I don't think exporters should have to interact with the Views API unless they want to tell the SDK that they want only delta aggregations for a given instrument type (for example). Exporters shouldn't get their data via views; rather, the SDK uses the view definition to figure out which aggregation/batching strategy to use.
- Things I think we should target for GA: being able to specify label reductions, delta/cumulative temporality, and the type of aggregation to use, based on the instrument descriptor.

This OTEP seems to range well beyond the scope of the basic, SDK-only views API that I think we should target for GA.
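As a sketch of the three knobs listed above (aggregation type, temporality, label reduction), assuming a hypothetical `ViewConfig` record and helper functions; none of these names come from the SDK. The conversion functions show how the same sum series looks under delta and cumulative temporality.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewConfig:
    aggregation: str   # e.g. "sum"
    temporality: str   # "delta" or "cumulative"
    label_keys: tuple  # labels kept after reduction


def to_delta(cumulative_points):
    """Convert successive cumulative sums into per-interval deltas."""
    deltas, previous = [], 0
    for total in cumulative_points:
        deltas.append(total - previous)
        previous = total
    return deltas


def to_cumulative(delta_points):
    """Convert per-interval deltas back into running totals."""
    return list(itertools.accumulate(delta_points))


def series_for_export(config, cumulative_points):
    """Shape a cumulative sum series the way the view's temporality asks."""
    if config.temporality == "delta":
        return to_delta(cumulative_points)
    return list(cumulative_points)


collected = [3, 5, 5, 9]  # cumulative sums at four collection events
delta_view = ViewConfig("sum", "delta", ("method",))
cumulative_view = ViewConfig("sum", "cumulative", ("method",))
delta_series = series_for_export(delta_view, collected)
cumulative_series = series_for_export(cumulative_view, collected)
```

An exporter that prefers deltas would only need to influence the `temporality` field in this sketch; it would not have to read its data through the view itself.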

jmacd · 2021-05-25T07:31:57Z

Closing this as obsolete.
❤️ @c24t

morigs · 2021-06-30T14:35:21Z

Excuse me, it's unclear what the current vision for metrics aggregation is. Are there other OTEPs with the same target?

Add metrics view API

8b6d17a

c24t requested review from arminru, bogdandrutu, carlosalberto, iredelmeier, jmacd, reyang, SergeyKanzhelev, tedsuo, tigrannajaryan and yurishkuro as code owners March 5, 2020 19:06

jkwatson reviewed Mar 5, 2020

c24t added 5 commits March 5, 2020 13:27

Clarify users

8919885

Formatting fixes and updates from doc

6dba262

Formatting fixes

c78c23b

No LaTeX :(

8a25e19

Add open questions section

e98068f

Fix ungrouped, default batcher references

fb79f78

jkwatson reviewed Mar 6, 2020

Add question about standard aggregations

db5c442

c24t added 3 commits March 6, 2020 15:03

Add note on observer aggregation

d71ee90

Formatting fixes

b01f4dc

Add automatic view creation as future possibility

46948ac

c24t added the metrics Relates to the Metrics API/SDK label Mar 26, 2020

c24t mentioned this pull request Apr 9, 2020

Metrics: Scope the Views API open-telemetry/opentelemetry-specification#466

Closed

jmacd mentioned this pull request Apr 30, 2020

Buckets should be defined at instrument level open-telemetry/opentelemetry-go#689

Closed

cijothomas reviewed Apr 30, 2020

cijothomas requested changes Apr 30, 2020

toumorokoshi reviewed May 3, 2020

nilebox mentioned this pull request May 13, 2020

Port Metrics exporter from OpenCensus GoogleCloudPlatform/opentelemetry-operations-java#4

Closed

nilebox mentioned this pull request Jun 9, 2020

Allow histograms to specify negative bucket bounds open-telemetry/opentelemetry-proto#156

Closed

jmacd mentioned this pull request Jun 16, 2020

Add metrics semantic conventions for timed operations open-telemetry/opentelemetry-specification#657

Closed

jmacd mentioned this pull request Jun 24, 2020

Metrics Transform Processor Proposal open-telemetry/opentelemetry-collector-contrib#332

Closed

sonofachamp mentioned this pull request Jun 24, 2020

Add StatsD receiver open-telemetry/opentelemetry-collector-contrib#290

Closed

vmarchaud mentioned this pull request Aug 26, 2020

A way to specify metric aggregator open-telemetry/opentelemetry-js#1465

Closed

dyladan mentioned this pull request Aug 28, 2020

[Metrics] Views API Tracking Issue open-telemetry/opentelemetry-js#1477

Closed

jmacd mentioned this pull request Sep 11, 2020

System metrics semantic conventions open-telemetry/opentelemetry-specification#937

Merged

rinx mentioned this pull request Sep 24, 2020

Use OpenTelemetry-go instead of OpenCensus-go vdaas/vald#722

Closed

18 tasks

ericgribkoff mentioned this pull request Sep 29, 2020

Export detailed metrics via OpenTelemetry grpc/grpc-java#7429

Closed

alanwest mentioned this pull request Oct 19, 2020

Capture http.server.duration metric for ASP.NET Core instrumentation open-telemetry/opentelemetry-dotnet#1347

Closed

2 tasks

tsloughter mentioned this pull request Nov 27, 2020

Views API open-telemetry/opentelemetry-erlang#163

Closed

beanliu mentioned this pull request Jan 18, 2021

Metric View API Prototype open-telemetry/opentelemetry-go#1473

Closed

Base automatically changed from master to main January 27, 2021 20:37

bogdandrutu requested a review from a team as a code owner January 27, 2021 20:37

jmacd closed this May 25, 2021

This pull request was closed.
