
Metrics: Scope the Views API #466

Closed
jmacd opened this issue Feb 13, 2020 · 12 comments · Fixed by #1730
Labels
area:sdk (Related to the SDK) · priority:p1 (Highest priority level) · release:required-for-ga (Must be resolved before GA release, or nice to have before GA) · spec:metrics (Related to the specification/metrics directory)

Comments

@jmacd
Contributor

jmacd commented Feb 13, 2020

The OpenCensus system gave us a precedent for a "Views" API over metrics, allowing users to configure how instruments are aggregated, including which operator to use and which dimensions to use.

We have debated whether the developers, who are responsible for defining metric instruments, should have the ability to declare aggregations directly, at the point of definition. As an alternative, we have discussed excluding this ability from the instrument declaration and requiring all aggregations to be configured through a Views API.

In the 0.4 specification we are considering removing WithRecommendedKeys, which is the only existing facility in the specification that lets a user configure aggregation.

As a whole, the Views API needs to be scoped for OpenTelemetry. I believe this should be considered an SDK API, meaning the default SDK will implement this facility. Will a Views API be written for every language? (I assume not.) Will some languages expect to export raw events, to allow a Views API to execute in the OTel collector? (I assume so.)
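
For concreteness, here is a hypothetical sketch of what an SDK-level views configuration could look like; the `View` class, `configure_views` function, and the instrument and aggregation names are illustrative assumptions, not anything defined by the specification.

```python
# Hypothetical sketch only: nothing here is part of the OpenTelemetry specification.
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class View:
    instrument_name: str         # which instrument this view applies to
    aggregation: str             # e.g. "sum", "histogram", "last_value"
    dimensions: Tuple[str, ...]  # label keys to aggregate over; other keys are dropped


def configure_views(views: Sequence[View]) -> None:
    """Register views with a hypothetical SDK; instruments without a
    matching view could instead export raw events for the collector."""
    for v in views:
        print(f"{v.instrument_name}: {v.aggregation} over {list(v.dimensions)}")


configure_views([
    View("http.server.duration", "histogram", ("http.method", "http.status_code")),
])
```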

@jmacd jmacd added the spec:metrics Related to the specification/metrics directory label Feb 13, 2020
@jmacd jmacd added this to the v0.5 milestone Feb 13, 2020
@toumorokoshi
Member

I'm not confident in my perspective on every issue here, but I wanted to add some thoughts regardless:

Static vs Dynamic / API driven configuration

Is there some way that both can be supported? It's certainly true that many existing solutions do not provide a way to dynamically configure the aggregation, but it seems to me that the common developer workflow would be:

  1. As a developer, I may choose a cardinality-conservative aggregation but pass more labels than I need. Since cardinality on metrics generally correlates with cost, starting conservative is a pragmatic option.
  2. As issues arise, I will need to use more of the labels to better understand the drivers of my metric. I will probably turn this on temporarily, since it helps me debug an issue at that point in time, and turn it off once the issue is investigated.

Thus I feel that having a static way to define the aggregation initially, for the normal operating case, while allowing dynamic overrides for deeper debugging, is the most desirable scenario.
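
As a rough sketch of that workflow, under a purely hypothetical API (none of these names exist in the spec): a conservative default set of label keys, with a temporary widening applied while debugging.

```python
# Hypothetical sketch of "conservative default, temporary override";
# none of these names come from the OpenTelemetry spec.
DEFAULT_DIMENSIONS = ("env", "service")                    # low cardinality, always on
DEBUG_DIMENSIONS = DEFAULT_DIMENSIONS + ("customer_id",)   # high cardinality, debug only

active_dimensions = {"http.requests": DEFAULT_DIMENSIONS}

def enable_debug_view(metric_name: str) -> None:
    """Temporarily widen the label keys used to aggregate a metric."""
    active_dimensions[metric_name] = DEBUG_DIMENSIONS

def disable_debug_view(metric_name: str) -> None:
    """Drop back to the conservative default once the issue is understood."""
    active_dimensions[metric_name] = DEFAULT_DIMENSIONS
```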

Will a Views API be written for every language

I feel it's valuable to ensure consistent availability of such APIs across languages. It would be quite a shock to rewrite a service in a different language, only to find that some flexibility in your telemetry was lost in the process.

Why is it expected that a views API would not exist in every language?

APIs in the collector

Definitely, the collector is the right place for this type of dynamic dimensionality configuration. I'd almost go so far as to say we shouldn't implement dynamic dimensionality anywhere except the collector, but I imagine that would make things very difficult for those who want to adopt views but can't use the collector for some reason.

Naming: Aggregation over Views?

Just a random thought: Views felt to me like a somewhat intuitive name, but I think aggregations is actually a very eloquent way of describing this feature.

@SergeyKanzhelev
Member

One important note about views: it's valuable to allow multiple views to be defined on a single metric, each with a different set of dimensions and aggregation interval. Each view is defined by and for a separate exporter.
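
A hypothetical illustration of that point, with made-up exporter and instrument names: the same instrument feeding two exporters through two differently configured views.

```python
# Hypothetical: two views over the same instrument, each feeding a
# different exporter with its own dimensions and aggregation interval.
views = [
    {"instrument": "http.server.duration", "aggregation": "histogram",
     "dimensions": ["http.method"], "exporter": "prometheus", "interval_s": 60},
    {"instrument": "http.server.duration", "aggregation": "sum",
     "dimensions": ["http.method", "http.route"], "exporter": "otlp", "interval_s": 10},
]
```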

@jmacd
Contributor Author

jmacd commented Feb 20, 2020

@rghetia and @c24t will take ownership of this.

@jmacd
Contributor Author

jmacd commented Feb 20, 2020

I think aggregations actually is a very eloquent way of describing this feature.

+1

@jmacd
Contributor Author

jmacd commented Feb 20, 2020

having a static way to initially define the aggregation in

Would you accept having to call a second API to configure the default aggregation keys in your SDK? There is a question about keeping or removing the WithRecommendedKeys option, which is provided to allow you to set the aggregation dimensions at the point where you allocate a metric instrument.
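
To make the contrast concrete, here is a hypothetical sketch of the two places the default aggregation keys could be set; neither `new_counter` nor `set_default_aggregation_keys` comes from the spec.

```python
# Hypothetical sketch contrasting where the default aggregation keys
# could be configured; neither call below is defined by the spec.
default_keys = {}

def new_counter(name, recommended_keys=()):
    """(a) Keys declared with the instrument, WithRecommendedKeys-style."""
    if recommended_keys:
        default_keys[name] = tuple(recommended_keys)
    return name

def set_default_aggregation_keys(name, keys):
    """(b) A separate, SDK-level call made later by the application owner."""
    default_keys[name] = tuple(keys)

new_counter("http.requests", recommended_keys=("env", "service"))           # option (a)
set_default_aggregation_keys("http.requests", ("env", "service", "route"))  # option (b)
```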

@c24t c24t assigned rghetia and c24t Feb 20, 2020
@toumorokoshi
Member

Would you accept having to call a second API to configure the default aggregation keys in your SDK?

I'm thinking through this a bit. It would be nice to have a way to define the aggregation I want by default, and do it in a single call with the construction of the metric. I would imagine this could work by passing an aggregation to the metric constructor itself.

One major reason is that in Python, annotating functions with decorators that define a metric is common. So it would be nice to have a single definition:

@counter("foo.bar", labels=("env", "service"), Aggregation=MinMaxSumCountAggregator())
def perform_work():
     ...

(Sorry I know that isn't precisely the interface in the spec, I'm just giving a conceptual example)


@tsloughter
Member

+1 to being in the SDK. The API allows developers to record metrics for their logic but then the end user decides what, if anything, they want to do with those recordings.

The only exception is the counter. I like the counter as an optimization by the developer for the times when a count is really all that makes sense for a metric. In many languages it can be optimized to a very efficient update function that runs inline and skips any collection of metric recordings and aggregation logic.

Allowing the raw events (except for counters, in my opinion) to be sent to the collector for aggregation, with views being defined there, is a nice idea too. A compact protocol for sending those is a must in that case.

So I disagree with allowing an aggregation on a counter through the API, as in @toumorokoshi's Python example. Or at the very least, if aggregation is to be part of the API, an instrument should have to be a measure to take an aggregation that isn't a sum.
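
A hypothetical illustration of the counter fast path mentioned above (not an actual SDK implementation): the hot path is a plain in-memory sum, with no per-event recording and no configurable aggregation involved.

```python
# Hypothetical illustration only: a counter add() that is an inline
# accumulation, skipping any recording/aggregation machinery.
from collections import defaultdict
from threading import Lock

_counters = defaultdict(float)
_lock = Lock()

def counter_add(name, value=1.0):
    """Inline update; the only aggregation a counter needs is a sum."""
    with _lock:
        _counters[name] += value

counter_add("http.requests")
```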

@toumorokoshi
Member

This is a tricky situation. I understand that exposing some things via the API is undesirable, and that one goal here is to reduce the amount of work and background knowledge required on the API consumer side.

I think it's very common for the one who instruments to be opinionated in how the data is aggregated.

Within Zillow Group, we have a global aggregation configuration setting for timing metrics (similar to statsite's max, min, avg, pXX). A common ask from users is to have different aggregations for their specific metrics. They have some understanding of the behavior of the metric (e.g. high volume) that dictates their choice of aggregation.

It could be a two-call operation, 1. instrument and 2. choose the aggregation, but how would that look to the user? One thing I like about our current trace API is that the user basically needs to know nothing about how the SDK is actually configured. I can configure the span processor and exporter in a generic fashion, and the user has control over what to instrument, plus a way to tell the exporter that a specific trace/span must be emitted (the TraceFlags). If possible, I'd like the metric side of things to be similarly hands-off.

@c24t
Member

c24t commented Apr 9, 2020

Linking the views OTEP here: open-telemetry/oteps#89.

@carlosalberto carlosalberto modified the milestones: v0.5, v0.6 Jun 9, 2020
@andrewhsu andrewhsu added release:required-for-ga Must be resolved before GA release, or nice to have before GA area:api Cross language API specification issue area:sdk Related to the SDK priority:p2 Medium priority level and removed area:api Cross language API specification issue labels Jul 28, 2020
@andrewhsu andrewhsu assigned jkwatson and bogdandrutu and unassigned rghetia, c24t and jkwatson Oct 20, 2020
@jsuereth
Contributor

My $.02 here: we have three use cases:

  1. I think having a default aggregation / "view" for instruments is a wonderful productivity boon for users, especially around synchronous instruments that need to be aggregated before reporting at known intervals.
  2. I think there's a use case where the default aggregation is NOT the one the user wants. In scenarios where the author of the instrument understands the full semantics and wants a different default aggregation, they likely need a mechanism to do so. In OpenCensus, since all aggregation required configuration, this was a solved problem.
  3. I think it'd be highly useful for folks to report MULTIPLE metric data streams with different aggregations from the same instrument. This may be more important if I consume some instrumentation library and want to reconfigure its default aggregation to work better with my backend. As @toumorokoshi points out, this could also be a temporary aggregation to diagnose a problem in production.

OpenCensus views solved 1 and 2, but I think struggled a bit with 3. Whatever solutions OTel provides, I think we should target these 3 use cases, and possibly with 3 mechanisms (not one). I.e. I don't think a "views" API should be used to solve use case 3. I also don't think an API which solves 3 should be the only way to deal with a bad default aggregation (use case 2).

Can we split the discussion into two problems/solutions:

  • An API to allow users to choose a different default aggregation. This should be an acceptable option if we find an Instrument type that could be aggregated in more than one way. (Use case 2)
  • An API/Configuration/Processing that allows users to provide additional aggregations on existing instruments. (Use case 3)
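
A hypothetical sketch of keeping those two mechanisms separate; the names below are illustrative assumptions, not anything from the spec.

```python
# Hypothetical sketch of the two separate mechanisms; illustrative names only.

# Use case 2: the instrument author picks a non-default aggregation at the
# point of definition (pseudocode, hence commented out):
#   wait_time = meter.new_measure("queue.wait_time", default_aggregation="last_value")

# Use case 3: the application owner (or the collector) layers extra views
# on an existing instrument, without touching the instrumentation:
extra_views = [
    {"instrument": "queue.wait_time", "aggregation": "histogram", "dimensions": ["queue.name"]},
    {"instrument": "queue.wait_time", "aggregation": "count", "dimensions": []},
]
```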

@tsloughter
Member

Agreed.

Also, OpenCensus solved (3) well; it was one of its strengths in my opinion. For example, you could aggregate the request latency measure as a count of the number of requests.
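
Roughly how that looked with the OpenCensus Python stats API (sketched from memory, so treat the details as approximate): a latency measure exposed through an additional view that simply counts recordings.

```python
from opencensus.stats import aggregation as aggregation_module
from opencensus.stats import measure as measure_module
from opencensus.stats import view as view_module

# A latency measure recorded by the instrumentation.
latency_ms = measure_module.MeasureFloat(
    "request_latency", "End-to-end request latency", "ms")

# A view that ignores the recorded values and just counts the recordings,
# i.e. the number of requests.
request_count_view = view_module.View(
    "request_count",
    "Number of requests",
    [],                                   # no tag keys / dimensions
    latency_ms,
    aggregation_module.CountAggregation())
```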

This issue was closed.