Rationalize naming of metric instruments and their default aggregations #96

Closed
153 changes: 153 additions & 0 deletions text/0096-metric-instrument-terminology.md
# Rationalize naming of metric instruments and their default aggregations

Propose final names for the seven metric instruments introduced in [OTEP 93](https://github.com/open-telemetry/oteps/pull/93) and address related confusion.

## Motivation

[OTEP 88](https://github.com/open-telemetry/oteps/pull/88) introduced
a logical structure for metric instruments with two foundational
categories of instrument, called "synchronous" vs. "asynchronous",
named "Measure" and "Observer" in the abstract. This proposal
identified four kinds of "refinement" and mapped out the space of
_possible_ instruments, while not proposing which would actually be
included in the standard.

[OTEP 93](https://github.com/open-telemetry/oteps/pull/93) followed
with a list of six standard instruments, the most necessary and useful
combination of instrument refinements, plus one special case used to
record timing measurements.

This proposal finalizes the names used to describe the seven
instruments above, seeking to address core confusion related to
"Measure":

1. OTEP 88 stipulates that the terms currently in use to name
synchronous and asynchronous instruments become abstract, but it also
uses "Measure-like" and "Observer-like" to discuss instruments with
refinements. This proposal states that we shall prefer the
adjectives "synchronous" and "asynchronous", commonly abbreviated
"Sync" and "Async", when describing instruments.
2. Prior to OTEP 88, and even with OTEPs 88 and 93 included, there is
inconsistency in the naming of instruments. Note that "Counter" and
"Observer" end in "-er", a noun suffix used in the sense of "[person
occupationally connected
with](https://www.merriam-webster.com/dictionary/-er)", while the term
"Measure" does not fit this pattern. This proposal replaces the
abstract term "Measure" with "Recorder", since the associated method
name (verb) is specified as `Record()`.
3. The OTEP 93 asynchronous instruments ("LastValueObserver",
"DeltaObserver", and "CumulativeObserver") have the pattern
"-Observer", while the OTEP 93 synchronous instruments
("Counter", "UpDownCounter", "Distribution", "Timing") do not. This
proposal keeps "Counter" and "UpDownCounter" for Sum-only synchronous
instruments and renames "Distribution" and "Timing" to follow a
"-Recorder" pattern, yielding "Recorder" and "TimingRecorder".
4. Confusion over the loss of "Gauge" is addressed by replacing
"LastValueObserver" with "GaugeObserver".

This proposal also repeats the current specification of the default
Aggregator for each kind of instrument.

## Explanation

The following table summarizes the four synchronous instruments and
three asynchronous instruments that will be standardized as a result
of this set of proposals.

| Existing name | OTEP 93 name | **Final name** | Sync or Async | Function | Default aggregation | Measurement kind | Rate support |
| ------------- | ------------------ | ---------------------- | ----------- | ------------- | ---------- | ---- | --- |
| Counter | Counter | **Counter** | Sync | Add() | Sum | Delta | Yes |
| | UpDownCounter | **UpDownCounter** | Sync | Add() | Sum | Delta | No |
| Measure | Distribution | **Recorder** | Sync | Record() | MinMaxSumCount | Instantaneous | No |
| | Timing | **TimingRecorder** | Sync | Record() | MinMaxSumCount | Instantaneous | No |
| Observer | LastValueObserver | **GaugeObserver** | Async | Observe() | MinMaxSumCount | Instantaneous | No |
| | DeltaObserver | **DeltaObserver** | Async | Observe() | Sum | Delta | Yes |
| | CumulativeObserver | **CumulativeObserver** | Async | Observe() | Sum | Cumulative | Yes |
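
The table above can also be read as data. The following Python sketch is illustrative only: `InstrumentKind`, `INSTRUMENTS`, and `is_sum_only` are hypothetical names chosen for this example, not part of any OpenTelemetry API. It encodes the seven final names with the columns from the table.

```python
from typing import NamedTuple


class InstrumentKind(NamedTuple):
    """One row of the table (hypothetical type, for illustration only)."""
    synchronous: bool
    function: str
    default_aggregation: str
    measurement_kind: str
    rate_support: bool


# Final instrument names mapped to their properties, copied from the table.
INSTRUMENTS = {
    "Counter": InstrumentKind(True, "Add", "Sum", "Delta", True),
    "UpDownCounter": InstrumentKind(True, "Add", "Sum", "Delta", False),
    "Recorder": InstrumentKind(True, "Record", "MinMaxSumCount", "Instantaneous", False),
    "TimingRecorder": InstrumentKind(True, "Record", "MinMaxSumCount", "Instantaneous", False),
    "GaugeObserver": InstrumentKind(False, "Observe", "MinMaxSumCount", "Instantaneous", False),
    "DeltaObserver": InstrumentKind(False, "Observe", "Sum", "Delta", True),
    "CumulativeObserver": InstrumentKind(False, "Observe", "Sum", "Cumulative", True),
}


def is_sum_only(name: str) -> bool:
    """An instrument is treated as Sum-only here when its default aggregation is Sum."""
    return INSTRUMENTS[name].default_aggregation == "Sum"
```

Read this way, the four synchronous and three asynchronous instruments, and the split between Sum and MinMaxSumCount defaults, can be checked mechanically.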

The argument for "Recorder" instead of "Distribution" is that we
should prefer instrument names associated with the action being
performed ("occupationally connected"), not the value being computed,
as the latter is dependent on SDK configuration. A "Recorder" records
a value that is part of a distribution. A "Counter" counts a value
that is part of a sum. A "GaugeObserver" observes an instantaneous
value ("reads a gauge"). A "Recorder" records an arbitrary value. A
"TimingRecorder" records a timing value, and so on.

> Review comment (Contributor): It is confusing that "Recorder" is included twice in this list, once saying it records a distribution and once saying it records an arbitrary value. I think the latter is more accurate. If a view of a Recorder were to 'aggregate' with an array (i.e. no aggregation), saying a "Recorder" records a value that is part of a distribution, while not incorrect, isn't precise.

## Details

This proposal consolidates OTEP 88 and OTEP 93 and proposes a consistent
pattern for naming instruments. It will be the source of truth when
applying OTEP 88 and OTEP 93 to the OpenTelemetry metrics specification.

### Function names

The function names of the standard instruments are determined as
follows.

#### Counter and UpDownCounter

Counter and UpDownCounter instruments use `Add()` as the function
name, since they capture deltas to a Sum-only instrument. We prefer
`Add()` as opposed to `Count()`, since floating point numbers are
supported, to avoid an association with "Countable" numbers, a
mathematical term associated with natural numbers.
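
As a minimal sketch of the `Add()` semantics described above, the following Python classes accumulate floating point deltas into a single sum. The class layout and the `total` attribute are hypothetical, the method is spelled `add()` to follow Python convention, and the rule that a Counter rejects negative deltas is an assumption carried over from OTEP 93 rather than something stated in this section.

```python
class UpDownCounter:
    """Sum-only synchronous instrument: each add() call captures a delta."""

    def __init__(self) -> None:
        self.total = 0.0  # hypothetical attribute holding the running sum

    def add(self, delta: float) -> None:
        # Floating point deltas are supported, which is why the function
        # is named Add() rather than Count().
        self.total += delta


class Counter(UpDownCounter):
    """Like UpDownCounter, but assumed here to accept only non-negative deltas."""

    def add(self, delta: float) -> None:
        if delta < 0:
            raise ValueError("Counter deltas must be non-negative")
        super().add(delta)


bytes_sent = Counter()
bytes_sent.add(1)
bytes_sent.add(0.5)      # a fractional delta is valid

queue_depth = UpDownCounter()
queue_depth.add(3)
queue_depth.add(-2)      # an UpDownCounter may go down
```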

#### Recorder and TimingRecorder

Recorder and TimingRecorder use `Record()` as the function name, as in
the existing specification for Measure instruments. This conveys the
fact that these are not a sum, and that individual events are of
importance.

#### Asynchronous instruments

Asynchronous instruments use `Observe()` as the function name. This
signifies that the instrument passively captures a measurement and is
not an active participant, as `Record()` would imply. _Observation_ also
conveys the last-value relationship specified for asynchronous
instruments. The observer can only observe one value at a time, where
the last-observed value wins.
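
The last-value-wins rule can be sketched as follows. This is an illustration under stated assumptions, not an SDK implementation: `ObserverResult`, `collect`, and the keyword-argument label scheme are hypothetical, and `observe()` is the Python spelling of `Observe()`. The sketch assumes the SDK invokes a callback once per collection interval and keeps one value per label set.

```python
from typing import Callable, Dict, Tuple

LabelKey = Tuple[Tuple[str, str], ...]


class ObserverResult:
    """Holds at most one observation per label set for one interval."""

    def __init__(self) -> None:
        self.values: Dict[LabelKey, float] = {}

    def observe(self, value: float, **labels: str) -> None:
        key = tuple(sorted(labels.items()))
        # Overwrite any earlier observation: the last-observed value wins.
        self.values[key] = value


def collect(callback: Callable[[ObserverResult], None]) -> Dict[LabelKey, float]:
    """Run one collection interval by invoking the instrument's callback."""
    result = ObserverResult()
    callback(result)
    return result.values


def read_temperature(result: ObserverResult) -> None:
    result.observe(20.5, room="lab")
    result.observe(21.0, room="lab")   # replaces 20.5 for the same label set
    result.observe(18.0, room="hall")


snapshot = collect(read_temperature)
```

After `collect()` returns, `snapshot` holds exactly one value for each of the two label sets, regardless of how many times `observe()` was called.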

### Default aggregations

This [OTEP 93
conversation](https://github.com/open-telemetry/oteps/pull/93#discussion_r405852507)
raised a question about the default aggregation for GaugeObserver,
given as MinMaxSumCount. Would "Sum" be a more appropriate default?

Note that the distinction between whether the default aggregation is
"Sum" or "MinMaxSumCount" corresponds exactly to whether the
instrument has the Sum-only refinement. "Sum" is the default
aggregation for any Sum-only instrument since, by definition, the
Sum aggregation provides complete information.

The three instruments with a default "MinMaxSumCount" are all used to
record a value that is, by definition, more than only a sum. In this
case, "complete information" requires recording every value, i.e., no
aggregation. MinMaxSumCount is applied in these cases because it
provides the maximum amount of information that can be recorded using
a fixed number of values, per time series, per collection interval.
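
To make the "fixed number of values" point concrete, here is a minimal Python sketch of a MinMaxSumCount aggregation together with the default-selection rule described above. The class, its `update` and `merge` methods, and `default_aggregation` are hypothetical names for illustration; real SDKs may structure this differently.

```python
import math


class MinMaxSumCount:
    """Keeps exactly four values, however many measurements arrive."""

    def __init__(self) -> None:
        self.min = math.inf
        self.max = -math.inf
        self.sum = 0.0
        self.count = 0

    def update(self, value: float) -> None:
        self.min = min(self.min, value)
        self.max = max(self.max, value)
        self.sum += value
        self.count += 1

    def merge(self, other: "MinMaxSumCount") -> None:
        # Combining two aggregations still needs only four values.
        self.min = min(self.min, other.min)
        self.max = max(self.max, other.max)
        self.sum += other.sum
        self.count += other.count


def default_aggregation(sum_only: bool) -> str:
    """Sum for Sum-only instruments, MinMaxSumCount otherwise."""
    return "Sum" if sum_only else "MinMaxSumCount"


latency = MinMaxSumCount()
for measurement in (3.0, 1.0, 2.0):
    latency.update(measurement)
```

Individual measurements are not retained, which is the trade-off the text describes: less than "complete information", but bounded storage per time series, per collection interval.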

### GaugeObserver aggregation

Why should GaugeObserver aggregate the Min, Max, Sum, and Count when
it is permitted to observe just one measurement per interval? This
says that when observed values are aggregated they should be treated
like a distribution--we are interested in more than a sum, by
definition. If observing only a sum, the DeltaObserver or
CumulativeObserver should be used instead.

Clearly, when Count equals 1, the Min, Max, and Sum are equal to the
value. Exporters may be able to take advantage of this fact when
exporting data from these instruments. In particular, since it is
known that asynchronous instruments produce only one value per
interval (with last-value-wins semantics), when we know in the SDK
that no spatial aggregation is configured, we can be sure that Count
equals one, and we can use the most appropriate exposition format for
the target system.

This means Prometheus and Statsd exporters SHOULD export Gauge values
for the GaugeObserver when there is no spatial aggregation being
applied, because that is the natural exposition format for
MinMaxSumCount aggregations when Count equals 1. If there is spatial
aggregation being applied, the default MinMaxSumCount aggregation
still applies.
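
The exporter rule above can be sketched as a small decision function. This is a hedged illustration: `MinMaxSumCountPoint`, `choose_exposition`, and the returned format labels are hypothetical and do not correspond to any real Prometheus or StatsD client API.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class MinMaxSumCountPoint:
    """Aggregated output of a GaugeObserver for one interval (hypothetical)."""
    min: float
    max: float
    sum: float
    count: int


Exposition = Union[Tuple[str, float], Tuple[str, Tuple[float, float, float, int]]]


def choose_exposition(point: MinMaxSumCountPoint, spatial_aggregation: bool) -> Exposition:
    if not spatial_aggregation and point.count == 1:
        # With Count == 1, Min, Max, and Sum all equal the observed value,
        # so a single gauge value carries the same information.
        return ("gauge", point.sum)
    # With spatial aggregation, the MinMaxSumCount default still applies.
    return ("min_max_sum_count", (point.min, point.max, point.sum, point.count))


single = choose_exposition(MinMaxSumCountPoint(7.0, 7.0, 7.0, 1), spatial_aggregation=False)
combined = choose_exposition(MinMaxSumCountPoint(1.0, 3.0, 6.0, 3), spatial_aggregation=True)
```

The explicit `count == 1` check is defensive. By the argument above, it always holds for asynchronous instruments when no spatial aggregation is configured.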