
Metrics API instrument foundation and refinements #88

Merged 18 commits, Apr 10, 2020

Conversation

@jmacd (Contributor) commented Feb 25, 2020

@bogdandrutu (Member) left a comment:

I am in favor of embedding the monotonic/absolute properties in the name/type of the instrument, rather than making them configurable properties of the instrument.

@c24t (Member) left a comment:

LGTM, much simpler having optional instruments than instrument options.

Reading this, I realize I don't understand how observer measurements are supposed to be aggregated. Is the idea that a single observer instrument can capture multiple measurements, but each has to have a different labelset?

@jmacd (Contributor, Author) commented Mar 10, 2020

As posted in https://gitter.im/open-telemetry/metrics-specs?at=5e67b66ef046a30bbe55b092

In last week's Metrics SIG we had a conversation about the interpretation of the "instrument refinements" proposed in the OTEP, and we came to realize a problem with the set of refinements being proposed, which were "sum-only", "non-negative", and "monotonic". The test case was how to think about exporting a cumulative CPU-usage counter. The problem was that metric events are aggregated differently with respect to time than they are with respect to the spatial dimensions for cumulative instruments.

I've updated the document with a significant amount of new text. I've introduced the term "Rate aggregation", which is defined differently based on whether the values being reported are deltas or cumulatives. Now the document describes 4 refinements, "sum-only", "non-negative" are still there from before, and the two new ones are "precomputed-sum" and "non-negative-rate".

The "pre-computed sum" refinement says that a value is cumulative. The "non-negative-rate" refinement says that the sum is monotonic.

There are still no functional changes in this proposal, just a revision of the mental model.
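The delta-vs-cumulative distinction behind "Rate aggregation" can be sketched in code. This is a hypothetical illustration of the mental model, not the actual OpenTelemetry API: the `Refinements` class and `rate` helper are invented names, and the CPU-usage numbers are made up.

```python
from dataclasses import dataclass

@dataclass
class Refinements:
    # Hypothetical flags mirroring the four refinements described above.
    sum_only: bool = False           # only the sum is meaningful, not individual values
    non_negative: bool = False       # every reported value is >= 0
    precomputed_sum: bool = False    # values are cumulative totals, not deltas
    non_negative_rate: bool = False  # the sum is monotonic (never decreases)

def rate(values, interval_seconds, refinements):
    """Rate aggregation, defined differently for deltas vs. cumulatives."""
    if refinements.precomputed_sum:
        # Cumulative: the rate is the change in the total over the interval.
        return (values[-1] - values[0]) / interval_seconds
    # Delta: the rate is the sum of the deltas over the interval.
    return sum(values) / interval_seconds

# A cumulative CPU-usage counter sampled three times over a 10-second interval:
cpu_usage = Refinements(sum_only=True, precomputed_sum=True, non_negative_rate=True)
print(rate([100.0, 102.5, 105.0], 10.0, cpu_usage))  # 0.5 CPU-seconds per second
```

The point of the test case: the same three numbers would produce a very different (and wrong) rate if treated as deltas, which is why the cumulative/delta property has to be part of the instrument's semantics.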

@jmacd added the label metrics (Relates to the Metrics API/SDK) on Mar 10, 2020
@jkwatson (Contributor) commented:

Based on discussion in the metrics SIG, I have started a document to capture vendor and open-source backend metric types. Please add all relevant information: https://docs.google.com/document/d/1-Tut-w5lACJhrjGG1Jn9C41uDMquZpkd1_oXLpiYFK0/edit?usp=sharing

@bogdandrutu (Member) commented:

@open-telemetry/specs-approvers please review this.

@jmacd (Contributor, Author) commented Apr 10, 2020

@open-telemetry/specs-approvers We are experiencing delays. PLEASE REVIEW.

@tedsuo (Contributor) left a comment:

Thanks for pulling all this together Josh.

> Based on the Last Value relationship, we can ask and answer questions
> such as "what is the average last value of a metric at a point in
> time?". Observer instruments define the Last Value relationship
> without referring to the collection interval and without ambiguity.
@jkwatson commented Apr 10, 2020:

Since "last value" is, by its very nature, a single value for a given reporting interval, doesn't the "average" need to be defined over some set of collection intervals? And, if this is a "temporal" average (rather than a "spatial" average), doesn't there need to be some concept of a time window over which the average is computed?

@jmacd (Contributor, Author) replied:

I think of the average as being well-defined for a single interval. If there is no aggregation, there's a count of 1 and the average equals the single value. If there is spatial aggregation within the interval, the average is still well defined.

If there is temporal aggregation over multiple intervals, I think the average is still well defined. What we're not defining, however, is the average over a specific time range--the average is defined over a range of intervals.

I like the "last value" relationship defined here, since we can write queries that apply across machines, even though their collection intervals are not synchronized. The query is something like "read the last-value of metric X at time T across a set of machines and average the results". We are using the last value at the most-recent point in time as the input to a distributed spatial aggregation.

We could try to define an average across time and space using this data, but now some interpolation may be required, since the contributing points are spread across time and the query window is probably not aligned with the collection intervals of the target machines.

See also https://github.com/open-telemetry/oteps/pull/96/files#diff-e822e6cd6cb430ee3f62ff7fbeaed0daR130
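The query described above, "read the last-value of metric X at time T across a set of machines and average the results", can be sketched as follows. This is a hypothetical illustration: the host names, timestamps, and helper functions are invented, and the streams deliberately have unsynchronized collection intervals, as in the comment.

```python
# Hypothetical per-machine streams of (timestamp, value) observations.
# Note the collection intervals are not aligned across machines.
streams = {
    "host-a": [(10.0, 5.0), (20.0, 7.0)],
    "host-b": [(12.0, 3.0), (22.0, 9.0)],
}

def last_value(points, t):
    """The most recent observation at or before time t, or None if none exists."""
    eligible = [v for (ts, v) in points if ts <= t]
    return eligible[-1] if eligible else None

def spatial_average(streams, t):
    """Average the last value of each machine's stream at time t."""
    values = [lv for pts in streams.values()
              if (lv := last_value(pts, t)) is not None]
    return sum(values) / len(values) if values else None

print(spatial_average(streams, 21.0))  # (7.0 + 3.0) / 2 = 5.0
```

Because each machine contributes its own most-recent value, no interpolation is needed and the query is well defined even though the samples were taken at different times; interpolation only enters when averaging over a time window that is not aligned with the collection intervals.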

A contributor replied:

I think it's good. I was just confused about the question when the answer seemed very obvious. :)

> _AbsoluteMeasure_ with the correct unit and a language-specific
> duration type for measuring time.
>
> If the above open question is decided in favor of treating the
A contributor commented:

I think the open question referenced disappeared in some edit. It's not there now.

@jmacd (Contributor, Author) replied:

The question was whether names like "Measure" and "Observer" should be reserved as explanatory, descriptive names, and whether we should find new, different names for the actual, concrete unrestricted/unrefined form of these instruments handled by users. This was discussed in the SIG call but no issue specifically was filed.

This has been referred to as "Abstract foundation instruments" vs "Non-abstract foundation instruments". I originally proposed Non-abstract, i.e., that the unrestricted Measure instrument should be called Measure. @bogdandrutu convinced me to go the other way, which I have done in this document's sample proposal and in OTEP 93.

However, I have changed that stance again in OTEP 96, which states that we should use "Recorder" instead of "Measure", and that the default kind of Recorder instance should be named "Recorder" -- this is the Non-abstract proposal.

A contributor replied:

oh, I know the open question...it just isn't in this doc any more that I could see. :)
