Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics semantic conventions for timed operations #657

Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 43 additions & 0 deletions specification/metrics/semantic_conventions/metric-general.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
# General

The conventions described in this section are generic; they are not specific to a
particular operation.

## Summarizing timed operations

### Justification

Timed operations are generally fully described by Spans. Because these Spans may be
sampled, aggregating information about them downstream is subject to inaccuracies.
Computing count, duration, and error rate from _all_ Spans (including `sampled=true` and
`sampled=false`) results in more accurate aggregates.

### Convention

When creating duration metrics from Spans, the duration **in seconds** of all
_recorded_ Spans of a given category and kind should be aggregated together in a single
[`ValueRecorder`](../api.md#valuerecorder).

The instrument's name should be prefixed with a category and the Span's kind,
using the pattern `{category}.{span.kind}.duration`.

If the duration is derived from Spans that follow one of the common semantic-conventional
_areas_, the category should be the label prefix used in that semantic convention.

For example:

* `http.server.duration`
* `http.client.duration`
* `db.client.duration`

#### Labels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we need to record timing, status, and attributes of a Span to support the metrics derived from spans. Does this mean that the unsampled span becomes just a non-exported Span but otherwise stops being a no-op structure? This might add quite a bit of overhead in the unsampled case, but it does also make sense to me to use Span as a fallback generic structure for a request across all instrumentation. I wonder what about libraries that do natively record metric information in their own representation of a request, for example

https://javadoc.io/doc/com.linecorp.armeria/armeria-javadoc/latest/com/linecorp/armeria/common/logging/RequestLog.html

For these, would we recommend unsampled spans to continue to be pure no-op and use this native functionality for metrics?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are raising good points. To export metrics from spans requires a real span object that can remember attributes from start and wait for the status and duration to be known. If there is a good (future) way to encode metrics in spans we might choose that instead. It wonder if the Span Sampler API could be used to return a reduced-size Span which would act like a no-op span but store enough information to report a duration metric.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a good (future) way

Ah I didn't realize that's future work since this doc talks a lot about Spans. But I wonder is it just referring to the concept of a span (window of time) and not the data model we use in tracing?

For sampling, yeah I think it would be possible to have three types of spans, exported real span, non-exported real span, and no-op span if we needed it.


Labels applied to these metric recordings should follow the attribute semantic conventions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be written like this?

Labels applied to these metric recordings should follow the attribute semantic conventions of the corresponding span for the operation. In other words, the labels should match as best as possible the attributes the Span for the operation would have if it had been sampled.

I think this along with the duration tweak suggested above decouples this PR from the sampling aspect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the corresponding Span for the operation wording. I think I'll leave off the second sentence.
It's important that this guidance is followed by the writers of the category-specific semantic conventions. And I think it'll be clear enough for them.

of the corresponding Span for the operation.

Care should be taken when adding labels to the duration instruments in order to avoid
excessive cardinality.

For example, adding `http.status_code` as a label to the instrument whose duration
represents HTTP client latency would be reasonable, but `http.response_content_length`
would not.