-
Notifications
You must be signed in to change notification settings - Fork 893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add metrics semantic conventions for timed operations #657
Changes from all commits
f63a729
a73803e
f130a79
ffa2640
6ed45ca
a3ff441
bd1dda1
5bff9b8
753de65
9b5446e
4f71365
7417406
3182288
c2283b6
8ade2de
c92d8f5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
# General | ||
|
||
The conventions described in this section are generic; they are not specific to a | ||
particular operation. | ||
|
||
## Summarizing timed operations | ||
|
||
### Justification | ||
|
||
Timed operations are generally fully described by Spans. Because these Spans may be | ||
sampled, aggregating information about them downstream is subject to inaccuracies. | ||
Computing count, duration, and error rate from _all_ Spans (including `sampled=true` and | ||
`sampled=false`) results in more accurate aggregates. | ||
|
||
### Convention | ||
|
||
When creating duration metrics from Spans, the duration **in seconds** of all | ||
_recorded_ Spans of a given category and kind should be aggregated together in a single | ||
[`ValueRecorder`](../api.md#valuerecorder). | ||
|
||
The instrument's name should be prefixed with a category and the Span's kind, | ||
using the pattern `{category}.{span.kind}.duration`. | ||
|
||
If the duration is derived from Spans that follow one of the common semantic-conventional | ||
_areas_, the category should be the label prefix used in that semantic convention. | ||
|
||
For example: | ||
|
||
* `http.server.duration` | ||
* `http.client.duration` | ||
* `db.client.duration` | ||
|
||
#### Labels | ||
|
||
Labels applied to these metric recordings should follow the attribute semantic conventions | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could this be written like this?
I think this along with the duration tweak suggested above decouples this PR from the sampling aspect. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I really like the corresponding Span for the operation wording. I think I'll leave off the second sentence. |
||
of the corresponding Span for the operation. | ||
|
||
Care should be taken when adding labels to the duration instruments in order to avoid | ||
excessive cardinality. | ||
|
||
For example, adding `http.status_code` as a label to the instrument whose duration | ||
represents HTTP client latency would be reasonable, but `http.response_content_length` | ||
would not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we need to record timing, status, and attributes of a
Span
to support the metrics derived from spans. Does this mean that the unsampled span becomes just a non-exported Span but otherwise stops being a no-op structure? This might add quite a bit of overhead in the unsampled case, but it does also make sense to me to useSpan
as a fallback generic structure for a request across all instrumentation. I wonder what about libraries that do natively record metric information in their own representation of a request, for examplehttps://javadoc.io/doc/com.linecorp.armeria/armeria-javadoc/latest/com/linecorp/armeria/common/logging/RequestLog.html
For these, would we recommend unsampled spans to continue to be pure no-op and use this native functionality for metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are raising good points. To export metrics from spans requires a real span object that can remember attributes from start and wait for the status and duration to be known. If there is a good (future) way to encode metrics in spans we might choose that instead. It wonder if the Span Sampler API could be used to return a reduced-size Span which would act like a no-op span but store enough information to report a duration metric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I didn't realize that's future work since this doc talks a lot about Spans. But I wonder is it just referring to the concept of a span (window of time) and not the data model we use in tracing?
For sampling, yeah I think it would be possible to have three types of spans, exported real span, non-exported real span, and no-op span if we needed it.