docs: Add best practices for metrics #2528
af6c15a to 95a7f71
LGTM for Stability
Looking really nice so far. A couple of comments.
### Optional properties

Some Kubernetes objects have optional fields. In case there is an optional value, it is better to not expose the label at all instead of exposing a "nil" value or an empty string.
Hmm, I'm not sure about this one.
Prometheus metric families should have consistent labels.
For example, this is considered invalid.
my_metric{a="foo",b="bar"} 1
my_metric{b="baz"} 1
In the above case, you want to expose an empty string label so the metric family is consistent.
my_metric{a="foo",b="bar"} 1
my_metric{a="",b="baz"} 1
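A minimal Python sketch of the padding described above, for illustration only. The helper names `pad_labels` and `render` are hypothetical and not part of KSM or any Prometheus client library, and label-value escaping is deliberately left out. It takes the union of label names across a metric family and fills the gaps with empty strings, which reproduces the consistent form shown above.

```python
from typing import Dict, List, Tuple

# One sample = (label set, value). Hypothetical in-memory model, not a real client API.
Sample = Tuple[Dict[str, str], float]


def pad_labels(samples: List[Sample]) -> List[Sample]:
    """Give every sample the union of all label names, filling gaps with ""."""
    names = sorted({name for labels, _ in samples for name in labels})
    return [
        ({name: labels.get(name, "") for name in names}, value)
        for labels, value in samples
    ]


def render(metric: str, samples: List[Sample]) -> List[str]:
    """Render samples as text exposition lines (no escaping, sketch only)."""
    lines = []
    for labels, value in samples:
        pairs = ",".join(f'{k}="{v}"' for k, v in labels.items())
        lines.append(f"{metric}{{{pairs}}} {value:g}")
    return lines


samples = [({"a": "foo", "b": "bar"}, 1.0), ({"b": "baz"}, 1.0)]
for line in render("my_metric", pad_labels(samples)):
    print(line)
```

A check along these lines could also run in CI by asserting that every sample in a family has the same label names.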
That's good to know! I'll fix it (and will add some fixes to the next release of ksm).
Is there any upstream documentation on this that suggests labels to be empty instead of leaving them out so I can reference them in here?
I thought we had an example in the OpenMetrics exposition spec, but I can't find it right now. I'll see if I can find the docs on this.
The technicality here can be split into two considerations:
- The lint enforcement mentioned above comes about as a result of the registry machinery.
- KSM does not employ the same for its metrics since none of these are structured (except KSM's own meta-metrics, but those are piped out in the same non-proto manner).
KSM bypasses this trait by design (promtool check metrics will succeed for the invalid example above).
That being said, it's always good to be consistent with the ecosystem, given we can incorporate a solution that ensures this in the CI.
I'm also trying to suggest we move most labels into a single metric
instead of
unless there are good reasons why the latter one is better for a TSDB/Querying etc.
I believe field-specific metrics may help (a) limit cardinality (ref)? In the original issue, there was talk about including only tightly bounded labelsets in the
This is why you need to look at the lifecycle of the object. If it's not lifetime scoped, it's dynamic. I'd rather have a few metric series with more labels than a bunch of metrics to avoid bad UX during querying where you need to do a few group lefts and joins to get to the result you want to see.
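To make the querying cost concrete, here is a rough Python model of what a label join has to do when fields are split across metrics. It is a sketch under stated assumptions: `merge_labels` and the `phase`/`owner` series are hypothetical, and the function only approximates the idea of matching series on a shared label and copying labels across, not PromQL's actual evaluation.

```python
from typing import Dict, List

Labels = Dict[str, str]


def merge_labels(base: List[Labels], extra: List[Labels], on: str, copy: List[str]) -> List[Labels]:
    """For each base series, copy selected labels from the extra series sharing the `on` value."""
    index = {labels[on]: labels for labels in extra}
    merged = []
    for labels in base:
        match = index.get(labels[on])
        if match is None:
            continue  # no partner series: this one drops out of the joined result
        merged.append({**labels, **{name: match.get(name, "") for name in copy}})
    return merged


# Hypothetical split: two metrics, each carrying one field of the same object.
phase = [{"pod": "web-0", "phase": "Running"}, {"pod": "web-1", "phase": "Pending"}]
owner = [{"pod": "web-0", "owner": "web"}]

joined = merge_labels(phase, owner, on="pod", copy=["owner"])
print(joined)
```

In this model `web-1` disappears from the result because it has no partner series, which is the kind of surprise a single metric carrying both labels would avoid.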
LGTM
/triage accepted
lgtm!
### Optional properties

Some Kubernetes objects have optional fields. In case there is an optional value, the label should still be exposed, ideally as an empty string.
An empty string and not having the label are the same thing in Prometheus, so why do this?
This might be treated differently by other monitoring systems, so I'd rather stay explicit here and ensure we do it the same way everywhere.
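The two positions in this thread can be illustrated with a small Python sketch. It assumes the Prometheus data-model rule that a label with an empty value is equivalent to a label that does not exist; `series_identity` is a hypothetical helper written for this example, not a Prometheus API.

```python
from typing import Dict, FrozenSet, Tuple


def series_identity(labels: Dict[str, str]) -> FrozenSet[Tuple[str, str]]:
    """Identity of a series under the assumed Prometheus rule: empty-valued labels are ignored."""
    return frozenset((name, value) for name, value in labels.items() if value != "")


explicit = {"a": "", "b": "baz"}  # optional field exposed as an empty string
omitted = {"b": "baz"}            # optional field left out entirely

# Same series under the Prometheus rule.
same_for_prometheus = series_identity(explicit) == series_identity(omitted)

# A consumer that compares raw label sets sees two different series.
same_for_naive_consumer = explicit == omitted

print(same_for_prometheus, same_for_naive_consumer)
```

Under that assumption the choice makes no difference to Prometheus itself, while a system that keys on the raw label set would distinguish the two, which is the case the explicit empty string guards against.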
Feel free to chime in on #2528 (comment)
Co-authored-by: Ben Kochie <superq@gmail.com>
8af99a5 to 19d3c3c
@dgrisonnet @CatherineF-dev @rexagod I think this is in a good state now. :)
19d3c3c to dcfaae9
/assign @dgrisonnet For visibility.
/assign Talked this through with Damien offline, no blockers.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: brancz, mrueg, rexagod, SuperQ
What this PR does / why we need it:
This PR introduces a doc for best practices around metrics.
Hoping to get some comments and an agreement on these ideas.
@rexagod @dgrisonnet @dashpole @logicalhan @CatherineF-dev
How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)
None
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #