Commit
docs: Add best practices for metrics
Showing 1 changed file with 71 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
# Kube-State-Metrics - Timeseries best practices | ||
|
||
--- | ||
|
||
Author: Manuel Rüger (<manuel@rueg.eu>) | ||
|
||
Date: October 17th 2024 | ||
|
||
--- | ||
|
||
# Introduction | ||
|
||
Kube-State-Metrics' goal is to provide insights into the state of Kubernetes objects by exposing them as metrics. | ||
This document provides guidelines intended to create a good user experience when using these metrics. | ||
|
||
Please be aware that this document was introduced at a later stage of the project, so some existing metrics might not follow these best practices. | ||
Feel free to report such metrics and to open a pull request that improves them. | ||
|
||
# General best practices | ||
|
||
We follow the [Prometheus](https://prometheus.io/docs/practices/naming/) best practices for metric naming and labeling. | ||
|
||
# Best practices for kube-state-metrics | ||
|
||
## Avoid pre-computation | ||
|
||
kube-state-metrics should expose metrics at the level of individual objects and avoid any sort of pre-computation unless it is required, for example because of high cardinality across objects. | ||
By exposing raw per-object metrics instead of pre-aggregated counts, kube-state-metrics gives users full control over how they use the metrics. | ||
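
As a rough illustration of why raw per-object series are more flexible, the sketch below derives two different aggregates from the same samples. The sample data and the `count_by` helper are hypothetical and only stand in for what a user would normally do with a query on the Prometheus side.

```python
from collections import Counter

# Hypothetical per-object samples: one entry per pod, as (labels, value).
samples = [
    ({"namespace": "default", "pod": "web-0", "phase": "Running"}, 1),
    ({"namespace": "default", "pod": "web-1", "phase": "Pending"}, 1),
    ({"namespace": "kube-system", "pod": "dns-0", "phase": "Running"}, 1),
]


def count_by(samples, *label_names):
    """Sum sample values grouped by the given label names."""
    totals = Counter()
    for labels, value in samples:
        key = tuple(labels[name] for name in label_names)
        totals[key] += value
    return dict(totals)


# The same raw data answers different questions; a pre-computed total could not.
per_namespace = count_by(samples, "namespace")
per_phase = count_by(samples, "phase")
print(per_namespace)
print(per_phase)
```

If only a pre-aggregated count per namespace were exposed, the per-phase view could not be recovered from it.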
|
||
## Static object properties | ||
|
||
An object usually has a stable set of properties that do not change during its lifecycle in Kubernetes. | ||
This includes properties like name, namespace, UID, etc. | ||
It is good practice to group those together into an `_info` metric. | ||
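
A minimal sketch of what such an `_info` series could look like in the Prometheus text format. The `render_sample` helper and the label values are made up for illustration, and the helper does not escape special characters in label values. The constant value `1` follows the usual Prometheus convention for info-style metrics.

```python
def render_sample(name, labels, value):
    """Render one sample as a text-format line (no label-value escaping)."""
    pairs = ",".join(f'{key}="{labels[key]}"' for key in sorted(labels))
    return f"{name}{{{pairs}}} {value}"


# Static properties of a hypothetical pod, grouped into one info series.
static_properties = {"namespace": "default", "pod": "web-0", "uid": "abc-123"}
info_line = render_sample("kube_pod_info", static_properties, 1)
print(info_line)
```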
|
||
## Dynamic object properties | ||
|
||
An object can also have a dynamic set of properties, which are usually part of the status field. | ||
These change during the lifecycle of the object. | ||
For example, a pod can be in different states like "Pending", "Running", etc. | ||
These should be exposed in a separate metric whose labels identify the object as well as the dynamic property. | ||
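
One possible encoding, sketched under the assumption that each state becomes its own series with value `1` for the current state and `0` otherwise. The `PHASES` list, the `phase_samples` helper and the label names are illustrative, not taken from this document.

```python
# Assumed set of states for the sketch.
PHASES = ["Pending", "Running", "Succeeded", "Failed", "Unknown"]


def phase_samples(namespace, pod, current_phase):
    """Return one (labels, value) pair per phase; only the current phase is 1."""
    samples = []
    for phase in PHASES:
        # Identifying labels plus the dynamic property as its own label.
        labels = {"namespace": namespace, "pod": pod, "phase": phase}
        samples.append((labels, 1 if phase == current_phase else 0))
    return samples


status = phase_samples("default", "web-0", "Running")
for labels, value in status:
    print(labels["phase"], value)
```

Keeping the state in its own metric means a phase change only touches these series and leaves the static info series alone.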
|
||
## Linked properties | ||
|
||
If an object contains a substructure that links multiple properties together (e.g. endpoint address and port), those should be reported in the same metric. | ||
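
A small sketch of the idea, assuming a simplified, hypothetical endpoint structure in which every entry pairs one address with one port. Each pair becomes a single series, so the link between the two values is preserved.

```python
# Hypothetical, simplified endpoint entries (address and port belong together).
endpoints = [
    {"ip": "10.0.0.1", "port": 8080},
    {"ip": "10.0.0.2", "port": 9090},
]


def linked_samples(entries):
    """Emit one (labels, value) pair per entry, keeping ip and port together."""
    return [({"ip": e["ip"], "port": str(e["port"])}, 1) for e in entries]


address_samples = linked_samples(endpoints)
for labels, value in address_samples:
    print(labels, value)
```

Had the addresses and ports been reported in two separate metrics, a user could no longer tell that `10.0.0.1` serves on `8080` rather than on `9090`.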
|
||
## Optional properties | ||
|
||
Some Kubernetes objects have optional fields. If an optional field is not set, it is better to not expose the label at all than to expose a "nil" value or an empty string. | ||
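
A minimal sketch of this rule. The `build_labels` helper and the label names are hypothetical; unset optional values are represented as `None` or an empty string for the purpose of the example.

```python
def build_labels(required, optional):
    """Merge label dicts, dropping optional labels that have no value."""
    labels = dict(required)
    for key, value in optional.items():
        # Skip unset values instead of emitting "nil" or "".
        if value is not None and value != "":
            labels[key] = value
    return labels


pod_labels = build_labels(
    {"namespace": "default", "pod": "web-0"},
    {"priority_class": None, "host_ip": "10.0.0.5", "node": ""},
)
print(pod_labels)
```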
|
||
## Timestamps | ||
|
||
Timestamps like creation time or modification time should be exposed as the metric value, in seconds since the Unix epoch. The metric name should end with `_timestamp_seconds`. | ||
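
As an illustration, the sketch below converts a UTC timestamp string of the form `2024-10-17T00:00:00Z` into a sample value. The `to_unix_seconds` helper is hypothetical and assumes whole seconds with a trailing `Z`; other timestamp variants would need extra handling.

```python
from datetime import datetime, timezone


def to_unix_seconds(timestamp):
    """Convert 'YYYY-MM-DDTHH:MM:SSZ' (UTC) to seconds since the Unix epoch."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


# The timestamp is the sample value, not a label.
created = to_unix_seconds("2024-10-17T00:00:00Z")
print(f"example_created_timestamp_seconds {created:.0f}")
```

Exposing the time as a value keeps the label set stable and allows arithmetic on it, such as computing the age of an object.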
|
||
## Cardinality | ||
|
||
Some object properties can cause cardinality issues if they can take many different values or are combined with other properties that also change frequently. | ||
In this case, it is better to limit the number of values that kube-state-metrics exposes by allowing only a few of them and using a default for all others. | ||
If, for example, the Kubernetes object has a status field containing an error message that can change a lot, it would be better to expose a boolean `error="true"` label when there is an error. | ||
If some error messages are worth exposing, those could be exposed, with a default value provided for any other message. | ||
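
A sketch of how such bounding could look. The allowlist contents, the `reason` label name, the `"other"` default and the `error_labels` helper are all assumptions made for this example; combining the boolean label with a bounded reason label is one option among several.

```python
# Hypothetical allowlist of messages considered worth exposing.
ALLOWED_REASONS = {"ImagePullBackOff", "CrashLoopBackOff"}


def error_labels(message):
    """Map a free-form error message to a small, bounded set of label values."""
    if not message:
        return {"error": "false"}
    # Anything outside the allowlist collapses into a single default value.
    reason = message if message in ALLOWED_REASONS else "other"
    return {"error": "true", "reason": reason}


print(error_labels("CrashLoopBackOff"))
print(error_labels("dial tcp 10.0.0.7:443: i/o timeout"))
print(error_labels(""))
```

With this mapping the label can take at most three distinct `reason` values, no matter how many different messages the objects report.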
|
||
# Stability | ||
|
||
We follow the stability framework derived from Kubernetes, in which we expose experimental and stable metrics. | ||
Experimental metrics are recently introduced or expose alpha/beta resources in the Kubernetes API. | ||
They can change at any time and should be used with caution. | ||
They can be promoted to stable metrics once the object has stabilized in the Kubernetes API or once they have been part of two consecutive releases without any changes. | ||
|
||
Stable metrics are considered frozen with the exception of new labels being added. | ||
A stable metric or a label on a stable metric can be deprecated in release Major.Minor; the earliest release in which it will be removed is Major.Minor+2. | ||
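
The deprecation window can be expressed as simple arithmetic on the minor version. The `earliest_removal` helper and the version numbers are hypothetical, and the sketch assumes that the major version stays the same across the window.

```python
def earliest_removal(deprecated_in):
    """Return the earliest 'Major.Minor' release that may remove the metric."""
    major, minor = (int(part) for part in deprecated_in.split("."))
    return f"{major}.{minor + 2}"


# A metric deprecated in a hypothetical release 2.13.
removal = earliest_removal("2.13")
print(removal)
```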
|