Merge pull request #2528 from mrueg/metrics-best-practices
docs: Add best practices for metrics
k8s-ci-robot authored Nov 27, 2024
2 parents 9652811 + dcfaae9 commit 32e7727
Showing 1 changed file with 72 additions and 0 deletions.
72 changes: 72 additions & 0 deletions docs/design/metrics-best-practices.md
@@ -0,0 +1,72 @@
# Kube-State-Metrics - Timeseries best practices

---

Author: Manuel Rüger (<manuel@rueg.eu>)

Date: October 17th 2024

---

## Introduction

Kube-State-Metrics' goal is to provide insights into the state of Kubernetes objects by exposing them as metrics.
This document provides guidelines intended to create a good user experience when using these metrics.

Please be aware that this document was introduced at a later stage of the project, so some existing metrics may not follow these best practices.
You are encouraged to report such metrics and to open a pull request that improves them.

## General best practices

We follow [Prometheus](https://prometheus.io/docs/practices/naming/) best practices in terms of naming and labeling.

## Best practices for kube-state-metrics

### Avoid pre-computation

kube-state-metrics should expose metrics at the level of individual objects and avoid any sort of pre-computation unless it is required, for example because of high object cardinality.
We prefer not to add metrics that can be derived from existing raw metrics. For example, we would not want to expose a metric called `kube_pod_total`, as it can be computed with `count(kube_pod_info)`.
This gives users full control over how they use the metrics and the flexibility to do their own computations.
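
As a minimal sketch of why a pre-computed total is unnecessary, the Python below mimics what `count(kube_pod_info)` does over a scrape result. The sample pods and the `pods_per_namespace` grouping are illustrative assumptions, not data from a real cluster.

```python
from collections import Counter

# Hypothetical scrape result: one kube_pod_info series per pod (labels only).
kube_pod_info = [
    {"namespace": "default", "pod": "web-0"},
    {"namespace": "default", "pod": "web-1"},
    {"namespace": "kube-system", "pod": "coredns-abc"},
]

# Rough equivalent of the PromQL query `count(kube_pod_info)`.
total_pods = len(kube_pod_info)

# Rough equivalent of `count by (namespace) (kube_pod_info)`, a breakdown
# that a single pre-computed kube_pod_total could not provide.
pods_per_namespace = Counter(series["namespace"] for series in kube_pod_info)
```

Because the raw per-object series are kept, the same data answers both the total and the per-namespace question.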

### Static object properties

An object usually has a stable set of properties that do not change during its lifecycle in Kubernetes.
This includes properties such as name, namespace, and uid that have a 1:1 relationship with the object.
It is good practice to group those together into an `_info` metric.
If there is a 1:n relationship (e.g. a list of ports), it should be exposed in a separate metric to avoid generating too many metrics.
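
The split between 1:1 and 1:n properties could look like the following sketch. The `widget` object, the metric names `kube_widget_info` and `kube_widget_port`, and the `render` helper are all hypothetical, and label-value escaping is omitted for brevity.

```python
def render(name: str, labels: dict, value: float) -> str:
    """Render one sample in the Prometheus text exposition format."""
    label_str = ",".join(f'{key}="{val}"' for key, val in labels.items())
    return f"{name}{{{label_str}}} {value:g}"


# Hypothetical object with 1:1 properties and a 1:n list of ports.
widget = {"name": "web", "namespace": "default", "uid": "1234", "ports": [80, 443]}

# Labels that identify the object and appear on every series.
identity = {"namespace": widget["namespace"], "widget": widget["name"]}

# 1:1 properties go into a single _info series with the constant value 1.
lines = [render("kube_widget_info", {**identity, "uid": widget["uid"]}, 1)]

# The 1:n property gets its own metric: one series per port.
lines += [
    render("kube_widget_port", {**identity, "port": str(port)}, 1)
    for port in widget["ports"]
]
```

With this layout the `_info` metric stays at one series per object no matter how many ports the object declares.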

### Dynamic object properties

An object can also have a dynamic set of properties, which are usually part of the status field.
These change during the lifecycle of the object.
For example, a pod can be in different states such as "Pending" or "Running".
These should be part of a "State Set" whose labels identify the object as well as the dynamic property.
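
One way to picture a State Set is one series per possible state, with the value 1 for the current state and 0 for all others. The sketch below assumes the pod phases defined by the Kubernetes API; the `phase_state_set` function and its label names are illustrative and not taken from the kube-state-metrics code base.

```python
# Pod phases as defined by the Kubernetes API.
POD_PHASES = ("Pending", "Running", "Succeeded", "Failed", "Unknown")


def phase_state_set(namespace: str, pod: str, current_phase: str) -> list:
    """Return (labels, value) pairs, one per possible phase."""
    return [
        (
            {"namespace": namespace, "pod": pod, "phase": phase},
            1 if phase == current_phase else 0,
        )
        for phase in POD_PHASES
    ]


samples = phase_state_set("default", "web-0", "Running")
```

Exactly one series in `samples` carries the value 1, so a query can select the current phase by filtering on the value.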

### Linked properties

If an object contains a substructure that links multiple properties together (e.g. endpoint address and port), those should be reported in the same metric.
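
As a rough illustration, the sketch below keeps address and port on the same series by emitting one label set per (address, port) pair. The `subset` structure and the label names `ip`, `port_name` and `port_number` are assumptions made for this example.

```python
from itertools import product

# Hypothetical endpoint substructure linking addresses and ports.
subset = {
    "addresses": ["10.0.0.1", "10.0.0.2"],
    "ports": [{"name": "http", "port": 8080}],
}

# One series per (address, port) pair, so the link between them is preserved.
series = [
    {"ip": ip, "port_name": port["name"], "port_number": str(port["port"])}
    for ip, port in product(subset["addresses"], subset["ports"])
]
```

Reporting addresses and ports in two separate metrics would lose the information about which port belongs to which address.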

### Optional properties

Some Kubernetes objects have optional fields. If an optional field is not set, its label should still be exposed, ideally with an empty string as the value.
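
A minimal sketch of this rule, assuming a hypothetical object with an optional `priority_class` field: the label set always has the same keys, and an unset field becomes an empty string.

```python
# Hypothetical label fields; priority_class is the optional one.
LABEL_FIELDS = ("namespace", "name", "priority_class")


def labels_for(obj: dict) -> dict:
    """Build a label set with a stable list of keys for every object."""
    return {
        field: "" if obj.get(field) is None else str(obj[field])
        for field in LABEL_FIELDS
    }


with_optional = labels_for(
    {"namespace": "default", "name": "web", "priority_class": "high"}
)
without_optional = labels_for({"namespace": "default", "name": "web"})
```

Keeping the label present means queries and joins that reference it behave the same for every object.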

### Timestamps

Timestamps such as creation time or modification time should be exposed as the metric value. The metric name should end with `_timestamp_seconds`. The value is represented in [UNIX epoch seconds](https://en.wikipedia.org/wiki/Unix_time).
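
Kubernetes serializes timestamps as RFC 3339 strings, so the value has to be converted before it is exposed. The sketch below shows one way to do that conversion; the function name `to_epoch_seconds` is hypothetical.

```python
from datetime import datetime


def to_epoch_seconds(rfc3339: str) -> float:
    """Convert an RFC 3339 UTC timestamp into UNIX epoch seconds."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so it is replaced with an explicit UTC offset first.
    parsed = datetime.fromisoformat(rfc3339.replace("Z", "+00:00"))
    return parsed.timestamp()


created = to_epoch_seconds("2024-10-17T00:00:00Z")
```

The resulting number would be the sample value of a metric whose name ends with `_timestamp_seconds`.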

### Cardinality

Some object properties can cause cardinality issues if they can take many different values or are combined with other properties that also change frequently.
In this case it is better to limit the number of values exposed by kube-state-metrics: allow only a few of them and use a default for all others.
For example, if a Kubernetes object has a status field containing an error message that can change frequently, it is better to expose a boolean `error="true"` label when there is an error.
If some error messages are worth exposing, they can be exposed individually, with a default value provided for any other message.
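
The allow-list approach can be sketched as follows. The allowed values, the default `Other`, and the `reason` label name are assumptions chosen for illustration only.

```python
from typing import Optional

# Hypothetical allow-list of messages that are worth exposing verbatim.
ALLOWED_REASONS = frozenset({"Timeout", "Unauthorized"})
DEFAULT_REASON = "Other"


def error_labels(message: Optional[str]) -> dict:
    """Map a free-form error message onto a bounded set of label values."""
    if not message:
        return {"error": "false", "reason": ""}
    reason = message if message in ALLOWED_REASONS else DEFAULT_REASON
    return {"error": "true", "reason": reason}


known = error_labels("Timeout")
unknown = error_labels("dial tcp 10.0.0.7:443: i/o timeout")
```

However many distinct messages the cluster produces, the `reason` label can only take the allowed values, the default, or the empty string.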

## Stability

We follow the stability framework derived from Kubernetes, in which we expose experimental and stable metrics.
Experimental metrics are recently introduced or expose alpha/beta resources in the Kubernetes API.
They can change at any time and should be used with caution.
They can be promoted to stable metrics once the object has stabilized in the Kubernetes API, or once they have been part of two consecutive releases without any changes.

Stable metrics are considered frozen with the exception of new labels being added.
A stable metric, or a label on a stable metric, can be deprecated in release Major.Minor; the earliest release in which it can be removed is Major.Minor+2.
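
The deprecation window is simple arithmetic on the minor version, as the sketch below shows. The function name and the version numbers used are hypothetical examples.

```python
def earliest_removal(deprecated_in: str) -> str:
    """Return the earliest release in which a deprecated metric may be removed."""
    major, minor = (int(part) for part in deprecated_in.split("."))
    return f"{major}.{minor + 2}"


removal = earliest_removal("2.14")
```

Here `removal` is `"2.16"`: a metric deprecated in a hypothetical release 2.14 would still be present in 2.15.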
