keps/sig-instrumentation: Add metrics overhaul KEP #2909

Merged: 1 commit, Nov 28, 2018
2 changes: 1 addition & 1 deletion keps/NEXT_KEP_NUMBER
@@ -1 +1 @@
-31
+32
113 changes: 113 additions & 0 deletions keps/sig-instrumentation/0031-kubernetes-metrics-overhaul.md
@@ -0,0 +1,113 @@
---
kep-number: 0031
title: Kubernetes Metrics Overhaul
authors:
- "@brancz"
owning-sig: sig-instrumentation
participating-sigs:
- sig-aaa
- sig-bbb
reviewers:
- "@piosz"
- "@DirectXMan12"
approvers:
- "@piosz"
- "@DirectXMan12"
editor: "@DirectXMan12"
creation-date: 2018-11-06
last-updated: 2018-11-06
status: provisional
---

# Kubernetes Metrics Overhaul

## Table of Contents

* [Table of Contents](#table-of-contents)
* [Summary](#summary)
* [Motivation](#motivation)
  * [Goals](#goals)
  * [Non-Goals](#non-goals)
* [Proposal](#proposal)
  * [cAdvisor instrumentation changes](#cadvisor-instrumentation-changes)
    * [Consistent labeling](#consistent-labeling)
  * [Changing API latency histogram buckets](#changing-api-latency-histogram-buckets)
  * [Kubelet metric changes](#kubelet-metric-changes)
    * [Make metrics aggregatable](#make-metrics-aggregatable)
    * [Export fewer metrics](#export-fewer-metrics)
    * [Prevent apiserver's metrics from accidental registration](#prevent-apiservers-metrics-from-accidental-registration)
  * [Risks and Mitigations](#risks-and-mitigations)
* [Graduation Criteria](#graduation-criteria)
* [Implementation History](#implementation-history)

## Summary

This Kubernetes Enhancement Proposal (KEP) outlines the changes planned as part of an overhaul of all metrics instrumented in the main kubernetes/kubernetes repository. This is a living document: as existing metrics are identified for change and added to the scope, they will be added to this document. As this initiative will affect all current users of Kubernetes metrics, this document will also serve as a source for the migration documentation coming out of this effort.

This KEP is targeted to land in Kubernetes 1.14. The aim is to get all changes into a single Kubernetes minor release, so that only one migration is necessary. We are preparing a number of changes, but intend to start merging them only once the 1.14 development window opens.

## Motivation

A number of metrics that Kubernetes is instrumented with do not follow the [official Kubernetes instrumentation guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/instrumentation.md). This has several causes, such as metrics having been created before the instrumentation guidelines were put in place (around two years ago), or violations simply being missed in code reviews. Beyond the Kubernetes instrumentation guidelines, there are several violations of the [Prometheus instrumentation best practices](https://prometheus.io/docs/practices/instrumentation/). In order to provide consistently named, high-quality metrics, this effort aims to make working with the metrics exposed by Kubernetes consistent with the rest of the ecosystem. In fact, the metrics exposed by Kubernetes are not even consistent with one another, which makes joining metrics difficult.

Kubernetes also makes extensive use of a global metrics registry to register the metrics to be exposed. Aside from the general shortcomings of global variables, Kubernetes is seeing their concrete effects: a number of components need `sync.Once` or other mechanisms to avoid panicking when registering metrics. Instead, a metrics registry should be passed to each component so that metrics are registered explicitly, rather than through `init` functions or other global, non-obvious execution paths. We want to explore such alternatives within the scope of this KEP; however, this is not blocking for its success, as the primary goal is to make the exposed metrics themselves more consistent and stable.
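
As an illustration of this direction, the following Go sketch shows a component receiving an explicit `prometheus.Registerer` instead of registering against the global default registry from an `init` function. The package, function, and metric names (`newRequestCounter`, `component_requests_total`) are hypothetical and only serve to contrast the two registration styles.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// newRequestCounter registers its metric against the registry it is handed,
// rather than against prometheus.DefaultRegisterer in a package-level init().
func newRequestCounter(reg prometheus.Registerer) prometheus.Counter {
	c := prometheus.NewCounter(prometheus.CounterOpts{
		Name: "component_requests_total",
		Help: "Total requests handled by the component.",
	})
	reg.MustRegister(c) // no sync.Once needed; each caller owns its registry
	return c
}

func main() {
	// Each binary wires up an explicit registry and passes it down, so
	// registration happens exactly once and is visible at the call site.
	reg := prometheus.NewRegistry()
	requests := newRequestCounter(reg)
	requests.Inc()

	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	_ = http.ListenAndServe(":8080", nil)
}
```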

While uncertain at this point, this effort may, once the cleanup is done, put us a step closer to offering stability guarantees around Kubernetes metrics. Currently, metrics are excluded from any kind of stability requirements.

### Goals

* Provide consistently named and high quality metrics in line with the rest of the Prometheus ecosystem.
* Consistent labeling in order to allow straightforward joins of metrics.

### Non-Goals

* Adding or removing metrics. This effort primarily concerns existing metrics; adding or removing a metric may be in scope where it provides the same or higher value, but this is handled on a case-by-case basis.
* This effort does not concern logging or tracing instrumentation.

## Proposal

### cAdvisor instrumentation changes

#### Consistent labeling

Change the container metrics exposed through cAdvisor (which is compiled into the Kubelet) to [use consistent labeling according to the instrumentation guidelines](https://github.com/kubernetes/kubernetes/pull/69099). Concretely, this means changing all occurrences of the following labels:

* `pod_name` to `pod`

> **Review comment (Contributor):** This sounds fine to me. Would we just have a release where both labels are present?
>
> **Reply (Author):** As proposed, I think I would prefer providing Prometheus recording rules that allow for backward compatibility; this is what's been done in the Prometheus ecosystem. But I have no strong feelings about this, and I'm also happy with a transition release having both.

* `container_name` to `container`

> **Review comment (Contributor):** How about existing metric scrapes that rely on the current labeling? Do we provide some deprecation period where the old metric labels would still be available?

The code that rewrites container meta labels to the “well-known” `pod_name` and `container_name` labels is [located in the Kubernetes source](https://github.com/kubernetes/kubernetes/blob/097f300a4d8dd8a16a993ef9cdab94c1ef1d36b7/pkg/kubelet/cadvisor/cadvisor_linux.go#L96-L98), so this change does not concern the cAdvisor code base.
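
To illustrate the recording-rule based backward compatibility mentioned in the review discussion above, the following is a minimal sketch of a Prometheus rule that re-exposes the renamed labels under their old names for one example metric. The rule group name, the `:compat` suffix, and the choice of `container_memory_usage_bytes` are illustrative; the actual migration rules may look different.

```yaml
groups:
- name: container-metrics-compat
  rules:
  # Copy the new `pod` and `container` label values back into the old
  # `pod_name` and `container_name` labels, recorded under a new series name.
  - record: container_memory_usage_bytes:compat
    expr: >
      label_replace(
        label_replace(container_memory_usage_bytes, "pod_name", "$1", "pod", "(.+)"),
        "container_name", "$1", "container", "(.+)"
      )
```

During a transition period, queries that still rely on the old label names could be pointed at such `:compat` series.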

### Changing API latency histogram buckets

The API server's latency histograms currently use buckets ranging from 125ms to 8s. This range does not accurately model most API server request latencies, which can be as low as 1ms for GETs or as high as 60s before hitting the API server's global timeout.

https://github.com/kubernetes/kubernetes/pull/67476
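
As a rough illustration of what a wider bucket layout could look like with the Prometheus Go client, the sketch below covers roughly 1ms to 65s. The metric name, labels, and the exact buckets chosen in the linked pull request may differ; this only shows the shape of the change.

```go
package apiservermetrics

import "github.com/prometheus/client_golang/prometheus"

// requestLatencies is illustrative only: exponential buckets from 1ms up to
// ~65s cover both fast GET requests and requests that approach the global
// 60s timeout, unlike the previous 125ms-8s range.
var requestLatencies = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "apiserver_request_duration_seconds",
		Help:    "Response latency distribution in seconds, by verb and resource.",
		Buckets: prometheus.ExponentialBuckets(0.001, 2, 17), // 0.001s .. ~65.5s
	},
	[]string{"verb", "resource"},
)
```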

### Kubelet metric changes

#### Make metrics aggregatable

Currently, all Kubelet metrics are exposed as summary data types. This makes it impossible to calculate certain metrics in aggregate across a cluster, as summaries cannot be aggregated meaningfully. For example, one currently cannot calculate a given percentile of [pod start latency across a cluster](https://github.com/kubernetes/kubernetes/issues/66791).

Hence, where possible, we should change summaries to histograms, or provide histograms in addition to summaries, as is done for the API server metrics.
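
As a minimal sketch of that change, the following replaces a hypothetical summary with a histogram for pod start latency; the metric name, buckets, and helper are illustrative, and registration against the Kubelet's registry is omitted. Because histogram buckets are plain counters, they can be summed across kubelets and fed to `histogram_quantile()` to compute cluster-wide percentiles.

```go
package kubeletmetrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// podStartDuration is a histogram rather than a summary, so its buckets can
// be aggregated across all kubelets in a cluster.
var podStartDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "kubelet_pod_start_duration_seconds",
	Help:    "Duration in seconds from pod creation until the pod is running.",
	Buckets: prometheus.ExponentialBuckets(0.5, 2, 10), // ~0.5s .. ~256s, illustrative
})

// observePodStart records one pod start observation.
func observePodStart(createdAt time.Time) {
	podStartDuration.Observe(time.Since(createdAt).Seconds())
}
```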

#### Export fewer metrics

https://github.com/kubernetes/kubernetes/issues/68522

#### Prevent apiserver's metrics from accidental registration

https://github.com/kubernetes/kubernetes/pull/63924

### Risks and Mitigations

The main risk is that users upgrade Kubernetes without updating the alerting and dashboarding that consumes Kubernetes-exposed metrics, potentially causing incidents to go unnoticed.

To prevent this, we will provide Prometheus recording rules that allow best-effort backward compatibility, and we will update breaking metric usages in the [Kubernetes monitoring mixin](https://github.com/kubernetes-monitoring/kubernetes-mixin), a widely used collection of Prometheus alerts and Grafana dashboards for Kubernetes.

> **Review comment (Contributor):** ok, this answered my question from above :-D


## Graduation Criteria

All metrics exposed by components in kubernetes/kubernetes follow Prometheus best practices, and (as a nice-to-have) tooling is built and enabled in CI to prevent simple violations of these best practices.

## Implementation History

Multiple pull requests have already been opened, but not merged, as of the writing of this document.