CPUThrottlingHigh on metrics server (Prometheus alert)

Natan Yellin edited this page Nov 8, 2021 · 10 revisions

Disambiguation

This is a special case of the general CPUThrottlingHigh alert. This page only deals with occurrences of the alert on metrics-server. When the alert occurs on other applications, see the general case.

Alert explanation

The default CPU limits for metrics-server are too low, which results in CPU starvation. This is a real issue and should be fixed by increasing the CPU limits, although not in the usual way (see Recommended Remediation).

Recommended Remediation

metrics-server does not respect CPU limits set the normal way, because its CPU limits are updated dynamically by the official Kubernetes addon-resizer. You therefore cannot fix this alert by editing the limits on the metrics-server container directly. Instead, raise the --cpu parameter on the command line of the metrics-server-nanny container. In the example Deployment below, this is the '--cpu=40m' line.


apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server-v0.3.6
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        ...
        - command:
            - /pod_nanny
            - '--config-dir=/etc/config'
            - '--cpu=40m'
            - '--extra-cpu=0.5m'
            - '--memory=35Mi'
            - '--extra-memory=4Mi'
            - '--threshold=5'
            - '--deployment=metrics-server-v0.3.6'
            - '--container=metrics-server'
            - '--poll-period=300000'
            - '--estimator=exponential'
            - '--scale-down-delay=24h'
            - '--minClusterSize=5'
            - '--use-metrics=true'
          image: 'gke.gcr.io/addon-resizer:1.8.11-gke.0'
          name: metrics-server-nanny
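
As a rough sketch of the change described above, the edited part of the Deployment might look like the fragment below. The value 100m is an assumption chosen for illustration and does not come from this page. Pick a value that suits your cluster, and leave the other arguments as they are.

```yaml
# Illustrative fragment only: the metrics-server-nanny container from the
# Deployment above, with the base CPU raised.
# '--cpu=100m' is an assumed example value, not a recommendation.
spec:
  template:
    spec:
      containers:
        - name: metrics-server-nanny
          command:
            - /pod_nanny
            - '--cpu=100m'        # was '--cpu=40m'
            - '--extra-cpu=0.5m'  # unchanged
            # ... remaining arguments unchanged ...
```

In addon-resizer, --cpu is understood to be the base CPU amount and --extra-cpu the amount added per node, so raising --cpu lifts the CPU for every cluster size.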