
Metricbeat: Missing kubernetes.pod cpu and memory usage percentage on non-leader nodes #32232

Closed
tetianakravchenko opened this issue Jul 6, 2022 · 2 comments · Fixed by #32539
Labels
Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team

Comments

@tetianakravchenko

tetianakravchenko commented Jul 6, 2022

For confirmed bugs, please report:

  • Version: tested on 7.17.5, 8.2.4, likely other as well
  • Operating System:
  • Steps to Reproduce:
  1. metricbeat configmap (note: node metricset is not enabled):
configmap
apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-daemonset-config
  namespace: kube-system
  labels:
    k8s-app: metricbeat
data:
  metricbeat.yml: |-
    logging.level: debug
    metricbeat.config.modules:
      # Mounted `metricbeat-daemonset-modules` configmap:
      path: ${path.config}/modules.d/*.yml
      # Reload module configs as they change:
      reload.enabled: false

    metricbeat.autodiscover:
      providers:
        - type: kubernetes
          scope: cluster
          node: ${NODE_NAME}
          # In large Kubernetes clusters consider setting unique to false
          # to avoid using the leader election strategy and
          # instead run a dedicated Metricbeat instance using a Deployment in addition to the DaemonSet
          unique: true
          templates:
            - config:
                - module: kubernetes
                  hosts: ["kube-state-metrics:8080"]
                  period: 3m
                  add_metadata: true
                  metricsets:
                    - state_node
                    - state_deployment
                    - state_daemonset
                    - state_replicaset
                    - state_pod
                    - state_container
                    # - state_job
                    - state_cronjob
                    - state_resourcequota
                    - state_statefulset
                    - state_service
                    - state_persistentvolume
                    - state_persistentvolumeclaim
                  # If `https` is used to access `kube-state-metrics`, uncomment following settings:
                  # bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                  # ssl.certificate_authorities:
                  #   - /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
                - module: kubernetes
                  metricsets:
                    - apiserver
                  hosts: ["https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}"]
                  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                  ssl.certificate_authorities:
                    - /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                  period: 3m
                # Uncomment this to get k8s events:
                #- module: kubernetes
                #  metricsets:
                #    - event
        # To enable hints based autodiscover uncomment this:
        #- type: kubernetes
        #  node: ${NODE_NAME}
        #  hints.enabled: true

    processors:
      - add_cloud_metadata:

    cloud.id: ${ELASTIC_CLOUD_ID}
    cloud.auth: ${ELASTIC_CLOUD_AUTH}

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      username: ${ELASTICSEARCH_USERNAME}
      password: ${ELASTICSEARCH_PASSWORD}
      allow_older_versions: true
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-daemonset-modules
  namespace: kube-system
  labels:
    k8s-app: metricbeat
data:
  system.yml: |-
    - module: system
      period: 3m
      metricsets:
        - cpu
        - load
        - memory
        - network
        - process
        - process_summary
        #- core
        - diskio
        #- socket
      processes: ['.*']
      process.include_top_n:
        by_cpu: 10      # include top 10 processes by CPU
        by_memory: 10   # include top 10 processes by memory

    - module: system
      period: 3m
      metricsets:
        - filesystem
        - fsstat
      processors:
      - drop_event.when.regexp:
          system.filesystem.mount_point: '^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)'
  kubernetes.yml: |-
    - module: kubernetes
      metricsets:
        # - node
        # - system
        - pod
        - container
        - volume
      period: 3m
      host: ${NODE_NAME}
      hosts: ["https://${NODE_NAME}:10250"]
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      ssl.verification_mode: "none"
      # If there is a CA bundle that contains the issuer of the certificate used in the Kubelet API,
      # remove ssl.verification_mode entry and use the CA, for instance:
      #ssl.certificate_authorities:
        #- /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
    - module: kubernetes
      metricsets:
        - proxy
      period: 3m
      host: ${NODE_NAME}
      hosts: ["localhost:10249"]
      # If using Red Hat OpenShift, use this `hosts` setting instead:
      # hosts: ["localhost:29101"]
  2. Run a multi-node Kubernetes cluster:
    kind create cluster --config kind-config.yml
kind-config.yml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
      - |
        kind: KubeProxyConfiguration
        metricsBindAddress: "0.0.0.0"
- role: worker
- role: worker
  3. Deploy some workload on the workers and deploy Metricbeat:
counter-pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
    - name: count
      image: busybox:1.28
      args:
        - /bin/sh
        - -c
        - >
          i=0;
          while true;
          do
            echo "$i: $(date)" >> /var/log/1.log;
            echo "$(date) INFO $i" >> /var/log/2.log;
            i=$((i+1));
            sleep 5;
          done
      volumeMounts:
        - name: varlog
          mountPath: /var/log
      resources:
        limits:
          memory: "64Mi"
          cpu: "100m"
    - name: count-log-1
      image: busybox:1.28
      args: [/bin/sh, -c, "tail -n+1 -F /var/log/1.log"]
      volumeMounts:
        - name: varlog
          mountPath: /var/log
      resources:
        limits:
          memory: "64Mi"
          cpu: "100m"
  volumes:
    - name: varlog
      emptyDir: {}
kubectl apply -f counter-pod.yml

edit the counter-pod.yml manifest: change the pod name to counter2 and run the command above again

(screenshot, 2022-07-06 18:47)

In this setup:
kind-worker is the node where the Metricbeat leader is running.

Issue: kubernetes.pod.cpu.usage.limit.pct and kubernetes.pod.memory.usage.limit.pct are missing on non-leader nodes, even though CPU/memory limits are defined for the counter2 pod shown in the screenshot above.
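For context, a minimal sketch of what kubernetes.pod.memory.usage.limit.pct represents: pod memory usage divided by the summed container memory limits. This is not the actual beats code; the function name and shape are hypothetical, but it illustrates why the field simply cannot be emitted when the limit is unknown to the computing instance.

```go
package main

import "fmt"

// memoryUsageLimitPct sketches the ratio behind
// kubernetes.pod.memory.usage.limit.pct: working-set bytes divided by
// the pod's summed container memory limits. Hypothetical name, not the
// actual beats implementation.
func memoryUsageLimitPct(usageBytes, limitBytes uint64) float64 {
	if limitBytes == 0 {
		// Limit unknown to this instance (e.g. a non-leader node):
		// there is no denominator, so the field cannot be computed.
		return 0
	}
	return float64(usageBytes) / float64(limitBytes)
}

func main() {
	// counter pod above: two containers with a 64Mi limit each
	// (128Mi total), using ~32Mi:
	fmt.Println(memoryUsageLimitPct(32<<20, 128<<20)) // 0.25
}
```

The counter pod's manifest does set a 64Mi limit per container, so a missing pct field means the limit never reached the computation, not that no limit exists.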

@tetianakravchenko tetianakravchenko added the Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team label Jul 6, 2022
@tetianakravchenko

To get those metrics, one can suggest enabling the node metricset; the missing metrics should then be populated:

...
- module: kubernetes
  metricsets:
    - node    # <- node should be enabled
    #- system
    - pod
    - container
    - volume
 ...

@draemi

draemi commented Jul 21, 2022

Enabling that metricset will help in getting some metrics for pod CPU and memory usage pct; however, they are not accurate.
Secondly, I'm not an expert on the Elastic code, but looking at the Metricbeat code where these metrics are computed, it seems to me that nodeCores from the node metricset should only be used as a fallback when the container limits are not available (i.e. as a default when the container core limits are not found in the cache):

https://github.com/elastic/beats/blob/ad192cd501359d543de1a2b0036f9f5c74aa2289/metricbeat/module/kubernetes/pod/data.go#L56

https://github.com/elastic/beats/blob/ad192cd501359d543de1a2b0036f9f5c74aa2289/metricbeat/module/kubernetes/pod/data.go#L121

With the system and node metricsets enabled, even if we get some CPU/memory usage limit pct data for the pods, I don't think it is accurate, since it is computed based on node cores rather than the container limits.

nodeCores (provided by the node metricset) is used as a fallback. If a container has limits defined, the cache will contain the cpu.limit defined in the container spec: https://github.com/elastic/beats/blob/ad192cd501359d543de1a2b0036f9f5c74aa2289/metricbeat/module/kubernetes/util/kubernetes.go#L246

To us it doesn't look right that the Metricbeat leader computes pod metrics based on the info it finds in its cache, while the other Metricbeat instances compute pod metrics based on different info.
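A minimal Go sketch of the divergence described here (hypothetical names, not the actual beats code, which is linked above): when cached container limits are present, the percentage is computed against them; when they are not, nodeCores becomes the denominator, so leader and non-leader instances may divide by different values.

```go
package main

import "fmt"

// podCPUUsageLimitPct sketches the fallback behaviour discussed above.
// If the summed container CPU limits are in the cache, the percentage is
// computed against them; otherwise the node's core count is used as the
// denominator. Names are hypothetical, not the actual beats identifiers.
func podCPUUsageLimitPct(usageCores, cachedLimitCores, nodeCores float64) float64 {
	denominator := nodeCores
	if cachedLimitCores > 0 {
		denominator = cachedLimitCores
	}
	if denominator == 0 {
		// Neither cached limits nor node cores known (e.g. a non-leader
		// node without the node metricset): the field cannot be computed.
		return 0
	}
	return usageCores / denominator
}

func main() {
	// Leader view: a 0.1-core limit is cached -> pct against the limit.
	fmt.Println(podCPUUsageLimitPct(0.05, 0.1, 4)) // 0.5
	// Fallback view: no cached limit -> pct against 4 node cores.
	fmt.Println(podCPUUsageLimitPct(0.05, 0, 4)) // 0.0125
}
```

The two calls use the same usage value yet produce different percentages, which is the inconsistency being reported.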

Please note we have been in discussion with Elastic for weeks regarding this issue, and we reached a common agreement that this (enabling the node metricset) is not the way to go.
So please don't start over!
