[Bug] Example Metric Dashboards and rules use deprecated "kubernetes_pod_name" and "container_name" labels #3308

Closed
chgl opened this issue Jul 12, 2020 · 3 comments · Fixed by #3312

chgl commented Jul 12, 2020

Describe the bug
The example Grafana dashboards and Prometheus alerting rules use the kubernetes_pod_name and container_name labels, which have been deprecated (see kubernetes/kubernetes#80376).
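
For illustration only (the metric name, threshold, and label values are placeholders, not taken from the actual dashboards or rules), a cAdvisor-based expression is affected roughly like this:

# Hypothetical alerting rule, for illustration only.
groups:
  - name: kafka
    rules:
      - alert: KafkaContainerMemoryHigh
        # Old form, using the labels the examples still reference:
        #   container_memory_usage_bytes{container_name="kafka", kubernetes_pod_name=~".*-kafka-\d+"} > 2e9
        # Form matching what cAdvisor exposes after kubernetes/kubernetes#80376
        # renamed container_name -> container and pod_name -> pod:
        expr: 'container_memory_usage_bytes{container="kafka", pod=~".*-kafka-\d+"} > 2e9'
        for: 5m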

I am currently updating the label names in the strimzi-zookeeper, strimzi-kafka, and strimzi-kafka-connect Grafana dashboards, while also migrating the stat panels to the newer Grafana v7 panel format. Let me know if there's interest in a PR.

To Reproduce
Steps to reproduce the behavior:

  1. Create a kind Kubernetes v1.18.2 cluster: kind create cluster

  2. Install the prometheus operator: helm install prometheus stable/prometheus-operator

  3. Install Strimzi: helm install strimzi strimzi/strimzi-kafka-operator

  4. Create a Kafka Cluster via kafka.yaml

  5. Create a service monitor via kafka-sm.yaml

Note that I have commented out the __meta_kubernetes_endpoints_name relabelling, as it would otherwise prevent metrics from being scraped at all, presumably because its keep action only retains endpoints whose name matches prometheus-kube-state-metrics, which none of the Strimzi services do.

  6. Open the Prometheus Operator's default Grafana installation
  7. Import the Strimzi Kafka dashboard from https://github.com/strimzi/strimzi-kafka-operator/blob/a040cf2f7cddcc82b0d1a731dc7bf59089537931/examples/metrics/grafana-dashboards/strimzi-kafka.json
  8. Observe that the charts display 0 as a default value or "No Data"

Expected behavior
The charts included in the dashboard should display the expected metrics.

Environment (please complete the following information):

  • Strimzi version: 0.18.0
  • Installation method: Helm
  • Kubernetes cluster: kind (Kubernetes v1.18.2)
  • Prometheus Operator: prometheus-operator-8.16.1 (app version 0.38.1)

YAML files and logs

kafka.yaml:

apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: kafka-cluster
spec:
  kafkaExporter: {}
  kafka:
    version: 2.5.0
    replicas: 1
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      log.message.format.version: "2.5"
    storage:
      type: ephemeral
    listeners:
      plain: {}
      tls: {}
    metrics:
      # Inspired by config from Kafka 2.0.0 example rules:
      # https://github.com/prometheus/jmx_exporter/blob/master/example_configs/kafka-2_0_0.yml
      lowercaseOutputName: true
      rules:
        # Special cases and very specific rules
        - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>Value
          name: kafka_server_$1_$2
          type: GAUGE
          labels:
            clientId: "$3"
            topic: "$4"
            partition: "$5"
        - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>Value
          name: kafka_server_$1_$2
          type: GAUGE
          labels:
            clientId: "$3"
            broker: "$4:$5"
        # Some percent metrics use MeanRate attribute
        # Ex) kafka.server<type=(KafkaRequestHandlerPool), name=(RequestHandlerAvgIdlePercent)><>MeanRate
        - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>MeanRate
          name: kafka_$1_$2_$3_percent
          type: GAUGE
        # Generic gauges for percents
        - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>Value
          name: kafka_$1_$2_$3_percent
          type: GAUGE
        - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*, (.+)=(.+)><>Value
          name: kafka_$1_$2_$3_percent
          type: GAUGE
          labels:
            "$4": "$5"
        # Generic per-second counters with 0-2 key/value pairs
        - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+), (.+)=(.+)><>Count
          name: kafka_$1_$2_$3_total
          type: COUNTER
          labels:
            "$4": "$5"
            "$6": "$7"
        - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+)><>Count
          name: kafka_$1_$2_$3_total
          type: COUNTER
          labels:
            "$4": "$5"
        - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
          name: kafka_$1_$2_$3_total
          type: COUNTER
        # Generic gauges with 0-2 key/value pairs
        - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Value
          name: kafka_$1_$2_$3
          type: GAUGE
          labels:
            "$4": "$5"
            "$6": "$7"
        - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Value
          name: kafka_$1_$2_$3
          type: GAUGE
          labels:
            "$4": "$5"
        - pattern: kafka.(\w+)<type=(.+), name=(.+)><>Value
          name: kafka_$1_$2_$3
          type: GAUGE
        # Emulate Prometheus 'Summary' metrics for the exported 'Histogram's.
        # Note that these are missing the '_sum' metric!
        - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Count
          name: kafka_$1_$2_$3_count
          type: COUNTER
          labels:
            "$4": "$5"
            "$6": "$7"
        - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*), (.+)=(.+)><>(\d+)thPercentile
          name: kafka_$1_$2_$3
          type: GAUGE
          labels:
            "$4": "$5"
            "$6": "$7"
            quantile: "0.$8"
        - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Count
          name: kafka_$1_$2_$3_count
          type: COUNTER
          labels:
            "$4": "$5"
        - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*)><>(\d+)thPercentile
          name: kafka_$1_$2_$3
          type: GAUGE
          labels:
            "$4": "$5"
            quantile: "0.$6"
        - pattern: kafka.(\w+)<type=(.+), name=(.+)><>Count
          name: kafka_$1_$2_$3_count
          type: COUNTER
        - pattern: kafka.(\w+)<type=(.+), name=(.+)><>(\d+)thPercentile
          name: kafka_$1_$2_$3
          type: GAUGE
          labels:
            quantile: "0.$4"
  zookeeper:
    replicas: 3
    storage:
      type: ephemeral
    metrics:
      # Inspired by Zookeeper rules
      # https://github.com/prometheus/jmx_exporter/blob/master/example_configs/zookeeper.yaml
      lowercaseOutputName: true
      rules:
        # replicated Zookeeper
        - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+)><>(\\w+)"
          name: "zookeeper_$2"
          type: GAUGE
        - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+)><>(\\w+)"
          name: "zookeeper_$3"
          type: GAUGE
          labels:
            replicaId: "$2"
        - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+)><>(Packets\\w+)"
          name: "zookeeper_$4"
          type: COUNTER
          labels:
            replicaId: "$2"
            memberType: "$3"
        - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+)><>(\\w+)"
          name: "zookeeper_$4"
          type: GAUGE
          labels:
            replicaId: "$2"
            memberType: "$3"
        - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+), name3=(\\w+)><>(\\w+)"
          name: "zookeeper_$4_$5"
          type: GAUGE
          labels:
            replicaId: "$2"
            memberType: "$3"
        # standalone Zookeeper
        - pattern: "org.apache.ZooKeeperService<name0=StandaloneServer_port(\\d+)><>(\\w+)"
          type: GAUGE
          name: "zookeeper_$2"
        - pattern: "org.apache.ZooKeeperService<name0=StandaloneServer_port(\\d+), name1=InMemoryDataTree><>(\\w+)"
          type: GAUGE
          name: "zookeeper_$2"
  entityOperator:
    topicOperator: {}

kafka-sm.yaml:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kafka-cluster-service-monitor
  labels:
    release: prometheus
spec:
  selector:
    matchExpressions:
      - { key: strimzi.io/kind, operator: In, values: [Kafka, KafkaConnect] }
  namespaceSelector:
    matchNames:
      - default
  endpoints:
    - port: tcp-prometheus
      honorLabels: true
      interval: 10s
      scrapeTimeout: 10s
      path: /metrics
      scheme: http
      relabelings:
        # - sourceLabels: [__meta_kubernetes_endpoints_name]
        #   separator: ;
        #   regex: prometheus-kube-state-metrics
        #   replacement: $1
        #   action: keep
        - separator: ;
          regex: __meta_kubernetes_service_label_(.+)
          replacement: $1
          action: labelmap
        - sourceLabels: [__meta_kubernetes_namespace]
          separator: ;
          regex: (.*)
          targetLabel: namespace
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_namespace]
          separator: ;
          regex: (.*)
          targetLabel: kubernetes_namespace
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_service_name]
          separator: ;
          regex: (.*)
          targetLabel: kubernetes_name
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_pod_node_name]
          separator: ;
          regex: (.*)
          targetLabel: node_name
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_pod_host_ip]
          separator: ;
          regex: (.*)
          targetLabel: node_ip
          replacement: $1
          action: replace

    #### job_name: node-exporter
    - port: tcp-prometheus
      honorLabels: true
      interval: 10s
      scrapeTimeout: 10s
      path: /metrics
      scheme: http
      relabelings:
        - sourceLabels: [__meta_kubernetes_endpoints_name]
          separator: ;
          regex: prometheus-node-exporter
          replacement: $1
          action: keep
        - separator: ;
          regex: __meta_kubernetes_service_label_(.+)
          replacement: $1
          action: labelmap
        - sourceLabels: [__meta_kubernetes_namespace]
          separator: ;
          regex: (.*)
          targetLabel: namespace
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_namespace]
          separator: ;
          regex: (.*)
          targetLabel: kubernetes_namespace
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_service_name]
          separator: ;
          regex: (.*)
          targetLabel: kubernetes_name
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_pod_node_name]
          separator: ;
          regex: (.*)
          targetLabel: node_name
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_pod_host_ip]
          separator: ;
          regex: (.*)
          targetLabel: node_ip
          replacement: $1
          action: replace
chgl added the bug label Jul 12, 2020

ppatierno commented Jul 13, 2020

@chgl thanks for raising this, I was just about to open an issue and a PR for this.
Actually, the removed label is pod_name (now pod), not kubernetes_pod_name; the latter is the result of the relabelling in our Prometheus additional scrape configuration.
It's also related to a wrong metrics path: cAdvisor now exposes metrics on /metrics/cadvisor.
Anyway, I am going to open the PR with changes to the additional scrape configuration and to all the dashboards that use it.
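
For reference, a minimal sketch of what a cAdvisor scrape job with the corrected path can look like (raw Prometheus scrape-config syntax; this is not the exact job from the Strimzi additional scrape configuration or the upcoming PR):

# Assumed job name and TLS settings; adjust to the actual cluster setup.
- job_name: kubernetes-cadvisor
  kubernetes_sd_configs:
    - role: node
  scheme: https
  # cAdvisor metrics are no longer on the kubelet's plain /metrics endpoint:
  metrics_path: /metrics/cadvisor
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)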


chgl commented Jul 13, 2020

Awesome! What's the motivation behind relabelling pod_name/pod to kubernetes_pod_name anyway? I can see that robustness against upstream metric changes is one reason, since you only have to modify the relabelling in one place.
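
For example (purely illustrative, raw Prometheus scrape-config syntax rather than the actual Strimzi additional scrape configuration), a single relabelling block can map whatever label cAdvisor currently exposes onto a stable name, so the dashboards and rules never have to change:

metric_relabel_configs:
  # Map the post-1.16 cAdvisor labels back onto the names the dashboards use.
  - source_labels: [pod]
    regex: (.+)
    target_label: kubernetes_pod_name
    action: replace
  - source_labels: [container]
    regex: (.+)
    target_label: container_name
    action: replace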

adelcast commented

Just hit this on Friday. This is the cAdvisor change that caused this issue: kubernetes/kubernetes#69099
