
Duplicate sample for HPA metrics using multiple external metrics with same metric name #2405

Open
tl-eirik-albrigtsen opened this issue May 30, 2024 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

tl-eirik-albrigtsen commented May 30, 2024

What happened:
Upgraded to Prometheus 2.52, which is now stricter about duplicates: prometheus/prometheus#14089 (similar to #2390).

Have an HPA that looks like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments-gateway
  namespace: payments
spec:
  maxReplicas: 30
  metrics:
  - external:
      metric:
        name: amqp_messages_unacknowledged
        selector:
          matchLabels:
            queue: queue_one
      target:
        averageValue: "40"
        type: AverageValue
    type: External
  - external:
      metric:
        name: amqp_messages_unacknowledged
        selector:
          matchLabels:
            queue: queue_two
      target:
        averageValue: "40"
        type: AverageValue
    type: External
  - external:
      metric:
        name: amqp_messages_unacknowledged
        selector:
          matchLabels:
            queue: queue_three
      target:
        averageValue: "40"
        type: AverageValue
    type: External
  - external:
      metric:
        name: amqp_messages_unacknowledged
        selector:
          matchLabels:
            queue: queue_four
      target:
        averageValue: "40"
        type: AverageValue
    type: External
  - resource:
      name: cpu
      target:
        averageValue: 300m
        type: AverageValue
    type: Resource
  minReplicas: 3
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-gateway

and this causes KSM to attempt to produce duplicate metrics, because it expects the metric name to be unique across target metrics, which is not true (the selectors are what distinguish them).
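
Concretely, the four external metrics above collapse into series with identical label sets on the KSM /metrics endpoint, presumably something like this (values illustrative; label set taken from the Prometheus logs below):

  kube_horizontalpodautoscaler_spec_target_metric{namespace="payments",horizontalpodautoscaler="payments-gateway",metric_name="amqp_messages_unacknowledged",metric_target_type="average"} 40
  kube_horizontalpodautoscaler_spec_target_metric{namespace="payments",horizontalpodautoscaler="payments-gateway",metric_name="amqp_messages_unacknowledged",metric_target_type="average"} 40

i.e. one sample per queue_* entry, with nothing in the labels to tell them apart.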

debug logs from prometheus:

ts=2024-05-30T12:02:24.763Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-stack-kube-state-metrics/0 target=http://10.42.123.239:8080/metrics msg="Duplicate sample for timestamp" series="kube_horizontalpodautoscaler_spec_target_metric{namespace=\"payments\",horizontalpodautoscaler=\"payments-gateway\",metric_name=\"amqp_messages_unacknowledged\",metric_target_type=\"average\"}"
ts=2024-05-30T12:02:24.763Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-stack-kube-state-metrics/0 target=http://10.42.123.239:8080/metrics msg="Duplicate sample for timestamp" series="kube_horizontalpodautoscaler_spec_target_metric{namespace=\"payments\",horizontalpodautoscaler=\"payments-gateway\",metric_name=\"amqp_messages_unacknowledged\",metric_target_type=\"average\"}"
ts=2024-05-30T12:02:24.763Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-stack-kube-state-metrics/0 target=http://10.42.123.239:8080/metrics msg="Duplicate sample for timestamp" series="kube_horizontalpodautoscaler_spec_target_metric{namespace=\"payments\",horizontalpodautoscaler=\"payments-gateway\",metric_name=\"amqp_messages_unacknowledged\",metric_target_type=\"average\"}"
ts=2024-05-30T12:02:24.764Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-stack-kube-state-metrics/0 target=http://10.42.123.239:8080/metrics msg="Duplicate sample for timestamp" series="kube_horizontalpodautoscaler_status_target_metric{namespace=\"payments\",horizontalpodautoscaler=\"payments-gateway\",metric_name=\"amqp_messages_unacknowledged\",metric_target_type=\"value\"}"
ts=2024-05-30T12:02:24.764Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-stack-kube-state-metrics/0 target=http://10.42.123.239:8080/metrics msg="Duplicate sample for timestamp" series="kube_horizontalpodautoscaler_status_target_metric{namespace=\"payments\",horizontalpodautoscaler=\"payments-gateway\",metric_name=\"amqp_messages_unacknowledged\",metric_target_type=\"average\"}"
ts=2024-05-30T12:02:24.764Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-stack-kube-state-metrics/0 target=http://10.42.123.239:8080/metrics msg="Duplicate sample for timestamp" series="kube_horizontalpodautoscaler_status_target_metric{namespace=\"payments\",horizontalpodautoscaler=\"payments-gateway\",metric_name=\"amqp_messages_unacknowledged\",metric_target_type=\"value\"}"
ts=2024-05-30T12:02:24.764Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-stack-kube-state-metrics/0 target=http://10.42.123.239:8080/metrics msg="Duplicate sample for timestamp" series="kube_horizontalpodautoscaler_status_target_metric{namespace=\"payments\",horizontalpodautoscaler=\"payments-gateway\",metric_name=\"amqp_messages_unacknowledged\",metric_target_type=\"average\"}"
ts=2024-05-30T12:02:24.764Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-stack-kube-state-metrics/0 target=http://10.42.123.239:8080/metrics msg="Duplicate sample for timestamp" series="kube_horizontalpodautoscaler_status_target_metric{namespace=\"payments\",horizontalpodautoscaler=\"payments-gateway\",metric_name=\"amqp_messages_unacknowledged\",metric_target_type=\"value\"}"
ts=2024-05-30T12:02:24.764Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-stack-kube-state-metrics/0 target=http://10.42.123.239:8080/metrics msg="Duplicate sample for timestamp" series="kube_horizontalpodautoscaler_status_target_metric{namespace=\"payments\",horizontalpodautoscaler=\"payments-gateway\",metric_name=\"amqp_messages_unacknowledged\",metric_target_type=\"average\"}"

and this causes the standard mixin alert PrometheusDuplicateTimestamps to trigger continuously.
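
For reference, that alert in the prometheus mixin is based on the duplicate-sample counter, roughly like this (a sketch; the exact rule and selectors in the mixin may differ):

  - alert: PrometheusDuplicateTimestamps
    expr: rate(prometheus_target_scrapes_sample_duplicate_timestamp_total[5m]) > 0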

What you expected to happen:

No duplicate metrics. I'm guessing the temporary solution is a drop action on kube_horizontalpodautoscaler_status_target_metric|kube_horizontalpodautoscaler_spec_target_metric, but I figured it might be worth raising an issue here for others.

How to reproduce it (as minimally and precisely as possible):

An HPA like the one above, some way to use external metrics in HPAs (prometheus-adapter or KEDA, I guess), and the default kube-state-metrics scraping of HPA metrics.

Anything else we need to know?:

Environment:

  • kube-state-metrics version: v2.12.0
  • Kubernetes version: happens across Kubernetes versions (tested 1.25 and 1.29)
  • Other info:
    resources flag from ksm:
    - --resources=certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,leases,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
@tl-eirik-albrigtsen tl-eirik-albrigtsen added the kind/bug Categorizes issue or PR as related to a bug. label May 30, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label May 30, 2024
@tl-eirik-albrigtsen (Author)

Working workaround with metricRelabelings (here using kube-prometheus-stack):

  kube-state-metrics:
    prometheus:
      monitor:
        enabled: true
        metricRelabelings:
        - action: drop
          sourceLabels: [__name__]
          # these metrics generate duplicates
          # https://github.com/kubernetes/kube-state-metrics/issues/2405
          regex: kube_horizontalpodautoscaler_status_target_metric|kube_horizontalpodautoscaler_spec_target_metric
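
For setups not using kube-prometheus-stack, the equivalent drop rule in a plain Prometheus scrape config would look roughly like this (a sketch; the job name and target below are placeholders):

  scrape_configs:
    - job_name: kube-state-metrics              # placeholder job name
      static_configs:
        - targets: ["kube-state-metrics:8080"]  # placeholder target
      metric_relabel_configs:
        - source_labels: [__name__]
          regex: kube_horizontalpodautoscaler_status_target_metric|kube_horizontalpodautoscaler_spec_target_metric
          action: drop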

@dgrisonnet (Member)

Could be related to #2408

/assign
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 13, 2024
@dpericaxon

Has anyone been able to find a workaround other than dropping the metrics?
