
Duplicate sample for HPA metrics using multiple external metrics with same metric name #2405

Open
tl-eirik-albrigtsen opened this issue May 30, 2024 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

tl-eirik-albrigtsen commented May 30, 2024

What happened:
Upgraded to Prometheus 2.52, which is now stricter about duplicates: prometheus/prometheus#14089 (similar to #2390).

Have an HPA that looks like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments-gateway
  namespace: payments
spec:
  maxReplicas: 30
  metrics:
  - external:
      metric:
        name: amqp_messages_unacknowledged
        selector:
          matchLabels:
            queue: queue_one
      target:
        averageValue: "40"
        type: AverageValue
    type: External
  - external:
      metric:
        name: amqp_messages_unacknowledged
        selector:
          matchLabels:
            queue: queue_two
      target:
        averageValue: "40"
        type: AverageValue
    type: External
  - external:
      metric:
        name: amqp_messages_unacknowledged
        selector:
          matchLabels:
            queue: queue_three
      target:
        averageValue: "40"
        type: AverageValue
    type: External
  - external:
      metric:
        name: amqp_messages_unacknowledged
        selector:
          matchLabels:
            queue: queue_four
      target:
        averageValue: "40"
        type: AverageValue
    type: External
  - resource:
      name: cpu
      target:
        averageValue: 300m
        type: AverageValue
    type: Resource
  minReplicas: 3
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-gateway

and this causes KSM to attempt to produce duplicate metrics, because it expects the metric name to be unique across target metrics, which is not true (the selectors are what distinguish them).
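
Concretely, the four external metrics above collapse into series with identical label sets on the KSM /metrics endpoint, presumably something like this (values illustrative; label set taken from the Prometheus logs below):

  kube_horizontalpodautoscaler_spec_target_metric{namespace="payments",horizontalpodautoscaler="payments-gateway",metric_name="amqp_messages_unacknowledged",metric_target_type="average"} 40
  kube_horizontalpodautoscaler_spec_target_metric{namespace="payments",horizontalpodautoscaler="payments-gateway",metric_name="amqp_messages_unacknowledged",metric_target_type="average"} 40

i.e. one sample per queue_* entry, with nothing in the labels to tell them apart.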

debug logs from prometheus:

ts=2024-05-30T12:02:24.763Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-stack-kube-state-metrics/0 target=http://10.42.123.239:8080/metrics msg="Duplicate sample for timestamp" series="kube_horizontalpodautoscaler_spec_target_metric{namespace=\"payments\",horizontalpodautoscaler=\"payments-gateway\",metric_name=\"amqp_messages_unacknowledged\",metric_target_type=\"average\"}"
ts=2024-05-30T12:02:24.763Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-stack-kube-state-metrics/0 target=http://10.42.123.239:8080/metrics msg="Duplicate sample for timestamp" series="kube_horizontalpodautoscaler_spec_target_metric{namespace=\"payments\",horizontalpodautoscaler=\"payments-gateway\",metric_name=\"amqp_messages_unacknowledged\",metric_target_type=\"average\"}"
ts=2024-05-30T12:02:24.763Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-stack-kube-state-metrics/0 target=http://10.42.123.239:8080/metrics msg="Duplicate sample for timestamp" series="kube_horizontalpodautoscaler_spec_target_metric{namespace=\"payments\",horizontalpodautoscaler=\"payments-gateway\",metric_name=\"amqp_messages_unacknowledged\",metric_target_type=\"average\"}"
ts=2024-05-30T12:02:24.764Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-stack-kube-state-metrics/0 target=http://10.42.123.239:8080/metrics msg="Duplicate sample for timestamp" series="kube_horizontalpodautoscaler_status_target_metric{namespace=\"payments\",horizontalpodautoscaler=\"payments-gateway\",metric_name=\"amqp_messages_unacknowledged\",metric_target_type=\"value\"}"
ts=2024-05-30T12:02:24.764Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-stack-kube-state-metrics/0 target=http://10.42.123.239:8080/metrics msg="Duplicate sample for timestamp" series="kube_horizontalpodautoscaler_status_target_metric{namespace=\"payments\",horizontalpodautoscaler=\"payments-gateway\",metric_name=\"amqp_messages_unacknowledged\",metric_target_type=\"average\"}"
ts=2024-05-30T12:02:24.764Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-stack-kube-state-metrics/0 target=http://10.42.123.239:8080/metrics msg="Duplicate sample for timestamp" series="kube_horizontalpodautoscaler_status_target_metric{namespace=\"payments\",horizontalpodautoscaler=\"payments-gateway\",metric_name=\"amqp_messages_unacknowledged\",metric_target_type=\"value\"}"
ts=2024-05-30T12:02:24.764Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-stack-kube-state-metrics/0 target=http://10.42.123.239:8080/metrics msg="Duplicate sample for timestamp" series="kube_horizontalpodautoscaler_status_target_metric{namespace=\"payments\",horizontalpodautoscaler=\"payments-gateway\",metric_name=\"amqp_messages_unacknowledged\",metric_target_type=\"average\"}"
ts=2024-05-30T12:02:24.764Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-stack-kube-state-metrics/0 target=http://10.42.123.239:8080/metrics msg="Duplicate sample for timestamp" series="kube_horizontalpodautoscaler_status_target_metric{namespace=\"payments\",horizontalpodautoscaler=\"payments-gateway\",metric_name=\"amqp_messages_unacknowledged\",metric_target_type=\"value\"}"
ts=2024-05-30T12:02:24.764Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-stack-kube-state-metrics/0 target=http://10.42.123.239:8080/metrics msg="Duplicate sample for timestamp" series="kube_horizontalpodautoscaler_status_target_metric{namespace=\"payments\",horizontalpodautoscaler=\"payments-gateway\",metric_name=\"amqp_messages_unacknowledged\",metric_target_type=\"average\"}"

and this causes the standard mixin alert PrometheusDuplicateTimestamps to trigger continuously.
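
For reference, that alert in the prometheus mixin is based on the duplicate-sample counter, roughly like this (a sketch; the exact rule and selectors in the mixin may differ):

  - alert: PrometheusDuplicateTimestamps
    expr: rate(prometheus_target_scrapes_sample_duplicate_timestamp_total[5m]) > 0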

What you expected to happen:

No duplicate metrics. I'm guessing the temporary solution is a drop action on kube_horizontalpodautoscaler_status_target_metric|kube_horizontalpodautoscaler_spec_target_metric, but I figured it might be worth raising an issue here for others.

How to reproduce it (as minimally and precisely as possible):

An HPA like the one above, some way to use external metrics in HPAs (prometheus-adapter or KEDA, I guess), and the default kube-state-metrics scraping of HPA metrics.

Anything else we need to know?:

Environment:

  • kube-state-metrics version: v2.12.0
  • Kubernetes version: happens across Kubernetes versions (tested 1.25 and 1.29)
  • Other info:
    resources flag from ksm:
    - --resources=certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,leases,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
@tl-eirik-albrigtsen tl-eirik-albrigtsen added the kind/bug Categorizes issue or PR as related to a bug. label May 30, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label May 30, 2024
@tl-eirik-albrigtsen (Author)

Working workaround with metricRelabelings (here using kube-prometheus-stack):

  kube-state-metrics:
    prometheus:
      monitor:
        enabled: true
        metricRelabelings:
        - action: drop
          sourceLabels: [__name__]
          # these metrics generate duplicates
          # https://github.com/kubernetes/kube-state-metrics/issues/2405
          regex: kube_horizontalpodautoscaler_status_target_metric|kube_horizontalpodautoscaler_spec_target_metric
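
For setups not using kube-prometheus-stack, the equivalent drop rule in a plain Prometheus scrape config would look roughly like this (a sketch; the job name and target below are placeholders):

  scrape_configs:
    - job_name: kube-state-metrics              # placeholder job name
      static_configs:
        - targets: ["kube-state-metrics:8080"]  # placeholder target
      metric_relabel_configs:
        - source_labels: [__name__]
          regex: kube_horizontalpodautoscaler_status_target_metric|kube_horizontalpodautoscaler_spec_target_metric
          action: drop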

@dgrisonnet (Member)

Could be related to #2408

/assign
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 13, 2024
@dpericaxon

Has anyone been able to find a workaround other than dropping the metrics?
