AWS load balancer controller continues to provide high cardinality unbounded metrics to prometheus endpoint #2897

yodaflomaster · 2022-11-25T15:33:35Z

Describe the bug
After upgrading the AWS Load Balancer Controller with Helm, I keep seeing the rest_client_request_latency_seconds histogram metric exposed on the Prometheus metrics endpoint. It includes a url label containing the URI of every API version, which adds up to about 900 metrics. I deleted the chart, checked its dependencies, and redeployed, but the problem did not go away. Related upstream issues:
kubernetes-sigs/controller-runtime#1423
kubernetes-sigs/controller-runtime#1587

...
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.001"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.002"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.004"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.008"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.016"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.032"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.064"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.128"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.256"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.512"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="+Inf"} 1
rest_client_request_latency_seconds_sum{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET"} 0.010152667
rest_client_request_latency_seconds_count{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET"} 1
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/namespaces/%7Bnamespace%7D/configmaps/%7Bname%7D",verb="GET",le="0.001"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/namespaces/%7Bnamespace%7D/configmaps/%7Bname%7D",verb="GET",le="0.002"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/namespaces/%7Bnamespace%7D/configmaps/%7Bname%7D",verb="GET",le="0.004"} 127
...
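To check how many series a metric family contributes, the scraped text can be counted offline. The following is a minimal sketch, assuming the output of :8080/metrics has been saved as plain Prometheus text exposition format; the helper names `series_per_metric` and `series_with_prefix` and the `SAMPLE` text are hypothetical and only illustrate the idea.

```python
import re
from collections import Counter

# Matches the metric name at the start of a sample line.
_METRIC_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def series_per_metric(exposition_text: str) -> Counter:
    """Count sample lines per metric name, skipping comments and blanks."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# HELP" and "# TYPE" lines are not samples
        match = _METRIC_NAME.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def series_with_prefix(exposition_text: str, prefix: str) -> int:
    """Total number of sample lines whose metric name starts with prefix."""
    counts = series_per_metric(exposition_text)
    return sum(n for name, n in counts.items() if name.startswith(prefix))


# Illustrative input: four lines taken from the output above plus one
# unrelated metric (go_goroutines, value made up for the example).
SAMPLE = """\
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="0.001"} 0
rest_client_request_latency_seconds_bucket{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET",le="+Inf"} 1
rest_client_request_latency_seconds_sum{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET"} 0.010152667
rest_client_request_latency_seconds_count{url="https://172.20.0.1:443/api/v1/endpoints?limit=%7Bvalue%7D&resourceVersion=%7Bvalue%7D",verb="GET"} 1
go_goroutines 42
"""

print(series_with_prefix(SAMPLE, "rest_client_"))
```

Running the same function over the full scrape output would show whether the rest_client_ family accounts for most of the series on the endpoint.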

Steps to reproduce:

Deploy the aws-load-balancer-controller using the Helm Chart with the ServiceMonitor disabled (serviceMonitor.enabled=false Chart value). Get metrics from the exposed Prometheus endpoint (Chart default, :8080/metrics).

Expected outcome:

The rest_client_request_latency_seconds metric should not be present in the exposed metrics.

Environment:

AWS Load Balancer controller: v2.4.5
Chart version: 1.4.6
EKS: 1.21.14-eks-fb459a0

Additional Context:
Here is my chart values file; all other values are left at their defaults.

replicaCount: 2

image:
  repository: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller
  tag: v2.4.5
  pullPolicy: IfNotPresent

clusterName: main-eks-qa

fullnameOverride: aws-load-balancer-controller

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::############:role/aws-load-balancer-controller

podLabels:
  ######.####/instance: aws-load-balancer-controller

webhookTLS:
  caCert:
  cert:
  key:

disableIngressClassAnnotation: true

disableIngressGroupNameAnnotation: true

podDisruptionBudget:
  maxUnavailable: 1

serviceMonitor:
  enabled: false
  additionalLabels: {}
  interval: 1m

clusterSecretsPermissions:
  allowAllSecrets: false

k8s-triage-robot · 2023-03-07T23:39:20Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

yodaflomaster · 2023-04-04T10:43:44Z

/remove-lifecycle stale

k8s-triage-robot · 2023-07-03T11:05:33Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

yodaflomaster · 2023-07-03T11:39:16Z

/remove-lifecycle stale

k8s-triage-robot · 2024-01-23T18:52:00Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

yodaflomaster · 2024-02-06T17:29:23Z

/remove-lifecycle stale

stevehipwell · 2024-04-11T16:25:00Z

I've created #3645 which will allow the dropping of metrics; you'll be able to do the following in the Helm values to remove the rest client metrics once it's merged.

serviceMonitor:
  enabled: true
  metricRelabelings:
    - sourceLabels: ["__name__"]
      regex: ^rest_client_.+
      action: drop
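To see which metric names a rule like this would remove, the match can be emulated outside Prometheus. This is a rough sketch, not the Prometheus implementation: it assumes that relabel regexes are anchored at both ends (emulated here with `re.fullmatch`) and uses Python's `re` instead of Go's RE2, which behaves the same for this simple pattern. The helper names `is_dropped` and `apply_drop` and the example metric names are hypothetical.

```python
import re

# Regex taken from the metricRelabelings rule above.
DROP_REGEX = r"^rest_client_.+"


def is_dropped(metric_name: str, regex: str = DROP_REGEX) -> bool:
    """Return True if a 'drop' rule on __name__ would discard this metric.

    fullmatch emulates the whole-string anchoring of relabel regexes.
    """
    return re.fullmatch(regex, metric_name) is not None


def apply_drop(names, regex: str = DROP_REGEX):
    """Keep only the metric names that the drop rule does not match."""
    return [name for name in names if not is_dropped(name, regex)]


# Example names for illustration only.
names = [
    "rest_client_request_latency_seconds_bucket",
    "rest_client_requests_total",
    "workqueue_depth",
]
print(apply_drop(names))
```

Because the rule applies at scrape time through the ServiceMonitor, the controller would still expose these metrics on its endpoint; they would only be discarded before ingestion.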

shraddhabang · 2024-05-20T17:42:53Z

Delivered in v2.8.0

M00nF1sh added the triage/needs-investigation label Dec 7, 2022

k8s-ci-robot added the lifecycle/stale label Mar 7, 2023

k8s-ci-robot removed the lifecycle/stale label Apr 4, 2023

k8s-ci-robot added the lifecycle/stale label Jul 3, 2023

k8s-ci-robot removed the lifecycle/stale label Jul 3, 2023

k8s-ci-robot added the lifecycle/stale label Jan 23, 2024

k8s-ci-robot removed the lifecycle/stale label Feb 6, 2024

stevehipwell mentioned this issue Apr 11, 2024 in #3645, "feat(chart): Added additional service monitor functionality" (merged)

shraddhabang closed this as completed May 20, 2024
