kube_endpoint_address duplicates with Prometheus 2.52 #2408

gdlx · 2024-05-31T13:03:35Z

After upgrading to Prometheus 2.52, we received alerts about dropped duplicate samples.

The Prometheus log showed the following warning:

 scrape_pool=serviceMonitor/monitoring/kube-prometheus-stack-kube-state-metrics/0 target=http://100.91.220.12:8080/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=1

Setting the log level to debug showed the affected series:

scrape_pool=serviceMonitor/monitoring/kube-prometheus-stack-kube-state-metrics/0 target=http://100.91.220.12:8080/metrics msg="Duplicate sample for timestamp" series="kube_endpoint_address{namespace=\"monitoring\",endpoint=\"prometheus-operated\",ip=\"100.91.68.8\",ready=\"true\"}"

Checking that series did indeed show the following duplicates:

kube_endpoint_address{namespace="monitoring",endpoint="prometheus-operated",ip="100.91.68.8",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="prometheus-operated",ip="100.91.68.8",ready="true"} 1

The prometheus-operated endpoint has the following subsets:

subsets:
  - addresses:
      - ip: 100.91.43.113
        hostname: prometheus-kube-prometheus-istio-0
        nodeName: ip-100-91-48-253.eu-west-3.compute.internal
        targetRef:
          kind: Pod
          namespace: monitoring
          name: prometheus-kube-prometheus-istio-0
          uid: 1180e2a5-75e4-4098-961c-940264115438
      - ip: 100.91.68.8
        hostname: prometheus-kube-prometheus-stack-prometheus-0
        nodeName: ip-100-91-212-113.eu-west-3.compute.internal
        targetRef:
          kind: Pod
          namespace: monitoring
          name: prometheus-kube-prometheus-stack-prometheus-0
          uid: 257bdfed-e2b4-49c7-aaea-1b7bee1a520d
    ports:
      - name: http-web
        port: 9090
        protocol: TCP
  - addresses:
      - ip: 100.91.68.8
        hostname: prometheus-kube-prometheus-stack-prometheus-0
        nodeName: ip-100-91-212-113.eu-west-3.compute.internal
        targetRef:
          kind: Pod
          namespace: monitoring
          name: prometheus-kube-prometheus-stack-prometheus-0
          uid: 257bdfed-e2b4-49c7-aaea-1b7bee1a520d
    ports:
      - name: grpc
        port: 10901
        protocol: TCP

There are two subsets containing the same IP (100.91.68.8), each with a different port.
gRPC is exposed only by the Thanos sidecar container, which is enabled on only one of the Prometheus instances.
I don't think there would have been duplicates if both instances had the same configuration (there would have been a single subset listing both addresses and both ports).
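To make the mechanism concrete, here is a minimal Go sketch. It is a hypothetical simplification, not kube-state-metrics' actual code: the struct types are trimmed stand-ins for the core/v1 Endpoints types, and the label set (namespace, endpoint, ip, ready) is taken from the output above. Because nothing in that label set identifies the subset or the port, the address listed in both subsets produces the same series twice.

```go
package main

import "fmt"

// Simplified stand-ins for the core/v1 Endpoints types (hypothetical;
// only the fields needed for this illustration).
type EndpointAddress struct{ IP string }

type EndpointPort struct {
	Name string
	Port int32
}

type EndpointSubset struct {
	Addresses         []EndpointAddress
	NotReadyAddresses []EndpointAddress
	Ports             []EndpointPort
}

// addressSeries builds one label set per address, shaped like the
// kube_endpoint_address output in this issue: nothing identifies the
// subset or the port the address belongs to.
func addressSeries(namespace, endpoint string, subsets []EndpointSubset) []string {
	var series []string
	for _, s := range subsets {
		for _, a := range s.Addresses {
			series = append(series, fmt.Sprintf(
				`kube_endpoint_address{namespace=%q,endpoint=%q,ip=%q,ready="true"}`,
				namespace, endpoint, a.IP))
		}
		for _, a := range s.NotReadyAddresses {
			series = append(series, fmt.Sprintf(
				`kube_endpoint_address{namespace=%q,endpoint=%q,ip=%q,ready="false"}`,
				namespace, endpoint, a.IP))
		}
	}
	return series
}

// duplicates returns each label set that appears more than once (listed once).
func duplicates(series []string) []string {
	seen := map[string]int{}
	var dups []string
	for _, s := range series {
		seen[s]++
		if seen[s] == 2 {
			dups = append(dups, s)
		}
	}
	return dups
}

func main() {
	// The two subsets of prometheus-operated from this report.
	subsets := []EndpointSubset{
		{
			Addresses: []EndpointAddress{{IP: "100.91.43.113"}, {IP: "100.91.68.8"}},
			Ports:     []EndpointPort{{Name: "http-web", Port: 9090}},
		},
		{
			Addresses: []EndpointAddress{{IP: "100.91.68.8"}},
			Ports:     []EndpointPort{{Name: "grpc", Port: 10901}},
		},
	}
	// Prints the single duplicated series, the one for 100.91.68.8.
	for _, d := range duplicates(addressSeries("monitoring", "prometheus-operated", subsets)) {
		fmt.Println(d)
	}
}
```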

The only fix I can see would be to add a port label to the kube_endpoint_address metric.
Is there something else I can do, or should this be considered a bug?

Thanks!

Environment:

kube-state-metrics version: 2.12.0
Kubernetes version: 1.28
Cloud provider or hardware configuration: AWS EKS
Other info:


eimarfandino · 2024-06-05T08:27:52Z

I noticed the same thing; we are seeing:

kube_endpoint_address{namespace="monitoring",endpoint="alertmanager-operated",ip="10.25.119.228",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="alertmanager-operated",ip="10.25.119.228",ready="true"} 1

I do not know if this is related, but in our case the endpoint lists this IP twice with different ports: 10.25.119.228 listens on ports 9094 and 9093.

gdlx · 2024-06-05T08:37:12Z

> I do not know if this is related

@eimarfandino Yes, different ports but same issue.

zoopp · 2024-06-10T11:34:04Z

I'm writing to confirm that I'm seeing this on GKE as well. Services with multiple ports bound to the same IP lead to duplicate metrics being exported by kube-state-metrics. For example (IP addresses masked):

kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-dncadu41te",ip="xx.xx.xx.xx",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-dncadu41te",ip="xx.xx.xx.xx",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-1j2b5u4e7g",ip="yy.yy.yy.yy",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-1j2b5u4e7g",ip="yy.yy.yy.yy",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-1q8fig66j0",ip="zz.zz.zz.zz",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-1q8fig66j0",ip="zz.zz.zz.zz",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-7eknv6n114",ip="aa.aa.aa.aa",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-7eknv6n114",ip="aa.aa.aa.aa",ready="true"} 1

dgrisonnet · 2024-06-13T16:57:11Z

/assign
/triage accepted

Serializator · 2024-06-23T12:04:31Z

kube_endpoint_address is explicitly for the addresses of an endpoint, and kube_endpoint_ports is for the ports of an endpoint. Adding a port label to kube_endpoint_address would muddy the water between these metrics and their purposes.

The other option is to ensure the IPs are unique when generating these metrics. I have a few concerns about this approach:

Can an IP address with different ports be available (.Addresses) and not ready (.NotReadyAddresses) at the same time?
The values of kube_endpoint_address_available and kube_endpoint_address_not_ready would no longer match the number of kube_endpoint_address series.
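A minimal Go sketch of the deduplication option, assuming a hypothetical helper that keys each series on (ip, ready) only; the types are simplified stand-ins for core/v1, not KSM code. With the two subsets from the original report, three address entries collapse into two series, which illustrates the second concern.

```go
package main

import "fmt"

// Hypothetical, trimmed-down stand-ins for core/v1 Endpoints types.
type EndpointAddress struct{ IP string }

type EndpointSubset struct {
	Addresses         []EndpointAddress
	NotReadyAddresses []EndpointAddress
}

// seriesKey identifies one kube_endpoint_address series for a given
// endpoint in this sketch: the IP plus its readiness.
type seriesKey struct {
	IP    string
	Ready bool
}

// uniqueAddressSeries emits each (ip, ready) pair once, however many subsets
// list it, and also returns the raw number of address entries seen.
func uniqueAddressSeries(subsets []EndpointSubset) (keys []seriesKey, rawEntries int) {
	seen := map[seriesKey]bool{}
	add := func(ip string, ready bool) {
		rawEntries++
		k := seriesKey{IP: ip, Ready: ready}
		if !seen[k] {
			seen[k] = true
			keys = append(keys, k)
		}
	}
	for _, s := range subsets {
		for _, a := range s.Addresses {
			add(a.IP, true)
		}
		for _, a := range s.NotReadyAddresses {
			add(a.IP, false)
		}
	}
	return keys, rawEntries
}

func main() {
	subsets := []EndpointSubset{
		{Addresses: []EndpointAddress{{IP: "100.91.43.113"}, {IP: "100.91.68.8"}}},
		{Addresses: []EndpointAddress{{IP: "100.91.68.8"}}},
	}
	keys, raw := uniqueAddressSeries(subsets)
	// 2 series but 3 address entries: a count based on raw entries no
	// longer matches the number of emitted series.
	fmt.Println(len(keys), raw) // prints "2 3"
}
```

If the same IP were listed under Addresses in one subset and NotReadyAddresses in another, this sketch would still emit two series, one with ready="true" and one with ready="false", which is the situation the first question asks about.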

If we were to consider adding the port to the kube_endpoint_address metric, could that open up a conversation about a more generic kube_endpoint metric better suited for this? What was the original reasoning behind having separate _address and _ports metrics for endpoints?

gdlx · 2024-06-24T07:36:36Z

@Serializator Does that mean the clean fix would be for the Prometheus Operator not to use the same address for different instances?
That would consume more IPs but avoid this kind of hybrid endpoint subset.

Serializator · 2024-06-24T08:58:28Z

Hi @gdlx! The Prometheus Operator is not doing anything it shouldn't be doing, so I think it's up to KSM to support this unforeseen case. The Prometheus Operator just happens to be what brings this problem to light; if it weren't the Prometheus Operator, it would have been something else.

dgrisonnet · 2024-06-28T14:54:57Z

The bug lies in the fact that we don't distinguish between endpoint subsets. The metric was written on the assumption that addresses and ports would always be unique within a single endpoint and never duplicated across subsets.

I looked a bit at Kubernetes' validation for Endpoints, and it allows duplicate IP/port pairs across subsets:
https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/validation/validation.go#L7069-L7092

I think the only option we have here is to add a subset label whose value is the index of the subset within the endpoint.
We could theoretically also replace kube_endpoint_address and kube_endpoint_ports with a kube_endpoint_subsets metric, but both metrics are stable, and judging by the validation code, we have no guarantee that two subsets won't contain the same IP/port pair.
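A minimal Go sketch of the subset-index proposal. The "subset" label name, the types, and the helper functions are hypothetical illustrations, not existing KSM code. Each address series carries the index of the subset it came from, so the address shared by both subsets of prometheus-operated yields two distinct series.

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical, trimmed-down stand-ins for core/v1 Endpoints types.
type EndpointAddress struct{ IP string }

type EndpointSubset struct {
	Addresses         []EndpointAddress
	NotReadyAddresses []EndpointAddress
}

// addressLabels returns one label map per address, adding a "subset" label
// that holds the index of the subset the address came from.
func addressLabels(namespace, endpoint string, subsets []EndpointSubset) []map[string]string {
	var out []map[string]string
	emit := func(i int, ip, ready string) {
		out = append(out, map[string]string{
			"namespace": namespace,
			"endpoint":  endpoint,
			"ip":        ip,
			"ready":     ready,
			"subset":    strconv.Itoa(i),
		})
	}
	for i, s := range subsets {
		for _, a := range s.Addresses {
			emit(i, a.IP, "true")
		}
		for _, a := range s.NotReadyAddresses {
			emit(i, a.IP, "false")
		}
	}
	return out
}

// allUnique reports whether no two label maps are identical.
func allUnique(sets []map[string]string) bool {
	seen := map[string]bool{}
	for _, l := range sets {
		key := fmt.Sprint(l) // fmt prints map keys in sorted order
		if seen[key] {
			return false
		}
		seen[key] = true
	}
	return true
}

func main() {
	subsets := []EndpointSubset{
		{Addresses: []EndpointAddress{{IP: "100.91.43.113"}, {IP: "100.91.68.8"}}},
		{Addresses: []EndpointAddress{{IP: "100.91.68.8"}}},
	}
	sets := addressLabels("monitoring", "prometheus-operated", subsets)
	for _, l := range sets {
		fmt.Println(l)
	}
	fmt.Println("unique:", allUnique(sets)) // prints "unique: true"
}
```

One trade-off to weigh: the index reflects the order of .Subsets, so if that order changes, the same address moves to a different series. The sketch also assumes an address is not repeated within a single subset.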

Any thoughts @mrueg @rexagod?

mrueg · 2024-06-28T15:43:54Z

We could also add a port label to kube_endpoint_address and mark kube_endpoint_ports as deprecated.
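A minimal Go sketch of the port-label alternative, emitting one series per (address, port) pair. The port label and the helpers are hypothetical, only ready addresses are handled, and the namespace/endpoint labels are omitted for brevity. It removes the duplicate in the original report, but, given the Endpoints validation discussed above, an IP/port pair repeated across subsets would still collide.

```go
package main

import "fmt"

// Hypothetical, trimmed-down stand-ins for core/v1 Endpoints types.
type EndpointAddress struct{ IP string }

type EndpointPort struct {
	Name string
	Port int32
}

type EndpointSubset struct {
	Addresses []EndpointAddress
	Ports     []EndpointPort
}

// seriesWithPort emits one label set per (ready address, port) pair in
// every subset, with a hypothetical "port" label added.
func seriesWithPort(subsets []EndpointSubset) []string {
	var out []string
	for _, s := range subsets {
		for _, a := range s.Addresses {
			for _, p := range s.Ports {
				out = append(out, fmt.Sprintf(`{ip=%q,port="%d",ready="true"}`, a.IP, p.Port))
			}
		}
	}
	return out
}

// countDuplicates counts how many label sets repeat an earlier one.
func countDuplicates(series []string) int {
	seen := map[string]bool{}
	n := 0
	for _, s := range series {
		if seen[s] {
			n++
		}
		seen[s] = true
	}
	return n
}

func main() {
	issue := []EndpointSubset{
		{Addresses: []EndpointAddress{{IP: "100.91.43.113"}, {IP: "100.91.68.8"}},
			Ports: []EndpointPort{{Name: "http-web", Port: 9090}}},
		{Addresses: []EndpointAddress{{IP: "100.91.68.8"}},
			Ports: []EndpointPort{{Name: "grpc", Port: 10901}}},
	}
	fmt.Println(countDuplicates(seriesWithPort(issue))) // prints 0

	// Hypothetical input: the same IP/port pair listed in two subsets.
	overlap := []EndpointSubset{
		{Addresses: []EndpointAddress{{IP: "10.0.0.1"}}, Ports: []EndpointPort{{Port: 9090}}},
		{Addresses: []EndpointAddress{{IP: "10.0.0.1"}}, Ports: []EndpointPort{{Port: 9090}}},
	}
	fmt.Println(countDuplicates(seriesWithPort(overlap))) // prints 1
}
```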

kdeyko · 2024-07-03T06:35:09Z

Hi there!
Is there any workaround while this is in progress?

gdlx added the kind/bug Categorizes issue or PR as related to a bug. label May 31, 2024

k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label May 31, 2024

gdlx changed the title ~~Endpoint duplicates~~ kube_endpoint_address duplicates with Prometheus 2.52 May 31, 2024

k8s-ci-robot assigned dgrisonnet Jun 13, 2024

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 13, 2024

dgrisonnet mentioned this issue Jun 13, 2024

Duplicate sample for HPA metrics using multiple external metrics with same metric name #2405

Open