Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KSM 2.8.0 can't be scraped by Prometheus if Native Histograms is enabled #2022

Closed
Joseph-Irving opened this issue Mar 13, 2023 · 5 comments · Fixed by #2024
Closed

KSM 2.8.0 can't be scraped by Prometheus if Native Histograms is enabled #2022

Joseph-Irving opened this issue Mar 13, 2023 · 5 comments · Fixed by #2024
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@Joseph-Irving
Copy link
Member

Joseph-Irving commented Mar 13, 2023

What happened:
Starting from version 2.8.0, if Prometheus is started with --enable-feature=native-histograms flag it fails to scrape KSM, instead erroring with the error:

proto: wrong wireType = 0 for field Metric

What you expected to happen:
KSM should be able to be scraped by Prometheus

How to reproduce it (as minimally and precisely as possible):
I can reproduce this just on minikube with the standard helm prometheus, chart https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus

helm install prometheus prometheus-community/prometheus --set server.extraFlags={"enable-feature=native-histograms"\,"log.level=debug"} --version=19.7.2

This deploys kube-state-metrics version: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.8.0
And Prometheus version: quay.io/prometheus/prometheus:v2.41.0
With the --enable-feature=native-histograms set.

Anything else we need to know?:
The problem was introduced in this PR #1974

I think the issue, is that when you enable Native Histograms, Prometheus is sending headers with a content type of:

content type: %v application/vnd.google.protobuf; proto=io.prometheus.client.MetricFamily; encoding=delimited

Which we're then mirroring in our headers:

resHeader.Set("Content-Type", string(contentType))

But we're not actually changing our response so we're replying with invalid protobuf hence the error from Prometheus.

I did this as a simple workaround and it fixed the issue:

if contentType == expfmt.FmtProtoDelim {
    resHeader.Set("Content-Type", `text/plain; version=`+"0.0.4")
} else {
    resHeader.Set("Content-Type", string(contentType))
}

However I'm not sure this is the right approach, is it intended that kube-state-metrics always responds with content headers that match the request? Or was this just a change for Open Metrics? In which case perhaps the header should only get changed if the contentType == expfmt.FmtOpenMetric? e.g reverse the logic, only change the headers if we detect open metrics headers:

if contentType == expfmt.FmtOpenMetrics {
   resHeader.Set("Content-Type", string(contentType))
} else {
   resHeader.Set("Content-Type", `text/plain; version=`+"0.0.4")
}

That way the code would behave the same as it did prior to 2.8.0 if the open metrics headers are not present.

Environment:

  • kube-state-metrics version: 2.8.0
  • Kubernetes version (use kubectl version): 1.26.1
  • Cloud provider or hardware configuration: n/a
  • Other info:
@Joseph-Irving Joseph-Irving added the kind/bug Categorizes issue or PR as related to a bug. label Mar 13, 2023
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Mar 13, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Joseph-Irving Joseph-Irving changed the title KSM 1.8.0 can't be scraped by Prometheus is Native Histograms is enabled KSM 1.8.0 can't be scraped by Prometheus if Native Histograms is enabled Mar 13, 2023
@CatherineF-dev
Copy link
Contributor

I could reproduce this issue.

# install prometheus and ksm
kind create cluster --name ksm-cluster
kind export kubeconfig --name=ksm-cluster

helm install prometheus prometheus-community/prometheus --set server.extraFlags={"enable-feature=native-histograms"} --version=19.7.2

export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=server" -o jsonpath
="{.items[0].metadata.name}")
kubectl --namespace default port-forward $POD_NAME 9090

After open localhost:9090, can see

kubernetes-service-endpoints (3/4 up)
http://10.244.0.7:8080/metrics proto: wrong wireType = 0 for field Metric

(It's a little weird that I can't find this error from pod logs. And I need to open localhost:9090)

@Joseph-Irving
Copy link
Member Author

Joseph-Irving commented Mar 13, 2023

By default Prometheus doesn't log out scraping errors, if you change the helm command to turn on debug logging log.level=debug like so:

helm upgrade --install  prometheus prometheus-community/prometheus --set server.extraFlags={"enable-feature=native-histograms"\,"log.level=debug"} --version=19.7.2

You can see the errors in the container logs

ts=2023-03-13T15:07:35.733Z caller=scrape.go:1358 level=debug component="scrape manager" scrape_pool=kubernetes-service-endpoints target=http://10.244.0.21:8080/metrics msg="Append failed" err="proto: wrong wireType = 0 for field Metric"

(Updated my original comment to include turning on the debug logging)

@CatherineF-dev
Copy link
Contributor

Currently, kube-state-metrics is using contentType="application/vnd.google.protobuf; proto=io.prometheus.client.MetricFamily; encoding=delimited"

I agree with

if contentType == expfmt.FmtOpenMetrics {
   resHeader.Set("Content-Type", string(contentType))
} else {
   resHeader.Set("Content-Type", `text/plain; version=`+"0.0.4")
}

@Joseph-Irving Joseph-Irving changed the title KSM 1.8.0 can't be scraped by Prometheus if Native Histograms is enabled KSM 2.8.0 can't be scraped by Prometheus if Native Histograms is enabled Mar 13, 2023
@CatherineF-dev
Copy link
Contributor

fyi: fix is #2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants