Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues in kube-state-metrics dev-setup #8951

Open
sbueringer opened this issue Jul 3, 2023 · 10 comments
Open

Issues in kube-state-metrics dev-setup #8951

sbueringer opened this issue Jul 3, 2023 · 10 comments
Labels
area/devtools Issues or PRs related to devtools help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@sbueringer
Copy link
Member

sbueringer commented Jul 3, 2023

I saw a few (potential) issues with kube-state-metrics when running our dev-setup. Not sure if they are upstream issues or if they are related to our configuration.

  1. Custom resource state added metrics is logged twice for each metric

For example:

I0703 11:33:29.295822       1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=[capi_clusterclass_info capi_clusterclass_created capi_clusterclass_annotation_paused capi_clusterclass_status_condition capi_clusterclass_status_condition_last_transition_time capi_clusterclass_owner]
I0703 11:33:29.295883       1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=[capi_clusterclass_info capi_clusterclass_created capi_clusterclass_annotation_paused capi_clusterclass_status_condition capi_clusterclass_status_condition_last_transition_time capi_clusterclass_owner]
  1. There are a lot of errors/warnings (?) in the kube-state-metric logs e.g. when creating a cluster
E0703 11:37:51.841041       1 registry_factory.go:649] "capi_kubeadmconfig_status_condition" err="[status,conditions]: expected value for path to be string, got <nil>"
E0703 11:37:51.841064       1 registry_factory.go:649] "capi_kubeadmconfig_status_condition_last_transition_time" err="[status,conditions]: got nil while resolving path"
E0703 11:37:51.841081       1 registry_factory.go:649] "capi_kubeadmconfig_status_condition" err="[status,conditions]: expected value for path to be string, got <nil>"
E0703 11:37:51.841096       1 registry_factory.go:649] "capi_kubeadmconfig_status_condition_last_transition_time" err="[status,conditions]: got nil while resolving path"

Not sure if they are expected. I think this would become very noisy with a few hundreds of clusters.

  1. Metrics dont' work with an empy labelsFromPath. I had to work around this in a few cases because we didn't have any metric-specific labels otherwise, e.g.: https://github.com/kubernetes-sigs/cluster-api/blob/39b2431b11863456818837fc5416cba75d99ecd7/hack/observability/kube-state-metrics/crd-config.yaml#L25C1-L26
@sbueringer sbueringer added kind/bug Categorizes issue or PR as related to a bug. area/devtools Issues or PRs related to devtools labels Jul 3, 2023
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jul 3, 2023
@sbueringer
Copy link
Member Author

@chrischdi It's low priority, but maybe you can take a look when you have some time.

@fabriziopandini
Copy link
Member

/triage accepted
Thanks for reporting those issues!
+1 to investigate on our side first, then eventually we can open focused/well-documented issues in kube-state-metrics if necessary

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 3, 2023
@chrischdi
Copy link
Member

Just verified:

  1. Custom resource state added metrics is logged twice for each metric

This is fixed in v2.9.2

  1. There are a lot of errors/warnings (?) in the kube-state-metric logs e.g. when creating a cluster

This is fixed in v2.9.2

  1. Metrics dont' work with an empy labelsFromPath. I had to work around this in a few cases because we didn't have any metric-specific labels otherwise, e.g.: https://github.com/kubernetes-sigs/cluster-api/blob/39b2431b11863456818837fc5416cba75d99ecd7/hack/observability/kube-state-metrics/crd-config.yaml#L25C1-L26

This one is questionable :-)

We have this issue currently only for CRDs, where would otherwise define an empty _info metric.
They would always be of value 1 and only have the labels cluster_name, name, namespace, uid.

So they do not really provide any information other than that labels. The above labels are also available at every other metric which exists for these CRDs.

Other observation with v2.9.2: Kube-state-metrics logs the following for lots of our defined metrics (once per affected metric, when loading the configuration):

I0707 15:53:56.895686   34785 generator.go:79] "Info metric %s does not have _info suffix" capi_kubeadmconfig_owner="(MISSING)"

@sbueringer
Copy link
Member Author

Other observation with v2.9.2: Kube-state-metrics logs the following for lots of our defined metrics (once per affected metric, when loading the configuration)

Would the general convention be that every info metrics needs an _info prefix? So not only our one "main" info metric per CRD but also something like annotation_paused?

@sbueringer
Copy link
Member Author

sbueringer commented Jul 7, 2023

So they do not really provide any information other than that labels. The above labels are also available at every other metric which exists for these CRDs.

Hm makes sense to some degree. But I think in cases where you don't have additional useful labels it makes still sense to have one "main" info metric without additional labels. Vs just using some other metric (with potentially many other dimensions) as a info metric.

@chrischdi
Copy link
Member

Other observation with v2.9.2: Kube-state-metrics logs the following for lots of our defined metrics (once per affected metric, when loading the configuration)

Would the general convention be that every info metrics needs an _info prefix? So not only our one "main" info metric per CRD but also something like annotation_paused?

From current perspective: also the annotation_paused one would require to be a _info metric.

xref: OpenMetrics specification / suffixes.

We use the info type in the configuration which results in defining it as "opemetrics info type" metric. It actually should be defined as gauge, as e.g. the kube_*_annotations metrics. Example from the kube-state-metrics endpoint:

# TYPE kube_daemonset_annotations gauge
# TYPE capi_machinedeployment_annotation_paused info

@sbueringer
Copy link
Member Author

Ah makes sense. Would be good to adjust them.

@chrischdi
Copy link
Member

So what we actually would want is:

# TYPE kube_daemonset_annotations gauge
# TYPE capi_machinedeployment_annotation_paused gauge

The configuration does currently not allow us to define a gauge type with a fixed value of 1. Maybe that would be the best.

@sbueringer
Copy link
Member Author

sbueringer commented Jul 7, 2023

Ah, got it

Probably worth discussing with kube-state-metrics folks

@fabriziopandini
Copy link
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 11, 2024
@fabriziopandini fabriziopandini added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label May 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/devtools Issues or PRs related to devtools help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

4 participants