Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

custom metrics for resources w/ group="" broken, e.g. Node #2434

Closed
robbat2 opened this issue Jun 28, 2024 · 6 comments
Closed

custom metrics for resources w/ group="" broken, e.g. Node #2434

robbat2 opened this issue Jun 28, 2024 · 6 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@robbat2
Copy link

robbat2 commented Jun 28, 2024

What happened:
Custom metrics for resources where group is empty, e.g. Node resource do not work. This was a regression between v2.8.2 and v2.9.0

What you expected to happen:
Custom metrics should work even with an empty group.

How to reproduce it (as minimally and precisely as possible):
Run kube-state-metrics v2.9.0, compare vs v2.8.2

resources:
  - groupVersionKind:
      kind: Node
      version: v1
      group: ""
    labelsFromPath:
      node: [metadata, name]
    metricNamePrefix: demo
    metrics:
      - name: timestamp
        each:
          type: Gauge
          gauge:
            path: [metadata]
            valueFrom: [creationTimestamp]

Error:

"failed to resolve GVK" err="group is required in the defined GVK /v1, Kind=Node" gvk="_v1_Node"

This breakage was introduced accidently in #1851

Specifically here:

if g == "" || g == "*" {
return nil, fmt.Errorf("group is required in the defined GVK %v", gvk)
}

Anything else we need to know?:
This was previously reported #2333

Rolling KSM back to v2.8.2 and the custom metric DOES work, but that's not an option in production environment due to the need for RBAC features introduced

Environment:

  • kube-state-metrics version: v2.10.1
  • Kubernetes version (use kubectl version): v1.25.1
  • Cloud provider or hardware configuration: private
  • Other info:

Screenshot of the metrics working in v2.8.2:
screenshot

@robbat2 robbat2 added the kind/bug Categorizes issue or PR as related to a bug. label Jun 28, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jun 28, 2024
robbat2 added a commit to coreweave/kube-state-metrics that referenced this issue Jun 28, 2024
PR#1851 introduced accidental breakage for custom metrics on resources
with an empty empty, e.g. Node.

Fix it, and add tests to catch the next breakage.

Reference: kubernetes#1851
Signed-off-by: Robin H. Johnson <rjohnson@coreweave.com>
Fixes: kubernetes#2434
robbat2 added a commit to coreweave/kube-state-metrics that referenced this issue Jun 28, 2024
PR#1851 introduced accidental breakage for custom metrics on resources
with an empty empty, e.g. Node.

Fix it, and add tests to catch the next breakage.

Reference: kubernetes#1851
Reference: kubernetes#2434
Signed-off-by: Robin H. Johnson <rjohnson@coreweave.com>
robbat2 added a commit to coreweave/kube-state-metrics that referenced this issue Jun 28, 2024
PR#1851 introduced accidental breakage for custom metrics on resources
with an empty empty, e.g. Node.

Fix it, and add tests to catch the next breakage.

Reference: kubernetes#1851
Reference: kubernetes#2434
Signed-off-by: Robin H. Johnson <rjohnson@coreweave.com>
robbat2 added a commit to coreweave/kube-state-metrics that referenced this issue Jun 28, 2024
PR#1851 introduced accidental breakage for custom metrics on resources
with an empty empty, e.g. Node.

Fix it, and add tests to catch the next breakage.

```
demo_timestamp{customresource_group="",customresource_kind="Node",customresource_version="v1",node="kind-control-plane"} 1.719618433e+09
```

Reference: kubernetes#1851
Reference: kubernetes#2434
Signed-off-by: Robin H. Johnson <rjohnson@coreweave.com>
robbat2 added a commit to coreweave/kube-state-metrics that referenced this issue Jun 29, 2024
PR#1851 introduced accidental breakage for custom metrics on resources
with an empty empty, e.g. Node.

Fix it, and add tests to catch the next breakage.

```
demo_timestamp{customresource_group="",customresource_kind="Node",customresource_version="v1",node="kind-control-plane"} 1.719618433e+09
```

Reference: kubernetes#1851
Reference: kubernetes#2434
Signed-off-by: Robin H. Johnson <rjohnson@coreweave.com>
@mblaschke
Copy link

can confirm this issue, it's silently dropping the metrics.

empty group should be allowed and actually the exporter should panic if something is wrong and not silently drop metrics

@mblaschke
Copy link

same with v2.12.0

@dgrisonnet
Copy link
Member

/assign
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 8, 2024
@dgrisonnet
Copy link
Member

Based on the slack thread and the discussions we've had on the PR, I think we've come to the conclusion that we should reject this particular feature request (the fact that it was originally possible was a bug) as it is out of scope for kube-state-metrics.

To quote some of the reasons I gave in the slack thread:

What is the mechanism for defining custom metrics for core resources? Could it be possible to duplicate the GVK in a completely separate process mimicking the CRS for core metrics?

There is no such mechanism. The fact that this was possible in the first place was an oversight from our side.

Going forward, we don't want to support defining custom metrics for core resources. If we follow kube-state-metrics scope and goal to create a 1:1 mapping between Kubernetes resources and Prometheus metrics, there should never be any use case where this mechanism make sense since users can't modify core resources schema.
Let me explain why your use case doesn't fit kube-state-metrics' scope. What you are trying to achieve is to turn an annotation value into a metric. However an annotation value is not part of an object's API, it is just one of multiple values that its API field annotations can take. It is specifically because it is not part of the object's API that it is out of scope for kube-state-metrics to convert annotation values into metrics (note that this is also the case for any values).
The fact that this is possible today with custom metrics is also an oversight and will need to be patched because both native and custom metrics should follow the same principles.

@dgrisonnet
Copy link
Member

empty group should be allowed and actually the exporter should panic if something is wrong and not silently drop metrics

@mblaschke although I would normally agree with you for panicking in case of misconfiguration, here we are talking about a dynamic configuration that can be updated at runtime. If a user updating ksm configuration at runtime would cause it to crash that would be quite dreadful. That said I can definitely see a pain point here and maybe we should add a new metrics to report any kind of misconfiguration so that it would be more visible to cluster admin when kube-state-metrics is misconfigured.

@rexagod do you have any thoughts on the above?

@rexagod
Copy link
Member

rexagod commented Aug 13, 2024

Seconding Damien, I'm not for panicking the binary since we expect a reload on configuration change, and then a log (or metric) indicating an issue with the given configuration. Crashing the binary goes against the controller-esque behavior and allows anyone (with access) to crash a running instance simply by supplying a bad config, breaking existing expectations.

That said I can definitely see a pain point here and maybe we should add a new metrics to report any kind of misconfiguration so that it would be more visible to cluster admin when kube-state-metrics is misconfigured.

+1 all the way. I'll add one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants