CustomResourceStateMetrics Gauges do not work for Kubernetes Controller Status Conditions #1962

jan-kantert · 2023-01-25T17:18:22Z

What happened:
We are trying to export a metric based on the standard status conditions CRD for objects of our kubernetes controllers (i.e. cert-manager or custom controllers based on kubebuilder).

We defined --custom-resource-state-config in kube-state-metics 2.7.0:

      - --custom-resource-state-config
      # in YAML files, | allows a multi-line string to be passed as a flag value
      # see https://yaml-multiline.info
      -  |
          spec:
            resources:
            - groupVersionKind:
                group: xxxxx.xxx
                kind: "xxx"
                version: "v1"
              metrics:
              - name: "xxx_ready"
                help: "xxx ready state"
                each:
                  type: Gauge
                  gauge:
                    path: [status, conditions, "[type=Ready]", status]

Unfortunately, by default kubernetes controllers use a string condition:

status:
  conditions:
    - lastTransitionTime: "2019-10-22T16:29:31Z"
      status: "True"
      type: Ready

And kube-state-metrics yields a parse error:

registry_factory.go:589] \"kube_crd_xxxx_ready\" err=\"[status,conditions,[type=Ready],status]: []: strconv.ParseFloat: parsing \\\"True\\\": invalid syntax\""}

What you expected to happen:

I would expect it to be able to parse "True"/"False" or a way to map them to 1/0.

How to reproduce it (as minimally and precisely as possible):

Install a controller which exposes status conditions in its CRD. For instance cert-manager would work.
Add the config line from above
Run kube-state-metrics

Anything else we need to know?:

Environment:

kube-state-metrics version: 2.7.0
Kubernetes version (use kubectl version): v1.22.17
Cloud provider or hardware configuration: AWS via Kops
Other info:

The text was updated successfully, but these errors were encountered:

jan-kantert · 2023-01-25T17:32:25Z

I guess we need a case for "True" and "False" (probably also lower case) here:

kube-state-metrics/pkg/customresourcestate/registry_factory.go

Line 630 in f01ce58

case string:

Alternatively, there could be a way to provide a mapping from value (i.e. string) to float.

logicalhan · 2023-01-26T17:52:59Z

/assign @rexagod
/triage accepted

chrischdi · 2023-01-27T13:55:34Z

In CAPI we do export conditions as StateSet this way (source):

    - name: status_condition
      help: The condition of a cluster.
      each:
        stateSet:
          labelName: status
          labelsFromPath:
            type:
            - type
          list:
          - 'True'
          - 'False'
          - Unknown
          path:
          - status
          - conditions
          valueFrom:
          - status
        type: StateSet

jabdoa2 · 2023-01-27T14:35:14Z

In CAPI we do export conditions as StateSet this way (source):

    - name: status_condition
      help: The condition of a cluster.
      each:
        stateSet:
          labelName: status
          labelsFromPath:
            type:
            - type
          list:
          - 'True'
          - 'False'
          - Unknown
          path:
          - status
          - conditions
          valueFrom:
          - status
        type: StateSet

Looks like a valid workaround. Downside is probably twice (or 3x) the number of metrics and your queries become more complicated.

I guess another more generic way would be a valueMap for gauges. Could look like this:

  spec:
    resources:
    - groupVersionKind:
        group: xxxxx.xxx
        kind: "xxx"
        version: "v1"
      metrics:
      - name: "xxx_ready"
        help: "xxx status"
        each:
          type: Gauge
          gauge:
            path: [status, conditions]
            labelsFromPath:
              type: ["type"]
            valueFrom: ["status"]
            valueMap:
              'True': 1.0
              'False': 0.0

Not sure how many use cases exist for that kind of mapping though.

chrischdi · 2023-01-27T16:22:37Z

From prior art: kube-state-metrics does it the same way for core resources and _status_condition metrics as the above stateSet does.

Examples:

kube_namespace_status_condition
kube_deployment_status_condition
Several others ending in _status_condition

From code to metric perspective: the field is a string and not typed to a boolean which is why I'm curious if the custom resource metric should get adjusted to support such cases like hardcoding the strings True and False to 1/0.

jabdoa2 · 2023-01-27T16:40:43Z

I understand your point from a CS perspective. However, in practice it (a) makes working with those metrics a lot harder, (b) increases cardinality, and (c) I have rarely seen status Unknown at all.

a) Having three metrics for 1 resource where only one "exists" at a time causes issues in prometheus as they overlap due to stale timeout. We often see incorrect counts and flaky alerts because of this. One status value per resource is robust to query and easy to understand. According to the discussion in the KEP about status.conditions the format has been choosen this way to ensure compatibility with most libraries and not because it is the best way to represent the data.
b) This reduces resource usage by factor 2 to 3 in kube-state-metrics and prometheus. For use memory usage in Prometheus is an huge issue.
c) In my experience this is hardly used. Theoretically, we could choose to map Unknown to nil and then let the users decide using NilIsZero if it should be mapped to 0 or yield an error.

This is not the only implicit transformation. I documented all transformations in #1964.

chrischdi · 2023-01-27T18:33:50Z

These are all indeed fair points and I also see the advantages of implementing parsing true/false to 1/0 👍 .

I just tend to first take a look at what's already there and what the options we have, before rushing into an actual implementation.

The only con I currently see is, that people may come over from "core condition metrics" and want to adapt their queries and may not be aware of the difference in the metric. But I also think that this should not be a blocker.

Thanks for the deep dive and information from the KEP! 😃

Just one point regarding c: from UX perspective this would make it hard to guess what the config really is doing. The mapping variant seems to be the most readable to me which does not require taking a further look at the docs or actual implementation 👍

jabdoa2 · 2023-01-27T19:03:45Z

Just one point regarding c: from UX perspective this would make it hard to guess what the config really is doing. The mapping variant seems to be the most readable to me which does not require taking a further look at the docs or actual implementation +1

Fair point. Should we add that as that as well or instead? I guess it would not be hard to implement. Changing config would be ok as we would only add stuff.

EronWright · 2023-05-15T23:31:39Z

I observe that Unknown is still not a supported value, as of v2.8.2.
Also, Google Managed Prometheus doesn't support info or stateset as metric types, so it is important to have a solution based on gauge.

Here's the solution that I use:

        - name: "status_condition"
          help: The condition of a foo.
          each:
            type: Gauge
            gauge:
              path: [ status, conditions ]
              labelsFromPath:
                condition: [ "type" ]
              valueFrom: [ "status" ]

jan-kantert added the kind/bug Categorizes issue or PR as related to a bug. label Jan 25, 2023

k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jan 25, 2023

jabdoa2 mentioned this issue Jan 26, 2023

support "True" and "False" as string in custom-resource-state #1963

Merged

k8s-ci-robot assigned rexagod Jan 26, 2023

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 26, 2023

k8s-ci-robot closed this as completed in #1963 Feb 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CustomResourceStateMetrics Gauges do not work for Kubernetes Controller Status Conditions #1962

CustomResourceStateMetrics Gauges do not work for Kubernetes Controller Status Conditions #1962

jan-kantert commented Jan 25, 2023 •

edited

Loading

jan-kantert commented Jan 25, 2023 •

edited

Loading

logicalhan commented Jan 26, 2023

chrischdi commented Jan 27, 2023

jabdoa2 commented Jan 27, 2023

chrischdi commented Jan 27, 2023

jabdoa2 commented Jan 27, 2023

chrischdi commented Jan 27, 2023

jabdoa2 commented Jan 27, 2023

EronWright commented May 15, 2023 •

edited

Loading

CustomResourceStateMetrics Gauges do not work for Kubernetes Controller Status Conditions #1962

CustomResourceStateMetrics Gauges do not work for Kubernetes Controller Status Conditions #1962

Comments

jan-kantert commented Jan 25, 2023 • edited Loading

jan-kantert commented Jan 25, 2023 • edited Loading

logicalhan commented Jan 26, 2023

chrischdi commented Jan 27, 2023

jabdoa2 commented Jan 27, 2023

chrischdi commented Jan 27, 2023

jabdoa2 commented Jan 27, 2023

chrischdi commented Jan 27, 2023

jabdoa2 commented Jan 27, 2023

EronWright commented May 15, 2023 • edited Loading

jan-kantert commented Jan 25, 2023 •

edited

Loading

jan-kantert commented Jan 25, 2023 •

edited

Loading

EronWright commented May 15, 2023 •

edited

Loading