Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CustomResourceStateMetrics Gauges do not work for Kubernetes Controller Status Conditions #1962

Closed
jan-kantert opened this issue Jan 25, 2023 · 9 comments · Fixed by #1963
Closed
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@jan-kantert
Copy link

jan-kantert commented Jan 25, 2023

What happened:
We are trying to export a metric based on the standard status conditions CRD for objects of our kubernetes controllers (i.e. cert-manager or custom controllers based on kubebuilder).

We defined --custom-resource-state-config in kube-state-metics 2.7.0:

      - --custom-resource-state-config
      # in YAML files, | allows a multi-line string to be passed as a flag value
      # see https://yaml-multiline.info
      -  |
          spec:
            resources:
            - groupVersionKind:
                group: xxxxx.xxx
                kind: "xxx"
                version: "v1"
              metrics:
              - name: "xxx_ready"
                help: "xxx ready state"
                each:
                  type: Gauge
                  gauge:
                    path: [status, conditions, "[type=Ready]", status]

Unfortunately, by default kubernetes controllers use a string condition:

status:
  conditions:
    - lastTransitionTime: "2019-10-22T16:29:31Z"
      status: "True"
      type: Ready

And kube-state-metrics yields a parse error:

registry_factory.go:589] \"kube_crd_xxxx_ready\" err=\"[status,conditions,[type=Ready],status]: []: strconv.ParseFloat: parsing \\\"True\\\": invalid syntax\""}

What you expected to happen:

I would expect it to be able to parse "True"/"False" or a way to map them to 1/0.

How to reproduce it (as minimally and precisely as possible):

  1. Install a controller which exposes status conditions in its CRD. For instance cert-manager would work.
  2. Add the config line from above
  3. Run kube-state-metrics

Anything else we need to know?:

Environment:

  • kube-state-metrics version: 2.7.0
  • Kubernetes version (use kubectl version): v1.22.17
  • Cloud provider or hardware configuration: AWS via Kops
  • Other info:
@jan-kantert jan-kantert added the kind/bug Categorizes issue or PR as related to a bug. label Jan 25, 2023
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jan 25, 2023
@jan-kantert
Copy link
Author

jan-kantert commented Jan 25, 2023

I guess we need a case for "True" and "False" (probably also lower case) here:

Alternatively, there could be a way to provide a mapping from value (i.e. string) to float.

@logicalhan
Copy link
Member

/assign @rexagod
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 26, 2023
@chrischdi
Copy link
Member

In CAPI we do export conditions as StateSet this way (source):

    - name: status_condition
      help: The condition of a cluster.
      each:
        stateSet:
          labelName: status
          labelsFromPath:
            type:
            - type
          list:
          - 'True'
          - 'False'
          - Unknown
          path:
          - status
          - conditions
          valueFrom:
          - status
        type: StateSet

@jabdoa2
Copy link
Contributor

jabdoa2 commented Jan 27, 2023

In CAPI we do export conditions as StateSet this way (source):

    - name: status_condition
      help: The condition of a cluster.
      each:
        stateSet:
          labelName: status
          labelsFromPath:
            type:
            - type
          list:
          - 'True'
          - 'False'
          - Unknown
          path:
          - status
          - conditions
          valueFrom:
          - status
        type: StateSet

Looks like a valid workaround. Downside is probably twice (or 3x) the number of metrics and your queries become more complicated.

I guess another more generic way would be a valueMap for gauges. Could look like this:

  spec:
    resources:
    - groupVersionKind:
        group: xxxxx.xxx
        kind: "xxx"
        version: "v1"
      metrics:
      - name: "xxx_ready"
        help: "xxx status"
        each:
          type: Gauge
          gauge:
            path: [status, conditions]
            labelsFromPath:
              type: ["type"]
            valueFrom: ["status"]
            valueMap:
              'True': 1.0
              'False': 0.0

Not sure how many use cases exist for that kind of mapping though.

@chrischdi
Copy link
Member

From prior art: kube-state-metrics does it the same way for core resources and _status_condition metrics as the above stateSet does.

Examples:

  • kube_namespace_status_condition
  • kube_deployment_status_condition
  • Several others ending in _status_condition

From code to metric perspective: the field is a string and not typed to a boolean which is why I'm curious if the custom resource metric should get adjusted to support such cases like hardcoding the strings True and False to 1/0.

@jabdoa2
Copy link
Contributor

jabdoa2 commented Jan 27, 2023

I understand your point from a CS perspective. However, in practice it (a) makes working with those metrics a lot harder, (b) increases cardinality, and (c) I have rarely seen status Unknown at all.

a) Having three metrics for 1 resource where only one "exists" at a time causes issues in prometheus as they overlap due to stale timeout. We often see incorrect counts and flaky alerts because of this. One status value per resource is robust to query and easy to understand. According to the discussion in the KEP about status.conditions the format has been choosen this way to ensure compatibility with most libraries and not because it is the best way to represent the data.
b) This reduces resource usage by factor 2 to 3 in kube-state-metrics and prometheus. For use memory usage in Prometheus is an huge issue.
c) In my experience this is hardly used. Theoretically, we could choose to map Unknown to nil and then let the users decide using NilIsZero if it should be mapped to 0 or yield an error.

This is not the only implicit transformation. I documented all transformations in #1964.

@chrischdi
Copy link
Member

These are all indeed fair points and I also see the advantages of implementing parsing true/false to 1/0 👍 .

I just tend to first take a look at what's already there and what the options we have, before rushing into an actual implementation.

The only con I currently see is, that people may come over from "core condition metrics" and want to adapt their queries and may not be aware of the difference in the metric. But I also think that this should not be a blocker.

Thanks for the deep dive and information from the KEP! 😃

Just one point regarding c: from UX perspective this would make it hard to guess what the config really is doing. The mapping variant seems to be the most readable to me which does not require taking a further look at the docs or actual implementation 👍

@jabdoa2
Copy link
Contributor

jabdoa2 commented Jan 27, 2023

Just one point regarding c: from UX perspective this would make it hard to guess what the config really is doing. The mapping variant seems to be the most readable to me which does not require taking a further look at the docs or actual implementation +1

Fair point. Should we add that as that as well or instead? I guess it would not be hard to implement. Changing config would be ok as we would only add stuff.

@EronWright
Copy link

EronWright commented May 15, 2023

I observe that Unknown is still not a supported value, as of v2.8.2.
Also, Google Managed Prometheus doesn't support info or stateset as metric types, so it is important to have a solution based on gauge.

Here's the solution that I use:

        - name: "status_condition"
          help: The condition of a foo.
          each:
            type: Gauge
            gauge:
              path: [ status, conditions ]
              labelsFromPath:
                condition: [ "type" ]
              valueFrom: [ "status" ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants