Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify custom resource metrics API by leveraging jq/CEL #1978

Open
CatherineF-dev opened this issue Feb 7, 2023 · 14 comments
Open

Simplify custom resource metrics API by leveraging jq/CEL #1978

CatherineF-dev opened this issue Feb 7, 2023 · 14 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@CatherineF-dev
Copy link
Contributor

CatherineF-dev commented Feb 7, 2023

What would you like to be added:

Doc: #2059

A simplified API for CustomResourceStateMetrics, which only supports values and labels, instead of supporting each, path, labelFromKey, labelsFromPath, valueFrom, commonLabels, labelsFromPath and *.

# new 
kind: CustomResourceStateMetricsV2
spec:
  resources:
    - groupVersionKind:
        group: myteam.io
        kind: "Foo"
        version: "v1"
      metrics:
        - name: "ready_count"
          help: "Number Foo Bars ready"
          values: jq '[.status.sub[].ready]' # valueFrom: [ready] // [2,4]
          labels:
          - jq '[ .status.sub | keys | .[] | {name: .}]' # labelFromKey: type // [{"name": "type-a"}, {"name": "type-b"}]
          - jq '[{ custom_metric:"yes" }]' # custom_metric: "yes" // [{custom_metric="yes"}]
          - jq '[.metadata.labels]' # "*": [metadata, labels] // [{"bar": "baz","qux": "quxx"}]
          - jq '[.metadata.annotations]' # "**": [metadata, annotations] // [{"foo": "bar"}]
          - jq '[{ name: .metadata.name }]' # name: [metadata, name] // [{"name": "foo"}]
          - jq '[{ foo: .metadata.labels.foo }]' # foo: [metadata, labels, foo] // [{foo": "bar"}]
          - jq '[.status.sub[].active | {active: .}]' # labelsFromPath:  active: [active] // [{active": 1}, {"active": 3}]
          
 # old
      metrics:
        - name: "status_phase"
          help: "Foo status_phase"
          each:
            type: StateSet
            stateSet:
              labelName: phase
              path: [status, phase]
              list: [Pending, Bar, Baz]

In the above case, it will have these 2 metrics. The cardinality of each label (|label|) must equal to 1 or max(|values|, |label_i|). If |label| = 1, then this label is copied into N metric streams.

kube_customresource_ready_count{customresource_group="myteam.io", customresource_kind="Foo", customresource_version="v1", active="1",custom_metric="yes",foo="bar",name="foo",bar="baz",qux="quxx",type="type-a"} 2
kube_customresource_ready_count{customresource_group="myteam.io", customresource_kind="Foo", customresource_version="v1", active="3",custom_metric="yes",foo="bar",name="foo",bar="baz",qux="quxx",type="type-b"} 4

Why is this needed:

  1. simpler and easier for KSM community to maintain. Have seen several issues around corner cases with custom resource metrics (Crash on nonexistent metric paths in custom resources #1992).
  2. easier for users to use and debug custom resource metrics

Describe the solution you'd like

Additional context
Han recommended cel

  1. https://kubernetes.io/docs/reference/using-api/cel/
  2. https://github.com/google/cel-go
@CatherineF-dev CatherineF-dev added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 7, 2023
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Feb 7, 2023
@chrischdi
Copy link
Member

I like the idea of cleaning up the configuration. When doing so we should take care that we are still able to address all use-cases which got addressed currently.

IMHO: if we introduce a new version for the configuration, we should do it in a way to have auto-conversion from the old configuration by using a custom config type as in https://book.kubebuilder.io/component-config-tutorial/config-type.html .

Also related issue: #1948 .

@logicalhan
Copy link
Member

/triage accepted
/assign @CatherineF-dev

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 9, 2023
@dgrisonnet
Copy link
Member

@CatherineF-dev could you perhaps start a design doc highlighting the different options we have to improve the UX of the existing API?

@CatherineF-dev
Copy link
Contributor Author

Okay!

@CatherineF-dev
Copy link
Contributor Author

@CatherineF-dev
Copy link
Contributor Author

CatherineF-dev commented Apr 13, 2023

Existing problems for KSM custom resource

1. Custom resource API is complicated and not flexible

Now, it supports 7 operations:
each, path, labelFromKey, labelsFromPath, valueFrom, commonLabels and *.

It’s not easy to use and brings some corner case issues.

  • I am using this custom resource document , but I couldn't find a clear yes or no there. I want to capture replsets. <all-the-elements>. size. Is this possible?
  • "LabelFromKey" not available #1868
  • Crash on nonexistent metric paths in custom resources #1992
  • Can’t aggregate metrics for multiple CRs. For example, It can’t answer “How many CRs under one CRD”?

Existing proposals:
- Pr 2014 proposes a metric generation tool to create configurations from monitored CRD.
- Issue 1978 proposes to simplify the API from 7 operations to 2 operations using jq (JsonPath). It can support some aggregations to answer question “How many CRs under one CRD”?

2. Coupled monitoring pipeline and monitoring target

Need to modify kube-state-metrics agents if you want to monitor one custom resource.

--custom-resource-state-config "inline yaml (see example)" 
--custom-resource-state-config-file /path/to/config.yaml

Existing proposals:
Issue 1948 proposes to support CustomResourceDefinition CRD.

Proposal

Supports
Issue 1948 proposes to support CustomResourceDefinition CRD.
Issue 1978 proposes to simply API from 7 operations to 2 operations.

@CatherineF-dev
Copy link
Contributor Author

cc @dgrisonnet,

What else do I need to add into #1978 (comment)? Thx!

@dgrisonnet
Copy link
Member

Feel free to open a PR adding your design doc under docs/design and we can review it from there

@nathanperkins
Copy link

nathanperkins commented Apr 19, 2023

Would this be capable of parsing annotation values as json? In some cases, we have controllers that save state on an object as json annotations. It would be nice to expose fields within those json objects.

@chrischdi
Copy link
Member

One point which came to my mind we should consider if this gets done: performance!

@logicalhan
Copy link
Member

I feel like we should use CEL since that's the direction that Kubernetes is moving.

@CatherineF-dev
Copy link
Contributor Author

CatherineF-dev commented May 8, 2023

Reply @nathanperkins, I think CEL can parse annotation values as json. So it's feasible.

Design doc is here: Simplify custom resource state metrics API using CEL(#2059)

Also, it can support counting the number of CRs under one CRD. Or anything else which can be queried using CEL.

@CatherineF-dev CatherineF-dev changed the title Simplify custom resource metrics API by leveraging jq or yq Simplify custom resource metrics API by leveraging CEL May 8, 2023
@CatherineF-dev CatherineF-dev changed the title Simplify custom resource metrics API by leveraging CEL Simplify custom resource metrics API by leveraging jq/CEL May 8, 2023
@CatherineF-dev
Copy link
Contributor Author

CatherineF-dev commented Jan 16, 2024

@k8s-triage-robot
Copy link

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jan 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants