Duplicated custom resource metrics when exposed for builtin type #2044

Open
grzesuav opened this issue Apr 12, 2023 · 10 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@grzesuav
Contributor

What happened:

I wanted to expose additional information for storageclass, so I expanded the config:

    kind: CustomResourceStateMetrics
    spec:
      resources:
        - groupVersionKind:
            group: storage.k8s.io
            kind: StorageClass
            version: v1
          metricNamePrefix: kube_storageclass
          metrics:
            - name: "parameters"
              help: "StorageClass parameters"
              each:
                type: Info
                info:
                  labelsFromPath:
                    skuName: [parameters, skuName]
                    storageclass: [metadata, name]

I noticed that the metrics are exposed twice, and the duplicated metric families make it impossible for Prometheus to scrape the endpoint:

# HELP kube_storageclass_parameters StorageClass parameters
# TYPE kube_storageclass_parameters info
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",storageclass="standard"} 1
# HELP kube_storageclass_parameters StorageClass parameters
# TYPE kube_storageclass_parameters info
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",storageclass="standard"} 1
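
A quick way to confirm the duplication from a saved copy of the `/metrics` output is to count how often each `# TYPE` header appears. The following is only a minimal sketch: `duplicatedFamilies` is a hypothetical helper written for this illustration, not something kube-state-metrics or Prometheus provides, and it assumes the text uses the `# TYPE <name> <type>` header shape shown above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// duplicatedFamilies is a hypothetical helper (not part of kube-state-metrics).
// It returns the names of metric families whose "# TYPE <name> <type>" header
// appears more than once, in the order the repeats are first seen.
func duplicatedFamilies(exposition string) []string {
	seen := map[string]int{}
	var dups []string
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skip everything that is not a TYPE header line.
		if len(fields) < 3 || fields[0] != "#" || fields[1] != "TYPE" {
			continue
		}
		name := fields[2]
		seen[name]++
		// Report a family once, at the moment its second header shows up.
		if seen[name] == 2 {
			dups = append(dups, name)
		}
	}
	return dups
}

func main() {
	// Shortened copy of the output above (labels trimmed for readability).
	out := `# HELP kube_storageclass_parameters StorageClass parameters
# TYPE kube_storageclass_parameters info
kube_storageclass_parameters{storageclass="standard"} 1
# HELP kube_storageclass_parameters StorageClass parameters
# TYPE kube_storageclass_parameters info
kube_storageclass_parameters{storageclass="standard"} 1
`
	fmt.Println(duplicatedFamilies(out)) // prints [kube_storageclass_parameters]
}
```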

What you expected to happen:

Metrics are exposed once

How to reproduce it (as minimally and precisely as possible):

Use the config provided above with kube-state-metrics on a kind cluster

Anything else we need to know?:
Full logs

I0412 11:54:47.701990       1 wrapper.go:98] "Starting kube-state-metrics"
I0412 11:54:47.702368       1 builder.go:192] "The internal resource store already exists and is overridden by a custom resource store with the same name, please make sure it meets your expectation" registryName="storageclasses"
I0412 11:54:47.702518       1 server.go:186] "Used default resources"
I0412 11:54:47.702577       1 types.go:184] "Using all namespaces"
I0412 11:54:47.702660       1 server.go:219] "Metric allow-denylisting" allowDenyStatus="Including the following lists that were on allowlist: kube_deployment_labels, kube_pod_labels, kube_storageclass_labels, kube_storageclass_parameters"
W0412 11:54:47.702744       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0412 11:54:47.703382       1 server.go:364] "Tested communication with server"
I0412 11:54:47.710477       1 server.go:369] "Run with Kubernetes cluster version" major="1" minor="26" gitVersion="v1.26.3" gitTreeState="clean" gitCommit="9e644106593f3f4aa98f8a84b23db5fa378900bd" platform="linux/amd64"
I0412 11:54:47.710870       1 server.go:370] "Communication with server successful"
I0412 11:54:47.711399       1 server.go:316] "Started metrics server" metricsServerAddress="[::]:8080"
I0412 11:54:47.711684       1 server.go:74] levelinfomsgListening onaddress[::]:8080
I0412 11:54:47.711779       1 server.go:74] levelinfomsgTLS is disabled.http2falseaddress[::]:8080
I0412 11:54:47.711911       1 metrics_handler.go:99] "Autosharding disabled"
I0412 11:54:47.713339       1 server.go:305] "Started kube-state-metrics self metrics server" telemetryAddress="[::]:8081"
I0412 11:54:47.713536       1 server.go:74] levelinfomsgListening onaddress[::]:8081
I0412 11:54:47.713564       1 server.go:74] levelinfomsgTLS is disabled.http2falseaddress[::]:8081
I0412 11:54:47.714317       1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=[kube_storageclass_parameters]
I0412 11:54:47.714778       1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=[kube_storageclass_parameters]

Environment:

  • kube-state-metrics version: 2.8.0
  • Kubernetes version (use kubectl version): 1.25 and 1.26
  • Cloud provider or hardware configuration: AKS and local - kind
  • Other info:
@grzesuav grzesuav added the kind/bug Categorizes issue or PR as related to a bug. label Apr 12, 2023
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 12, 2023
@chrischdi
Member

Can you please also add the Kubernetes object you created in your cluster (the storageclass)?

@CatherineF-dev
Contributor

Yes, could you list all CRs for this CRD (storage.k8s.io)?

Guess there are two CRs.

@grzesuav
Contributor Author

@chrischdi @CatherineF-dev

❯ k get storageclass
NAME                                     PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
azurefile                                file.csi.azure.com         Delete          Immediate              true                   426d
azurefile-csi                            file.csi.azure.com         Delete          Immediate              true                   426d
azurefile-csi-premium                    file.csi.azure.com         Delete          Immediate              true                   426d
azurefile-premium                        file.csi.azure.com         Delete          Immediate              true                   426d
blob-storage-cockroach-azure-centralus   disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   220d
default (default)                        disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   426d
kafka-test-ssd-centralus                 kubernetes.io/azure-disk   Delete          WaitForFirstConsumer   true                   17d
managed                                  disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   426d
managed-csi                              disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   426d
managed-csi-premium                      disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   426d
managed-kafka-example-ssd-centralus      kubernetes.io/azure-disk   Delete          WaitForFirstConsumer   true                   20d
managed-premium                          disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   426d
mongodb                                  kubernetes.io/azure-disk   Delete          WaitForFirstConsumer   true                   271d
postgresql-hdd                           disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   39d
postgresql-premium-ssd                   disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   39d
postgresql-ssd                           disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   46d
prometheus-premium                       kubernetes.io/azure-disk   Delete          Immediate              true                   39d
prometheus-ssd                           kubernetes.io/azure-disk   Delete          Immediate              true                   78d
test-cockroach-azure-centralus           disk.csi.azure.com         Delete          Immediate              true                   31d
test-volume-populator                    disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   70d
test-volume-populator-no-wait            disk.csi.azure.com         Delete          Immediate              false                  69d

and the endpoint result is

# HELP kube_storageclass_parameters StorageClass parameters
# TYPE kube_storageclass_parameters info
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Standard_LRS",storageclass="azurefile"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Standard_LRS",storageclass="azurefile-csi"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="azurefile-csi-premium"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="azurefile-premium"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="blob-storage-cockroach-azure-centralus"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="prometheus-premium"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="test-cockroach-azure-centralus"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="StandardSSD_LRS",storageclass="test-volume-populator"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="StandardSSD_LRS",storageclass="test-volume-populator-no-wait"} 1
# HELP kube_storageclass_parameters StorageClass parameters
# TYPE kube_storageclass_parameters info
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Standard_LRS",storageclass="azurefile"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Standard_LRS",storageclass="azurefile-csi"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="azurefile-csi-premium"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="azurefile-premium"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="blob-storage-cockroach-azure-centralus"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="prometheus-premium"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="test-cockroach-azure-centralus"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="StandardSSD_LRS",storageclass="test-volume-populator"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="StandardSSD_LRS",storageclass="test-volume-populator-no-wait"} 1

Not all resources are exposed: the ones that do not have the property I want to expose are skipped, IIUC.

@rexagod
Member

rexagod commented May 10, 2023

Off the top of my head, I'd say this happens because we call buildCustomStore for anything defined in the CRS config. If the GVK is something that KSM supports natively, this results in two different stores, with only the most recently registered metrics being served. With #1851, we will move to supporting this feature only for CRs. For anything you think is worth adding to KSM that folks can benefit from on a larger scale, the recommendation is to send a PR, similar to the ones that were merged previously, so this shouldn't really be a problem if the addition makes sense.

Additionally, I'd like to mention that defining native resources in the CRS config is not how things were meant to work. While technically possible pre-#1851, it was never officially supported (it would always conflict with the native store for the builtin type), even if users somehow got the desired metrics up.
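
To make the "two stores, only the latest one served" idea concrete, here is a deliberately simplified model. It is an assumption-laden sketch, not KSM's actual builder code: `store`, `registry`, and `register` are hypothetical names, and the only behavior modeled is that a second registration under the same name replaces the first, as the "already exists and is overridden" log line above suggests.

```go
package main

import "fmt"

// store is a hypothetical stand-in for a metrics store; only its origin matters here.
type store struct{ origin string }

// registry is a hypothetical model that maps a resource name to the
// single store serving it. It does not mirror KSM's real data structures.
type registry map[string]store

// register installs s under name and reports whether an existing
// store with the same name was replaced.
func (r registry) register(name string, s store) (overridden bool) {
	_, overridden = r[name]
	r[name] = s
	return overridden
}

func main() {
	r := registry{}
	// The native store is registered first...
	r.register("storageclasses", store{origin: "native"})
	// ...and a CRS entry for the same name then replaces it.
	if r.register("storageclasses", store{origin: "custom"}) {
		fmt.Println("native store overridden by custom store")
	}
	fmt.Println(r["storageclasses"].origin) // prints custom
}
```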

EDIT.

A bit more context.

2nd EDIT.

  • I mistook the issue statement for something slightly different, I've cut that part out.

@murphd40
Contributor

murphd40 commented May 14, 2023

I have also encountered this issue. It was not present in 2.7.0.

For me, it occurs when I specify a --custom-resource-state-config and include it in --resources

$ go run main.go --port=8080 --telemetry-port=8081 --kubeconfig=$KUBECONFIG \
--custom-resource-state-config='{"spec":{"resources":[{"groupVersionKind":{"group":"operators.coreos.com","version":"v1alpha1","kind":"ClusterServiceVersion"},"metrics":[{"name":"csv_info","help":"Cluster Service Version install status","each":{"type":"Info","info":{"labelsFromPath":{"name":["metadata","name"],"status":["status","phase"]}}}}]}]}}' \
--resources clusterserviceversions
I0514 01:30:02.929882    7804 wrapper.go:98] "Starting kube-state-metrics"
I0514 01:30:02.930730    7804 server.go:201] "Used resources" resources=[clusterserviceversions clusterserviceversions]
I0514 01:30:02.930792    7804 types.go:184] "Using all namespaces"
I0514 01:30:02.930801    7804 server.go:228] "Metric allow-denylisting" allowDenyStatus="Excluding the following lists that were on denylist: "
I0514 01:30:02.932450    7804 server.go:367] "Tested communication with server"
I0514 01:30:03.546785    7804 server.go:372] "Run with Kubernetes cluster version" major="1" minor="26" gitVersion="v1.26.3+k3s1" gitTreeState="clean" gitCommit="01ea3ff27be0b04f945179171cec5a8e11a14f7b" platform="linux/amd64"
I0514 01:30:03.546909    7804 server.go:373] "Communication with server successful"
I0514 01:30:03.551387    7804 server.go:324] "Started metrics server" metricsServerAddress="[::]:8080"
I0514 01:30:03.551408    7804 server.go:313] "Started kube-state-metrics self metrics server" telemetryAddress="[::]:8081"
I0514 01:30:03.551408    7804 metrics_handler.go:99] "Autosharding disabled"
I0514 01:30:03.551692    7804 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=[kube_customresource_csv_info]
I0514 01:30:03.552009    7804 server.go:73] levelinfomsgListening onaddress[::]:8081
I0514 01:30:03.552033    7804 server.go:73] levelinfomsgTLS is disabled.http2falseaddress[::]:8081
I0514 01:30:03.552036    7804 server.go:73] levelinfomsgListening onaddress[::]:8080
I0514 01:30:03.552063    7804 server.go:73] levelinfomsgTLS is disabled.http2falseaddress[::]:8080
I0514 01:30:03.552103    7804 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=[kube_customresource_csv_info]
I0514 01:30:03.552186    7804 builder.go:246] "Active resources" activeStoreNames="clusterserviceversions,clusterserviceversions"

I believe it was introduced by #1928

resources := make([]string, len(factories))
for i, factory := range factories {
    resources[i] = factory.Name()
}

default:
    resources = append(resources, opts.Resources.AsSlice()...)
    klog.InfoS("Used resources", "resources", resources)
}

Note that custom resources are added twice as a result: once from factories and once from opts.Resources.
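
The effect can be reproduced in isolation. This is a minimal sketch that only mirrors the shape of the two quoted excerpts; `namedFactory` and `enabledResources` are hypothetical stand-ins, not KSM types or functions.

```go
package main

import "fmt"

// namedFactory is a hypothetical stand-in for a custom resource factory;
// all that matters here is that it exposes a Name.
type namedFactory string

func (f namedFactory) Name() string { return string(f) }

// enabledResources is a hypothetical function mirroring the quoted logic:
// it seeds the list with every factory name, then appends the names
// requested via --resources without checking for overlap.
func enabledResources(factories []namedFactory, requested []string) []string {
	resources := make([]string, len(factories))
	for i, f := range factories {
		resources[i] = f.Name()
	}
	return append(resources, requested...)
}

func main() {
	got := enabledResources(
		[]namedFactory{"clusterserviceversions"},
		[]string{"clusterserviceversions"},
	)
	fmt.Println(got) // prints [clusterserviceversions clusterserviceversions]
}
```

The printed list matches the `resources=[clusterserviceversions clusterserviceversions]` value in the "Used resources" log line above.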


I think the fix should be either:

1. Revert back to the v2.7.0 implementation:

default:
    klog.InfoS("Used resources", "resources", opts.Resources.String())
    resources = opts.Resources.AsSlice()
}

This means that when using both --custom-resource-state-config and --resources, the custom resource names must appear in the resources list in order to be enabled.

Personally I think this is the best option. When using --resources, I think only the values in the supplied resources list should be included, regardless of any custom resource configs

OR

2. Add logic to remove duplicates from the resources list

I can open a PR to fix this if that helps
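
For reference, option 2 could look roughly like the sketch below. `dedupeResources` is a hypothetical helper name, not an existing KSM function, and where it would be called in the server setup is left open.

```go
package main

import "fmt"

// dedupeResources is a hypothetical helper (not part of kube-state-metrics).
// It returns names with repeated entries removed, keeping the first
// occurrence of each name and preserving the original order.
func dedupeResources(names []string) []string {
	seen := make(map[string]struct{}, len(names))
	out := make([]string, 0, len(names))
	for _, n := range names {
		if _, ok := seen[n]; ok {
			continue // already enabled once, skip the repeat
		}
		seen[n] = struct{}{}
		out = append(out, n)
	}
	return out
}

func main() {
	in := []string{"clusterserviceversions", "pods", "clusterserviceversions"}
	fmt.Println(dedupeResources(in)) // prints [clusterserviceversions pods]
}
```

Keeping the first occurrence preserves the order in which resources were originally enabled, which keeps log output stable between runs.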

@rexagod
Member

rexagod commented May 14, 2023

Putting it out there that this is still the case, and the later built (custom) stores are the only ones in effect.

Details

Native Store (registered first)

Screenshot from 2023-05-14 11-16-10

CRS Store (overrides the native store)

Screenshot from 2023-05-14 11-16-48

CRS configuration that was used (to build custom Deployment stores that replaced the native Deployment stores)

kind: CustomResourceStateMetrics
spec:
  resources:
    - groupVersionKind:
        group: "apps"
        version: "v1"
        kind: "Deployment"
      metrics:
        - name: "test_metric"
          help: "foo baz"
          each:
            type: Info
            info:
              path: [metadata]
              labelsFromPath:
                name: [name]

Final Deployment metrics generated (from the overriding (custom) stores)

image

Also, as I mentioned, native objects won't be supported in custom resource configurations as we will depend entirely on CRDs going forward, and this supersedes this issue.

@rexagod
Member

rexagod commented May 14, 2023

@grzesuav I'm not sure if it's possible (I'm leaning towards the contrary based on my understanding), but is it possible for KSM to expose metrics for the same native and CRS object (for instance, Deployments, or StorageClass in your case) simultaneously?

Wanted to expose additional information for storageclass

I'm trying to understand how this would be facilitated from one KSM process, because AFAIR this was never the supported behavior (same internal object under both flags), and it shouldn't be possible without the CR store overriding the native one. Can you provide the complete command (flags and args) you're using to invoke KSM that allows you to "add" custom metrics for native objects on top of the original metrics exposed natively by KSM for the same object?

@rexagod
Member

rexagod commented May 14, 2023

I found out that --custom-resource-state-only can be used to output only CRS metrics. When the same object appears in both the CRS configuration and the --resources argument, this flag fixes the issue by suppressing the latter's output. Technically that output should be the original metrics (not the generated ones, hence the duplication), but as I mentioned earlier, the native metrics are replaced by the CRS ones in that case (which we will sunset, thereby removing this conflict between stores).

@dgrisonnet
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 14, 2023
@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jun 13, 2024