
CustomResourceStateMetrics didn't report the custom resource status data to metrics and kube-state-metrics crash if the custom resource property change #2141

Closed
chihshenghuang opened this issue Aug 10, 2023 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@chihshenghuang
Contributor

chihshenghuang commented Aug 10, 2023

What happened:
There are 2 issues:

  1. kube-state-metrics doesn't report the status data of a custom resource to metrics when the custom resource is first deployed; it only reports the metrics after the kube-state-metrics pod is restarted. The reason is that some status data is populated later, so kube-state-metrics throws the error "got nil while resolving path" at the beginning while there is no status data yet, and it never checks the status data again unless the pod is restarted. This also prevents the other metrics from being reported.
  2. kube-state-metrics will crash if the property used in the valueFrom path is removed from the custom resource, unless nilIsTrue is used. The reason is that this function may return a nil value, and the code then tries to access a property of that nil value here (a minimal sketch of this failure mode follows this list).
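To make the nil-dereference failure mode concrete, here is a minimal, self-contained Go sketch. It is not the actual kube-state-metrics code and the helper name resolvePath is hypothetical; it only illustrates how resolving a valueFrom path such as [minAllowed, cpu] on an object whose minAllowed field was removed yields nil, and how a guard on that nil value avoids the panic:

package main

import "fmt"

// resolvePath walks a nested map following the given path segments and
// returns nil as soon as a segment is missing, mirroring the
// "got nil while resolving path" situation.
func resolvePath(obj map[string]interface{}, path []string) interface{} {
	var cur interface{} = obj
	for _, seg := range path {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil
		}
		cur, ok = m[seg]
		if !ok {
			return nil
		}
	}
	return cur
}

func main() {
	// One containerPolicies entry after `kubectl edit` removed minAllowed,
	// so the valueFrom path [minAllowed, cpu] no longer resolves.
	entry := map[string]interface{}{
		"containerName": "nginx1",
	}

	val := resolvePath(entry, []string{"minAllowed", "cpu"})
	// Without this guard, code that immediately type-asserts or indexes
	// into val would hit a nil pointer dereference and panic.
	if val == nil {
		fmt.Println("valueFrom path not found; skip this metric instead of panicking")
		return
	}
	fmt.Println("value:", val)
}

The intent of the guard matches PR #2140: don't crash on non-existent valueFrom path values.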

What you expected to happen:

  1. kube-state-metrics should detect changes to the custom resource status data and report the status data to metrics.
  2. kube-state-metrics should not panic and crash; this can be fixed by the PR I sent: fix: Don't crash on non-existent valueFrom path values #2140

How to reproduce it (as minimally and precisely as possible):

  1. For the issue where kube-state-metrics can't report status data:
  • Use the following config file:
apiVersion: v1
data:
  config.yaml: |-
    kind: CustomResourceStateMetrics
    spec:
      resources:
        - groupVersionKind:
            group: autoscaling.k8s.io
            kind: "VerticalPodAutoscaler"
            version: "v1"
          labelsFromPath:
            verticalpodautoscaler: [metadata, name]
            namespace: [metadata, namespace]
            target_api_version: [spec, targetRef, apiVersion]
            target_kind: [spec, targetRef, kind]
            target_name: [spec, targetRef, name]
          metrics:
            - name: "verticalpodautoscaler_status_recommendation_containerrecommendations_target_memory"
              help: "Kubernetes labels converted to Prometheus labels."
              each:
                type: Gauge
                gauge:
                  path: [status, recommendation, containerRecommendations]
                  valueFrom: [target, memory]
                  labelsFromPath:
                    container: [containerName]
                commonLabels:
                  resource: "memory"
                  unit: "byte"
            - name: "verticalpodautoscaler_status_recommendation_containerrecommendations_target_cpu"
              help: "Kubernetes labels converted to Prometheus labels."
              each:
                type: Gauge
                gauge:
                  path: [status, recommendation, containerRecommendations]
                  valueFrom: [target, cpu]
                  labelsFromPath:
                    container: [containerName]
                commonLabels:
                  resource: "cpu"
                  unit: "core"
            - name: "verticalpodautoscaler_spec_resourcepolicy_container_policies_minallowed_memory"
              help: "Kubernetes labels converted to Prometheus labels."
              each:
                type: Gauge
                gauge:
                  path: [spec, resourcePolicy, containerPolicies]
                  valueFrom: [minAllowed, memory]
                  labelsFromPath:
                    container: [containerName]
                commonLabels:
                  resource: "memory"
                  unit: "byte"
            - name: "verticalpodautoscaler_spec_resourcepolicy_container_policies_minallowed_cpu"
              help: "Kubernetes labels converted to Prometheus labels."
              each:
                type: Gauge
                gauge:
                  path: [spec, resourcePolicy, containerPolicies]
                  valueFrom: [minAllowed, cpu]
                  labelsFromPath:
                    container: [containerName]
                commonLabels:
                  resource: "cpu"
                  unit: "core"
kind: ConfigMap
metadata:
  name: custom-resource-config
  • Deploy the test VPA object
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: test-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: nginx
  resourcePolicy:
    containerPolicies:
    - containerName: "nginx1"
      minAllowed:
        cpu: 300m
        memory: 70Mi
  • The VPA doesn't report a recommended value immediately; it updates the recommendation about 15 seconds later:
$k get vpa -A
NAMESPACE   NAME       MODE   CPU   MEM   PROVIDED   AGE
default     test-vpa   Auto                          3s
$k get vpa -A
NAMESPACE   NAME       MODE   CPU   MEM       PROVIDED   AGE
default     test-vpa   Auto   12m   131072k   True       15s
  • Only the recommendation target memory metric is exported; the target cpu, minAllowed cpu, and minAllowed memory metrics are missing:
kube_customresource_verticalpodautoscaler_status_recommendation_containerrecommendations_target_memory{container="nginx1",customresource_group="autoscaling.k8s.io",customresource_kind="VerticalPodAutoscaler",customresource_version="v1",namespace="default",target_api_version="autoscaling.k8s.io/v1",target_kind="Deployment",target_name="nginx",verticalpodautoscaler="test-vpa"} 1.31072e+08
  • It looks like the "got nil while resolving path" error prevents the other metrics from being reported.
  2. For the kube-state-metrics crash issue:
  • Use the same config above for CustomResourceStateMetrics
  • Deploy the same test VPA object as above
  • Restart the kube-state-metrics pod to work around the status data issue described above
  • Remove the minAllowed cpu/memory using kubectl edit, so the VPA object becomes:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: test-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: nginx
  resourcePolicy:
    containerPolicies:
    - containerName: "nginx1"
  • kube-state-metrics crashes immediately and throws a panic error (see below for the error log).

Anything else we need to know?:

  1. Log for the custom resource status data not being exported:
$k logs kube-state-metrics-8699b49768-wb5wq
Defaulted container "kube-state-metrics" out of: kube-state-metrics, wait-for-apiserver (init)
I0810 05:42:05.692316       1 wrapper.go:98] "Starting kube-state-metrics"
I0810 05:42:05.694837       1 server.go:195] "Used CRD resources only"
I0810 05:42:05.694864       1 types.go:184] "Using all namespaces"
I0810 05:42:05.694891       1 server.go:225] "Metric allow-denylisting" allowDenyStatus="Excluding the following lists that were on denylist: "
I0810 05:42:05.696167       1 utils.go:70] "Tested communication with server"
I0810 05:42:05.732392       1 utils.go:75] "Run with Kubernetes cluster version" major="1" minor="25" gitVersion="v1.25.6" gitTreeState="clean" gitCommit="94c50547e633f1db5d4c56b2b305670e14987d59" platform="linux/amd64"
I0810 05:42:05.732686       1 utils.go:76] "Communication with server successful"
I0810 05:42:05.734661       1 server.go:347] "Started metrics server" metricsServerAddress="[::]:8080"
I0810 05:42:05.735089       1 server.go:336] "Started kube-state-metrics self metrics server" telemetryAddress="[::]:8081"
I0810 05:42:05.735428       1 server.go:72] level=info msg="Listening on" address=[::]:8080
I0810 05:42:05.735454       1 server.go:72] level=info msg="TLS is disabled." http2=false address=[::]:8080
I0810 05:42:05.735490       1 server.go:72] level=info msg="Listening on" address=[::]:8081
I0810 05:42:05.735512       1 server.go:72] level=info msg="TLS is disabled." http2=false address=[::]:8081
I0810 05:42:08.734719       1 config.go:82] "Using custom resource plural" resource="autoscaling.k8s.io_v1_VerticalPodAutoscaler" plural="verticalpodautoscalers"
I0810 05:42:08.734868       1 discovery.go:274] "discovery finished, cache updated"
I0810 05:42:08.734895       1 metrics_handler.go:99] "Autosharding disabled"
I0810 05:42:08.734952       1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=[kube_customresource_verticalpodautoscaler_status_recommendation_containerrecommendations_target_memory kube_customresource_verticalpodautoscaler_status_recommendation_containerrecommendations_target_cpu kube_customresource_verticalpodautoscaler_spec_resourcepolicy_container_policies_minallowed_memory kube_customresource_verticalpodautoscaler_spec_resourcepolicy_container_policies_minallowed_cpu]
I0810 05:42:18.844070       1 builder.go:631] "No CRs found for GVR" group="autoscaling.k8s.io" version="v1" resource="verticalpodautoscalers"
I0810 05:42:18.844137       1 builder.go:275] "Active resources" activeStoreNames="autoscaling.k8s.io/v1, Resource=verticalpodautoscalers"
E0810 05:42:34.050472       1 registry_factory.go:662] "kube_customresource_verticalpodautoscaler_status_recommendation_containerrecommendations_target_memory" err="[status,recommendation,containerRecommendations]: got nil while resolving path"
E0810 05:42:34.050524       1 registry_factory.go:662] "kube_customresource_verticalpodautoscaler_status_recommendation_containerrecommendations_target_cpu" err="[status,recommendation,containerRecommendations]: got nil while resolving path"
  2. Log for the panic error:
Defaulted container "kube-state-metrics" out of: kube-state-metrics, wait-for-apiserver (init)
I0810 00:36:08.183251       1 wrapper.go:98] "Starting kube-state-metrics"
I0810 00:36:08.189263       1 server.go:195] "Used CRD resources only"
I0810 00:36:08.189442       1 types.go:184] "Using all namespaces"
I0810 00:36:08.189515       1 server.go:225] "Metric allow-denylisting" allowDenyStatus="Excluding the following lists that were on denylist: "
I0810 00:36:08.191512       1 utils.go:70] "Tested communication with server"
I0810 00:36:08.223249       1 utils.go:75] "Run with Kubernetes cluster version" major="1" minor="25" gitVersion="v1.25.6" gitTreeState="clean" gitCommit="94c50547e633f1db5d4c56b2b305670e14987d59" platform="linux/amd64"
I0810 00:36:08.223271       1 utils.go:76] "Communication with server successful"
I0810 00:36:08.224194       1 server.go:347] "Started metrics server" metricsServerAddress="[::]:8080"
I0810 00:36:08.224306       1 server.go:336] "Started kube-state-metrics self metrics server" telemetryAddress="[::]:8081"
I0810 00:36:08.224623       1 server.go:72] level=info msg="Listening on" address=[::]:8081
I0810 00:36:08.224747       1 server.go:72] level=info msg="TLS is disabled." http2=false address=[::]:8081
I0810 00:36:08.224623       1 server.go:72] level=info msg="Listening on" address=[::]:8080
I0810 00:36:08.224794       1 server.go:72] level=info msg="TLS is disabled." http2=false address=[::]:8080
I0810 00:36:11.225204       1 config.go:82] "Using custom resource plural" resource="autoscaling.k8s.io_v1_VerticalPodAutoscaler" plural="verticalpodautoscalers"
I0810 00:36:11.225286       1 discovery.go:274] "discovery finished, cache updated"
I0810 00:36:11.225309       1 metrics_handler.go:99] "Autosharding disabled"
I0810 00:36:11.225386       1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=[kube_customresource_verticalpodautoscaler_status_recommendation_containerrecommendations_target_cpu kube_customresource_verticalpodautoscaler_spec_resourcepolicy_container_policies_minallowed_cpu]
I0810 00:36:11.336865       1 builder.go:275] "Active resources" activeStoreNames="autoscaling.k8s.io/v1, Resource=verticalpodautoscalers"
E0810 00:36:11.340009       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 100 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x189f3c0?, 0x29b54e0})
	/go/pkg/mod/k8s.io/apimachinery@v0.26.5/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x2540be400?})
	/go/pkg/mod/k8s.io/apimachinery@v0.26.5/pkg/util/runtime/runtime.go:49 +0x75
panic({0x189f3c0, 0x29b54e0})
	/usr/local/go/src/runtime/panic.go:884 +0x213
k8s.io/kube-state-metrics/v2/pkg/customresourcestate.(*compiledGauge).Values(0xc000278600, {0x178a540?, 0xc000479248?})
	/go/src/k8s.io/kube-state-metrics/pkg/customresourcestate/registry_factory.go:281 +0x1202
k8s.io/kube-state-metrics/v2/pkg/customresourcestate.scrapeValuesFor({0x7f235d846cc0, 0xc000278600}, 0xc0001b5590)
	/go/src/k8s.io/kube-state-metrics/pkg/customresourcestate/registry_factory.go:678 +0xc4
k8s.io/kube-state-metrics/v2/pkg/customresourcestate.generate(0xc0004200c0, {{0xc0007ec600, 0x5f}, {0xc000046880, 0x31}, {0x7f235d846cc0, 0xc000278600}, 0xc0005318f0, 0xc000531a10, 0x0}, ...)
	/go/src/k8s.io/kube-state-metrics/pkg/customresourcestate/registry_factory.go:660 +0x29c
k8s.io/kube-state-metrics/v2/pkg/customresourcestate.famGen.func1({0x1ab4020?, 0xc0004200c0?})
	/go/src/k8s.io/kube-state-metrics/pkg/customresourcestate/registry_factory.go:649 +0x6c
k8s.io/kube-state-metrics/v2/pkg/metric_generator.(*FamilyGenerator).Generate(0xc0001778b8, {0x1ab4020?, 0xc0004200c0?})
	/go/src/k8s.io/kube-state-metrics/pkg/metric_generator/generator.go:74 +0x39
k8s.io/kube-state-metrics/v2/pkg/metric_generator.ComposeMetricGenFuncs.func1({0x1ab4020, 0xc0004200c0})
	/go/src/k8s.io/kube-state-metrics/pkg/metric_generator/generator.go:124 +0xcd
k8s.io/kube-state-metrics/v2/pkg/metrics_store.(*MetricsStore).Add(0xc000136940, {0x1ab4020, 0xc0004200c0})
	/go/src/k8s.io/kube-state-metrics/pkg/metrics_store/metrics_store.go:71 +0xd4
k8s.io/kube-state-metrics/v2/pkg/metrics_store.(*MetricsStore).Replace(0xc000136940, {0xc00040dee0, 0x1, 0xc12d2a1ed44106aa?}, {0xbebdf1a5?, 0x29e01e0?})
	/go/src/k8s.io/kube-state-metrics/pkg/metrics_store/metrics_store.go:133 +0xa8
k8s.io/client-go/tools/cache.(*Reflector).syncWith(0xc0002b80f0, {0xc00040ded0, 0x1, 0x0?}, {0xc000050580, 0x6})
	/go/pkg/mod/k8s.io/client-go@v0.26.5/tools/cache/reflector.go:469 +0x13b
k8s.io/client-go/tools/cache.(*Reflector).list(0xc0002b80f0, 0xc0001820c0)
	/go/pkg/mod/k8s.io/client-go@v0.26.5/tools/cache/reflector.go:454 +0x82b
k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc0002b80f0, 0xc0001820c0)
	/go/pkg/mod/k8s.io/client-go@v0.26.5/tools/cache/reflector.go:259 +0x152
k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
	/go/pkg/mod/k8s.io/client-go@v0.26.5/tools/cache/reflector.go:223 +0x26
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x10?)
	/go/pkg/mod/k8s.io/apimachinery@v0.26.5/pkg/util/wait/wait.go:157 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000445700?, {0x1d2dce0, 0xc0004b6500}, 0x1, 0xc0001820c0)
	/go/pkg/mod/k8s.io/apimachinery@v0.26.5/pkg/util/wait/wait.go:158 +0xb6
k8s.io/client-go/tools/cache.(*Reflector).Run(0xc0002b80f0, 0xc0001820c0)
	/go/pkg/mod/k8s.io/client-go@v0.26.5/tools/cache/reflector.go:222 +0x185
created by k8s.io/kube-state-metrics/v2/internal/store.(*Builder).startReflector
	/go/src/k8s.io/kube-state-metrics/internal/store/builder.go:662 +0x2c5
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x16d2762]

goroutine 100 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x2540be400?})
	/go/pkg/mod/k8s.io/apimachinery@v0.26.5/pkg/util/runtime/runtime.go:56 +0xd7
panic({0x189f3c0, 0x29b54e0})
	/usr/local/go/src/runtime/panic.go:884 +0x213
k8s.io/kube-state-metrics/v2/pkg/customresourcestate.(*compiledGauge).Values(0xc000278600, {0x178a540?, 0xc000479248?})
	/go/src/k8s.io/kube-state-metrics/pkg/customresourcestate/registry_factory.go:281 +0x1202
k8s.io/kube-state-metrics/v2/pkg/customresourcestate.scrapeValuesFor({0x7f235d846cc0, 0xc000278600}, 0xc0001b5590)
	/go/src/k8s.io/kube-state-metrics/pkg/customresourcestate/registry_factory.go:678 +0xc4
k8s.io/kube-state-metrics/v2/pkg/customresourcestate.generate(0xc0004200c0, {{0xc0007ec600, 0x5f}, {0xc000046880, 0x31}, {0x7f235d846cc0, 0xc000278600}, 0xc0005318f0, 0xc000531a10, 0x0}, ...)
	/go/src/k8s.io/kube-state-metrics/pkg/customresourcestate/registry_factory.go:660 +0x29c
k8s.io/kube-state-metrics/v2/pkg/customresourcestate.famGen.func1({0x1ab4020?, 0xc0004200c0?})
	/go/src/k8s.io/kube-state-metrics/pkg/customresourcestate/registry_factory.go:649 +0x6c
k8s.io/kube-state-metrics/v2/pkg/metric_generator.(*FamilyGenerator).Generate(0xc0001778b8, {0x1ab4020?, 0xc0004200c0?})
	/go/src/k8s.io/kube-state-metrics/pkg/metric_generator/generator.go:74 +0x39
k8s.io/kube-state-metrics/v2/pkg/metric_generator.ComposeMetricGenFuncs.func1({0x1ab4020, 0xc0004200c0})
	/go/src/k8s.io/kube-state-metrics/pkg/metric_generator/generator.go:124 +0xcd
k8s.io/kube-state-metrics/v2/pkg/metrics_store.(*MetricsStore).Add(0xc000136940, {0x1ab4020, 0xc0004200c0})
	/go/src/k8s.io/kube-state-metrics/pkg/metrics_store/metrics_store.go:71 +0xd4
k8s.io/kube-state-metrics/v2/pkg/metrics_store.(*MetricsStore).Replace(0xc000136940, {0xc00040dee0, 0x1, 0xc12d2a1ed44106aa?}, {0xbebdf1a5?, 0x29e01e0?})
	/go/src/k8s.io/kube-state-metrics/pkg/metrics_store/metrics_store.go:133 +0xa8
k8s.io/client-go/tools/cache.(*Reflector).syncWith(0xc0002b80f0, {0xc00040ded0, 0x1, 0x0?}, {0xc000050580, 0x6})
	/go/pkg/mod/k8s.io/client-go@v0.26.5/tools/cache/reflector.go:469 +0x13b
k8s.io/client-go/tools/cache.(*Reflector).list(0xc0002b80f0, 0xc0001820c0)
	/go/pkg/mod/k8s.io/client-go@v0.26.5/tools/cache/reflector.go:454 +0x82b
k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc0002b80f0, 0xc0001820c0)
	/go/pkg/mod/k8s.io/client-go@v0.26.5/tools/cache/reflector.go:259 +0x152
k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
	/go/pkg/mod/k8s.io/client-go@v0.26.5/tools/cache/reflector.go:223 +0x26
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x10?)
	/go/pkg/mod/k8s.io/apimachinery@v0.26.5/pkg/util/wait/wait.go:157 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000445700?, {0x1d2dce0, 0xc0004b6500}, 0x1, 0xc0001820c0)
	/go/pkg/mod/k8s.io/apimachinery@v0.26.5/pkg/util/wait/wait.go:158 +0xb6
k8s.io/client-go/tools/cache.(*Reflector).Run(0xc0002b80f0, 0xc0001820c0)
	/go/pkg/mod/k8s.io/client-go@v0.26.5/tools/cache/reflector.go:222 +0x185
created by k8s.io/kube-state-metrics/v2/internal/store.(*Builder).startReflector
	/go/src/k8s.io/kube-state-metrics/internal/store/builder.go:662 +0x2c5

Environment:

  • kube-state-metrics version: 2.9.0
  • Kubernetes version (use kubectl version):
$kubectl version
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.1", GitCommit:"8f94681cd294aa8cfd3407b8191f6c70214973a4", GitTreeState:"clean", BuildDate:"2023-01-18T15:51:24Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.11", GitCommit:"8cfcba0b15c343a8dc48567a74c29ec4844e0b9e", GitTreeState:"clean", BuildDate:"2023-06-19T16:12:25Z", GoVersion:"go1.19.10", Compiler:"gc", Platform:"linux/amd64"}
@chihshenghuang chihshenghuang added the kind/bug Categorizes issue or PR as related to a bug. label Aug 10, 2023
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 10, 2023
@chihshenghuang chihshenghuang changed the title Didn't report the custom resource status data to metrics and kube-state-metrics crash if the custom resource property change CustomResourceStateMetrics didn't report the custom resource status data to metrics and kube-state-metrics crash if the custom resource property change Aug 10, 2023
@dashpole

/assign @dgrisonnet
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 10, 2023
@dgrisonnet
Member

Hi @chihshenghuang, thank you for reporting these bugs. Are you also willing to look into fixing bug nº1?

@chihshenghuang
Contributor Author

@dgrisonnet for the first bug, @chrischdi reported the same issue at #2142 and sent PR #2154 to fix it. Could you help take a look at the PR?

@rexagod
Member

rexagod commented Aug 29, 2023

/close
Fixed in #2154.

@rexagod rexagod closed this as completed Aug 29, 2023