
kube-state-metrics keeps serving stale metrics after extended apiserver outage #694

Closed
bergerx opened this issue Mar 4, 2019 · 42 comments
Labels
after-2.0 kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@bergerx

bergerx commented Mar 4, 2019

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:
We wanted to check the metrics for some pods and realised we didn't have metrics for the pod in question. We then checked the deployment and other resources: no metrics were there either, although metrics for some other pods were available.
We checked the metrics exposed from the /metrics endpoint and saw metrics for pods that had been deleted 2+ weeks ago.
So kube-state-metrics had been reporting stale metrics for the last 3 weeks.

Here are some relevant logs from kube-state-metrics pod:

2019-02-10T21:56:52.294138617Z E0210 21:56:52.293444       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=124961, ErrCode=NO_ERROR, debug=""
2019-02-10T21:56:52.295193557Z E0210 21:56:52.293956       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=124961, ErrCode=NO_ERROR, debug=""
2019-02-10T21:56:52.295465367Z E0210 21:56:52.294137       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=124961, ErrCode=NO_ERROR, debug=""
2019-02-10T21:56:52.29578788Z E0210 21:56:52.294170       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=124961, ErrCode=NO_ERROR, debug=""
... There are a total of 17 `Unable to decode an event` logs within the same second

2019-02-10T21:56:52.321674569Z W0210 21:56:52.321518       1 reflector.go:341] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: watch of *v1.Endpoints ended with: very short watch: k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Unexpected watch close - watch lasted less than a second and no items received
2019-02-10T21:56:52.322241891Z W0210 21:56:52.322003       1 reflector.go:341] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: watch of *v1beta1.CronJob ended with: very short watch: k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Unexpected watch close - watch lasted less than a second and no items received
... There are a total of 17 `Unexpected watch close` logs within the same second

# here the logs pivoted to `TLS handshake timeout`
2019-02-10T21:57:03.330774532Z E0210 21:57:03.330640       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v1.Pod: Get https://msa-dev-az-xxxxxx.hcp.eastus.azmk8s.io:443/api/v1/pods?limit=500&resourceVersion=0: net/http: TLS handshake timeout
...
... # there are quite a few `net/http: TLS handshake timeout` messages here until the logs stop
...
2019-02-10T21:57:25.366656235Z E0210 21:57:25.366563       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v1.PersistentVolumeClaim: Get https://msa-dev-az-xxxxxx.hcp.eastus.azmk8s.io:443/api/v1/persistentvolumeclaims?limit=500&resourceVersion=0: net/http: TLS handshake timeout
2019-02-10T21:57:25.367127153Z E0210 21:57:25.366998       1 reflector.go:205] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to list *v2beta1.HorizontalPodAutoscaler: Get https://msa-dev--xxxxxx.hcp.eastus.azmk8s.io:443/apis/autoscaling/v2beta1/horizontalpodautoscalers?limit=500&resourceVersion=0: net/http: TLS handshake timeout
# this is the very last log line; the current date is 2019-03-04 and there are still no new logs. The pod is still up and has been reporting stale metrics since then, so the last log line was 3 weeks ago

What you expected to happen:
kube-state-metrics should not serve stale metrics for more than a certain time (I'd say at most a couple of minutes in our case).
It seems kube-state-metrics didn't try to reconnect to the kube-apiserver after running into issues with it.
Instead of continuing to serve stale metrics, kube-state-metrics should at least panic or report an unhealthy status.

How to reproduce it (as minimally and precisely as possible):
We haven't reproduced it yet. I'll update if we are able to.

Anything else we need to know?:
Here is a summary of the deployment status:

$ ptc msai msa-dev:seed insight deployment/msa-monitoring-kube-state-metrics -n monitoring   -r
Deployment/msa-monitoring-kube-state-metrics[monitoring], created 2 months ago
  desired:1, existing:1, ready:1, updated:1, available:1
  Available:True for 2 months MinimumReplicasAvailable :'Deployment has minimum availability.' last update was 2 months ago
  Progressing:True for 2 months NewReplicaSetAvailable :'ReplicaSet "msa-monitoring-kube-state-metrics-6bb98dc558" has successfully progressed.' last update was 2 months ago
Pod/msa-monitoring-kube-state-metrics-6bb98dc558-tlfvb[monitoring], created 2 months ago
  Running BestEffort
  Initialized:True for 2 months
  Ready:True for 4 days
  ContainersReady:True
  PodScheduled:True for 2 months
  Container: kube-state-metrics deploys quay.io/coreos/kube-state-metrics:v1.4.0
    ready:True
    state: running for 2 months

Environment:

  • Kubernetes version (use kubectl version): v1.11.5 --> Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.5", GitCommit:"753b2dbc622f5cc417845f0ff8a77f539a4213ea", GitTreeState:"clean", BuildDate:"2018-11-26T14:31:35Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
  • Kube-state-metrics image version: quay.io/coreos/kube-state-metrics:v1.4.0
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Mar 4, 2019
@mxinden
Contributor

mxinden commented Mar 5, 2019

Interesting, thanks for the bug report @bergerx.

We depend on Kubernetes client-go reflectors to keep our caches in sync. Reflectors have the option of specifying a resync interval, which we currently disable. I feel like a first step would be to enable the resync interval so we have a periodic full resync cycle alongside the ad-hoc diff notifications.
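
For illustration, here is a minimal, hypothetical sketch of what a client-go reflector with a non-zero resync period looks like (the 5-minute interval, the pod ListWatch, and the plain store are example choices for this sketch, not the actual kube-state-metrics wiring):

package main

import (
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	// In-cluster client; error handling kept minimal for brevity.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// ListWatch for pods across all namespaces.
	lw := cache.NewListWatchFromClient(
		client.CoreV1().RESTClient(), "pods", metav1.NamespaceAll, fields.Everything())

	store := cache.NewStore(cache.MetaNamespaceKeyFunc)

	// The last argument is the resyncPeriod discussed above; kube-state-metrics
	// currently passes 0 (disabled). A non-zero value makes the reflector
	// periodically trigger the store's Resync in addition to handling the
	// incremental watch events.
	reflector := cache.NewReflector(lw, &v1.Pod{}, store, 5*time.Minute)

	stopCh := make(chan struct{})
	reflector.Run(stopCh) // lists, watches and re-lists on errors until stopCh is closed
}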

In addition, we should add proper instrumentation to the metric store implementation to catch issues like this early.

Would this be something you would like to contribute @bergerx? Shouldn't be a big patch, I am happy to guide you along the way.

//CC @mgoodness Cross-referencing this Twitter report: https://twitter.com/opsgoodness/status/1098266699382620160

@transient1

transient1 commented May 29, 2019

I came here from a Google search for the same error and am posting here in case they're related. I have a single kube-state-metrics pod that has been running for 26 days. I have an alert that started firing:

kube_daemonset_status_number_ready{job="kube-state-metrics"}
  / kube_daemonset_status_desired_number_scheduled{job="kube-state-metrics"}
  * 100 < 100

and upon viewing the logs for that pod I saw entries similar to the above, without the TLS errors:

2019-05-17T16:04:53.573452071Z E0517 16:04:53.573302       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=2351, ErrCode=NO_ERROR, debug=""
2019-05-17T16:04:53.574364247Z E0517 16:04:53.574288       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=2351, ErrCode=NO_ERROR, debug=""
2019-05-17T16:04:53.574739793Z E0517 16:04:53.574660       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=2351, ErrCode=NO_ERROR, debug=""
2019-05-17T16:04:53.575269397Z E0517 16:04:53.575185       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=2351, ErrCode=NO_ERROR, debug=""
......
2019-05-17T16:04:53.577591604Z W0517 16:04:53.576601       1 reflector.go:341] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: watch of *v1.Job ended with: very short watch: k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Unexpected watch close - watch lasted less than a second and no items received
2019-05-17T16:04:53.577606944Z E0517 16:04:53.576682       1 reflector.go:322] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: Failed to watch *v1.LimitRange: Get https://100.64.0.1:443/api/v1/limitranges?resourceVersion=55151870&timeoutSeconds=398&watch=true: dial tcp 100.64.0.1:443: connect: connection refused
.....
2019-05-17T16:04:53.677478437Z W0517 16:04:53.677302       1 reflector.go:341] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: watch of *v1beta1.CronJob ended with: too old resource version: 55151873 (66552376)
2019-05-17T16:04:53.706509998Z W0517 16:04:53.706369       1 reflector.go:341] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: watch of *v1.ReplicationController ended with: too old resource version: 55151870 (66552374)
2019-05-17T16:04:53.995074558Z W0517 16:04:53.994876       1 reflector.go:341] k8s.io/kube-state-metrics/pkg/collectors/collectors.go:91: watch of *v1.PersistentVolumeClaim ended with: too old resource version: 55151870 (66552374)

The last log line for my pod is from 5/17, but the current date is 5/29.

@brancz
Member

brancz commented May 29, 2019

That means it hasn't happened since then, which is a good thing :) Watches being renewed is perfectly normal, nothing to be worried about.

@transient1

Ah, okay. Then I have a different problem I need to track down. Thanks!

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 27, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 26, 2019
@bergerx
Author

bergerx commented Oct 2, 2019

/remove-lifecycle stale
I don't think this is being worked on.

@bergerx
Author

bergerx commented Oct 2, 2019

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 2, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 31, 2019
@bergerx
Author

bergerx commented Dec 31, 2019

/remove-lifecycle stale

We haven't been hit by this recently, but the original issue doesn't seem to be solved. As per @mxinden's first comment in this issue, NewReflector is called with resyncPeriod=0 here:

reflector := cache.NewReflector(sharding.NewShardedListWatch(b.shard, b.totalShards, instrumentedListWatch), expectedType, store, 0)

From https://godoc.org/k8s.io/client-go/tools/cache#NewReflector:

func NewReflector(lw ListerWatcher, expectedType interface{}, store Store, resyncPeriod time.Duration) *Reflector

... If resyncPeriod is non-zero, then lists will be executed after every resyncPeriod, so that you can use reflectors to periodically process everything as well as incrementally processing the things that change.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 31, 2019
@brancz
Member

brancz commented Jan 7, 2020

Resync is really only for external reconciliation, where something external could be modifying the state we're reconciling. But the state we're reconciling is entirely the in-memory state of kube-state-metrics, which is not prone to that. Resync only re-processes items that we already know about; relists already happen automatically on connection resets or similar, which is what the initial report here was about, as the apiserver was actually unavailable.

@ekesken

ekesken commented Mar 17, 2020

we hit this issue in ~10 clusters out of 45 within ~2 months.

k8s server => v1.14.9-eks-502bfb
kube-state-metrics => quay.io/coreos/kube-state-metrics:v1.9.1
prometheus => prom/prometheus:v2.15.2

@brancz
Member

brancz commented Mar 18, 2020

Do you run kube-state-metrics in some specific way? With versions pre-1.9.0 you may be experiencing #942, if you're configuring individual namespaces. (note 1.14 is not supported by ksm 1.9.0)

Is anyone able to reliably reproduce this issue?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 16, 2020
@ekesken

ekesken commented Jun 16, 2020

/remove-lifecycle stale

we're still observing this issue in our clusters.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 16, 2020
@lilic
Member

lilic commented Jun 16, 2020

@ekesken can you answer the question from Frederic if you are still running into this issue? :)

@ekesken

ekesken commented Jun 16, 2020

@brancz

Do you run kube-state-metrics in some specific way?

we use version 10.4.0 of the prometheus helm chart: https://github.com/helm/charts/blame/3944b1421eb039725816c905c00608193226b001/stable/prometheus/Chart.yaml

This is our diff for the settings:

kubeStateMetrics:
  podDisruptionBudget:
    enabled: true

  priorityClassName: "some-class"

  args:
    port: 8080
    telemetry-port: 8081
    namespace: some-namespace,kube-system,argo-cd,default

  tolerations:
    - key: "some-internal-name.com/dedicated"
      operator: "Equal"
      value: "cluster-management"
      effect: "NoSchedule"

  nodeSelector:
    unicron.mpi-internal.com/role: "mgmt"

  resources:
    limits:
      cpu: 200m
      memory: 256Mi
    requests:
      cpu: 100m
      memory: 200Mi

With versions pre-1.9.0 you may be experiencing #942

we're using 1.9.1

if you're configuring individual namespaces. (note 1.14 is not supported by ksm 1.9.0)

yes we do, and as far as I can see 1.16 is the only supported version for 1.9.5; unfortunately we don't have 1.16 clusters to observe the situation.

Is anyone able to reliably reproduce this issue?

we don't have a clue about the cause, so we can't reproduce it, but we keep observing the issue in our AWS EKS clusters with version v1.14.9-eks-f459c0

@lilic
Member

lilic commented Jun 16, 2020

cc @tariq1890 I don't have much experience with the ksm helm chart, can you have a look?

@bergerx
Author

bergerx commented Jun 24, 2020

Aha, I just came across this in https://github.com/kubernetes/kube-state-metrics/blob/1dfe6681e990623a4078b4e53a2e3c8761e213c6/docs/design/metrics-store-performance-optimization.md:

Kube-state-metrics listens on add, update and delete events via Kubernetes client-go reflectors. On add and update events kube-state-metrics generates all time series related to the Kubernetes object based on the event’s payload, concatenates the time series to a single byte slice and sets / replaces the byte slice in the store at the uuid of the Kubernetes object. One can precompute the length of a time series byte slice before allocation as the sum of the length of the metric name, label keys and values as well as the metric value in string representation. On delete events kube-state-metrics deletes the uuid entry of the given Kubernetes object in the cache map.

I didn't check the code, but if this optimisation went in and the deletion of cached objects now depends on delete events, could it be that stale pod metrics persist when a pod delete event is missed by ksm while it's trying to reconnect right after an api-server disconnect, for whatever reason?
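
To make the concern concrete, here is a simplified, hypothetical sketch of such an edge-triggered cache (not the actual kube-state-metrics store; the type names and the render callback are made up for illustration). Entries are only removed in Delete, so a delete event that is never delivered leaves its series behind indefinitely:

package store

import (
	"sync"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/types"
)

// MetricsStore keeps one pre-rendered time-series blob per object UID.
type MetricsStore struct {
	mu      sync.RWMutex
	metrics map[types.UID][]byte
	render  func(obj interface{}) []byte
}

// Add (and Update) set or replace the series for the object's UID.
func (s *MetricsStore) Add(obj interface{}) error {
	o, err := meta.Accessor(obj)
	if err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.metrics[o.GetUID()] = s.render(obj)
	return nil
}

func (s *MetricsStore) Update(obj interface{}) error { return s.Add(obj) }

// Delete removes the series for the object's UID. If this event is missed
// (e.g. during an apiserver outage), the stale series stays in the map.
func (s *MetricsStore) Delete(obj interface{}) error {
	o, err := meta.Accessor(obj)
	if err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.metrics, o.GetUID())
	return nil
}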

@brancz
Member

brancz commented Jun 26, 2020

It always depended on deletes to remove objects from the cache, both before and after the optimizations. What was optimized was the cache itself. Is anyone still experiencing this with the latest patch release of kube-state-metrics 1.9 and a Kubernetes 1.18 cluster?

@bergerx
Author

bergerx commented Jun 27, 2020

Oh, so you mean ksm has always been edge-triggered only (as in level- vs edge-triggered), not level-triggered.

I can't comment on your question as we are not yet on Kubernetes 1.18. But here is a scenario that can likely happen in the AKS case:

  • api-server outages are not really rare
  • api-server (master components) are hosted by Azure and are not visible to users
  • master components (including the apiserver and controller-manager) and nodes are in different networks, so nodes (kubelets) and workloads running on the nodes (which include operators running in the cluster and also ksm) reach the apiserver through an external LB
  • the api-server and controller-manager may be up and running at a given time, but the api-server may not yet be available to the workloads on the node pools, e.g. during cluster updates
  • if a pod is deleted and the event fires during this window, ksm can miss some delete events; not likely on small clusters, but it can happen on busy clusters with frequent api-server outages

@lilic lilic added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jan 4, 2021
@lilic
Member

lilic commented Jan 8, 2021

@bergerx I tried to reproduce on my cluster and asked our Red Hat OpenShift QE to try to reproduce, but we have not had any luck so far. We also have not received any reports of this bug from our customers so far that I know of. Can you possibly provide exact steps? That would make it easier to fix. Thanks!

Note to anyone following this issue: you are welcome to give fixing this a try, PRs are always welcome, and we can backport the fix to 1.9 as well as to 2.0!

@bergerx
Author

bergerx commented Feb 11, 2021

I've started to think this issue mainly happens on hosted Kubernetes services like AKS and EKS. Many users here run it without any issue, while another set of users repeatedly experience issues.

Also see this weird issue, where things suddenly stop working for some set of users, in a Python implementation:
kiwigrid/k8s-sidecar#85

Maybe the issue lies in the way the apiserver is hosted behind a managed load balancer.

@lilic I believe you can reproduce this if you try it on an AKS cluster. Most of the users complaining here are using AKS/EKS.

@bergerx
Author

bergerx commented Feb 11, 2021

Actually, let's do a quick poll among AKS/EKS users. Can you vote on this comment with:

  • upvote (:+1: ) if you are experiencing this issue on AKS/EKS
  • downvote (:-1: ) if you are not experiencing this issue on AKS/EKS

andreihagiescu added a commit to andreihagiescu/kube-state-metrics that referenced this issue Feb 22, 2021
Addresses a bug that causes a gap between `list` and `watch` when kube-state-metrics is sharded (fix for kubernetes#694)
    
Kube-state-metrics does a `list` and then enters a `watch` loop. The intention is to `watch` **all** events after the initial list. The k8s API takes an optional `resource version` parameter which is returned as part of the `list` call and can be forwarded to the `watch` call, in order to fetch all events after the initial `list`.

In its sharded version, kube-state-metrics intercepts the returned `list` in order to filter out the events for other shards. It reconstructs the response, but it does not propagate the `resource version` to the modified response. The subsequent `watch` call does not refer to a resource version.

When `watch` is called without a `resource version`, it will provide a view consistent with the **most recent** resource version of the `watch` call, missing the events between the `resource version` at `list` call and the most recent one. The k8s documentation captures this as follows: _Get State and Start at Most Recent: Start a watch at the most recent resource version, which must be consistent (i.e. served from etcd via a quorum read). To establish initial state, the watch begins with synthetic "Added" events of all resources instances that exist at the starting resource version. All following watch events are for all changes that occurred after the resource version the watch started at._

Testing: Reproduced the original bug report deterministically by introducing an artificial delay (120s) in list, prior to returning the response, and terminating some pods. Unless the bug is fixed, the terminated pods continue to be reported as running by kube-state-metrics.
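
A hedged sketch of the kind of fix this describes (illustrative only, not the actual patch; the function and parameter names are made up): when the sharded ListWatch filters the list response, rebuild the list in place so its ListMeta, including the ResourceVersion, is preserved and the subsequent watch can resume from it instead of from "most recent":

package sharding

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
)

// filterList keeps only this shard's items while leaving the list's
// ListMeta (and therefore its ResourceVersion) untouched.
func filterList(list runtime.Object, keep func(o metav1.Object) bool) (runtime.Object, error) {
	items, err := meta.ExtractList(list)
	if err != nil {
		return nil, err
	}
	filtered := make([]runtime.Object, 0, len(items))
	for _, item := range items {
		acc, err := meta.Accessor(item)
		if err != nil {
			return nil, err
		}
		if keep(acc) {
			filtered = append(filtered, item)
		}
	}
	// Mutating the original list object keeps its ResourceVersion intact,
	// so the watch that follows the list does not silently skip events.
	if err := meta.SetList(list, filtered); err != nil {
		return nil, err
	}
	return list, nil
}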
@andreihagiescu
Contributor

@bergerx I found an issue that manifests only when the sharded implementation is used (see #1390). This could explain why only a subset of the users see this.

tariq1890 pushed a commit to tariq1890/kube-state-metrics that referenced this issue Mar 3, 2021
Addresses a bug that causes a gap between `list` and `watch` when kube-state-metrics is sharded (fix for kubernetes#694)

(cherry picked from commit c1842eb)
@AnastasiaBlack

Hello! I am not sure whether this issue has a similar root cause to the one we are now facing (please see #1467). We also have AKS, and we have multiple pods (jobs) that run for a short period of time (a couple of seconds for some of them) and then terminate (with reason Completed). When we query kube_pod_start_time for every pod, we get the expected result (the timestamp when the pod started). But now we want to get the timestamps when the containers were started, so we use kube_pod_container_state_started, and it appears that it works for some pods and doesn't show a result for others (while other queries, including kube_pod_start_time, show results for all of the pods). If we check the pods with kubectl describe pod [pod_name], the values we need (container start time) are present.

Maybe our issue is also somehow connected to caching? But we don't see any warnings, timeouts, or errors in the kube-state-metrics pod's logs.
And the most interesting part is that we can successfully get metrics for the pods in question with kube_pod_start_time, while kube_pod_container_state_started doesn't work for the same pods (but it works for some other pods in the same namespace).

@k8s-triage-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 27, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 26, 2021
@fpetkovski
Contributor

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 27, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 25, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 25, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@JohnRusk

I presume that this helps kubernetes/kubernetes#59848 (comment). See also #1499 (comment)
