
Support kube_pod_ready_time metric #1465

Closed
sgrzemski opened this issue Apr 26, 2021 · 23 comments · Fixed by #1938
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@sgrzemski
Contributor

What would you like to be added:

Hello kube-state-metrics team!
I am a happy user of your metrics software. I would like kube-state-metrics to also report the time when a pod becomes ready (passes its readiness probes). According to the docs, there are already a couple of gauges in seconds: kube_pod_start_time, kube_pod_container_state_started, etc.

Why is this needed:

I would like to be able to measure the time needed for a container to become fully operational and healthy. I already have the created and start timestamps, so a simple delta query in Prometheus would do the trick if a metric reporting the ready time were implemented.

Describe the solution you'd like

Query the Kubernetes API to get the ready timestamp.
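The delta could then be a one-line PromQL query. A sketch, assuming the new gauge is named kube_pod_ready_time and exports a Unix timestamp in seconds like the existing kube_pod_created gauge:

```promql
# Hypothetical: seconds from pod creation to the pod passing readiness,
# assuming a kube_pod_ready_time gauge (Unix timestamp in seconds).
kube_pod_ready_time - kube_pod_created
```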

Additional context

@sgrzemski sgrzemski added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 26, 2021
@lilic
Member

lilic commented Apr 29, 2021

Hey 👋 Can you explain where this API is at? If k8s API reports this, it sounds good to me.

Note that ContainerState is the only thing that reports StartedAt. I haven't looked into StartTime if that can be used somehow.

@sgrzemski
Contributor Author

Pardon my delay, I was off for some time.
I took a look at the code and it looks like kube-state-metrics uses Pod objects, Pod.Status.StartTime specifically, to create the kube_pod_start_time metric. However, according to Pod Lifecycle docs, PodStatus should have an array of PodConditions, containing the following information:

  • PodScheduled: the Pod has been scheduled to a node.
  • ContainersReady: all containers in the Pod are ready.
  • Initialized: all init containers have completed successfully.
  • Ready: the Pod is able to serve requests and should be added to the load balancing pools of all matching Services.

Those come with two useful properties called:

  • lastProbeTime: Timestamp of when the Pod condition was last probed.
  • lastTransitionTime: Timestamp for when the Pod last transitioned from one status to another.

This information should be enough to form a kube_pod_ready_time metric, and a simple PromQL query would then give the time needed for the pod to become ready.

@sgrzemski
Contributor Author

sgrzemski commented May 17, 2021

I've patched the v1.9.8 release with some additional code to report both the ContainersReady and Ready timestamps. Note that the transition between states can happen multiple times (e.g. a pod can stop passing its readiness probes). Quoting the Pod Lifecycle docs:

Pod is evaluated to be ready only when both the following statements apply:

All containers in the Pod are ready.
All conditions specified in readinessGates are True.

I will change those metrics to report the latest timestamp, matching your current naming convention (kube_pod_status_ready_time and kube_pod_status_containers_ready_time), and prepare a PR.

@brancz
Member

brancz commented Jun 7, 2021

I'm pretty sure the kubelet exposes metrics about the readiness probes. I think it's the kubelet's responsibility to expose this.

@szymon-grzemski

I'm pretty sure the kubelet exposes metrics about the readiness probes. I think it's the kubelet's responsibility to expose this.

I am running 1.19+ and I can see kubelet_pod_start_duration_seconds_bucket, _sum, and _count in Prometheus, but they are node-level metrics, not per-pod.

@slamdev

slamdev commented Aug 10, 2021

@lilic @brancz any plans to merge this? Looking forward to using this metric.

@kevinwubert

I was looking for something just like this! Was there something blocking this from getting merged into kube-state-metrics v2?

@fpetkovski
Contributor

It looks like the PR has gone stale. Would you be interested in wrapping up the work?

@sgrzemski
Contributor Author

Would love to! I will update in the next couple of days.

@SpectralHiss

SpectralHiss commented Dec 15, 2021

👍 This is useful for accurately calculating total pod start time, instead of trying to infer it from the ready count or similar; in my particular case I am trying to benchmark the effect of some Istio sidecar settings on startup time.
Any updates, @sgrzemski?

@PrayagS

PrayagS commented Feb 16, 2022

Looking forward to testing out this feature. @sgrzemski Are we stuck somewhere?

My team has a similar use case where we're trying to figure out the time it takes a pod to be scheduled to a particular node. We can get hold of the time when the pod transitioned to PodScheduled and report that as a metric.
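The same delta pattern would cover the scheduling case. A sketch, assuming a hypothetical kube_pod_status_scheduled_time gauge that exports the PodScheduled transition timestamp in seconds:

```promql
# Hypothetical: seconds from pod creation until the PodScheduled
# condition turned True, per pod.
kube_pod_status_scheduled_time - kube_pod_created
```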

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 17, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 16, 2022
@stat-johan

This would be really nice to have, @sgrzemski

@fpetkovski
Contributor

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 1, 2022
@fpetkovski
Contributor

/remove-lifecycle stale

@qingguee

Looking forward to this PR being merged. I find that kube_pod_status_ready has a ~2s delay in showing the ready status compared with the Ready timestamp from the pod conditions. I have reported this in #1830, but there has been no response yet.

So we can't rely on kube_pod_status_ready if we want to calculate pod startup time. Using sampled metrics to calculate durations is always imprecise, so we need a metric that returns the ready timestamp taken from the pod conditions.

@sumanthkumarc

This would be a really great metric to have. Helps us to understand the time taken for services to come up in cluster.

@max-rocket-internet

The metric would be incredibly valuable! For example to know:

  • Seconds until pod is scheduled
  • Seconds until pod is Ready

@coleary-hyperscience

I'm still pretty new to Prometheus, but I'm using this query to collect an almost equivalent metric. Please let me know if you find this useful or foresee any issues with it: `sort_desc(max(sum_over_time(kube_pod_status_phase{namespace=~"$namespace", phase="Pending"}[$__range])/4) by (pod))`
This returns the approximate (30-second accuracy) time, in minutes, that each pod spent in the Pending state.

@vijaynidhi85

I'm still pretty new to Prometheus, but I'm using this query to collect an almost equivalent metric. Please let me know if you find this useful or foresee any issues with it: `sort_desc(max(sum_over_time(kube_pod_status_phase{namespace=~"$namespace", phase="Pending"}[$__range])/4) by (pod))` This returns the approximate (30-second accuracy) time, in minutes, that each pod spent in the Pending state.

Could I know why the /4 is required?

@coleary-hyperscience

Could I know why the /4 is required?

For sure: it looks to me like kube_pod_status_phase is sampled 4 times every minute (i.e. a 15-second scrape interval), so dividing the sum of those samples by 4 converts the sample count into minutes.

Not a great way to do it, but it seems to be working for me, at least until the PR gets merged.
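For reference, the same workaround can report seconds instead of minutes by making the scrape-interval assumption explicit: with a 15-second interval there are 4 samples per minute, so multiplying the Pending sample count by 15 is equivalent to dividing by 4 and reading minutes:

```promql
# Approximate seconds each pod spent Pending, assuming a 15s scrape
# interval (each Pending sample represents roughly 15 seconds).
sort_desc(
  max by (pod) (
    sum_over_time(kube_pod_status_phase{namespace=~"$namespace", phase="Pending"}[$__range]) * 15
  )
)
```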

@jkdihenkar

Hi @sgrzemski, can you share the export of the dashboard you've plotted based on these metrics?
