Directly emit container ready time metric #2119

Open
bbdouglas opened this issue Jul 18, 2023 · 7 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@bbdouglas

What would you like to be added:

It would be great to have a metric that directly emits the container ready time in seconds. There is currently a boolean gauge, kube_pod_container_status_ready, that reports whether the container is ready, but deriving the time at which the container flipped to the ready state requires extra computation. I want to measure the time between when the container started and when it became ready, and that would be simpler and more efficient if kube-state-metrics emitted the ready time directly.

There was a similar metric added at the pod level (#1465), but this would be at the container level. In the pods that I am tracking, there are many containers with wildly varying ready times, so it is helpful for debugging and optimization purposes to know how long each container takes to get ready.

Why is this needed:

Similar to the pod-level ready time metric (#1465), I'd like to measure the ready time of each individual container within my pod. This is helpful for tracking startup times at a finer granularity than the whole pod, especially when a pod has many containers.

It is possible to use the existing boolean kube_pod_container_status_ready to calculate this by looking at a series of data points and choosing the first point in time when the flag flips from false to true, but in practice that can be very resource-intensive for Prometheus to calculate when there are a large number of pods/containers.

Describe the solution you'd like

I would ideally like to see a new metric analogous to kube_pod_status_ready_time emitted at the container granularity.

Additional context

I'm not that familiar with the internals of the Kubernetes API, but unfortunately ContainerStatus does not appear to carry the same breadth of information as PodCondition, which includes a LastTransitionTime. ContainerStatus only exposes the current Ready boolean (and, for a running container, State.Running.StartedAt), so this might not be a simple addition.

@bbdouglas bbdouglas added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 18, 2023
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jul 18, 2023
@dashpole

/triage accepted
/assign @dgrisonnet

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 27, 2023
@dgrisonnet
Member

The container level metric should already be available:

"kube_pod_status_container_ready_time",

@bbdouglas
Author

Hi @dgrisonnet, thanks for looking into this.

Unfortunately, I believe the metric you pointed to is actually at the pod level, representing the time at which all containers became ready (ContainersReady). From the comments in the API:

// ContainersReady indicates whether all containers in the pod are ready.

@dgrisonnet
Member

Correct, the name got me.

We should probably base kube_pod_status_container_ready_time on ContainerStatus rather than on the pod status.

@abhiraut

abhiraut commented Jan 9, 2024

It is possible to use the existing boolean kube_pod_container_status_ready boolean to calculate this by looking at a series of data points and choosing the first point in time when that flag flips from false to true

@bbdouglas I am curious how you currently calculate this with PromQL?

@bbdouglas
Copy link
Author

@abhiraut Here is the query I came up with. Since it's looking back, you have to manually set the maximum age that you expect a pod to be up. Here I have assumed no pod lives for more than 1 day.

min_over_time(timestamp(kube_pod_container_status_ready{container="mycontainer", pod_phase="Running"} == 1)[1d:])

@abhiraut
Copy link

Thanks!
@dgrisonnet, do you think we can directly emit the ready time? I think it would be helpful and consistent with how readiness is emitted at the pod level.


5 participants