Add "Terminating" status in kube_pod_status_phase metrics #1013
This PR seems like it addresses #348 as well, which was closed w/o resolution.
@jinnovation it is exactly the use case that we want to monitor with this fix: a Pod blocked in the Terminated state that requires a specific operation on the node.
internal/store/pod.go
Outdated
@@ -274,6 +278,9 @@ var (
}
}

// compute the "Terminating" phase in order to reflect what a user can see with kubectl: https://github.com/kubernetes/kubernetes/blob/6a4216ba59ce1d09c8ac1c6229649bb71b7f6c85/pkg/printers/internalversion/printers.go#L737-L741
isPodTerminating := p.DeletionTimestamp != nil && p.Status.Reason != nodeUnreachablePodReason
This is not really how kube state metrics is implemented otherwise. We usually just expose the raw data and leave computations like this to query time.
I completely agree, but some logic was already added to compute the Unknown status:
https://github.com/kubernetes/kube-state-metrics/pull/1013/files#diff-791ee1d484ea70de1ecd1d8875f8f5c0L286
Here we want to reflect what is displayed by "kubectl" to avoid confusion, and avoid having a Pod in a Running state when it is not.
I think that was a mistake, and since we're in the process of cleaning up kube-state-metrics and preparing a breaking release, the time is right to change this. @lilic @tariq1890 what do you think? We should definitely make sure that for these cases we have the required raw data available so we can compute these things at runtime, but we should strive to remove any of these types of things from kube-state-metrics, just like we state in our docs :) .
Yes agreed, we should expose just raw metrics, users can compute whatever they want afterwards, and/or create recording rules for this.
Is there a way with the current information to know how many pods have the deletionTimestamp set?
With this info, we could indeed easily compute the "Terminating" status
Yes that sounds perfect! :)
PR updated with 2 new metrics:
- kube_pod_deleted
- kube_pod_status_reason
I can create another PR to remove the current logic for the Unknown pod phase.
Tests are failing as you need to update the docs for the metrics. |
Hi @lilic I updated the PR with proper documentation and also flagged new metrics as "EXPERIMENTAL" in the documentation. |
Hi @tariq1890 |
@clamoriniere Sorry for the delay and thank you so much for your patience with this PR! It's greatly appreciated :). The code looks good; my last request is for you to provide a few examples (they don't have to be complete) of using these raw basic metrics together for asserting on particular pod states (such as Terminating) in the |
Ok no problem. |
@clamoriniere Just following up on this :) |
Hi @tariq1890,
Let me know if you prefer that I move the doc in the README.md |
Aim of this PullRequest is to add several new pod metrics:
- kube_pod_deleted: Unix deletion timestamp
- kube_pod_status_reason: The pod status reasons (NodeLost, Evicted)
These new metrics can be used to determine the `pod.status.phase` displayed by `kubectl`: Running, Terminated, Unknown…
It will allow removing the `kubectl` display logic added previously to compute the “Unknown” phase.
Signed-off-by: cedric lamoriniere <cedric.lamoriniere@datadoghq.com>
@clamoriniere Thank you! Last set of review comments and this should be good to go :)
@tariq1890 let me know if you want me to do another PR that removes the current logic to generate the kube-state-metrics/internal/store/pod.go Line 285 in aa8a0af
|
Sure! That sounds good to me :) |
Signed-off-by: cedric lamoriniere <cedric.lamoriniere@datadoghq.com>
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: clamoriniere, tariq1890 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
As explained in the doc: "kube-state-metrics exposes raw data unmodified from the Kubernetes API". The goal of this change was to remove the kubectl logic introduced to get the `Unknown` phase in the `kube_pod_status_phase` metric. With the introduction of the `kube_pod_deleted` metric, it is now possible to get the same result with a PromQL query (more info in kubernetes#1013). Signed-off-by: cedric lamoriniere <cedric.lamoriniere@datadoghq.com>
Which release will this be included in? |
I think in the next major release: |
I have a pod in status Terminating but with kube-state-metrics:v2.7.0 can not see |
What this PR does:
The aim of this pull request is to add several new pod metrics:
- kube_pod_deleted: Unix deletion timestamp
- kube_pod_status_reason: The pod status reasons (NodeLost, Evicted)
These new metrics can be used to determine the `pod.status.phase` displayed by `kubectl`: Running, Terminated, Unknown…
It will allow removing the `kubectl` display logic added previously to compute the “Unknown” phase.
kube-state-metrics/internal/store/pod.go
Line 288 in d35e7ba
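To illustrate how a consumer could derive the status shown by `kubectl` from the raw values instead of having kube-state-metrics compute it, here is a minimal Go sketch. The `podSample` type and the `displayedStatus` function are hypothetical names that are not part of kube-state-metrics. The sketch assumes the deletion timestamp and status reason have already been read from `kube_pod_deleted` and `kube_pod_status_reason`, that the unreachable-node reason is the string "NodeLost", and that the branch order follows the kubectl printer logic linked in the review comment.

```go
package main

// nodeUnreachablePodReason is assumed to be the reason string reported
// for pods on an unreachable node ("NodeLost").
const nodeUnreachablePodReason = "NodeLost"

// podSample is a hypothetical holder for the raw values of one pod.
type podSample struct {
	// Phase is the raw pod phase, e.g. "Running".
	Phase string
	// DeletionTimestamp is the Unix deletion timestamp, or nil if the
	// pod has not been marked for deletion.
	DeletionTimestamp *int64
	// Reason is the pod status reason, e.g. "NodeLost" or "Evicted".
	Reason string
}

// displayedStatus approximates the status a user would see with kubectl.
func displayedStatus(p podSample) string {
	switch {
	case p.DeletionTimestamp != nil && p.Reason == nodeUnreachablePodReason:
		return "Unknown"
	case p.DeletionTimestamp != nil:
		return "Terminating"
	default:
		return p.Phase
	}
}
```

Keeping this logic on the consumer side matches the reviewers' preference for exposing raw data and leaving such computations to query time.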
Why we need it:
In some cases, a Pod can be stuck in the "Terminating" phase due to a Kubelet issue; for example, the Kubelet is not able to communicate with the container runtime, or the container runtime is not able to delete the associated container.
It is therefore useful to be able to alert on this kind of "bad" Pod state, which is currently not possible because the pod is flagged as "Running" by the kube_pod_status_phase metric.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #
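For the stuck-Terminating case described under "why we need it", the alert condition could be sketched in Go as follows. This is only an illustration under stated assumptions: `stuckTerminating` is a hypothetical helper, the deletion timestamp is assumed to come from the `kube_pod_deleted` metric as Unix seconds, and the threshold is an arbitrary value chosen by whoever defines the alert.

```go
package main

// stuckTerminating reports whether a pod is still present more than
// thresholdSeconds after its deletion timestamp. A nil deletionUnix
// means the pod has not been marked for deletion.
func stuckTerminating(deletionUnix *int64, nowUnix, thresholdSeconds int64) bool {
	if deletionUnix == nil {
		return false
	}
	return nowUnix-*deletionUnix > thresholdSeconds
}
```

For example, with a deletion timestamp of 1000, a current time of 2000 and a threshold of 300 seconds, the pod would be reported as stuck.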