
Provide last_terminated_reason_timestamp for when last_terminated_reason event happens in K8s Integration #3802

Closed
gizas opened this issue Nov 22, 2023 · 5 comments · Fixed by elastic/integrations#10503
Labels
Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team


@gizas
Contributor

gizas commented Nov 22, 2023

Describe the enhancement:
Provide the time at which a container's last_terminated_reason occurred in the Kubernetes integration.
Possible fields that can support that (from KSM-Pod metrics) can be kube_pod_status_container_ready_time.

Describe a specific use case for the enhancement or feature:
The "kubernetes.container.status.last_terminated_reason" is a useful field, especially in the case of missed metrics. On its own, however, it is difficult to tell when the termination happened, because the container or pod can keep this reason since its last restart, and it can also be
This request is specifically for a timestamp indicating when this last_terminated_reason occurred.

What is the definition of done?

  • Elastic Kubernetes integration package update
  • Beats code update to support the new field, e.g. last_terminated_reason_timestamp
  • Update relevant documentation
  • Provide tests/evidence that the new field is present and the integration works as expected
@gizas gizas added the Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team label Nov 22, 2023
@sophiec20
Contributor

I imagine that the accuracy of this value will be hard to pinpoint, but even an approximate time would be great. From looking at the data, it's useful to know whether an error (such as OOMKilled) happened 7 minutes, 7 hours or 7 days ago.

@afharo
Member

afharo commented Nov 24, 2023

Quoting @gizas: "Possible fields that can support that (from KSM-Pod metrics) can be kube_pod_status_container_ready_time."

I just tried using that field (elastic/beats#37192), and unfortunately, they seem to come in different events, leading to separate Metricbeat entries :(

We need to look at any other potential fields or have Metricbeat somehow generate it for us.
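A minimal sketch of what "having Metricbeat generate it" could look like, assuming samples arrive as (metric name, labels, value) tuples: container-level samples are joined with pod-level samples of the same pod, so the values end up in one entry instead of separate ones. The tuple shape, the label names (namespace, pod, container, reason) and the `merge_by_pod` helper are hypothetical; this is not Metricbeat's actual event-building logic.

```python
from collections import defaultdict
from typing import Dict, List, Tuple, Union

# Hypothetical sample shape: (metric_name, labels, value). The label names
# used here (namespace, pod, container, reason) are assumptions for illustration.
Sample = Tuple[str, Dict[str, str], float]
Key = Tuple[str, str, str]


def merge_by_pod(samples: List[Sample]) -> Dict[Key, Dict[str, Union[str, float]]]:
    """Join container-level samples with pod-level samples of the same pod."""
    pod_fields: Dict[Tuple[str, str], Dict[str, float]] = defaultdict(dict)
    container_fields: Dict[Key, Dict[str, Union[str, float]]] = defaultdict(dict)

    for name, labels, value in samples:
        pod_key = (labels["namespace"], labels["pod"])
        if "container" in labels:
            # For reason-style metrics the useful data is the "reason" label;
            # otherwise keep the numeric sample value.
            field = labels.get("reason", value)
            container_fields[pod_key + (labels["container"],)][name] = field
        else:
            pod_fields[pod_key][name] = value

    # Copy each pod-level value onto every container entry of that pod,
    # producing one merged entry per container instead of separate events.
    merged = {}
    for key, fields in container_fields.items():
        merged[key] = {**fields, **pod_fields.get(key[:2], {})}
    return merged
```

Pod-level values are copied onto every container of the pod because a pod-level metric cannot be attributed to a single container.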

It looks like there's a request in the Kubernetes repo to log an event whenever a container is OOMKilled: kubernetes/kubernetes#69676

@tetianakravchenko
Contributor

tetianakravchenko commented Dec 28, 2023

Quoting @afharo: "I just tried using that field (elastic/beats#37192), and unfortunately, they seem to come in different events, leading to separate Metricbeat entries :("

kube_pod_status_container_ready_time identifies the time when the readiness probe succeeded and the container is ready to accept connections, not the time the container was terminated.

Checking the existing metrics:

  • kube_pod_completion_time: Completion time in Unix timestamp for a pod. From my understanding, it is mainly used for Jobs.

    I noticed that another issue was opened to add kube_pod_completion_time for the same reason, but I believe this metric does not represent what we need: Add kube_pod_completion_time to kube-state-metrics beats#37206 (comment)

  • kube_pod_container_state_started: Start time in Unix timestamp for a pod container (refers to containerStatus.State.Terminated.StartedAt)

  • kube_pod_created: Unix creation timestamp of the pod (refers to v1.Pod.CreationTimestamp)

  • kube_pod_deletion_timestamp: Unix deletion timestamp (refers to v1.Pod.DeletionTimestamp)

  • kube_pod_start_time: Start time in Unix timestamp for a pod (refers to v1.Pod.Status.StartTime)

  • kube_pod_status_ready_time: Readiness achieved time in Unix timestamp for a pod (v1.Pod.Status.Conditions; reported if the pod condition with type: Ready has status: "True")

  • kube_pod_status_initialized_time: Initialized time in Unix timestamp for a pod (v1.Pod.Status.Conditions; reported if the pod condition with type: Initialized has status: "True")

  • kube_pod_status_container_ready_time: Readiness achieved time in Unix timestamp for a pod's containers (v1.Pod.Status.Conditions; reported if the pod condition with type: ContainersReady has status: "True")

  • kube_pod_status_scheduled_time: Unix timestamp when the pod moved into scheduled status (v1.Pod.Status.Conditions; reported if the pod condition with type: PodScheduled has status: "True")

Example of the status.conditions that the last four metrics above refer to:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-12-27T16:31:05Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-12-28T10:49:20Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-12-28T10:49:20Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-12-27T16:31:05Z"
    status: "True"
    type: PodScheduled
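The four condition-based metrics can be illustrated with a small sketch that turns status.conditions entries like the ones above into Unix timestamps, keeping only conditions whose status is "True". The mapping table and the `condition_timestamps` helper are assumptions that mirror the metric descriptions in the list; they are not kube-state-metrics' own code. None of the resulting timestamps records when a container terminated.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional

# Hypothetical mapping from pod condition type to the metric described above.
CONDITION_METRICS = {
    "Initialized": "kube_pod_status_initialized_time",
    "Ready": "kube_pod_status_ready_time",
    "ContainersReady": "kube_pod_status_container_ready_time",
    "PodScheduled": "kube_pod_status_scheduled_time",
}


def condition_timestamps(conditions: List[Dict[str, Optional[str]]]) -> Dict[str, float]:
    """Return {metric_name: unix_seconds} for conditions whose status is "True"."""
    result: Dict[str, float] = {}
    for cond in conditions:
        metric = CONDITION_METRICS.get(cond.get("type") or "")
        if metric is None or cond.get("status") != "True":
            continue
        # Assumes the seconds-precision UTC form, e.g. "2023-12-28T10:49:20Z".
        ts = datetime.strptime(cond["lastTransitionTime"], "%Y-%m-%dT%H:%M:%SZ")
        result[metric] = ts.replace(tzinfo=timezone.utc).timestamp()
    return result
```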

=> At the moment there is no metric that provides the time at which a container's last_terminated_reason occurred.

What is needed:

As I understand it, what this issue needs is cs.LastTerminationState.Terminated.FinishedAt:

Containers:
  kube-scheduler:
    ...
    State:          Running
      Started:      Wed, 27 Dec 2023 17:05:34 +0100
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 27 Dec 2023 13:38:18 +0100
      Finished:     Wed, 27 Dec 2023 17:05:33 +0100
    Ready:          True
    Restart Count:  1
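A minimal sketch of reading this value from the pod object itself, assuming the pod has been fetched as JSON (for example with `kubectl get pod <name> -o json`): for each entry in status.containerStatuses it returns lastState.terminated.reason and finishedAt, plus a coarse "how long ago" string for the 7 minutes / 7 hours / 7 days use case mentioned earlier. The helper names and the `last_terminated_reason_timestamp` key (borrowed from the issue title) are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def parse_k8s_time(value: str) -> datetime:
    # Assumes the seconds-precision UTC form, e.g. "2023-12-27T16:05:33Z".
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def last_terminations(pod: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect the reason and finishedAt of each container's last termination."""
    out = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        term = cs.get("lastState", {}).get("terminated")
        if not term:
            continue  # no previous termination recorded for this container
        out.append({
            "container": cs["name"],
            "reason": term.get("reason"),
            # Hypothetical field name, taken from the issue title.
            "last_terminated_reason_timestamp": parse_k8s_time(term["finishedAt"]),
        })
    return out


def humanize_age(then: datetime, now: datetime) -> str:
    """Coarse 'how long ago' string in minutes, hours or days."""
    seconds = int((now - then).total_seconds())
    if seconds < 3600:
        return f"{seconds // 60} min ago"
    if seconds < 86400:
        return f"{seconds // 3600} h ago"
    return f"{seconds // 86400} d ago"
```

For the kube-scheduler container above, finishedAt would be serialized in UTC as "2023-12-27T16:05:33Z" (17:05:33 +0100).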

@tetianakravchenko
Contributor

Here is a PR to report kube_pod_container_status_last_terminated_timestamp: kubernetes/kube-state-metrics#2291

@tetianakravchenko
Contributor

tetianakravchenko commented Apr 24, 2024

Progress:
