feat: add last termination state when pod is in CrashloopBackoff #792

Merged


akhil-rane
Contributor

akhil-rane commented on Dec 15, 2023

📑 Description

OOMKilled is a very short-lived container state, typically lasting only a few seconds. As soon as Kubernetes sees a container in the OOMKilled state, it tries to restart it. If the container keeps getting terminated and restarted, Kubernetes marks it as CrashLoopBackOff. The event message for the CrashLoopBackOff state does not capture the OOMKilled context. When the k8sgpt PodAnalyzer sees a pod in the CrashLoopBackOff state, it should check why the container is being terminated and restarted (its LastTerminationState) and add that reason to the failure message. The following are the before and after results for containers in the CrashLoopBackOff state:
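The check described above can be sketched in Go, the language k8sgpt is written in. This is a minimal illustration under stated assumptions, not the code merged in this PR: the structs are hypothetical local stand-ins that mirror the shape of the relevant corev1.ContainerStatus fields (State.Waiting and LastTerminationState.Terminated) so the example runs without importing k8s.io/api, and crashLoopFailures is a hypothetical helper name. The message format is copied from the After output in this description.

```go
package main

import "fmt"

// Hypothetical stand-ins mirroring the relevant fields of corev1.ContainerStatus
// (k8s.io/api/core/v1); defined locally so the sketch has no dependencies.
type ContainerStateWaiting struct {
	Reason  string
	Message string
}

type ContainerStateTerminated struct {
	Reason string
}

type ContainerState struct {
	Waiting    *ContainerStateWaiting
	Terminated *ContainerStateTerminated
}

type ContainerStatus struct {
	Name                 string
	State                ContainerState
	LastTerminationState ContainerState
}

// crashLoopFailures (hypothetical helper) collects failure messages for
// containers waiting in CrashLoopBackOff. Because the back-off message alone
// does not say why the container keeps dying, it also reports the reason
// recorded in LastTerminationState (e.g. OOMKilled) when one is present.
func crashLoopFailures(podName string, statuses []ContainerStatus) []string {
	var failures []string
	for _, cs := range statuses {
		w := cs.State.Waiting
		if w == nil || w.Reason != "CrashLoopBackOff" {
			continue // only containers stuck in back-off are of interest here
		}
		failures = append(failures, w.Message)
		if t := cs.LastTerminationState.Terminated; t != nil {
			failures = append(failures, fmt.Sprintf(
				"the last termination reason is %s container=%s pod=%s",
				t.Reason, cs.Name, podName))
		}
	}
	return failures
}

func main() {
	pod, namespace := "opa-849dcb7499-9chrt", "opa"
	uid := "7f8061d3-3603-49d6-8e03-77cb4b1bbed4"

	// Sample data reproducing the two containers from the Before/After output.
	var statuses []ContainerStatus
	for _, c := range []struct{ name, lastReason string }{
		{"kube-mgmt", "Error"},
		{"opa", "OOMKilled"},
	} {
		statuses = append(statuses, ContainerStatus{
			Name: c.name,
			State: ContainerState{Waiting: &ContainerStateWaiting{
				Reason: "CrashLoopBackOff",
				Message: fmt.Sprintf("back-off 5m0s restarting failed container=%s pod=%s_%s(%s)",
					c.name, pod, namespace, uid),
			}},
			LastTerminationState: ContainerState{Terminated: &ContainerStateTerminated{Reason: c.lastReason}},
		})
	}

	for _, f := range crashLoopFailures(pod, statuses) {
		fmt.Println("- Error:", f)
	}
}
```

Run as-is, it prints the four "- Error:" lines listed under After. Keeping the back-off message and appending the termination reason as a separate entry, rather than replacing it, preserves the existing output while adding the missing context.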

Before


3 opa/opa-849dcb7499-9chrt(Deployment/opa)

- Error: back-off 5m0s restarting failed container=kube-mgmt pod=opa-849dcb7499-9chrt_opa(7f8061d3-3603-49d6-8e03-77cb4b1bbed4)

- Error: back-off 5m0s restarting failed container=opa pod=opa-849dcb7499-9chrt_opa(7f8061d3-3603-49d6-8e03-77cb4b1bbed4)

Error: The containers 'kube-mgmt' and 'opa' in the pod 'opa-849dcb7499-9chrt_opa' have failed to start and are continuously restarting.



Solution: 

1. Use 'kubectl describe pod opa-849dcb7499-9chrt_opa' to get more details about the error.

2. Check the logs of the failing containers.

3. Resolve any identified issues, such as fixing code errors or adjusting resource limits.

4. After fixing, redeploy the pod.

After

3 opa/opa-849dcb7499-9chrt(Deployment/opa)

- Error: back-off 5m0s restarting failed container=kube-mgmt pod=opa-849dcb7499-9chrt_opa(7f8061d3-3603-49d6-8e03-77cb4b1bbed4)

- Error: the last termination reason is Error container=kube-mgmt pod=opa-849dcb7499-9chrt

- Error: back-off 5m0s restarting failed container=opa pod=opa-849dcb7499-9chrt_opa(7f8061d3-3603-49d6-8e03-77cb4b1bbed4)

- Error: the last termination reason is OOMKilled container=opa pod=opa-849dcb7499-9chrt

Error: The containers 'kube-mgmt' and 'opa' in the pod 'opa-849dcb7499-9chrt' are failing to restart due to an error and 'out of memory' (OOM) issue respectively.



Solution: 

1. Check the logs for 'kube-mgmt' to identify the error.

2. Allocate more memory to the 'opa' container.

3. Restart the pod.

✅ Checks

  • My pull request adheres to the code style of this project
  • My code requires changes to the documentation
  • I have updated the documentation as required
  • All the tests have passed

ℹ Additional Information

Signed-off-by: Akhil Rane <akhil131192@gmail.com>
akhil-rane requested review from a team as code owners on December 15, 2023 23:48
@AlexsJones
Member

@akhil-rane welcome and thank you for your contribution

AlexsJones merged commit ff4aaf7 into k8sgpt-ai:main on Dec 20, 2023
9 checks passed