
feat: add node analyzer #272

Merged: 2 commits, Apr 14, 2023


yeahservice (Member)

Closes #210

📑 Description

Adds a node analyzer that checks the conditions on nodes and reports failures accordingly.

Conditions are currently checked against those documented at https://kubernetes.io/docs/concepts/architecture/nodes/#condition. Because cloud providers might introduce their own NodeConditionType values, and a new NodeConditionStatus might be added in the future, the check tolerates unknown values and at worst reports them as false positives.

✅ Checks

  • My pull request adheres to the code style of this project
  • My code requires changes to the documentation
  • I have updated the documentation as required
  • All the tests have passed

ℹ Additional Information

./k8sgpt filters list
Active: 
> Node
Unused: 
> CronJob
> Pod
> Deployment
> PersistentVolumeClaim
> Service
> Ingress
> StatefulSet
> ReplicaSet
> NetworkPolicy
> HorizontalPodAutoScaler
> PodDisruptionBudget
./k8sgpt analyze --explain 

0 node1(node1)
- Error: node1 has condition of type MemoryPressure, reason NodeStatusUnknown: Kubelet stopped posting node status.
- Error: node1 has condition of type DiskPressure, reason NodeStatusUnknown: Kubelet stopped posting node status.
- Error: node1 has condition of type PIDPressure, reason NodeStatusUnknown: Kubelet stopped posting node status.
- Error: node1 has condition of type Ready, reason NodeStatusUnknown: Kubelet stopped posting node status.
The error message is indicating that the Kubernetes node is experiencing memory, disk, and PID pressure. This is due to the Kubelet not posting node status. The reason for this could be one of several factors, such as network issues or the Kubelet process crashing.

To solve this issue, you should start by checking the status of the Kubelet process on the affected node. You can do this using the command "systemctl status kubelet". If the process is not running, you can try starting it manually using "systemctl start kubelet". If the Kubelet process crashes frequently, you may need to investigate why this is happening and take steps to prevent it from happening in the future.

Additionally, you should check the network connectivity between the affected node and the Kubernetes master. This can often be a source of issues with node status updates. If there are any issues with network connectivity, you may need to troubleshoot those as well.

@yeahservice requested review from a team as code owners on April 14, 2023, 12:14
Signed-off-by: Dominik Augustin <dom.augustin@gmx.at>
@AlexsJones (Member) left a comment

Excellent work, thank you!

@thschue (Contributor) left a comment

lgtm

@thschue merged commit 6247a1c into k8sgpt-ai:main on Apr 14, 2023
Successfully merging this pull request may close these issues.

feature: analyzer for Nodes