[Metricbeat] Add support for node conditions to state_node metricset #18049

Closed
sorantis opened this issue Apr 28, 2020 · 8 comments · Fixed by #23905

Comments

@sorantis
Contributor

sorantis commented Apr 28, 2020

Today the state_node metricset in the Kubernetes module only covers the following states:

"status": {
  "ready": "true",
  "unschedulable": false
}

However, the kube_node_status_condition metric in node metrics covers many other conditions, such as MemoryPressure, DiskPressure, and OutOfDisk.

These conditions are useful for alerting because they allow users to monitor resource states rather than health: "Is MemoryUsed > 80%" vs. "Are my nodes running out of memory?"

Add support for node status conditions to Metricbeat's Kubernetes module, which would allow building alert conditions like:
if kube node status condition is "MemoryPressure" and status is "true" then fire an alert
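
As an illustration of the rule above, here is a minimal sketch that evaluates it directly against kube-state-metrics text output. It assumes kube_node_status_condition is exposed with node, condition, and status labels and a value of 1 for the status that currently holds. The sample lines, node names, and helper names (active_conditions, nodes_to_alert) are made up for this example and are not part of Metricbeat.

```python
import re

# Made-up sample in the assumed kube-state-metrics exposition format.
SAMPLE = """\
kube_node_status_condition{node="node-a",condition="MemoryPressure",status="true"} 1
kube_node_status_condition{node="node-a",condition="MemoryPressure",status="false"} 0
kube_node_status_condition{node="node-a",condition="DiskPressure",status="false"} 1
kube_node_status_condition{node="node-b",condition="MemoryPressure",status="false"} 1
"""

LINE_RE = re.compile(r'^kube_node_status_condition\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def active_conditions(text):
    """Return {node: {condition: status}} for every series whose value is 1."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or float(match.group("value")) != 1.0:
            continue  # not a condition series, or not the status that currently holds
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if not {"node", "condition", "status"} <= labels.keys():
            continue  # skip series that lack the labels this sketch relies on
        result.setdefault(labels["node"], {})[labels["condition"]] = labels["status"]
    return result


def nodes_to_alert(text, condition="MemoryPressure", status="true"):
    """Hypothetical alert rule: nodes where `condition` currently has `status`."""
    conditions_by_node = active_conditions(text)
    return sorted(
        node for node, conds in conditions_by_node.items()
        if conds.get(condition) == status
    )


print(nodes_to_alert(SAMPLE))
```

With the sample above, only node-a reports MemoryPressure as "true", so it is the only node the rule would alert on.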

@sorantis added the enhancement, Metricbeat, Team:Integrations, and Team:Platforms labels on Apr 28, 2020
@elasticmachine
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@elasticmachine
Collaborator

Pinging @elastic/integrations-platforms (Team:Platforms)

@ioandr
Contributor

ioandr commented Jul 20, 2020

Hi @sorantis, this seems like a very interesting extension to the existing state_node Metricset.

I would be happy to invest some cycles on it based on your description.

For starters, here are the officially documented node conditions [1], which are also the ones I found in the Kubernetes source [2]. You have already mentioned some of them, so I think the following ones are a good start:

| Node Condition | Description |
| --- | --- |
| DiskPressure | True if pressure exists on the disk size, that is, if the disk capacity is low; otherwise False |
| MemoryPressure | True if pressure exists on the node memory, that is, if the node memory is low; otherwise False |
| PIDPressure | True if pressure exists on the processes, that is, if there are too many processes on the node; otherwise False |
| NetworkUnavailable | True if the network for the node is not correctly configured; otherwise False |
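
To make the proposal concrete, the sketch below shows one way the conditions listed above could sit next to the existing ready and unschedulable values in the status object. The snake_case field names and the helpers field_name and build_status are assumptions made for illustration, not the metricset's actual schema.

```python
import re

# Conditions taken from the list above.
NODE_CONDITIONS = ["DiskPressure", "MemoryPressure", "PIDPressure", "NetworkUnavailable"]


def field_name(condition):
    """Convert a CamelCase condition name into a snake_case key (assumed naming)."""
    # Split at lower-to-upper boundaries, e.g. "DiskPressure" -> "Disk_Pressure".
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", condition)
    # Split after an acronym run, e.g. "PIDPressure" -> "PID_Pressure".
    name = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", name)
    return name.lower()


def build_status(ready, unschedulable, conditions):
    """Extend the existing status fields with any reported node conditions."""
    status = {"ready": ready, "unschedulable": unschedulable}
    for name in NODE_CONDITIONS:
        if name in conditions:
            status[field_name(name)] = conditions[name]
    return status


print(build_status("true", False, {"MemoryPressure": "false", "PIDPressure": "false"}))
```

Conditions that are not reported for a node are simply left out of the result, so the existing ready and unschedulable fields keep their current shape.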

Also, is the state_node metricset targeting a specific Kubernetes version? Node conditions might have been renamed or deprecated across Kubernetes versions. For example, see the following PRs:

[1] https://kubernetes.io/docs/concepts/architecture/nodes/#condition
[2] https://github.com/kubernetes/api/blob/release-1.18/core/v1/types.go#L4561

@sorantis
Contributor Author

@ioandr Thanks for your interest in developing this!
The module is tested for the Kubernetes versions mentioned here. I'd first focus on the conditions applicable to all versions, like the ones you mentioned.

cc: @ChrsMark @exekias

@ChrsMark
Member

ChrsMark commented Jan 27, 2021

@ioandr are you still planning to work on this? If not, no worries; I can take over so that we get it in soon. Just let me know.

@ChrsMark
Member

Just a note here regarding supported versions. We should focus on kube-state-metrics service versions, since this is the service we collect from. Currently we support ksm.v1.3 and ksm.v1.8, so I think we can stick to ksm.v1.8 for this one. I see there is also v1.9 and a v2 coming, but let's keep support for them out of the scope of this PR.

@ioandr
Contributor

ioandr commented Jan 27, 2021

Hi @ChrsMark, thanks for the heads-up.

> are you still planning to work on this?

Let's resume the work here and see what it takes to deliver this as soon as possible!

I will provide updates here as I go.

@kbujold

kbujold commented Mar 1, 2023

Is there a reason why the "NetworkUnavailable" Kubernetes node condition was not added?
