Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[node metrics] add message field to kube_node_status_condition #2404

Open
daveoy opened this issue May 29, 2024 · 5 comments
Open

[node metrics] add message field to kube_node_status_condition #2404

daveoy opened this issue May 29, 2024 · 5 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@daveoy
Copy link

daveoy commented May 29, 2024

What would you like to be added:

the message field from kube_node_status_condition would be helpful to have for visualization and alerting purposes -- especially for custom conditions

Why is this needed:

to enhance alerts based on these conditions or add context to visualizations of node status conditions

Describe the solution you'd like

add a message label to the existing timeseries generated by the generator family function that tracks this metric

Additional context

could it be as simple as adding the desired field as a label here

func createNodeStatusConditionFamilyGenerator() generator.FamilyGenerator {
return *generator.NewFamilyGeneratorWithStability(
"kube_node_status_condition",
"The condition of a cluster node.",
metric.Gauge,
basemetrics.STABLE,
"",
wrapNodeFunc(func(n *v1.Node) *metric.Family {
ms := make([]*metric.Metric, len(n.Status.Conditions)*len(conditionStatuses))
// Collect node conditions and while default to false.
for i, c := range n.Status.Conditions {
conditionMetrics := addConditionMetrics(c.Status)
for j, m := range conditionMetrics {
metric := m
metric.LabelKeys = []string{"condition", "status"}
metric.LabelValues = append([]string{string(c.Type)}, metric.LabelValues...)
ms[i*len(conditionStatuses)+j] = metric
}
}
return &metric.Family{
Metrics: ms,
}
}),
)
}
?

@daveoy daveoy added the kind/feature Categorizes issue or PR as related to a new feature. label May 29, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label May 29, 2024
@ricardoapl
Copy link
Member

ricardoapl commented May 30, 2024

I would be happy to submit a patch to support your use case.

However, I'm afraid such label could have a significant impact on cardinality. It seems to me that conditions DiskPressure, MemoryPressure, PIDPressure, and NetworkUnavailable have a small, bounded set of values for the message field, whereas condition Ready doesn't.

Let's wait for others to comment on this.

@daveoy
Copy link
Author

daveoy commented May 30, 2024

Thanks for that. I'm happy to contribute the patch as well.

Regarding cardinality; Perhaps it could be feature flagged or something?

@dgrisonnet
Copy link
Member

I agree with @ricardoapl, message is unbounded so it could cause cardinality explosions if we introduced it. Especially if some variable information such as timestamps are introduced inside of it.

Regarding cardinality; Perhaps it could be feature flagged or something?

I wouldn't be in favor of adding any feature to kube-state-metrics that could harm users clusters, even behind feature flags. In the end we would still be responsible for the cardinality explosions that are bound to happen with such metrics.

@daveoy
Copy link
Author

daveoy commented Jun 13, 2024

how about if enablement meant automatically adding a labeldrop blacklist that doesn't allow message / reason etc that the user had to repeal:

relabel_configs:
- action: labeldrop
  regex: message
- action: labeldrop
  regex: reason

or perhaps a default regex sort of thing in kubernetes sd objects (serviceMonitor,podMonitor etc) -- and docs for non-kubernetes config use cases? this way users could allow a certain set of regex-matching messages through and contain the chaos?

@dgrisonnet
Copy link
Member

/assign @ricardoapl
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

4 participants