
kube_job_failed should have reason label #2382

Closed
keisku opened this issue Apr 29, 2024 · 5 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@keisku

keisku commented Apr 29, 2024

What would you like to be added:

Add a reason label here, in addition to condition.

*generator.NewFamilyGeneratorWithStability(
	"kube_job_failed",
	"The job has failed its execution.",
	metric.Gauge,
	basemetrics.STABLE,
	"",
	wrapJobFunc(func(j *v1batch.Job) *metric.Family {
		ms := []*metric.Metric{}
		for _, c := range j.Status.Conditions {
			if c.Type == v1batch.JobFailed {
				metrics := addConditionMetrics(c.Status)
				for _, m := range metrics {
					metric := m
					metric.LabelKeys = []string{"condition"}
					ms = append(ms, metric)
				}
			}
		}
		return &metric.Family{
			Metrics: ms,
		}
	}),
),

Why is this needed:

To make it possible to monitor why a job fails.

Describe the solution you'd like

-					metric.LabelKeys = []string{"condition"}
+					metric.LabelKeys = []string{"condition", "reason"}
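The diff above changes only the label keys. Each kube-state-metrics series also carries label values in the same order as its keys, so a reason value would presumably have to be added alongside the new key (for example taken from the condition's Reason field). The Go sketch below illustrates that pairing with hypothetical stand-in types (Metric, JobCondition) and hypothetical helpers (failedMetrics, render); it is not the kube-state-metrics API and does not call the real addConditionMetrics.

```go
package main

import (
	"fmt"
	"strings"
)

// Metric is a hypothetical stand-in for a kube-state-metrics series:
// LabelValues[i] is the value for LabelKeys[i].
type Metric struct {
	LabelKeys   []string
	LabelValues []string
	Value       float64
}

// JobCondition is a hypothetical stand-in for the batch/v1 JobCondition
// fields this sketch reads.
type JobCondition struct {
	Type   string // e.g. "Failed"
	Status string // "True", "False" or "Unknown"
	Reason string // e.g. "BackoffLimitExceeded"
}

// failedMetrics emits one series per condition status for every Failed
// condition, labelled with both the status and the condition's reason.
func failedMetrics(conds []JobCondition) []Metric {
	statuses := []string{"True", "False", "Unknown"}
	var ms []Metric
	for _, c := range conds {
		if c.Type != "Failed" {
			continue
		}
		for _, s := range statuses {
			v := 0.0
			if c.Status == s {
				v = 1
			}
			// Each key needs a matching value: adding "reason" to the keys
			// alone would leave the two slices with different lengths.
			ms = append(ms, Metric{
				LabelKeys:   []string{"condition", "reason"},
				LabelValues: []string{strings.ToLower(s), c.Reason},
				Value:       v,
			})
		}
	}
	return ms
}

// render formats a series in Prometheus text exposition style and returns
// an error when the keys and values do not line up.
func render(name string, m Metric) (string, error) {
	if len(m.LabelKeys) != len(m.LabelValues) {
		return "", fmt.Errorf("%d label keys but %d label values", len(m.LabelKeys), len(m.LabelValues))
	}
	pairs := make([]string, len(m.LabelKeys))
	for i, k := range m.LabelKeys {
		pairs[i] = fmt.Sprintf("%s=%q", k, m.LabelValues[i])
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), m.Value), nil
}

func main() {
	conds := []JobCondition{{Type: "Failed", Status: "True", Reason: "BackoffLimitExceeded"}}
	for _, m := range failedMetrics(conds) {
		line, err := render("kube_job_failed", m)
		if err != nil {
			panic(err)
		}
		fmt.Println(line)
	}
}
```

Running it prints three kube_job_failed series, one per condition status, each with reason="BackoffLimitExceeded"; only the condition="true" series has value 1.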

Additional context

I would like to know why the current implementation doesn't allow adding reason label. Any concerns?

keisku added the kind/feature label Apr 29, 2024
k8s-ci-robot added the needs-triage label Apr 29, 2024
@ricardoapl
Member

> I would like to know why the current implementation doesn't allow adding reason label. Any concerns?

I don't know, but wouldn't using kube_job_status_failed instead of kube_job_failed solve your issue?

@logicalhan
Member

/triage accepted
/assign @CatherineF-dev

k8s-ci-robot added the triage/accepted label and removed the needs-triage label May 2, 2024
@CatherineF-dev
Contributor

Yes, kube_job_status_failed has a reason label:

kube_job_status_failed{job_name="FailedJob1",namespace="ns1",reason="DeadlineExceeded"} 0
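A consumer that wants to know why a job failed can read the reason label from series like the one quoted above. The Go sketch below parses such exposition lines; the regular expressions are a simplification (no escaped quotes, timestamps or comments), the helper names parse and activeReasons are hypothetical, and it assumes, rather than confirms, that a non-zero value marks the reason that applies. The second input line is invented for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// seriesRe matches a simplified exposition line: name{labels} value.
var seriesRe = regexp.MustCompile(`^(\w+)\{([^}]*)\}\s+(\S+)$`)

// labelRe matches one key="value" pair inside the braces.
var labelRe = regexp.MustCompile(`(\w+)="([^"]*)"`)

// parse splits one exposition line into metric name, labels and value.
func parse(line string) (string, map[string]string, float64, error) {
	m := seriesRe.FindStringSubmatch(line)
	if m == nil {
		return "", nil, 0, fmt.Errorf("unrecognised line: %q", line)
	}
	labels := map[string]string{}
	for _, kv := range labelRe.FindAllStringSubmatch(m[2], -1) {
		labels[kv[1]] = kv[2]
	}
	v, err := strconv.ParseFloat(m[3], 64)
	if err != nil {
		return "", nil, 0, err
	}
	return m[1], labels, v, nil
}

// activeReasons returns the reason label of every kube_job_status_failed
// series with a non-zero value (assumed here to mark the applicable reason).
func activeReasons(lines []string) ([]string, error) {
	var out []string
	for _, l := range lines {
		name, labels, v, err := parse(l)
		if err != nil {
			return nil, err
		}
		if name == "kube_job_status_failed" && v != 0 {
			out = append(out, labels["reason"])
		}
	}
	return out, nil
}

func main() {
	lines := []string{
		// Sample line quoted in the thread.
		`kube_job_status_failed{job_name="FailedJob1",namespace="ns1",reason="DeadlineExceeded"} 0`,
		// Hypothetical line added for illustration.
		`kube_job_status_failed{job_name="FailedJob1",namespace="ns1",reason="BackoffLimitExceeded"} 1`,
	}
	reasons, err := activeReasons(lines)
	if err != nil {
		panic(err)
	}
	fmt.Println(reasons)
}
```

With these two inputs it prints [BackoffLimitExceeded]; in a real setup the same filtering would more likely be done in a Prometheus query than by parsing text.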

@CatherineF-dev
Contributor

/close

@k8s-ci-robot
Contributor

@CatherineF-dev: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Projects
None yet
Development

No branches or pull requests

5 participants