Fix casing while scraping failure reason for kube_job_status_failed
#2046
This issue is currently awaiting triage. If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the ...
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Welcome @juanjjaramillo!
This is a stable metric: https://github.com/kubernetes/kube-state-metrics/blob/main/docs/job-metrics.md. Changing it might break alerts that use this reason. cc @dgrisonnet, do we allow this change for stable metrics?
Was it ever DeadLineExceeded in upstream Kubernetes? If not, this seems like a bug in kube-state-metrics.
Yes, this is a bug. I believe it was always ... @juanjjaramillo, could you perhaps also update the value in https://github.com/kubernetes/kube-state-metrics/blob/main/internal/store/job.go#L42?
@CatherineF-dev yes, we do. Label value changes are very rarely breaking changes, and even when they are, I think we should always go forward with them. If there is any doubt, we should call the change out in the changelog/release notes. Moreover, in this particular case the change fixes an actual bug that prevents the real reason from being exposed, so we should definitely fix it.
internal/store/job.go
Outdated
@@ -429,5 +430,5 @@ func failureReason(jc *v1batch.JobCondition, reason string) bool {
 	if jc == nil {
 		return false
 	}
-	return jc.Reason == reason
+	return strings.EqualFold(jc.Reason, reason)
We want to map the value exactly so that we have a bounded list of potential values, so I don't think we should make the comparison case-insensitive. We should rather update the list of reasons and be more careful in the future to avoid these kinds of typos.
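The reviewer's point about a bounded list can be illustrated with a minimal Go sketch. The names `knownFailureReasons` and `reasonSeries`, and the exact contents of the list, are assumptions made for illustration; they are not taken from the kube-state-metrics source. The idea is that each known reason yields exactly one series, and an observed reason only matches when its spelling and casing are identical.

```go
package main

import "fmt"

// knownFailureReasons is a hypothetical bounded list of reasons.
// The real list lives in internal/store/job.go and may differ.
var knownFailureReasons = []string{"BackoffLimitExceeded", "DeadlineExceeded", "Evicted"}

// reasonSeries returns one 0/1 value per known reason, using an exact,
// case-sensitive comparison so the set of label values stays fixed.
func reasonSeries(observed string) map[string]float64 {
	out := make(map[string]float64, len(knownFailureReasons))
	for _, r := range knownFailureReasons {
		if observed == r {
			out[r] = 1
		} else {
			out[r] = 0
		}
	}
	return out
}

func main() {
	series := reasonSeries("DeadlineExceeded")
	for _, r := range knownFailureReasons {
		fmt.Printf("reason=%q value=%v\n", r, series[r])
	}
}
```

With exact matching, a misspelled entry in the list shows up as a series that never becomes 1, which is the kind of typo the review asks to fix at the source rather than mask with a case-insensitive comparison.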
Makes sense, @dgrisonnet.
I have already undone this change. The casing in https://github.com/kubernetes/kube-state-metrics/blob/main/internal/store/job.go#L42 was already part of the previous commit, so I only needed to update the unit tests to make them pass. Please let me know if anything else is needed.
Thank you for the review!
Thank you @juanjjaramillo! :)
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED.
This pull-request has been approved by: dgrisonnet, juanjjaramillo.
The full list of commands accepted by this bot can be found here. The pull request process is described here.
What this PR does / why we need it:
When scraping the failure reason for a job, KSM does a case-sensitive search for the keyword DeadLineExceeded. Since Kubernetes generates the string DeadlineExceeded, the search fails to scrape the reason. This PR proposes to do a case-insensitive search instead, so changes in casing do not affect the scraping of the reason.
How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)
Does not change cardinality
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2045
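The mismatch described in this PR can be reproduced with a short Go sketch. The helper names `exactMatch` and `foldMatch` are hypothetical and exist only for illustration; the two string literals are the casings quoted in the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// exactMatch mirrors a case-sensitive comparison of two reason strings.
func exactMatch(got, want string) bool {
	return got == want
}

// foldMatch mirrors the case-insensitive comparison originally proposed,
// using strings.EqualFold from the standard library.
func foldMatch(got, want string) bool {
	return strings.EqualFold(got, want)
}

func main() {
	const fromKubernetes = "DeadlineExceeded" // casing generated by Kubernetes, per the description
	const searchedByKSM = "DeadLineExceeded"  // casing KSM searched for, per the description

	fmt.Println("exact match:", exactMatch(fromKubernetes, searchedByKSM))
	fmt.Println("fold match: ", foldMatch(fromKubernetes, searchedByKSM))
	fmt.Println("exact match after correcting the keyword:", exactMatch(fromKubernetes, "DeadlineExceeded"))
}
```

The last line corresponds to the approach the review settled on: correct the keyword itself and keep the comparison exact.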