Emit Job creation failed event #448
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: danielvegamyhre
pkg/controllers/jobset_controller.go
Outdated
@@ -475,6 +475,9 @@ func (r *JobSetReconciler) createJobs(ctx context.Context, js *jobset.JobSet, ow
 	}
 	allErrs := errors.Join(finalErrs...)
 	if allErrs != nil {
+		// Emit event to propagate the Job creation failures up to be more visible to the user.
+		// TODO(#422): Investigate ways to validate Job templates at JobSet validation time.
+		r.Record.Eventf(js, corev1.EventTypeWarning, "JobCreationFailed", "Job creation(s) failed with error: %s", allErrs)
Could you add a bit more information to this event? If we had multiple Jobs, which Job creation failed? I guess the validation error would be in the logs or kubectl somewhere, so users could dig into this.
I think the allErrs variable should have these details.
Are we sure that events actually show the errors in their entirety?
To be safe, I updated the PR and wrapped the err returned by r.Create(...) in a custom error that includes the Job name at the beginning of the error message. Let me know what you think.
/lgtm
/hold
Maybe @ahg-g wants to weigh in.
/lgtm
/hold cancel
Fixes #447
I opted to emit only one event if any Job creation fails, rather than a separate event for each Job failure (since there could be many).
While unconventional, I included the error message in the event message, since the intent is to let the user quickly see why Jobs aren't being created without digging through logs.