Emit Job creation failed event #448

Merged
merged 2 commits into from
Mar 14, 2024
3 changes: 3 additions & 0 deletions pkg/controllers/jobset_controller.go
@@ -475,6 +475,9 @@ func (r *JobSetReconciler) createJobs(ctx context.Context, js *jobset.JobSet, ow
}
allErrs := errors.Join(finalErrs...)
if allErrs != nil {
// Emit event to propagate the Job creation failures up to be more visible to the user.
// TODO(#422): Investigate ways to validate Job templates at JobSet validation time.
r.Record.Eventf(js, corev1.EventTypeWarning, "JobCreationFailed", "Job creation(s) failed with error: %s", allErrs)
Contributor

Could you add a bit more information to this event? If there are multiple Jobs, which Job's creation failed? I guess the validation error would be in the logs or kubectl output somewhere, so users could dig into it.

Contributor Author

I think the allErrs variable should already contain these details.

Contributor

Are we sure that events actually show the errors in their entirety?

Contributor Author

@danielvegamyhre Mar 14, 2024

To be safe, I updated the PR and wrapped the err returned by r.Create(...) in a custom error that includes the Job name at the beginning of the error message. Let me know what you think.
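For readers following the thread, here is a minimal sketch of what such a wrapper might look like. It is not the code from this PR: the names jobCreateError, wrapCreateErr, createAll, and errExample are hypothetical, and the create callback stands in for r.Create(...) so the example runs without a cluster. It only illustrates prefixing each failure with the Job name before the errors are combined with errors.Join.

```go
package main

import (
	"errors"
	"fmt"
)

// errExample is a hypothetical stand-in for an API server rejection.
var errExample = errors.New("spec.template is invalid")

// jobCreateError is a hypothetical wrapper that puts the Job name
// at the beginning of the error message.
type jobCreateError struct {
	jobName string
	err     error
}

func (e *jobCreateError) Error() string {
	return fmt.Sprintf("job %q: %v", e.jobName, e.err)
}

// Unwrap keeps the original error reachable via errors.Is / errors.As.
func (e *jobCreateError) Unwrap() error { return e.err }

// wrapCreateErr returns nil for a nil error so successes are not reported.
func wrapCreateErr(jobName string, err error) error {
	if err == nil {
		return nil
	}
	return &jobCreateError{jobName: jobName, err: err}
}

// createAll calls create for every Job name and joins the wrapped failures.
// errors.Join drops nil entries and returns nil if every entry is nil.
func createAll(jobNames []string, create func(name string) error) error {
	var finalErrs []error
	for _, name := range jobNames {
		finalErrs = append(finalErrs, wrapCreateErr(name, create(name)))
	}
	return errors.Join(finalErrs...)
}

func main() {
	// Assumed Job names, for illustration only.
	names := []string{"js-workers-0", "js-workers-1"}
	allErrs := createAll(names, func(name string) error {
		if name == "js-workers-1" {
			return errExample
		}
		return nil
	})
	if allErrs != nil {
		// In the controller this string would be the event message.
		fmt.Printf("Job creation(s) failed with error: %s\n", allErrs)
	}
}
```

With this shape, each line of the joined message names the Job it belongs to, which addresses the question of which Job failed. Whether the event shows the full message is a separate question that this sketch does not answer.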

return allErrs
}
// Skip emitting a condition for StartupPolicy if JobSet is suspended