generated from kubernetes/kubernetes-template-project
Include first failed job name in event emitted when JobSet fails, as well as the JobSet failure condition #477
Merged
5 commits
3c293f4 include first failed job name in event and jobset failed condition (danielvegamyhre)
4d1bc06 address comments and refactor (danielvegamyhre)
a887df4 add comments to constants (danielvegamyhre)
381fa92 move constants to pkg/constants (danielvegamyhre)
5b33e6f fix dockerfile (danielvegamyhre)
@@ -0,0 +1,56 @@
/*
Copyright 2023 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package constants

const (
	// JobOwnerKey is the field used to build the JobSet index, which enables looking up Jobs
	// by the owner JobSet quickly.
	JobOwnerKey = ".metadata.controller"

	// RestartsKey is an annotation and label key which defines the restart attempt number
	// the JobSet is currently on.
	RestartsKey = "jobset.sigs.k8s.io/restart-attempt"

	// MaxParallelism defines the maximum number of parallel Job creations/deletions that
	// the JobSet controller can perform.
	MaxParallelism = 50

	// Event reason and message for when a JobSet fails due to reaching max restarts
	// defined in its failure policy.
	ReachedMaxRestartsReason  = "ReachedMaxRestarts"
	ReachedMaxRestartsMessage = "jobset failed due to reaching max number of restarts"

	// Event reason and message for when a JobSet fails due to any Job failing, when
	// no failure policy is defined.
	// This is the default failure handling behavior.
	FailedJobsReason  = "FailedJobs"
	FailedJobsMessage = "jobset failed due to one or more job failures"

	// Event reason and message for when a JobSet completes successfully.
	AllJobsCompletedReason  = "AllJobsCompleted"
	AllJobsCompletedMessage = "jobset completed successfully"

	// Event reason used when a Job creation fails.
	// The event uses the error(s) as the message.
	JobCreationFailedReason = "JobCreationFailed"

	// Event reason and message for when the pod controller detects a violation
	// of the JobSet exclusive placement policy (i.e., follower pods not colocated in
	// the same topology domain as the leader pod for that Job).
	ExclusivePlacementViolationReason  = "ExclusivePlacementViolation"
	ExclusivePlacementViolationMessage = "Pod violated JobSet exclusive placement policy"
)
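To illustrate what a cap like MaxParallelism implies, here is a minimal sketch of bounding concurrent Job creations with a buffered channel used as a counting semaphore. This is not the controller's actual code: `forEachBounded`, `errSimulated`, and the job names are hypothetical, and the lowercase `maxParallelism` only mirrors the value in the diff.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// maxParallelism mirrors the MaxParallelism constant from the diff.
const maxParallelism = 50

// errSimulated is a hypothetical error standing in for an API failure.
var errSimulated = errors.New("simulated API error")

// forEachBounded calls fn(i) for every i in [0, n), with at most limit calls
// in flight at once, and returns every non-nil error fn produced.
// Hypothetical helper for illustration only.
func forEachBounded(n, limit int, fn func(i int) error) []error {
	if limit < 1 {
		limit = 1 // a zero-capacity semaphore would block forever
	}
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	sem := make(chan struct{}, limit) // counting semaphore
	for i := 0; i < n; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit calls are already running
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when fn returns
			if err := fn(i); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(i)
	}
	wg.Wait()
	return errs
}

func main() {
	jobs := []string{"workers-0", "workers-1", "driver-0"}
	errs := forEachBounded(len(jobs), maxParallelism, func(i int) error {
		if jobs[i] == "driver-0" {
			return fmt.Errorf("creating job %q: %w", jobs[i], errSimulated)
		}
		return nil
	})
	fmt.Printf("%d of %d job creations failed\n", len(errs), len(jobs))
}
```

Collecting all errors instead of stopping at the first one matches the comment on JobCreationFailedReason, which mentions "error(s)" in the plural; whether the controller aggregates them this way is an assumption.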
Could we move these out of this file?
They don't require the controller so it may be useful to move them to a separate file.
Yeah, jobset_controller.go is overdue for refactoring. To start, I think we can move some of the many helper functions into separate files based on feature (similar to what you did with startup policy).
I did some refactoring in this PR (e.g., moving some functions into success_policy.go, adding a constants pkg, etc.).
However, for these particular functions, I'm not sure of the best place to put them yet. They find the first failed job for a JobSet and generate an event message for it, which doesn't fit into any existing (or new) logical grouping.
For now I think we should leave these 3 functions here and maybe refactor some more in a separate PR. I don't want to go overboard splitting things up.
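For readers without the diff open, the helpers under discussion could look roughly like the sketch below: pick the failed job with the earliest failure time, then append its name to the base message. The `job` type is a simplified stand-in for the real Job object, and `failedJob`, `firstFailedJob`, `failureMessage`, and the exact message format are assumptions for illustration, not the functions from this PR.

```go
package main

import (
	"fmt"
	"time"
)

// failedJobsMessage mirrors the FailedJobsMessage constant from the diff.
const failedJobsMessage = "jobset failed due to one or more job failures"

// job is a simplified, hypothetical stand-in for a Kubernetes Job.
type job struct {
	name     string
	failedAt *time.Time // nil means the job has not failed
}

// failedJob builds a job that failed at the given Unix time (seconds).
func failedJob(name string, unixSec int64) job {
	t := time.Unix(unixSec, 0)
	return job{name: name, failedAt: &t}
}

// firstFailedJob returns the job with the earliest failure time,
// or nil if no job has failed. Ties keep the job that appears first.
func firstFailedJob(jobs []job) *job {
	var first *job
	for i := range jobs {
		j := &jobs[i]
		if j.failedAt == nil {
			continue
		}
		if first == nil || j.failedAt.Before(*first.failedAt) {
			first = j
		}
	}
	return first
}

// failureMessage appends the first failed job's name to the base message.
// The "(first failed job: ...)" format is an assumption.
func failureMessage(base string, first *job) string {
	if first == nil {
		return base
	}
	return fmt.Sprintf("%s (first failed job: %s)", base, first.name)
}

func main() {
	jobs := []job{
		{name: "driver-0"},
		failedJob("workers-1", 1700000200),
		failedJob("workers-0", 1700000100),
	}
	fmt.Println(failureMessage(failedJobsMessage, firstFailedJob(jobs)))
}
```

Since these helpers depend only on a list of jobs and a base message, not on the controller itself, they would be easy to relocate in a follow-up PR.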