-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow to update JobSets on suspend #644
Conversation
✅ Deploy Preview for kubernetes-sigs-jobset canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/assign @ahg-g @danielvegamyhre
test/e2e/e2e_test.go
Outdated
@@ -131,6 +132,70 @@ var _ = ginkgo.Describe("JobSet", func() { | |||
}) | |||
}) | |||
|
|||
// This test is added to test the JobSet transitions as Kueue would when: | |||
// doing: resume in RF1 -> suspend -> resume in RF2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the explicit name ResourceFlavor
instead of the abbreviation RF
for comments/documentation in JobSet repo, since "RF" will be unfamiliar to developers who don't work with Kueue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Done
test/e2e/e2e_test.go
Outdated
|
||
ginkgo.By("Create a suspended JobSet", func() { | ||
js.Spec.Suspend = ptr.To(true) | ||
js.Spec.TTLSecondsAfterFinished = ptr.To[int32](5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to set TTL for this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed really, used it just to make sure the Job is deleted at some point, but we don't need to. Reverted.
/lgtm I'll remove the hold once CI tests pass. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: danielvegamyhre, mimowo The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@kannon92 @danielvegamyhre could we cherry-pick this? |
/hold cancel |
v0.6.0 will be released next week, I think that would be easier. Is it urgent to cherry pick this into v0.5.x? |
I think 0.6 is fine. |
@danielvegamyhre I think you said there was some issues with kicking off v0.6.0 so we are a holding pattern for this. ref: #523 (comment) Is it worth cherry-picking this to unblock Kueue? |
/cherry-pick release-0.5.0 |
/cherry-pick release-0.5 |
opened up #651 (comment) |
* E2e for updating JobSet on suspend * Allow to mutate the PodTemplate in JobSet on suspend * Review remarks --------- Co-authored-by: Michal Wozniak <michalwozniak@google.com>
What type of PR is this?
/kind bug
What this PR does / why we need it:
To fix the main scenarios of integration with Kueue, where Kueue may want to suspend a workload,
and re-admit in another ResourceFlavor (with different nodeSelectors)
Which issue(s) this PR fixes:
Fixes #624
Special notes for your reviewer:
This is a minimal fix for the main scenarios needed for integration with Kueue - it does not
allow to "restore" fully PodTemplate, but it allows to overwrite it, which is enough in most cases
See previous approach: #640
There are two commits:
Does this PR introduce a user-facing change?