-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Plain Pod gets deleted once admitted via ProvisioningRequest (DWS) #2239
Plain Pod gets deleted once admitted via ProvisioningRequest (DWS) #2239
Conversation
Skipping CI for Draft Pull Request. |
/assign |
✅ Deploy Preview for kubernetes-sigs-kueue ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
/assign |
/test all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
LGTM label has been added. Git tree hash: 5e8cbb0f5bedafeb681fe1d3b14d0f5068185dee
|
func comparePodTemplate(a, b *corev1.PodSpec) bool { | ||
if !equality.Semantic.DeepEqual(a.Tolerations, b.Tolerations) { | ||
func comparePodTemplate(a, b *corev1.PodSpec, ignoreTolerations bool) bool { | ||
if !ignoreTolerations && !equality.Semantic.DeepEqual(a.Tolerations, b.Tolerations) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently this PR only ignores the equality check on tolerations, while the comment indicates we should also relax the validation for nodeSelectors. How confident we are tolerations are enough?
IIUC the autoscaling.gke.io/provisioning-request
nodeSelector can be added as well based on the annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The node selectors are not part of the equivalency checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks for clarifying. Do we have this covered in a unit / integration test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not ignored, because it doesn't participate in comparison as of now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just like to have a test to make sure this is the case, to prevent future regressions, since this is an important scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway we can probably defer this to a follow up manual testing
@@ -745,17 +745,17 @@ func equivalentToWorkload(ctx context.Context, c client.Client, job GenericJob, | |||
jobPodSets := clearMinCountsIfFeatureDisabled(job.PodSets()) | |||
|
|||
if runningPodSets := expectedRunningPodSets(ctx, c, wl); runningPodSets != nil { | |||
if equality.ComparePodSetSlices(jobPodSets, runningPodSets) { | |||
if equality.ComparePodSetSlices(jobPodSets, runningPodSets, workload.IsAdmitted(wl)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ignoring of tolerations is done independently of the framework, but there is no issue with jobs, just with pods. I'm wondering if we should make the fix more specific. WDYT @alculquicondor ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On one hand the relaxation of the check isn't necessary for Jobs.
On the other it seems that scoping the fix would be more involving because we would need to pass this information, the options I see:
- add a new param to
FindMatchingWorkloads
- add a new interface like
CustomEquivalenceConfigJob
with a function likeignoreTolerations() bool
None of these is appealing, especially since we will not need the custom options when the proper fix is implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with this for now. But can we use the condition status of QuotaReserved, instead of Admitted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes any difference, the toleration are only changing after the pods are unsuspended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sg, thanks for the reminder.
wantJob: *baseJobWrapper.Clone().Toleration(corev1.Toleration{ | ||
Key: "tolerationkey1", | ||
Key: "tolerationkey2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure we have an analogous test (either by adding or finding an existing) which demonstrate that node selectors are not part of the equality check.
@@ -745,17 +745,17 @@ func equivalentToWorkload(ctx context.Context, c client.Client, job GenericJob, | |||
jobPodSets := clearMinCountsIfFeatureDisabled(job.PodSets()) | |||
|
|||
if runningPodSets := expectedRunningPodSets(ctx, c, wl); runningPodSets != nil { | |||
if equality.ComparePodSetSlices(jobPodSets, runningPodSets) { | |||
if equality.ComparePodSetSlices(jobPodSets, runningPodSets, workload.IsAdmitted(wl)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with this for now. But can we use the condition status of QuotaReserved, instead of Admitted?
for name, tc := range cases { | ||
t.Run(name, func(t *testing.T) { | ||
got := ComparePodSetSlices(tc.a, tc.b) | ||
got := ComparePodSetSlices(tc.a, tc.b, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs a unit test for true too
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, vladikkuzn The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test pull-kueue-test-integration-main |
we can leave the additional test for a follow up. /hold cancel |
/cherry-pick release-0.6 |
@alculquicondor: once the present PR merges, I will cherry-pick it on top of release-0.6 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@alculquicondor: #2239 failed to apply on top of branch "release-0.6":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
* Test for ignoreTolerations
* Test for ignoreTolerations
* Test for ignoreTolerations * Add node selector to test to make sure it's not a part of equality check
* Test for ignoreTolerations * Add node selector to test to make sure it's not a part of equality check
* Test for ignoreTolerations * Add node selector to test to make sure it's not a part of equality check
* Add separate unit test for node selectors
* Add separate unit test for node selectors
* Rename wantEqual -> wantEquivalent
/release-note-edit
|
/release-note-edit
|
What type of PR is this?
/kind bug
What this PR does / why we need it:
Ignores pods' tolerations during equality check for admitted workloads
Which issue(s) this PR fixes:
Fixes #2213
Special notes for your reviewer:
Does this PR introduce a user-facing change?