
WaitForPodsReady: Reset .status.requeueState.count once the workload is deactivated #2219

Conversation

tenzen-y
Member

@tenzen-y tenzen-y commented May 17, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

If the Workload is deactivated, the workload controller resets the requeueState to avoid a race condition between the kueue-scheduler and the workload controller for deactivated workloads.

When the workload is deactivated by exceeding the backoffLimitCount, the eviction condition message "The workload is deactivated due to exceeding the maximum number of re-queuing retries" is added. A dedicated reason will be introduced in another PR so that this PR stays focused on fixing the bug.
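A minimal sketch of the reset behavior described above (hypothetical helper name; the actual change lives in the workload controller and also handles the eviction condition):

package sketch

import (
	"k8s.io/utils/ptr"
	kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

// resetRequeueStateIfDeactivated clears .status.requeueState for a Workload
// whose .spec.active is false, so that a later reactivation starts counting
// re-queues from zero instead of being deactivated again immediately.
func resetRequeueStateIfDeactivated(wl *kueue.Workload) {
	if !ptr.Deref(wl.Spec.Active, true) {
		wl.Status.RequeueState = nil
	}
}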

Which issue(s) this PR fixes:

Fixes #2174

Special notes for your reviewer:

The cherry-pick was created manually in #2220 because the re-queuing mechanism changed significantly between v0.6 and the main branch.

Does this PR introduce a user-facing change?

WaitForPodsReady: Fix a bug that causes the reactivated Workload to be immediately deactivated even though it doesn't exceed the backoffLimit.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 17, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 17, 2024

netlify bot commented May 17, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

🔨 Latest commit: f49c1a2
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/6654a226c077d50008907b39

Comment on lines -842 to -901

if tc.wantWorkload != nil {
if requeueState := tc.wantWorkload.Status.RequeueState; requeueState != nil && requeueState.RequeueAt != nil {
gotRequeueState := gotWorkload.Status.RequeueState
if gotRequeueState != nil && gotRequeueState.RequeueAt != nil {
if !gotRequeueState.RequeueAt.Equal(requeueState.RequeueAt) {
t.Errorf("Unexpected requeueState.requeueAt; gotRequeueAt %v needs to be after requeueAt %v", requeueState.RequeueAt, gotRequeueState.RequeueAt)
}
} else {
t.Errorf("Unexpected nil requeueState.requeuAt; requeueState.requeueAt shouldn't be nil")
}
}
}
Member Author

This verification has been invalid since we introduced the fake clock.

@alculquicondor
Contributor

cc @mimowo

if wl.Status.Admission != nil {
metrics.ReportEvictedWorkloads(string(wl.Status.Admission.ClusterQueue), kueue.WorkloadEvictedByDeactivation)
if !apimeta.IsStatusConditionTrue(wl.Status.Conditions, kueue.WorkloadEvicted) {
workload.SetEvictedCondition(&wl, kueue.WorkloadEvictedByDeactivation, "The workload is deactivated")
Contributor

@alculquicondor alculquicondor May 17, 2024

What I don't like about this is that we are losing context as to why the workload was deactivated.

In addition to adding the evicted condition here, could we add it inside reconcileNotReadyTimeout so that, even if the Kueue controller restarts or fails to send the status update, we keep that information in the Evicted reason and message?

Or maybe we can just deduce that the deactivation is because of the backoffLimit by looking at the requeueState.

Contributor

@mimowo mimowo May 20, 2024

IIUC, we have the status.requeueState in this function just before we clear it above; can we use it in the message, for example to say "The workload is deactivated due to 5 failed attempts to run all pods within the PodsReady timeout"? We probably don't want to set the "WorkloadInactive" reason prior to setting active=false.

I think long-term the cleanest would be if active was in status, then we could do the following in one request:

active: false
conditions:
  Evicted: true
  Reason: PodsReadyTimeoutExceeded
  Message: The PodsReady timeout of 5min is exceeded
  Deactivated: true
  Reason: PodsReadyBackoffExceeded
  Message: Deactivated the workload after 5 attempts due to exceeding the PodsReady timeout.

Member Author

What I don't like about this is that we are losing context as to why the workload was deactivated.

In addition to adding the evicted condition here, could we add it inside reconcileNotReadyTimeout so that, even if the Kueue controller restarts or fails to send the status update, we keep that information in the Evicted reason and message?

Or maybe we can just deduce that the deactivation is because of the backoffLimit by looking at the requeueState.

I think it would be better to deduce the reason here to avoid 2 API calls (spec and status) in reconcileNotReadyTimeout. I think we can say the workload is deactivated by exceeding the limit when .spec.active=false AND .status.requeueState.requeueAt == nil AND .status.requeueState.count != nil.
@alculquicondor WDYT?
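A compact sketch of that deduction (hypothetical helper name; the merged code expresses the same check inline against the configured backoff limit):

package sketch

import (
	kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

// deactivatedByBackoffLimit encodes the combination proposed above: the
// Workload was deactivated (.spec.active=false) while requeueState still
// carries a count but no requeueAt.
func deactivatedByBackoffLimit(wl *kueue.Workload) bool {
	rs := wl.Status.RequeueState
	return wl.Spec.Active != nil && !*wl.Spec.Active &&
		rs != nil && rs.RequeueAt == nil && rs.Count != nil
}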

Member Author

IIUC, we have the status.requeueState in this function just before we clear it above; can we use it in the message, for example to say "The workload is deactivated due to 5 failed attempts to run all pods within the PodsReady timeout"? We probably don't want to set the "WorkloadInactive" reason prior to setting active=false.

@mimowo Actually, we had a similar discussion here: #2175
So, could we work on enhancing the reason as a separate feature, so that this PR can focus on fixing the bug?

Contributor

SGTM. I suspect you want to have #2175 in 0.7?

Member Author

I suspect you want to have #2175 in 0.7?

Yes, I would like to include #2175 in v0.7.

Contributor

So, could we work on enhancing the reason as a separate feature, so that this PR can focus on fixing the bug?

sgtm, I was actually suggesting that the current place is ok, since we have the data at hand if we want to build the message. I would be fine building the message like "The workload is deactivated due to exceeding the PodsReady timeout 5 times in a row.". I'm also ok to leave it for #2175.

ginkgo.By("evicted re-admitted workload should have 2 in the re-queue count")
util.ExpectWorkloadToHaveRequeueCount(ctx, k8sClient, client.ObjectKeyFromObject(prodWl), ptr.To[int32](2))
time.Sleep(podsReadyTimeout)
Contributor

is this a real sleep?
can it be made a few milliseconds just for this test?

Contributor

ping @tenzen-y

Member Author

@alculquicondor Thank you for the reminder. Does that mean decreasing the timeout would be better?

podsReadyTimeout = 3 * time.Second

Contributor

yes. 3s is too long

Member Author

@tenzen-y tenzen-y May 21, 2024

Let me try it.

Contributor

yeah, I'm even wondering if we could reduce the 3s in other places; maybe we should review that

Contributor

@mimowo mimowo left a comment

LGTM overall

@mimowo mimowo mentioned this pull request May 22, 2024
@tenzen-y tenzen-y changed the title WaitForPodsReady: Reset .status.requeueState.count once the workload is deactivated WIP: WaitForPodsReady: Reset .status.requeueState.count once the workload is deactivated May 22, 2024
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 22, 2024
@tenzen-y
Member Author

I found that the current podsready integration test is not correct. So, I will fix the test in another PR, first.

@tenzen-y tenzen-y force-pushed the reset-backoff-count-when-reached-requeuing-limit-count branch 8 times, most recently from 2cbde11 to d49bab4 Compare May 25, 2024 07:48
…is deactivated

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
@tenzen-y tenzen-y force-pushed the reset-backoff-count-when-reached-requeuing-limit-count branch from d49bab4 to ff4ca7f Compare May 25, 2024 07:51
@tenzen-y tenzen-y changed the title WIP: WaitForPodsReady: Reset .status.requeueState.count once the workload is deactivated WaitForPodsReady: Reset .status.requeueState.count once the workload is deactivated May 26, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 26, 2024
((!workload.HasRequeueState(&wl) && ptr.Equal(r.waitForPodsReady.requeuingBackoffLimitCount, ptr.To[int32](0))) ||
(workload.HasRequeueState(&wl) && wl.Status.RequeueState.RequeueAt == nil &&
ptr.Equal(wl.Status.RequeueState.Count, r.waitForPodsReady.requeuingBackoffLimitCount))) {
message = fmt.Sprintf("%s by exceeded the maximum number of re-queuing retries", message)
Member Author

@mimowo @alculquicondor I added the dedicated message like this based on your recommendations. WDYT?
Also, let me try to add a dedicated reason in another enhancement PR.

Contributor

I think the dedicated message is useful

@tenzen-y
Member Author

I found that the current podsready integration test is not correct. So, I will fix the test in another PR, first.

It turned out that the test issues stem from resetting the whole .status.requeueState field, so I fixed them in this PR.

@@ -193,7 +193,7 @@ var _ = ginkgo.Describe("SchedulerWithWaitForPodsReady", func() {

var _ = ginkgo.Context("Short PodsReady timeout", func() {
ginkgo.BeforeEach(func() {
podsReadyTimeout = 3 * time.Second
podsReadyTimeout = 250 * time.Millisecond
Member Author

I decreased the timeout to 250 ms in all short podsready timeout cases.
WDYT? @mimowo @alculquicondor

Contributor

nice win! This will speed up the tests. Can we even go below that, say 10ms?

Member Author

I tried 10ms, but the test case below failed:

util.ExpectWorkloadToHaveRequeueState(ctx, k8sClient, client.ObjectKeyFromObject(prodWl), &kueue.RequeueState{
Count: ptr.To[int32](1),
}, false)

  [FAILED] Timed out after 25.000s.
  The function passed to Eventually failed at /Users/tenzen/go/src/sigs.k8s.io/kueue/test/util/util.go:354 with:
  Expected object to be comparable, diff:   &v1beta1.RequeueState{
  -     Count: &2,
  +     Count: &1,
        ... // 1 ignored field
    }
  
  In [It] at: /Users/tenzen/go/src/sigs.k8s.io/kueue/test/integration/scheduler/podsready/scheduler_test.go:261 @ 05/27/24 16:59:35.799

I guess that the workload controller could not observe the transition of the state.

Contributor

Hm, maybe this is because the scheduler admits the workload before we can set active: false. This might be the reason the time was set to 3s to make sure such race conditions don't happen. Probably reducing to 1s might be safer.

Also, for the new tests, maybe we could consider moving them here: https://github.com/kubernetes-sigs/kueue/blob/ff4ca7f566e4a12e13ffc3db62cbe896e7993685/test/integration/controller/jobs/job/job_controller_test.go#L1743C39-L1743C92. WDYT?

Member Author

Hm, maybe this is because the scheduler admits the workload before we can set active: false.

Yeah, I agree with you.

This might be the reason the time was set to 3s to make sure such race conditions don't happen.

This guess doesn't seem to be correct since you added this case when you implemented the podsready feature here: 9d9336b
When we added the podsready feature, we didn't have the deactivation feature.

Probably reducing to 1s might be safer.

SGTM

Also, for the new tests, maybe we could consider moving them here: https://github.com/kubernetes-sigs/kueue/blob/ff4ca7f566e4a12e13ffc3db62cbe896e7993685/test/integration/controller/jobs/job/job_controller_test.go#L1743C39-L1743C92.

I think this case is still valid to keep here (scheduler+controller), since it is better to keep verifying that no unexpected race conditions happen between the scheduler and the controllers.

Contributor

This guess doesn't seem to be correct since you added this case when you implemented the podsready feature here: 9d9336b

makes sense, I don't remember why it was 3s; I'm ok to reduce it to as low a value as possible, provided the tests continue to pass.

I think this case is still valid to keep here (scheduler+controller), since it is better to keep verifying that no unexpected race conditions happen between the scheduler and the controllers.

Makes sense; on the other hand, it exposes us to the race conditions. If you still think the tests are valuable with the scheduler, I'm ok to keep them here. Maybe we can consider two types of short timeouts: 10ms for tests that are not vulnerable to the race conditions, and a longer 250ms or 1s for tests that are?
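For illustration, a sketch of what the two tiers could look like (constant names are hypothetical, not from this PR):

package podsready_test

import "time"

const (
	// For tests that do not race with the scheduler re-admitting the workload.
	fastPodsReadyTimeout = 10 * time.Millisecond
	// For tests where an admission/eviction race with the scheduler is possible.
	safePodsReadyTimeout = 1 * time.Second
)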

Member Author

Makes sense; on the other hand, it exposes us to the race conditions. If you still think the tests are valuable with the scheduler, I'm ok to keep them here. Maybe we can consider two types of short timeouts: 10ms for tests that are not vulnerable to the race conditions, and a longer 250ms or 1s for tests that are?

That sounds reasonable, but it requires restructuring the podsready test. So could we just increase 250ms to 1s in this PR, and then restructure the podsready integration test in a follow-up PR?

@mimowo WDYT?

Contributor

That sounds reasonable, but it requires restructuring the podsready test. So could we just increase 250ms to 1s in this PR, and then restructure the podsready integration test in a follow-up PR?

sgtm, thanks

Member Author

Done.
Also I opened an issue: #2285

Contributor

@mimowo mimowo left a comment

LGTM overall, just nits

((!workload.HasRequeueState(&wl) && ptr.Equal(r.waitForPodsReady.requeuingBackoffLimitCount, ptr.To[int32](0))) ||
(workload.HasRequeueState(&wl) && wl.Status.RequeueState.RequeueAt == nil &&
ptr.Equal(wl.Status.RequeueState.Count, r.waitForPodsReady.requeuingBackoffLimitCount))) {
message = fmt.Sprintf("%s by exceeded the maximum number of re-queuing retries", message)
Contributor

Suggested change
message = fmt.Sprintf("%s by exceeded the maximum number of re-queuing retries", message)
message = fmt.Sprintf("%s due to exceeding the maximum number of re-queuing retries", message)

nit, I think it will read better

Member Author

SGTM, thank you for this recommendation!

Member Author

Done.


@@ -1892,6 +1892,7 @@ var _ = ginkgo.Describe("Job controller interacting with scheduler", ginkgo.Orde
var _ = ginkgo.Describe("Job controller interacting with Workload controller when waitForPodsReady with requeuing strategy is enabled", ginkgo.Ordered, ginkgo.ContinueOnFailure, func() {
var (
backoffBaseSeconds int32
backLimitCount *int32
Contributor

Suggested change
backLimitCount *int32
backoffLimitCount *int32

nit

Member Author

Done.

ginkgo.By("checking the workload is evicted by deactivated due to PodsReadyTimeout")
gomega.Eventually(func(g gomega.Gomega) {
g.Expect(k8sClient.Get(ctx, wlKey, wl)).Should(gomega.Succeed())
g.Expect(wl.Status.RequeueState).Should(gomega.BeNil())
Contributor

nit: let's also assert spec.active: false.

Member Author

Oh, good point! Thanks!

Member Author

Done.

ginkgo.By("checking the 'prod' workload is admitted")
util.ExpectWorkloadsToHaveQuotaReservation(ctx, k8sClient, prodClusterQ.Name, prodWl)
util.ExpectQuotaReservedWorkloadsTotalMetric(prodClusterQ, 1)
util.ExpectAdmittedWorkloadsTotalMetric(prodClusterQ, 1)
ginkgo.By("exceed the timeout for the 'prod' workload")
time.Sleep(podsReadyTimeout)
util.AwaitWorkloadEvictionByPodsReadyTimeoutAndSetRequeuedCondition(ctx, k8sClient, client.ObjectKeyFromObject(prodWl), podsReadyTimeout)
Contributor

No strong view here, but I prefer not to mix too many responsibilities into a single function; the name looks long. Maybe we can have two functions: AwaitWorkloadEvictionByPodsReadyTimeout and SetRequeuedCondition?

Member Author

That sounds reasonable. Let me split this helper into two.
Thanks!

Member Author

Done.

… test

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
…to exceeding the backoffLimitCount

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
…into AwaitWorkloadEvictionByPodsReadyTimeout and SetRequeuedConditionWithPodsReadyTimeout

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
if err := workload.ApplyAdmissionStatus(ctx, r.client, &wl, true); err != nil {
return ctrl.Result{}, fmt.Errorf("setting eviction: %w", err)
} else {
var updated, evicted bool
Contributor

Maybe it is better to declare evicted as a *string, since it is used to bump the metric.

I worry that if we have a bool, someone could forget to update line 219 when adding a new scenario for deactivation. WDYT?
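A hedged sketch of that shape, reusing only identifiers visible in the excerpts above (their signatures and constant types are assumed from that usage; this is not the actual patch):

package sketch

import (
	apimeta "k8s.io/apimachinery/pkg/api/meta"
	kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
	"sigs.k8s.io/kueue/pkg/metrics"
	"sigs.k8s.io/kueue/pkg/workload"
)

// reportDeactivationEviction keeps the eviction reason in a single variable so
// the metric bump cannot drift from the condition that was set; an empty
// reason means nothing is reported.
func reportDeactivationEviction(wl *kueue.Workload, message string) {
	var evictionReason string
	if !apimeta.IsStatusConditionTrue(wl.Status.Conditions, kueue.WorkloadEvicted) {
		workload.SetEvictedCondition(wl, kueue.WorkloadEvictedByDeactivation, message)
		evictionReason = kueue.WorkloadEvictedByDeactivation
	}
	if evictionReason != "" && wl.Status.Admission != nil {
		metrics.ReportEvictedWorkloads(string(wl.Status.Admission.ClusterQueue), evictionReason)
	}
}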

Member Author

@mimowo Hmm, do you mean that evicted would hold the reason for the eviction (here, InactiveWorkload)?

Contributor

Oh, never mind, I was thinking of another PodsReady scenario, but in that case it would still be WorkloadInactive.

Member Author

@tenzen-y tenzen-y May 27, 2024

Yeah, after we introduce a dedicated reason, we may refactor here somehow, though.

Comment on lines 293 to 308
ginkgo.By("the reactivated workload should not be deactivated by the scheduler unless exceeding the backoffLimitCount")
gomega.Eventually(func(g gomega.Gomega) {
g.Expect(k8sClient.Get(ctx, client.ObjectKeyFromObject(prodWl), prodWl)).Should(gomega.Succeed())
prodWl.Spec.Active = ptr.To(true)
g.Expect(k8sClient.Update(ctx, prodWl)).Should(gomega.Succeed())
}, util.Timeout, util.Interval).Should(gomega.Succeed(), "Reactivate inactive Workload")
gomega.Eventually(func(g gomega.Gomega) {
g.Expect(k8sClient.Get(ctx, client.ObjectKeyFromObject(prodWl), prodWl)).Should(gomega.Succeed())
g.Expect(prodWl.Status.RequeueState).Should(gomega.BeNil())
}, util.Timeout, util.Interval).Should(gomega.Succeed())
util.FinishEvictionForWorkloads(ctx, k8sClient, prodWl)
util.ExpectWorkloadsToHaveQuotaReservation(ctx, k8sClient, prodClusterQ.Name, prodWl)
util.ExpectQuotaReservedWorkloadsTotalMetric(prodClusterQ, 4)
util.ExpectAdmittedWorkloadsTotalMetric(prodClusterQ, 4)
util.AwaitWorkloadEvictionByPodsReadyTimeout(ctx, k8sClient, client.ObjectKeyFromObject(prodWl), podsReadyTimeout)
util.SetRequeuedConditionWithPodsReadyTimeout(ctx, k8sClient, client.ObjectKeyFromObject(prodWl))
util.FinishEvictionForWorkloads(ctx, k8sClient, prodWl)
util.ExpectWorkloadToHaveRequeueState(ctx, k8sClient, client.ObjectKeyFromObject(prodWl), &kueue.RequeueState{
Count: ptr.To[int32](1),
}, false)
Contributor

No strong view, but it feels there is quite a potential for race conditions and flakes here.

Could we just end the test on the previous check, that the RequeueState is nil, and the metric is incremented?

Member Author

No strong view, but it feels there is quite a potential for race conditions and flakes here.

I understand your concerns, but I think we need a reactivation test to catch a bug like #2174.

So, let's replace lines 306-308 with a verification of spec.active=true.
WDYT?

Contributor

I understand your concerns, but I think we need a reactivation test to catch a bug like #2174.

That is true, but since we now check that requeueState is cleared, I'm not sure how a similar bug could happen; maybe if this were stored in memory, but I suppose the risk is minimal.

So, let's replace lines 306-308 with a verification of spec.active=true.

Testing for spec.active=true would be better for sure.

But ExpectWorkloadsToHaveQuotaReservation is also time-sensitive and has a potential for raciness, because the workload gets evicted soon after AwaitWorkloadEvictionByPodsReadyTimeout. Maybe just check that the metrics were bumped - that approach would not be time-sensitive.

Member Author

@mimowo I think there is no such race condition, since this integration test doesn't run the jobframework reconciler, which means the Job is never readmitted unless FinishEvictionForWorkloads is performed, right?

Contributor

Ah, got it. So even if the PodsReady timeout happens quickly, you would still pass the ExpectWorkloadsToHaveQuotaReservation (L300) check, because the reservation is not removed until you call FinishEvictionForWorkloads. That convinces me there is indeed no race condition.

Member Author

Yes, exactly. I agree that a comment would be helpful.

}, util.Timeout, util.Interval).Should(gomega.Succeed())
util.FinishEvictionForWorkloads(ctx, k8sClient, prodWl)
util.ExpectWorkloadsToHaveQuotaReservation(ctx, k8sClient, prodClusterQ.Name, prodWl)
Contributor

@mimowo mimowo May 27, 2024

I would suggest clarifying the flow in this test with a comment along these lines:

// We await for re-admission. Then, the workload keeps the QuotaReserved
// condition even after timeout until FinishEvictionForWorkloads is called.

Member Author

Thank you for the suggestion!

Member Author

Done.

@tenzen-y tenzen-y force-pushed the reset-backoff-count-when-reached-requeuing-limit-count branch from 0fcaab1 to f879c5c Compare May 27, 2024 14:39
}, util.Timeout, util.Interval).Should(gomega.Succeed())
// We await for re-admission. Then, the workload keeps the QuotaReserved condition
// even after timeout until FinishEvictionForWorkloads is called in line 307.
Contributor

Suggested change
// even after timeout until FinishEvictionForWorkloads is called in line 307.
// even after timeout until FinishEvictionForWorkloads is called below.

nit

Member Author

Done.

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
@tenzen-y tenzen-y force-pushed the reset-backoff-count-when-reached-requeuing-limit-count branch from f879c5c to f49c1a2 Compare May 27, 2024 15:09
@mimowo
Contributor

mimowo commented May 27, 2024

/lgtm
Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 27, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 2e93d6ec4b714b30d4180273d66424659345fd0b

@k8s-ci-robot k8s-ci-robot merged commit 464aca4 into kubernetes-sigs:main May 27, 2024
15 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.7 milestone May 27, 2024
@alculquicondor
Contributor

/release-note-edit

WaitForPodsReady: Fix a bug that causes the reactivated Workload to be immediately deactivated even though it doesn't exceed the backoffLimit.
