-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release-0.6] WaitForPodsReady: Reset .status.requeueState once the workload is deactivated #2220
Conversation
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
|
3344b67
to
d79a196
Compare
d79a196
to
a811079
Compare
a811079
to
7267bb7
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
7267bb7
to
70de278
Compare
183fab0
to
ef502f0
Compare
…ctivated Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
ef502f0
to
5609e61
Compare
still WIP? |
@alculquicondor This PR is ready for review right now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
LGTM label has been added. Git tree hash: 89e36512bbe041155d132d044739c277a459ee6c
|
What type of PR is this?
/kind bug
What this PR does / why we need it:
If the Workload is deactivated, the workload resets the requeueState to avoid a race condition between the kueue-scheduler and the workload controller in the case of the deactivated workload.
When the workload is deactivated by exceeding the backoffLimitCount, the workload is added
The workload is deactivated due to exceeding the maximum number of re-queuing retries
. I will introduce a dedicated reason in another PR to focus on fixing the bug.Which issue(s) this PR fixes:
Fixes #2174
Special notes for your reviewer:
This PR is cherry-pick of #2219
The primary difference from #2219 is the criteria to deduce that the Workload is deactivated due to exceeding the backoffLimitCount.
In the v0.7 implementation, the deactivated workload doesn't have the
.status.requeuesState.requeueAt
since therequeueAt
should be reset when the workload was requeued (Requeued=true
condition).But, in the v0.6 implementation, an empty
requeueAt
is dropped from the criteria since we never reset therequeueAt
andRequeued
condition, a new feature for the v0.7.Does this PR introduce a user-facing change?