Make ordering timestamp configurable when Workload evicted due to Pods-ready #1542
Conversation
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
Hi @nstogner. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
PS: I still need to add the user-story test case.
/ok-to-test
Force-pushed from 432f4d7 to 6a1ed38.
/assign

Force-pushed from 62414b3 to 8d32336.
/assign @mimowo

ack
apis/config/v1beta1/defaults.go
Outdated
```diff
@@ -111,12 +110,15 @@ func SetDefaults_Configuration(cfg *Configuration) {
 	}
 	cfg.WaitForPodsReady.BlockAdmission = &defaultBlockAdmission
 }
+if cfg.WaitForPodsReady.RequeuingTimestamp == "" {
```
Let's use a nil check here.
apis/config/v1beta1/defaults.go
Outdated
```go
DefaultQueueVisibilityUpdateIntervalSeconds int32 = 5
DefaultClusterQueuesMaxCount                int32 = 10
DefaultJobFrameworkName                           = "batch/job"
```
This change might be ok, but seems unrelated, so I would suggest reverting it so that the PR is focused.
I think I added this to avoid an import cycle. I updated the tests to be in a dedicated `_test` package to make sure we keep the constants in sync.
sgtm, if required to avoid the cycle
Reverted package change
cmd/kueue/main.go
Outdated
```diff
@@ -370,6 +371,13 @@ func waitForPodsReady(cfg *configapi.Configuration) bool {
 	return cfg.WaitForPodsReady != nil && cfg.WaitForPodsReady.Enable
 }
 
+func podsReadyRequeuingTimestamp(cfg *configapi.Configuration) configapi.RequeuingTimestamp {
+	if cfg.WaitForPodsReady != nil {
+		return cfg.WaitForPodsReady.RequeuingTimestamp
```
If we go for ptr, I think we will also need to check it is not nil here.
Or use ptr.Deref
Updated to pointer and added nil check
cmd/kueue/main.go
Outdated
```go
if cfg.WaitForPodsReady != nil {
	return cfg.WaitForPodsReady.RequeuingTimestamp
}
return ""
```
let's return Eviction here
Done
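The agreed-upon fallback can be sketched in isolation. This is a hedged, self-contained stand-in (local types replace the real `configapi` package) showing the shape the helper converges on in this thread: nil-check both the config section and the pointer field, and fall back to the Eviction timestamp:

```go
package main

import "fmt"

// RequeuingTimestamp is a local stand-in for configapi.RequeuingTimestamp.
type RequeuingTimestamp string

const (
	CreationTimestamp RequeuingTimestamp = "Creation"
	EvictionTimestamp RequeuingTimestamp = "Eviction"
)

// WaitForPodsReady and Configuration are pared-down stand-ins for the
// corresponding Kueue configuration types.
type WaitForPodsReady struct {
	RequeuingTimestamp *RequeuingTimestamp
}

type Configuration struct {
	WaitForPodsReady *WaitForPodsReady
}

// podsReadyRequeuingTimestamp returns the configured timestamp, falling back
// to EvictionTimestamp when the section or the field is unset.
func podsReadyRequeuingTimestamp(cfg *Configuration) RequeuingTimestamp {
	if cfg.WaitForPodsReady != nil && cfg.WaitForPodsReady.RequeuingTimestamp != nil {
		return *cfg.WaitForPodsReady.RequeuingTimestamp
	}
	return EvictionTimestamp
}

func main() {
	fmt.Println(podsReadyRequeuingTimestamp(&Configuration{})) // Eviction
	ts := CreationTimestamp
	cfg := &Configuration{WaitForPodsReady: &WaitForPodsReady{RequeuingTimestamp: &ts}}
	fmt.Println(podsReadyRequeuingTimestamp(cfg)) // Creation
}
```

With this fallback in place, downstream code never has to special-case an empty value.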
```go
func (e entryOrdering) Len() int {
```
nit: I would suggest reverting the renames, but feel free to keep
I also think that the previous receiver name, `e`, would be better.
Done
pkg/workload/workload.go
Outdated
```diff
-	if c := apimeta.FindStatusCondition(w.Status.Conditions, kueue.WorkloadEvicted); c != nil && c.Status == metav1.ConditionTrue && c.Reason == kueue.WorkloadEvictedByPodsReadyTimeout {
-		return &c.LastTransitionTime
+func (o Ordering) GetQueueOrderTimestamp(w *kueue.Workload) *metav1.Time {
+	if o.PodsReadyRequeuingTimestamp == config.Eviction || o.PodsReadyRequeuingTimestamp == "" {
```
```diff
-	if o.PodsReadyRequeuingTimestamp == config.Eviction || o.PodsReadyRequeuingTimestamp == "" {
+	if o.PodsReadyRequeuingTimestamp == config.Eviction {
```
This should not be needed, if we adjust fallback in podsReadyRequeuingTimestamp
Done
```go
gomega.Expect(createdEquivJob2.Spec.Template.Spec.NodeSelector[instanceKey]).Should(gomega.Equal(equivalentFlavorB.Name))
util.ExpectPendingWorkloadsMetric(equivalentClusterQ, 0, 0)

time.Sleep(time.Second)
```
I see the first sleep is needed to make sure the creation timestamps differ. However, can we simplify the scenario, so that it captures the change, but uses at most one `sleep(1s)`?
Removing these tests as per Aldo's comment
```go
gomega.Expect(k8sClient.Create(ctx, wl1)).Should(gomega.Succeed())
time.Sleep(time.Second)
gomega.Expect(k8sClient.Create(ctx, wl2)).Should(gomega.Succeed())
time.Sleep(time.Second)
```
Similarly here, can we capture the essence with just 1 sleep?
The expectation here is workload 1 gets admitted (need spacing between 1 & 2), then workload 2 gets admitted (need spacing between 2 & 3 to make this consistent), then workload 1 gets admitted again. I think we need the workloads to be spaced out in time to trigger this behavior and maintain the conditions being asserted on.
sgtm
Turns out that my test was wrong and wl2 was getting admitted on borrowed capacity. I simplified the test to get rid of borrowed capacity and use 2 workloads and 1 sleep.
I can think of an alternative approach in which we add a timestamp field in `Workload.Info` that is populated by the queue manager as the workloads are added or updated. Then the rest of the system can use this cached timestamp without having to understand where it's coming from. It might simplify some code, but I'll leave it up to you to decide which approach to keep.
```go
const (
	// Creation timestamp (from Workload .metadata.creationTimestamp).
	Creation RequeuingTimestamp = "Creation"
```
```diff
-	Creation RequeuingTimestamp = "Creation"
+	CreationTimestamp RequeuingTimestamp = "Creation"
```
```go
	Creation RequeuingTimestamp = "Creation"

	// Eviction timestamp (from Workload .status.conditions).
	Eviction RequeuingTimestamp = "Eviction"
```
```diff
-	Eviction RequeuingTimestamp = "Eviction"
+	EvictionTimestamp RequeuingTimestamp = "Eviction"
```
apis/config/v1beta1/defaults_test.go
Outdated
```diff
 UpdateIntervalSeconds: 10,
-ClusterQueues: &ClusterQueueVisibility{
+ClusterQueues: &v1beta1.ClusterQueueVisibility{
 	MaxCount: 0,
 },
 },
```
Should we add a case, "defaulting waitForPodsReady.RequeuingTimestamp"?
The assertion is being captured in the renamed `defaulting waitForPodsReady.*` case ... it was already asserting on other waitForPodsReady fields.
It makes sense.
```diff
-func GetQueueOrderTimestamp(w *kueue.Workload) *metav1.Time {
 	if c := apimeta.FindStatusCondition(w.Status.Conditions, kueue.WorkloadEvicted); c != nil && c.Status == metav1.ConditionTrue && c.Reason == kueue.WorkloadEvictedByPodsReadyTimeout {
 		return &c.LastTransitionTime
+func (o Ordering) GetQueueOrderTimestamp(w *kueue.Workload) *metav1.Time {
```
Should we add a unit test since this function is exported?
Test is there, updated it just now
/kind api-change
Force-pushed from 24ec9d2 to 64404b5.
Ah, I missed your comment and implemented all of the other requested changes. I am looking at what you suggested now.
While I think this would remove some of the configuration propagation that is occurring at the moment, it seems a bit unnatural to carry a global config value around in each instance of a Workload (via `Workload.Info`).
Force-pushed from 64404b5 to 3f738ee.
apis/config/v1beta1/defaults.go
Outdated
```go
defaultRequeuingTimestamp := EvictionTimestamp
cfg.WaitForPodsReady.RequeuingTimestamp = &defaultRequeuingTimestamp
```
```diff
-defaultRequeuingTimestamp := EvictionTimestamp
-cfg.WaitForPodsReady.RequeuingTimestamp = &defaultRequeuingTimestamp
+cfg.WaitForPodsReady.RequeuingTimestamp = ptr.To(EvictionTimestamp)
```
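The `ptr.To` suggestion collapses the two-line temporary-variable idiom into one expression. A minimal runnable sketch, where `To` is a local stand-in for `k8s.io/utils/ptr.To` and the types are simplified stand-ins for the config API:

```go
package main

import "fmt"

// RequeuingTimestamp is a local stand-in for the config enum.
type RequeuingTimestamp string

const EvictionTimestamp RequeuingTimestamp = "Eviction"

// WaitForPodsReady is a pared-down stand-in for the config section.
type WaitForPodsReady struct {
	RequeuingTimestamp *RequeuingTimestamp
}

// To is a local stand-in for k8s.io/utils/ptr.To.
func To[T any](v T) *T { return &v }

// setDefaults fills RequeuingTimestamp only when it is unset, mirroring the
// nil-guarded defaulting discussed in this thread.
func setDefaults(w *WaitForPodsReady) {
	if w.RequeuingTimestamp == nil {
		w.RequeuingTimestamp = To(EvictionTimestamp)
	}
}

func main() {
	w := &WaitForPodsReady{}
	setDefaults(w)
	fmt.Println(*w.RequeuingTimestamp) // Eviction
}
```

The generic helper avoids naming a throwaway variable just to take its address, which is exactly why the reviewer prefers it here.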
```go
@@ -38,7 +39,10 @@ func TestFIFOClusterQueue(t *testing.T) {
	Spec: kueue.ClusterQueueSpec{
```
I actually preferred putting `&kueue.ClusterQueue` on a new line.
Changing back to new line
```go
})

ginkgo.By("waiting the timeout, the first workload should be evicted and readmitted", func() {
	time.Sleep(podsReadyTimeout)
```
Use Eventually to detect that the Workload gets the Eviction condition.
`util.FinishEvictionForWorkloads()` has a retry loop built in - using that and removing the sleep.
```go
})

ginkgo.By("waiting the timeout, the first workload should be evicted and the second should be admitted", func() {
	time.Sleep(podsReadyTimeout)
```
Use Eventually to detect that the Workload gets the Eviction condition.
Removing the sleep
just some nits.
apis/config/v1beta1/defaults.go
Outdated
```go
@@ -41,6 +39,7 @@ const (
	defaultPodsReadyTimeout = 5 * time.Minute
	DefaultQueueVisibilityUpdateIntervalSeconds int32 = 5
	DefaultClusterQueuesMaxCount int32 = 10
	DefaultJobFrameworkName = "batch/job"
```
```diff
-DefaultJobFrameworkName = "batch/job"
+defaultJobFrameworkName = "batch/job"
```

Is this constant used from external packages?
Good catch, un-exporting
apis/config/v1beta1/defaults_test.go
Outdated
```diff
@@ -312,7 +310,7 @@ func TestSetDefaults_Configuration(t *testing.T) {
 	QueueVisibility: defaultQueueVisibility,
 	},
 },
-"defaulting waitForPodsReady.timeout": {
+"defaulting waitForPodsReady.*": {
```
```diff
-"defaulting waitForPodsReady.*": {
+"defaulting waitForPodsReady values": {
```
Updated
pkg/queue/cluster_queue_impl_test.go
Outdated
```go
@@ -35,8 +36,14 @@ const (
	defaultNamespace = "default"
)

var (
	defaultQueueOrdering = queueOrderingFunc(workload.Ordering{
```
```diff
-	defaultQueueOrdering = queueOrderingFunc(workload.Ordering{
+	defaultQueueOrderingFunc = queueOrderingFunc(workload.Ordering{
```
Updated
```go
})

ginkgo.It("Should prioritize workloads submitted earlier", func() {
	localQueueName := "eviction-creation-lq"
```
Why do we need to define a new lq? Should we use `prodQueue`?
Good question, removing
```go
})

ginkgo.It("Should keep the evicted workload at the front of the queue", func() {
	localQueueName := "eviction-creation-lq"
```
Here is the same comment.
Removing
```go
ginkgo.By("waiting the timeout, the first workload should be evicted and readmitted", func() {
	time.Sleep(podsReadyTimeout)
	util.FinishEvictionForWorkloads(ctx, k8sClient, wl1)
	util.ExpectWorkloadsToHaveQuotaReservation(ctx, k8sClient, prodClusterQ.Name, wl1)
```
Should we also verify what happens to wl2, does it remain pending?
Good catch, it gets admitted via borrowed capacity from the dev queue. I am now creating a standalone queue and asserting that it does not get admitted.
sgtm
```go
defer func() {
	gomega.Expect(util.DeleteClusterQueue(ctx, k8sClient, standaloneClusterQ)).Should(gomega.Succeed())
}()
```
Should we perform this in a `ginkgo.AfterEach` in this Context node instead of the `defer`?
IIUC this is because the queues are declared only within the `It`, while the `AfterEach` is for all tests. Alternatively, we could declare the variable for all tests in the context, and in AfterEach delete if not nil, but I don't hold a view on which one is better.
> Alternatively, we could declare the variable for all tests in the context, and in afterEach delete if not nil

Yes, if we want to use `AfterEach`, we must put `standaloneClusterQ` on line 548.

> but I don't hold a view which one is better.

I would suggest using `AfterEach`.
I updated to use a `JustBeforeEach()` (can't use `BeforeEach()` b/c you need the `k8sClient`) and an `AfterEach()`.
Ah, we set up the k8sClient in the `JustBeforeEach()` node... Ideally, we want to set up the k8sClient in `BeforeAll()` or `BeforeEach()`. However, I think that investigation is out of scope in this PR.
```go
defer func() {
	gomega.Expect(util.DeleteLocalQueue(ctx, k8sClient, standaloneQueue)).Should(gomega.Succeed())
}()
```
ditto.
Done
Due to #1593
```go
@@ -192,8 +192,22 @@ type WaitForPodsReady struct {
	// until the jobs reach the PodsReady=true condition. It defaults to false if Enable is false
	// and defaults to true otherwise.
	BlockAdmission *bool `json:"blockAdmission,omitempty"`

	// RequeuingTimestamp defines the timestamp used for requeuing a Workload
	// that was evicted due to Pod readiness. Defaults to Eviction (back of queue).
```
I'm wondering if "(back of queue)" could be misleading; there could be lower priority workloads that are even further behind in the queue order, because priority is checked first:

kueue/pkg/queue/cluster_queue_strict_fifo.go, lines 50 to 55 in 2790286:

```go
p1 := utilpriority.Priority(objA.Obj)
p2 := utilpriority.Priority(objB.Obj)
if p1 != p2 {
	return p1 > p2
}
```
Agree, done
/approve
Please address the suggestions for the integration test.
```go
ginkgo.BeforeEach(func() {
	podsReadyTimeout = 3 * time.Second
	requeuingTimestamp = config.CreationTimestamp
})
```
there should be a corresponding `AfterEach` resetting the original values.
This is not done in the other tests today (they all set these values in their `BeforeEach`es). Would you like me to modify the others?
Ideally, yes, but we can leave it for a follow up.
Just add an AfterEach for this new case in this PR.
Gotcha, I guess it makes sense to add some variables for tracking the original values?
Take a look at this commit to see if that is what you prefer.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: alculquicondor, nstogner. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
LGTM
@nstogner Thank you!
/lgtm
/hold cancel
LGTM label has been added. Git tree hash: 2d4a963ffcad71f03c8030b210b27ecb3b7f716e
/release-note-edit
What type of PR is this?
/kind feature
What this PR does / why we need it:
Allow admins to choose the timestamp used for sorting workloads after eviction due to wait-for-pods-ready. See KEP-1282.
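Assuming the field name introduced in this PR's diff (`requeuingTimestamp` under `waitForPodsReady`), an admin could opt into creation-timestamp ordering via the Kueue configuration; a hedged sketch of such a config fragment:

```yaml
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
waitForPodsReady:
  enable: true
  timeout: 5m
  # Creation keeps an evicted workload at its original position among
  # same-priority workloads; Eviction (the default) orders it by the
  # time it was evicted.
  requeuingTimestamp: Creation
```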
Which issue(s) this PR fixes:
Fixes #1282
Special notes for your reviewer:
Does this PR introduce a user-facing change?