-
Notifications
You must be signed in to change notification settings - Fork 254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent borrowing when preemption could help #475
Prevent borrowing when preemption could help #475
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
c8557f3
to
24f3653
Compare
/assign @ahg-g |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Major comments:
- IIUC when there is a workload requesting more than
>flavor.Min
, but<flavor.Max
then we don't attempt preemption (neither passive nor active) to make room for it (eitherFit
orNoFit
mode is returned). IIUC, I suggest to make it clear in the comments and PR description or support. - I find the description of the modes hard to understand, suggested rephrasing
- I find the logic in
fitsFlavorLimits
hard to follow. Suggested avoiding mutation ofmode
to improve it.
// If it fits, also returns any borrowing required. | ||
func fitsFlavorLimits(rName corev1.ResourceName, val int64, cq *cache.ClusterQueue, flavor *cache.FlavorLimits) (int64, *Status) { | ||
func fitsFlavorLimits(rName corev1.ResourceName, val int64, cq *cache.ClusterQueue, flavor *cache.FlavorLimits) (FlavorAssignmentMode, int64, *Status) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when val > flavor.Min
either Fit
or NoFit
is returned - meaning that we don't attempt to trigger preemption, but preemption could help in that case as long as val <= flavor.Max
by using freed cohort resources. I think we should make it clear or support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point, I think we want to support this for higher priority jobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, this sentence matches my case "The proposed policies for preemption within cohort require that the Workload fits within the min quota of the ClusterQueue.". I'm fine with deferring the implementation then.
Although I'm a bit confused by the next sentence: "In other words, we don't try to borrow quota when preempting.". IIUC, it is possible (but maybe unlikely) that a big high priority workload gets starved and cannot borrow while the smaller workloads don't trigger preemption cause they are constantly fitting inside their ClusterQueues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but if a single high priority workload uses more than the entire min quota for the ClusterQueue, you are probably asking for trouble :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, wondering if there is a good place in the docs to explain that, but maybe it can wait for feedback from users if it is a real scenario
mode = ClusterQueuePreempt | ||
} | ||
borrow := used + val - flavor.Min | ||
if borrow <= 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that readability could be improved by moving this if
into the if val <= flavor.Min {
branch to avoid mutating mode
and to make it clear that this happens only if val <= flavor.Min
. For example, due to the mutability of mode
it is not obvious if the comment under val <= flavor.Min
describes the entering of the branch or the semantics of the mode ClusterQueuePreempt
- because the semantics of the mode depends on what happens later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the mode ClusterQueuePreempt
still applies as we move down the logic, but the logic below tries to find a "better" one, and hence keeps changing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this function is attempting to find a better mode as it goes down.
And we need to calculate borrow
in all branches.
But I moved the calculation of borrow
to where borrowing is needed (when it fits in the unused quota)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I was thinking about something like this:
mode := NoFit
if val <= flavor.Min {
if val <= flavor.Min - used {
// The request can be satisfied by the min quota, if all active workloads
// from other ClusterQueues in the cohort are preempted.
mode = CohortReclaim
} else {
// The request can be satisfied by the min quota, if all active workloads
// in the ClusterQueue are preempted.
mode = ClusterQueuePreempt
}
}
...
if lack <= 0 {
borrow := used + val - flavor.Min
if borrow <= 0 {
borrow = 0
}
return Fit, borrow, nil
}
...
However, this recomputes the value for used + val - flavor.Min
, so I'm ok with keeping as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with the current flow. It might make it easier to split each mode into it's own independent function in the future, if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also happy with that approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to document the placement order, something like:
- If it fits within quota
- borrowing
- reclaiming lent quota
- preempting within the CQ based on priority
and discuss cases that we don't currently support:
- preempting to borrow
what else?
@@ -40,12 +43,56 @@ type Assignment struct { | |||
// usedResources is the accumulated usage of resources as podSets get | |||
// flavors assigned. | |||
usage cache.ResourceQuantities | |||
|
|||
// repMode is the cached representative mode for this assignment. | |||
repMode *FlavorAssignmentMode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assignmentMode so that it is aligned with the type name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll use the full word "representative". This field is already part of the struct "Assignment", so I think that would be repetitive.
if flavor.Max != nil && used+val > *flavor.Max { | ||
status.append(fmt.Sprintf("borrowing limit for %s flavor %s exceeded", rName, flavor.Name)) | ||
return 0, &status | ||
return mode, 0, &status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this if statement will never be entered if we entered the previous one, perhaps hook them via an else to emphasize that we are now looking at whether we can borrow resources:
} else if flavor.Max != nil && used+val > *flavor.Max {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can also be thought the other way: if we are past Max, we can stop and we don't need to check how much borrowing we need. Moved it up.
status.append(fmt.Sprintf("insufficient quota for %s flavor %s, %s more needed", rName, flavor.Name, &lackQuantity)) | ||
if lack <= 0 { | ||
return Fit, borrow, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel the function will be more readable if we move this check and early exit to be done first thing in the function (first, we check if we fit without preemption).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried, but we actually need to check if we are past the Max first. Also, this fits the narrative that we are trying to find a better mode as we go down.
// If it fits, also returns any borrowing required. | ||
func fitsFlavorLimits(rName corev1.ResourceName, val int64, cq *cache.ClusterQueue, flavor *cache.FlavorLimits) (int64, *Status) { | ||
func fitsFlavorLimits(rName corev1.ResourceName, val int64, cq *cache.ClusterQueue, flavor *cache.FlavorLimits) (FlavorAssignmentMode, int64, *Status) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point, I think we want to support this for higher priority jobs.
0669c6b
to
4808831
Compare
// the resources that the podset requests. Each assigned flavor is accompanied | ||
// with an AssignmentMode. | ||
// Empty flavors can be interpreted as NoFit mode for all the resources. | ||
// Empty status can be interpreted as Fit mode for all the resources. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clarify that Status should not be nil if Flavors is empty and vice versa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// with lower priority. | ||
// ClusterQueuePreempt means that there is not enough unused min quota in the | ||
// ClusterQueue. Preempting other workloads in the ClusterQueue or waiting for | ||
// them to finish might allow to assign this flavor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"may make it possible to assign this flavor"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -138,17 +138,24 @@ func (s *Status) Equal(o *Status) bool { | |||
})) | |||
} | |||
|
|||
// PodSetAssignment holds the assigned flavors and status messages for each of | |||
// the resources that the podset requests. Each assigned flavor is accompanied |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: in some comments it is podset and other it is pod sets...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😅
using "pod set" now.
Assign flavors that could be satisfied with preemption. If we find such assigment, prevent borrowing in the cohort.
Change-Id: I7c3aebf552bef138ff86bc723d54c0ad083095a5
f50078c
to
8aa5e4a
Compare
squashed into two commits |
/lgtm in case Michal has other comments |
LGTM |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Calculate flavor assignments that could be satisfied with preemption.
If we find such assignment, prevent borrowing in the cohort. Actual preemption will be implemented in a follow up.
Which issue(s) this PR fixes:
Part of #83
This also solves what #403 was trying to solve.
Special notes for your reviewer: