Prevent borrowing when preemption could help #475

Merged on Dec 21, 2022 (2 commits)

@alculquicondor alculquicondor commented Dec 13, 2022

What type of PR is this?

/kind feature

What this PR does / why we need it:

Calculate flavor assignments that could be satisfied with preemption.

If we find such an assignment, prevent borrowing in the cohort. Actual preemption will be implemented in a follow-up.

Which issue(s) this PR fixes:

Part of #83

This also solves what #403 was trying to solve.

Special notes for your reviewer:

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 13, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 13, 2022
@alculquicondor alculquicondor force-pushed the flavor-modes branch 2 times, most recently from c8557f3 to 24f3653 Compare December 14, 2022 20:20
@alculquicondor alculquicondor changed the title WIP: Prevent borrowing when preemption could help Prevent borrowing when preemption could help Dec 14, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 14, 2022
@alculquicondor
Contributor Author

/assign @ahg-g

Contributor

@mimowo mimowo left a comment

Major comments:

  • IIUC, when a workload requests more than flavor.Min but less than flavor.Max, we don't attempt preemption (neither passive nor active) to make room for it (either Fit or NoFit mode is returned). I suggest making this clear in the comments and the PR description, or supporting it.
  • I find the description of the modes hard to understand; I suggested a rephrasing.
  • I find the logic in fitsFlavorLimits hard to follow. I suggested avoiding the mutation of mode to improve it.

pkg/scheduler/flavorassigner/flavorassigner.go
 // If it fits, also returns any borrowing required.
-func fitsFlavorLimits(rName corev1.ResourceName, val int64, cq *cache.ClusterQueue, flavor *cache.FlavorLimits) (int64, *Status) {
+func fitsFlavorLimits(rName corev1.ResourceName, val int64, cq *cache.ClusterQueue, flavor *cache.FlavorLimits) (FlavorAssignmentMode, int64, *Status) {
Contributor

When val > flavor.Min, either Fit or NoFit is returned, meaning that we don't attempt to trigger preemption. However, preemption could help in that case, as long as val <= flavor.Max, by using freed cohort resources. I think we should make this clear or support it.

Contributor

That is a good point, I think we want to support this for higher priority jobs.

Contributor Author

Contributor

@mimowo mimowo Dec 20, 2022

I see, this sentence matches my case: "The proposed policies for preemption within cohort require that the Workload fits within the min quota of the ClusterQueue." I'm fine with deferring the implementation then.

Although I'm a bit confused by the next sentence: "In other words, we don't try to borrow quota when preempting." IIUC, it is possible (but maybe unlikely) that a big high-priority workload gets starved and cannot borrow, while the smaller workloads don't trigger preemption because they constantly fit inside their ClusterQueues.
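
The quoted policy can be read as a simple guard on the request size. The sketch below is only an illustration of that reading: eligibleForPreemption and its parameters are hypothetical names, not part of the PR.

```go
package main

// eligibleForPreemption reports whether preemption is considered at all for
// a request of size val. Under the quoted policy the request must fit within
// the min quota of the ClusterQueue; the borrowing limit plays no role.
// Hypothetical helper, for illustration only.
func eligibleForPreemption(val, minQuota int64) bool {
	return val <= minQuota
}
```

With a min quota of 10 and a borrowing limit of 20, a request of 12 is never eligible, whatever its priority. It can only be admitted if it happens to fit by borrowing, which is the starvation scenario raised above.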

Contributor Author

Yes, but if a single high-priority workload uses more than the entire min quota for the ClusterQueue, you are probably asking for trouble :)

Contributor

Agreed. I wonder if there is a good place in the docs to explain that, but maybe it can wait for feedback from users on whether it is a real scenario.

pkg/scheduler/flavorassigner/flavorassigner.go
mode = ClusterQueuePreempt
}
borrow := used + val - flavor.Min
if borrow <= 0 {
Contributor

It seems to me that readability could be improved by moving this if into the if val <= flavor.Min { branch, to avoid mutating mode and to make it clear that this happens only if val <= flavor.Min. For example, because mode is mutable, it is not obvious whether the comment under val <= flavor.Min describes entering the branch or the semantics of the ClusterQueuePreempt mode, since the semantics of the mode depend on what happens later.

Contributor

I think the mode ClusterQueuePreempt still applies as we move down the logic, but the logic below tries to find a "better" one, and hence keeps changing.

Contributor Author

Yes, this function is attempting to find a better mode as it goes down.

And we need to calculate borrow in all branches.

But I moved the calculation of borrow to where borrowing is needed (when it fits in the unused quota)

Contributor

I see, I was thinking about something like this:

mode := NoFit
if val <= flavor.Min {
	if val <= flavor.Min - used {
		// The request can be satisfied by the min quota, if all active workloads
		// from other ClusterQueues in the cohort are preempted.
		mode = CohortReclaim
	} else {
		// The request can be satisfied by the min quota, if all active workloads
		// in the ClusterQueue are preempted.
		mode = ClusterQueuePreempt
	}
}
...
if lack <= 0 {
	borrow := used + val - flavor.Min
	if borrow <= 0 {
		borrow = 0
	}
	return Fit, borrow, nil
}
...

However, this recomputes the value for used + val - flavor.Min, so I'm ok with keeping as is.

Contributor Author

I'm happy with the current flow. It might make it easier to split each mode into its own independent function in the future, if necessary.

Contributor

I'm also happy with that approach

Contributor

@ahg-g ahg-g left a comment

We need to document the placement order, something like:

  1. If it fits within quota
  2. borrowing
  3. reclaiming lent quota
  4. preempting within the CQ based on priority

and discuss cases that we don't currently support:

  1. preempting to borrow

What else?
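
The placement order listed above can be sketched as an ordered enumeration in which a better outcome compares greater. The mode names (Fit, CohortReclaim, ClusterQueuePreempt, NoFit) appear in this PR; the numeric ordering, the placementStep helper, and the use of a separate borrow amount to tell steps 1 and 2 apart are assumptions made for illustration.

```go
package main

// FlavorAssignmentMode ranks the outcomes of trying to assign a flavor.
// The names come from this PR; the numeric ordering is an assumption.
type FlavorAssignmentMode int

const (
	// NoFit: the request cannot be satisfied, even with preemption.
	NoFit FlavorAssignmentMode = iota
	// ClusterQueuePreempt: step 4, preempting within the ClusterQueue.
	ClusterQueuePreempt
	// CohortReclaim: step 3, reclaiming quota lent to the cohort.
	CohortReclaim
	// Fit: steps 1 and 2, distinguished by the borrowing required.
	Fit
)

// placementStep maps a mode and a borrow amount to the step of the placement
// order it corresponds to. Hypothetical helper, for illustration only.
func placementStep(mode FlavorAssignmentMode, borrow int64) string {
	switch mode {
	case Fit:
		if borrow > 0 {
			return "borrowing"
		}
		return "fits within quota"
	case CohortReclaim:
		return "reclaiming lent quota"
	case ClusterQueuePreempt:
		return "preempting within the ClusterQueue"
	default:
		return "no fit"
	}
}
```

The unsupported case, preempting to borrow, has no mode of its own in this sketch.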

@@ -40,12 +43,56 @@ type Assignment struct {
// usedResources is the accumulated usage of resources as podSets get
// flavors assigned.
usage cache.ResourceQuantities

// repMode is the cached representative mode for this assignment.
repMode *FlavorAssignmentMode
Contributor

assignmentMode so that it is aligned with the type name?

Contributor Author

I'll use the full word "representative". This field is already part of the struct "Assignment", so I think that would be repetitive.
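
A minimal sketch of how such a cached field might be used is shown below. The field name representativeMode follows the reply above. The simplified PodSetAssignment, the rule of taking the worst mode across pod sets, and the ordering of the modes are assumptions, not taken from the PR.

```go
package main

// FlavorAssignmentMode ranks assignment outcomes; a larger value is better.
// The ordering is an assumption made for this sketch.
type FlavorAssignmentMode int

const (
	NoFit FlavorAssignmentMode = iota
	ClusterQueuePreempt
	CohortReclaim
	Fit
)

// PodSetAssignment is a hypothetical simplification that keeps only the mode
// chosen for each resource of one pod set.
type PodSetAssignment struct {
	Modes []FlavorAssignmentMode
}

// Assignment holds the pod set assignments and a lazily computed cache.
type Assignment struct {
	PodSets []PodSetAssignment

	// representativeMode is the cached representative mode for this assignment.
	representativeMode *FlavorAssignmentMode
}

// RepresentativeMode returns the worst mode among all pod sets, computing it
// on the first call and reusing the cached value afterwards.
func (a *Assignment) RepresentativeMode() FlavorAssignmentMode {
	if a.representativeMode != nil {
		return *a.representativeMode
	}
	mode := Fit
	if len(a.PodSets) == 0 {
		mode = NoFit
	}
	for _, ps := range a.PodSets {
		if len(ps.Modes) == 0 {
			// No assigned flavors is interpreted as NoFit.
			mode = NoFit
		}
		for _, m := range ps.Modes {
			if m < mode {
				mode = m
			}
		}
	}
	a.representativeMode = &mode
	return mode
}
```

The cache is never invalidated in this sketch, so it is only safe once the pod set assignments are final.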

pkg/scheduler/flavorassigner/flavorassigner.go
 if flavor.Max != nil && used+val > *flavor.Max {
 	status.append(fmt.Sprintf("borrowing limit for %s flavor %s exceeded", rName, flavor.Name))
-	return 0, &status
+	return mode, 0, &status
Contributor

This if statement will never be entered if we entered the previous one. Perhaps hook them together via an else, to emphasize that we are now looking at whether we can borrow resources:

} else if flavor.Max != nil && used+val > *flavor.Max {

Contributor Author

It can also be thought of the other way around: if we are past Max, we can stop, and we don't need to check how much borrowing we need. Moved it up.

pkg/scheduler/flavorassigner/flavorassigner.go
status.append(fmt.Sprintf("insufficient quota for %s flavor %s, %s more needed", rName, flavor.Name, &lackQuantity))
if lack <= 0 {
return Fit, borrow, nil
}
Contributor

I feel the function will be more readable if we move this check and early exit to be done first thing in the function (first, we check if we fit without preemption).

Contributor Author

I tried, but we actually need to check if we are past the Max first. Also, this fits the narrative that we are trying to find a better mode as we go down.
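
The flow discussed in this thread (check the borrowing limit first, then look for a better mode on the way down) might look roughly like the following simplified sketch. The flavorQuota struct and its fields are hypothetical stand-ins for the values the real function reads from cache.ClusterQueue and cache.FlavorLimits, a plain string replaces *Status, and the ordering of the modes is an assumption.

```go
package main

import "fmt"

// FlavorAssignmentMode ranks assignment outcomes; a larger value is better.
type FlavorAssignmentMode int

const (
	NoFit FlavorAssignmentMode = iota
	ClusterQueuePreempt
	CohortReclaim
	Fit
)

// flavorQuota is a hypothetical, flattened view of quota and usage.
type flavorQuota struct {
	Min             int64  // min quota of the ClusterQueue for this flavor
	Max             *int64 // optional borrowing limit; nil means no limit
	Used            int64  // usage by this ClusterQueue
	CohortUsed      int64  // usage by the whole cohort
	CohortAvailable int64  // total quota the cohort can hand out
}

// fitsFlavorLimits returns the best mode found, the borrowing required when
// the request fits, and a message explaining why it does not fit.
func fitsFlavorLimits(val int64, q flavorQuota) (FlavorAssignmentMode, int64, string) {
	mode := NoFit
	if val <= q.Min {
		// Preempting every workload in this ClusterQueue would make room.
		mode = ClusterQueuePreempt
	}
	// Past the borrowing limit, there is no need to compute borrowing.
	if q.Max != nil && q.Used+val > *q.Max {
		return mode, 0, "borrowing limit exceeded"
	}
	borrow := q.Used + val - q.Min
	if borrow <= 0 {
		// The unused min quota covers the request, so reclaiming quota lent
		// to other ClusterQueues in the cohort would make room.
		borrow = 0
		mode = CohortReclaim
	}
	if lack := q.CohortUsed + val - q.CohortAvailable; lack > 0 {
		return mode, 0, fmt.Sprintf("insufficient quota, %d more needed", lack)
	}
	return Fit, borrow, ""
}
```

Note that a request larger than Min can only end in Fit or NoFit here, which is the limitation raised earlier in the review.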

pkg/scheduler/flavorassigner/flavorassigner.go
// the resources that the podset requests. Each assigned flavor is accompanied
// with an AssignmentMode.
// Empty flavors can be interpreted as NoFit mode for all the resources.
// Empty status can be interpreted as Fit mode for all the resources.
Contributor

Clarify that Status should not be nil if Flavors is empty, and vice versa.

Contributor Author

Done
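
The invariant requested in this thread could be expressed as a small check. The types below are simplified, hypothetical stand-ins for the ones in the PR, and the consistent method is not part of it.

```go
package main

// Status is a simplified stand-in that only carries the reasons why a
// flavor could not be assigned in Fit mode.
type Status struct {
	reasons []string
}

// PodSetAssignment is a simplified stand-in for the type discussed above.
type PodSetAssignment struct {
	Flavors map[string]string // resource name -> assigned flavor name
	Status  *Status
}

// consistent reports whether the assignment respects the invariant: Flavors
// must not be empty when Status is nil, and Status must not be nil when
// Flavors is empty. Both may be set, for example in a preemption mode.
func (p PodSetAssignment) consistent() bool {
	return len(p.Flavors) > 0 || p.Status != nil
}
```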

// with lower priority.
// ClusterQueuePreempt means that there is not enough unused min quota in the
// ClusterQueue. Preempting other workloads in the ClusterQueue or waiting for
// them to finish might allow to assign this flavor.
Contributor

"may make it possible to assign this flavor"?

Contributor Author

Done

@@ -138,17 +138,24 @@ func (s *Status) Equal(o *Status) bool {
}))
}

// PodSetAssignment holds the assigned flavors and status messages for each of
// the resources that the podset requests. Each assigned flavor is accompanied
Contributor

nit: in some comments it is "podset" and in others it is "pod sets"...

Contributor Author

😅

using "pod set" now.

Assign flavors that could be satisfied with preemption.

If we find such an assignment, prevent borrowing in the cohort.
Change-Id: I7c3aebf552bef138ff86bc723d54c0ad083095a5
@alculquicondor
Contributor Author

squashed into two commits

@ahg-g
Contributor

ahg-g commented Dec 20, 2022

/lgtm
/hold

in case Michal has other comments

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Dec 20, 2022
@mimowo
Contributor

mimowo commented Dec 21, 2022

LGTM
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 21, 2022
@k8s-ci-robot k8s-ci-robot merged commit 7b8316f into kubernetes-sigs:main Dec 21, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
4 participants