-
Notifications
You must be signed in to change notification settings - Fork 267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-1224: Lending Limit to the cohort #1331
Conversation
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
|
Hi @B1F030. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cc @kerthcet |
/remove-kind api-change |
This is only a proposal, no user-facing change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First found of review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM.
|
||
# The following PRR answers are required at alpha release | ||
# List the feature gate name and the components for which it must be enabled | ||
feature-gates: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We add a feature gate here for experiment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plz don't squash or I have no idea where you have changed since last review.
cc @alculquicondor @tenzen-y can you take a look then we can push forward. |
Co-authored-by: kerthcet <kerthcet@gmail.com> Signed-off-by: B1F030 <646337422@qq.com>
I finally came here. So I will review this KEP this week. |
For more information, WIP, first commit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, lgtm.
@B1F030 Thanks for this effort :)
The design for test cases was especially helpful in reviewing this proposal.
keps/1224-lending-limit/README.md
Outdated
- When cq-a's BorrowingLimit unset, cq-a can borrow as much as `cq-b's LendingLimit`. | ||
- When cq-a's BorrowingLimit set, cq-a can borrow as much as `min(cq-b's LendingLimit, cq-a's BorrowingLimit)`. | ||
- In a ClusterQueue with 2 ResourceFlavors a, b: | ||
- When rf-b's LendingLimit set, and FlavorFungibility set to `WhenCanBorrow: Borrow`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest adding tests for the whenCanBorrow: TryNextFlavor
case since we recently found some hidden bugs related to fungibility. So, to avoid such a case, I want to add tests for the whenCanBorrow: TryNextFlavor
.
However, we can confirm in the implementation PR if it's case is valuable.
It means that I'm ok without updating here now.
keps/1224-lending-limit/README.md
Outdated
With both BorrowingLimit and LendingLimit configured, one clusterQueue may not be able to borrow up to the limit just because we reserved the lending limit quota of resource. | ||
|
||
To reduce confusion, we will recommend to users to only set borrowingLimit or lendingLimit, but not both, even though both will be supported at the same time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, there is such a confusion. However, I don't think that we should recommend users not to use both BorrowingLimit
and LendingLimit
. Because using both BorrowingLimit
and LendingLimit
would be helpful for jobs with autoscale / elastic semantics.
I would suggest to remove Line 66.
@alculquicondor @B1F030 WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok either way. I remember seeing a similar recommendation in YARN not to use borrowingLimit and weight at the same time.
Another alternative would be some form of troubleshooting, for example:
Why is my CQ not borrowing?
Check borrowingLimit of your CQ and lendingLimit of other CQs in the cohort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok either way. I remember seeing a similar recommendation in YARN not to use borrowingLimit and weight at the same time.
Another alternative would be some form of troubleshooting, for example:
Why is my CQ not borrowing?
Check borrowingLimit of your CQ and lendingLimit of other CQs in the cohort.
I prefer to take this case (BorrowingLimit
and LendingLimit
) as a troubleshooting technique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable. I agree to remove Line 66.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not about we should only set borrowingLimit
or lendingLimit
or both, but a practice that if we configured the lendingLimit
for demand, then borrowingLimit
could be omitted for the maximum resource utilization as there's no borrowingLimit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plz update the words.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not about we should only set
borrowingLimit
orlendingLimit
or both, but a practice that if we configured thelendingLimit
for demand, thenborrowingLimit
could be omitted for the maximum resource utilization as there's no borrowingLimit.
@kerthcet Does this mean the following situation?
cq-a and cq-b belong to the same cohort.
-
cq-a:
cpu: 10
borrowingLimit: 5 -
cq-b:
cpu: 5
lendingLimit: 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can set the borrowingLimit or lendingLimit by your demands, but if we set the lendingLimit in cq-b, then cq-a can leave the borrowingLimit unset to consume the resources as much as possible, as
- cq-a:
cpu: 10
# borrowingLimit: 5 # no need to set this if you want to maximum the resource utilization
# and don't need to worry about use too much resources because other clusterQueues can
# constrain this by setting lendingLimit.
- cq-b:
cpu: 5
lendingLimit: X
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it makes sense. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
I'll leave the LGTM to @tenzen-y
keps/1224-lending-limit/README.md
Outdated
With both BorrowingLimit and LendingLimit configured, one clusterQueue may not be able to borrow up to the limit just because we reserved the lending limit quota of resource. | ||
|
||
To reduce confusion, we will recommend to users to only set borrowingLimit or lendingLimit, but not both, even though both will be supported at the same time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok either way. I remember seeing a similar recommendation in YARN not to use borrowingLimit and weight at the same time.
Another alternative would be some form of troubleshooting, for example:
Why is my CQ not borrowing?
Check borrowingLimit of your CQ and lendingLimit of other CQs in the cohort.
Co-authored-by: kerthcet <kerthcet@gmail.com> Signed-off-by: B1F030 <646337422@qq.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@B1F030 Thanks!
/lgtm
/approve
LGTM label has been added. Git tree hash: b093ca59cb765a191d2f11299b34ca1c8d6b5105
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, B1F030, tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
* KEP-1224: Lending Limit to the cohort Co-authored-by: kerthcet <kerthcet@gmail.com> Signed-off-by: B1F030 <646337422@qq.com> * add test for whenCanBorrow: TryNextFlavor Co-authored-by: kerthcet <kerthcet@gmail.com> Signed-off-by: B1F030 <646337422@qq.com> --------- Signed-off-by: B1F030 <646337422@qq.com> Co-authored-by: kerthcet <kerthcet@gmail.com>
What type of PR is this?
/kind document
What this PR does / why we need it:
This proposal provides a guaranteed resource quota for user. By setting
lending limit
, users can have a reservation of resource quota(nominalQuota
-lendingLimit
) that will never be borrowed by other clusterqueues in the same cohort.Which issue(s) this PR fixes:
#1224
Special notes for your reviewer:
Does this PR introduce a user-facing change?