KEP-1224: Lending Limit to the cohort #1331

Merged on Dec 4, 2023 (3 commits)

Member

@B1F030 B1F030 commented Nov 14, 2023

What type of PR is this?

/kind document

What this PR does / why we need it:

This proposal provides a guaranteed resource quota for users. By setting a lending limit, users can reserve part of their resource quota (nominalQuota - lendingLimit) that will never be borrowed by other ClusterQueues in the same cohort.
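
To make the reservation arithmetic concrete, here is a minimal Go sketch. The function name `reservedQuota` and the use of a nil pointer for an unset limit are illustrative assumptions, not Kueue's actual API; the sketch also assumes that an unset lendingLimit means the whole nominal quota can be lent.

```go
package main

import "fmt"

// reservedQuota returns the part of nominalQuota that other ClusterQueues
// in the cohort can never borrow: nominalQuota - lendingLimit.
// A nil lendingLimit models an unset limit (assumption: nothing is reserved).
func reservedQuota(nominalQuota int64, lendingLimit *int64) int64 {
	if lendingLimit == nil {
		return 0
	}
	// Guard against a lending limit at or above the nominal quota.
	if *lendingLimit >= nominalQuota {
		return 0
	}
	return nominalQuota - *lendingLimit
}

func main() {
	lendingLimit := int64(4)
	fmt.Println(reservedQuota(10, &lendingLimit)) // 10 - 4
	fmt.Println(reservedQuota(10, nil))           // unset limit, nothing reserved
}
```

With a nominal quota of 10 and a lending limit of 4, 6 units stay reserved for the owning ClusterQueue.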

Which issue(s) this PR fixes:

#1224

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Nov 14, 2023
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 14, 2023

netlify bot commented Nov 14, 2023

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit b2ae689
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/6569be496497a90008e36e1e

@k8s-ci-robot
Contributor

Hi @B1F030. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 14, 2023
@B1F030
Member Author

B1F030 commented Nov 14, 2023

/cc @kerthcet
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 14, 2023
@B1F030 B1F030 changed the title KEP-1224: Lending Limit to the cohort [WIP] KEP-1224: Lending Limit to the cohort Nov 14, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 14, 2023
@kerthcet
Contributor

/remove-kind api-change
/kind documentation
/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. kind/documentation Categorizes issue or PR as related to documentation. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Nov 15, 2023
@kerthcet
Contributor

This is only a proposal, no user-facing change.

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Nov 15, 2023
Contributor

@kerthcet kerthcet left a comment

First round of review.

keps/1224-lending-limit/README.md (outdated, resolved)
keps/1224-lending-limit/kep.yaml (resolved)
keps/1224-lending-limit/README.md (outdated, resolved)
keps/1224-lending-limit/kep.yaml (outdated, resolved)
keps/1224-lending-limit/README.md (outdated, resolved)
keps/1224-lending-limit/README.md (outdated, resolved)
keps/1224-lending-limit/README.md (outdated, resolved)
Contributor

@kerthcet kerthcet left a comment

Generally LGTM.

keps/1224-lending-limit/README.md (outdated, resolved)
keps/1224-lending-limit/README.md (outdated, resolved)
keps/1224-lending-limit/README.md (outdated, resolved)

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
Contributor

We add a feature gate here for experimentation.

keps/1224-lending-limit/README.md (outdated, resolved)
keps/1224-lending-limit/README.md (outdated, resolved)
keps/1224-lending-limit/README.md (outdated, resolved)
keps/1224-lending-limit/README.md (outdated, resolved)
@B1F030 B1F030 changed the title [WIP] KEP-1224: Lending Limit to the cohort KEP-1224: Lending Limit to the cohort Nov 21, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 21, 2023
Contributor

@kerthcet kerthcet left a comment

Please don't squash, otherwise I can't tell what has changed since the last review.

keps/1224-lending-limit/README.md (outdated, resolved)
keps/1224-lending-limit/README.md (outdated, resolved)
keps/1224-lending-limit/README.md (resolved)
keps/1224-lending-limit/README.md (outdated, resolved)
@kerthcet
Contributor

cc @alculquicondor @tenzen-y, can you take a look so we can move this forward?

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 24, 2023
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 24, 2023
Co-authored-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: B1F030 <646337422@qq.com>
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 24, 2023
@tenzen-y
Member

I finally got to this. I will review this KEP this week.

@B1F030
Member Author

B1F030 commented Nov 30, 2023

For more information, WIP, first commit

Member

@tenzen-y tenzen-y left a comment

Generally, lgtm.
@B1F030 Thanks for this effort :)

The design for test cases was especially helpful in reviewing this proposal.

- When cq-a's BorrowingLimit unset, cq-a can borrow as much as `cq-b's LendingLimit`.
- When cq-a's BorrowingLimit set, cq-a can borrow as much as `min(cq-b's LendingLimit, cq-a's BorrowingLimit)`.
- In a ClusterQueue with 2 ResourceFlavors a, b:
- When rf-b's LendingLimit set, and FlavorFungibility set to `WhenCanBorrow: Borrow`:
Member

@tenzen-y tenzen-y Nov 30, 2023

I would suggest adding tests for the whenCanBorrow: TryNextFlavor case, since we recently found some hidden bugs related to fungibility and tests for this case would help avoid similar ones.

However, we can confirm in the implementation PR whether this case is valuable.

That means I'm OK with not updating this here for now.
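
The two-queue borrowing rule from the test cases quoted above (BorrowingLimit unset: up to cq-b's LendingLimit; BorrowingLimit set: the minimum of the two) can be sketched in Go as follows. `maxBorrow` is a hypothetical helper written for illustration, not part of Kueue. It assumes a cohort with exactly two ClusterQueues and ignores current usage and resource flavors.

```go
package main

import "fmt"

// maxBorrow returns the most cq-a could borrow from cq-b in a two-queue cohort.
// lenderLendingLimit is cq-b's LendingLimit; borrowingLimit is cq-a's
// BorrowingLimit, where nil models an unset limit.
func maxBorrow(lenderLendingLimit int64, borrowingLimit *int64) int64 {
	if borrowingLimit == nil {
		// No BorrowingLimit: only the lender's LendingLimit caps borrowing.
		return lenderLendingLimit
	}
	// Both limits set: the smaller one is the binding constraint.
	if *borrowingLimit < lenderLendingLimit {
		return *borrowingLimit
	}
	return lenderLendingLimit
}

func main() {
	borrowingLimit := int64(5)
	fmt.Println(maxBorrow(3, nil))             // capped by LendingLimit
	fmt.Println(maxBorrow(3, &borrowingLimit)) // min(3, 5)
	fmt.Println(maxBorrow(8, &borrowingLimit)) // min(8, 5)
}
```

The second call shows the situation discussed in this thread: cq-a has a BorrowingLimit of 5 but can only borrow 3, because cq-b's LendingLimit is the tighter constraint.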

Comment on lines 64 to 66
With both BorrowingLimit and LendingLimit configured, one clusterQueue may not be able to borrow up to the limit just because we reserved the lending limit quota of resource.

To reduce confusion, we will recommend to users to only set borrowingLimit or lendingLimit, but not both, even though both will be supported at the same time.
Member

Indeed, this can be confusing. However, I don't think we should recommend that users avoid using both BorrowingLimit and LendingLimit, because using both would be helpful for jobs with autoscaling / elastic semantics.

I would suggest removing Line 66.
@alculquicondor @B1F030 WDYT?

Contributor

I'm ok either way. I remember seeing a similar recommendation in YARN not to use borrowingLimit and weight at the same time.

Another alternative would be some form of troubleshooting, for example:

Why is my CQ not borrowing?
Check borrowingLimit of your CQ and lendingLimit of other CQs in the cohort.

Member

I'm ok either way. I remember seeing a similar recommendation in YARN not to use borrowingLimit and weight at the same time.

Another alternative would be some form of troubleshooting, for example:

Why is my CQ not borrowing?
Check borrowingLimit of your CQ and lendingLimit of other CQs in the cohort.

I prefer to take this case (BorrowingLimit and LendingLimit) as a troubleshooting technique.

Member Author

Sounds reasonable. I agree to remove Line 66.

Contributor

This is not about whether we should set only borrowingLimit, only lendingLimit, or both. It is a practice: if lendingLimit is configured as needed, then borrowingLimit can be omitted to maximize resource utilization, since borrowing is then not capped by a borrowingLimit.

Contributor

Please update the wording.

Member

This is not about whether we should set only borrowingLimit, only lendingLimit, or both. It is a practice: if lendingLimit is configured as needed, then borrowingLimit can be omitted to maximize resource utilization, since borrowing is then not capped by a borrowingLimit.

@kerthcet Does this mean the following situation?

cq-a and cq-b belong to the same cohort.

  • cq-a:
    cpu: 10
    borrowingLimit: 5

  • cq-b:
    cpu: 5
    lendingLimit: 0

Contributor

You can set borrowingLimit or lendingLimit according to your needs, but if lendingLimit is set in cq-b, then cq-a can leave borrowingLimit unset to consume as many resources as possible, for example:

- cq-a:
  cpu: 10
  # borrowingLimit: 5 # no need to set this if you want to maximize resource utilization;
  # there is no need to worry about using too many resources, because other ClusterQueues
  # can constrain this by setting lendingLimit.

- cq-b:
  cpu: 5
  lendingLimit: X

Member

Ah, it makes sense. Thanks!

Contributor

@alculquicondor alculquicondor left a comment

/approve

I'll leave the LGTM to @tenzen-y

keps/1224-lending-limit/README.md (outdated, resolved)
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 30, 2023
B1F030 and others added 2 commits December 1, 2023 19:05
Co-authored-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: B1F030 <646337422@qq.com>
Member

@tenzen-y tenzen-y left a comment

@B1F030 Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 4, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: b093ca59cb765a191d2f11299b34ca1c8d6b5105

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, B1F030, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [alculquicondor,tenzen-y]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kerthcet
Contributor

kerthcet commented Dec 4, 2023

/hold cancel
Thanks @B1F030

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 4, 2023
@k8s-ci-robot k8s-ci-robot merged commit 8458264 into kubernetes-sigs:main Dec 4, 2023
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.6 milestone Dec 4, 2023
@B1F030 B1F030 deleted the KEP-1224 branch December 8, 2023 07:30
kannon92 pushed a commit to openshift-kannon92/kubernetes-sigs-kueue that referenced this pull request Nov 19, 2024
* KEP-1224: Lending Limit to the cohort

Co-authored-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: B1F030 <646337422@qq.com>

* add test for whenCanBorrow: TryNextFlavor

Co-authored-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: B1F030 <646337422@qq.com>

---------

Signed-off-by: B1F030 <646337422@qq.com>
Co-authored-by: kerthcet <kerthcet@gmail.com>
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • kind/documentation: Categorizes issue or PR as related to documentation.
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
5 participants