KEP-1714: Fair Sharing #1773

Merged 1 commit into kubernetes-sigs:main on Mar 18, 2024
Conversation

Contributor

@mwielgus mwielgus commented Feb 27, 2024

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

Defines how fair sharing of unused resources will work in Kueue.

Which issue(s) this PR fixes:

Part of #1714

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Feb 27, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 27, 2024

netlify bot commented Feb 27, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 3f8dcd3
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/65f82935d0631f0008d30432

@alculquicondor
Contributor

/cc @KunWuLuan @tenzen-y

@k8s-ci-robot
Contributor

@alculquicondor: GitHub didn't allow me to request PR reviews from the following users: KunWuLuan.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @KunWuLuan @tenzen-y


@alculquicondor
Contributor

cc @kerthcet

@tenzen-y
Member

/cc @KunWuLuan @tenzen-y

Yea, I should review this KEP.

TeamE can submit as many workloads and consume as many resources as they can while
TeamW is not at work and doesn’t need resources. However, once they arrive, some of
the already submitted workloads from TeamE may be preempted (preferably the least
important) to ensure equal extra space (regardless of their given quota) for both teams.
Contributor

Why ensure equal extra space here? I didn't understand the intention.

Contributor

space from the company-wide pool, as described in the first paragraph.
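
To make the "equal extra space" idea in this thread concrete, here is a minimal Go sketch. It is not the KEP's algorithm: it assumes a max-min split of the shared pool, and the function name `equalExtraShare`, the pool size, and the demand figures are all hypothetical. Nominal quotas are deliberately absent from the inputs, mirroring "regardless of their given quota".

```go
package main

import (
	"fmt"
	"sort"
)

// equalExtraShare is a hypothetical helper (not part of Kueue). It splits
// `pool` units of unused capacity so that every team gets an equal part,
// capped at what the team actually asks for (max-min fairness).
func equalExtraShare(pool int, demand map[string]int) map[string]int {
	names := make([]string, 0, len(demand))
	for n := range demand {
		names = append(names, n)
	}
	// Serve the smallest demands first so that unused entitlement
	// flows to the teams that want more.
	sort.Slice(names, func(i, j int) bool {
		if demand[names[i]] != demand[names[j]] {
			return demand[names[i]] < demand[names[j]]
		}
		return names[i] < names[j]
	})

	share := make(map[string]int, len(names))
	remaining := pool
	for i, n := range names {
		// Equal split of what is left among the teams not yet served
		// (integer division; any remainder goes to later teams).
		fair := remaining / (len(names) - i)
		got := demand[n]
		if got > fair {
			got = fair
		}
		share[n] = got
		remaining -= got
	}
	return share
}

func main() {
	// While TeamW is away, TeamE may use the whole pool.
	fmt.Println(equalExtraShare(10, map[string]int{"TeamE": 10, "TeamW": 0}))
	// Once TeamW asks for capacity, each team is entitled to half.
	fmt.Println(equalExtraShare(10, map[string]int{"TeamE": 10, "TeamW": 8}))
}
```

In the second call TeamE's entitlement drops from 10 to 5, which is the situation in which some of its already admitted workloads would become preemption candidates.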

Contributor

@mimowo mimowo left a comment

First pass mostly with questions to better understand. For now I skipped the preemption part.

flavor, that are above the nominal quota. The value for a resource is the ratio of T_r and the
total nominal quotas in the hierarchy of the parent of C.

The value for the CQ or cohort is the maximum among the values for each resource, divided by the weight, if defined.
Contributor

"The value for the CQ or cohort" - is this the fair share value? This paragraph feels quite abstract, I think it would be helpful to back it up with a small example so that we can have some intuition.

Member

+1
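
As a small illustration of the kind of example requested above, the quoted computation can be sketched in Go. This is a reading of the quoted paragraph, not Kueue's implementation. The definition of T_r is cut off in the excerpt, so the sketch assumes it is the usage of resource r above nominal quota, summed across flavors. The function name `shareValue`, the resource names, and the numbers are hypothetical, and a weight of 0 stands in for "not defined".

```go
package main

import "fmt"

// shareValue is a hypothetical helper (not Kueue code).
//   borrowed[r]      usage of resource r above nominal quota (assumed T_r)
//   parentNominal[r] total nominal quota for r in the parent's hierarchy
//   weight           optional weight; 0 is treated as "not defined"
func shareValue(borrowed, parentNominal map[string]float64, weight float64) float64 {
	maxRatio := 0.0
	for r, t := range borrowed {
		total := parentNominal[r]
		if total <= 0 {
			// Assumption: resources with no nominal quota in the
			// hierarchy do not contribute to the value.
			continue
		}
		// Per-resource value: T_r divided by the total nominal quota.
		if ratio := t / total; ratio > maxRatio {
			maxRatio = ratio
		}
	}
	// The value of the CQ or cohort is the maximum per-resource value,
	// divided by the weight when one is set.
	if weight > 0 {
		return maxRatio / weight
	}
	return maxRatio
}

func main() {
	borrowed := map[string]float64{"cpu": 10, "gpu": 2}
	parentNominal := map[string]float64{"cpu": 100, "gpu": 8}

	fmt.Println(shareValue(borrowed, parentNominal, 0)) // gpu dominates: 2/8 = 0.25
	fmt.Println(shareValue(borrowed, parentNominal, 2)) // halved by weight: 0.125
}
```

Here cpu contributes 10/100 = 0.1 and gpu contributes 2/8 = 0.25, so gpu determines the value. A weight of 2 halves it, meaning a higher weight makes the same borrowing count for less.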


The value for the CQ or cohort is the maximum among the values for each resource, divided by the weight, if defined.

Weights will be added to ClusterQueueSpec and CohortSpec in the following optional struct:
Member

When we set FairSharing configuration, what happens?

Contributor

That's mentioned in the previous section and detailed in the sections below

Member

This is my bad.
I wanted to say "When we set FairSharing configuration to ClusterQueue without Cohort, what happens?"

Contributor

Nothing. Fair sharing only applies above CQs.

Member

Do you mean that we will add validation?


Weights will be added to ClusterQueueSpec and CohortSpec in the following optional struct:

```go
// ...
```
Member

As far as I remember, we had a plan to add DRF as another queueingStrategy, like this: https://docs.google.com/document/d/1VQ0qxWA-jwgvLq_WYG46OkXWW00O6q7b1BsR_Uv-acs/edit?usp=sharing

So, did you compare the pros and cons of the following options?

  1. Extend CohortSpec and ClusterQueueSpec (your current approach)
  2. Introduce a new queueing strategy, "DRF", and have a new CRD, "FairSharing".

@mwielgus @alculquicondor

Contributor

DRF within a CQ wouldn't easily extrapolate to more complex hierarchies.

Member

That makes sense.
Recording this discussion as an Alternative approach might be worth it, right?

Contributor

@alculquicondor alculquicondor left a comment

Just a few notes to grasp the high level idea of the algorithms.

@alculquicondor
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 18, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 7e88dcc1f51499d039498371bab9716694e2c9f7

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, mwielgus


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 18, 2024
@k8s-ci-robot k8s-ci-robot merged commit 8d768df into kubernetes-sigs:main Mar 18, 2024
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.7 milestone Mar 18, 2024
vsoch pushed a commit to researchapps/kueue that referenced this pull request Apr 18, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
6 participants