Fix concurrent admission in cohort when borrowing #805

Merged: 2 commits merged into kubernetes-sigs:main from concurrent-cohort-preemption on May 30, 2023

Conversation

@trasc trasc commented May 24, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

Consider a cohort composed of more than two queues, in which at least one queue is borrowing resources. When two new workloads are evaluated for admission within the same scheduling cycle, neither evaluation accounts for the other, so both can be found to fit, which over-admits workloads within the cohort.

A simple solution would be to allow only one workload per cohort to be evaluated for admission in a single scheduling cycle. Since that can be a bit drastic, this PR applies the restriction only to cohorts in which at least one queue is borrowing.
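
The rule described above can be illustrated with a minimal sketch. This is not the code from this PR: the `entry` type, the `admitCycle` function, and the `cohortBorrowing` map are hypothetical names chosen for illustration, and the real scheduler tracks considerably more state. The sketch only shows the idea of allowing a single admission per cohort per cycle when some queue in that cohort is borrowing.

```go
package main

import "fmt"

// entry is a hypothetical, simplified stand-in for a workload that was
// nominated for admission in one scheduling cycle.
type entry struct {
	name   string
	cohort string // empty when the queue does not belong to a cohort
}

// admitCycle returns the names of the entries that may proceed to admission
// in this cycle. cohortBorrowing marks cohorts in which at least one queue
// is borrowing (an assumed input; it is not derived here).
func admitCycle(entries []entry, cohortBorrowing map[string]bool) []string {
	usedCohorts := map[string]bool{}
	var admitted []string
	for _, e := range entries {
		if e.cohort != "" {
			// A second workload from the same cohort is skipped only when
			// the cohort has a borrowing queue; it can be retried later.
			if usedCohorts[e.cohort] && cohortBorrowing[e.cohort] {
				continue
			}
			usedCohorts[e.cohort] = true
		}
		admitted = append(admitted, e.name)
	}
	return admitted
}

func main() {
	entries := []entry{
		{name: "wl-a", cohort: "team"},
		{name: "wl-b", cohort: "team"},
		{name: "wl-c", cohort: "other"},
		{name: "wl-d", cohort: "other"},
	}
	borrowing := map[string]bool{"team": true}
	fmt.Println(admitCycle(entries, borrowing)) // prints [wl-a wl-c wl-d]
}
```

In this sketch `wl-b` is held back because cohort `team` already had an admission in the cycle and has a borrowing queue, while both workloads in cohort `other` go through because nothing there is borrowing.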

Which issue(s) this PR fixes:

Fixes #804

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix: Potential over-admission within cohort when borrowing.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 24, 2023

netlify bot commented May 24, 2023

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 48713ed
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/64744ea0b6affe00085c3cc6

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 24, 2023
@trasc trasc marked this pull request as draft May 24, 2023 12:50
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 24, 2023
@trasc trasc force-pushed the concurrent-cohort-preemption branch from 2a6f650 to b2f415a Compare May 24, 2023 14:33

trasc commented May 24, 2023

/retest

@trasc trasc changed the title Concurrent cohort preemption Concurrent admission in cohort when borrowing May 24, 2023
@trasc trasc marked this pull request as ready for review May 24, 2023 14:49
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 24, 2023
@k8s-ci-robot k8s-ci-robot requested a review from denkensk May 24, 2023 14:49
@alculquicondor

/milestone v0.4

@k8s-ci-robot k8s-ci-robot added this to the v0.4 milestone May 24, 2023
@trasc trasc force-pushed the concurrent-cohort-preemption branch from b2f415a to f0d14ca Compare May 25, 2023 09:18
@alculquicondor alculquicondor changed the title Concurrent admission in cohort when borrowing Fix concurrent admission in cohort when borrowing May 25, 2023
@trasc trasc force-pushed the concurrent-cohort-preemption branch from f0d14ca to 26dd904 Compare May 26, 2023 07:12

@alculquicondor alculquicondor left a comment

just some nits

/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, trasc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 26, 2023
@trasc trasc force-pushed the concurrent-cohort-preemption branch from 26dd904 to c32e1a1 Compare May 29, 2023 06:54
@trasc trasc force-pushed the concurrent-cohort-preemption branch from c32e1a1 to 48713ed Compare May 29, 2023 07:05

@alculquicondor alculquicondor left a comment

/lgtm
/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label May 29, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 29, 2023
@alculquicondor

tests failing due to kubernetes/test-infra#29622

trasc commented May 29, 2023

/retest

trasc commented May 30, 2023

/retest

trasc commented May 30, 2023

/test pull-kueue-test-integration-main

@k8s-ci-robot k8s-ci-robot merged commit f694062 into kubernetes-sigs:main May 30, 2023
@alculquicondor

/cherry-pick release-0.3

@k8s-infra-cherrypick-robot

@alculquicondor: #805 failed to apply on top of branch "release-0.3":

Applying: Concurrent preemption within cohort
Applying: Fix over-admission in concurrent preemption
Using index info to reconstruct a base tree...
M	pkg/cache/cache.go
M	pkg/scheduler/scheduler.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/scheduler/scheduler.go
Auto-merging pkg/cache/cache.go
CONFLICT (content): Merge conflict in pkg/cache/cache.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 Fix over-admission in concurrent preemption
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-0.3

@alculquicondor

@trasc can you create a manual cherry-pick for the release-0.3 branch?

trasc commented May 30, 2023

will do

trasc added a commit to epam/kubernetes-kueue that referenced this pull request May 31, 2023
* [scheduler/tests] Concurrent preemption within cohort

* [scheduler] Fix over-admission in concurrent preemption
k8s-ci-robot added a commit that referenced this pull request May 31, 2023
Fix concurrent admission in cohort when borrowing (#805)
@trasc trasc deleted the concurrent-cohort-preemption branch June 9, 2023 07:21
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Development

Successfully merging this pull request may close these issues.

Concurrent preemption within cohort can lead to over-admission.
4 participants