
[Capacity Scheduler] Capacity scheduler won't preempt pods if all resource items are used #683

bfinta opened this issue Dec 5, 2023 · 7 comments
Closed
Labels: kind/bug, lifecycle/rotten


bfinta commented Dec 5, 2023

Area

  • [x] Scheduler
  • [ ] Controller
  • [ ] Helm Chart
  • [ ] Documents

Other components

No response

What happened?

The capacity scheduler won't preempt pods when all of the cluster's resources are already in use, because of this check: https://github.com/kubernetes-sigs/scheduler-plugins/blob/master/pkg/capacityscheduling/capacity_scheduling.go#L592

Scenario:
The cluster has 5 GPUs. Team A has the following elastic quota: gpu.min: 4, gpu.max: 5. Team B has gpu.min: 1, gpu.max: 5.
Team A runs a workload with 5 pods that consumes all 5 GPUs. When Team B then wants to run a workload, even one requesting only 1 GPU, the pod stays Pending, because
sum(quotas.used) + pod.requests > sum(quotas.min)
5 + 1 > 5
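
For illustration, here is a minimal, self-contained Go sketch of the check described above. It is not the plugin's actual code; the type and function names (quota, canTriggerPreemption) are invented for this example. It only reproduces the condition sum(quotas.used) + pod.requests > sum(quotas.min) with the numbers from the scenario:

```go
package main

import "fmt"

// quota is a simplified stand-in for an ElasticQuota: min is the guaranteed
// share, used is what the tenant currently consumes (GPUs in this scenario).
type quota struct {
	name string
	min  int
	used int
}

// canTriggerPreemption reproduces the check this issue points at: a pending
// pod is only allowed to start preemption while total usage plus its own
// request still fits under the sum of all quota minimums.
func canTriggerPreemption(quotas []quota, podRequest int) bool {
	sumUsed, sumMin := 0, 0
	for _, q := range quotas {
		sumUsed += q.used
		sumMin += q.min
	}
	return sumUsed+podRequest <= sumMin
}

func main() {
	quotas := []quota{
		{name: "team-a", min: 4, used: 5}, // Team A overshoots its min up to its max
		{name: "team-b", min: 1, used: 0},
	}
	// Team B requests 1 GPU: 5 + 1 > 5, so preemption is never attempted
	// and the pod stays Pending.
	fmt.Println(canTriggerPreemption(quotas, 1)) // prints: false
}
```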

What did you expect to happen?

The scheduler should preempt pods from the over-quota ElasticQuota until the other ElasticQuota's min is satisfied.
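
One way to express this expectation under the same simplified model as the sketch above (again hypothetical, not a proposed patch): allow preemption whenever the preempting pod's own quota would still be within its guaranteed min, regardless of cluster-wide usage.

```go
// canTriggerPreemptionWithinMin is a hypothetical variant of the check above:
// Team B's pod may trigger preemption as long as its own ElasticQuota stays
// within its guaranteed min, even if the cluster is fully used by another tenant.
func canTriggerPreemptionWithinMin(preemptor quota, podRequest int) bool {
	return preemptor.used+podRequest <= preemptor.min
}

// In the scenario: team-b has used 0 and min 1, so 0 + 1 <= 1 holds and the
// scheduler would preempt Team A's pods until Team B reaches its min.
```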

How can we reproduce it (as minimally and precisely as possible)?

No response

Anything else we need to know?

It would be great if this behavior could be configured in the scheduler configuration file.
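
If this were made configurable, one imaginable shape would be a flag in the plugin's arguments in the scheduler configuration. The struct and field below are purely hypothetical and do not exist in the plugin today; they only illustrate the suggestion:

```go
// Hypothetical arguments for the CapacityScheduling plugin; the field name
// and its behavior are invented here to illustrate the suggestion.
type CapacitySchedulingArgs struct {
	// PreemptToGuaranteeMin, if true, would let a pending pod preempt other
	// tenants' pods until its own ElasticQuota min is satisfied, even when
	// total usage already exceeds the sum of all quota minimums.
	PreemptToGuaranteeMin bool
}
```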

Kubernetes version

Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.1", GitCommit:"8f94681cd294aa8cfd3407b8191f6c70214973a4", GitTreeState:"clean", BuildDate:"2023-01-18T15:58:16Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.6", GitCommit:"b39bf148cd654599a52e867485c02c4f9d28b312", GitTreeState:"clean", BuildDate:"2022-09-21T13:12:04Z", GoVersion:"go1.18.6", Compiler:"gc", Platform:"linux/amd64"}

Scheduler Plugins version

registry.k8s.io/scheduler-plugins/controller:v0.24.9
@bfinta added the kind/bug label Dec 5, 2023
@Huang-Wei (Contributor) commented

I think it's part of the original design, to ensure the system's resources are not fully occupied by a single tenant/namespace.

@denkensk I mentioned this symptom to you the other day. Overall, I feel this is a bit counterintuitive, as setting two tenants to 4/5 and 1/5 while having 5 in total sounds like a common practice for allocating elastic quota. Could you help shed some light on the original design?

@k8s-triage-robot commented

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Mar 7, 2024

bfinta commented Mar 7, 2024

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Mar 7, 2024
@k8s-triage-robot commented

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jun 5, 2024
@k8s-triage-robot commented

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jul 5, 2024
@k8s-triage-robot commented

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot closed this as not planned (Won't fix, can't repro, duplicate, stale) Aug 4, 2024
@k8s-ci-robot (Contributor) commented

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
