[Features] improve scheduling performance on batch jobs #492
/cc @k82cn
We need to take care of starvation of a "huge job" by "smaller jobs"; the other part is OK to me :)
/kind feature
With this design, big jobs (i.e., jobs with many tasks) are more likely to be starved because we keep allocating resources to smaller jobs. If we add a starvation-prevention mechanism, big jobs will eventually be scheduled, but this behavior is not ideal because the mechanism kicks in only after the job has waited for longer than the specified starvation threshold. PR 821 proposed a slightly different approach: we always allocate resources to big jobs, but if a big job is not ready to run, its allocated resources are released in the backfill phase, and smaller jobs are scheduled as backfill jobs. In the following scheduling rounds, if the scheduler notices that the big job is conditionally ready, it will either 1) preempt the backfill jobs and start the big job right away (in preemption mode), or 2) disable backfilling, wait for the backfill jobs to finish, and then start the big job (in non-preemption mode). In either case, the big job is likely to start earlier.
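For illustration, here is a minimal, self-contained Go sketch of the per-round decision described above. The `Job` type, the `nextAction` function, and its return strings are all hypothetical stand-ins; this is not the actual code from PR 821 or PR 805.

```go
package main

import "fmt"

// Job is a hypothetical stand-in for kube-batch's job abstraction;
// the real API in PR 805/821 differs.
type Job struct {
	Name  string
	Ready bool // "conditionally ready": enough resources reserved to run
}

// nextAction returns what the scheduler would do with a big job in one
// scheduling round, following the behavior described above.
func nextAction(big Job, runningBackfill int, preemptionMode bool) string {
	if !big.Ready {
		// Keep trying to reserve; in the meantime the reserved resources
		// are released in the backfill phase for smaller jobs.
		return "release allocation in backfill phase; run smaller jobs as backfill"
	}
	if preemptionMode {
		return "preempt backfill jobs and start the big job right away"
	}
	if runningBackfill > 0 {
		return "disable backfilling and wait for backfill jobs to finish"
	}
	return "start the big job"
}

func main() {
	big := Job{Name: "big-job", Ready: true}
	fmt.Println(nextAction(big, 2, true))  // preemption mode
	fmt.Println(nextAction(big, 2, false)) // non-preemption mode
}
```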
@DonghuiZhuo looks good, looking forward to the PR for the backfill part, thanks 👍
There are two issues here: 1. fragmentation caused by the queue's algorithm; 2. starvation. In this issue, we only need to handle the first one :)
@jiaxuanzhou PR 805 implements backfill.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is this a BUG REPORT or FEATURE REQUEST?:
More of a feature request, but it shows up as a bug when scheduling tasks that need critical resources.
/kind feature
What happened:
1. The scheduler loops over all nodes in the cluster for every single task it tries to place.
2. Resources are allocated to some of a job's tasks even when the idle resources of the cluster cannot satisfy all of the job's tasks.
What you expected to happen:
1. Taking the allocate action as an example (https://github.com/kubernetes-sigs/kube-batch/blob/c4896a41a061cd2e3d071fc01b1dd15df06b84ea/pkg/scheduler/actions/allocate/allocate.go#L112): before allocating resources to the tasks of a job, it would be better to filter the nodes and sum up their idle resources to check whether they can satisfy the resource requests of the whole job; if not, just leave the job in the queue and continue to handle the next one (see the sketch after this list).
2. And for https://github.com/kubernetes-sigs/kube-batch/blob/c4896a41a061cd2e3d071fc01b1dd15df06b84ea/pkg/scheduler/framework/session.go#L144: would it be better to release the resources of tasks that are not yet bound to a node when the idle resources of the cluster cannot satisfy the whole job?
Currently it often happens that two jobs are both left pending even though the cluster's idle resources could satisfy the last-submitted one on its own.
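As a rough illustration of the feasibility pre-check proposed in point 1, here is a self-contained Go sketch. `Resource`, `jobFeasible`, and the field names are hypothetical stand-ins, not kube-batch's actual API.

```go
package main

import "fmt"

// Resource is a hypothetical requested/idle resource vector; kube-batch's
// real Resource type differs.
type Resource struct {
	MilliCPU float64
	MemoryGB float64
}

func (r Resource) Add(o Resource) Resource {
	return Resource{r.MilliCPU + o.MilliCPU, r.MemoryGB + o.MemoryGB}
}

func (r Resource) LessEqual(o Resource) bool {
	return r.MilliCPU <= o.MilliCPU && r.MemoryGB <= o.MemoryGB
}

// jobFeasible sums the requests of all pending tasks of a job and compares
// them against the total idle resources of the candidate nodes. If the job
// cannot fit as a whole, the caller skips it and tries the next job in the
// queue instead of allocating a partial set of tasks.
func jobFeasible(taskRequests, nodeIdle []Resource) bool {
	var want, have Resource
	for _, t := range taskRequests {
		want = want.Add(t)
	}
	for _, n := range nodeIdle {
		have = have.Add(n)
	}
	return want.LessEqual(have)
}

func main() {
	idle := []Resource{{MilliCPU: 100000, MemoryGB: 100}} // 100 cores, 100GB

	// 1st job: 4 pods x (30 cores, 40GB); 2nd job: 4 pods x (20 cores, 20GB).
	job1 := []Resource{{30000, 40}, {30000, 40}, {30000, 40}, {30000, 40}}
	job2 := []Resource{{20000, 20}, {20000, 20}, {20000, 20}, {20000, 20}}

	fmt.Println(jobFeasible(job1, idle)) // false: needs 120 cores, 160GB
	fmt.Println(jobFeasible(job2, idle)) // true: needs 80 cores, 80GB
}
```

Note that summing idle resources across nodes overestimates feasibility: a job may fit in aggregate but not under any concrete placement, which is exactly the fragmentation issue mentioned above, so a real check would still need per-node filtering.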
How to reproduce it (as minimally and precisely as possible):
Suppose the idle resources of the cluster are 100 cores and 100GB of memory.
1. Submit a 1st job with 4 pods, each requesting 30 cores and 40GB of memory.
2. Submit a 2nd job with 4 pods, each requesting 20 cores and 20GB of memory.
Anything else we need to know?:
Environment:
kube-batch: master branch