
spread scheduling algorithm #7810

Merged — 4 commits merged into master May 1, 2020

Conversation

@notnoop (Contributor) commented Apr 27, 2020

This PR introduces a spread scheduling algorithm as an alternative to binpacking.

In mostly-static clusters, some cluster operators prefer to spread the load rather than binpack it. When spreading, clients share the workload more evenly. It also handles failures a bit more gracefully: if an allocation misbehaves and becomes a noisy neighbor by saturating IO or network (currently not isolated by Nomad), it will affect fewer allocations. Also, if a client becomes unhealthy, the bound on the number of affected allocations is tighter.

One significant downside of spreading is management of large allocations in fragmented clusters. Consider a simple case: 2 nodes with 4GB each, each running an allocation using 2GB; a new job requiring 3GB of RAM will fail to schedule despite the cluster having 4GB of free RAM. Though possible under the binpacking algorithm too, this is more likely when spreading. Operators can mitigate it by ensuring that jobs require a small fraction of each node's resources, separating large jobs into their own datacenter/region, and/or ensuring that job CPU/RAM requirements are similar.

Implementation Details

This implementation adds a SchedulerAlgorithm option to the operator SchedulerConfiguration knob.

The flag applies at the cluster/region level, so it's set by cluster operators rather than job submitters. Given that the advantages (and disadvantages) affect overall cluster performance and management rather than individual jobs, operators are the right ones to control it.

Future iterations may allow scheduler config options per datacenter or another layer of pooling. Depending on demand, we can introduce a per-DC config override, both for this scheduling algorithm and for preemption.
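For reference, here's a rough sketch of what the option looks like conceptually in nomad/structs; take the exact type and field names as my shorthand rather than a verbatim excerpt of this PR:

```go
// Package structs is where the operator scheduler configuration lives.
package structs

// SchedulerAlgorithm names the strategy the scheduler uses to rank nodes.
type SchedulerAlgorithm string

const (
	// SchedulerAlgorithmBinpack packs allocations onto as few nodes as possible.
	SchedulerAlgorithmBinpack SchedulerAlgorithm = "binpack"

	// SchedulerAlgorithmSpread distributes allocations across eligible nodes.
	SchedulerAlgorithmSpread SchedulerAlgorithm = "spread"
)

// SchedulerConfiguration is the cluster-wide, operator-controlled knob.
type SchedulerConfiguration struct {
	// SchedulerAlgorithm selects binpack (the default) or spread.
	SchedulerAlgorithm SchedulerAlgorithm
}
```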

@angrycub wrote most of the implementation. I mostly did some cosmetic changes.

nomad/structs/funcs_test.go (review thread, resolved)
@schmichael (Member) left a comment

Looks great. We'll need docs and a changelog entry as well.

AFAICT this can be set via default_scheduler_config, but I thought I'd double check as it seems likely operators will want to bake this setting in from the beginning for bare metal clusters.

@notnoop (Contributor, Author) commented Apr 30, 2020

AFAICT this can be set via default_scheduler_config, but I thought I'd double check as it seems likely operators will want to bake this setting in from the beginning for bare metal clusters.

Yes! The parsing is tested with the command/agent/testdata/basic.hcl change, and from there it's the same path as before: the server commits the default scheduler config to Raft on becoming the leader.
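As a rough sketch of what that agent config looks like (adapted from memory of that test fixture, so double-check the exact keys against the docs):

```hcl
server {
  enabled = true

  # default_scheduler_config seeds the scheduler configuration that the
  # leader writes to Raft when the cluster is first bootstrapped.
  default_scheduler_config {
    scheduler_algorithm = "spread" # defaults to "binpack"
  }
}
```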

@notnoop notnoop merged commit f5775de into master May 1, 2020
@notnoop notnoop deleted the spread-configuration branch May 1, 2020 17:15
notnoop pushed a commit that referenced this pull request May 1, 2020
notnoop pushed a commit that referenced this pull request May 3, 2020
@spuder (Contributor) commented May 17, 2020

Is there documentation for this feature? I'd like to use spread over binpack but haven't been able to find how to enable it yet.

@angrycub (Contributor)

You can set it in an existing cluster using the API - https://www.nomadproject.io/api-docs/operator/#update-scheduler-configuration. For a new cluster, you can pre-configure the defaults here https://www.nomadproject.io/docs/configuration/server/#configuring-scheduler-config
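As a rough example of using that API endpoint (treat this as a sketch; adjust the address, port, and any preemption settings to your cluster rather than copy-pasting):

```sh
# Read the current scheduler configuration first.
curl http://127.0.0.1:4646/v1/operator/scheduler/configuration

# Switch the cluster-wide ranking algorithm to spread.
curl -X PUT \
  --data '{"SchedulerAlgorithm": "spread"}' \
  http://127.0.0.1:4646/v1/operator/scheduler/configuration
```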

@schmichael (Member)

@spuder: I also created #7999 to improve our documentation. It's a disproportionately small code change relative to the impact it has on scheduling! We should guide users appropriately.

@Legogris
General question:

One significant downside of spreading is management of large allocations in fragmented clusters. Consider a simple case: 2 nodes with 4GB each, each are running an allocation using 2GB; A new job requiring 3GB of RAM will fail to schedule despite the cluster having 4GB free RAM.

If one of the two allocations were moved at that point, everything would fit. Are there any plans for future versions to improve scheduling so that already-running allocations can be redistributed, making room for new allocations as jobs come and go?

@github-actions (bot)

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 30, 2022