core: fix node reservation scoring #7730

schmichael · 2020-04-15T21:39:35Z

The BinPackIter accounted for node reservations twice when scoring nodes,
which could bias scores toward nodes with reservations.

Pseudo-code for previous algorithm:

	proposed  = reservedResources + sum(allocResources)
	available = nodeResources - reservedResources
	score     = 1 - (proposed / available)

The node's reserved resources are added to the total resources used by
allocations, and then the node's reserved resources are later
subtracted from the node's overall resources.

The new algorithm is:

	proposed  = sum(allocResources)
	available = nodeResources - reservedResources
	score     = 1 - (proposed / available)

The node's reserved resources are no longer added to the total resources
used by allocations.

My guess as to how this bug happened is that the resource utilization
variable (util) is calculated and returned by the AllocsFit function
which needs to take reserved resources into account as a basic
feasibility check.

To avoid re-calculating alloc resource usage (because there may be a
large number of allocs), we reused util in the ScoreFit function.
ScoreFit properly accounts for reserved resources by subtracting them
from the node's overall resources. However, since util also took
reserved resources into account, the score would be incorrect.

Prior to the fix the added test output:

	Node: reserved     Score: 1.0000
	Node: reserved2    Score: 1.0000
	Node: no-reserved  Score: 0.9741

The scores being 1.0 for both nodes with reserved resources is a good
hint that something is wrong, as they should receive different scores (and
neither is a perfect fit). Upon further inspection, the double accounting of
reserved resources caused their scores to be >1.0 and clamped.

After the fix the added test outputs:

	Node: no-reserved  Score: 0.9741
	Node: reserved     Score: 0.9480
	Node: reserved2    Score: 0.8717


dadgar

Code changes look good! Skimmed the tests


github-actions · 2023-01-07T02:15:27Z

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

schmichael requested review from notnoop and dadgar April 15, 2020 21:39

schmichael force-pushed the b-reserved-scoring branch from f02c834 to e86059b Compare April 15, 2020 21:44

schmichael added a commit that referenced this pull request Apr 15, 2020

docs: add #7730 to changelog

7fdbb8a

schmichael added 2 commits April 15, 2020 15:13

docs: add #7730 to changelog

68aca51

schmichael force-pushed the b-reserved-scoring branch from 7fdbb8a to 68aca51 Compare April 15, 2020 22:13

dadgar approved these changes Apr 16, 2020


notnoop approved these changes Apr 20, 2020


docs: mention scoring change from #7730

df4af4d

dadgar reviewed Apr 24, 2020


website/pages/docs/upgrade/upgrade-specific.mdx (outdated, resolved)

schmichael and others added 2 commits April 30, 2020 14:47

Update website/pages/docs/upgrade/upgrade-specific.mdx

26d34f0

Co-authored-by: Alex Dadgar <alex@hashicorp.com>

Merge branch 'master' into b-reserved-scoring

e3cba0c

schmichael merged commit cbcd3eb into master Apr 30, 2020

schmichael deleted the b-reserved-scoring branch April 30, 2020 21:48

github-actions bot locked as resolved and limited conversation to collaborators Jan 7, 2023
