kvserver: allocator cpu balancing for overload protection #90582

Closed
13 tasks done
kvoli opened this issue Oct 24, 2022 · 3 comments · Fixed by #97424
Assignees
Labels
A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Milestone

Comments

@kvoli kvoli added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-distribution Relating to rebalancing and leasing. labels Oct 24, 2022
@kvoli kvoli added this to the 23.1 milestone Oct 24, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Oct 24, 2022
@irfansharif
Contributor

x-linking #83490.

@kvoli
Collaborator Author

kvoli commented Oct 25, 2022

related #90574

@kvoli
Collaborator Author

kvoli commented Dec 8, 2022

Moved #90140 here from tracking issue #90137.

@exalate-issue-sync exalate-issue-sync bot changed the title kvserver: allocator should prevent and disperse overload when possible kvserver: allocator should prevent and disperse CPU overload when possible Dec 8, 2022
@exalate-issue-sync exalate-issue-sync bot changed the title kvserver: allocator should prevent and disperse CPU overload when possible kvserver: allocator should prevent and disperse overload when possible Dec 8, 2022
kvoli added a commit to kvoli/cockroach that referenced this issue Jan 10, 2023
Previously, there was no metric indicating that a store was unable to
reduce its load below a balance threshold due to exhausting potential
rebalance actions. This metric is desirable to determine when the action
space needs to be increased (by splitting etc.) to balance load.

This patch adds this metric:

`rebalancing.state.imbalanced_overfull_options_exhausted`

Which maintains a counter, incremented each time a store's
rebalancer is unable to reduce the load on the store below the overfull
threshold due to running out of available rebalance actions.

Part of: cockroachdb#90582

Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Jan 10, 2023
Previously, there was no metric that indicated a store was unable to
reduce its load below the cluster-defined balance threshold due to
exhausting available rebalance actions. This metric is desirable to
determine when the rebalance action space needs to be expanded to
balance load, e.g. by splitting a heavily loaded range.

This patch adds this metric:

`rebalancing.state.imbalanced_overfull_options_exhausted`

Which maintains a counter, incremented each time a store's
rebalancer is unable to reduce the load on the store below the overfull
threshold due to running out of available rebalance actions.

Part of: cockroachdb#90582

Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Jan 11, 2023
Previously, there was no metric that indicated a store was unable to
reduce its load below the cluster-defined balance threshold due to
exhausting available rebalance actions. This metric is desirable to
determine when the rebalance action space needs to be expanded to
balance load, e.g. by splitting a heavily loaded range.

This patch adds this metric:

`rebalancing.state.imbalanced_overfull_options_exhausted`

Which maintains a counter, incremented each time a store's
rebalancer is unable to reduce the load on the store below the overfull
threshold due to running out of available rebalance actions.

Part of: cockroachdb#90582

Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Jan 13, 2023
This patch instruments the store rebalancer using store cpu time as
opposed to QPS when balancing the cluster. This patch adds `store_cpu`
as an option for the existing, now public, cluster setting:

`kv.allocator.load_based_rebalancing_dimension`

When set to `store_cpu` rather than `qps`, the store rebalancer performs
a mostly identical function, but targets balancing the sum of all
replicas' cpu time on each store, rather than qps.

Similar to QPS, the rebalance threshold can be set to allow controlling
the aggressiveness of balancing:

`kv.allocator.store_cpu_rebalance_threshold`: 0.1

Part of: cockroachdb#90582

Release note (ops change):
Add option to balance store cpu time instead of queries per second (qps)
by setting `kv.allocator.load_based_rebalancing_dimension='store_cpu'`.
`kv.allocator.store_cpu_rebalance_threshold` is also added, similar
to `kv.allocator.qps_rebalance_threshold` to control the target range
for store cpu above and below the cluster mean.
@kvoli kvoli changed the title kvserver: allocator should prevent and disperse overload when possible kvserver: allocator cpu balance for overload Jan 17, 2023
@kvoli kvoli changed the title kvserver: allocator cpu balance for overload kvserver: allocator cpu balancing for overload protection Jan 17, 2023
craig bot pushed a commit that referenced this issue Feb 22, 2023
97424: kvserver: enable cpu balancing by default r=nvanbenschoten a=kvoli

This commit switches the default load based rebalancing objective from `qps` to `cpu`. A performance comparison can be found on #90590.

resolves: #90582

Release note (ops change): CPU balancing is enabled as the default load based rebalancing objective. This can be reverted by setting `kv.allocator.load_based_rebalancing.objective` to `qps`.

Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
@craig craig bot closed this as completed in 8b202b9 Feb 22, 2023

2 participants