kvserver: allocator cpu balancing for overload protection #90582
Labels
A-kv-distribution
Relating to rebalancing and leasing.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
kvoli
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-kv-distribution
Relating to rebalancing and leasing.
labels
Oct 24, 2022
x-linking #83490.
related #90574
4 tasks
exalate-issue-sync
bot
changed the title
kvserver: allocator should prevent and disperse overload when possible
kvserver: allocator should prevent and disperse CPU overload when possible
Dec 8, 2022
exalate-issue-sync
bot
changed the title
kvserver: allocator should prevent and disperse CPU overload when possible
kvserver: allocator should prevent and disperse overload when possible
Dec 8, 2022
kvoli
added a commit
to kvoli/cockroach
that referenced
this issue
Jan 10, 2023
Previously, there was no metric indicating that a store was unable to reduce its load below a balance threshold because it had exhausted its potential rebalance actions. This metric is useful for determining when the action space needs to be enlarged (by splitting, etc.) to balance load. This patch adds the metric `rebalancing.rebalancing.state.imbalanced_overfull_options_exhausted`, a counter incremented each time a store's rebalancer is unable to reduce the load on the store below the overfull threshold because it has run out of available rebalance actions. Part of: cockroachdb#90582 Release note: None
kvoli
added a commit
to kvoli/cockroach
that referenced
this issue
Jan 10, 2023
Previously, there was no metric indicating that a store was unable to reduce its load below the cluster-defined balance threshold because it had exhausted its available rebalance actions. This metric is useful for determining when the rebalance action space needs to be expanded to balance load, e.g. by splitting a heavily loaded range. This patch adds the metric `rebalancing.state.imbalanced_overfull_options_exhausted`, a counter incremented each time a store's rebalancer is unable to reduce the load on the store below the overfull threshold because it has run out of available rebalance actions. Part of: cockroachdb#90582 Release note: None
kvoli
added a commit
to kvoli/cockroach
that referenced
this issue
Jan 11, 2023
Previously, there was no metric indicating that a store was unable to reduce its load below the cluster-defined balance threshold because it had exhausted its available rebalance actions. This metric is useful for determining when the rebalance action space needs to be expanded to balance load, e.g. by splitting a heavily loaded range. This patch adds the metric `rebalancing.state.imbalanced_overfull_options_exhausted`, a counter incremented each time a store's rebalancer is unable to reduce the load on the store below the overfull threshold because it has run out of available rebalance actions. Part of: cockroachdb#90582 Release note: None
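The counter behaviour described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the CockroachDB implementation: the `rebalancer` type, its fields, and the `rebalance` method are hypothetical names, and each rebalance action is modelled simply as an amount of load it removes from the store.

```go
package main

import "fmt"

// rebalancer is a hypothetical stand-in for a store's load rebalancer.
// It is NOT the kvserver type; all names here are assumptions.
type rebalancer struct {
	load             float64   // current load on the store
	actions          []float64 // load shed by each remaining rebalance action
	optionsExhausted int64     // models imbalanced_overfull_options_exhausted
}

// rebalance applies actions in order until the store's load is at or below
// the overfull threshold. If the actions run out while the store is still
// above the threshold, the counter is incremented once for this pass.
func (r *rebalancer) rebalance(overfull float64) {
	for r.load > overfull && len(r.actions) > 0 {
		r.load -= r.actions[0]
		r.actions = r.actions[1:]
	}
	if r.load > overfull {
		r.optionsExhausted++
	}
}

func main() {
	r := &rebalancer{load: 100, actions: []float64{10, 5}}
	r.rebalance(80)
	fmt.Printf("load=%.0f exhausted=%d\n", r.load, r.optionsExhausted)
}
```

A rising counter in this model means the store stayed overfull even after every available action was tried, which is the signal the commit message associates with needing a larger action space (for example, by splitting a heavily loaded range).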
kvoli
added a commit
to kvoli/cockroach
that referenced
this issue
Jan 13, 2023
This patch instruments the store rebalancer to use store CPU time, as opposed to QPS, when balancing the cluster. It adds `store_cpu` as an option for the existing, now public cluster setting `kv.allocator.load_based_rebalancing_dimension`. When the setting is `store_cpu` rather than `qps`, the store rebalancer performs a mostly identical function but targets balancing the sum of all replicas' CPU time on each store, rather than QPS. As with QPS, a rebalance threshold controls the aggressiveness of balancing: `kv.allocator.store_cpu_rebalance_threshold`: 0.1 Part of: cockroachdb#90582 Release note (ops change): Add option to balance store CPU time instead of queries per second (QPS) by setting `kv.allocator.load_based_rebalancing_dimension='store_cpu'`. `kv.allocator.store_cpu_rebalance_threshold` is also added, similar to `kv.allocator.qps_rebalance_threshold`, to control the target range for store CPU above and below the cluster mean.
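To make the "target range above and below the cluster mean" concrete, here is a rough sketch of how a threshold such as 0.1 could classify stores. It assumes the threshold is a fraction of the cluster mean; the function and type names (`clusterMean`, `loadBounds`, `classify`, `storeState`) are hypothetical and do not come from the CockroachDB code base.

```go
package main

import "fmt"

// storeState is a hypothetical classification of a store's CPU load
// relative to the cluster mean.
type storeState int

const (
	stateUnderfull storeState = iota
	stateBalanced
	stateOverfull
)

// clusterMean returns the average per-store CPU; 0 for an empty cluster.
func clusterMean(storeCPU []float64) float64 {
	if len(storeCPU) == 0 {
		return 0
	}
	sum := 0.0
	for _, c := range storeCPU {
		sum += c
	}
	return sum / float64(len(storeCPU))
}

// loadBounds returns the lower and upper edges of the target range,
// assuming the threshold is a fraction of the mean (e.g. 0.1 for +/-10%).
func loadBounds(mean, threshold float64) (lower, upper float64) {
	return mean * (1 - threshold), mean * (1 + threshold)
}

// classify reports whether a store is below, inside, or above the range.
func classify(storeCPU, mean, threshold float64) storeState {
	lower, upper := loadBounds(mean, threshold)
	switch {
	case storeCPU > upper:
		return stateOverfull
	case storeCPU < lower:
		return stateUnderfull
	default:
		return stateBalanced
	}
}

func main() {
	cpu := []float64{400, 520, 610, 470} // per-store CPU, arbitrary units
	mean := clusterMean(cpu)
	for i, c := range cpu {
		fmt.Printf("store %d: cpu=%.0f state=%d\n", i+1, c, classify(c, mean, 0.1))
	}
}
```

Under this model a smaller threshold narrows the target range, so more stores count as overfull or underfull and balancing becomes more aggressive.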
kvoli
changed the title
kvserver: allocator should prevent and disperse overload when possible
kvserver: allocator cpu balance for overload
Jan 17, 2023
kvoli
changed the title
kvserver: allocator cpu balance for overload
kvserver: allocator cpu balancing for overload protection
Jan 17, 2023
craig bot
pushed a commit
that referenced
this issue
Feb 22, 2023
97424: kvserver: enable cpu balancing by default r=nvanbenschoten a=kvoli This commit switches the default load based rebalancing objective from `qps` to `cpu`. A performance comparison can be found on #90590. resolves: #90582 Release note (ops change): CPU balancing is enabled as the default load based rebalancing objective. This can be reverted by setting `kv.allocator.load_based_rebalancing.objective` to `qps`. Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
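For operators scripting the revert mentioned in the release note, a small helper along these lines could build the statement. It is only a sketch: `setObjectiveStmt` is a hypothetical name, the accepted values `cpu` and `qps` are taken from the commit message above, and the statement uses CockroachDB's `SET CLUSTER SETTING` syntax. Executing it against a cluster (for example via `database/sql`) is left out.

```go
package main

import "fmt"

// objectiveSetting is the cluster setting named in the release note.
const objectiveSetting = "kv.allocator.load_based_rebalancing.objective"

// setObjectiveStmt returns a SET CLUSTER SETTING statement for the given
// objective. Only the two values mentioned in the commit message are
// accepted; anything else is rejected rather than passed through.
func setObjectiveStmt(objective string) (string, error) {
	switch objective {
	case "cpu", "qps":
		return fmt.Sprintf("SET CLUSTER SETTING %s = '%s'", objectiveSetting, objective), nil
	default:
		return "", fmt.Errorf("unknown rebalancing objective %q", objective)
	}
}

func main() {
	stmt, err := setObjectiveStmt("qps")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(stmt)
}
```

Restricting the input to a fixed set of values avoids interpolating arbitrary strings into the SQL text.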
This is a tracking issue for balancing CPU in a cluster. The motivation for balancing CPU is to prevent and, if necessary, mitigate CPU saturation. The current balance objective is based on QPS, which has known limitations [1][2] in managing CPU saturation.
Tracked issues:
Epic: CRDB-20845
Footnotes
1. related issue
2. internal allocator signal experiments