Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: cpu based load splitting #95377

Closed
Tracked by #90582
kvoli opened this issue Jan 17, 2023 · 1 comment · Fixed by #96128
Closed
Tracked by #90582

kvserver: cpu based load splitting #95377

kvoli opened this issue Jan 17, 2023 · 1 comment · Fixed by #96128
Assignees
Labels
A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Milestone

Comments

@kvoli
Copy link
Collaborator

kvoli commented Jan 17, 2023

In #90574 we added support for finding a split point of a range based on a weighted midpoint. Previously, only unweighted midpoints e.g. all queries=1 was supported.

This issue is to integrate this weighted split finder with request cpu and instrument cpu based load splitting.

additional considerations

The merge queue uses with the load of a range and the load split threshold when deciding whether to merge two adjacent ranges. These are collected via RPC. This presents an issue in

  1. mixed version

Additional fields may not be populated for CPU, so a fallback is needed.

  1. in-between or shortly after switching cpu/qps based load splitting

Similar issue to above.

Jira issue: CRDB-23491

@kvoli kvoli added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jan 17, 2023
@blathers-crl blathers-crl bot added the T-kv KV Team label Jan 17, 2023
@kvoli kvoli added this to the 23.1 milestone Jan 17, 2023
@kvoli kvoli self-assigned this Jan 17, 2023
@kvoli kvoli added the A-kv-distribution Relating to rebalancing and leasing. label Jan 17, 2023
kvoli added a commit to kvoli/cockroach that referenced this issue Jan 17, 2023
This commit instruments the ability to peform load based splitting with
replica cpu usage rather than queries per second. Load based splitting
now will use either cpu or qps for deciding split points, depending on
the cluster setting `kv.allocator.load_based_rebalancing_dimension`.

When set to `qps`, qps is used in deciding split points and when
splitting should occur; similarly, `store_cpu` means that request cpu
against a replicas keyspan is used to decide split points.

To revert to QPS splits when
`kv.allocator.load_based_rebalancing_dimension` is `store_cpu`,
`kv.unweighted_lb_split_finder.enabled` can be set to `true`. This will
disable using CPU splits. TODO(kvoli): reasoning on this.

The split threshold when using `store_cpu` is the cluster setting
`kv.range_split.load_cpu_threshold` which defaults to `250ms` of cpu time
per second, i.e. a replica using 1/4 processor of a machine consistently.

The merge queue uses the load based splitter in deciding whether to
merge due to low load. This commit also updates the merge queue to be
consistent with the load based splitter signal. When switching between
`qps` and `store_cpu`, the load based splitter for every replica is
reset to avoid spurious results.

resolves: cockroachdb#95377

Release note (ops change): Load based splitter now supports using request
cpu usage to split ranges. This is introduced with the previous cluster
setting `kv.allocator.load_based_rebalancing_dimension`, which when set
to `store_cpu`, will use request cpu usage. The threshold above which
CPU usage of a replica is considered for splitting is defined in the
cluster setting `kv.range_split.load_cpu_threshold`, which has a default
value of `250ms`.
kvoli added a commit to kvoli/cockroach that referenced this issue Jan 17, 2023
This commit instruments the ability to peform load based splitting with
replica cpu usage rather than queries per second. Load based splitting
now will use either cpu or qps for deciding split points, depending on
the cluster setting `kv.allocator.load_based_rebalancing_dimension`.

When set to `qps`, qps is used in deciding split points and when
splitting should occur; similarly, `store_cpu` means that request cpu
against a replicas keyspan is used to decide split points.

To revert to QPS splits when
`kv.allocator.load_based_rebalancing_dimension` is `store_cpu`,
`kv.unweighted_lb_split_finder.enabled` can be set to `true`. This will
disable using CPU splits. TODO(kvoli): reasoning on this.

The split threshold when using `store_cpu` is the cluster setting
`kv.range_split.load_cpu_threshold` which defaults to `250ms` of cpu time
per second, i.e. a replica using 1/4 processor of a machine consistently.

The merge queue uses the load based splitter in deciding whether to
merge due to low load. This commit also updates the merge queue to be
consistent with the load based splitter signal. When switching between
`qps` and `store_cpu`, the load based splitter for every replica is
reset to avoid spurious results.

resolves: cockroachdb#95377

Release note (ops change): Load based splitter now supports using request
cpu usage to split ranges. This is introduced with the previous cluster
setting `kv.allocator.load_based_rebalancing_dimension`, which when set
to `store_cpu`, will use request cpu usage. The threshold above which
CPU usage of a replica is considered for splitting is defined in the
cluster setting `kv.range_split.load_cpu_threshold`, which has a default
value of `250ms`.
@kvoli
Copy link
Collaborator Author

kvoli commented Jan 20, 2023

xref: #83519

@kvoli kvoli changed the title kvserver: instrument cpu based load splitting kvserver: cpu based load splitting Jan 20, 2023
kvoli added a commit to kvoli/cockroach that referenced this issue Jan 23, 2023
This commit adds the ability to peform load based splitting with replica
cpu usage rather than queries per second. Load based splitting now will
use either cpu or qps for deciding split points, depending on the
cluster setting `kv.allocator.load_based_rebalancing_dimension`.

When set to `qps`, qps is used in deciding split points and when
splitting should occur; similarly, `store_cpu` means that request cpu
against a replicas keyspan is used to decide split points.

To revert to QPS splits when
`kv.allocator.load_based_rebalancing_dimension` is `store_cpu`,
`kv.unweighted_lb_split_finder.enabled` can be set to `true`. This will
disable using CPU splits. TODO(kvoli): reasoning on this.

The split threshold when using `store_cpu` is the cluster setting
`kv.range_split.load_cpu_threshold` which defaults to `250ms` of cpu time
per second, i.e. a replica using 1/4 processor of a machine consistently.

The merge queue uses the load based splitter in deciding whether to
merge due to low load. This commit also updates the merge queue to be
consistent with the load based splitter signal. When switching between
`qps` and `store_cpu`, the load based splitter for every replica is
reset to avoid spurious results.

resolves: cockroachdb#95377

Release note (ops change): Load based splitter now supports using request
cpu usage to split ranges. This is introduced with the previous cluster
setting `kv.allocator.load_based_rebalancing_dimension`, which when set
to `store_cpu`, will use request cpu usage. The threshold above which
CPU usage of a replica is considered for splitting is defined in the
cluster setting `kv.range_split.load_cpu_threshold`, which has a default
value of `250ms`.
kvoli added a commit to kvoli/cockroach that referenced this issue Jan 27, 2023
This commit adds the ability to peform load based splitting with replica
cpu usage rather than queries per second. Load based splitting now will
use either cpu or qps for deciding split points, depending on the
cluster setting `kv.allocator.load_based_rebalancing.objective`.

When set to `qps`, qps is used in deciding split points and when
splitting should occur; similarly, `cpu` means that request cpu against
the leasholder replica is to decide split points.

The split threshold when using `cpu` is the cluster setting
`kv.range_split.load_cpu_threshold` which defaults to `250ms` of cpu
time per second, i.e. a replica using 1/4 processor of a machine
consistently.

The merge queue uses the load based splitter to make decisions on
whether to merge two adjacent ranges due to low load. This commit also
updates the merge queue to be consistent with the load based splitter
signal. When switching between `qps` and `cpu`, the load based splitter
for every replica is reset to avoid spurious results.

resolves: cockroachdb#95377

Release note (ops change): Load based splitter now supports using request
cpu usage to split ranges. This is introduced with the previous cluster
setting `kv.allocator.load_based_rebalancing.objective`, which when set
to `cpu`, will use request cpu usage. The threshold above which
CPU usage of a range is considered for splitting is defined in the
cluster setting `kv.range_split.load_cpu_threshold`, which has a default
value of `250ms`.
kvoli added a commit to kvoli/cockroach that referenced this issue Jan 27, 2023
This commit adds the ability to perform load based splitting with replica
cpu usage rather than queries per second. Load based splitting now will
use either cpu or qps for deciding split points, depending on the
cluster setting `kv.allocator.load_based_rebalancing.objective`.

When set to `qps`, qps is used in deciding split points and when
splitting should occur; similarly, `cpu` means that request cpu against
the leaseholder replica is to decide split points.

The split threshold when using `cpu` is the cluster setting
`kv.range_split.load_cpu_threshold` which defaults to `250ms` of cpu
time per second, i.e. a replica using 1/4 processor of a machine
consistently.

The merge queue uses the load based splitter to make decisions on
whether to merge two adjacent ranges due to low load. This commit also
updates the merge queue to be consistent with the load based splitter
signal. When switching between `qps` and `cpu`, the load based splitter
for every replica is reset to avoid spurious results.

resolves: cockroachdb#95377

Release note (ops change): Load based splitter now supports using request
cpu usage to split ranges. This is introduced with the previous cluster
setting `kv.allocator.load_based_rebalancing.objective`, which when set
to `cpu`, will use request cpu usage. The threshold above which
CPU usage of a range is considered for splitting is defined in the
cluster setting `kv.range_split.load_cpu_threshold`, which has a default
value of `250ms`.
kvoli added a commit to kvoli/cockroach that referenced this issue Feb 9, 2023
This commit adds the ability to perform load based splitting with replica
cpu usage rather than queries per second. Load based splitting now will
use either cpu or qps for deciding split points, depending on the
cluster setting `kv.allocator.load_based_rebalancing.objective`.

When set to `qps`, qps is used in deciding split points and when
splitting should occur; similarly, `cpu` means that request cpu against
the leaseholder replica is to decide split points.

The split threshold when using `cpu` is the cluster setting
`kv.range_split.load_cpu_threshold` which defaults to `250ms` of cpu
time per second, i.e. a replica using 1/4 processor of a machine
consistently.

The merge queue uses the load based splitter to make decisions on
whether to merge two adjacent ranges due to low load. This commit also
updates the merge queue to be consistent with the load based splitter
signal. When switching between `qps` and `cpu`, the load based splitter
for every replica is reset to avoid spurious results.

resolves: cockroachdb#95377

Release note (ops change): Load based splitter now supports using request
cpu usage to split ranges. This is introduced with the previous cluster
setting `kv.allocator.load_based_rebalancing.objective`, which when set
to `cpu`, will use request cpu usage. The threshold above which
CPU usage of a range is considered for splitting is defined in the
cluster setting `kv.range_split.load_cpu_threshold`, which has a default
value of `250ms`.
kvoli added a commit to kvoli/cockroach that referenced this issue Feb 9, 2023
This commit adds the ability to perform load based splitting with replica
cpu usage rather than queries per second. Load based splitting now will
use either cpu or qps for deciding split points, depending on the
cluster setting `kv.allocator.load_based_rebalancing.objective`.

When set to `qps`, qps is used in deciding split points and when
splitting should occur; similarly, `cpu` means that request cpu against
the leaseholder replica is to decide split points.

The split threshold when using `cpu` is the cluster setting
`kv.range_split.load_cpu_threshold` which defaults to `250ms` of cpu
time per second, i.e. a replica using 1/4 processor of a machine
consistently.

The merge queue uses the load based splitter to make decisions on
whether to merge two adjacent ranges due to low load. This commit also
updates the merge queue to be consistent with the load based splitter
signal. When switching between `qps` and `cpu`, the load based splitter
for every replica is reset to avoid spurious results.

resolves: cockroachdb#95377

Release note (ops change): Load based splitter now supports using request
cpu usage to split ranges. This is introduced with the previous cluster
setting `kv.allocator.load_based_rebalancing.objective`, which when set
to `cpu`, will use request cpu usage. The threshold above which
CPU usage of a range is considered for splitting is defined in the
cluster setting `kv.range_split.load_cpu_threshold`, which has a default
value of `250ms`.
kvoli added a commit to kvoli/cockroach that referenced this issue Feb 9, 2023
This commit adds the ability to perform load based splitting with replica
cpu usage rather than queries per second. Load based splitting now will
use either cpu or qps for deciding split points, depending on the
cluster setting `kv.allocator.load_based_rebalancing.objective`.

When set to `qps`, qps is used in deciding split points and when
splitting should occur; similarly, `cpu` means that request cpu against
the leaseholder replica is to decide split points.

The split threshold when using `cpu` is the cluster setting
`kv.range_split.load_cpu_threshold` which defaults to `250ms` of cpu
time per second, i.e. a replica using 1/4 processor of a machine
consistently.

The merge queue uses the load based splitter to make decisions on
whether to merge two adjacent ranges due to low load. This commit also
updates the merge queue to be consistent with the load based splitter
signal. When switching between `qps` and `cpu`, the load based splitter
for every replica is reset to avoid spurious results.

resolves: cockroachdb#95377

Release note (ops change): Load based splitter now supports using request
cpu usage to split ranges. This is introduced with the previous cluster
setting `kv.allocator.load_based_rebalancing.objective`, which when set
to `cpu`, will use request cpu usage. The threshold above which
CPU usage of a range is considered for splitting is defined in the
cluster setting `kv.range_split.load_cpu_threshold`, which has a default
value of `250ms`.
kvoli added a commit to kvoli/cockroach that referenced this issue Feb 9, 2023
This commit adds the ability to perform load based splitting with replica
cpu usage rather than queries per second. Load based splitting now will
use either cpu or qps for deciding split points, depending on the
cluster setting `kv.allocator.load_based_rebalancing.objective`.

When set to `qps`, qps is used in deciding split points and when
splitting should occur; similarly, `cpu` means that request cpu against
the leaseholder replica is to decide split points.

The split threshold when using `cpu` is the cluster setting
`kv.range_split.load_cpu_threshold` which defaults to `250ms` of cpu
time per second, i.e. a replica using 1/4 processor of a machine
consistently.

The merge queue uses the load based splitter to make decisions on
whether to merge two adjacent ranges due to low load. This commit also
updates the merge queue to be consistent with the load based splitter
signal. When switching between `qps` and `cpu`, the load based splitter
for every replica is reset to avoid spurious results.

resolves: cockroachdb#95377

Release note (ops change): Load based splitter now supports using request
cpu usage to split ranges. This is introduced with the previous cluster
setting `kv.allocator.load_based_rebalancing.objective`, which when set
to `cpu`, will use request cpu usage. The threshold above which
CPU usage of a range is considered for splitting is defined in the
cluster setting `kv.range_split.load_cpu_threshold`, which has a default
value of `250ms`.
kvoli added a commit to kvoli/cockroach that referenced this issue Feb 9, 2023
This commit adds the ability to perform load based splitting with replica
cpu usage rather than queries per second. Load based splitting now will
use either cpu or qps for deciding split points, depending on the
cluster setting `kv.allocator.load_based_rebalancing.objective`.

When set to `qps`, qps is used in deciding split points and when
splitting should occur; similarly, `cpu` means that request cpu against
the leaseholder replica is to decide split points.

The split threshold when using `cpu` is the cluster setting
`kv.range_split.load_cpu_threshold` which defaults to `250ms` of cpu
time per second, i.e. a replica using 1/4 processor of a machine
consistently.

The merge queue uses the load based splitter to make decisions on
whether to merge two adjacent ranges due to low load. This commit also
updates the merge queue to be consistent with the load based splitter
signal. When switching between `qps` and `cpu`, the load based splitter
for every replica is reset to avoid spurious results.

resolves: cockroachdb#95377

Release note (ops change): Load based splitter now supports using request
cpu usage to split ranges. This is introduced with the previous cluster
setting `kv.allocator.load_based_rebalancing.objective`, which when set
to `cpu`, will use request cpu usage. The threshold above which
CPU usage of a range is considered for splitting is defined in the
cluster setting `kv.range_split.load_cpu_threshold`, which has a default
value of `250ms`.
kvoli added a commit to kvoli/cockroach that referenced this issue Feb 10, 2023
This commit adds the ability to perform load based splitting with replica
cpu usage rather than queries per second. Load based splitting now will
use either cpu or qps for deciding split points, depending on the
cluster setting `kv.allocator.load_based_rebalancing.objective`.

When set to `qps`, qps is used in deciding split points and when
splitting should occur; similarly, `cpu` means that request cpu against
the leaseholder replica is to decide split points.

The split threshold when using `cpu` is the cluster setting
`kv.range_split.load_cpu_threshold` which defaults to `250ms` of cpu
time per second, i.e. a replica using 1/4 processor of a machine
consistently.

The merge queue uses the load based splitter to make decisions on
whether to merge two adjacent ranges due to low load. This commit also
updates the merge queue to be consistent with the load based splitter
signal. When switching between `qps` and `cpu`, the load based splitter
for every replica is reset to avoid spurious results.

resolves: cockroachdb#95377

Release note (ops change): Load based splitter now supports using request
cpu usage to split ranges. This is introduced with the previous cluster
setting `kv.allocator.load_based_rebalancing.objective`, which when set
to `cpu`, will use request cpu usage. The threshold above which
CPU usage of a range is considered for splitting is defined in the
cluster setting `kv.range_split.load_cpu_threshold`, which has a default
value of `250ms`.
craig bot pushed a commit that referenced this issue Feb 10, 2023
96128: kvserver: introduce cpu load based splits r=nvanbenschoten a=kvoli

This PR adds the ability to perform load based splitting with replica
cpu usage rather than queries per second. Load based splitting now will
use either cpu or qps for deciding split points, depending on the
cluster setting `kv.allocator.load_based_rebalancing.objective`.

When set to `qps`, qps is used in deciding split points and when
splitting should occur; similarly, `cpu` means that request cpu against
the leaseholder replica is to decide split points.

The split threshold when using `cpu` is the cluster setting
`kv.range_split.load_cpu_threshold` which defaults to `250ms` of cpu
time per second, i.e. a replica using 1/4 processor of a machine
consistently.

The merge queue uses the load based splitter to make decisions on
whether to merge two adjacent ranges due to low load. This commit also
updates the merge queue to be consistent with the load based splitter
signal. When switching between `qps` and `cpu`, the load based splitter
for every replica is reset to avoid spurious results.

![image](https://user-images.githubusercontent.com/39606633/215580650-b12ff509-5cf5-4ffa-880d-8387e2ef0afa.png)


resolves: #95377
resolves: #83519
depends on: #96127

Release note (ops change): Load based splitter now supports using request
cpu usage to split ranges. This is introduced with the previous cluster
setting `kv.allocator.load_based_rebalancing.objective`, which when set
to `cpu`, will use request cpu usage. The threshold above which
CPU usage of a range is considered for splitting is defined in the
cluster setting `kv.range_split.load_cpu_threshold`, which has a default
value of `250ms`.

96129: bazel: is_dev_linux config setting should be false when cross_flag is r=rickystewart a=srosenberg

true

When cross-compiling on a gce worker and invoking bazel directly, build fails with "cannot find -lresolv_wrapper", owing to the fact that is_dev_linux is not mutually exclusive of cross_linux. That is, when both configs are active "-lresolv_wrapper" is passed to clinkopts in pkg/ccl/gssapiccl/BUILD.bazel; the cross-compiler doesn't have this lib nor does it need it.

Note, when using ./dev instead of bazel, the above issue is side-stepped because the dev wrapper invokes bazel inside docker which ignores ~/.bazelrc.

The workaround is to make is_dev_linux false when cross_flag is true.

Epic: none

Release note: None

96431: ui: databases shows partial results for size limit error r=j82w a=j82w

The databases page displays partial results instead of
just showing an error message.

Sorting is disabled if there are more than 2 pages of results
which is currently configured to 40dbs. This still allows most 
user to use sort functionality, but prevents large customers
from breaking when it would need to do a network call per a
database.

The database details are now loaded on demand for the first
page only. Previously a network call was done for all databases 
which would result in 2k network calls. It now only loads the page
of details the user is looking at. 

part of: #94333

https://www.loom.com/share/31b213b2f1764d0f9868bd967b9388b8

Release note: none

Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
Co-authored-by: Stan Rosenberg <stan@cockroachlabs.com>
Co-authored-by: j82w <jwilley@cockroachlabs.com>
@craig craig bot closed this as completed in 408e227 Feb 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects
None yet
1 participant