Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
97717: multitenant: add AdminUnsplitRequest capability r=knz a=ecwall

Fixes #97716

1) Add a new `tenantcapabilitiespb.CanAdminUnsplit` capability.
2) Check capability in `Authorizer.HasCapabilityForBatch`.
3) Remove `ExecutorConfig.RequireSystemTenant` call from
   `execFactory.ConstructAlterTableUnsplit`,
   `execFactory.ConstructAlterTableUnsplitAll`.

Release note: None


98250: kvserver: add minimum cpu lb split threshold r=andrewbaptist a=kvoli

Previously, `kv.range_split.load_cpu_threshold` had no minimum setting value. It is undesirable to allow users to set this setting to low as excessive splitting may occur.

`kv.range_split.load_cpu_threshold` now has a minimum setting value of `10ms`.

See #96869 for additional context on the threshold.

Resolves: #98107

Release note (ops change): `kv.range_split.load_cpu_threshold` now has a minimum setting value of `10ms`.

98270: dashboards: add replica cpu to repl dashboard r=xinhaoz a=kvoli

In #96127 we added the option to load balance replica CPU instead of QPS across
stores in a cluster. It is desirable to view the signal being controlled for
rebalancing in the replication dashboard, similar to QPS.

This pr adds the `rebalancing.cpunanospersecond` metric to the replication
metrics dashboard.

The avg QPS graph on the replication graph previously described the metric as
"Exponentially weighted average", however this is not true.

This pr updates the description to just be "moving average" which is accurate.
Note that follow the workload does use an exponentially weighted value, however
the metric in the dashboard is not the same.

This pr also updates the graph header to include Replica in the title: "Average
Replica Queries per Node". QPS is specific to replicas. This is already
mentioned in the description.

Resolves: #98109


98289: storage: mvccExportToWriter should always return buffered range keys r=adityamaru a=stevendanna

In #96691, we changed the return type of mvccExportToWriter such that it now indicates when a CPU limit has been reached. As part of that change, when the CPU limit was reached, we would immedately `return` rather than `break`ing out of the loop. This introduced a bug, since the code after the loop that the `break` was taking us to is important. Namely, we may have previously buffered range keys that we need to write into our response still. By replacing the break with a return, these range keys were lost.

The end-user impact of this is that a BACKUP that _ought_ to have included range keys (such as a backup of a table with a rolled back IMPORT) would not include those range keys and thus would end up resurecting deleted keys upon restore.

This PR brings back the `break` and adds a test that covers exporting with range keys under CPU exhaustion.

This bug only ever existed on master.

Informs #97694

Epic: none

Release note: None

98329: sql: fix iteration conditions in crdb_internal.scan r=ajwerner a=stevendanna

Rather than using the Next() key of the last key in the response when iterating, we should use the resume span. The previous code could result in a failure in the rare case that the end key of our scan exactly matched the successor key of the very last key in the iteration.

Epic: none

Release note: None

Co-authored-by: Evan Wall <wall@cockroachlabs.com>
Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
Co-authored-by: Steven Danna <danna@cockroachlabs.com>
  • Loading branch information
4 people committed Mar 9, 2023
6 parents eb0ba11 + d8f9575 + 903747d + 646d5e6 + 74c8915 + 15df935 commit 62eb4b8
Show file tree
Hide file tree
Showing 32 changed files with 386 additions and 198 deletions.
20 changes: 10 additions & 10 deletions pkg/ccl/backupccl/testdata/backup-restore/restore-tenants
Original file line number Diff line number Diff line change
Expand Up @@ -29,9 +29,9 @@ CREATE DATABASE db1;
query-sql
SELECT id,name,data_state,service_mode,active,crdb_internal.pb_to_json('cockroach.multitenant.ProtoInfo', info, true) FROM system.tenants;
----
1 system 1 2 true {"capabilities": {"canAdminSplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "READY", "deprecatedId": "1", "droppedName": "", "tenantReplicationJobId": "0"}
5 <nil> 2 0 false {"capabilities": {"canAdminSplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "DROP", "deprecatedId": "5", "droppedName": "tenant-5", "tenantReplicationJobId": "0"}
6 tenant-6 1 1 true {"capabilities": {"canAdminSplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "READY", "deprecatedId": "6", "droppedName": "", "tenantReplicationJobId": "0"}
1 system 1 2 true {"capabilities": {"canAdminSplit": false, "canAdminUnsplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "READY", "deprecatedId": "1", "droppedName": "", "tenantReplicationJobId": "0"}
5 <nil> 2 0 false {"capabilities": {"canAdminSplit": false, "canAdminUnsplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "DROP", "deprecatedId": "5", "droppedName": "tenant-5", "tenantReplicationJobId": "0"}
6 tenant-6 1 1 true {"capabilities": {"canAdminSplit": false, "canAdminUnsplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "READY", "deprecatedId": "6", "droppedName": "", "tenantReplicationJobId": "0"}

exec-sql
BACKUP INTO 'nodelocal://1/cluster_without_tenants'
Expand Down Expand Up @@ -81,8 +81,8 @@ job paused at pausepoint
query-sql
SELECT id,active,crdb_internal.pb_to_json('cockroach.multitenant.ProtoInfo', info, true) FROM system.tenants;
----
1 true {"capabilities": {"canAdminSplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "READY", "deprecatedId": "1", "droppedName": "", "tenantReplicationJobId": "0"}
6 false {"capabilities": {"canAdminSplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "ADD", "deprecatedId": "6", "droppedName": "", "tenantReplicationJobId": "0"}
1 true {"capabilities": {"canAdminSplit": false, "canAdminUnsplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "READY", "deprecatedId": "1", "droppedName": "", "tenantReplicationJobId": "0"}
6 false {"capabilities": {"canAdminSplit": false, "canAdminUnsplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "ADD", "deprecatedId": "6", "droppedName": "", "tenantReplicationJobId": "0"}

exec-sql
SET CLUSTER SETTING jobs.debug.pausepoints = ''
Expand All @@ -102,8 +102,8 @@ USE defaultdb;
query-sql
SELECT id,name,data_state,service_mode,active,crdb_internal.pb_to_json('cockroach.multitenant.ProtoInfo', info, true) FROM system.tenants;
----
1 system 1 2 true {"capabilities": {"canAdminSplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "READY", "deprecatedId": "1", "droppedName": "", "tenantReplicationJobId": "0"}
6 tenant-6 1 1 true {"capabilities": {"canAdminSplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "READY", "deprecatedId": "6", "droppedName": "", "tenantReplicationJobId": "0"}
1 system 1 2 true {"capabilities": {"canAdminSplit": false, "canAdminUnsplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "READY", "deprecatedId": "1", "droppedName": "", "tenantReplicationJobId": "0"}
6 tenant-6 1 1 true {"capabilities": {"canAdminSplit": false, "canAdminUnsplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "READY", "deprecatedId": "6", "droppedName": "", "tenantReplicationJobId": "0"}


exec-sql expect-error-regex=(tenant 6 not in backup)
Expand Down Expand Up @@ -133,9 +133,9 @@ RESTORE TENANT 6 FROM LATEST IN 'nodelocal://1/tenant6' WITH tenant_name = 'newn
query-sql
SELECT id,name,data_state,service_mode,active,crdb_internal.pb_to_json('cockroach.multitenant.ProtoInfo', info, true) FROM system.tenants;
----
1 system 1 2 true {"capabilities": {"canAdminSplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "READY", "deprecatedId": "1", "droppedName": "", "tenantReplicationJobId": "0"}
2 newname 1 1 true {"capabilities": {"canAdminSplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "READY", "deprecatedId": "2", "droppedName": "", "tenantReplicationJobId": "0"}
6 tenant-6 1 1 true {"capabilities": {"canAdminSplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "READY", "deprecatedId": "6", "droppedName": "", "tenantReplicationJobId": "0"}
1 system 1 2 true {"capabilities": {"canAdminSplit": false, "canAdminUnsplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "READY", "deprecatedId": "1", "droppedName": "", "tenantReplicationJobId": "0"}
2 newname 1 1 true {"capabilities": {"canAdminSplit": false, "canAdminUnsplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "READY", "deprecatedId": "2", "droppedName": "", "tenantReplicationJobId": "0"}
6 tenant-6 1 1 true {"capabilities": {"canAdminSplit": false, "canAdminUnsplit": false, "canViewNodeInfo": false, "canViewTsdbMetrics": false, "spanConfigBounds": null}, "deprecatedDataState": "READY", "deprecatedId": "6", "droppedName": "", "tenantReplicationJobId": "0"}

# Check that another service mode is also preserved.
exec-sql
Expand Down
4 changes: 4 additions & 0 deletions pkg/ccl/logictestccl/testdata/logic_test/crdb_internal
Original file line number Diff line number Diff line change
Expand Up @@ -195,9 +195,11 @@ SELECT * FROM crdb_internal.node_tenant_capabilities_cache
----
tenant_id capability_id capability_value
1 can_admin_split false
1 can_admin_unsplit false
1 can_view_node_info false
1 can_view_tsdb_metrics false
5 can_admin_split false
5 can_admin_unsplit false
5 can_view_node_info false
5 can_view_tsdb_metrics false

Expand All @@ -210,9 +212,11 @@ SELECT * FROM crdb_internal.node_tenant_capabilities_cache
----
tenant_id capability_id capability_value
1 can_admin_split false
1 can_admin_unsplit false
1 can_view_node_info false
1 can_view_tsdb_metrics false
5 can_admin_split true
5 can_admin_unsplit false
5 can_view_node_info false
5 can_view_tsdb_metrics false

Expand Down
7 changes: 7 additions & 0 deletions pkg/ccl/logictestccl/testdata/logic_test/tenant_capability
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ SELECT capability_id, capability_value FROM [SHOW TENANT "no-capabilities-tenant
----
capability_id capability_value
can_admin_split false
can_admin_unsplit false
can_view_node_info false
can_view_tsdb_metrics false

Expand All @@ -50,6 +51,7 @@ SELECT capability_id, capability_value FROM [SHOW TENANT "bool-capability-no-val
----
capability_id capability_value
can_admin_split true
can_admin_unsplit false
can_view_node_info false
can_view_tsdb_metrics false

Expand All @@ -61,6 +63,7 @@ SELECT capability_id, capability_value FROM [SHOW TENANT "bool-capability-no-val
----
capability_id capability_value
can_admin_split false
can_admin_unsplit false
can_view_node_info false
can_view_tsdb_metrics false

Expand All @@ -79,6 +82,7 @@ SELECT capability_id, capability_value FROM [SHOW TENANT "bool-capability-with-v
----
capability_id capability_value
can_admin_split true
can_admin_unsplit false
can_view_node_info false
can_view_tsdb_metrics false

Expand All @@ -97,6 +101,7 @@ SELECT capability_id, capability_value FROM [SHOW TENANT "bool-capability-with-e
----
capability_id capability_value
can_admin_split true
can_admin_unsplit false
can_view_node_info false
can_view_tsdb_metrics false

Expand All @@ -115,6 +120,7 @@ SELECT capability_id, capability_value FROM [SHOW TENANT "multiple-capability-te
----
capability_id capability_value
can_admin_split true
can_admin_unsplit false
can_view_node_info true
can_view_tsdb_metrics false

Expand All @@ -126,6 +132,7 @@ SELECT capability_id, capability_value FROM [SHOW TENANT "multiple-capability-te
----
capability_id capability_value
can_admin_split false
can_admin_unsplit false
can_view_node_info false
can_view_tsdb_metrics false

Expand Down
6 changes: 0 additions & 6 deletions pkg/ccl/logictestccl/testdata/logic_test/tenant_unsupported
Original file line number Diff line number Diff line change
Expand Up @@ -45,12 +45,6 @@ SELECT * FROM crdb_internal.kv_node_status

# Cannot perform operations that issue Admin requests.

statement error operation is unsupported in multi-tenancy mode
ALTER TABLE kv UNSPLIT AT VALUES ('foo')

statement error operation is unsupported in multi-tenancy mode
ALTER TABLE kv UNSPLIT ALL

statement error tenant cluster setting sql.scatter.allow_for_secondary_tenant.enabled disabled
ALTER TABLE kv SCATTER

Expand Down
Original file line number Diff line number Diff line change
@@ -1,9 +1,7 @@
query-sql-system
SHOW TENANT [10] WITH CAPABILITIES
SELECT * FROM [SHOW TENANT [10] WITH CAPABILITIES] WHERE capability_id = 'can_admin_split'
----
10 tenant-10 ready none can_admin_split false
10 tenant-10 ready none can_view_node_info false
10 tenant-10 ready none can_view_tsdb_metrics false

exec-sql-tenant
CREATE TABLE t(a INT)
Expand Down
1 change: 1 addition & 0 deletions pkg/kv/kvpb/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@ go_library(
"//pkg/util/humanizeutil",
"//pkg/util/log",
"//pkg/util/protoutil",
"//pkg/util/stringerutil",
"//pkg/util/uuid",
"@com_github_cockroachdb_errors//:errors",
"@com_github_cockroachdb_errors//extgrpc",
Expand Down
11 changes: 11 additions & 0 deletions pkg/kv/kvpb/method.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,12 +10,21 @@

package kvpb

import "github.com/cockroachdb/cockroach/pkg/util/stringerutil"

// Method is the enumerated type for methods.
type Method int

// SafeValue implements redact.SafeValue.
func (Method) SafeValue() {}

var StringToMethodMap = stringerutil.StringToEnumValueMap(
_Method_index[:],
_Method_name,
0,
MaxMethod,
)

//go:generate stringer -type=Method
const (
// Get fetches the value for a key from the KV map, respecting a
Expand Down Expand Up @@ -174,6 +183,8 @@ const (
// IsSpanEmpty is a non-transaction read request used to determine whether
// a span contains any keys whatsoever (garbage or otherwise).
IsSpanEmpty
// MaxMethod is the maximum method.
MaxMethod Method = iota - 1
// NumMethods represents the total number of API methods.
NumMethods
)
1 change: 1 addition & 0 deletions pkg/kv/kvpb/method_string.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

14 changes: 14 additions & 0 deletions pkg/kv/kvserver/replica_split_load.go
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,20 @@ var SplitByLoadCPUThreshold = settings.RegisterDurationSetting(
"kv.range_split.load_cpu_threshold",
"the CPU use per second over which, the range becomes a candidate for load based splitting",
500*time.Millisecond,
func(threshold time.Duration) error {
// We enforce a minimum because of recursive splitting that may occur if
// the threshold is set too low. There is a fixed CPU overhead for a
// replica. At the moment no split key will be produced unless there are
// more than 100 samples (batch requests) to that replica, however the
// memory overhead of tracking split keys in split/weighted_finder.go is
// noticeable and a finder is created after exceeding this threshold.
if threshold < 10*time.Millisecond {
return errors.Errorf(
"Cannot set `kv.range_split.load_cpu_threshold` less than 10ms",
)
}
return nil
},
).WithPublic()

func (obj LBRebalancingObjective) ToSplitObjective() split.SplitObjective {
Expand Down
14 changes: 12 additions & 2 deletions pkg/multitenant/tenantcapabilities/capabilities.go
Original file line number Diff line number Diff line change
Expand Up @@ -166,6 +166,12 @@ const (
// cluster.
CanAdminSplit // can_admin_split

// CanAdminUnsplit describes the ability of a tenant to perform manual
// KV range unsplit requests. These operations need a capability
// because excessive KV range unsplits can overwhelm the storage
// cluster.
CanAdminUnsplit // can_admin_unsplit

// CanViewNodeInfo describes the ability of a tenant to read the
// metadata for KV nodes. These operations need a capability because
// the KV node record contains sensitive operational data which we
Expand Down Expand Up @@ -215,10 +221,14 @@ const (
// CapabilityType returns the type of a given capability.
func (c CapabilityID) CapabilityType() Type {
switch c {
case CanAdminSplit, CanViewNodeInfo, CanViewTSDBMetrics:
case
CanAdminSplit,
CanAdminUnsplit,
CanViewNodeInfo,
CanViewTSDBMetrics:
return Bool

default:
panic(errors.AssertionFailedf("missing case: %d", c))
panic(errors.AssertionFailedf("missing case: %q", c))
}
}
13 changes: 7 additions & 6 deletions pkg/multitenant/tenantcapabilities/capabilityid_string.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,8 @@ func (a *Authorizer) HasCapabilityForBatch(
}

for _, ru := range ba.Requests {
requiredCap, hasCap := reqMethodToCap[ru.GetInner().Method()]
request := ru.GetInner()
requiredCap, hasCap := reqMethodToCap[request.Method()]
if requiredCap == noCapCheckNeeded {
continue
}
Expand All @@ -79,7 +80,7 @@ func (a *Authorizer) HasCapabilityForBatch(
// disallowed. This prevents accidents where someone adds a new
// sensitive request type in KV and forgets to add an explicit
// authorization rule for it here.
return errors.Newf("client tenant does not have capability %q (%T)", requiredCap, ru.GetInner())
return errors.Newf("client tenant does not have capability %q (%T)", requiredCap, request)
}
}
return nil
Expand Down Expand Up @@ -117,15 +118,15 @@ var reqMethodToCap = map[kvpb.Method]tenantcapabilities.CapabilityID{
kvpb.Scan: noCapCheckNeeded,

// The following are authorized via specific capabilities.
kvpb.AdminSplit: tenantcapabilities.CanAdminSplit,
kvpb.AdminSplit: tenantcapabilities.CanAdminSplit,
kvpb.AdminUnsplit: tenantcapabilities.CanAdminUnsplit,

// TODO(ecwall): The following should also be authorized via specific capabilities.
kvpb.AdminChangeReplicas: noCapCheckNeeded,
kvpb.AdminMerge: noCapCheckNeeded,
kvpb.AdminRelocateRange: noCapCheckNeeded,
kvpb.AdminScatter: noCapCheckNeeded,
kvpb.AdminTransferLease: noCapCheckNeeded,
kvpb.AdminUnsplit: noCapCheckNeeded,

// TODO(knz,arul): Verify with the relevant teams whether secondary
// tenants have legitimate access to any of those.
Expand Down
Original file line number Diff line number Diff line change
@@ -1,50 +1,62 @@
upsert ten=10 can_admin_split=true can_view_node_info=true can_view_tsdb_metrics=true
upsert ten=10 can_admin_split=true can_admin_unsplit=true can_view_node_info=true can_view_tsdb_metrics=true
----
ok

upsert ten=11 can_admin_split=false can_view_node_info=false can_view_tsdb_metrics=false
upsert ten=11 can_admin_split=false can_admin_unsplit=false can_view_node_info=false can_view_tsdb_metrics=false
----
ok

# Tenant 10 should be able to issue splits, given it has the capability to do
# so.
has-capability-for-batch ten=10 cmds=(split, scan, cput)
has-capability-for-batch ten=10 cmds=(AdminSplit, Scan, ConditionalPut)
----
ok

has-capability-for-batch ten=10 cmds=(AdminUnsplit, Scan, ConditionalPut)
----
ok

# Tenant 11 shouldn't be able to issue splits.
has-capability-for-batch ten=11 cmds=(split, scan, cput)
has-capability-for-batch ten=11 cmds=(AdminSplit, Scan, ConditionalPut)
----
client tenant does not have capability "can_admin_split" (*kvpb.AdminSplitRequest)

has-capability-for-batch ten=11 cmds=(AdminUnsplit, Scan, ConditionalPut)
----
client tenant does not have capability "can_admin_unsplit" (*kvpb.AdminUnsplitRequest)

# Test that the order of the split request doesn't have any effect.
has-capability-for-batch ten=11 cmds=(scan, cput)
has-capability-for-batch ten=11 cmds=(Scan, ConditionalPut)
----
ok

# However, a batch request which doesn't include a split (by tenant 11) should
# work as you'd expect.
has-capability-for-batch ten=11 cmds=(scan, cput)
has-capability-for-batch ten=11 cmds=(Scan, ConditionalPut)
----
ok

# Ditto for tenant 10.
has-capability-for-batch ten=10 cmds=(scan, cput)
has-capability-for-batch ten=10 cmds=(Scan, ConditionalPut)
----
ok

# Lastly, flip tenant 10's capability for splits; ensure it can no longer issue
# splits as a result.
upsert ten=10 can_admin_split=false can_view_node_info=true can_view_tsdb_metrics=true
upsert ten=10 can_admin_split=false can_admin_unsplit=false can_view_node_info=true can_view_tsdb_metrics=true
----
ok

has-capability-for-batch ten=10 cmds=(split, scan, cput)
has-capability-for-batch ten=10 cmds=(AdminSplit, Scan, ConditionalPut)
----
client tenant does not have capability "can_admin_split" (*kvpb.AdminSplitRequest)

has-capability-for-batch ten=10 cmds=(AdminUnsplit, Scan, ConditionalPut)
----
client tenant does not have capability "can_admin_unsplit" (*kvpb.AdminUnsplitRequest)

# However, this has no effect on batch requests that don't contain splits.
has-capability-for-batch ten=10 cmds=(scan, cput)
has-capability-for-batch ten=10 cmds=(Scan, ConditionalPut)
----
ok

Expand All @@ -56,7 +68,6 @@ has-node-status-capability ten=11
----
client tenant does not have capability to query cluster node metadata


has-tsdb-query-capability ten=10
----
ok
Expand Down
Loading

0 comments on commit 62eb4b8

Please sign in to comment.