
kvserver: scatter with rf=1 doesn't randomize replica placement #124171

Closed
kvoli opened this issue May 14, 2024 · 5 comments · Fixed by #124284
Labels
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
O-support: Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs.
P-1: Issues/test failures with a fix SLA of 1 month.
T-kv: KV Team.

Comments


kvoli commented May 14, 2024

Describe the problem

When an AdminScatterRequest is issued against a range that has a replication factor of 1, the scatter adds an additional replica before completing. This leaves the range with 2 replicas: one on the initial store and another on a random store.
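The effect can be sketched with a short simulation (a hypothetical Python model, not CockroachDB code; the store count and range count mirror the reproduction below): each scattered range keeps its original replica on the store that evaluated the scatter and gains one randomly placed replica, and the lease can land on either copy.

```python
import random

def buggy_scatter(num_ranges=1000, stores=(1, 2, 3, 4, 5), origin=1, seed=0):
    """Model the buggy RF=1 scatter path: add a random replica but
    never remove the original one."""
    rng = random.Random(seed)
    replicas = {s: 0 for s in stores}
    leases = {s: 0 for s in stores}
    for _ in range(num_ranges):
        target = rng.choice([s for s in stores if s != origin])
        replicas[origin] += 1  # the original replica is never removed
        replicas[target] += 1  # the randomly scattered copy
        leases[rng.choice([origin, target])] += 1
    return replicas, leases

replicas, leases = buggy_scatter()
print(replicas[1])             # → 1000: the origin keeps a replica of every range
print(sum(replicas.values()))  # → 2000: twice the expected count for RF=1
```

Because the origin store holds a copy of every range, roughly half of the leases also settle there, matching the skew observed below.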

To Reproduce

Restore a database with RF=1 into a multi-node cluster.

A targeted reproduction using split, then scatter is shown below.

roachprod create local -n 5
roachprod start local --binary=./artifacts/cockroach
# Create a test table with 1k rows, which we will use to split+scatter.
roachprod sql local:1 -- -e "CREATE TABLE t (i INT PRIMARY KEY);"
roachprod sql local:1 -- -e "INSERT INTO t select generate_series(1,1000);"
roachprod sql local:1 -- -e "ALTER TABLE t CONFIGURE ZONE USING num_replicas=1, constraints = '[+node1]';"
sleep 10
# The range for the test table should be down-replicated to just node 1 by now.
roachprod sql local:1 -- -e "SELECT lease_holder, count(*) FROM [SHOW RANGES FROM TABLE t WITH DETAILS] GROUP BY lease_holder;"
# Stop the replicate/lease queues from interfering with the test.
roachprod sql local:1 -- -e "SET CLUSTER SETTING kv.replicate_queue.enabled = false"
roachprod sql local:1 -- -e "SET CLUSTER SETTING kv.lease_queue.enabled = false"
roachprod sql local:1 -- -e "ALTER TABLE t SPLIT AT SELECT i FROM t;"
# Remove the constraint so that the ranges can scatter unrestricted.
roachprod sql local:1 -- -e "ALTER TABLE t CONFIGURE ZONE USING num_replicas=1, constraints = '[]';"
roachprod sql local:1 -- -e "
WITH ranges_info AS (SHOW RANGES FROM TABLE t WITH DETAILS),
     store_replica_count AS (SELECT unnest(replicas) AS store_id FROM ranges_info),
     store_lease_count AS (SELECT lease_holder AS store_id FROM ranges_info),
     replica_counts AS (SELECT store_id, COUNT(*) AS replica_count FROM store_replica_count GROUP BY store_id),
     lease_counts AS (SELECT store_id, COUNT(*) AS lease_count FROM store_lease_count GROUP BY store_id),
     max_counts AS (SELECT (SELECT MAX(replica_count) FROM replica_counts) AS max_replica_count,
                           (SELECT MAX(lease_count) FROM lease_counts) AS max_lease_count)
SELECT r.store_id,
       r.replica_count,
       repeat('#', CEIL(10.0 * r.replica_count / m.max_replica_count)::INT) AS replica_distribution,
       COALESCE(l.lease_count, 0) AS lease_count,
       repeat('#', CEIL(10.0 * COALESCE(l.lease_count, 0) / m.max_lease_count)::INT) AS lease_distribution
FROM replica_counts r
LEFT JOIN lease_counts l ON r.store_id = l.store_id
CROSS JOIN max_counts m
ORDER BY r.replica_count DESC;"
# This sleep isn't strictly necessary, but it makes it obvious when the table
# was split and when it was later scattered when looking at timeseries.
sleep 10
roachprod sql local:1 -- -e "ALTER TABLE t SCATTER;"
roachprod sql local:1 -- -e "
WITH ranges_info AS (SHOW RANGES FROM TABLE t WITH DETAILS),
     store_replica_count AS (SELECT unnest(replicas) AS store_id FROM ranges_info),
     store_lease_count AS (SELECT lease_holder AS store_id FROM ranges_info),
     replica_counts AS (SELECT store_id, COUNT(*) AS replica_count FROM store_replica_count GROUP BY store_id),
     lease_counts AS (SELECT store_id, COUNT(*) AS lease_count FROM store_lease_count GROUP BY store_id),
     max_counts AS (SELECT (SELECT MAX(replica_count) FROM replica_counts) AS max_replica_count,
                           (SELECT MAX(lease_count) FROM lease_counts) AS max_lease_count)
SELECT r.store_id,
       r.replica_count,
       repeat('#', CEIL(10.0 * r.replica_count / m.max_replica_count)::INT) AS replica_distribution,
       COALESCE(l.lease_count, 0) AS lease_count,
       repeat('#', CEIL(10.0 * COALESCE(l.lease_count, 0) / m.max_lease_count)::INT) AS lease_distribution
FROM replica_counts r
LEFT JOIN lease_counts l ON r.store_id = l.store_id
CROSS JOIN max_counts m
ORDER BY r.replica_count DESC;"

This results in the following lease and replica distribution after scattering:

  store_id | replica_count | replica_distribution | lease_count | lease_distribution
-----------+---------------+----------------------+-------------+---------------------
         1 |          1001 | ##########           |         500 | ##########
         5 |           291 | ###                  |         147 | ###
         4 |           275 | ###                  |         137 | ###
         3 |           229 | ###                  |         118 | ###
         2 |           206 | ###                  |          99 | ##

Note how, despite RF=1 and 1k ranges, there are 2k replicas, of which 1k remain on the original store (1). Because a replica always remains on the store where the scatter was evaluated, that store also retains half the leases, leading to imbalanced lease counts that would affect restore processor planning.

Expected behavior

Scatter randomly places a replica and leaves the range with exactly 1 replica.

Environment:

  • CockroachDB version: All versions affected

Additional context

See #108420 for an explanation of why rebalancing a voter with RF=1 will first add a voter.

Jira issue: CRDB-38742


kvoli commented May 14, 2024

The distribution looks more reasonable with a small prototype (#124185) that directly relocates the range when a scatter request is processed against a range with only 1 existing replica:

  store_id | replica_count | replica_distribution | lease_count | lease_distribution
-----------+---------------+----------------------+-------------+---------------------
         2 |           326 | ##########           |         326 | ##########
         4 |           294 | ##########           |         294 | ##########
         3 |           193 | ######               |         193 | ######
         5 |           188 | ######               |         188 | ######

Unfortunately, however, the distribution avoids the existing store entirely.
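One plausible way to picture this (a hypothetical sketch, not the actual allocator logic): if every relocation decision is made against a snapshot in which the origin store still holds all the replicas, the origin is never below the mean replica count and is therefore never chosen as a target.

```python
import random

def relocate_scatter(num_ranges=1000, stores=(1, 2, 3, 4, 5), origin=1, seed=0):
    """Hypothetical model of scatter-via-relocate where targets are the
    stores below the mean replica count in a pre-scatter snapshot."""
    rng = random.Random(seed)
    snapshot = {s: 0 for s in stores}
    snapshot[origin] = num_ranges  # the origin starts with every replica
    mean = num_ranges / len(stores)
    # The overfull origin is filtered out of the candidate targets.
    candidates = [s for s in stores if snapshot[s] < mean]
    counts = {s: 0 for s in stores}
    for _ in range(num_ranges):
        counts[rng.choice(candidates)] += 1
    return counts

counts = relocate_scatter()
print(counts[1])  # → 0: the origin store never receives a replica
```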

kvoli added a commit to kvoli/cockroach that referenced this issue May 15, 2024
When a range with exactly 1 replica was scattered, it could only add an
additional replica to a valid store, without also removing the existing
replica. This left ranges post-scatter over-replicated and not random --
only the newly added replica would be randomly placed. The cause is the
replicate queue falling back to adding a replica, instead of both adding
and removing a replica in an atomic operation.

Relocate the range instead of processing it via the replicate queue. The
relocation carries out the multi-step add+remove. A remaining issue is
that the existing replica's store will never be selected as a target
while it has a high replica count.

Distribution without patch:

```
  store_id | replica_count | replica_distribution | lease_count | lease_distribution
-----------+---------------+----------------------+-------------+---------------------
         1 |          1001 | ##########           |         500 | ##########
         5 |           291 | ###                  |         147 | ###
         4 |           275 | ###                  |         137 | ###
         3 |           229 | ###                  |         118 | ###
         2 |           206 | ###                  |          99 | ##
```

Distribution with patch:

```
  store_id | replica_count | replica_distribution | lease_count | lease_distribution
-----------+---------------+----------------------+-------------+---------------------
         2 |           326 | ##########           |         326 | ##########
         4 |           294 | ##########           |         294 | ##########
         3 |           193 | ######               |         193 | ######
         5 |           188 | ######               |         188 | ######
```

Resolves: cockroachdb#124171
Release note: None

kvoli commented May 16, 2024

It might be possible to entirely lift the RF=1 rebalancing restriction (thanks to #74077), where we could just issue the change-replicas operation as usual:

diff --git a/pkg/kv/kvserver/allocator/plan/util.go b/pkg/kv/kvserver/allocator/plan/util.go
index 33e066a04f5..974d9b45392 100644
--- a/pkg/kv/kvserver/allocator/plan/util.go
+++ b/pkg/kv/kvserver/allocator/plan/util.go
@@ -17,7 +17,6 @@ import (
 	"github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/allocatorimpl"
 	"github.com/cockroachdb/cockroach/pkg/raft"
 	"github.com/cockroachdb/cockroach/pkg/roachpb"
-	"github.com/cockroachdb/cockroach/pkg/util/log"
 	"github.com/cockroachdb/errors"
 	"github.com/cockroachdb/redact"
 )
@@ -36,51 +35,6 @@ func ReplicationChangesForRebalance(
 	rebalanceTargetType allocatorimpl.TargetReplicaType,
 ) (chgs []kvpb.ReplicationChange, performingSwap bool, err error) {
 	rdesc, found := desc.GetReplicaDescriptor(addTarget.StoreID)
-	if rebalanceTargetType == allocatorimpl.VoterTarget && numExistingVoters == 1 {
-		// If there's only one replica, the removal target is the
-		// leaseholder and this is unsupported and will fail. However,
-		// this is also the only way to rebalance in a single-replica
-		// range. If we try the atomic swap here, we'll fail doing
-		// nothing, and so we stay locked into the current distribution
-		// of replicas. (Note that maybeTransferLeaseAway above will not
-		// have found a target, and so will have returned (false, nil).
-		//
-		// Do the best thing we can, which is carry out the addition
-		// only, which should succeed, and the next time we touch this
-		// range, we will have one more replica and hopefully it will
-		// take the lease and remove the current leaseholder.
-		//
-		// It's possible that "rebalancing deadlock" can occur in other
-		// scenarios, it's really impossible to tell from the code given
-		// the constraints we support. However, the lease transfer often
-		// does not happen spuriously, and we can't enter dangerous
-		// configurations sporadically, so this code path is only hit
-		// when we know it's necessary, picking the smaller of two evils.
-		//
-		// See https://github.com/cockroachdb/cockroach/issues/40333.
-		log.KvDistribution.Infof(ctx, "can't swap replica due to lease; falling back to add")
-
-		// Even when there is only 1 existing voter, there may be other replica
-		// types in the range. Check if the add target already has a replica, if so
-		// it must be a non-voter or the rebalance is invalid.
-		if found && rdesc.Type == roachpb.NON_VOTER {
-			// The receiving store already has a non-voting replica. Instead of just
-			// adding a voter to the receiving store, we *must* promote the non-voting
-			// replica to a voter.
-			chgs = kvpb.ReplicationChangesForPromotion(addTarget)
-		} else if !found {
-			chgs = []kvpb.ReplicationChange{
-				{ChangeType: roachpb.ADD_VOTER, Target: addTarget},
-			}
-		} else {
-			return nil, false, errors.AssertionFailedf(
-				"invalid rebalancing decision: trying to"+
-					" move voter to a store that already has a replica %s for the range", rdesc,
-			)
-		}
-		return chgs, false, err
-	}
-
 	switch rebalanceTargetType {
 	case allocatorimpl.VoterTarget:
 		// Check if the target being added already has a non-voting replica.

Testing the same reproduction yields a similar result to using relocate range: the distribution is reasonable across 4 of the 5 stores, but excludes the original leaseholder's store entirely.
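With the fallback branch deleted, rebalancing an RF=1 range produces the same paired change as any other rebalance. A hypothetical Python mirror of the resulting change construction (names loosely follow the Go code; this is an illustration, not the real API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReplicationChange:
    change_type: str  # "ADD_VOTER" or "REMOVE_VOTER"
    target: int       # store ID

def changes_for_rebalance(existing_voters, add_target, remove_target):
    # Without the RF=1 special case, the add and remove are always
    # emitted together and applied atomically via a joint configuration.
    assert remove_target in existing_voters
    assert add_target not in existing_voters
    return [
        ReplicationChange("ADD_VOTER", add_target),
        ReplicationChange("REMOVE_VOTER", remove_target),
    ]

# One-voter range on store 1, rebalancing to store 3: a single atomic swap.
chgs = changes_for_rebalance({1}, add_target=3, remove_target=1)
print([(c.change_type, c.target) for c in chgs])
# → [('ADD_VOTER', 3), ('REMOVE_VOTER', 1)]
```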


kvoli commented May 16, 2024

With the queues re-enabled and running the repro again with the above patch, the distribution looks passable:

  store_id | replica_count | replica_distribution | lease_count | lease_distribution
-----------+---------------+----------------------+-------------+---------------------
         2 |           242 | ##########           |         241 | ##########
         4 |           227 | ##########           |         227 | ##########
         5 |           217 | #########            |         216 | #########
         3 |           209 | #########            |         208 | #########
         1 |           106 | #####                |         109 | #####

kvoli added a commit to kvoli/cockroach that referenced this issue May 16, 2024
The allocator would add a voter, instead of both adding and removing the
existing voter, when rebalancing ranges with one replica. Removing the
leaseholder replica was not possible prior to cockroachdb#74077, so the
add-only fallback was necessary.

This restriction is no longer necessary. Allow rebalancing a one-voter
range between stores using joint configurations, where the lease is
transferred from the outgoing demoting voter to the incoming voter's
store.

Scattering ranges with one voter will now leave the range with exactly
one voter, where previously both the leaseholder voter evaluating the
scatter and the new voter would be left.

Before this patch, scattering 1000 ranges with RF=1 on a 5 store
cluster:

```
  store_id | replica_count | replica_distribution | lease_count | lease_distribution
-----------+---------------+----------------------+-------------+---------------------
         1 |          1001 | ##########           |         500 | ##########
         5 |           291 | ###                  |         147 | ###
         4 |           275 | ###                  |         137 | ###
         3 |           229 | ###                  |         118 | ###
         2 |           206 | ###                  |          99 | ##
```

After:

```
  store_id | replica_count | replica_distribution | lease_count | lease_distribution
-----------+---------------+----------------------+-------------+---------------------
         2 |           242 | ##########           |         241 | ##########
         4 |           227 | ##########           |         227 | ##########
         5 |           217 | #########            |         216 | #########
         3 |           209 | #########            |         208 | #########
         1 |           106 | #####                |         109 | #####
```

Fixes: cockroachdb#108420
Fixes: cockroachdb#124171

Release note (bug fix): Scattering a range with replication factor=1 no
longer erroneously up-replicates the range to two replicas. Leases will
also no longer thrash between nodes when perturbed with replication
factor=1.
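The joint-configuration path described above can be sketched as a three-step state machine (a hypothetical model of the textbook joint-consensus steps, not the kvserver implementation):

```python
def rebalance_one_voter(leaseholder, target):
    """Model moving an RF=1 range's only voter from `leaseholder` to
    `target` via a joint configuration."""
    # 1. Enter the joint config: one atomic membership change demotes the
    #    outgoing voter and adds the incoming voter.
    joint = {leaseholder: "VOTER_DEMOTING", target: "VOTER_INCOMING"}
    # 2. Transfer the lease to the incoming voter's store (removing the
    #    leaseholder replica this way was enabled by cockroachdb#74077).
    leaseholder = target
    # 3. Leave the joint config: the demoting replica is removed, leaving
    #    exactly one voter.
    voters = {s for s, role in joint.items() if role == "VOTER_INCOMING"}
    return voters, leaseholder

voters, lh = rebalance_one_voter(leaseholder=1, target=3)
print(voters, lh)  # → {3} 3
```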
kvoli added a commit to kvoli/cockroach that referenced this issue May 16, 2024
kvoli added a commit to kvoli/cockroach that referenced this issue May 20, 2024
@kvoli kvoli self-assigned this May 20, 2024
kvoli added a commit to kvoli/cockroach that referenced this issue May 20, 2024
craig bot pushed a commit that referenced this issue May 20, 2024
124284: kvserver: rebalance ranges with one voter using joint configurations  r=nvanbenschoten a=kvoli

@craig craig bot closed this as completed in 1995005 May 20, 2024
blathers-crl bot pushed a commit that referenced this issue May 20, 2024

kvoli commented May 21, 2024

Will be closed on backport to 23.1, 23.2 and 24.1.

kvoli added a commit to kvoli/cockroach that referenced this issue May 28, 2024
kvoli added a commit to kvoli/cockroach that referenced this issue May 28, 2024