Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: get rid of bespoke relocation logic used by the mergeQueue #62631

Merged
merged 3 commits into from
Mar 29, 2021

Conversation

aayushshah15
Copy link
Contributor

@aayushshah15 aayushshah15 commented Mar 26, 2021

This PR contains two main commits:

kvserver: update AdminRelocateRange to leverage explicit swaps of
voters to non-voters

This commit updates AdminRelocateRange to use explicit atomic swaps of
voting replicas with non-voting replicas, that #58627 initially added
support for. The patch does so by generalizing behavior that's already
exercised by the replicateQueue when it decides to rebalance replicas.
See #61239.

This allows us, in the next commit, to remove bespoke relocation logic
that's used by the mergeQueue to align replica sets for the sake of a
range merge.

Release note: None

kvserver: get rid of bespoke relocation logic used by the mergeQueue

This commit removes the relocation logic used by the mergeQueue thus
far to align replica sets (added in #56197). This logic previously
existed in order to allow us to align the replica sets of a pair of
ranges (which is required for the range merge to proceed), while
avoiding redundant data movement.

Before #58627 and the previous commit in this PR, AdminRelocateRange
couldn't be directly used by the mergeQueue under various configurations
of the LHS and RHS ranges. Furthermore, even when it could be used, it
would involve redundant data movement. This all required us to compute
relocation targets for the RHS of a merge separately, above the call to
AdminRelocateRange, for the range merge to proceed.

All these limitations have been resolved by the previous commit which
teaches AdminRelocateRange to promote non-voters and demote voters
when needed, and the aforementioned bespoke relocation logic is no
longer needed.

Resolves #62370

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@aayushshah15
Copy link
Contributor Author

I'm going to add a randomized test (that just issues a bunch of random calls to AdminRelocateRange) -- kinda like

func TestReplicateQueueSwapVotersWithNonVoters(t *testing.T) {
. Feel free to hold off on reviewing until then if you'd prefer.

@aayushshah15 aayushshah15 force-pushed the fix-merge-queue-reloc-ub-2 branch 2 times, most recently from 3ea0d8d to d0be2f0 Compare March 26, 2021 18:37
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very satisfying!

Reviewed 1 of 1 files at r1, 4 of 4 files at r2, 4 of 4 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)


pkg/kv/kvserver/allocator.go, line 193 at r2 (raw file):

// targetReplicaType.{Add,Remove}ChangeType methods wherever possible.
func (t targetReplicaType) AddChangeType() roachpb.ReplicaChangeType {
	switch typ := t; typ {

Is there a reason for the value assignment in these switch statements? Would switch t { accomplish the same thing with less code and less room for confusion?


pkg/kv/kvserver/allocator.go, line 205 at r2 (raw file):

// RemoveChangeType returns the roachpb.ReplicaChangeType corresponding to the
// given targetReplicaType.
func (t targetReplicaType) RemoveChangeType() roachpb.ReplicaChangeType {

Can we use this in replicateQueue.removeDead?


pkg/kv/kvserver/client_merge_test.go, line 21 at r2 (raw file):

	"reflect"
	"regexp"
	"sort"

I'm surprised to see an import change with no other code changes. Does this commit compile?


pkg/kv/kvserver/replica_command.go, line 2469 at r2 (raw file):

}

// GetTargetsToCollocateRHSForMerge decides the configuration of RHS replicas

💯


pkg/kv/kvserver/replica_command.go, line 2875 at r2 (raw file):

			StoreID: targetStore.StoreID,
		}
		if args.targetType == voterTarget {

Was there a reason to separate this block out from the next? Do you find it easier to read?

If we merged them, we could avoid scanning through the replicas twice, and could instead of a single iteration through existingNonVoters.


pkg/kv/kvserver/replica_command.go, line 3001 at r2 (raw file):

		// (Note that !shouldRemove implies that we're trying to remove the last
		// replica left in the descriptor which is illegal).
		ops = append(ops, roachpb.MakeReplicationChanges(args.targetType.RemoveChangeType(), removalTarget)...)

Why the append here?


pkg/kv/kvserver/replicate_queue.go, line 1130 at r2 (raw file):

				return false, err
			}
			if performingSwap {

nit: consider pulling this below rq.metrics.RebalanceReplicaCount.Inc(1) so that the non-conditional metric comes first and the conditional metrics come after.


pkg/kv/kvserver/replicate_queue.go, line 1221 at r2 (raw file):

		}
		log.VEventf(ctx, 1, "can't swap replica due to lease; falling back to add")
		return chgs, performingSwap, err

nit: we generally prefer returning a constant instead of a named return variable when the value is known. It makes things easier to read. So this would be return chgs, false, err.

Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cleaned some things up and added a bunch more testing -- all in the second commit. Should be RFAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)


pkg/kv/kvserver/allocator.go, line 193 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is there a reason for the value assignment in these switch statements? Would switch t { accomplish the same thing with less code and less room for confusion?

No reason. Fixed now.


pkg/kv/kvserver/allocator.go, line 205 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Can we use this in replicateQueue.removeDead?

Done there and in removeDecommissioning.


pkg/kv/kvserver/client_merge_test.go, line 21 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm surprised to see an import change with no other code changes. Does this commit compile?

Messed up while re-organizing my changes. Fixed now.


pkg/kv/kvserver/replica_command.go, line 2875 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Was there a reason to separate this block out from the next? Do you find it easier to read?

If we merged them, we could avoid scanning through the replicas twice, and could instead of a single iteration through existingNonVoters.

That wasn't intentional, fixed.


pkg/kv/kvserver/replica_command.go, line 3001 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why the append here?

Fixed.


pkg/kv/kvserver/replicate_queue.go, line 1130 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: consider pulling this below rq.metrics.RebalanceReplicaCount.Inc(1) so that the non-conditional metric comes first and the conditional metrics come after.

Done.


pkg/kv/kvserver/replicate_queue.go, line 1221 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: we generally prefer returning a constant instead of a named return variable when the value is known. It makes things easier to read. So this would be return chgs, false, err.

Done.

@aayushshah15 aayushshah15 force-pushed the fix-merge-queue-reloc-ub-2 branch 4 times, most recently from 3b59a24 to b4aeeda Compare March 27, 2021 19:59
@aayushshah15
Copy link
Contributor Author

The new randomized test I added is flaking because it also asserts that AdminRelocateRange moved the lease to the first voter in the given list of voterTargets. However, under stress sometimes the leaseholder node misses a liveness heartbeat, which ends up failing the test. I think maybe a 9 node TestCluster is a bit too much. I'm stressing with a 6 node TestCluster to see how that goes. If that doesn't do the trick, I thinking we can do away with the leaseholder check under stress.

@aayushshah15 aayushshah15 force-pushed the fix-merge-queue-reloc-ub-2 branch 2 times, most recently from 94f2dba to 74b6ec9 Compare March 28, 2021 03:40
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 6 of 7 files at r4, 7 of 7 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)


pkg/kv/kvserver/replica_command.go, line 2875 at r2 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

That wasn't intentional, fixed.

We're still searching for the non-voter twice, once with GetReplicaDescriptor and once with an iteration over existingNonVoters. Should we avoid that?


pkg/kv/kvserver/replica_command.go, line 2957 at r5 (raw file):

		}
	} else if shouldAdd {
		ops = roachpb.MakeReplicationChanges(args.targetType.AddChangeType(), additionTarget)

It's probably better to avoid setting ops if we're just going to immediately overwrite it on the canPromoteNonVoter/canDemoteVoter paths. This is currently a bit misleading and looks error-prone.

Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @nvanbenschoten)


pkg/kv/kvserver/replica_command.go, line 2875 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We're still searching for the non-voter twice, once with GetReplicaDescriptor and once with an iteration over existingNonVoters. Should we avoid that?

Ugh, sorry for being careless!

Fixed for real now.


pkg/kv/kvserver/replica_command.go, line 2957 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It's probably better to avoid setting ops if we're just going to immediately overwrite it on the canPromoteNonVoter/canDemoteVoter paths. This is currently a bit misleading and looks error-prone.

Done (this is what you meant, right?).

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r7.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15 and @andreimatei)


pkg/kv/kvserver/replica_command.go, line 2957 at r5 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

Done (this is what you meant, right?).

Exactly.


pkg/kv/kvserver/replica_command.go, line 2847 at r7 (raw file):

				}
			}
			if promotionTargetIdx >= 0 {

Do we need promotionTargetIdx? Can we just move this condition into the loop?

voters to non-voters

This commit updates `AdminRelocateRange` to use explicit atomic swaps of
voting replicas with non-voting replicas, that cockroachdb#58627 initially added
support for. The patch does so by generalizing behavior that's already
exercised by the `replicateQueue` when it decides to rebalance replicas.
See cockroachdb#61239.

This allows us, in the next commit, to remove bespoke relocation logic
that's used by the `mergeQueue` to align replica sets for the sake of a
range merge.

Release note: None
This commit removes the relocation logic used by the `mergeQueue` thus
far to align replica sets (added in cockroachdb#56197). This logic previously
existed in order to allow us to align the replica sets of a pair of
ranges (which is required for the range merge to proceed), while
avoiding redundant data movement.

Before cockroachdb#58627 and the previous commit in this PR, `AdminRelocateRange`
couldn't be directly used by the mergeQueue under various configurations
of the LHS and RHS ranges. Furthermore, even when it could be used, it
would involve redundant data movement. This all required us to compute
relocation targets for the RHS of a merge separately, above the call to
`AdminRelocateRange`, for the range merge to proceed.

All these limitations have been resolved by the previous commit which
teaches `AdminRelocateRange` to promote non-voters and demote voters
when needed, and the aforementioned bespoke relocation logic is no
longer needed.

Release note: None
Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for being patient with me for these last few comments, and thanks for the review.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @nvanbenschoten)


pkg/kv/kvserver/replica_command.go, line 2847 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we need promotionTargetIdx? Can we just move this condition into the loop?

Moved it into the loop.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r9.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei)

@craig
Copy link
Contributor

craig bot commented Mar 29, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kvserver: update AdminRelocateRange to use explicit swaps of voters and non-voters
3 participants