Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: support atomic promotions and demotions of non-voting replicas #58627

Merged
merged 2 commits into from
Feb 22, 2021

Conversation

aayushshah15
Copy link
Contributor

@aayushshah15 aayushshah15 commented Jan 8, 2021

This PR teaches AdminChangeReplicas to atomically promote voters to non-voters, demote voters to non-voters or swap voters with non-voters via joint consensus.

The approach followed by this PR tries to coalesce ReplicationChanges of types ADD_VOTER and REMOVE_NON_VOTER on a given target as promotions of non-voters into voters and likewise, ADD_NON_VOTER and REMOVE_VOTER changes for a given target as demotions of voters into non-voters. When all 4 of these operations are simultaneously passed into a ChangeReplicasRequest, the patch will try to execute them as one atomic swap of a voter with a non-voter.

Fixes #58499
Informs #51943

Release note: None

@aayushshah15 aayushshah15 requested a review from a team as a code owner January 8, 2021 00:20
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@aayushshah15 aayushshah15 requested review from nvanbenschoten and andreimatei and removed request for a team January 8, 2021 00:23
@aayushshah15 aayushshah15 force-pushed the nonvoter-promo-demo branch 3 times, most recently from 858b029 to 58231e4 Compare January 8, 2021 20:03
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still working my way through replica_command.go, but I've looked at everything else and am entering a few hours of meetings so I figured I'd post this for now.

Reviewed 17 of 18 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)


pkg/kv/kvserver/client_replica_test.go, line 2393 at r1 (raw file):

	require.True(t, ok)
	require.Equal(t, roachpb.NON_VOTER, replDesc.GetType())
	replDesc, ok = desc.GetReplicaDescriptor(r1.StoreID)

Should this be r3?


pkg/kv/kvserver/replica_command.go, line 903 at r1 (raw file):

// 2. s1/1 s2/2 s3/3 s4/4LEARNER
// 3. s1/1 s2/2 s3/3 s4/4LEARNER s5/5LEARNER
// 4. s1/1VOTER_DEMOTING s2/2VOTER_DEMOTING s3/3 s4/4VOTER_INCOMING s5/5VOTER_INCOMING

Let's make sure all references to VOTER_DEMOTING are removed, or they'll cause confusion in the future.


pkg/kv/kvserver/replica_raftstorage.go, line 782 at r1 (raw file):

				// "applied by voters" here.
				case roachpb.VOTER_FULL, roachpb.VOTER_INCOMING, roachpb.VOTER_DEMOTING_LEARNER,
					roachpb.VOTER_OUTGOING, roachpb.LEARNER:

What about VOTER_DEMOTING_NON_VOTER? We should take a pass over every use of VOTER_DEMOTING_LEARNER and make sure we're also handling VOTER_DEMOTING_NON_VOTER appropriately.


pkg/roachpb/data.go, line 1626 at r1 (raw file):

			// first.
			changeType = raftpb.ConfChangeAddNode
		case LEARNER, NON_VOTER:

nit: it seems a bit strange to combine these, given that the rest of this switch-case block is intentionally verbose to give room for documentation.


pkg/roachpb/metadata_replicas.go, line 389 at r1 (raw file):

		case VOTER_OUTGOING:
			cs.VotersOutgoing = append(cs.VotersOutgoing, id)
		case VOTER_DEMOTING_LEARNER:

Do we need to update the various switch statements in this file and others to properly handle the VOTER_DEMOTING_NON_VOTER type?

@aayushshah15 aayushshah15 force-pushed the nonvoter-promo-demo branch 3 times, most recently from 00252c0 to 98e9e29 Compare January 12, 2021 05:47
Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)


pkg/kv/kvserver/client_replica_test.go, line 2393 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should this be r3?

Yup, fixed.


pkg/kv/kvserver/replica_command.go, line 903 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Let's make sure all references to VOTER_DEMOTING are removed, or they'll cause confusion in the future.

Done.


pkg/kv/kvserver/replica_raftstorage.go, line 782 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What about VOTER_DEMOTING_NON_VOTER? We should take a pass over every use of VOTER_DEMOTING_LEARNER and make sure we're also handling VOTER_DEMOTING_NON_VOTER appropriately.

Done. Shouldn't be any more locations where we're not handling VOTER_DEMOTING_NON_VOTER correctly.


pkg/roachpb/data.go, line 1626 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: it seems a bit strange to combine these, given that the rest of this switch-case block is intentionally verbose to give room for documentation.

I could separate them out, but I don't really have anything interesting to say about the NON_VOTER case there.


pkg/roachpb/metadata_replicas.go, line 389 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we need to update the various switch statements in this file and others to properly handle the VOTER_DEMOTING_NON_VOTER type?

Done.

@lunevalex lunevalex added the A-multiregion Related to multi-region label Jan 12, 2021
Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)


pkg/kv/kvserver/client_replica_test.go, line 2374 at r2 (raw file):

	key := tc.ScratchRange(t)
	// NB: The test cluster should start with r1 having a voting replica (and the

s/should//

Better to assert that, btw.


pkg/kv/kvserver/client_replica_test.go, line 2390 at r2 (raw file):

	desc, err := tc.SwapVoterWithNonVoter(key, r2, r3)
	require.NoError(t, err)
	replDesc, ok := desc.GetReplicaDescriptor(r2.StoreID)

I don't really understand these descriptor checks. Shouldn't they be inside tc.SwapVoterWithNonVoter() ?


pkg/kv/kvserver/replica_command.go, line 988 at r2 (raw file):

	}

	// TODO DURING REVIEW: I can't rationalize one way or another if we should

I guess ideally the order would be promotions, then voter additions, then voter removals, then voter demotions, then non-voter stuff.

Like, I think you want to replace a missing voter as soon as possible.


pkg/kv/kvserver/replica_command.go, line 996 at r2 (raw file):

	}

	// Synthesize the additions and removals that didn't get executed above as

I've got no idea what's going on here. I think maybeExecNonVoterPromotionsAndDemotions() should modify chgs and take out whatever it's done.


pkg/kv/kvserver/replica_command.go, line 1088 at r2 (raw file):

// (voter, non-voter) pair, it will execute an atomic swap using joint
// consensus.
func maybeExecNonVoterPromotionsAndDemotions(

I think the maybe in the name only serves to confuse, particularly nothing in the comment alludes to it. I think execNonVoterPromotionsAndDemotions() would be appropriate. Or just execPromotionsAndDemotions.


pkg/kv/kvserver/replica_command.go, line 1095 at r2 (raw file):

	details string,
	chgs roachpb.ReplicationChanges,
) (afterDesc *roachpb.RangeDescriptor, err error) {

nit: consider unnamed returns here


pkg/kv/kvserver/replica_command.go, line 1398 at r2 (raw file):

// suitable ChangeReplicasTrigger, see prepareChangeReplicasTrigger for details.
//
// Contrary to the name, *all* membership changes go through this method, even

Is this comment still accurate? Non-voters don't seem to make it here.


pkg/kv/kvserver/replica_command.go, line 1411 at r2 (raw file):

	reason kvserverpb.RangeLogEventReason,
	details string,
	voterAdditions, voterRemovals []roachpb.ReplicationTarget,

was this change particularly motivated by the rest of the patch, or is it generally a good idea in order to clarify that this function only deals with voters?


pkg/kv/kvserver/replica_command.go, line 1575 at r2 (raw file):

	internalChangeTypeAddNonVoter
	internalChangeTypePromoteLearner
	// TODO DURING REVIEW: We don't really need a new internalChangeType here for

It seems to me that a promotion should apply to either a LEARNER or a NON_VOTER just the same. I don't see the need to error out if we think we were promoting a LEARNER and instead we're promoting a NON_VOTER, or the other way around. No?


pkg/kv/kvserver/replica_command.go, line 1660 at r2 (raw file):

				rDesc, prevTyp, ok := updatedDesc.SetReplicaType(chg.target.NodeID, chg.target.StoreID, typ)
				if !ok || prevTyp != roachpb.NON_VOTER {
					return nil, errors.Errorf("cannot promote target %v which is missing as NON_VOTER", chg.target)

long line


pkg/roachpb/metadata.proto, line 46 at r2 (raw file):

In normal operation, VOTER_FULL and LEARNER are the only used states.

Is it time to update this with NON_VOTER?

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 12 of 12 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)


pkg/kv/kvserver/replica_command.go, line 988 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I guess ideally the order would be promotions, then voter additions, then voter removals, then voter demotions, then non-voter stuff.

Like, I think you want to replace a missing voter as soon as possible.

I don't think we can do it in exactly that order because we'd like to execute swaps of promotion/demotion atomically, like this PR is doing.


pkg/kv/kvserver/replica_command.go, line 1088 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think the maybe in the name only serves to confuse, particularly nothing in the comment alludes to it. I think execNonVoterPromotionsAndDemotions() would be appropriate. Or just execPromotionsAndDemotions.

Agreed.


pkg/kv/kvserver/replica_command.go, line 1095 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: consider unnamed returns here

+1


pkg/kv/kvserver/replica_command.go, line 1096 at r2 (raw file):

	chgs roachpb.ReplicationChanges,
) (afterDesc *roachpb.RangeDescriptor, err error) {
	// Isolate the promotions of voters and the demotions of non-voters from the

nit: I would s/of/to/ here in both places to give some indication of direction.


pkg/kv/kvserver/replica_command.go, line 1221 at r2 (raw file):

			}
		}
		if prevChg, ok := byStoreID[chg.Target.StoreID]; ok {

Are we missing some new cases in TestValidateReplicationChanges?


pkg/kv/kvserver/replica_command.go, line 1271 at r2 (raw file):

			// before the voting replica has been upreplicated and switched from
			// LEARNER to VOTER_FULL.
			if rDesc.GetType() == roachpb.LEARNER {

The logic in here is a little confusing. We're matching against the replica type and the change type, without much of a pattern. Would this be cleaner if we included an outer switch like the following?

switch rDesc.GetType() {
case roachpb.LEARNER:
    // reject
case roachpb.NON_VOTER:
    if chg.ChangeType == roachpb.ADD_NON_VOTER {
        // reject
    }
default:
    if chg.ChangeType == roachpb.ADD_VOTER {
        // reject
    }
}

Doing so reveals interesting questions, like whether an ADD_NON_VOTER or an ADD_VOTER should be accepted for a VOTER_DEMOTING_NON_VOTER.


pkg/kv/kvserver/replica_command.go, line 1578 at r2 (raw file):

	// non-voter promotions. The only purpose this serves is a sanity check
	// assertion inside prepareChangeReplicasTrigger (which I do like). Should I
	// generalize the existing internalChangeTypePromoteLearner and just use that?

Generalize as in make the logic a function of both the internalChangeType and the prevTyp? Doing so seems a little weird if we can't also merge internalChangeTypeDemoteVoterToLearner and internalChangeTypeDemoteVoterToNonVoter.


pkg/kv/kvserver/replica_command.go, line 2800 at r2 (raw file):

	left, right []roachpb.ReplicationTarget,
) (intersection []roachpb.ReplicationTarget) {
	leftByStoreID := make(map[roachpb.StoreID]interface{})

This might be a case where an n^2 algorithm is actually better. Creating and seeding a map is expensive if we're just going to throw it away, and the size of these slices should be tiny. I'd imagine the crossover point would be somewhere around 50 elements in these slices.

Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)


pkg/kv/kvserver/replica_command.go, line 988 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I don't think we can do it in exactly that order because we'd like to execute swaps of promotion/demotion atomically, like this PR is doing.

In addition to what Nathan said, I think our discussion from yesterday about "trying to swap a non-voter with voter before trying to add a new voter" leads to the correct order being something like:

promotions/demotions/swaps -> voter additions -> voter removals -> non-voter additions -> non-voter removals.

Does that seem reasonable?


pkg/kv/kvserver/replica_command.go, line 1575 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

It seems to me that a promotion should apply to either a LEARNER or a NON_VOTER just the same. I don't see the need to error out if we think we were promoting a LEARNER and instead we're promoting a NON_VOTER, or the other way around. No?

The key difference here is that the promotion of a learner happens implicitly in the same ConfChange as the one that enters the raft group into the joint membership state. This is clearly very different from what we're doing with NON_VOTER promotions. If we think we're promoting a NON_VOTER but we're in fact promoting a LEARNER, we're likely operating with a stale range descriptor (or something went seriously wrong with our bookkeeping) and we probably don't want to allow this ConfChange through.

If you don't find this line of argument compelling I'll just use the same internalChangeType for both promotions.


pkg/kv/kvserver/replica_command.go, line 1578 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Generalize as in make the logic a function of both the internalChangeType and the prevTyp? Doing so seems a little weird if we can't also merge internalChangeTypeDemoteVoterToLearner and internalChangeTypeDemoteVoterToNonVoter.

Nope, I meant that we could just use one internalChangeTypePromote rather than two separate ones for nonvoters and learners.

@aayushshah15 aayushshah15 force-pushed the nonvoter-promo-demo branch 2 times, most recently from 3d654cb to 09ce463 Compare January 18, 2021 11:40
Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)


pkg/kv/kvserver/client_replica_test.go, line 2374 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/should//

Better to assert that, btw.

Done


pkg/kv/kvserver/client_replica_test.go, line 2390 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I don't really understand these descriptor checks. Shouldn't they be inside tc.SwapVoterWithNonVoter() ?

Can you say why? We don't really do this in any of the other utility methods in TestCluster. All of these run their respective commands and return an error if the command failed.

The descriptor checks exist to ensure that, if the AdminChangeReplicas request didn't fail, the swap actually happened.


pkg/kv/kvserver/replica_command.go, line 988 at r2 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

In addition to what Nathan said, I think our discussion from yesterday about "trying to swap a non-voter with voter before trying to add a new voter" leads to the correct order being something like:

promotions/demotions/swaps -> voter additions -> voter removals -> non-voter additions -> non-voter removals.

Does that seem reasonable?

Re-ordered the changes the way I described above.


pkg/kv/kvserver/replica_command.go, line 996 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I've got no idea what's going on here. I think maybeExecNonVoterPromotionsAndDemotions() should modify chgs and take out whatever it's done.

I thought it was weird to have that function perform this grouping of replication changes so I separated the logic that groups replication changes into its own function. I think this has made changeReplicasImpl a lot cleaner. LMK if you don't like it.


pkg/kv/kvserver/replica_command.go, line 1088 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Agreed.

Changed the name.


pkg/kv/kvserver/replica_command.go, line 1095 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

+1

Done.


pkg/kv/kvserver/replica_command.go, line 1096 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: I would s/of/to/ here in both places to give some indication of direction.

Done.


pkg/kv/kvserver/replica_command.go, line 1221 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Are we missing some new cases in TestValidateReplicationChanges?

Added some tests.


pkg/kv/kvserver/replica_command.go, line 1271 at r2 (raw file):
Turned that part into a switch statement. This method has gotten really confusing and is due for a re-write.

whether an ADD_NON_VOTER or an ADD_VOTER should be accepted for a VOTER_DEMOTING_NON_VOTER.

I was confused about this too, and my last revision had introduced a bit of a bug here. Essentially, we must reject everything that enters that switch statement, since we're handling the add-remove case above before this loop. The only changes that will enter this switch block are simple additions (without accompanying removals) on stores that already have a replica (which we want to reject under all circumstances). For posterity's sake, I've added a comment above that should make it harder to miss.


pkg/kv/kvserver/replica_command.go, line 1398 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Is this comment still accurate? Non-voters don't seem to make it here.

Updated the method name and the comment. LMK if you don't like it.


pkg/kv/kvserver/replica_command.go, line 1411 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

was this change particularly motivated by the rest of the patch, or is it generally a good idea in order to clarify that this function only deals with voters?

Both. After this latest revision, all such methods called from changeReplicasImpl take precisely the replication targets related to the changes they deal with.


pkg/kv/kvserver/replica_command.go, line 1660 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

long line

Fixed.


pkg/kv/kvserver/replica_command.go, line 2800 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This might be a case where an n^2 algorithm is actually better. Creating and seeding a map is expensive if we're just going to throw it away, and the size of these slices should be tiny. I'd imagine the crossover point would be somewhere around 50 elements in these slices.

Done.


pkg/roachpb/metadata.proto, line 46 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…
In normal operation, VOTER_FULL and LEARNER are the only used states.

Is it time to update this with NON_VOTER?

Done.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 12 of 12 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)


pkg/kv/kvserver/replica_command.go, line 988 at r2 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

Re-ordered the changes the way I described above.

This all looks very good now. Could we add some commentary about why we execute things in this order though?


pkg/kv/kvserver/replica_command.go, line 1090 at r3 (raw file):

// demotions of voters into non-voters. The rest of the changes are handled
// distinctly and are thus segregated in the return result.
func synthesizeTargetsByChangeType(

This seems like a great function to wrap a unit test around 😉


pkg/kv/kvserver/replica_command.go, line 1128 at r3 (raw file):

	voterDemotions, nonVoterPromotions []roachpb.ReplicationTarget,
) (*roachpb.RangeDescriptor, error) {
	swaps := getInternalChangesForExplicitPromotionsAndDemotions(voterDemotions, nonVoterPromotions)

Should this be lifted outside of this function? We seem to perform the if len(...) > 0 { check in changeReplicasImpl for all other operations.


pkg/roachpb/metadata.proto, line 46 at r2 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

Done.

Missing a comma after NON_VOTER.


pkg/testutils/serverutils/test_cluster_shim.go, line 107 at r3 (raw file):

	// SwapVoterWithNonVoter atomically swaps the voting replica located on
	// `voterTarget` with the non-voting replica located on `nonVoterTarget`.

"swaps" needs a bit of qualification. It might not be particularly obvious what it means in a few months when we forget about all of this. I'd add to the comment what exactly is swapping.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:LGTM:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)


pkg/kv/kvserver/client_replica_test.go, line 2390 at r2 (raw file):

Can you say why? We don't really do this in any of the other utility methods in TestCluster. All of these run their respective commands and return an error if the command failed.

I guess in return I'd ask you for precedent for a test not trusting a TestCluster method to do what it said it'd do, and double-checking it.

The descriptor checks exist to ensure that, if the AdminChangeReplicas request didn't fail, the swap actually happened.

In other words, the checks ensure that, if SwapVoterWithNonVoter didn't fail, it did what it was supposed to do? Should every caller of the function do these checks?

Maybe I'm misunderstanding what this whole test is supposed to be. Is it testing production code, or is it testing SwapVoterWithNonVoter? These checks makes it seem like it's the latter - but in that case the test should be in a testcluster_test.go.
If it's the former, thenSwapVoterWithNonVoter is not a function under test - it's part of the test. So then if it wants any checks, it should do the checks itself.

I would move the checks into SwapVoterWithNonVoter (making it return errors, not calling t.Fatal), and then leave this test to be very small. It'd make sense - this test would check that SwapVoterWithNonVoter doesn't error out in a specific set of circumstances.


pkg/kv/kvserver/client_replica_test.go, line 2394 at r3 (raw file):

	desc, err := tc.SwapVoterWithNonVoter(key, r2, r3)
	require.NoError(t, err)
	replDesc, ok := desc.GetReplicaDescriptor(r2.StoreID)

name replDesc r2Desc and r3Desc, for clarity


pkg/kv/kvserver/replica_command.go, line 996 at r2 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

I thought it was weird to have that function perform this grouping of replication changes so I separated the logic that groups replication changes into its own function. I think this has made changeReplicasImpl a lot cleaner. LMK if you don't like it.

I liked it


pkg/kv/kvserver/replica_command.go, line 1575 at r2 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

The key difference here is that the promotion of a learner happens implicitly in the same ConfChange as the one that enters the raft group into the joint membership state. This is clearly very different from what we're doing with NON_VOTER promotions. If we think we're promoting a NON_VOTER but we're in fact promoting a LEARNER, we're likely operating with a stale range descriptor (or something went seriously wrong with our bookkeeping) and we probably don't want to allow this ConfChange through.

If you don't find this line of argument compelling I'll just use the same internalChangeType for both promotions.

sounds good
Consider leaving a comment explaining that these two are quite similar.


pkg/kv/kvserver/replica_command.go, line 1129 at r3 (raw file):

) (*roachpb.RangeDescriptor, error) {
	swaps := getInternalChangesForExplicitPromotionsAndDemotions(voterDemotions, nonVoterPromotions)
	if len(swaps) > 0 {

nit: invert the condition, return early, unindent the rest. And then you can also directly return execChangeReplicasTxn(...).


pkg/kv/kvserver/replica_command.go, line 1637 at r3 (raw file):

	// NB: demotions require joint consensus because of limitations in etcd/raft.
	// These could be lifted, but it doesn't seem worth it.
	isDemotion := c[0].typ == internalChangeTypeDemoteVoterToNonVoter || c[0].typ == internalChangeTypeDemoteVoterToLearner

long line :P


pkg/kv/kvserver/replica_command.go, line 1735 at r3 (raw file):

					// NB: this won't fire because cc.useJoint() is always true when
					// there's a demotion. This is just a sanity check.
					return nil, errors.Errorf("demotions require joint consensus")

make it an AssertionFailedError.


pkg/testutils/testcluster/testcluster.go, line 748 at r3 (raw file):

func (tc *TestCluster) SwapVoterWithNonVoter(
	startKey roachpb.Key, voterTarget, nonVoterTarget roachpb.ReplicationTarget,
) (roachpb.RangeDescriptor, error) {

consider returning the descriptor by pointer; that's generally how RangeDescriptors are passed

@craig craig bot merged commit eab1ae3 into cockroachdb:master Feb 22, 2021
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Mar 26, 2021
voters to non-voters

This commit updates `AdminRelocateRange` to use explicit atomic swaps of
voting replicas with non-voting replicas, that cockroachdb#58627 initially added
support for. The patch does so by generalizing behavior that's already
exercised by the `replicateQueue` when it decides to rebalance replicas.
See cockroachdb#61239.

This allows us, in the next commit, to remove bespoke relocation logic
that's used by the `mergeQueue` to align replica sets for the sake of a
range merge.

Release note: None
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Mar 26, 2021
This commit removes the relocation logic used by the `mergeQueue` thus
far to align replica sets (added in cockroachdb#56197). This logic previously
existed in order to allow us to align the replica sets of a pair of
ranges (which is required for the range merge to proceed), while
avoiding redundant data movement.

Before cockroachdb#58627 and the previous commit in this PR, `AdminRelocateRange`
couldn't be directly used by the mergeQueue under various configurations
of the LHS and RHS ranges. Furthermore, even when it could be used, it
would involve redundant data movement. This all required us to compute
relocation targets for the RHS of a merge separately, above the call to
`AdminRelocateRange`, for the range merge to proceed.

All these limitations have been resolved by the previous commit which
teaches `AdminRelocateRange` to promote non-voters and demote voters
when needed, and the aforementioned bespoke relocation logic is no
longer needed.

Release note: None
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Mar 28, 2021
voters to non-voters

This commit updates `AdminRelocateRange` to use explicit atomic swaps of
voting replicas with non-voting replicas, that cockroachdb#58627 initially added
support for. The patch does so by generalizing behavior that's already
exercised by the `replicateQueue` when it decides to rebalance replicas.
See cockroachdb#61239.

This allows us, in the next commit, to remove bespoke relocation logic
that's used by the `mergeQueue` to align replica sets for the sake of a
range merge.

Release note: None
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Mar 28, 2021
This commit removes the relocation logic used by the `mergeQueue` thus
far to align replica sets (added in cockroachdb#56197). This logic previously
existed in order to allow us to align the replica sets of a pair of
ranges (which is required for the range merge to proceed), while
avoiding redundant data movement.

Before cockroachdb#58627 and the previous commit in this PR, `AdminRelocateRange`
couldn't be directly used by the mergeQueue under various configurations
of the LHS and RHS ranges. Furthermore, even when it could be used, it
would involve redundant data movement. This all required us to compute
relocation targets for the RHS of a merge separately, above the call to
`AdminRelocateRange`, for the range merge to proceed.

All these limitations have been resolved by the previous commit which
teaches `AdminRelocateRange` to promote non-voters and demote voters
when needed, and the aforementioned bespoke relocation logic is no
longer needed.

Release note: None
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Mar 29, 2021
voters to non-voters

This commit updates `AdminRelocateRange` to use explicit atomic swaps of
voting replicas with non-voting replicas, that cockroachdb#58627 initially added
support for. The patch does so by generalizing behavior that's already
exercised by the `replicateQueue` when it decides to rebalance replicas.
See cockroachdb#61239.

This allows us, in the next commit, to remove bespoke relocation logic
that's used by the `mergeQueue` to align replica sets for the sake of a
range merge.

Release note: None
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Mar 29, 2021
This commit removes the relocation logic used by the `mergeQueue` thus
far to align replica sets (added in cockroachdb#56197). This logic previously
existed in order to allow us to align the replica sets of a pair of
ranges (which is required for the range merge to proceed), while
avoiding redundant data movement.

Before cockroachdb#58627 and the previous commit in this PR, `AdminRelocateRange`
couldn't be directly used by the mergeQueue under various configurations
of the LHS and RHS ranges. Furthermore, even when it could be used, it
would involve redundant data movement. This all required us to compute
relocation targets for the RHS of a merge separately, above the call to
`AdminRelocateRange`, for the range merge to proceed.

All these limitations have been resolved by the previous commit which
teaches `AdminRelocateRange` to promote non-voters and demote voters
when needed, and the aforementioned bespoke relocation logic is no
longer needed.

Release note: None
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Mar 29, 2021
voters to non-voters

This commit updates `AdminRelocateRange` to use explicit atomic swaps of
voting replicas with non-voting replicas, that cockroachdb#58627 initially added
support for. The patch does so by generalizing behavior that's already
exercised by the `replicateQueue` when it decides to rebalance replicas.
See cockroachdb#61239.

This allows us, in the next commit, to remove bespoke relocation logic
that's used by the `mergeQueue` to align replica sets for the sake of a
range merge.

Release note: None
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Mar 29, 2021
This commit removes the relocation logic used by the `mergeQueue` thus
far to align replica sets (added in cockroachdb#56197). This logic previously
existed in order to allow us to align the replica sets of a pair of
ranges (which is required for the range merge to proceed), while
avoiding redundant data movement.

Before cockroachdb#58627 and the previous commit in this PR, `AdminRelocateRange`
couldn't be directly used by the mergeQueue under various configurations
of the LHS and RHS ranges. Furthermore, even when it could be used, it
would involve redundant data movement. This all required us to compute
relocation targets for the RHS of a merge separately, above the call to
`AdminRelocateRange`, for the range merge to proceed.

All these limitations have been resolved by the previous commit which
teaches `AdminRelocateRange` to promote non-voters and demote voters
when needed, and the aforementioned bespoke relocation logic is no
longer needed.

Release note: None
craig bot pushed a commit that referenced this pull request Mar 29, 2021
62631: kvserver: get rid of bespoke relocation logic used by the mergeQueue r=aayushshah15 a=aayushshah15

This PR contains two main commits:

**kvserver: update `AdminRelocateRange` to leverage explicit swaps of
voters to non-voters**

This commit updates `AdminRelocateRange` to use explicit atomic swaps of
voting replicas with non-voting replicas, that #58627 initially added
support for. The patch does so by generalizing behavior that's already
exercised by the `replicateQueue` when it decides to rebalance replicas.
See #61239.

This allows us, in the next commit, to remove bespoke relocation logic
that's used by the `mergeQueue` to align replica sets for the sake of a
range merge.

Release note: None

**kvserver: get rid of bespoke relocation logic used by the mergeQueue**

This commit removes the relocation logic used by the `mergeQueue` thus
far to align replica sets (added in #56197). This logic previously
existed in order to allow us to align the replica sets of a pair of
ranges (which is required for the range merge to proceed), while
avoiding redundant data movement.

Before #58627 and the previous commit in this PR, `AdminRelocateRange`
couldn't be directly used by the mergeQueue under various configurations
of the LHS and RHS ranges. Furthermore, even when it could be used, it
would involve redundant data movement. This all required us to compute
relocation targets for the RHS of a merge separately, above the call to
`AdminRelocateRange`, for the range merge to proceed.

All these limitations have been resolved by the previous commit which
teaches `AdminRelocateRange` to promote non-voters and demote voters
when needed, and the aforementioned bespoke relocation logic is no
longer needed.

Resolves #62370

Release note: None


Co-authored-by: aayush <aayush@pop-os.localdomain>
Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Mar 29, 2021
voters to non-voters

This commit updates `AdminRelocateRange` to use explicit atomic swaps of
voting replicas with non-voting replicas, that cockroachdb#58627 initially added
support for. The patch does so by generalizing behavior that's already
exercised by the `replicateQueue` when it decides to rebalance replicas.
See cockroachdb#61239.

This allows us, in the next commit, to remove bespoke relocation logic
that's used by the `mergeQueue` to align replica sets for the sake of a
range merge.

Release note: None
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Mar 29, 2021
This commit removes the relocation logic used by the `mergeQueue` thus
far to align replica sets (added in cockroachdb#56197). This logic previously
existed in order to allow us to align the replica sets of a pair of
ranges (which is required for the range merge to proceed), while
avoiding redundant data movement.

Before cockroachdb#58627 and the previous commit in this PR, `AdminRelocateRange`
couldn't be directly used by the mergeQueue under various configurations
of the LHS and RHS ranges. Furthermore, even when it could be used, it
would involve redundant data movement. This all required us to compute
relocation targets for the RHS of a merge separately, above the call to
`AdminRelocateRange`, for the range merge to proceed.

All these limitations have been resolved by the previous commit which
teaches `AdminRelocateRange` to promote non-voters and demote voters
when needed, and the aforementioned bespoke relocation logic is no
longer needed.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-multiregion Related to multi-region
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kvserver: support atomic promotions, demotions and swaps of NON_VOTERs and VOTERs
5 participants