
kv: don't unquiesce uninitialized replicas #73362

Merged 5 commits into cockroachdb:master on Dec 15, 2021

Conversation


@nvanbenschoten nvanbenschoten commented Dec 2, 2021

In a support issue, we saw that 10s of thousands of uninitialized replicas were being ticked regularly and creating a large amount of background work on a node, driving up CPU. This commit updates the Raft quiescence logic to disallow uninitialized replicas from being unquiesced and Tick()'ing themselves.

Keeping uninitialized replicas quiesced even in the presence of Raft traffic avoids wasted work. We could Tick() these replicas, but doing so is unnecessary because uninitialized replicas can never win elections, so there is no reason for them to ever call an election. In fact, uninitialized replicas do not even know who their peers are, so there would be no way for them to call an election or for them to send any other non-reactive message. As a result, all work performed by an uninitialized replica is reactive and in response to incoming messages (see processRequestQueue).

There are multiple ways for an uninitialized replica to be created and then abandoned, and we don't do a good job garbage collecting them at a later point (see #73424), so it is important that they are cheap. Keeping them quiesced instead of letting them unquiesce and tick every 200ms indefinitely avoids a meaningful amount of periodic work for each uninitialized replica.

Release notes (bug fix): uninitialized replicas that are abandoned after an unsuccessful snapshot no longer perform periodic background work, so they no longer have a non-negligible cost.
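To make the quiescence rule above concrete, here is a minimal, self-contained Go sketch. The type and field names are hypothetical simplifications, not the real kvserver `Replica`; it only illustrates the gate the PR adds.

```go
package main

import "fmt"

// replica is a toy stand-in for kvserver's Replica; the fields are
// hypothetical simplifications used only to illustrate the rule.
type replica struct {
	initialized bool // has the replica applied a snapshot / range descriptor?
	quiescent   bool // replicas are constructed quiescent
}

// maybeUnquiesce sketches the PR's rule: an uninitialized replica can never
// win an election or send a non-reactive message, so it stays quiesced even
// in the presence of Raft traffic. Returns whether the replica unquiesced.
func (r *replica) maybeUnquiesce() bool {
	if !r.initialized || !r.quiescent {
		return false
	}
	r.quiescent = false // now eligible for periodic Tick()s
	return true
}

func main() {
	uninit := &replica{initialized: false, quiescent: true}
	fmt.Println(uninit.maybeUnquiesce()) // false: stays quiesced, no tick work

	init := &replica{initialized: true, quiescent: true}
	fmt.Println(init.maybeUnquiesce()) // true: unquiesced, will be ticked
}
```

With this gate in place, an abandoned uninitialized replica never re-enters the tick loop, which is exactly the periodic work the PR is eliminating.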

@nvanbenschoten nvanbenschoten requested a review from tbg December 2, 2021 00:31

@nvanbenschoten

Perhaps it is important that the uninitialized replica calls an election, if only so that it gets told to clean itself up if it has been abandoned. Except that this might be broken, because uninitialized replicas are learners, so they aren't even promotable and so they won't call elections. I think I'm missing something here.

I think this is incorrect in two ways that offset each other. Uninitialized replicas don't know that they are learners, because they don't have access to the range descriptor. This means that they aren't disqualified from being "promotable" on that basis, at least from their own perspective. However, uninitialized replicas don't know the other members of their range and don't even track themselves in their Progress map, so they still aren't "promotable", and even if they were and decided to call an election, they wouldn't know who to talk to. So ticking an uninitialized replica doesn't do anything.

This also means that leader elections cannot be used to trigger the cleanup of uninitialized replicas.
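As a toy illustration of why ticking is a no-op here: in etcd/raft, a node may only campaign if it is "promotable", and one necessary condition is that it tracks itself in its Progress map. The sketch below models only that one condition (the real check has additional clauses), with simplified types.

```go
package main

import "fmt"

// promotable models just one necessary condition of etcd/raft's campaign
// check: the node must appear in its own Progress map. An uninitialized
// replica has an empty config, so the map is empty and it can never campaign,
// regardless of whether it would otherwise count as a learner.
func promotable(progress map[uint64]bool, self uint64) bool {
	return progress[self]
}

func main() {
	// Uninitialized replica: empty config, doesn't even track itself.
	fmt.Println(promotable(map[uint64]bool{}, 1)) // false

	// Initialized voter: tracks itself and its peers.
	fmt.Println(promotable(map[uint64]bool{1: true, 2: true, 3: true}, 1)) // true
}
```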

@tbg's new state machine diagram of replica initialization also hints at there being no path for an uninitialized replica to be removed without going through an initialized replica.

[image: state machine diagram of replica initialization]

@nvanbenschoten

#47982 looks relevant. I guess that confirms the suspicion.

So I guess the next steps here are:

  1. polishing this PR off (i.e. with a test), now that we're more confident that ticking uninitialized replicas is entirely useless. This makes uninitialized replicas cheaper and less disruptive.
  2. landing kvserver: garbage collect uninitialized replica eventually #47982 so that we will eventually (after ~1h) GC these abandoned uninitialized replicas.

@ajwerner do you have interest in pushing that PR over the finish line?


tbg commented Dec 2, 2021

Let's keep Andrew out of it, I'm sure he has enough to do.

I might be missing something, but isn't GC'ing uninitialized replicas difficult? We would need an index RangeID->RangeDesc which we don't have (meta ranges are keyed on EndKey, not RangeID) or do full meta scans every now and then.
We can't "just" remove uninitialized replicas on a hunch without having proved that they are no longer a member of the range as this amounts to reducing availability of the range in a way that is more permanent than having a replica wait for a snapshot (removing a voter and preventing it from being recreated). We can remove non-voters (and expect that to be the common case) but to confirm that we need to see the descriptor, too.

Also worth pausing to think about the benefit of replicaGC'ing uninit'ed replicas once we don't tick them. Once we no longer tick them, other than a sense of closure the main upshot is removing them from the in-memory maps. The disk state is negligibly different, RangeTombstone key vs HardState key.


tbg commented Dec 2, 2021

We can definitely replicaGC uninitialized replicas that have a trivial HardState, i.e. in particular Vote is unset. That should be the majority of cases. But what if Vote isn't unset? We know this uninitialized replica was a voter and we strongly suspect that it's obsolete, but there could also be permanent problems getting a snapshot there and losing this voter might lose quorum in a pretty permanent way.

I lean towards doing a ~daily scan through the meta ranges per node and performing GC of uninitialized replicas based on that, since that is simple and effective and handles the problem fully.
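A rough sketch of what such a scan-based GC pass could look like. All types and names here are hypothetical; in the real system the descriptors live in the meta ranges, keyed on EndKey, and the scan would be a KV operation rather than an in-memory walk.

```go
package main

import "fmt"

// Hypothetical simplified types for illustration only.
type replicaID struct {
	rangeID int64
	storeID int64
}

type rangeDesc struct {
	rangeID int64
	members []int64 // store IDs that are members of the range
}

// gcCandidates sketches the proposed ~daily pass: walk every descriptor from
// a meta scan once, then return the local uninitialized replicas that either
// belong to a range that no longer exists (merged away) or are no longer
// listed as members. Those are safe to GC; everything else might still be
// waiting on a snapshot, so removing it could reduce availability.
func gcCandidates(localStoreID int64, uninit []replicaID, metaScan []rangeDesc) []replicaID {
	exists := make(map[int64]bool) // rangeID -> range still exists
	member := make(map[int64]bool) // rangeID -> local store is a member
	for _, d := range metaScan {
		exists[d.rangeID] = true
		for _, s := range d.members {
			if s == localStoreID {
				member[d.rangeID] = true
			}
		}
	}
	var out []replicaID
	for _, r := range uninit {
		if !exists[r.rangeID] || !member[r.rangeID] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	uninit := []replicaID{
		{rangeID: 1, storeID: 2}, // still a member: keep waiting for snapshot
		{rangeID: 7, storeID: 2}, // range merged away: safe to GC
	}
	scan := []rangeDesc{{rangeID: 1, members: []int64{1, 2, 3}}}
	fmt.Println(gcCandidates(2, uninit, scan))
}
```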


@tbg tbg left a comment


Reviewed 4 of 4 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


-- commits, line 28 at r2:
It's by design in the sense that we can't replicaGC them for the reasons discussed in the main thread. We definitely do want to be able to replicaGC them; it just hasn't been a priority.


-- commits, line 31 at r2:
Correct, which might never happen, for example if the range doesn't exist any more (merge).


-- commits, line 35 at r2:
nit: they aren't always learners. You can also get uninit'ed voters with a quick split-rebalance succession. But the uninitialized replica doesn't even know what it is, it will have an empty config, see here:

// For uninitialized ranges, membership is unknown at this point.
if raft.IsEmptyHardState(hs) || err != nil {
	return raftpb.HardState{}, raftpb.ConfState{}, err
}

So they certainly will not call an election, ever, since the progress map is empty. They also can't send a message to anyone in general, since that would require them to at least know about a peer, but these would have to be in the configuration. The only thing they can do is respond to incoming messages.

This indicates that by not ticking uninit'ed replicas, we're not regressing on their ability to "commonly" replicaGC themselves in response to a rejection of one of their outgoing messages, as this basically never happens in the first place anyway.


pkg/kv/kvserver/store_raft.go, line 638 at r2 (raw file):

			// Filter out uninitialized replicas. We could tick them, but doing so is
			// unnecessary because uninitialized replicas can never win elections, so

Might want to update this to reflect our finding that uninited replicas cannot even send non-reactive messages since they don't know who the peers are.


pkg/kv/kvserver/store_raft.go, line 647 at r2 (raw file):

			// them through the Raft scheduler and ticking them avoids a meaningful
			// amount of periodic work for each uninitialized replica.
			s.mu.RLock()

Reading this makes me wonder why we allow uninited replicas to be unquiesced. Not suggesting you change that, but maybe add a comment saying that it's just the way it is and there is no good reason and we don't like lots of things about uninitialized replicas and see #72374.


@nvanbenschoten nvanbenschoten left a comment


isn't GC'ing uninitialized replicas difficult? We would need an index RangeID->RangeDesc which we don't have (meta ranges are keyed on EndKey, not RangeID) or do full meta scans every now and then.

This makes sense to me, thanks for explaining. This must mean that the approach in #47982 is broken. It seems to perform its consistent meta lookup with the replica's in-memory RangeDescriptor here. But if the replica is uninitialized, this descriptor will be empty. So it won't actually look up the right range.

Also worth pausing to think about the benefit of replicaGC'ing uninit'ed replicas once we don't tick them. Once we no longer tick them, other than a sense of closure the main upshot is removing them from the in-memory maps. The disk state is negligibly different, RangeTombstone key vs HardState key.

👍 that matches my understanding as well.

I lean towards doing a ~daily scan through the meta ranges per node and performing GC of uninitialized replicas based on that, since that is simple and effective and handles the problem fully.

This seems nice and simple.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


pkg/kv/kvserver/store_raft.go, line 647 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Reading this makes me wonder why we allow uninited replicas to be unquiesced. Not suggesting you change that, but maybe add a comment saying that it's just the way it is and there is no good reason and we don't like lots of things about uninitialized replicas and see #72374.

That's a really interesting idea. If we allowed uninited replicas to quiesce (on their own, with no leader input) then we wouldn't need any of this special handling. These replicas would be removed from the unquiescedReplicas map and stay out of it until they heard from a peer. I'll explore that alternate approach, as it seems pretty clean and conceptually consistent.

@erikgrinaker

Thanks for all your work on this @nvanbenschoten! Wrote up an issue to handle GC in #73424.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tickUninit branch from 6a7631b to 32b96f4 on December 6, 2021 02:21
@nvanbenschoten nvanbenschoten changed the title from "kv: don't tick uninitialized replicas" to "kv: don't unquiesce uninitialized replicas" on Dec 6, 2021
@nvanbenschoten

Thanks for opening the issue @erikgrinaker! I updated the code comment to point to that issue.

@tbg I rewrote this PR according to your suggestion. Instead of letting uninitialized replicas unquiesce but then filtering them out in the tick loop, we now no longer let uninitialized replicas unquiesce. This seems cleaner to me, as it introduces a reasonable relationship between state machine initialization and quiescence and uses the existing quiescence logic to prevent these replicas from ticking, instead of introducing an additional orthogonal condition for ticking. PTAL.

@nvanbenschoten nvanbenschoten marked this pull request as ready for review December 6, 2021 02:26
@nvanbenschoten nvanbenschoten requested a review from a team as a code owner December 6, 2021 02:26
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tickUninit branch from 32b96f4 to 6b66392 on December 6, 2021 19:16
@tbg tbg self-requested a review December 7, 2021 11:02

@tbg tbg left a comment


:lgtm_strong:

Reviewed 2 of 4 files at r4, 9 of 9 files at r9, 4 of 4 files at r10, 4 of 4 files at r11, 4 of 4 files at r12, 7 of 7 files at r13, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/client_raft_test.go, line 4105 at r13 (raw file):

	defer tc.Stopper().Stop(ctx)

	_, desc, err := tc.Servers[0].ScratchRangeEx()

Add a comment about using an expiration-based lease or, my preference if possible, switch to ScratchRange.


pkg/kv/kvserver/client_raft_test.go, line 4110 at r13 (raw file):

	require.NoError(t, tc.WaitForSplitAndInitialization(key))

	// Block a snapshot.

// Block incoming snapshots on s2 until channel is signaled.


pkg/kv/kvserver/client_raft_test.go, line 4114 at r13 (raw file):

	handlerFuncs := noopRaftHandlerFuncs()
	handlerFuncs.snapErr = func(header *kvserver.SnapshotRequest_Header) error {
		<-blockSnapshot

give up on stopper.ShouldQuiesce()?


pkg/kv/kvserver/client_raft_test.go, line 4117 at r13 (raw file):

		return nil
	}
	s1, err := tc.Server(1).GetStores().(*kvserver.Stores).GetStore(tc.Server(1).GetFirstStoreID())

s2? s1 is on tc.Server(0). Indexing, always a joy.


pkg/kv/kvserver/client_raft_test.go, line 4125 at r13 (raw file):

	})

	// Try to up-replicate to s1. Should block on a learner snapshot after the new

s2 here and below?


pkg/kv/kvserver/client_raft_test.go, line 4130 at r13 (raw file):

	// receiving Raft traffic from the leader.
	replicateErrChan := make(chan error)
	go func() {

nit: I know it doesn't matter here but if we have a stopper might as well use it.


pkg/kv/kvserver/replica.go, line 306 at r13 (raw file):

		//
		// Replica objects always begin life in a quiescent state, as the field is
		// set to true in the Replica constructor newUnloadedReplica. They unquiesce

Ohh, interesting. I had no idea. That explains your earlier commit. I take back my comment about || internalRaftGroup == nil then.


pkg/kv/kvserver/replica.go, line 310 at r13 (raw file):

		// unquiesceWithOptionsLocked, which are called in response to Raft traffic.
		//
		// Only initialized replicas that have a non-nil internalRaftGroup are

This makes it sound like there are initialized replicas with a nil raft group, which raises the question of how these are allowed to unquiesce (since they are not according to this comment). So either init'ed replicas with nil group don't exist, or whether the group is nil or not doesn't matter for purposes of unquiescing?


pkg/kv/kvserver/replica_metrics.go, line 62 at r11 (raw file):

	raftStatus := r.raftStatusRLocked()
	leaseStatus := r.leaseStatusAtRLocked(ctx, now)
	quiescent := r.mu.quiescent || r.mu.internalRaftGroup == nil

Interesting. Have I missed a commit? Because, wouldn't an uninitialized replica start out with quiescent = false now when it was previously true? As is, this seems like a behavior change to me. Not a terribly important one (since this is only for ReplicaMetrics and internalRaftGroup is populated anyway within ~10 minutes (scan interval)) but still, this doesn't quite check out with the commit message.


pkg/kv/kvserver/replica_raft_quiesce.go, line 52 at r13 (raw file):

}

func (r *Replica) unquiesceWithOptionsLocked(campaignOnWake bool) {

maybeUnquiesceWithOptionsLocked to indicate that it may not be possible?


pkg/kv/kvserver/replica_raft_quiesce.go, line 116 at r13 (raw file):

		// lazily in response to any Raft traffic (see stepRaftGroup) or KV request
		// traffic (see maybeInitializeRaftGroup). If it has yet to be initialized,
		// let it remain quiesced. The Raft group will be initialized soon enough.

This is pretty annoying; we morally say that a replica unquiesces when it needs to use raft, but then we also say this isn't true, you need some folks bypassing this check and creating the group first? This is pretty complex and I'm not at all clear on why this works out in practice (I assume we have the right calls in the actual critical path).

I think we should initialize the raft group right away. Is now perhaps the right time to make that change? Doing this lazily was a half-baked way to avoid election storms back when this was still a problem and possibly before we even had quiescence. Now that quiescence is the default, what's the benefit of the lazy raft group?

I understand that we're looking to backport this, so maybe a fast-follow on master only?


pkg/kv/kvserver/store_create_replica.go, line 351 at r13 (raw file):

	// quiesced, so we don't need to campaign or wake the leader. We just want
	// to start ticking.
	lockedRepl.unquiesceWithOptionsLocked(false /* campaignOnWake */)

Beating the same drum as before, but now we have to know that the internal raft group is no longer nil (probably because we hit withReplicaForRequest?)

Did you have to add this line in response to any failures? It makes sense, but shouldn't be necessary, right?


@nvanbenschoten nvanbenschoten left a comment


TFTR! I finally got around to addressing your comments.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)


pkg/kv/kvserver/client_raft_test.go, line 4105 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Add a comment about using an expiration-based lease or, my preference if possible, switch to ScratchRange.

ScratchRangeEx isn't an expiration-based lease, it's the "extended" version of ScratchRange that returns the left and right descriptor instead of a start key. This made more sense to me than the logic I see in some places where we ScratchRange and then take the returned start key and look up the corresponding descriptor that we just dropped in ScratchRange.


pkg/kv/kvserver/client_raft_test.go, line 4110 at r13 (raw file):

// Block incoming snapshots on s2 until channel is signaled.
Done.


pkg/kv/kvserver/client_raft_test.go, line 4114 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

give up on stopper.ShouldQuiesce()?

Done.


pkg/kv/kvserver/client_raft_test.go, line 4117 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

s2? s1 is on tc.Server(0). Indexing, always a joy.

Done.


pkg/kv/kvserver/client_raft_test.go, line 4125 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

s2 here and below?

Done.


pkg/kv/kvserver/client_raft_test.go, line 4130 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

nit: I know it doesn't matter here but if we have a stopper might as well use it.

Done.


pkg/kv/kvserver/replica.go, line 306 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Ohh, interesting. I had no idea. That explains your earlier commit. I take back my comment about || internalRaftGroup == nil then.

Me neither until writing this up. But I think it makes sense for things to work like that.


pkg/kv/kvserver/replica.go, line 310 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This makes it sound like there are initialized replicas with a nil raft group, which raises the question of how these are allowed to unquiesce (since they are not according to this comment). So either init'ed replicas with nil group don't exist, or whether the group is nil or not doesn't matter for purposes of unquiescing?

The dependency is the other way around. There are initialized replicas with a nil raft group. They unquiesce by first initializing their raft group in withRaftGroupLocked and then trying to unquiesce. But let's get rid of this craziness in #73715.


pkg/kv/kvserver/replica_metrics.go, line 62 at r11 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Interesting. Have I missed a commit? Because, wouldn't an uninitialized replica start out with quiescent = false now when it was previously true? As is, this seems like a behavior change to me. Not a terribly important one (since this is only for ReplicaMetrics and internalRaftGroup is populated anyway within ~10 minutes (scan interval)) but still, this doesn't quite check out with the commit message.

I think this comment came before your "I take back my comment", but in case it wasn't clear, all replicas (initialized or uninitialized) start with quiescent = true. They can then only unquiesce if r.mu.internalRaftGroup != nil, which has always been the case and is not changing in this PR. So !r.mu.quiescent implies r.mu.internalRaftGroup != nil, meaning that this was redundant.
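The invariant described above — a replica starts quiescent, and the only path that clears the flag requires a non-nil raft group, so `!quiescent` implies `internalRaftGroup != nil` — can be sketched with toy types (names are simplifications, not the real kvserver fields):

```go
package main

import "fmt"

// Toy model of the invariant; hypothetical simplified types.
type raftGroup struct{}

type replica struct {
	quiescent bool
	group     *raftGroup
}

// newReplica mirrors the constructor behavior: all replicas, initialized or
// not, begin life quiescent and without a raft group.
func newReplica() *replica {
	return &replica{quiescent: true}
}

// maybeUnquiesce is the only code path that clears the quiescent flag, and it
// refuses to do so without a raft group. That makes an extra
// "|| group == nil" check at read sites redundant.
func (r *replica) maybeUnquiesce() bool {
	if r.group == nil {
		return false
	}
	r.quiescent = false
	return true
}

func main() {
	r := newReplica()
	r.maybeUnquiesce()       // no-op: group is nil
	fmt.Println(r.quiescent) // true

	r.group = &raftGroup{} // lazily created, e.g. on incoming Raft traffic
	r.maybeUnquiesce()
	// Invariant holds: not quiescent implies a non-nil raft group.
	fmt.Println(!r.quiescent && r.group != nil) // true
}
```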


pkg/kv/kvserver/replica_raft_quiesce.go, line 52 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

maybeUnquiesceWithOptionsLocked to indicate that it may not be possible?

Done. I also added a return value that we can use for assertions in a few places. PTAL at the new commit.


pkg/kv/kvserver/replica_raft_quiesce.go, line 116 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This is pretty annoying; we morally say that a replica unquiesces when it needs to use raft, but then we also say this isn't true, you need some folks bypassing this check and creating the group first? This is pretty complex and I'm not at all clear on why this works out in practice (I assume we have the right calls in the actual critical path).

I think we should initialize the raft group right away. Is now perhaps the right time to make that change? Doing this lazily was a half-baked way to avoid election storms back when this was still a problem and possibly before we even had quiescence. Now that quiescence is the default, what's the benefit of the lazy raft group?

I understand that we're looking to backport this, so maybe a fast-follow on master only?

I think I see what you're saying. The idea would be that if quiescence is the default and we only unquiesce in all the same places that would otherwise initialize the raft group, then this lazy raft group creation isn't doing anything anymore, and so the dependency from internalRaftGroup != nil -> !quiescent is unnecessary complexity. I think I agree with that in theory, as it simplifies the states that a replica can be in significantly.

But I'd like to avoid expanding the scope of this PR any further, so a fast-follow sounds good. I opened #73715 to track the work.


pkg/kv/kvserver/store_create_replica.go, line 351 at r13 (raw file):

but now we have to know that the internal raft group is no longer nil (probably because we hit withReplicaForRequest?)

Yes, that's correct. If we made it here then it was in response to a raft message and so the internal raft group is no longer nil.

Did you have to add this line in response to any failures? It makes sense, but shouldn't be necessary, right?

No, this wasn't in response to any failures. In fact, it's not strictly necessary because this method is only ever called inside of handleRaftReadyRaftMuLocked in response to snapshots, which also calls withRaftGroupLocked with a true return value for unquiesceAndWakeLeader before completing. So if we ever initialize ourselves inside of handleRaftReadyRaftMuLocked and don't unquiesce before getting to the end, that will take care of unquiescing for us.

But this isn't entirely superfluous — there is a benefit to calling this here instead of waiting. That's because doing so allows us to call unquiesceWithOptionsLocked and to pass !campaignOnWake. So since we know we don't need to campaign or wake the leader, it's better to unquiesce now and prevent the other code path from doing so.

I added both of these points to the comment.


pkg/kv/kvserver/store_raft.go, line 638 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Might want to update this to reflect our finding that uninited replicas cannot even send non-reactive messages since they don't know who the peers are.

Done.


@tbg tbg left a comment


Reviewed 8 of 8 files at r14, 1 of 1 files at r15, 1 of 1 files at r16, 6 of 6 files at r17, 10 of 10 files at r18, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/store_create_replica.go, line 369 at r18 (raw file):

	// the range, so we can unquiesce without campaigning or waking the leader.
	if !lockedRepl.maybeUnquiesceWithOptionsLocked(false /* campaignOnWake */) {
		return errors.Errorf("expected replica %s to unquiesce after initialization", desc)

errors.AssertionFailedf? If we hit this the world will end, right? Can't imagine this will be handled gracefully by the callers.

This is a small refactor that eliminates a call to `HLC.Now` on each replica
tick. Due to the issue fixed by the following commit, we were ticking 1000s of
uninitialized replicas in fast succession and as a result, these clock accesses
were a major source of mutex contention. We shouldn't be ticking uninitialized
replicas, but since we now know that this is an expensive part of Replica.tick,
we might as well make it cheaper.

A replica cannot unquiesce if its internalRaftGroup is nil, so this
condition was redundant but confusing.

In a [support issue](cockroachlabs/support#1340), we
saw that 10s of thousands of uninitialized replicas were being ticked regularly
and creating a large amount of background work on a node, driving up CPU. This
commit updates the Raft quiescence logic to disallow uninitialized replicas from
being unquiesced and Tick()'ing themselves.

Keeping uninitialized replicas quiesced even in the presence of Raft traffic
avoids wasted work. We could Tick() these replicas, but doing so is unnecessary
because uninitialized replicas can never win elections, so there is no reason
for them to ever call an election. In fact, uninitialized replicas do not even
know who their peers are, so there would be no way for them to call an election
or for them to send any other non-reactive message. As a result, all work
performed by an uninitialized replica is reactive and in response to incoming
messages (see processRequestQueue).

There are multiple ways for an uninitialized replica to be created and
then abandoned, and we don't do a good job garbage collecting them at a
later point (see cockroachdb#73424),
so it is important that they are cheap. Keeping them quiesced instead of
letting them unquiesce and tick every 200ms indefinitely avoids a
meaningful amount of periodic work for each uninitialized replica.

Release notes (bug fix): uninitialized replicas that are abandoned after an
unsuccessful snapshot no longer perform periodic background work, so they no
longer have a non-negligible cost.

This commit renames the `unquiesceXYZ` methods to include a "maybe" prefix to
indicate that they may not succeed in unquiescing the replica.

The commit also introduces a boolean return value to these methods to inform
the caller about whether the unquiesce attempt was successful.

Finally, the commit uses this return value in `maybeMarkReplicaInitializedLockedReplLocked`
to assert that after initializing a replica, it did unquiesce immediately.
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tickUninit branch from b4f814d to 95a7671 on December 14, 2021 23:05

@nvanbenschoten nvanbenschoten left a comment


TFTR!

bors r=tbg

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)


pkg/kv/kvserver/store_create_replica.go, line 369 at r18 (raw file):

Previously, tbg (Tobias Grieger) wrote…

errors.AssertionFailedf? If we hit this the world will end, right? Can't imagine this will be handled gracefully by the callers.

Done.


craig bot commented Dec 15, 2021

Build succeeded.

@craig craig bot merged commit acf40e9 into cockroachdb:master Dec 15, 2021

blathers-crl bot commented Dec 15, 2021

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 95a7671 to blathers/backport-release-21.1-73362: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.1.x failed. See errors above.


error creating merge commit from 95a7671 to blathers/backport-release-21.2-73362: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/tickUninit branch December 22, 2021 01:10
pav-kv added a commit to pav-kv/cockroach that referenced this pull request Jan 11, 2023
After cockroachdb#73362, all uninitialized replicas are quiescent. The callers of
addToReplicasByRangeIDLocked are either inserting an uninitialized Replica, or a
newly created initialized Replica which has not been linked to Store yet and
hence has not been unquiesced either. The quiescence check is thus a no-op, and
can be removed from addToReplicasByRangeIDLocked.

As a second "proof" for why it can be removed:

- rangeID is added to unquiescedReplicas in 2 places (Replica.maybeUnquiesce*),
  both of which also set r.mu.quiescent to false a few lines above.
- rangeID is removed from unquiescedReplicas in 2 places (Replica.quiesceLocked
  and Replica removal in Store.unlinkReplicaByRangeIDLocked). The first one
  sets r.mu.quiescent to true a few lines apart, and the second one outright
  removes the replica.

Thus, it is already true that r.mu.quiescent == false iff the replica's rangeID
is in unquiescedReplicas (with a little exception of when r is being removed
which does not impact our check).

Release note: none