
kvserver: purging outdated replicas during migrations run into nil pointers #58378

Closed
ajwerner opened this issue Dec 30, 2020 · 13 comments · Fixed by #62838

@ajwerner
Contributor

Describe the problem

On this build I see the following crash:

I201230 17:15:54.784810 1 util/stop/stopper.go:566 ⋮ quiescing
W201230 17:15:54.784885 377 sql/sqlliveness/slinstance/slinstance.go:183 ⋮ [n2] exiting heartbeat loop
F201230 17:15:54.770224 110793 kv/kvserver/replica_proposal.go:783 ⋮ [n2,s2,r85/1:‹/Table/52/1/{49-50}›] not using applied state key in v21.1
goroutine 110793 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0x7fdd001, 0xe9a5d6, 0x203000, 0x452890)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xb9
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xc000913000, 0x16558dfbdede14da, 0x4, 0x1b0c9, 0x72e7526, 0x1f, 0x30f, 0x0, 0xc001019040, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:270 +0xc52
github.com/cockroachdb/cockroach/pkg/util/log.logfDepth(0x54d7220, 0xc0038f7000, 0x1, 0x4, 0x48433ac, 0x24, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/channels.go:58 +0x1c5
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log_channels_generated.go:804
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evaluateProposal(0xc002994800, 0x54d7220, 0xc0038f7000, 0xc003412830, 0x8, 0xc003b350e0, 0xc002476840, 0x0, 0xc003412800, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_proposal.go:783 +0x690
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).requestToProposal(0xc002994800, 0x54d7220, 0xc0038f7000, 0xc003412830, 0x8, 0xc003b350e0, 0xc002476840, 0x484, 0x84818)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_proposal.go:816 +0x8e
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evalAndPropose(0xc002994800, 0x54d7220, 0xc0038f7000, 0xc003b350e0, 0xc003fcc3f0, 0xc001b8d528, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:74 +0xf1
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeWriteBatch(0xc002994800, 0x54d7220, 0xc0038f7000, 0xc003b350e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_write.go:136 +0x73d
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeBatchWithConcurrencyRetries(0xc002994800, 0x54d7220, 0xc0038f7000, 0xc003b350e0, 0x4e95740, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:352 +0x491
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).sendWithRangeID(0xc002994800, 0x54d7220, 0xc0038f7000, 0x55, 0xc003b350e0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:97 +0x53d
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).Send(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:36
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*pendingLeaseRequest).requestLeaseAsync.func2(0x54d7220, 0xc0038f7000)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_range_lease.go:402 +0x632
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc000ec4240, 0x54d7220, 0xc0038f7000, 0xc0038eaa00, 0x35, 0xc000e8f500, 0xc002281b80)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:347 +0xdd
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:342 +0x11d

This code was introduced in #58088. I suppose it's possible that my PR caused this issue, but I doubt it. Another thing to note is that I don't think this cluster was upgraded or anything like that, so I have to assume that there's some initialization invariant that the code is not expecting.

@blathers-crl

blathers-crl bot commented Dec 30, 2020

Hi @ajwerner, please add a C-ategory label to your issue. Check out the label system docs.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is otan.

@irfansharif
Contributor

irfansharif commented Dec 31, 2020

I201230 17:15:54.760385 110364 kv/kvserver/replica_raftstorage.go:820 ⋮ [n2,s2,r85/1:{-}] applying snapshot of type VIA_SNAPSHOT_QUEUE [id=‹1ca210de› index=19]
I201230 17:15:54.770168 110793 1@kv/kvserver/replica_proposal.go:783 ⋮ [n2,s2,r85/1:‹/Table/52/1/{49-50}›] the server is terminating due to a fatal error (see the DEV channel for details)
I201230 17:15:54.784698 110364 kv/kvserver/replica_raftstorage.go:841 ⋮ [n2,s2,r85/1:‹/Table/52/1/{49-50}›] applied snapshot of type VIA_SNAPSHOT_QUEUE 

I think this assertion is a bit racy. I'm a bit surprised by the synchronization we're seeing here, but it's evidently possible for a replica to be evaluating a proposal while concurrently applying a snapshot? In the failure above, r85 is being freshly created by the snapshot, which is still being applied. Part of that application process entails setting the right replica state from the incoming snapshot, which happens here:

// Update the rest of the Raft state. Changes to r.mu.state.Desc must be
// managed by r.setDescRaftMuLocked and changes to r.mu.state.Lease must be handled
// by r.leasePostApply, but we called those above, so now it's safe to
// wholesale replace r.mu.state.
r.mu.state = s

Given this test starts off at v21.1, state.UsingAppliedStateKey from the snapshot will always be true. That value is supposed to make its way into the replica's in-memory ReplicaState, which is what the assertion below checks for:

r.mu.RLock()
usingAppliedStateKey := r.mu.state.UsingAppliedStateKey
r.mu.RUnlock()

But because the evaluation seems to be happening concurrently with the snapshot application, it's possible for us to read from an uninitialized ReplicaState, which would then trip up this assertion.
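
To make the suspected interleaving concrete, here is a minimal, self-contained Go sketch; none of these types or methods are the actual kvserver code, they're illustrative stand-ins. Evaluation that races ahead of snapshot application reads a zero-value state, so the UsingAppliedStateKey check trips even though every snapshot in a v21.1 cluster carries the flag as true.

package main

import (
	"fmt"
	"sync"
)

type replicaState struct {
	UsingAppliedStateKey bool
}

type replica struct {
	mu struct {
		sync.RWMutex
		state replicaState
	}
}

// applySnapshot models the tail end of snapshot application: the on-disk data
// has already been ingested, and only now is the in-memory state replaced
// wholesale.
func (r *replica) applySnapshot(s replicaState) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.mu.state = s
}

// evaluateProposal models the assertion: it reads the in-memory state and
// expects the applied state key to be in use.
func (r *replica) evaluateProposal() error {
	r.mu.RLock()
	using := r.mu.state.UsingAppliedStateKey
	r.mu.RUnlock()
	if !using {
		return fmt.Errorf("not using applied state key") // the fatal error seen above
	}
	return nil
}

func main() {
	r := &replica{}
	// Evaluation races ahead of snapshot application and sees the zero-value state.
	if err := r.evaluateProposal(); err != nil {
		fmt.Println("assertion tripped:", err)
	}
	r.applySnapshot(replicaState{UsingAppliedStateKey: true})
	fmt.Println("after snapshot application, evaluation succeeds:", r.evaluateProposal() == nil)
}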

@irfansharif
Contributor

I'm not sure if I'm missing some obvious synchronization that would disallow command evaluation from happening concurrently with the replica being instantiated by means of a snapshot. I think we might have to drop the assertion here.

irfansharif added a commit to irfansharif/cockroach that referenced this issue Dec 31, 2020
Touches cockroachdb#58378; a stop-gap while we investigate.

Release note: None
@tbg
Member

tbg commented Jan 4, 2021

Wow, that's... weird, right? First of all (and I'm sure you're aware, just writing this out anyway) there is nothing mixed-version going on. Everyone here ought to be using the applied state right off the bat. And it pretty much looks as though what you're seeing is true: we are evaluating a lease request on a replica that was just created in response to a snapshot, and the snapshot has not applied yet, meaning that the lease will evaluate against a completely blank disk state, while the in-memory state already has the descriptor (see the log tags on the fatal error).

But looking into this further, I'm not sure how this would work. We are setting the replica state here:

r.mu.Lock()
// We set the persisted last index to the last applied index. This is
// not a correctness issue, but means that we may have just transferred
// some entries we're about to re-request from the leader and overwrite.
// However, raft.MultiNode currently expects this behavior, and the
// performance implications are not likely to be drastic. If our
// feelings about this ever change, we can add a LastIndex field to
// raftpb.SnapshotMetadata.
r.mu.lastIndex = s.RaftAppliedIndex
r.mu.lastTerm = lastTerm
r.mu.raftLogSize = raftLogSize
// Update the store stats for the data in the snapshot.
r.store.metrics.subtractMVCCStats(ctx, r.mu.tenantID, *r.mu.state.Stats)
r.store.metrics.addMVCCStats(ctx, r.mu.tenantID, *s.Stats)
// Update the rest of the Raft state. Changes to r.mu.state.Desc must be
// managed by r.setDescRaftMuLocked and changes to r.mu.state.Lease must be handled
// by r.leasePostApply, but we called those above, so now it's safe to
// wholesale replace r.mu.state.
r.mu.state = s
// Snapshots typically have fewer log entries than the leaseholder. The next
// time we hold the lease, recompute the log size before making decisions.
r.mu.raftLogSizeTrusted = false
r.assertStateLocked(ctx, r.store.Engine())
r.mu.Unlock()

but note that ingesting the actual data happens before. This would mean that there shouldn't be any way for a lease to make it to evaluation (wouldn't it bounce off the key check, since it's addressed to the start key?!) unless it can also observe the on-disk state.

So I'm still not sure what's going on here - the synchronization of req eval and snapshots seems lacking, but I can't use that to explain this particular problem. Can you? I don't think we'd ever send out a snapshot that has !UsingAppliedStateKey, right?

craig bot pushed a commit that referenced this issue Jan 4, 2021
58387: kvserver: downgrade a potentially racey assertion r=tbg a=irfansharif

Touches #58378; a stop-gap while we investigate.

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
@ajwerner
Contributor Author

ajwerner commented Jan 4, 2021

We set the entry in the store before we set the raft state:

r.setDescRaftMuLocked(ctx, s.Desc)

I believe that circumvents the key check. We should move some of this logic around so that requests can't find the replica until after it has been initialized.
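
To illustrate the ordering problem, here is a hypothetical, standalone sketch; the store and replica types below are stand-ins, not the real Store.mu bookkeeping. Installing the descriptor in the key-addressable index first makes the replica routable before its in-memory state is populated, so a key-addressed request can slip in between the two steps.

package main

import "fmt"

type replicaState struct {
	UsingAppliedStateKey bool
}

type replica struct {
	startKey string
	state    replicaState
}

type store struct {
	replicasByKey map[string]*replica // stand-in for the store's key lookup index
}

func main() {
	s := &store{replicasByKey: map[string]*replica{}}
	r := &replica{startKey: "/Table/52/1/49"}

	// Step 1 of snapshot application: install the descriptor, making the
	// replica routable by key; the key check now succeeds for this range.
	s.replicasByKey[r.startKey] = r

	// A lease request addressed to the start key arriving here, between the
	// two steps, finds the replica but sees its still-empty state.
	if target, ok := s.replicasByKey[r.startKey]; ok && !target.state.UsingAppliedStateKey {
		fmt.Println("request found the replica but sees uninitialized state")
	}

	// Step 2: only now is the in-memory state replaced wholesale.
	r.state = replicaState{UsingAppliedStateKey: true}
	fmt.Println("after step 2:", r.state.UsingAppliedStateKey)
}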

@tbg
Member

tbg commented Jan 4, 2021

Oh - good call. I'll take a look.

@tbg tbg assigned tbg and unassigned irfansharif Jan 4, 2021
@tbg
Member

tbg commented Jan 4, 2021

Let's say we fix this - are we still on thin ice? I believe fixing this is good enough to address the case of the very first snapshot, as an uninitialized replica (probably) won't ever try to evaluate anything. When a snapshot arrives on an existing replica, and that replica is evaluating some requests at the time (probably follower reads then), it does so under a timestamp that is covered by the lease (including the uncertainty interval), and the MVCC history observable by those commands is immutable, so it won't matter per se whether the commands apply against the old or new state. But can there still be weird interleavings regarding the in-memory state? A command may read something from r.mu.state and see the new state instead of the old. I wonder whether there are any instances of this, but even if not, it makes me somewhat uncomfortable that we don't have better boundaries there. Morally speaking, shouldn't the snapshot application grab an "everything latch"?

cc @nvanbenschoten
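
As a contrived, standalone illustration of that worry (illustrative types only, not the real Replica): a command that consults r.mu.state twice under separate lock acquisitions can end up with a torn view if a snapshot swaps the state wholesale in between.

package main

import (
	"fmt"
	"sync"
)

type state struct {
	Desc  string // stands in for the range descriptor
	Lease string // stands in for the lease
}

type replica struct {
	mu struct {
		sync.RWMutex
		state state
	}
}

func (r *replica) readDesc() string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.mu.state.Desc
}

func (r *replica) readLease() string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.mu.state.Lease
}

// applySnapshot swaps the whole state under the write lock.
func (r *replica) applySnapshot(next state) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.mu.state = next
}

func main() {
	r := &replica{}
	r.applySnapshot(state{Desc: "old-desc", Lease: "old-lease"})

	// A command starts evaluating and reads the descriptor...
	desc := r.readDesc()
	// ...a snapshot applies concurrently and replaces the state wholesale...
	r.applySnapshot(state{Desc: "new-desc", Lease: "new-lease"})
	// ...and the command then reads the lease, ending up with a torn view.
	lease := r.readLease()
	fmt.Println(desc, lease) // old-desc new-lease
}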

@tbg
Member

tbg commented Jan 4, 2021

I poked at this for a while and it's not super easy to untangle. We'd really want to update the in-memory state of the replica before switching out the placeholder at the store, at least for the first snapshot, but in general we do want to update the descriptor atomically with the store's key lookup index. Nothing that can't be fixed here, but it will take a little bit of elbow grease, and since it's very tedious with 5pm brain, I will look at it again tomorrow.

@nvanbenschoten
Member

Morally speaking, shouldn't the snapshot application grab an "everything latch"?

I think latching is a little too high-level for this kind of synchronization. Snapshots can block writes that are holding latches, so without some form of re-entrance, we'd run into all kinds of deadlocks if we tried to use them. This sounds more like the kind of issue that the readOnlyCmdMu is meant to solve - though in practice it does a pretty haphazard job: #46329.

The interactions between below-Raft state changes and above-Raft state observations are pretty subtle. In part, that's because we aren't explicit about when we're observing state above Raft (see #55461), so we end up needing to be very defensive about how we change things below Raft. For instance, snapshot ingestion goes through all kinds of hoops to make sure that the entire snapshot is ingested atomically to disk, even if that means that 4 of the 5 SSTs are tiny. If we had a cleaner policy about snapshotting an entire range's state during evaluation, then we could add some fine-grained synchronization around this snapshot and things would be a lot easier to think through.

Regarding this specific issue, it is surprising how late we set r.mu.state in Replica.applySnapshot. #58378 (comment) is thinking along the right lines.

@tbg tbg changed the title from "kv,acceptance: acceptance/many-splits failed" to "kvserver: bug in synchronization between snapshot application and lease eval" Jan 5, 2021
@tbg
Member

tbg commented Jan 5, 2021

Good point about latching being at the wrong level. I remember readOnlyCmdMu and its flaws.

I like the idea that the range state should be snapshotted as well. When we make a ReplicaEvalContext, it basically just passes through to Replica.mu. Instead, it could be a pointer to the replica.mu.state taken together with the pebble snapshot, and we'd adopt the policy that we never mutate r.mu.state in place (we always switch it out wholesale, i.e. *r.mu.state is immutable).

I like how that gets us a small step closer to a world in which commands evaluate in a true sandbox.

Either way, with this issue on our plate now, I think the best I can do is to move the state update before the store update. That will already be difficult enough and requires some thinking through as well. For example, if we "just" move the state update before the store update and the snapshot spans a split, we will shrink the replica before notifying the store of that fact, which can't be good; similar issues can come up on a merge (we'll destroy the RHS before the LHS expands? Not sure how this all works today). Ideally I can manage to do both at the same time (i.e. hold replica.mu across store.mu).
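
For reference, a rough sketch of the "snapshot the range state for evaluation" idea, under the assumption that published state values are never mutated in place; the names evalContext, engineSnapshot, etc. are made up for illustration and do not exist in the codebase.

package main

import (
	"fmt"
	"sync"
)

type state struct {
	UsingAppliedStateKey bool
	LeaseSequence        int64
}

type engineSnapshot struct{} // stands in for a pebble snapshot

type replica struct {
	mu struct {
		sync.RWMutex
		state *state // replaced wholesale, never mutated in place
	}
}

// evalContext pairs an immutable view of the replica state with a storage
// snapshot, so a command sees one consistent view for its whole evaluation.
type evalContext struct {
	state *state
	snap  *engineSnapshot
}

func (r *replica) newEvalContext() evalContext {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return evalContext{state: r.mu.state, snap: &engineSnapshot{}}
}

// updateState publishes a new state by swapping the pointer; in-flight
// evaluations keep reading the old, immutable value.
func (r *replica) updateState(next state) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.mu.state = &next
}

func main() {
	r := &replica{}
	r.updateState(state{UsingAppliedStateKey: true, LeaseSequence: 1})
	ec := r.newEvalContext()
	r.updateState(state{UsingAppliedStateKey: true, LeaseSequence: 2})
	fmt.Println(ec.state.LeaseSequence) // still 1: the in-flight evaluation's view is unaffected
}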

irfansharif added a commit to irfansharif/cockroach that referenced this issue Jan 20, 2021
There's a scary lack of synchronization around how we set the
ReplicaState for a given replica, and how we mark a replica as
"initialized". What this means is that very temporarily, it's possible
for the entry in Store.mu.replicas to be both "initialized" and have an
empty ReplicaState. This was an existing problem, but is now more likely
to bite us given the migrations infrastructure attempts to purge
outdated replicas at startup time (when replicas are being initialized,
and we're iterating through extant replicas in the Store.mu.replicas
map).

This issue has caused a bit of recent instability: cockroachdb#59180, cockroachdb#58489,
cockroachdb#58523, and cockroachdb#58378. While we work on a more considered fix to the
problem (tracked in cockroachdb#58489), we can stop the bleeding in the
interim (and unskip some tests).

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Jan 20, 2021
There's a scary lack of synchronization around how we set the
ReplicaState for a given replica, and how we mark a replica as
"initialized". What this means is that very temporarily, it's possible
for the entry in Store.mu.replicas to be both "initialized" and have an
empty ReplicaState. This was an existing problem, but is now more likely
to bite us given the migrations infrastructure attempts to purge
outdated replicas at startup time (when replicas are being initialized,
and we're iterating through extant replicas in the Store.mu.replicas
map).

This issue has caused a bit of recent instability: cockroachdb#59180, cockroachdb#58489,
cockroachdb#58523, and cockroachdb#58378. While we work on a more considered fix to the
problem (tracked in cockroachdb#58489), we can stop the bleeding in the
interim (and unskip some tests).

Release note: None
craig bot pushed a commit that referenced this issue Jan 20, 2021
59194: kv: introduce a stopgap for lack of ReplicaState synchronization r=irfansharif a=irfansharif

There's a scary lack of synchronization around how we set the
ReplicaState for a given replica, and how we mark a replica as
"initialized". What this means is that very temporarily, it's possible
for the entry in Store.mu.replicas to be both "initialized" and have an
empty ReplicaState. This was an existing problem, but is now more likely
to bite us given the migrations infrastructure attempts to purge
outdated replicas at startup time (when replicas are being initialized,
and we're iterating through extant replicas in the Store.mu.replicas
map).

This issue has caused a bit of recent instability: #59180, #58489,
#58523, and #58378. While we work on a more considered fix to the
problem (tracked in #58489), we can stop the bleeding in the
interim (and unskip some tests).

Release note: None

59201:  sql: add telemetry for materialized views and set schema. r=otan a=RichardJCai

 sql: add telemetry for materialized views and set schema.

Release note: None

Resolves #57299 

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: richardjcai <caioftherichard@gmail.com>
pbardea pushed a commit to pbardea/cockroach that referenced this issue Jan 21, 2021
There's a scary lack of synchronization around how we set the
ReplicaState for a given replica, and how we mark a replica as
"initialized". What this means is that very temporarily, it's possible
for the entry in Store.mu.replicas to be both "initialized" and have an
empty ReplicaState. This was an existing problem, but is now more likely
to bite us given the migrations infrastructure attempts to purge
outdated replicas at startup time (when replicas are being initialized,
and we're iterating through extant replicas in the Store.mu.replicas
map).

This issue has caused a bit of recent instability: cockroachdb#59180, cockroachdb#58489,
cockroachdb#58523, and cockroachdb#58378. While we work on a more considered fix to the
problem (tracked in cockroachdb#58489), we can stop the bleeding in the
interim (and unskip some tests).

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Feb 10, 2021
See cockroachdb#59194 and cockroachdb#58489 for more details.

In cockroachdb#58489 we observed a scary lack of synchronization around how we set
the ReplicaState for a given replica, and how we mark a replica as
"initialized". What this meant is that it was possible for the entry in
Store.mu.replicas to be both "initialized" and have an empty
ReplicaState. This is now more likely to bite us given the migrations
infrastructure attempts to purge outdated replicas at startup time
(when replicas are being initialized, and we're iterating through extant
replicas in the Store.mu.replicas map).

We believed this was addressed as part of cockroachdb#58378, but that appears not
to be the case. Let's re-introduce this stop-gap while we investigate.

Release note: None
craig bot pushed a commit that referenced this issue Feb 10, 2021
60429: kv: (re-)introduce a stopgap for lack of ReplicaState synchronization r=irfansharif a=irfansharif

See #59194 and #58489 for more details.

In #58489 we observed a scary lack of synchronization around how we set
the ReplicaState for a given replica, and how we mark a replica as
"initialized". What this meant is that it was possible for the entry in
Store.mu.replicas to be both "initialized" and have an empty
ReplicaState. This is now more likely to bite us given the migrations
infrastructure attempts to purge outdated replicas at startup time
(when replicas are being initialized, and we're iterating through extant
replicas in the Store.mu.replicas map).

We believed this was addressed as part of #58378, but that appears not
to be the case. Let's re-introduce this stop-gap while we investigate.

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
craig bot pushed a commit that referenced this issue Feb 10, 2021
60281: sql/pgwire: send placeholder BackendKeyData r=asubiotto,spaskob a=rafiss

fixes #13191 

Some tools expect this message to be returned at connection time and
will not connect without it. CockroachDB does not support pgwire
cancellation, but we can still send a placeholder value here, and
continue ignoring cancellation requests like we already do.

Added a small test to make sure nothing broke.

Release note (sql change): When a connection is established, CockroachDB
will now return a placeholder BackendKeyData message in the response.
This is for compatibility with some tools, but using the BackendKeyData
to cancel a query will still have no effect, just the same as before.



60429: kv: (re-)introduce a stopgap for lack of ReplicaState synchronization r=irfansharif a=irfansharif

See #59194 and #58489 for more details.

In #58489 we observed a scary lack of synchronization around how we set
the ReplicaState for a given replica, and how we mark a replica as
"initialized". What this meant is that it was possible for the entry in
Store.mu.replicas to be both "initialized" and have an empty
ReplicaState. This is now more likely to bite us given the migrations
infrastructure attempts to purge outdated replicas at startup time
(when replicas are being initialized, and we're iterating through extant
replicas in the Store.mu.replicas map).

We believed this was addressed as part of #58378, but that appears not
to be the case. Let's re-introduce this stop-gap while we investigate.

Release note: None

60441: bazel: quash unnecessary dependency on `pkg/util/uuid` from protos r=rickystewart a=rickystewart

This dependency can be replaced with a few `# keep` deps in a few choice
proto targets, which is what we should have done the whole time anyway.
This fixes build failures elsewhere in tree -- for example,
`pkg/util/uuid:uuid_test`, which doesn't play nicely with `rules_go` in
the presence of this dependency.

Fixes #59778.

Release note: None

Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
@irfansharif
Contributor

Going to re-open this issue as a placeholder given we needed to revert the stop-gap in #60429.

@irfansharif irfansharif reopened this Mar 3, 2021
@irfansharif irfansharif self-assigned this Mar 3, 2021
@tbg
Member

tbg commented Mar 16, 2021

Starting to look at this now.

@tbg
Member

tbg commented Mar 16, 2021

I ran the kvserver tests with this patch:

diff --git a/pkg/server/testserver.go b/pkg/server/testserver.go
index 082bf10558..b679fda253 100644
--- a/pkg/server/testserver.go
+++ b/pkg/server/testserver.go
@@ -456,7 +456,28 @@ func (ts *TestServer) NodeDialer() *nodedialer.Dialer {
 // completes.
 func (ts *TestServer) Start() error {
        ctx := context.Background()
-       return ts.Server.Start(ctx)
+       err := ts.Server.Start(ctx)
+       if err != nil {
+               return err
+       }
+       return ts.stopper.RunAsyncTask(ctx, "foo", func(ctx context.Context) {
+               for {
+                       select {
+                       case <-ts.stopper.ShouldQuiesce():
+                               return
+                       default:
+                       }
+                       if err := ts.Server.node.stores.VisitStores(func(s *kvserver.Store) error {
+                               s.VisitReplicas(func(repl *kvserver.Replica) (more bool) {
+                                       _ = repl.Version()
+                                       return true
+                               })
+                               return nil
+                       }); err != nil {
+                               panic(err)
+                       }
+               }
+       })
 }
 
 type dummyProtectedTSProvider struct {

To my dismay, it doesn't seem that easy to reproduce this bug.

@irfansharif irfansharif changed the title from "kvserver: bug in synchronization between snapshot application and lease eval" to "kvserver: purging outdated replicas during migrations run into nil pointers" Mar 31, 2021
irfansharif added a commit to irfansharif/cockroach that referenced this issue Mar 31, 2021
Fixes cockroachdb#58378.
Fixes cockroachdb#62267.

Previously it was possible for us to have replicas in-memory, with
pre-migrated state, even after a migration was finalized. This led to
the kind of badness we were observing in cockroachdb#62267, where it appeared that
a replica was not using the applied state key despite us having migrated
into it (see TruncatedAndRangeAppliedState, introduced in cockroachdb#58088).

---

To see how, consider the following set of events:

- Say r42 starts off on n1, n2, and n3
- n3 flaps and so we place a replica for r42 on n4
- n3's replica, r42/3, is now GC-able, but still un-GC-ed
- We run the applied state migration, first migrating all ranges into it
  and then purging outdated replicas
- Well, we should want to purge r42/3, because it's un-migrated and
  evaluating anything on it (say a lease request) is unsound because
  we've bumped version gates that tell the kvserver to always expect
  post-migration state
- What happens when we try to purge r42/3? Previous to this PR, if it
  didn't have a replica version, we'd skip over it (!)
- Was it possible for r42/3 to not have a replica version? Shouldn't it
  have been accounted for when we migrated all ranges? No, that's precisely
  why the migration infrastructure purges outdated replicas. The migrate
  request only returns once it's applied on all followers; in our example
  that wouldn't include r42/3 since it was no longer one
- The stop-gap in cockroachdb#60429 made it so that we didn't GC r42/3, when we
  should've been doing the opposite. When iterating over a store's
  replicas for purging purposes, an empty replica version is fine and
  expected; we should interpret that as a signal that we're dealing with a
  replica that was obviously never migrated (to even start using replica
  versions in the first place). Because it didn't have a valid replica
  version installed, we can infer that it's soon to be GC-ed (else we
  wouldn't have been able to finalize the applied state + replica
  version migration)
- The conditions above made it possible for us to evaluate requests on
  replicas with migration state out-of-date relative to the store's
  version
- Boom

Release note: None
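
A simplified sketch of the purge decision described in the commit message above; the types and function names below are made up for illustration, while the real logic lives in the change merged via #62838. An empty replica version is treated as "never migrated, soon to be GC-ed" and purged rather than skipped, which is the opposite of what the earlier stop-gap did.

package main

import "fmt"

type replicaVersion struct{ Major, Minor int32 }

func (v replicaVersion) IsEmpty() bool { return v == (replicaVersion{}) }

func (v replicaVersion) Less(o replicaVersion) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	return v.Minor < o.Minor
}

type replica struct {
	rangeID int
	version replicaVersion // zero value if the replica was never migrated
}

// shouldPurge reports whether a replica is outdated relative to the version
// the migration has finalized. An empty version means the replica was never
// migrated at all (a GC-able leftover like r42/3 above), so it must be purged
// rather than skipped.
func shouldPurge(r replica, migrated replicaVersion) bool {
	if r.version.IsEmpty() {
		return true
	}
	return r.version.Less(migrated)
}

func main() {
	migrated := replicaVersion{Major: 21, Minor: 1}
	for _, r := range []replica{
		{rangeID: 42, version: replicaVersion{}},                    // un-migrated leftover: purge
		{rangeID: 43, version: replicaVersion{Major: 20, Minor: 2}}, // outdated: purge
		{rangeID: 44, version: migrated},                            // up to date: keep
	} {
		fmt.Printf("r%d purge=%t\n", r.rangeID, shouldPurge(r, migrated))
	}
}
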
craig bot pushed a commit that referenced this issue Mar 31, 2021
60835: kv: bump timestamp cache to Pushee.MinTimestamp on PUSH_ABORT r=nvanbenschoten a=nvanbenschoten

Fixes #60779.
Fixes #60580.

We were only checking that the batch header timestamp was equal to or
greater than this pushee's min timestamp, so this is as far as we can
bump the timestamp cache.

62832: geo: minor performance improvement for looping over edges r=otan a=andyyang890

This patch slightly improves the performance of many
spatial builtins by storing the number of edges used
in for-loop conditions in a variable.
We discovered this was taking a lot of time when
profiling the point-in-polygon optimization.

Release note: None

62838: kvserver: purge gc-able, unmigrated replicas during migrations r=irfansharif a=irfansharif

Fixes #58378.
Fixes #62267.

Previously it was possible for us to have replicas in-memory, with
pre-migrated state, even after a migration was finalized. This led to
the kind of badness we were observing in #62267, where it appeared that
a replica was not using the applied state key despite us having migrated
into it (see TruncatedAndRangeAppliedState, introduced in #58088).

---

To see how, consider the following set of events:

- Say r42 starts off on n1, n2, and n3
- n3 flaps and so we place a replica for r42 on n4
- n3's replica, r42/3, is now GC-able, but still un-GC-ed
- We run the applied state migration, first migrating all ranges into it
  and then purging outdated replicas
- Well, we should want to purge r42/3, because it's un-migrated and
  evaluating anything on it (say a lease request) is unsound because
  we've bumped version gates that tell the kvserver to always expect
  post-migration state
- What happens when we try to purge r42/3? Previous to this PR, if it
  didn't have a replica version, we'd skip over it (!)
- Was it possible for r42/3 to not have a replica version? Shouldn't it
  have been accounted for when we migrated all ranges? No, that's precisely
  why the migration infrastructure purges outdated replicas. The migrate
  request only returns once it's applied on all followers; in our example
  that wouldn't include r42/3 since it was no longer one
- The stop-gap in #60429 made it so that we didn't GC r42/3, when we
  should've been doing the opposite. When iterating over a store's
  replicas for purging purposes, an empty replica version is fine and
  expected; we should interpret that as a signal that we're dealing with a
  replica that was obviously never migrated (to even start using replica
  versions in the first place). Because it didn't have a valid replica
  version installed, we can infer that it's soon to be GC-ed (else we
  wouldn't have been able to finalize the applied state + replica
  version migration)
- The conditions above made it possible for us to evaluate requests on
  replicas with migration state out-of-date relative to the store's
  version
- Boom

Release note: None


62839: zonepb: make subzone DiffWithZone more accurate r=ajstorm a=otan

* Subzones may be defined in a different order. We did not take this
  into account, which can cause bugs when e.g. ADD REGION adds a subzone
  at the end rather than in the old "expected" location in the subzones
  array. This has been fixed by comparing subzones using an unordered
  map.
* The ApplyZoneConfig we previously did overwrote subzone fields on the
  original subzone array element, meaning that if there was a mismatch
  it would not be reported through validation. This is now fixed by
  applying the expected zone config to *zonepb.NewZoneConfig() instead.
* Added logic to only check zone config matches for subzones from
  active subzone IDs.
* Improve the error messaging when a subzone config mismatches -
  namely, add index and partitioning information and differentiate
  between missing fields and missing / extraneous zone configs.

Resolves #62790

Release note (bug fix): Fixed validation bugs during ALTER TABLE ... SET
LOCALITY / crdb_internal.validate_multi_region_zone_config where
validation errors could occur when the database of a REGIONAL BY ROW
table has a new region added. Also fixed a validation bug where partition
zone config mismatches were not caught.

62872: build: use -json for RandomSyntax test r=otan a=rafiss

I'm hoping this will help out with an issue where the test failures seem
to be missing helpful logs.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Andy Yang <ayang@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
@craig craig bot closed this as completed in 1416a4e Apr 1, 2021