
kvserver: purging outdated replicas during migrations run into nil pointers #58378

Closed
ajwerner opened this issue Dec 30, 2020 · 13 comments · Fixed by #62838

@ajwerner
Contributor

Describe the problem

On this build I see the following crash:

I201230 17:15:54.784810 1 util/stop/stopper.go:566 ⋮ quiescing
W201230 17:15:54.784885 377 sql/sqlliveness/slinstance/slinstance.go:183 ⋮ [n2] exiting heartbeat loop
F201230 17:15:54.770224 110793 kv/kvserver/replica_proposal.go:783 ⋮ [n2,s2,r85/1:‹/Table/52/1/{49-50}›] not using applied state key in v21.1
goroutine 110793 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0x7fdd001, 0xe9a5d6, 0x203000, 0x452890)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xb9
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xc000913000, 0x16558dfbdede14da, 0x4, 0x1b0c9, 0x72e7526, 0x1f, 0x30f, 0x0, 0xc001019040, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:270 +0xc52
github.com/cockroachdb/cockroach/pkg/util/log.logfDepth(0x54d7220, 0xc0038f7000, 0x1, 0x4, 0x48433ac, 0x24, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/channels.go:58 +0x1c5
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log_channels_generated.go:804
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evaluateProposal(0xc002994800, 0x54d7220, 0xc0038f7000, 0xc003412830, 0x8, 0xc003b350e0, 0xc002476840, 0x0, 0xc003412800, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_proposal.go:783 +0x690
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).requestToProposal(0xc002994800, 0x54d7220, 0xc0038f7000, 0xc003412830, 0x8, 0xc003b350e0, 0xc002476840, 0x484, 0x84818)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_proposal.go:816 +0x8e
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evalAndPropose(0xc002994800, 0x54d7220, 0xc0038f7000, 0xc003b350e0, 0xc003fcc3f0, 0xc001b8d528, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:74 +0xf1
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeWriteBatch(0xc002994800, 0x54d7220, 0xc0038f7000, 0xc003b350e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_write.go:136 +0x73d
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeBatchWithConcurrencyRetries(0xc002994800, 0x54d7220, 0xc0038f7000, 0xc003b350e0, 0x4e95740, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:352 +0x491
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).sendWithRangeID(0xc002994800, 0x54d7220, 0xc0038f7000, 0x55, 0xc003b350e0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:97 +0x53d
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).Send(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:36
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*pendingLeaseRequest).requestLeaseAsync.func2(0x54d7220, 0xc0038f7000)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_range_lease.go:402 +0x632
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc000ec4240, 0x54d7220, 0xc0038f7000, 0xc0038eaa00, 0x35, 0xc000e8f500, 0xc002281b80)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:347 +0xdd
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:342 +0x11d

This code was introduced in #58088. I suppose it's possible that my PR caused this issue, but I doubt it. Another thing to note is that I don't think this cluster was upgraded or anything like that, so I have to assume that there's some initialization invariant that the code is not expecting.

@blathers-crl

blathers-crl bot commented Dec 30, 2020

Hi @ajwerner, please add a C-ategory label to your issue. Check out the label system docs.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is otan.

@irfansharif
Contributor

irfansharif commented Dec 31, 2020

I201230 17:15:54.760385 110364 kv/kvserver/replica_raftstorage.go:820 ⋮ [n2,s2,r85/1:{-}] applying snapshot of type VIA_SNAPSHOT_QUEUE [id=‹1ca210de› index=19]
I201230 17:15:54.770168 110793 1@kv/kvserver/replica_proposal.go:783 ⋮ [n2,s2,r85/1:‹/Table/52/1/{49-50}›] the server is terminating due to a fatal error (see the DEV channel for details)
I201230 17:15:54.784698 110364 kv/kvserver/replica_raftstorage.go:841 ⋮ [n2,s2,r85/1:‹/Table/52/1/{49-50}›] applied snapshot of type VIA_SNAPSHOT_QUEUE 

I think this assertion is a bit racy. I'm a bit surprised by the synchronization we're seeing here, but it's evidently possible for a replica to be evaluating a proposal while concurrently applying a snapshot? In the failure above, r85 is being freshly created by the snapshot, which is still being applied. Part of that application process entails setting the right replica state from the incoming snapshot, which happens here:

// Update the rest of the Raft state. Changes to r.mu.state.Desc must be
// managed by r.setDescRaftMuLocked and changes to r.mu.state.Lease must be handled
// by r.leasePostApply, but we called those above, so now it's safe to
// wholesale replace r.mu.state.
r.mu.state = s

Given this test starts off at v21.1, state.UsingAppliedStateKey from the snapshot will always be true. That value is supposed to make its way into the replica's in-memory ReplicaState, which is what the assertion below checks for:

r.mu.RLock()
usingAppliedStateKey := r.mu.state.UsingAppliedStateKey
r.mu.RUnlock()

But because the evaluation seems to be happening concurrently with the snapshot application, it's possible for us to read from an uninitialized ReplicaState, which would then trip up this assertion.
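
To make the suspected interleaving concrete, here is a minimal, self-contained Go sketch; none of these types or methods are the actual kvserver code, they're illustrative stand-ins. Evaluation that races ahead of snapshot application reads a zero-value state, so the UsingAppliedStateKey check trips even though every snapshot in a v21.1 cluster carries the flag as true.

package main

import (
	"fmt"
	"sync"
)

type replicaState struct {
	UsingAppliedStateKey bool
}

type replica struct {
	mu struct {
		sync.RWMutex
		state replicaState
	}
}

// applySnapshot models the tail end of snapshot application: the on-disk data
// has already been ingested, and only now is the in-memory state replaced
// wholesale.
func (r *replica) applySnapshot(s replicaState) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.mu.state = s
}

// evaluateProposal models the assertion: it reads the in-memory state and
// expects the applied state key to be in use.
func (r *replica) evaluateProposal() error {
	r.mu.RLock()
	using := r.mu.state.UsingAppliedStateKey
	r.mu.RUnlock()
	if !using {
		return fmt.Errorf("not using applied state key") // the fatal error seen above
	}
	return nil
}

func main() {
	r := &replica{}
	// Evaluation races ahead of snapshot application and sees the zero-value state.
	if err := r.evaluateProposal(); err != nil {
		fmt.Println("assertion tripped:", err)
	}
	r.applySnapshot(replicaState{UsingAppliedStateKey: true})
	fmt.Println("after snapshot application, evaluation succeeds:", r.evaluateProposal() == nil)
}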

@irfansharif
Contributor

I'm not sure if I'm missing some obvious synchronization that would disallow command evaluation from happening concurrently with the replica being instantiated by means of a snapshot. I think we might have to drop the assertion here.

irfansharif added a commit to irfansharif/cockroach that referenced this issue Dec 31, 2020
Touches cockroachdb#58378; a stop-gap while we investigate.

Release note: None
@tbg
Member

tbg commented Jan 4, 2021

Wow, that's... weird, right? First of all (and I'm sure you're aware, just writing this out anyway) there is nothing mixed-version going on. Everyone here ought to be using the applied state right off the bat. And it pretty much looks as though what you're seeing is true: we are evaluating a lease request on a replica that was just created in response to a snapshot, and the snapshot has not applied yet, meaning that the lease will evaluate against a completely blank disk state, while the in-memory state already has the descriptor (see the log tags on the fatal error).

But looking into this further, I'm not sure how this would work. We are setting the replica state here:

r.mu.Lock()
// We set the persisted last index to the last applied index. This is
// not a correctness issue, but means that we may have just transferred
// some entries we're about to re-request from the leader and overwrite.
// However, raft.MultiNode currently expects this behavior, and the
// performance implications are not likely to be drastic. If our
// feelings about this ever change, we can add a LastIndex field to
// raftpb.SnapshotMetadata.
r.mu.lastIndex = s.RaftAppliedIndex
r.mu.lastTerm = lastTerm
r.mu.raftLogSize = raftLogSize
// Update the store stats for the data in the snapshot.
r.store.metrics.subtractMVCCStats(ctx, r.mu.tenantID, *r.mu.state.Stats)
r.store.metrics.addMVCCStats(ctx, r.mu.tenantID, *s.Stats)
// Update the rest of the Raft state. Changes to r.mu.state.Desc must be
// managed by r.setDescRaftMuLocked and changes to r.mu.state.Lease must be handled
// by r.leasePostApply, but we called those above, so now it's safe to
// wholesale replace r.mu.state.
r.mu.state = s
// Snapshots typically have fewer log entries than the leaseholder. The next
// time we hold the lease, recompute the log size before making decisions.
r.mu.raftLogSizeTrusted = false
r.assertStateLocked(ctx, r.store.Engine())
r.mu.Unlock()

but note that ingesting the actual data happens before. This would mean that there shouldn't be any way for a lease to make it to evaluation (wouldn't it bounce off the key check, since it's addressed to the start key?!) unless it can also observe the on-disk state.

So I'm still not sure what's going on here - the synchronization of req eval and snapshots seems lacking, but I can't use that to explain this particular problem. Can you? I don't think we'd ever send out a snapshot that has !UsingAppliedStateKey, right?

craig bot pushed a commit that referenced this issue Jan 4, 2021
58387: kvserver: downgrade a potentially racey assertion r=tbg a=irfansharif

Touches #58378; a stop-gap while we investigate.

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
@ajwerner
Contributor Author

ajwerner commented Jan 4, 2021

We set the entry in the store before we set the raft state:

r.setDescRaftMuLocked(ctx, s.Desc)

I believe that circumvents the key check. We should move some of this logic around so that requests can't find the replica until after it has been initialized.
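
To illustrate the ordering problem, here is a hypothetical, standalone sketch; the store and replica types below are stand-ins, not the real Store.mu bookkeeping. Installing the descriptor in the key-addressable index first makes the replica routable before its in-memory state is populated, so a key-addressed request can slip in between the two steps.

package main

import "fmt"

type replicaState struct {
	UsingAppliedStateKey bool
}

type replica struct {
	startKey string
	state    replicaState
}

type store struct {
	replicasByKey map[string]*replica // stand-in for the store's key lookup index
}

func main() {
	s := &store{replicasByKey: map[string]*replica{}}
	r := &replica{startKey: "/Table/52/1/49"}

	// Step 1 of snapshot application: install the descriptor, making the
	// replica routable by key; the key check now succeeds for this range.
	s.replicasByKey[r.startKey] = r

	// A lease request addressed to the start key arriving here, between the
	// two steps, finds the replica but sees its still-empty state.
	if target, ok := s.replicasByKey[r.startKey]; ok && !target.state.UsingAppliedStateKey {
		fmt.Println("request found the replica but sees uninitialized state")
	}

	// Step 2: only now is the in-memory state replaced wholesale.
	r.state = replicaState{UsingAppliedStateKey: true}
	fmt.Println("after step 2:", r.state.UsingAppliedStateKey)
}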

@tbg
Member

tbg commented Jan 4, 2021

Oh - good call. I'll take a look.

@tbg tbg assigned tbg and unassigned irfansharif Jan 4, 2021
@tbg
Member

tbg commented Jan 4, 2021

Let's say we fix this - are we still on thin ice? I believe fixing this is good enough to address the case of the very first snapshot, as an uninitialized replica (probably) won't ever try to evaluate anything. When a snapshot arrives on an existing replica, and that replica is evaluating some requests at the time (probably follower reads then), it does so under a timestamp that is covered by the lease (including the uncertainty interval), and the MVCC history observable by those commands is immutable, so it won't matter per se whether the commands apply against the old or new state. But can there still be weird interleavings regarding the in-memory state? A command may read something from r.mu.state and see the new state instead of the old. I wonder whether there are any instances of this, but even if not, it makes me somewhat uncomfortable that we don't have better boundaries there. Morally speaking, shouldn't the snapshot application grab an "everything latch"?

cc @nvanbenschoten
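
As a contrived, standalone illustration of that worry (illustrative types only, not the real Replica): a command that consults r.mu.state twice under separate lock acquisitions can end up with a torn view if a snapshot swaps the state wholesale in between.

package main

import (
	"fmt"
	"sync"
)

type state struct {
	Desc  string // stands in for the range descriptor
	Lease string // stands in for the lease
}

type replica struct {
	mu struct {
		sync.RWMutex
		state state
	}
}

func (r *replica) readDesc() string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.mu.state.Desc
}

func (r *replica) readLease() string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.mu.state.Lease
}

// applySnapshot swaps the whole state under the write lock.
func (r *replica) applySnapshot(next state) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.mu.state = next
}

func main() {
	r := &replica{}
	r.applySnapshot(state{Desc: "old-desc", Lease: "old-lease"})

	// A command starts evaluating and reads the descriptor...
	desc := r.readDesc()
	// ...a snapshot applies concurrently and replaces the state wholesale...
	r.applySnapshot(state{Desc: "new-desc", Lease: "new-lease"})
	// ...and the command then reads the lease, ending up with a torn view.
	lease := r.readLease()
	fmt.Println(desc, lease) // old-desc new-lease
}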

@tbg
Member

tbg commented Jan 4, 2021

I poked at this for a while and it's not super easy to untangle. We'd really want to update the in-memory state of the replica before switching out the placeholder at the store, at least for the first snapshot, but in general we do want to update the descriptor atomically with the store's key lookup index. Nothing that can't be fixed here, but it will take a little bit of elbow grease, and since it's very tedious with 5pm brain, I will look at it again tomorrow.

@nvanbenschoten
Member

Morally speaking, shouldn't the snapshot application grab an "everything latch"?

I think latching is a little too high-level for this kind of synchronization. Snapshots can block writes that are holding latches, so without some form of re-entrance, we'd run into all kinds of deadlocks if we tried to use them. This sounds more like the kind of issue that the readOnlyCmdMu is meant to solve - though in practice it does a pretty haphazard job: #46329.

The interactions between below-Raft state changes and above-Raft state observations are pretty subtle. In part, that's because we aren't explicit about when we're observing state above Raft (see #55461), so we end up needing to be very defensive about how we change things below Raft. For instance, snapshot ingestion goes through all kinds of hoops to make sure that the entire snapshot is ingested atomically to disk, even if that means that 4 of the 5 SSTs are tiny. If we had a cleaner policy about snapshotting an entire range's state during evaluation, then we could add some fine-grained synchronization around this snapshot and things would be a lot easier to think through.

Regarding this specific issue, it is surprising how late we set r.mu.state in Replica.applySnapshot. #58378 (comment) is thinking along the right lines.

@tbg tbg changed the title from "kv,acceptance: acceptance/many-splits failed" to "kvserver: bug in synchronization between snapshot application and lease eval" Jan 5, 2021
@tbg
Member

tbg commented Jan 5, 2021

Good point about latching being at the wrong level. I remember readOnlyCmdMu and its flaws.

I like the idea that the range state should be snapshotted as well. When we make a ReplicaEvalContext, it basically just passes through to Replica.mu. Instead, it could be a pointer to the replica.mu.state taken together with the pebble snapshot, and we'd adopt the policy that we never mutate r.mu.state in place (we always switch it out wholesale, i.e. *r.mu.state is immutable).

I like how that gets us a small step closer to a world in which commands evaluate in a true sandbox.

Either way, with this issue on our plate now, I think the best I can do is to move the state update before the store update. That will already be difficult enough and requires some thinking through as well. For example, if we "just" move the state update before the store update and the snapshot spans a split, we will shrink the replica before notifying the store of that fact, which can't be good; similar issues can come up on a merge (we'll destroy the RHS before the LHS expands? Not sure how this all works today). Ideally I can manage to do both at the same time (i.e. hold replica.mu across store.mu).
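
For reference, a rough sketch of the "snapshot the range state for evaluation" idea, under the assumption that published state values are never mutated in place; the names evalContext, engineSnapshot, etc. are made up for illustration and do not exist in the codebase.

package main

import (
	"fmt"
	"sync"
)

type state struct {
	UsingAppliedStateKey bool
	LeaseSequence        int64
}

type engineSnapshot struct{} // stands in for a pebble snapshot

type replica struct {
	mu struct {
		sync.RWMutex
		state *state // replaced wholesale, never mutated in place
	}
}

// evalContext pairs an immutable view of the replica state with a storage
// snapshot, so a command sees one consistent view for its whole evaluation.
type evalContext struct {
	state *state
	snap  *engineSnapshot
}

func (r *replica) newEvalContext() evalContext {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return evalContext{state: r.mu.state, snap: &engineSnapshot{}}
}

// updateState publishes a new state by swapping the pointer; in-flight
// evaluations keep reading the old, immutable value.
func (r *replica) updateState(next state) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.mu.state = &next
}

func main() {
	r := &replica{}
	r.updateState(state{UsingAppliedStateKey: true, LeaseSequence: 1})
	ec := r.newEvalContext()
	r.updateState(state{UsingAppliedStateKey: true, LeaseSequence: 2})
	fmt.Println(ec.state.LeaseSequence) // still 1: the in-flight evaluation's view is unaffected
}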

irfansharif added a commit to irfansharif/cockroach that referenced this issue Jan 20, 2021
There's a scary lack of synchronization around how we set the
ReplicaState for a given replica, and how we mark a replica as
"initialized". What this means is that very temporarily, it's possible
for the entry in Store.mu.replicas to be both "initialized" and have an
empty ReplicaState. This was an existing problem, but is now more likely
to bite us given the migrations infrastructure attempts to purge
outdated replicas at startup time (when replicas are being initialized,
and we're iterating through extant replicas in the Store.mu.replicas
map).

This issue has caused a bit of recent instability: cockroachdb#59180, cockroachdb#58489,
cockroachdb#58523, and cockroachdb#58378. While we work on a more considered fix to the
problem (tracked in cockroachdb#58489), we can stop the bleeding in the
interim (and unskip some tests).

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Jan 20, 2021
There's a scary lack of synchronization around how we set the
ReplicaState for a given replica, and how we mark a replica as
"initialized". What this means is that very temporarily, it's possible
for the entry in Store.mu.replicas to be both "initialized" and have an
empty ReplicaState. This was an existing problem, but is now more likely
to bite us given the migrations infrastructure attempts to purge
outdated replicas at startup time (when replicas are being initialized,
and we're iterating through extant replicas in the Store.mu.replicas
map).

This issue has caused a bit of recent instability: cockroachdb#59180, cockroachdb#58489,
cockroachdb#58523, and cockroachdb#58378. While we work on a more considered fix to the
problem (tracked in cockroachdb#58489), we can stop the bleeding in the
interim (and unskip some tests).

Release note: None
craig bot pushed a commit that referenced this issue Jan 20, 2021
59194: kv: introduce a stopgap for lack of ReplicaState synchronization r=irfansharif a=irfansharif

There's a scary lack of synchronization around how we set the
ReplicaState for a given replica, and how we mark a replica as
"initialized". What this means is that very temporarily, it's possible
for the entry in Store.mu.replicas to be both "initialized" and have an
empty ReplicaState. This was an existing problem, but is now more likely
to bite us given the migrations infrastructure attempts to purge
outdated replicas at startup time (when replicas are being initialized,
and we're iterating through extant replicas in the Store.mu.replicas
map).

This issue has caused a bit of recent instability: #59180, #58489,
#58523, and #58378. While we work on a more considered fix to the
problem (tracked in #58489), we can stop the bleeding in the
interim (and unskip some tests).

Release note: None

59201:  sql: add telemetry for materialized views and set schema. r=otan a=RichardJCai

 sql: add telemetry for materialized views and set schema.

Release note: None

Resolves #57299 

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: richardjcai <caioftherichard@gmail.com>
pbardea pushed a commit to pbardea/cockroach that referenced this issue Jan 21, 2021
There's a scary lack of synchronization around how we set the
ReplicaState for a given replica, and how we mark a replica as
"initialized". What this means is that very temporarily, it's possible
for the entry in Store.mu.replicas to be both "initialized" and have an
empty ReplicaState. This was an existing problem, but is now more likely
to bite us given the migrations infrastructure attempts to purge
outdated replicas at startup time (when replicas are being initialized,
and we're iterating through extant replicas in the Store.mu.replicas
map).

This issue has caused a bit of recent instability: cockroachdb#59180, cockroachdb#58489,
cockroachdb#58523, and cockroachdb#58378. While we work on a more considered fix to the
problem (tracked in cockroachdb#58489), we can stop the bleeding in the
interim (and unskip some tests).

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Feb 10, 2021
See cockroachdb#59194 and cockroachdb#58489 for more details.

In cockroachdb#58489 we observed a scary lack of synchronization around how we set
the ReplicaState for a given replica, and how we mark a replica as
"initialized". What this meant is that it was possible for the entry in
Store.mu.replicas to be both "initialized" and have an empty
ReplicaState. This is now more likely to bite us given the migrations
infrastructure attempts to purge outdated replicas at startup time
(when replicas are being initialized, and we're iterating through extant
replicas in the Store.mu.replicas map).

We believed this was addressed as part of cockroachdb#58378, but that appears not
to be the case. Let's re-introduce this stop-gap while we investigate.

Release note: None
craig bot pushed a commit that referenced this issue Feb 10, 2021
60429: kv: (re-)introduce a stopgap for lack of ReplicaState synchronization r=irfansharif a=irfansharif

See #59194 and #58489 for more details.

In #58489 we observed a scary lack of synchronization around how we set
the ReplicaState for a given replica, and how we mark a replica as
"initialized". What this meant is that it was possible for the entry in
Store.mu.replicas to be both "initialized" and have an empty
ReplicaState. This is now more likely to bite us given the migrations
infrastructure attempts to purge outdated replicas at startup time
(when replicas are being initialized, and we're iterating through extant
replicas in the Store.mu.replicas map).

We believed this was addressed as part of #58378, but that appears not
to be the case. Let's re-introduce this stop-gap while we investigate.

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
craig bot pushed a commit that referenced this issue Feb 10, 2021
60281: sql/pgwire: send placeholder BackendKeyData r=asubiotto,spaskob a=rafiss

fixes #13191 

Some tools expect this message to be returned at connection time and
will not connect without it. CockroachDB does not support pgwire
cancellation, but we can still send a placeholder value here, and
continue ignoring cancellation requests like we already do.

Added a small test to make sure nothing broke.

Release note (sql change): When a connection is established, CockroachDB
will now return a placeholder BackendKeyData message in the response.
This is for compatibility with some tools, but using the BackendKeyData
to cancel a query will still have no effect, just the same as before.



60429: kv: (re-)introduce a stopgap for lack of ReplicaState synchronization r=irfansharif a=irfansharif

See #59194 and #58489 for more details.

In #58489 we observed a scary lack of synchronization around how we set
the ReplicaState for a given replica, and how we mark a replica as
"initialized". What this meant is that it was possible for the entry in
Store.mu.replicas to be both "initialized" and have an empty
ReplicaState. This is now more likely to bite us given the migrations
infrastructure attempts to purge outdated replicas at startup time
(when replicas are being initialized, and we're iterating through extant
replicas in the Store.mu.replicas map).

We believed this was addressed as part of #58378, but that appears not
to be the case. Let's re-introduce this stop-gap while we investigate.

Release note: None

60441: bazel: quash unnecessary dependency on `pkg/util/uuid` from protos r=rickystewart a=rickystewart

This dependency can be replaced with a few `# keep` deps in a few choice
proto targets, which is what we should have done the whole time anyway.
This fixes build failures elsewhere in tree -- for example,
`pkg/util/uuid:uuid_test`, which doesn't play nicely with `rules_go` in
the presence of this dependency.

Fixes #59778.

Release note: None

Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
@irfansharif
Contributor

Going to re-open this issue as a placeholder given we needed to revert the stop-gap in #60429.

@irfansharif irfansharif reopened this Mar 3, 2021
@irfansharif irfansharif self-assigned this Mar 3, 2021
@tbg
Member

tbg commented Mar 16, 2021

Starting to look at this now.

@tbg
Member

tbg commented Mar 16, 2021

I ran the kvserver tests with this patch:

diff --git a/pkg/server/testserver.go b/pkg/server/testserver.go
index 082bf10558..b679fda253 100644
--- a/pkg/server/testserver.go
+++ b/pkg/server/testserver.go
@@ -456,7 +456,28 @@ func (ts *TestServer) NodeDialer() *nodedialer.Dialer {
 // completes.
 func (ts *TestServer) Start() error {
        ctx := context.Background()
-       return ts.Server.Start(ctx)
+       err := ts.Server.Start(ctx)
+       if err != nil {
+               return err
+       }
+       return ts.stopper.RunAsyncTask(ctx, "foo", func(ctx context.Context) {
+               for {
+                       select {
+                       case <-ts.stopper.ShouldQuiesce():
+                               return
+                       default:
+                       }
+                       if err := ts.Server.node.stores.VisitStores(func(s *kvserver.Store) error {
+                               s.VisitReplicas(func(repl *kvserver.Replica) (more bool) {
+                                       _ = repl.Version()
+                                       return true
+                               })
+                               return nil
+                       }); err != nil {
+                               panic(err)
+                       }
+               }
+       })
 }
 
 type dummyProtectedTSProvider struct {

To my dismay, it doesn't seem that easy to reproduce this bug.

@irfansharif irfansharif changed the title from "kvserver: bug in synchronization between snapshot application and lease eval" to "kvserver: purging outdated replicas during migrations run into nil pointers" Mar 31, 2021
irfansharif added a commit to irfansharif/cockroach that referenced this issue Mar 31, 2021
Fixes cockroachdb#58378.
Fixes cockroachdb#62267.

Previously it was possible for us to have replicas in-memory, with
pre-migrated state, even after a migration was finalized. This led to
the kind of badness we were observing in cockroachdb#62267, where it appeared that
a replica was not using the applied state key despite us having migrated
into it (see TruncatedAndRangeAppliedState, introduced in cockroachdb#58088).

---

To see how, consider the following set of events:

- Say r42 starts off on n1, n2, and n3
- n3 flaps and so we place a replica for r42 on n4
- n3's replica, r42/3, is now GC-able, but still un-GC-ed
- We run the applied state migration, first migrating all ranges into it
  and then purging outdated replicas
- Well, we should want to purge r42/3, because it's un-migrated and
  evaluating anything on it (say a lease request) is unsound because
  we've bumped version gates that tell the kvserver to always expect
  post-migration state
- What happens when we try to purge r42/3? Previous to this PR, if it
  didn't have a replica version, we'd skip over it (!)
- Was it possible for r42/3 to not have a replica version? Shouldn't it
  have been accounted for when we migrated all ranges? No, that's precisely
  why the migration infrastructure purges outdated replicas. The migrate
  request only returns once it's applied on all followers; in our example
  that wouldn't include r42/3 since it was no longer one
- The stop-gap in cockroachdb#60429 made it so that we didn't GC r42/3, when we
  should've been doing the opposite. When iterating over a store's
  replicas for purging purposes, an empty replica version is fine and
  expected; we should interpret that as a signal that we're dealing with a
  replica that was obviously never migrated (to even start using replica
  versions in the first place). Because it didn't have a valid replica
  version installed, we can infer that it's soon to be GC-ed (else we
  wouldn't have been able to finalize the applied state + replica
  version migration)
- The conditions above made it possible for us to evaluate requests on
  replicas with migration state out-of-date relative to the store's
  version
- Boom

Release note: None
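
A simplified sketch of the purge decision described in the commit message above; the types and function names below are made up for illustration, while the real logic lives in the change merged via #62838. An empty replica version is treated as "never migrated, soon to be GC-ed" and purged rather than skipped, which is the opposite of what the earlier stop-gap did.

package main

import "fmt"

type replicaVersion struct{ Major, Minor int32 }

func (v replicaVersion) IsEmpty() bool { return v == (replicaVersion{}) }

func (v replicaVersion) Less(o replicaVersion) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	return v.Minor < o.Minor
}

type replica struct {
	rangeID int
	version replicaVersion // zero value if the replica was never migrated
}

// shouldPurge reports whether a replica is outdated relative to the version
// the migration has finalized. An empty version means the replica was never
// migrated at all (a GC-able leftover like r42/3 above), so it must be purged
// rather than skipped.
func shouldPurge(r replica, migrated replicaVersion) bool {
	if r.version.IsEmpty() {
		return true
	}
	return r.version.Less(migrated)
}

func main() {
	migrated := replicaVersion{Major: 21, Minor: 1}
	for _, r := range []replica{
		{rangeID: 42, version: replicaVersion{}},                    // un-migrated leftover: purge
		{rangeID: 43, version: replicaVersion{Major: 20, Minor: 2}}, // outdated: purge
		{rangeID: 44, version: migrated},                            // up to date: keep
	} {
		fmt.Printf("r%d purge=%t\n", r.rangeID, shouldPurge(r, migrated))
	}
}
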
craig bot pushed a commit that referenced this issue Mar 31, 2021
60835: kv: bump timestamp cache to Pushee.MinTimestamp on PUSH_ABORT r=nvanbenschoten a=nvanbenschoten

Fixes #60779.
Fixes #60580.

We were only checking that the batch header timestamp was equal to or
greater than this pushee's min timestamp, so this is as far as we can
bump the timestamp cache.

62832: geo: minor performance improvement for looping over edges r=otan a=andyyang890

This patch slightly improves the performance of many
spatial builtins by storing the number of edges used
in for-loop conditions in a variable.
We discovered this was taking a lot of time when
profiling the point-in-polygon optimization.

Release note: None

62838: kvserver: purge gc-able, unmigrated replicas during migrations r=irfansharif a=irfansharif

Fixes #58378.
Fixes #62267.

Previously it was possible for us to have replicas in-memory, with
pre-migrated state, even after a migration was finalized. This led to
the kind of badness we were observing in #62267, where it appeared that
a replica was not using the applied state key despite us having migrated
into it (see TruncatedAndRangeAppliedState, introduced in #58088).

---

To see how, consider the following set of events:

- Say r42 starts off on n1, n2, and n3
- n3 flaps and so we place a replica for r42 on n4
- n3's replica, r42/3, is now GC-able, but still un-GC-ed
- We run the applied state migration, first migrating all ranges into it
  and then purging outdated replicas
- Well, we should want to purge r42/3, because it's un-migrated and
  evaluating anything on it (say a lease request) is unsound because
  we've bumped version gates that tell the kvserver to always expect
  post-migration state
- What happens when we try to purge r42/3? Previous to this PR, if it
  didn't have a replica version, we'd skip over it (!)
- Was it possible for r42/3 to not have a replica version? Shouldn't it
  have been accounted for when we migrated all ranges? No, that's precisely
  why the migration infrastructure purges outdated replicas. The migrate
  request only returns once it's applied on all followers; in our example
  that wouldn't include r42/3 since it was no longer one
- The stop-gap in #60429 made it so that we didn't GC r42/3, when we
  should've been doing the opposite. When iterating over a store's
  replicas for purging purposes, an empty replica version is fine and
  expected; we should interpret that as a signal that we're dealing with a
  replica that was obviously never migrated (to even start using replica
  versions in the first place). Because it didn't have a valid replica
  version installed, we can infer that it's soon to be GC-ed (else we
  wouldn't have been able to finalize the applied state + replica
  version migration)
- The conditions above made it possible for us to evaluate requests on
  replicas with migration state out-of-date relative to the store's
  version
- Boom

Release note: None


62839: zonepb: make subzone DiffWithZone more accurate r=ajstorm a=otan

* Subzones may be defined in a different order. We did not take this
  into account, which can cause bugs when e.g. ADD REGION adds a subzone
  at the end rather than in the old "expected" location in the subzones
  array. This has been fixed by comparing subzones using an unordered
  map.
* The ApplyZoneConfig we previously did overwrote subzone fields on the
  original subzone array element, meaning that if there was a mismatch
  it would not be reported through validation. This is now fixed by
  applying the expected zone config to *zonepb.NewZoneConfig() instead.
* Added logic to only check zone config matches for subzones from
  active subzone IDs.
* Improve the error messaging when a subzone config mismatches -
  namely, add index and partitioning information and differentiate
  between missing fields and missing / extraneous zone configs.

Resolves #62790

Release note (bug fix): Fixed validation bugs during ALTER TABLE ... SET
LOCALITY / crdb_internal.validate_multi_region_zone_config where
validation errors could occur when the database of a REGIONAL BY ROW
table has a new region added. Also fixed a validation bug where partition
zone config mismatches were not caught.

62872: build: use -json for RandomSyntax test r=otan a=rafiss

I'm hoping this will help out with an issue where the test failures seem
to be missing helpful logs.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Andy Yang <ayang@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
@craig craig bot closed this as completed in 1416a4e Apr 1, 2021