Currently, in-memory Replica objects can end up having a replicaID zero.
Roughly speaking, this is always the case when a Replica's range
descriptor does not contain the Replica's store, though sometimes we do
have a replicaID taken from incoming Raft messages (which then won't
survive across a restart).
We end up in this unnatural state mostly due to preemptive snapshots,
which are a snapshot of the Range state before adding a certain replica,
sent to the store that will house that replica once the configuration
change to add it has completed. The range descriptor in the snapshot
cannot yet assign the Replica a proper replicaID because none has been
allocated yet (and this allocation has to be done in the replica change
transaction, which hasn't started yet).
Even when the configuration change completes and the leader starts
"catching up" the preemptive snapshot and informs it of the replicaID,
it will take a few moments until the Replica catches up to the log entry
that actually updates the descriptor. If the node reboots before that
log entry is durably applied, the replicaID will "restart" at zero until
the leader contacts the Replica again.
This suggests that preemptive snapshots introduce fundamental complexity
which we'd like to avoid - as long as we use preemptive snapshots there
will not be sanity in this department.
This PR introduces a mechanism which delays the application of
preemptive snapshots so that we apply them only when the first request
*after* the completed configuration change comes in (at which point a
replicaID is present).
Superficially, this seems to solve the above problem (since the Replica
will only be instantiated the moment a replicaID is known), though it
doesn't do so across restarts.
However, if we synchronously persisted (not done in this PR) the
replicaID from incoming Raft messages whenever it changed, it seems that
we should always be able to assign a replicaID when creating a Replica,
even when dealing with descriptors that don't contain the replica itself
(since there would've been a Raft message with a replicaID at some
point, and we persist that). This roughly corresponds to persisting
`Replica.lastToReplica`.
We ultimately want to switch to learner replicas instead of preemptive
snapshots. Learner replicas have the advantage that they are always
represented in the replica descriptor, and so the snapshot that
initializes them will be a proper Raft snapshot containing a descriptor
containing the learner Replica itself. However, it's clear that we need
to continue supporting preemptive snapshots in 19.2 due to the need to
support mixed 19.1/19.2 clusters.
This PR in conjunction with persisting the replicaID (and auxiliary
work, for example on the split lock which currently also creates a
replica with replicaID zero and which we know [has bugs]) should allow
us to remove replicaID zero from the code base without waiting out the
19.1 release cycle.
[has bugs]: cockroachdb#21146
Release note: None