storage: test atomic replication changes #40234
Release note: None
This will be used to test the behavior of and interactions with joint configurations. Release note: None
Release note: None
Release note: None
Release note: None
Release note: None
In adapting this test I also found a fat bug, which was that removing a learner while trying to make it a joint change would turn the learner into an outgoing voter. This is now fixed (and exercised by the test). Release note: None
Release note: None
Release note: None
It simply transitions them out before proceeding. Release note: None
Release note: None
Release note: None
Reviewed 1 of 1 files at r1, 3 of 3 files at r2, 3 of 3 files at r3, 1 of 1 files at r4, 1 of 1 files at r5, 1 of 1 files at r6, 5 of 5 files at r7, 2 of 2 files at r8, 1 of 1 files at r9, 4 of 4 files at r10, 1 of 1 files at r11, 2 of 2 files at r12.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @tbg)
pkg/roachpb/metadata_replicas.go, line 214 at r7 (raw file):
// returns true. The memory returned may be shared with the receiver.
func (d ReplicaDescriptors) Filter(
	pred func(descriptor ReplicaDescriptor) bool,

nit: if you don't name the `descriptor` arg, does this avoid the wrapping?
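For illustration, a minimal sketch of the unnamed-parameter form the nit suggests. Go allows parameter names to be omitted in a function type, which shortens the signature and may let it fit on one line. The `ReplicaDescriptor` and `ReplicaDescriptors` types here are simplified, hypothetical stand-ins, not the real `roachpb` definitions, and the body of `Filter` is an assumption.

```go
package main

import "fmt"

// ReplicaDescriptor is a simplified, hypothetical stand-in for the roachpb type.
type ReplicaDescriptor struct {
	ReplicaID int
	Learner   bool
}

// ReplicaDescriptors is a hypothetical slice-backed stand-in for the roachpb type.
type ReplicaDescriptors []ReplicaDescriptor

// Filter returns the descriptors for which pred returns true.
// The parameter of pred is left unnamed, which keeps the signature short.
func (d ReplicaDescriptors) Filter(pred func(ReplicaDescriptor) bool) ReplicaDescriptors {
	var out ReplicaDescriptors
	for _, r := range d {
		if pred(r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	descs := ReplicaDescriptors{{ReplicaID: 1}, {ReplicaID: 2, Learner: true}}
	learners := descs.Filter(func(r ReplicaDescriptor) bool { return r.Learner })
	fmt.Println(len(learners), learners[0].ReplicaID)
}
```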
pkg/storage/merge_queue.go, line 294 at r10 (raw file):
}
rhsDesc, err = maybeLeaveAtomicChangeReplicas(ctx, lhsRepl.store, rhsDesc)

It shouldn't matter, but `rhsRepl.store` and `rhsRepl.store.DB()` below would leave less room for confusion.
pkg/storage/replica_command.go, line 1354 at r7 (raw file):
if typ := rDesc.GetType(); !useJoint || typ != roachpb.VOTER_FULL {
	// NB: typ != VOTER_FULL means it's a LEARNER since we verified above that we
	// did not start in a joint config.

What's the importance of this comment? That if `useJoint == true`, `typ` must be a LEARNER because nothing else should be removed directly in a joint config?
pkg/storage/replica_learner_test.go, line 630 at r6 (raw file):
require.True(t, testutils.IsError(err, exp), err)
// NB: we don't have to transition out of the joint config first because
"the previous joint config"
pkg/storage/replica_learner_test.go, line 721 at r7 (raw file):
// Removing the voter (and remaining in joint config) does.
Stray line.
pkg/storage/replica_learner_test.go, line 834 at r8 (raw file):
checkFails()
// Add a VOTER_INCOMING to desc2 to make sure it actually exludes this type
It would be helpful to spell out the expected descriptors at each stage. It took me a while to understand why this was doing what it does.
pkg/storage/replica_learner_test.go, line 837 at r8 (raw file):
// of replicas from merges (rather than really just checking whether the
// replica sets are equal).
Stray line?
pkg/storage/replica_learner_test.go, line 55 at r10 (raw file):
}

func (rtl *replicationTestKnobs) withLearnerStop(f func()) {

I'd keep these named something like `withStopAfterLearnerAtomic` and `withStopAfterJointConfig` so that they are clearly associated with their corresponding knobs. That also reads better.
pkg/storage/replica_learner_test.go, line 56 at r10 (raw file):
func (rtl *replicationTestKnobs) withLearnerStop(f func()) {
	prev := atomic.LoadInt64(&rtl.replicaAddStopAfterLearnerAtomic)

nit: `atomic.SwapInt64` would clean this up a bit.
And below.
pkg/storage/replica_learner_test.go, line 940 at r10 (raw file):
} {
Give this entire block a comment.
pkg/storage/replica_learner_test.go, line 968 at r10 (raw file):
Should we add:
require.False(t, desc.Replicas().InAtomicReplicationChange(), desc)
pkg/storage/replicate_queue.go, line 351 at r12 (raw file):
case AllocatorFinalizeAtomicReplicationChange:
	_, err := maybeLeaveAtomicChangeReplicas(ctx, repl.store, repl.Desc())
	return true, err
Explain why we requeue.
pkg/storage/testing_knobs.go, line 202 at r2 (raw file):
ReplicaAddStopAfterLearnerSnapshot func() bool
// ReplicaAddStopAfterJointConfig causes replica addition to return early if
// the func returns true. This happens before transitioning out of a joint
Mention what this comes after.
pkg/storage/testing_knobs.go, line 206 at r3 (raw file):
ReplicaAddStopAfterJointConfig func() bool
// ReplicationAlwaysUseJointConfig causes replica addition to always go
// through a joint configuration, even when this isn't necessary.
Mention why this isn't already the case. In other words, when isn't this necessary.
pkg/storage/batcheval/cmd_lease.go, line 51 at r5 (raw file):
	`could not find replica for store %s in %s`, rec.StoreID(), rec.Desc())
} else if t := repDesc.GetType(); t != roachpb.VOTER_FULL {
	// NB: there's no harm in transferring the lease to a VOTER_INCOMING,
Because we don't allow the leaseholder of a range to be removed, there will always be at least one VOTER_FULL while in joint consensus, right? So we'll never get stuck in a situation where no-one can become the leaseholder? If so, that would be worth adding to this comment.
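To make the question concrete, here is a hedged sketch of the invariant being asked about: a lease transfer is rejected unless the target is a `VOTER_FULL`, so the range stays leasable only while at least one `VOTER_FULL` exists. The `ReplicaType` enum, its ordering, the `VOTER_OUTGOING` name, and both helper functions are hypothetical and simplified; they are not the `roachpb` or `batcheval` code.

```go
package main

import "fmt"

// ReplicaType is a hypothetical, simplified stand-in for the roachpb enum.
type ReplicaType int

const (
	VOTER_FULL ReplicaType = iota
	VOTER_INCOMING
	VOTER_OUTGOING
	LEARNER
)

// checkCanReceiveLease mirrors the shape of the check quoted above:
// anything other than a VOTER_FULL is rejected as a lease target.
func checkCanReceiveLease(t ReplicaType) error {
	if t != VOTER_FULL {
		return fmt.Errorf("replica of type %d cannot receive the lease", t)
	}
	return nil
}

// hasLeaseTarget reports whether at least one replica could take the lease.
func hasLeaseTarget(types []ReplicaType) bool {
	for _, t := range types {
		if checkCanReceiveLease(t) == nil {
			return true
		}
	}
	return false
}

func main() {
	joint := []ReplicaType{VOTER_FULL, VOTER_INCOMING, VOTER_OUTGOING}
	fmt.Println(hasLeaseTarget(joint))
}
```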
Addressed all commits via fixups, RFAL
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/merge_queue.go, line 294 at r10 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It shouldn't matter, but `rhsRepl.store` and `rhsRepl.store.DB()` below would leave less room for confusion.

There's no `rhsRepl`, but I made it more symmetric-looking by declaring temporaries for `store` and `db`.
pkg/storage/replica_command.go, line 1354 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What's the importance of this comment? That if `useJoint == true`, `typ` must be a LEARNER because nothing else should be removed directly in a joint config?

Yeah. I made this less confusing.
Reviewed 1 of 1 files at r13, 1 of 1 files at r14, 1 of 1 files at r15, 1 of 1 files at r16, 1 of 1 files at r17, 1 of 1 files at r18, 1 of 1 files at r19, 1 of 1 files at r20, 1 of 1 files at r21, 1 of 1 files at r22, 1 of 1 files at r23, 1 of 1 files at r24, 1 of 1 files at r25, 2 of 2 files at r26, 1 of 1 files at r27.
Reviewable status: complete! 1 of 0 LGTMs obtained
Thanks @nvanbenschoten!

bors r=nvanbenschoten

40234: storage: test atomic replication changes r=nvanbenschoten a=tbg

This PR adds a number of tests that focus on the interaction between the various queues and joint configurations.

We don't flip the switch yet since adding/removing only learners does not work yet via the joint path. This isn't something we need per se, but it's an annoying restriction to keep in mind and work around if it does happen. Tracked in #12768 (comment).

40267: tree: make int::regtype::text O(1) r=jordanlewis a=jordanlewis

Previously, casting an integer to a regtype and then text (which turns a type OID into the string of its corresponding type) would run a select over pg_type to figure out the answer, which under the hood materializes all of the types into a table and filters, an O(n) operation.

This is silly because we already have a static lookup table for this info. Use it.

This commonly shows up in visualization tools as O(n^2), since people tend to run one of these casts once per type. So this improves metadata query performance significantly.

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>

Build succeeded
This PR adds a number of tests that focus on the interaction between the various
queues and joint configurations.
We don't flip the switch yet since adding/removing only learners does not work
yet via the joint path. This isn't something we need per se, but it's an
annoying restriction to keep in mind and work around if it does happen. Tracked
in #12768 (comment).