Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: test atomic replication changes #40234

Merged
merged 27 commits into from
Aug 28, 2019
Merged

Conversation

tbg
Copy link
Member

@tbg tbg commented Aug 27, 2019

This PR adds a number of tests that focus on the interaction between the various
queues and joint configurations.

We don't flip the switch yet since adding/removing only learners does not work
yet via the joint path. This isn't something we need per se, but it's an
annoying restriction to keep in mind and work around if it does happen. Tracked
in
#12768 (comment).

tbg added 12 commits August 27, 2019 13:08
Release note: None
This will be used to test the behavior of and interactions with joint
configurations.

Release note: None
In adapting this test I also found a fat bug, which was that removing
a learner while trying to make it a joint change would turn the learner
into an outgoing voter. This is now fixed (and exercised by the test).

Release note: None
It simply transitions them out before proceeding.

Release note: None
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, 3 of 3 files at r2, 3 of 3 files at r3, 1 of 1 files at r4, 1 of 1 files at r5, 1 of 1 files at r6, 5 of 5 files at r7, 2 of 2 files at r8, 1 of 1 files at r9, 4 of 4 files at r10, 1 of 1 files at r11, 2 of 2 files at r12.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/roachpb/metadata_replicas.go, line 214 at r7 (raw file):

// returns true. The memory returned may be shared with the receiver.
func (d ReplicaDescriptors) Filter(
	pred func(descriptor ReplicaDescriptor) bool,

nit: if you don't name the descriptor arg, does this avoid the wrapping?


pkg/storage/merge_queue.go, line 294 at r10 (raw file):

		}

		rhsDesc, err = maybeLeaveAtomicChangeReplicas(ctx, lhsRepl.store, rhsDesc)

It shouldn't matter, but rhsRepl.store and rhsRepl.store.DB() below would leave less room for confusion.


pkg/storage/replica_command.go, line 1354 at r7 (raw file):

				if typ := rDesc.GetType(); !useJoint || typ != roachpb.VOTER_FULL {
					// NB: typ != VOTER_FULL means it's a LEARNER since we verified above that we
					// did not start in a joint config.

What's the importance of this comment? That if useJoint == true, typ must be a LEARNER because nothing else should be removed directly in a joint config?


pkg/storage/replica_learner_test.go, line 630 at r6 (raw file):

	require.True(t, testutils.IsError(err, exp), err)

	// NB: we don't have to transition out of the joint config first because

"the previous joint config"


pkg/storage/replica_learner_test.go, line 721 at r7 (raw file):

	// Removing the voter (and remaining in joint config) does.

Stray line.


pkg/storage/replica_learner_test.go, line 834 at r8 (raw file):

	checkFails()

	// Add a VOTER_INCOMING to desc2 to make sure it actually exludes this type

It would be helpful to spell out the expected descriptors at each stage. It took me a while to understand why this was doing what it does.


pkg/storage/replica_learner_test.go, line 837 at r8 (raw file):

	// of replicas from merges (rather than really just checking whether the
	// replica sets are equal).

Stray line?


pkg/storage/replica_learner_test.go, line 55 at r10 (raw file):

}

func (rtl *replicationTestKnobs) withLearnerStop(f func()) {

I'd keep these named something like withStopAfterLearnerAtomic and withStopAfterJointConfig so that they are clearly associated with their corresponding knobs. That also reads better.


pkg/storage/replica_learner_test.go, line 56 at r10 (raw file):

func (rtl *replicationTestKnobs) withLearnerStop(f func()) {
	prev := atomic.LoadInt64(&rtl.replicaAddStopAfterLearnerAtomic)

nit: atomic.SwapInt64 would clean this up a bit.

And below.


pkg/storage/replica_learner_test.go, line 940 at r10 (raw file):

	}

	{

Give this entire block a comment.


pkg/storage/replica_learner_test.go, line 968 at r10 (raw file):
Should we add:

require.False(t, desc.Replicas().InAtomicReplicationChange(), desc)

pkg/storage/replicate_queue.go, line 351 at r12 (raw file):

	case AllocatorFinalizeAtomicReplicationChange:
		_, err := maybeLeaveAtomicChangeReplicas(ctx, repl.store, repl.Desc())
		return true, err

Explain why we requeue.


pkg/storage/testing_knobs.go, line 202 at r2 (raw file):

	ReplicaAddStopAfterLearnerSnapshot func() bool
	// ReplicaAddStopAfterJointConfig causes replica addition to return early if
	// the func returns true. This happens before transitioning out of a joint

Mention what this comes after.


pkg/storage/testing_knobs.go, line 206 at r3 (raw file):

	ReplicaAddStopAfterJointConfig func() bool
	// ReplicationAlwaysUseJointConfig causes replica addition to always go
	// through a joint configuration, even when this isn't necessary.

Mention why this isn't already the case. In other words, when isn't this necessary.


pkg/storage/batcheval/cmd_lease.go, line 51 at r5 (raw file):

			`could not find replica for store %s in %s`, rec.StoreID(), rec.Desc())
	} else if t := repDesc.GetType(); t != roachpb.VOTER_FULL {
		// NB: there's no harm in transferring the lease to a VOTER_INCOMING,

Because we don't allow the leaseholder of a range to be removed, there will always be at least one VOTER_FULL while in joint consensus, right? So we'll never get stuck in a situation where no-one can become the leaseholder? If so, that would be worth adding to this comment.

Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed all commits via fixups, RFAL

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/merge_queue.go, line 294 at r10 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It shouldn't matter, but rhsRepl.store and rhsRepl.store.DB() below would leave less room for confusion.

There's no rhsRepl, but I made it more symmetric-looking by declaring temporaries for store and db.


pkg/storage/replica_command.go, line 1354 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What's the importance of this comment? That if useJoint == true, typ must be a LEARNER because nothing else should be removed directly in a joint config?

yeah. I made this less confusing.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r13, 1 of 1 files at r14, 1 of 1 files at r15, 1 of 1 files at r16, 1 of 1 files at r17, 1 of 1 files at r18, 1 of 1 files at r19, 1 of 1 files at r20, 1 of 1 files at r21, 1 of 1 files at r22, 1 of 1 files at r23, 1 of 1 files at r24, 1 of 1 files at r25, 2 of 2 files at r26, 1 of 1 files at r27.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@tbg
Copy link
Member Author

tbg commented Aug 28, 2019

Thanks @nvanbenschoten!

bors r=nvanbenschoten

craig bot pushed a commit that referenced this pull request Aug 28, 2019
40234: storage: test atomic replication changes r=nvanbenschoten a=tbg

This PR adds a number of tests that focus on the interaction between the various
queues and joint configurations.

We don't flip the switch yet since adding/removing only learners does not work
yet via the joint path. This isn't something we need per se, but it's an
annoying restriction to keep in mind and work around if it does happen. Tracked
in
#12768 (comment).

40267: tree: make int::regtype::text O(1) r=jordanlewis a=jordanlewis

Previously, casting an integer to a regtype and then text (which turns a
type OID into the string of its corresponding type) would run a select
over pg_type to figure out the answer, which under the hood
materializes all of the types into a table and filters, an O(n)
operation. This is silly because we already have a static lookup table
for this info. Use it.

This commonly shows up in visualization tools as O(n^2), since people
tend to run one of these casts once per type. So this improves metadata
query performance significantly.

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
@craig
Copy link
Contributor

craig bot commented Aug 28, 2019

Build succeeded

@craig craig bot merged commit 4c3d5a5 into cockroachdb:master Aug 28, 2019
@tbg tbg deleted the atomic/default-on branch August 28, 2019 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants