Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: add regression test for #21146 #39694

Merged
merged 1 commit into from
Aug 16, 2019
Merged

Conversation

danhhz
Copy link
Contributor

@danhhz danhhz commented Aug 15, 2019

This verifies the behavior of when the application of some split command
(part of the lhs's log) is delayed on some store and meanwhile the rhs
has rebalanced away and back, ending up with a larger ReplicaID than the
split thinks it will have.

Release note: None

@danhhz danhhz requested a review from tbg August 15, 2019 21:15
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for working this out.

Reviewed 4 of 4 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)


pkg/storage/client_split_test.go, line 3220 at r1 (raw file):

	defer tc.Stopper().Stop(ctx)
	db := sqlutils.MakeSQLRunner(tc.ServerConn(0))
	db.Exec(t, `SET CLUSTER SETTING kv.learner_replicas.enabled = true`)

Is this needed? They're always on now?


pkg/storage/client_split_test.go, line 3228 at r1 (raw file):

	// an orphaned learner on each side of the split. After the learner is
	// created, but before the split, block all incoming raft traffic to the
	// learner on the lhs of the split.

On the second node (index 1).


pkg/storage/client_split_test.go, line 3261 at r1 (raw file):

	// Wait for there to be an in-memory, uninitialized learner replica with the
	// lastest ReplicaID. Note: it cannot become initialized at this point because

latest


pkg/storage/client_split_test.go, line 3273 at r1 (raw file):

			return err
		}
		if replicaID := roachpb.ReplicaID(repl.RaftStatus().ID); replicaID != learnerDesc.ReplicaID {

It looks like you could make this more transparent if you maintained replicaID in the for i := 0; ... loop above. Then you know what replicaID you're looking for already and all you have to do is look up learnerDesc and confirm. Right now I'm scrutinizing whether you're picking the replicaID from an authoritative source, etc.


pkg/storage/client_split_test.go, line 3280 at r1 (raw file):

	// Re-enable raft and wait for the lhs to catch up to the post-split
	// descriptor. This used to panic with "raft group deleted".

You verified this with the workaround reverted, right?

This verifies the behavior of when the application of some split command
(part of the lhs's log) is delayed on some store and meanwhile the rhs
has rebalanced away and back, ending up with a larger ReplicaID than the
split thinks it will have.

Release note: None
Copy link
Contributor Author

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r=tbg

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/storage/client_split_test.go, line 3220 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Is this needed? They're always on now?

It's not but I'm being defensive against that default being switched back. We can remove them all when we remove the cluster setting


pkg/storage/client_split_test.go, line 3228 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

On the second node (index 1).

Done.


pkg/storage/client_split_test.go, line 3261 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

latest

Done.


pkg/storage/client_split_test.go, line 3273 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

It looks like you could make this more transparent if you maintained replicaID in the for i := 0; ... loop above. Then you know what replicaID you're looking for already and all you have to do is look up learnerDesc and confirm. Right now I'm scrutinizing whether you're picking the replicaID from an authoritative source, etc.

I can't get it from AddReplicas because we're getting an error and the returned descriptor is wonky with the ReplicaAddSkipLearnerRollback hook. Added a comment


pkg/storage/client_split_test.go, line 3280 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

You verified this with the workaround reverted, right?

Yup! And if i only revert your second fix, i get the "replicaID cannot move backwards from 7 to 2" error

craig bot pushed a commit that referenced this pull request Aug 16, 2019
39694: storage: add regression test for #21146 r=tbg a=danhhz

This verifies the behavior of when the application of some split command
(part of the lhs's log) is delayed on some store and meanwhile the rhs
has rebalanced away and back, ending up with a larger ReplicaID than the
split thinks it will have.

Release note: None

Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
@craig
Copy link
Contributor

craig bot commented Aug 16, 2019

Build succeeded

@craig craig bot merged commit 95f440f into cockroachdb:master Aug 16, 2019
@danhhz danhhz deleted the split_test branch September 3, 2019 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants