-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: add regression test for #21146 #39694
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks for working this out.
Reviewed 4 of 4 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)
pkg/storage/client_split_test.go, line 3220 at r1 (raw file):
defer tc.Stopper().Stop(ctx) db := sqlutils.MakeSQLRunner(tc.ServerConn(0)) db.Exec(t, `SET CLUSTER SETTING kv.learner_replicas.enabled = true`)
Is this needed? They're always on now?
pkg/storage/client_split_test.go, line 3228 at r1 (raw file):
// an orphaned learner on each side of the split. After the learner is // created, but before the split, block all incoming raft traffic to the // learner on the lhs of the split.
On the second node (index 1).
pkg/storage/client_split_test.go, line 3261 at r1 (raw file):
// Wait for there to be an in-memory, uninitialized learner replica with the // lastest ReplicaID. Note: it cannot become initialized at this point because
latest
pkg/storage/client_split_test.go, line 3273 at r1 (raw file):
return err } if replicaID := roachpb.ReplicaID(repl.RaftStatus().ID); replicaID != learnerDesc.ReplicaID {
It looks like you could make this more transparent if you maintained replicaID in the for i := 0; ...
loop above. Then you know what replicaID you're looking for already and all you have to do is look up learnerDesc
and confirm. Right now I'm scrutinizing whether you're picking the replicaID from an authoritative source, etc.
pkg/storage/client_split_test.go, line 3280 at r1 (raw file):
// Re-enable raft and wait for the lhs to catch up to the post-split // descriptor. This used to panic with "raft group deleted".
You verified this with the workaround reverted, right?
This verifies the behavior of when the application of some split command (part of the lhs's log) is delayed on some store and meanwhile the rhs has rebalanced away and back, ending up with a larger ReplicaID than the split thinks it will have. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r=tbg
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @tbg)
pkg/storage/client_split_test.go, line 3220 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Is this needed? They're always on now?
It's not but I'm being defensive against that default being switched back. We can remove them all when we remove the cluster setting
pkg/storage/client_split_test.go, line 3228 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
On the second node (index 1).
Done.
pkg/storage/client_split_test.go, line 3261 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
latest
Done.
pkg/storage/client_split_test.go, line 3273 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
It looks like you could make this more transparent if you maintained replicaID in the
for i := 0; ...
loop above. Then you know what replicaID you're looking for already and all you have to do is look uplearnerDesc
and confirm. Right now I'm scrutinizing whether you're picking the replicaID from an authoritative source, etc.
I can't get it from AddReplicas because we're getting an error and the returned descriptor is wonky with the ReplicaAddSkipLearnerRollback hook. Added a comment
pkg/storage/client_split_test.go, line 3280 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
You verified this with the workaround reverted, right?
Yup! And if i only revert your second fix, i get the "replicaID cannot move backwards from 7 to 2" error
39694: storage: add regression test for #21146 r=tbg a=danhhz This verifies the behavior of when the application of some split command (part of the lhs's log) is delayed on some store and meanwhile the rhs has rebalanced away and back, ending up with a larger ReplicaID than the split thinks it will have. Release note: None Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
Build succeeded |
This verifies the behavior of when the application of some split command
(part of the lhs's log) is delayed on some store and meanwhile the rhs
has rebalanced away and back, ending up with a larger ReplicaID than the
split thinks it will have.
Release note: None