storage: kv.atomic_replication_changes=true #40464
Conversation
This is exciting!
Hitting "descriptor changed" errors again in mixed-headroom, which means another migration concern:
I found two callers that didn't check the cluster version; it seems likely that they're used during IMPORT and caused this problem. Added a commit to address this. I saw this in 2 out of 10 runs, so it's going to take a bit of time to confirm it's gone, but I'm pretty sure this is it.
Force-pushed from 901e38d to b6612c7.
5/5 headrooms and 4/5 mixed-headrooms passed. The one failure is the above, and is not related to this PR. Going to kick off another 10 mixed-headrooms with the presumed fix for the split error.
@nvanbenschoten could you give me a review here? This PR was trivial but now it has a fix for the botched sticky migration I discovered while testing, and that patch won't apply cleanly on master.
PS don't want to withhold the satisfying mixed-headroom results:
[screenshot of mixed-headroom results omitted]
Reviewed 2 of 2 files at r1, 2 of 3 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/storage/replica_command.go, line 470 at r2 (raw file):
```go
newDesc := *desc
newDesc.StickyBit = nil // can use &zero in 20.1
```
Is it possible to get here before all nodes update to 19.2, given the `if (desc.GetStickyBit() == hlc.Timestamp{})` check?
pkg/storage/bulk/sst_batcher.go, line 250 at r2 (raw file):
```go
	log.Warning(ctx, err)
} else {
	// NB: Passing 'hour' here is technically illegal until 19.2 is
```
Does all of the other complexity in this commit go away if we actually do this migration correctly and keep the use of `AdminSplitRequest.ExpirationTime` as the single migration boundary? It seems pretty easy to plumb a cluster setting into `SSTBatcher` and add logic like we have in `restore.go -> splitAndScatter`.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/replica_command.go, line 470 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is it possible to get here before all nodes update to 19.2, given the `if (desc.GetStickyBit() == hlc.Timestamp{})` check?
You're right, the early return prevents this. I added a comment instead of using `&zero` here because both are equivalent and `nil` is more obviously safe.
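For readers following along, here is a minimal sketch of the invariant being discussed, using the 19.2-era `roachpb`/`hlc` types quoted above; this is not the actual `replica_command.go` code:

```go
package storage

import (
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/util/hlc"
)

// maybeClearStickyBit is a sketch, not the code under review: the legacy
// split path is only reachable when the descriptor carries no sticky bit,
// so leaving StickyBit nil keeps the descriptor identical to what 19.1
// nodes expect to CPut against.
func maybeClearStickyBit(desc *roachpb.RangeDescriptor) roachpb.RangeDescriptor {
	newDesc := *desc
	if (desc.GetStickyBit() == hlc.Timestamp{}) {
		newDesc.StickyBit = nil // can use &zero in 20.1
	}
	return newDesc
}
```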
pkg/storage/bulk/sst_batcher.go, line 250 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Does all of the other complexity in this commit go away if we actually do this migration correctly and keep the use of `AdminSplitRequest.ExpirationTime` as the single migration boundary? It seems pretty easy to plumb a cluster setting into `SSTBatcher` and add logic like we have in `restore.go -> splitAndScatter`.
`ExpirationTime` is not encapsulated at all. We have 10 calls to `settings.Version.IsActive(cluster.VersionStickyBit)` before this PR, all at various callers to `AdminSplit`. The offending calls were added after the sticky bit was introduced (and don't handle the case in which the sticky bit is not available), simply because it's so hard to tell that there is a migration to worry about.
So yes, if nobody ever issued an incorrect `AdminSplit`, there wouldn't be a problem (except that I think something was still sometimes using `&zero` here instead of `nil`).
I can add the cluster version check here if it tickles your OCD that I omitted it, but I'd rather not make any changes on the storage end of things, because a) I don't want to rely on the callers, as outlined before, and b) the moment I make a substantial change here I'll have to spend hours re-validating the results, and I don't think that's worth it.
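The caller-side pattern under discussion, sketched as a hypothetical helper; the function name is made up and the `(*client.DB).AdminSplit` signature shown is an assumption about the 19.2-era API, not code from this PR:

```go
package bulk

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/internal/client"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/settings/cluster"
	"github.com/cockroachdb/cockroach/pkg/util/hlc"
)

// splitWithOptionalExpiration shows the gate every AdminSplit caller had to
// remember before this PR -- which is exactly what the two offending callers
// forgot.
func splitWithOptionalExpiration(
	ctx context.Context, db *client.DB, st *cluster.Settings, key roachpb.Key, expire hlc.Timestamp,
) error {
	if !st.Version.IsActive(cluster.VersionStickyBit) {
		// A 19.1 node may still rewrite the descriptor with a CPut; a non-nil
		// sticky bit written now would make its expected value mismatch.
		expire = hlc.Timestamp{}
	}
	return db.AdminSplit(ctx, key, key, expire)
}
```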
Force-pushed from b6612c7 to 78e4c88.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/replica_command.go, line 74 at r4 (raw file):
```go
} else if err := detail.ActualValue.GetProto(&actualDesc); err == nil &&
	desc.RangeID == actualDesc.RangeID && !desc.Equal(actualDesc) {
	return fmt.Sprintf("descriptor changed: [expected] %+#v != [actual] %+#v", desc, &actualDesc), true
```
This is unintentional debugging detritus. Reminder to self to remove it.
LGTM if you don't think it's worth cleaning up the migration further.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/storage/replica_command.go, line 246 at r4 (raw file):
```go
}

func splitTxnStickyUpdateAttempt(
```
I think this was assuming that it could always set `StickyBit` to a non-nil value because it was only called in the `if desc.GetStickyBit().Less(args.ExpirationTime)` case, which was only possible if the correct cluster version was enabled. That should have been documented, but that should still be the case.
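A sketch of that invariant written out; the function body is illustrative only and not the real `splitTxnStickyUpdateAttempt`:

```go
package storage

import (
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/util/hlc"
)

// attemptStickyUpdate spells out the invariant: a non-empty expiration only
// gets this far if the coordinator already verified VersionStickyBit, so
// writing a non-nil StickyBit here is safe.
func attemptStickyUpdate(desc *roachpb.RangeDescriptor, expiration hlc.Timestamp) *roachpb.RangeDescriptor {
	if !desc.GetStickyBit().Less(expiration) {
		// Includes the pre-19.2 case, where expiration is always empty.
		return desc
	}
	newDesc := *desc
	newDesc.StickyBit = &expiration
	return &newDesc
}
```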
pkg/storage/bulk/sst_batcher.go, line 250 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
`ExpirationTime` is not encapsulated at all. We have 10 calls to `settings.Version.IsActive(cluster.VersionStickyBit)` before this PR, all at various callers to `AdminSplit`. The offending calls were added after the sticky bit was introduced (and don't handle the case in which the sticky bit is not available), simply because it's so hard to tell that there is a migration to worry about. So yes, if nobody ever issued an incorrect `AdminSplit`, there wouldn't be a problem (except that I think something was still sometimes using `&zero` here instead of `nil`).
I can add the cluster version check here if it tickles your OCD that I omitted it, but I'd rather not make any changes on the storage end of things, because a) I don't want to rely on the callers, as outlined before, and b) the moment I make a substantial change here I'll have to spend hours re-validating the results, and I don't think that's worth it.
If you feel comfortable with the degree of validation that this setup for the migration has gotten, then I am. It just seemed easier to fix the botched migration with `SplitAndScatter` than to force the handling of `AdminSplit` to worry about non-empty `ExpirationTime` values.
Force-pushed from 78e4c88 to 8c910ec.
bors r=nvanbenschoten
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
Flake was a DML txn retry in a logic test: #40542 (comment). bors r=nvanbenschoten
Build failed (retrying...)
Bors, you liar. bors r=nvanbenschoten
I wonder if this PR crashes bors every time? bors r=nvanbenschoten
bors r+
Let's try this again. bors r=nvanbenschoten
40464: storage: kv.atomic_replication_changes=true r=nvanbenschoten a=tbg
I ran the experiments in #40370 (comment) on (essentially) this branch and everything passed. Going to run another five instances of mixed-headroom and headroom with this change to shake out anything else that I might've missed.
Release note (general change): atomic replication changes are now enabled by default.
Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Before 19.2, callers to AdminSplit are only ever supposed to pass `Expiration==hlc.Timestamp{}`, as this triggers the legacy code path necessary while nodes in the cluster may not know desc.StickyBit yet. To avoid CPut problems when those nodes update the range descriptor, we must not persist non-nil values on it in that case. It looked like we would sometimes still persist a &zero, which could cause problems.

The bigger problem, though, was that there were also two callers that straight-up didn't check the cluster version and passed nonzero values into AdminSplit. These callers were added recently and I can't blame anyone there; it is impossible to know that one argument needs to be zero before 19.2.

Instead of trying to fix this invariant (which wasn't trivial in this case), just ignore expiration times when the coordinator doesn't think 19.2 is already active. This could lead to sticky bits being ignored right around the cluster transition, but that seems much better than risking old nodes not being able to carry out any changes to the descriptors any more (which is the consequence of writing non-nil sticky bits before this is safe).

Release note (bug fix): Fix a cluster migration bug that could occur while running in a mixed 19.1/19.2 cluster. The symptom would be messages of the form:

```
X at key /Table/54 failed: unexpected value: ...
```

Affected clusters should be updated to 19.2 or, if 19.1 is desired, recreated from a backup.
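A minimal sketch of the "ignore expiration times when 19.2 isn't active" gate described above; the helper name is hypothetical and the real change lives in the AdminSplit path:

```go
package storage

import (
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/settings/cluster"
	"github.com/cockroachdb/cockroach/pkg/util/hlc"
)

// clampSplitExpiration drops the expiration when this node doesn't yet
// consider 19.2 active, so the split takes the legacy path and never
// persists a non-nil sticky bit that 19.1 nodes can't CPut around.
func clampSplitExpiration(st *cluster.Settings, args *roachpb.AdminSplitRequest) {
	if !st.Version.IsActive(cluster.VersionStickyBit) {
		args.ExpirationTime = hlc.Timestamp{}
	}
}
```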
Release note (general change): atomic replication changes are now enabled by default.
Force-pushed from 8c910ec to b93a4ee.
Rebased on master and pushed again. Let's see if that helps. bors r=nvanbenschoten
Build succeeded
I ran the experiments in #40370 (comment) on (essentially) this branch and everything passed. Going to run another five instances of mixed-headroom and headroom with this change to shake out anything else that I might've missed.

Release note (general change): atomic replication changes are now enabled by default.
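For context, the headline change in this PR amounts to flipping a boolean cluster setting's default. A minimal sketch using the `pkg/settings` API; the setting key, description, and variable name are assumptions, not the actual diff:

```go
package storage

import "github.com/cockroachdb/cockroach/pkg/settings"

// useAtomicReplicationChanges sketches the kind of setting this PR enables
// by default; key and description are illustrative.
var useAtomicReplicationChanges = settings.RegisterBoolSetting(
	"kv.atomic_replication_changes.enabled",
	"use atomic replication changes",
	true, // the PR changes this default from false to true
)
```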