
storage: kv.atomic_replication_changes=true #40464

Merged (2 commits) on Sep 9, 2019

Conversation

@tbg (Member) commented Sep 4, 2019

I ran the experiments in

#40370 (comment)

on (essentially) this branch and everything passed.

Going to run another five instances of mixed-headroom and headroom with this
change to shake out anything else that I might've missed.

Release note (general change): atomic replication changes are now enabled by
default.

@cockroach-teamcity (Member)

This change is Reviewable

@awoods187 (Contributor) commented Sep 4, 2019 via email

@tbg (Member, Author) commented Sep 4, 2019

Hitting "descriptor changed" errors again in mixed-headroom, which means another migration concern:

func TestFoo(t *testing.T) {
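	// Raw range descriptor value, presumably captured from the failing mixed-version run; decode and dump it below.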
	s := []byte("_oS\267\003\010\032\022\010\275\211\366\314\212\206\367\262\032\001\302\"\006\010\003\020\003\030\001\"\006\010\002\020\002\030\002\"\006\010\001\020\001\030\004(\0050\003:\n\010\262\202\234\305\351\313\316\340\025")
	v := Value{RawBytes: s}
	var desc RangeDescriptor
	require.NoError(t, v.GetProto(&desc))
	t.Error(pretty.Sprint(&desc))
}
    data_test.go:1891: &roachpb.RangeDescriptor{
            RangeID:          26,
            StartKey:         {0xbd, 0x89, 0xf6, 0xcc, 0x8a, 0x86, 0xf7, 0xb2},
            EndKey:           {0xc2},
            InternalReplicas: {
                {
                    NodeID:    3,
                    StoreID:   3,
                    ReplicaID: 1,
                    Type:      (*roachpb.ReplicaType)(nil),
                },
                {
                    NodeID:    2,
                    StoreID:   2,
                    ReplicaID: 2,
                    Type:      (*roachpb.ReplicaType)(nil),
                },
                {
                    NodeID:    1,
                    StoreID:   1,
                    ReplicaID: 4,
                    Type:      (*roachpb.ReplicaType)(nil),
                },
            },
            NextReplicaID:        5,
            Generation:           &int64(3),
            GenerationComparable: (*bool)(nil),
            StickyBit:            &hlc.Timestamp{WallTime:1567598323264061746, Logical:0},
        }

The StickyBit should not be set; we're running in 19.1 mode and 19.1 doesn't know about the sticky bit.

I found two callers that didn't check the cluster version; it seems likely that they're used during IMPORT and caused this problem.
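
For illustration, the gate that well-behaved callers apply looks roughly like this (a sketch only, not the actual diff; the helper name and the hour-long expiration are illustrative, built from the settings.Version.IsActive(cluster.VersionStickyBit) pattern discussed further down):

```go
// Illustrative sketch only: decide what expiration to pass to AdminSplit.
// The zero timestamp selects the legacy, pre-sticky-bit path; a non-zero
// expiration is only legal once every node understands VersionStickyBit.
func splitExpiration(st *cluster.Settings) hlc.Timestamp {
	if !st.Version.IsActive(cluster.VersionStickyBit) {
		// 19.1 nodes may still CPut the range descriptor; persisting a
		// sticky bit now would make their expected value mismatch.
		return hlc.Timestamp{}
	}
	// All nodes are on 19.2: an hour-long expiration is safe to request.
	return hlc.Timestamp{WallTime: timeutil.Now().Add(time.Hour).UnixNano()}
}
```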

Added a commit to address this. I saw this in 2 out of 10 runs, so it's going to take a bit of time to confirm it's gone, but I'm pretty sure this is it.

@tbg (Member, Author) commented Sep 4, 2019

5/5 headrooms and 4/5 mixed-headrooms passed. The one failure is the above, and is not related to this PR.

Going to kick off another 10 mixed-headrooms with the presumed fix for the split error.

@tbg (Member, Author) commented Sep 4, 2019

@nvanbenschoten could you give me a review here? This PR was trivial but now it has a fix for the botched sticky migration I discovered while testing, and that patch won't apply cleanly on master.

@tbg (Member, Author) commented Sep 4, 2019

PS: I don't want to withhold the satisfying mixed-headroom results:

--- PASS: tpcc/mixed-headroom/n5cpu16 11346.62s
--- PASS: tpcc/mixed-headroom/n5cpu16 11528.79s
--- PASS: tpcc/mixed-headroom/n5cpu16 11709.03s
--- PASS: tpcc/mixed-headroom/n5cpu16 11777.74s
--- PASS: tpcc/mixed-headroom/n5cpu16 11756.96s
--- PASS: tpcc/mixed-headroom/n5cpu16 11777.13s
--- PASS: tpcc/mixed-headroom/n5cpu16 11818.02s
--- PASS: tpcc/mixed-headroom/n5cpu16 12017.41s
--- PASS: tpcc/mixed-headroom/n5cpu16 12409.73s
--- PASS: tpcc/mixed-headroom/n5cpu16 12507.03s
PASS

@nvanbenschoten (Member) left a comment

Reviewed 2 of 2 files at r1, 2 of 3 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


pkg/storage/replica_command.go, line 470 at r2 (raw file):

		newDesc := *desc
		newDesc.StickyBit = nil // can use &zero in 20.1

Is it possible to get here before all nodes update to 19.2, given the `if (desc.GetStickyBit() == hlc.Timestamp{}) {` check?


pkg/storage/bulk/sst_batcher.go, line 250 at r2 (raw file):

				log.Warning(ctx, err)
			} else {
				// NB: Passing 'hour' here is technically illegal until 19.2 is

Does all of the other complexity in this commit go away if we actually do this migration correctly and keep the use of AdminSplitRequest.ExpirationTime as the single migration boundary? It seems pretty easy to plumb a cluster setting into SSTBatcher and add logic like we have in restore.go -> splitAndScatter.

@tbg (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/replica_command.go, line 470 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is it possible to get here before all nodes update to 19.2, given the `if (desc.GetStickyBit() == hlc.Timestamp{}) {` check?

You're right, the early return prevents this. I added a comment instead of using &zero here because both are equivalent and nil is more obviously safe.


pkg/storage/bulk/sst_batcher.go, line 250 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does all of the other complexity in this commit go away if we actually do this migration correctly and keep the use of AdminSplitRequest.ExpirationTime as the single migration boundary? It seems pretty easy to plumb a cluster setting into SSTBatcher and add logic like we have in restore.go -> splitAndScatter.

ExpirationTime is not encapsulated at all. We have 10 calls to settings.Version.IsActive(cluster.VersionStickyBit) before this PR, all at various callers to AdminSplit. The offending calls were added after the sticky bit was introduced (and don't handle the case in which the sticky bit is not available), simply because it's so hard to tell that there is a migration to worry about.

So yes, if nobody ever issued an incorrect AdminSplit, there wouldn't be a problem (except that I think something was still sometimes using &zero here instead of nil).

I can add the cluster version check here if it tickles your OCD that I omitted it, but I'd rather not make any changes on the storage side of things, because a) I don't want to rely on the callers, as outlined before, and b) the moment I make a substantial change here I'll have to spend hours re-validating the results, and I don't think that's worth it.
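
To make "not relying on the callers" concrete, the coordinator-side defense looks roughly like this (approximate shape only, not the actual diff; see the commit message later in the thread):

```go
// Approximate shape of the defensive check, for illustration: if this node
// doesn't yet know that 19.2 is active, drop whatever expiration the caller
// passed instead of trusting every caller to have gated it correctly.
if !st.Version.IsActive(cluster.VersionStickyBit) {
	// Writing a sticky bit now could break descriptor CPuts issued by 19.1
	// nodes, so fall back to the legacy, non-sticky split.
	args.ExpirationTime = hlc.Timestamp{}
}
```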

@tbg (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/replica_command.go, line 74 at r4 (raw file):

		} else if err := detail.ActualValue.GetProto(&actualDesc); err == nil &&
			desc.RangeID == actualDesc.RangeID && !desc.Equal(actualDesc) {
			return fmt.Sprintf("descriptor changed: [expected] %+#v != [actual] %+#v", desc, &actualDesc), true

This is unintentional debugging detritus; reminder to self to remove it.

@nvanbenschoten (Member) left a comment

:lgtm: if you don't think it's worth cleaning up the migration further.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


pkg/storage/replica_command.go, line 246 at r4 (raw file):

}

func splitTxnStickyUpdateAttempt(

I think this was assuming that it could always set StickyBit to a non-nil value because it was only called in the `if desc.GetStickyBit().Less(args.ExpirationTime) {` case, which was only possible if the correct cluster version was enabled. That should have been documented, but it should still be the case.
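
Spelled out as a sketch (approximate shape, not the actual code): the guard can only be true for a non-zero ExpirationTime, which pre-19.2 callers never pass, so the sticky-update attempt is unreachable on the legacy path:

```go
// Illustration only. desc.GetStickyBit().Less(args.ExpirationTime) is false
// whenever args.ExpirationTime is the zero timestamp, so reaching the body
// implies the caller already decided the sticky bit is understood
// cluster-wide and a non-nil desc.StickyBit may be written.
if desc.GetStickyBit().Less(args.ExpirationTime) {
	return splitTxnStickyUpdateAttempt(ctx, txn, desc, args.ExpirationTime)
}
```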


pkg/storage/bulk/sst_batcher.go, line 250 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

ExpirationTime is not encapsulated at all. We have 10 calls to settings.Version.IsActive(cluster.VersionStickyBit) before this PR, all at various callers to AdminSplit. The offending calls were added after the sticky bit was introduced (and don't handle the case in which the sticky bit is not available), simply because it's so hard to tell that there is a migration to worry about.

So yes, if nobody ever issued an incorrect AdminSplit, there wouldn't be a problem (except that I think something was still sometimes using &zero here instead of nil).

I can add the cluster version check here if it tickles your OCD that I omitted it, but I'd rather not make any changes on the storage side of things, because a) I don't want to rely on the callers, as outlined before, and b) the moment I make a substantial change here I'll have to spend hours re-validating the results, and I don't think that's worth it.

If you feel comfortable with the degree of validation that this setup for the migration has gotten, then I am too. It just seemed easier to fix the botched migration in SplitAndScatter than to force the handling of AdminSplit to worry about non-empty ExpirationTime values.

@tbg (Member, Author) left a comment

bors r=nvanbenschoten

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

@tbg (Member, Author) commented Sep 6, 2019

Flake was a DML txn retry in a logic test: #40542 (comment)

bors r=nvanbenschoten

@craig bot (Contributor) commented Sep 6, 2019

Build failed (retrying...)

@tbg (Member, Author) commented Sep 6, 2019

Bors you liar

bors r=nvanbenschoten

@tbg (Member, Author) commented Sep 6, 2019

I wonder if this PR crashes bors every time?

bors r=nvanbenschoten

@nvanbenschoten (Member)

bors r+

@nvanbenschoten (Member)

Let's try this again.

bors r=nvanbenschoten

craig bot pushed a commit that referenced this pull request Sep 8, 2019

40464: storage: kv.atomic_replication_changes=true r=nvanbenschoten a=tbg

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>

Before 19.2, callers to AdminSplit are only ever supposed to pass
`Expiration==hlc.Timestamp{}` as this triggers the legacy code path
necessary while nodes in the cluster may not know desc.StickyBit yet. To
avoid CPut problems when those nodes update the range descriptor, we
must not persist non-nil values on it in that case. It looked like we
would sometimes still persist a &zero, which could cause problems.

The bigger problem though was that there were also two callers that
straight-up didn't check the cluster version and passed nonzero values
into AdminSplit. These callers were added recently and I can't blame
anyone there; it is impossible to know that one argument needs to be
zero before 19.2.

Instead of trying to fix this invariant (which wasn't trivial in this
case), just ignore expiration times when the coordinator doesn't think
19.2 is already active. This could lead to sticky bits being ignored
right around the cluster transition, but that seems much better than
risking old nodes not being able to carry out any changes to the
descriptors any more (which is the consequence of writing non-nil
sticky bits before this is safe).

Release note (bug fix): Fix a cluster migration bug that could occur
while running in a mixed 19.1/19.2 cluster. The symptom would be
messages of the form:

```
X at key /Table/54 failed: unexpected value: ...
```

Affected clusters should be updated to 19.2 or, if 19.1 is desired,
recreated from a backup.
Release note (general change): atomic replication changes are now enabled by
default.
@nvanbenschoten (Member)

Rebased on master and pushed again. Let's see if that helps.

bors r=nvanbenschoten

craig bot pushed a commit that referenced this pull request Sep 9, 2019

40464: storage: kv.atomic_replication_changes=true r=nvanbenschoten a=tbg

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
@craig bot (Contributor) commented Sep 9, 2019

Build succeeded
