release-20.2: kv: set destroy status before destroying data on subsumed replicas #55691

nvanbenschoten · 2020-10-19T15:18:28Z

Backport 2/2 commits from #55477.

/cc @cockroachdb/release

I'm not planning on merging this until v20.2.1, but getting the backport out now.

This PR cleans up some handling around replica destruction that scared me when working on #46329 and #55293. Specifically, there were cases during merges where the destroy status on a replica would not be set until after that replica's data was destroyed. This was true of merges applied through entry application, although in those cases no user data was removed, so it seems mostly harmless. More concerning is that it was also true of merges applied through snapshot application. This was extra concerning because snapshot application can delete user data if a subsumed range is only partially covered (see clearSubsumedReplicaDiskData). So we were hypothetically risking user-visible correctness problems, especially now that we allow follower reads through on subsumed ranges as of #51594.

This PR patches up these issues and then adds a few extra assertions that enforce stricter preconditions for the functions at play during replica destruction. Specifically, we now assert that the destroy status is set to destroyReasonRemoved before calling preDestroyRaftMuLocked. We also assert that if RemoveOptions.DestroyData is false when passed to removeInitializedReplicaRaftMuLocked, then the destroy status is also set.
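To make the ordering requirement concrete, here is a minimal, self-contained sketch. It is not the code from this PR: the `replica` type, `markRemoved`, `preDestroy`, and `errNotMarked` are hypothetical stand-ins, and the sketch returns an error where an assertion in the real code would likely be fatal. It only illustrates the precondition described above, namely that data destruction is refused unless the destroy status has already been set.

```go
package main

import (
	"errors"
	"fmt"
)

// destroyReason is a simplified stand-in for a replica's destroy status.
type destroyReason int

const (
	destroyReasonAlive destroyReason = iota
	destroyReasonRemoved
)

// replica is a hypothetical, heavily simplified model of a replica.
type replica struct {
	destroyReason destroyReason
	data          map[string]string
}

// errNotMarked is returned when data destruction is attempted too early.
var errNotMarked = errors.New("destroy status must be set before destroying data")

// markRemoved sets the destroy status. Readers that check the status
// will now refuse to serve from this replica.
func (r *replica) markRemoved() {
	r.destroyReason = destroyReasonRemoved
}

// preDestroy models the precondition asserted before data destruction:
// the destroy status must already be destroyReasonRemoved.
func (r *replica) preDestroy() error {
	if r.destroyReason != destroyReasonRemoved {
		return errNotMarked
	}
	r.data = nil // only now is it safe to drop the data
	return nil
}

func main() {
	r := &replica{data: map[string]string{"k": "v"}}

	// Wrong order: destroying data before setting the status is rejected.
	fmt.Println("before marking:", r.preDestroy())

	// Right order: set the status first, then destroy the data.
	r.markRemoved()
	fmt.Println("after marking:", r.preDestroy())
}
```

With this ordering, a concurrent reader that checks the destroy status can never observe a replica that still looks alive but has already lost part of its data.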

This unification allows us to remove the ignoreDestroyStatus flag on RemoveOptions, whose meaning is now exactly the inverse of DestroyData. In hindsight, pushing on why this extra flag was needed and what new information it was conveying to the function could have potentially caught these issues a little earlier.
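The redundancy of the removed flag can be sketched as follows. This is a simplified model under stated assumptions, not the actual kvserver code: the `replica` fields and the `removeReplica` function are hypothetical, and only the `RemoveOptions.DestroyData` name comes from the description above. The idea is that a caller passing `DestroyData: false` has, by contract, already set the destroy status and destroyed the data itself, so a separate "ignore destroy status" option carries no extra information.

```go
package main

import (
	"errors"
	"fmt"
)

// RemoveOptions models the options struct after the change: a single flag.
type RemoveOptions struct {
	// DestroyData is false only when the caller has already destroyed
	// the replica's data (and therefore already set the destroy status).
	DestroyData bool
}

// replica is a hypothetical, simplified replica model.
type replica struct {
	removed bool // stand-in for "destroy status is set to removed"
	hasData bool
}

var errStatusNotSet = errors.New("DestroyData=false but destroy status is not set")

// removeReplica is a hypothetical stand-in for the removal path.
func removeReplica(r *replica, opts RemoveOptions) error {
	if opts.DestroyData {
		if r.removed {
			return nil // already removed; nothing left to do
		}
		r.removed = true  // set the destroy status first...
		r.hasData = false // ...then destroy the data
		return nil
	}
	// The caller claims it destroyed the data already, so it must also
	// have set the destroy status beforehand.
	if !r.removed {
		return errStatusNotSet
	}
	return nil
}

func main() {
	// Normal removal: the function sets the status and destroys the data.
	a := &replica{hasData: true}
	fmt.Println(removeReplica(a, RemoveOptions{DestroyData: true}), a.removed, a.hasData)

	// Caller skipped setting the status: the precondition check fails.
	b := &replica{hasData: false}
	fmt.Println(removeReplica(b, RemoveOptions{DestroyData: false}))

	// Caller set the status and destroyed the data itself: accepted.
	c := &replica{removed: true, hasData: false}
	fmt.Println(removeReplica(c, RemoveOptions{DestroyData: false}))
}
```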

I think we'll want to backport this to v20.2, though probably in the first patch release because there is some risk here (especially without sufficient time to bake on master) and we aren't aware of seeing any correctness issues from the combination of the bug fixed here and #51594.

Release note (bug fix): A hypothesized bug that could allow a follower read to miss data on a range in the middle of being merged away into its left-hand neighbor was fixed. The bug seems exceedingly unlikely to have materialized in practice.

cockroach-teamcity · 2020-10-19T15:18:38Z


ajwerner

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)

The TODO discussed a merge followed by a split and alluded to an optimization that could be used to avoid a snapshot. This doesn't make sense to me. I think that's because the comment was written back when a range merged into and then back out of its left-hand neighbor would be the same range with the same ID. This is not the case anymore, so I don't see how we could ever (reasonably) avoid a snapshot in such a case.


nvanbenschoten requested a review from ajwerner October 19, 2020 15:18

ajwerner approved these changes Oct 21, 2020


nvanbenschoten added 2 commits November 12, 2020 12:51

nvanbenschoten force-pushed the backport20.2-55477 branch from b151349 to 242d026 Compare November 12, 2020 17:52

nvanbenschoten merged commit 0884565 into cockroachdb:release-20.2 Nov 12, 2020

nvanbenschoten deleted the backport20.2-55477 branch November 13, 2020 19:57
