Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-20.2: kv: set destroy status before destroying data on subsumed replicas #55691

Merged

Conversation

nvanbenschoten
Copy link
Member

Backport 2/2 commits from #55477.

/cc @cockroachdb/release

I'm not planning on merging this until v20.2.1, but getting the backport out now.


This PR cleans up some handling around replica destruction that scared me when working on #46329 and #55293. Specifically, there were cases during merges where the destroy status on a replica would not be set until after that replicas data was destroyed. This was true of merges applied through entry application, although in those cases, no user data was removed so it seems mostly harmless. More concerning is that is was true of merges applied through snapshot application. This was extra concerning because snapshot application can delete user data if a subsumed range is only partially covered (see clearSubsumedReplicaDiskData). So we were hypothetically risking user-visible correctness problems, especially now that we allow follower reads through on subsumed ranges as of #51594.

This PR patches up these issues and then adds a few extra assertions that enforce stricter preconditions for the functions at play during replica destruction. Specifically, we now assert that the destroy status is set to destroyReasonRemoved before calling preDestroyRaftMuLocked. We also assert that if RemoveOptions.DestroyData is false when passed to removeInitializedReplicaRaftMuLocked, then the destroy status is also set.

This unification allows us to remove the ignoreDestroyStatus flag on RemoveOptions, whose meaning is now exactly the inverse of DestroyData. In hindsight, pushing on why this extra flag was needed and what new information it was conveying to the function could have potentially caught these issues a little earlier.

I think we'll want to backport this to v20.2, though probably in the first patch release because there is some risk here (especially without sufficient time to bake on master) and we aren't aware of seeing any correctness issues from the combination of the bug fixed here and #51594.

Release note (bug fix): A hypothesized bug that could allow a follower read to miss data on a range in the middle of being merged away into its left-hand neighbor was fixed. The bug seems exceedingly unlikely to have materialized in practice.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)

The TODO discussed a merge followed by a split and alluded to an
optimization that could be used to avoid a snapshot. This doesn't make
sense to me. I think that's because the comment was written back in a
time where a range merged into and then back out of its left hand
neighbor would be the same range with the same ID. This is not the case
anymore, so I don't see how we could ever (reasonably) avoid a snapshot
in such a case.
This PR cleans up some handling around replica destruction that scared me
when working on cockroachdb#46329 and cockroachdb#55293. Specifically, there were cases during
merges where the destroy status on a replica would not be set until after
that replicas data was destroyed. This was true of merges applied through
entry application, although in those cases, no user data was removed so it
seems mostly harmless. More concerning is that is was true of merges applied
through snapshot application. This was extra concerning because snapshot
application can delete user data if a subsumed range is only partially
covered (see `clearSubsumedReplicaDiskData`). So we were hypothetically
risking user-visible correctness problems, especially now that we allow
follower reads through on subsumed ranges as of cockroachdb#51594.

This PR patches up these issues and then adds a few extra assertions that
enforce stricter preconditions for the functions at play during replica
destruction. Specifically, we now assert that the destroy status is set to
`destroyReasonRemoved` _before_ calling `preDestroyRaftMuLocked`. We also
assert that if `RemoveOptions.DestroyData` is false when passed to
`removeInitializedReplicaRaftMuLocked`, then the destroy status is also set.

This unification allows us to remove the `ignoreDestroyStatus` flag on
`RemoveOptions`, whose meaning is now exactly the inverse of `DestroyData`.
In hindsight, pushing on why this extra flag was needed and what new
information it was conveying to the function could have potentially caught
these issues a little earlier.

I think we'll want to backport this to v20.2, though probably in the first
patch release because there is some risk here (especially without sufficient
time to bake on master) and we aren't aware of seeing any correctness issues
from the combination of the bug fixed here and cockroachdb#51594.

Release note (bug fix): A hypothesized bug that could allow a follower
read to miss data on a range in the middle of being merged away into its
left-hand neighbor was fixed. The bug seems exceedingly unlikely to have
materialized in practice.
@nvanbenschoten nvanbenschoten merged commit 0884565 into cockroachdb:release-20.2 Nov 12, 2020
@nvanbenschoten nvanbenschoten deleted the backport20.2-55477 branch November 13, 2020 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants