Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: deal with leaseholders being VOTER_OUTGOING #42074

Closed
ajwerner opened this issue Oct 31, 2019 · 2 comments
Closed

storage: deal with leaseholders being VOTER_OUTGOING #42074

ajwerner opened this issue Oct 31, 2019 · 2 comments
Labels
A-kv-replication Relating to Raft, consensus, and coordination. no-issue-activity X-stale

Comments

@ajwerner
Copy link
Contributor

ajwerner commented Oct 31, 2019

In the fix of #38720 (comment) in #42068 we prevent replicas in VOTER_OUTGOING from becoming the leaseholder in the scenario we saw here and generally in all cooperative lease transfers (TransferLease). This protection is largely due to latching which prevents concurrency between changing replicas and lease transfers.

During lunch with @andreimatei and @nvanbenschoten we theorized that it is still possible for a non-cooperative lease transfer (RequestLease) to leave us in a state where the leaseholder is a replica of type VOTER_OUTGOING.

Imagine the following scenario:

  • Range is {n1, n2, n3} and the leaseholder is n1
  • n2 gets partitioned from n1
  • n1 begins the process of adding n4 and removing n2
  • n1 successfully commits the change that moves the range to {n1, n2:VOTER_OUTGOING, n3, n4:VOTER_INCOMING}
  • n1 becomes partitioned from all of the nodes
  • n2 decides decides it should become the leaseholder and sends a RequestLeaseRequest to itself. n2 evaluates the request locally while it still believes it is VOTER_FULL.
  • n2 becomes the leaseholder while it is VOTER_OUTGOING.

We could try to prevent this case from happening but in practice it doesn't seem so bad for the leaseholder to transiently be a replica in VOTER_OUTGOING. My suggestion on how to remedy this situation is that we do 1) from #38720 (comment) which is to say:

  1. Prevent the VOTER_OUTGOING leaseholder from proposing a command to remove itself.
  2. Add logic for a VOTER_OUTGOING leaseholder to actively transfer the lease in the replicate queue.
@ajwerner ajwerner added the A-kv-replication Relating to Raft, consensus, and coordination. label Oct 31, 2019
@tbg
Copy link
Member

tbg commented Nov 1, 2019

Shouldn't we add below-Raft code that makes sure that lease bounces? I'm suggesting this because if we opted for 1) instead, wouldn't the result be awkward? We'll be in a joint state and unable to leave it, so we're forced to write additional triggers that then transfer the lease away, but it's historically been hard to "enforce" a lease transfer.

@github-actions
Copy link

github-actions bot commented Jun 4, 2021

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. no-issue-activity X-stale
Projects
None yet
Development

No branches or pull requests

2 participants