You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In the fix of #38720 (comment) in #42068 we prevent replicas in VOTER_OUTGOING from becoming the leaseholder in the scenario we saw here and generally in all cooperative lease transfers (TransferLease). This protection is largely due to latching which prevents concurrency between changing replicas and lease transfers.
During lunch with @andreimatei and @nvanbenschoten we theorized that it is still possible for a non-cooperative lease transfer (RequestLease) to leave us in a state where the leaseholder is a replica of type VOTER_OUTGOING.
Imagine the following scenario:
Range is {n1, n2, n3} and the leaseholder is n1
n2 gets partitioned from n1
n1 begins the process of adding n4 and removing n2
n1 successfully commits the change that moves the range to {n1, n2:VOTER_OUTGOING, n3, n4:VOTER_INCOMING}
n1 becomes partitioned from all of the nodes
n2 decides decides it should become the leaseholder and sends a RequestLeaseRequest to itself. n2 evaluates the request locally while it still believes it is VOTER_FULL.
n2 becomes the leaseholder while it is VOTER_OUTGOING.
We could try to prevent this case from happening but in practice it doesn't seem so bad for the leaseholder to transiently be a replica in VOTER_OUTGOING. My suggestion on how to remedy this situation is that we do 1) from #38720 (comment) which is to say:
Prevent the VOTER_OUTGOING leaseholder from proposing a command to remove itself.
Add logic for a VOTER_OUTGOING leaseholder to actively transfer the lease in the replicate queue.
The text was updated successfully, but these errors were encountered:
Shouldn't we add below-Raft code that makes sure that lease bounces? I'm suggesting this because if we opted for 1) instead, wouldn't the result be awkward? We'll be in a joint state and unable to leave it, so we're forced to write additional triggers that then transfer the lease away, but it's historically been hard to "enforce" a lease transfer.
We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!
In the fix of #38720 (comment) in #42068 we prevent replicas in VOTER_OUTGOING from becoming the leaseholder in the scenario we saw here and generally in all cooperative lease transfers (TransferLease). This protection is largely due to latching which prevents concurrency between changing replicas and lease transfers.
During lunch with @andreimatei and @nvanbenschoten we theorized that it is still possible for a non-cooperative lease transfer (RequestLease) to leave us in a state where the leaseholder is a replica of type VOTER_OUTGOING.
Imagine the following scenario:
We could try to prevent this case from happening but in practice it doesn't seem so bad for the leaseholder to transiently be a replica in VOTER_OUTGOING. My suggestion on how to remedy this situation is that we do 1) from #38720 (comment) which is to say:
The text was updated successfully, but these errors were encountered: