Fix potential deadlock between Revoke and (Grant or Checkpoint) #14080
Merged
Why doesn't `Revoke` follow `Grant` and `Checkpoint` in executing all changes under the lock? If we look at the code of similar caching structs (ones that manage both the in-memory and the bbolt representation of state), such as `RaftCluster`, they do everything under the lock, including backend changes. Do you know why we unlock `le.mu` on line 337?

My understanding is that the following transaction (which deletes the lease and all keys attached to it) may be time-consuming if many keys are attached to the lease. In fact, there is no need to hold `le.mu` during the backend transaction.

But both `Grant` and `Checkpoint` follow the pattern `lease operation --> backend db operation --> lease operation`, so for simplicity they just hold `le.mu` for the whole process.

I agree that there is a chance of deadlock here; however, I'm worried about removing the lease from `leaseMap` and releasing the lock before the db operation is executed. This might be a totally unfounded worry, since backend operations are flushed only once every 5 seconds, but I need to think about it a little. Have you thought about it more?

It should be OK. `le.mu` is used to protect the data structures inside `lessor`, and the backend transaction is protected by a separate transaction lock. As I mentioned previously, there is no reason to hold `le.mu` during the transaction.

Please note that during `Grant` and `Checkpoint` the lessor may also persist the lease, but both of them must acquire the transaction lock before persisting the lease data. So it's safe.

From my understanding, before the change, there is a potential deadlock for sure.

Why don't we move P2 step 4 to happen after step 5? Namely, run `delete(le.leaseMap, id)` after `txn.End()`, which releases the backend transaction. The benefit of this idea is that the rangeDeleter will eventually call `lessor.Detach()` to look up the leaseID in `lessor.leaseMap`, so the order is preserved and it won't cause the issue that #16035 is trying to fix.