Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: improve locking around replica destruction #64459

Open
erikgrinaker opened this issue Apr 30, 2021 · 1 comment
Open

kvserver: improve locking around replica destruction #64459

erikgrinaker opened this issue Apr 30, 2021 · 1 comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.

Comments

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Apr 30, 2021

The Replica.mu.destroyStatus field is used to signal atomic replica destruction. To update this, we often need to take out a bunch of locks -- typically raftMu, readOnlyCmdMu, and mu -- in a specific order:

// Locking notes: To avoid deadlocks, the following lock order must be
// obeyed: baseQueue.mu < Replica.raftMu < Replica.readOnlyCmdMu < Store.mu
// < Replica.mu < Replica.unreachablesMu < Store.coalescedMu < Store.scheduler.mu.
// (It is not required to acquire every lock in sequence, but when multiple
// locks are held at the same time, it is incorrect to acquire a lock with
// "lesser" value in this sequence after one with "greater" value).

This is tedious and error-prone. We should revisit this to see if it can be improved, at least in terms of ergonomics and safety. We should also use convenience methods on the replica to interact with destroyStatus, named by the locks that need to be held (e.g. Replica.destroyRaftMuReadOnlyCmdMuMuLocked 😐), and assert that the proper locks are in fact held with Mutex.AssertHeld() or Mutex.AssertRHeld().

See also #64324 for some discussion.

Jira issue: CRDB-7066

@erikgrinaker erikgrinaker added C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. A-kv-replication Relating to Raft, consensus, and coordination. labels Apr 30, 2021
@erikgrinaker erikgrinaker self-assigned this Apr 30, 2021
@erikgrinaker erikgrinaker removed their assignment Sep 8, 2021
@github-actions
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.
Projects
None yet
Development

No branches or pull requests

1 participant