kvserver: synchronize replica removal with read-write requests #64471
Seems like the cockroach/pkg/kv/kvserver/replica_destroy.go Lines 184 to 210 in 3357d2e
So holding cockroach/pkg/kv/kvserver/replica_proposal.go Lines 791 to 792 in 5f40d69
This takes out Haven't found any other obvious deadlocks yet, and CI passes, so I'll start going down this path. Feel free to stop me if you know any reason why I shouldn't @tbg @nvanbenschoten.
This should be ready for review now. I ran the |
Reviewed 7 of 7 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @nvanbenschoten)
pkg/kv/kvserver/client_relocate_range_test.go, line 376 at r1 (raw file):
// which makes sure an in-flight read operation during replica removal won't
// return empty results.
func TestReplicaRemovalDuringGet(t *testing.T) {
Does this test also still repro the bug if you taint the fix?
pkg/kv/kvserver/replica_write.go, line 82 at r1 (raw file):
st, err := r.checkExecutionCanProceed(ctx, ba, g)
if err != nil {
	r.readOnlyCmdMu.RUnlock()
How about
needRUnlock := true
defer func() {
	if needRUnlock {
		r.readOnlyCmdMu.RUnlock()
	}
}()
// later..
needRUnlock = false
or does that escape (and even then can use singleton pointers to true and false)?
The need to unlock before each return invites errors.
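The guarded-defer idea suggested above can be sketched in isolation. This is a minimal, self-contained illustration and not the actual kvserver code: the `replica` type, the `evaluate` method, and its `fail` parameter are hypothetical stand-ins, and only `sync.RWMutex` is a real API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// replica is a hypothetical stand-in for kvserver.Replica; only the mutex matters here.
type replica struct {
	readOnlyCmdMu sync.RWMutex
}

// evaluate takes the read lock and releases it through one guarded defer.
// On success it hands the lock to the caller by clearing the flag first.
func (r *replica) evaluate(fail bool) (release func(), err error) {
	r.readOnlyCmdMu.RLock()
	needRUnlock := true
	defer func() {
		if needRUnlock {
			r.readOnlyCmdMu.RUnlock()
		}
	}()

	if fail {
		// Early return: the deferred function releases the lock.
		return nil, errors.New("check failed")
	}

	// Ownership of the read lock moves to the returned closure.
	needRUnlock = false
	return r.readOnlyCmdMu.RUnlock, nil
}

func main() {
	r := &replica{}
	if _, err := r.evaluate(true); err != nil {
		fmt.Println("error path:", err)
	}
	release, _ := r.evaluate(false)
	release()

	// If either path had leaked the read lock, this Lock would block forever.
	r.readOnlyCmdMu.Lock()
	r.readOnlyCmdMu.Unlock()
	fmt.Println("no read lock leaked")
}
```

Every early return is covered by the single defer, so adding a new error path cannot forget the unlock. Whether the flag escapes to the heap in the real code path is a separate question that would need checking with the compiler's escape analysis output.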
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/kv/kvserver/client_relocate_range_test.go, line 376 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Does this test also still repro the bug if you taint the fix?
Yep
pkg/kv/kvserver/replica_write.go, line 82 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
How about
needRUnlock := true
defer func() {
	if needRUnlock {
		r.readOnlyCmdMu.RUnlock()
	}
}()
// later..
needRUnlock = false
or does that escape (and even then can use singleton pointers to true and false)?
The need to unlock before each return invites errors.
I agree, but we already do this all over the place. Personally I think e.g. utility functions that cover the mutex lifetime (so that we could use defer) would be cleaner. Anyway, I've opened #64459 to review and clean up this locking, I suggest we revisit this then since we may want to make broader changes (e.g. readOnlyCmdMu is itself a contradiction here).
bors r=tbg
bors r-
Sorry to do this, but I'm midway through a review and there's at least one serious issue here.
Canceled.
No, appreciate it -- sorry for being trigger-happy here.
Reviewed 6 of 7 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @tbg)
pkg/kv/kvserver/client_relocate_range_test.go, line 417 at r1 (raw file):
// Perform delayed conditional put during replica removal. This will cause
// an ambiguous result error, as outstanding proposals in the leaseholder
// replica's proposal queue will be aborted when the replica is removed.
Mind mentioning what this would return if the bug was not fixed here as well?
pkg/kv/kvserver/client_test.go, line 66 at r1 (raw file):
var e roachpb.Value
e.SetBytes(expValue)
expBytes = e.TagAndDataBytes()
nit: expBytes = roachpb.MakeValueFromBytes(expValue).TagAndDataBytes()
if you want
pkg/kv/kvserver/replica_raft.go, line 205 at r1 (raw file):
}
maxLeaseIndex, pErr := r.propose(ctx, proposal, tok.Move(ctx))
Sorry for the false alarm. I was thinking that the propBuf's `flushRLocked` path could acquire the `raftMu` in cases where the `propBuf` hit a size limit, but I see now that it's not grabbing the `raftMu`, it's just upgrading a read lock on the `Replica.mu` to a write lock. So I think we're ok. Do you mind just adding a reminder in a comment somewhere in this method that nothing at or below this level should acquire the raftMu?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/kv/kvserver/client_test.go, line 66 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit:
expBytes = roachpb.MakeValueFromBytes(expValue).TagAndDataBytes()
if you want
Thanks!
pkg/kv/kvserver/replica_raft.go, line 205 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Sorry for the false alarm. I was thinking that the propBuf's `flushRLocked` path could acquire the `raftMu` in cases where the `propBuf` hit a size limit, but I see now that it's not grabbing the `raftMu`, it's just upgrading a read lock on the `Replica.mu` to a write lock. So I think we're ok. Do you mind just adding a reminder in a comment somewhere in this method that nothing at or below this level should acquire the raftMu?
Not at all, thanks for checking! Added a note, good idea.
bors r=tbg,nvanbenschoten The CI failure ( |
Build succeeded: |
Replica removal did not synchronize with in-flight read-write requests,
which could cause them to be evaluated on a removed (empty) replica. The
request would not be able to persist any writes, since it's unable to
submit Raft proposals. However, it can affect conditional writes, for
example causing a `ConditionalPutRequest` to error because it finds a
missing value instead of the expected one.

This patch fixes the problem by taking out `Replica.readOnlyCmdMu`
during pre-Raft evaluation, to synchronize with replica removal. This
can cause such requests to return `AmbiguousResultError` as the write is
evaluated.

Resolves #46329, follow-up from #64324.
Release note (bug fix): Fixed a race condition where read-write requests
during replica removal (e.g. during range merges or rebalancing) could
be evaluated on the removed replica. These will not have been able to
write any data to persistent storage, but could behave unexpectedly,
e.g. returning errors that they should not have returned.