Fixed processor dispose race condition #135
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fix race condition in RequestResponse (with processor Dispose())
Motivation:
I caught this bug while stress-testing RSocket-based Client/Server implementation in Facebook Thrift: https://github.com/facebook/fbthrift/blob/main/thrift/lib/go/thrift/stress/server_test.go
It's a pretty simple stress test - just 100K concurrent RequestResponse's to 1 server.
At most 100 concurrent RSocket connections at a time, a fresh connection is created for each request.
The symptom of the stress-test failure was that a small portion of requests would fail with
use of closed network connection
error.This is a race condition between two clients and it goes like this:
RequestResponse
type of request here:rsocket-go/internal/socket/duplex.go
Line 247 in 473989b
mono
processor is created by pulling it from a global processor pool:rsocket-go/internal/socket/duplex.go
Line 266 in 473989b
reactor-go
repo.)sink
field:rsocket-go/internal/socket/duplex.go
Lines 267 to 269 in 473989b
onFinally
callback (since the Stream Sequence is now complete):rsocket-go/internal/socket/duplex.go
Lines 257 to 264 in 473989b
rsocket-go/internal/socket/duplex.go
Line 259 in 473989b
rsocket-go/internal/socket/duplex.go
Line 263 in 473989b
a. This Go-routine creates a completely separate RSocket client to make a separate RequestResponse.
b. This RSocket client happens to get the same sink/processor (that we just disposed earlier) from the global pool.
Close()
on our original client from earlier steps (since the RequestResponse sequence had already been completed).a. A
destroyHandler
method gets invoked:rsocket-go/internal/socket/duplex.go
Lines 133 to 138 in 473989b
rsocket-go/internal/socket/duplex.go
Line 163 in 473989b
b. It in turn invokes
stopWithError
method of our handler (which we did not yet unregister because we got pre-emptied in step 8):rsocket-go/internal/socket/callback.go
Lines 33 to 36 in 473989b
c. However, the
sink
is already being used by another client. We are sending Error to a completely unrelated client!!! Race condition!d. The other (unrelated) client gets a false-positive error that the socket is closed!
Close()
executes, the Go routine from step 8 is scheduled and is finally able to unregister the handler - but it's too late - the race condition already occurred.Modifications/Fix:
Correctly ordered the relevant operations to avoid the race condition:
sink
(i.e. processor) first.Result:
The stress test succeeds after this change.