Concurrency bugs in mpsc's stream and shared mode #94518
Labels
A-concurrency
Area: Concurrency
C-bug
Category: This is a bug.
T-libs
Relevant to the library team, which will review and decide on the PR/issue.
Case 1
If a reader waiting in recv_timeout() wakes up because of a timeout, it first calls abort_selection() to clear the signal_token, then calls try_recv() once more and returns whatever that try_recv() got. However, if a send() happens after the reader wakes up and finishes before that try_recv() executes, the try_recv() returns the data from that send(), and the next call to recv() panics.
To reproduce the problem, the send() has to happen right after recv_timeout() wakes up and finish before recv_timeout() returns. That can be arranged with a debugger's help.
The recv() panics because decrement() finds that cnt (2) is greater than steals (1), which means the queue is not empty and the next try_recv() is guaranteed to return some data. However, the queue is actually empty. This inconsistency sends the code down a supposedly impossible path, and it panics.
The relation between cnt and steals seems to be this: if a sender sends one element successfully, it must increase cnt; if a reader receives an element successfully, it either increases steals or decreases cnt; if a receive fails, it must either leave steals and cnt unmodified or undo its modifications. Thus cnt <= steals guarantees that the queue is empty, and cnt > steals guarantees that it is non-empty.
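To make that relation concrete, here is a minimal single-threaded sketch. It is a toy model of the bookkeeping as described above, not the code in shared.rs: the Model type, its fields and its method names are made up for illustration, and the real decrement() and abort_selection() do more work than the single lines that stand in for them here.

```rust
use std::collections::VecDeque;

/// Toy model of the counters (hypothetical type, not the std implementation).
#[derive(Default)]
struct Model {
    cnt: isize,
    steals: isize,
    queue: VecDeque<u32>,
}

impl Model {
    /// A successful send enqueues the value and increases cnt.
    fn send(&mut self, v: u32) {
        self.queue.push_back(v);
        self.cnt += 1;
    }

    /// Fast path: the first try_recv finds data and counts it as a steal.
    fn recv_fast(&mut self) -> Option<u32> {
        let v = self.queue.pop_front()?;
        self.steals += 1;
        Some(v)
    }

    /// Timeout path. `late_send` models a send that lands after the
    /// wakeup but before the final try_recv.
    fn recv_timed_out(&mut self, late_send: Option<u32>) -> Option<u32> {
        self.cnt -= 1; // stand-in for decrement(): the reader goes to sleep
        self.cnt += 1; // stand-in for abort_selection(): undo after the timeout
        if let Some(v) = late_send {
            self.send(v);
        }
        // Final try_recv: on success, neither counter records the receive.
        self.queue.pop_front()
    }

    /// The relation: cnt > steals exactly when the queue is non-empty.
    fn consistent(&self) -> bool {
        (self.cnt > self.steals) == !self.queue.is_empty()
    }
}

fn main() {
    let mut m = Model::default();
    m.send(1);
    assert_eq!(m.recv_fast(), Some(1));
    assert!(m.consistent());

    // A plain timeout leaves the counters consistent.
    assert_eq!(m.recv_timed_out(None), None);
    assert!(m.consistent());

    // A send that slips into the window breaks the relation.
    assert_eq!(m.recv_timed_out(Some(2)), Some(2));
    assert!(!m.consistent());
}
```

The last call leaves cnt = 2 and steals = 1 with an empty queue, which is the state that the next recv() trips over.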
Every recv (rust/library/std/src/sync/mpsc/shared.rs, lines 218 to 245 in 293b8f2) can call try_recv() twice. If the first try_recv succeeds, it increases steals; if not, it decreases cnt (rust/library/std/src/sync/mpsc/shared.rs, line 261 in 293b8f2) and, once woken up, calls try_recv again. Normally, when the reader is woken up, some element has been sent, so the next try_recv() returns an element successfully. But when the reader is woken up by a timeout, the next try_recv() is expected to fail to get an element, so the reader has to call abort_selection (rust/library/std/src/sync/mpsc/shared.rs, line 462 in 293b8f2) to undo that decrease.
The bug hides here: if a send happens after the wakeup and finishes before the next try_recv, that try_recv will in fact succeed! In that case steals is not increased and cnt is not decreased, so the consistency is broken. The next recv discovers this inconsistency and panics.
fix
This problem can be fixed by adding a return after abort_selection (rust/library/std/src/sync/mpsc/shared.rs, line 231 in 293b8f2): instead of calling try_recv again, we can return an Err(Timeout) directly.
Case 2
This problem also needs a debugger to reproduce.
This assert in abort_selection (rust/library/std/src/sync/mpsc/shared.rs, lines 462 to 467 in 293b8f2) caused the panic. It assumes that if the reader sees DISCONNECTED (which means all senders have been dropped), the signal token (to_wake) must already have been cleared by the destructor. That is not true: in drop_chan (the destructor of the sender side), to_wake is cleared after cnt is set to DISCONNECTED (rust/library/std/src/sync/mpsc/shared.rs, lines 375 to 383 in 293b8f2). If the reader's loads of cnt and to_wake come after L375 but finish before L377, the reader's assertion does not hold.
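The window can be replayed step by step in a small model. This is only a sketch under assumptions: Case2Model and its methods are hypothetical names, the two halves of drop_chan are split into separate functions so the interleaving can be written out in order, and the token is a made-up non-zero number rather than a real SignalToken.

```rust
use std::sync::atomic::{AtomicIsize, AtomicUsize, Ordering};

// Sentinel for "all senders dropped" (the value is an assumption of this model).
const DISCONNECTED: isize = isize::MIN;

/// Hypothetical stand-in for the two fields involved in the race.
struct Case2Model {
    cnt: AtomicIsize,
    to_wake: AtomicUsize,
}

impl Case2Model {
    /// State while a reader is blocked: cnt is -1 and a token is installed.
    fn with_blocked_reader(token: usize) -> Self {
        Case2Model {
            cnt: AtomicIsize::new(-1),
            to_wake: AtomicUsize::new(token),
        }
    }

    /// First half of the sender destructor: publish DISCONNECTED.
    fn drop_chan_mark_disconnected(&self) -> isize {
        self.cnt.swap(DISCONNECTED, Ordering::SeqCst)
    }

    /// Second half of the sender destructor: clear the signal token.
    fn drop_chan_take_to_wake(&self) -> usize {
        self.to_wake.swap(0, Ordering::SeqCst)
    }

    /// What the reader looks at after a timeout: if it sees DISCONNECTED,
    /// return the to_wake value that the assertion expects to be 0.
    fn reader_check(&self) -> Option<usize> {
        if self.cnt.load(Ordering::SeqCst) == DISCONNECTED {
            Some(self.to_wake.load(Ordering::SeqCst))
        } else {
            None
        }
    }
}

fn main() {
    let m = Case2Model::with_blocked_reader(0xdead);
    assert_eq!(m.reader_check(), None);

    // The sender destructor runs its first half...
    assert_eq!(m.drop_chan_mark_disconnected(), -1);

    // ...and the reader looks in between: DISCONNECTED, but to_wake != 0.
    assert_eq!(m.reader_check(), Some(0xdead));

    // Only now does the destructor clear the token.
    assert_eq!(m.drop_chan_take_to_wake(), 0xdead);
    assert_eq!(m.reader_check(), Some(0));
}
```

The middle reader_check() is the observation that makes the assertion fail: cnt already reads DISCONNECTED while to_wake still holds the token.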
fix
The fix would be to simply remove this assertion. Once cnt switches to DISCONNECTED (meaning all senders have been dropped), later recv operations no longer wait for a sender, so the reader does not need to worry about a sender waking up a mismatched signal token, as it does when it discovers that cnt >= 0 (rust/library/std/src/sync/mpsc/shared.rs, lines 470 to 475 in 293b8f2).
note
Because the code of mpsc's stream mode is basically the same as that of shared mode, stream mode suffers from these problems too.
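For reference, this is the call pattern that is affected in both modes, written as a small smoke test against the public API. It is a sketch, not a reliable reproducer: the window is tiny, so without a debugger it will almost always pass even on an affected version. The helper name timeout_then_recv is made up, and the comments about which operations select stream or shared mode are my assumption about the implementation at this commit.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs the recv_timeout-then-recv pattern once and returns what was received.
/// (Hypothetical helper for illustration.)
fn timeout_then_recv(shared: bool, timeout: Duration) -> Vec<u32> {
    let (tx, rx) = mpsc::channel::<u32>();

    // Assumption: cloning the Sender selects shared mode, while a single
    // Sender that sends more than once selects stream mode.
    let tx = if shared {
        let cloned = tx.clone();
        drop(tx);
        cloned
    } else {
        tx
    };

    // Warm-up sends so the channel is past its initial one-shot state.
    tx.send(0).unwrap();
    tx.send(0).unwrap();
    assert_eq!(rx.recv(), Ok(0));
    assert_eq!(rx.recv(), Ok(0));

    // The sender fires at roughly the moment the receiver times out.
    let sender = thread::spawn(move || {
        thread::sleep(timeout);
        tx.send(1).unwrap();
        tx.send(2).unwrap();
    });

    let mut got = Vec::new();
    // Either the timeout wins (Err) or the first value arrives in time (Ok).
    if let Ok(v) = rx.recv_timeout(timeout) {
        got.push(v);
    }
    // The following recv calls are where the panic from Case 1 would surface.
    while let Ok(v) = rx.recv() {
        got.push(v);
    }
    sender.join().unwrap();
    got
}

fn main() {
    let timeout = Duration::from_millis(20);
    assert_eq!(timeout_then_recv(false, timeout), vec![1, 2]);
    assert_eq!(timeout_then_recv(true, timeout), vec![1, 2]);
}
```

Whichever side wins the race, both values should come out in order and no call should panic.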
rust version:
rustc 1.58.1 (db9d1b20b 2022-01-20)