Fix a race condition in pthread_mutex_timedlock.c #12245
Conversation
I'm not sure turning all pthread_mutex_lock calls into busy loops is an acceptable solution. There could be threads that wait for long periods on locks, no?
Is this a bug that affects musl in general, or just emscripten?
Also, does the same race condition apply to timedlock calls that do have a timeout? What if I call timedlock with a 30-minute timeout? I could hit the race condition and end up waiting 30 minutes, right?
I wonder if PR #10524 could also help resolve these race conditions, since it ran the entire Open POSIX Test Suite in WebAssembly.
I also wonder, if this is a real bug in musl, whether it might have been fixed upstream already.
@kleisauke Good idea, but it doesn't look like #10524 can help here - it fixes other issues like thread cancellation, but not core mutex operations or our proxying logic.

@sbc100 I did look at upstream musl, and the code has not changed significantly, so it's not fixed upstream AFAICT. I don't know if this only affects us or musl in general (it would be incredibly hard to test a native build of musl in a reliable enough way on a variant of #12258!).

Overall I think the diagnoses in these three PRs are incorrect, as has been pointed out. However, they are all necessary to fix #12258, and they each definitely fix a specific deadlock I encountered while debugging that testcase. So I guess we need to debug those three deadlocks more. I am a little unsure how best to do that though - how can I debug whether ...

The only good news here is that this is likely not urgent, as these corner cases are very hard to hit. They are also all quite old, so I don't think we have any recent regression here.
I have found the actual cause here, and will open a refactoring PR and then a fix PR shortly.
pthread_mutex_lock gives no time when it calls this function. In that case we don't loop; we just do a single wait forever. In a rare race condition, the condition we care about may be set right before the wait, and the wait does not know to look for it (it looks for a wake event; it doesn't read the memory to check the value).

Instead, just busy-wait. As this is for pthread_mutex_lock, the normal use case is probably something that needs to be fast anyhow.
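To make the busy-wait approach described above concrete, here is a minimal, self-contained sketch. This is not the actual musl or emscripten code; the names (busy_wait_lock, lock_word) and the sleep intervals are hypothetical, chosen only to illustrate the idea. The key property is that the waiter re-reads the lock word on every iteration, so a release that happens just before a wait cannot be "missed" the way a single unbounded wait-for-a-wake-event can.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical lock word: 0 = unlocked, 1 = locked. */
static atomic_int lock_word = 0;

/* Busy-wait acquire: try to take the lock, and on failure sleep briefly
 * and try again. Because every iteration re-reads the lock word, a
 * release that races with the start of a wait is simply picked up on
 * the next pass instead of being lost. */
static void busy_wait_lock(atomic_int *word)
{
    while (atomic_exchange_explicit(word, 1, memory_order_acquire) != 0) {
        struct timespec ts = { .tv_sec = 0, .tv_nsec = 100000 }; /* 0.1 ms */
        nanosleep(&ts, NULL);
    }
}

static void busy_wait_unlock(atomic_int *word)
{
    atomic_store_explicit(word, 0, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    busy_wait_lock(&lock_word);
    puts("worker acquired the lock");
    busy_wait_unlock(&lock_word);
    return NULL;
}

int main(void)
{
    busy_wait_lock(&lock_word);

    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);

    /* Hold the lock briefly; the worker spins (with short sleeps) until
     * we release it, rather than issuing one unbounded wait that could
     * miss the release. */
    struct timespec hold = { .tv_sec = 0, .tv_nsec = 50000000 }; /* 50 ms */
    nanosleep(&hold, NULL);
    busy_wait_unlock(&lock_word);

    pthread_join(t, NULL);
    return 0;
}
```

The trade-off is the one raised in the first comment: busy-waiting swaps a blocking wait for repeated polling, which burns CPU if a thread ends up waiting on a lock for a long time.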