[wasm-mt] Remove two-phase suspend; fix state transitions in DS server; fix merge #73305
Tagging subscribers to 'arch-wasm': @lewing
[Attached video: Screen.Recording.2022-08-03.at.16.09.18.mov]
/azp run runtime-wasm
Azure Pipelines successfully started running 1 pipeline(s).
f7dc809 to 0c72fe8
/azp run runtime-wasm
Azure Pipelines successfully started running 1 pipeline(s).
0c72fe8 to 866504e
/azp run runtime-wasm
Azure Pipelines successfully started running 1 pipeline(s).
Wasm/library tests log:
@pavelsavara Is this the one you were fixing?
AOT:
Each call to begin_suspend_request_suspension_cordially may increment the suspend count. But in STW we only resume each thread once.
In 6726fae we added a second phase of STW to full coop on WebAssembly in order to suspend the browser thread after all the worker threads have been suspended in order to avoid some deadlocks that rely on the main thread continuing to process async work on behalf of the workers before they reach a safepoint.
The problem is that for worker threads we could end up calling begin_suspend_request_suspension_cordially twice. If the thread self-suspends after the first call, the second call will increment the suspend count. As a result, when we restart the world, the thread will decrement its suspend count, but still stay suspended. Worse, on the _next_ stw, we will increment the suspend count two more times and decrement once on the next restart, etc. Eventually the thread will overflow the suspend counter and we will assert `!(suspend_count > 0)`.
Also change `THREAD_SUSPEND_COUNT_MAX` to `0x7F` (from `0xFF`) - the suspend count is signed, so the roll-over from 127 to -128 is where we should assert
Fixes dotnet#72857
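The drift described in this commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `toy_*` names and the `int8_t` field are hypothetical stand-ins, not Mono's thread-state code, and the model reports a failure at the point where the counter would exceed `0x7F` instead of reproducing the runtime's actual assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; the real counter lives in Mono's thread-state machine. */
#define THREAD_SUSPEND_COUNT_MAX 0x7F /* largest value a signed 8-bit count can hold */

typedef struct {
    int8_t suspend_count; /* assumed signed 8-bit, per the 127 to -128 roll-over */
} toy_thread;

/* Models one suspend request; returns false where the count would overflow. */
static bool toy_request_suspend(toy_thread *t)
{
    if (t->suspend_count >= THREAD_SUSPEND_COUNT_MAX)
        return false;
    t->suspend_count++;
    return true;
}

/* Models one resume; returns false if the count is not positive. */
static bool toy_resume(toy_thread *t)
{
    if (t->suspend_count <= 0)
        return false;
    t->suspend_count--;
    return true;
}

/* A thread only runs again once every suspend request has been undone. */
static bool toy_is_running(const toy_thread *t)
{
    return t->suspend_count == 0;
}

/* One stop-the-world cycle: `requests` suspend requests, then a single resume. */
static bool toy_stw_cycle(toy_thread *t, int requests)
{
    for (int i = 0; i < requests; i++)
        if (!toy_request_suspend(t))
            return false;
    return toy_resume(t);
}

/* Returns the 1-based cycle at which the count would overflow,
   or -1 if no overflow happens within `limit` cycles. */
static int toy_cycles_until_overflow(int requests_per_cycle, int limit)
{
    assert(requests_per_cycle > 0);
    toy_thread t = { 0 };
    for (int cycle = 1; cycle <= limit; cycle++)
        if (!toy_stw_cycle(&t, requests_per_cycle))
            return cycle;
    return -1;
}
```

In this model a thread that receives one request per cycle returns to a count of zero every time, while a thread that receives two requests per cycle keeps a residual count that grows by one per cycle and stays suspended after the first restart.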
include thread states and thread ids where available
…twice" This reverts commit 92f52ab7ed1cfaa1a4f66e869a8d9404e066f1b2.
Remove mono_threads_platform_stw_defer_initial_suspend
The motivation for it in 6726fae was unfounded. There is no need to suspend the main browser thread after the other threads: suspension on wasm uses `sem_wait`, which on Emscripten on the main thread is implemented using a busy wait, `__timedwait_cp`, that processes queued calls. So even if we suspend the main thread first, it will still allow other threads in GC Safe to make progress if they're using syscalls.
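The argument above, that a busy wait which drains queued calls still lets other threads make progress, can be sketched in a single-threaded toy model. Everything here is an assumption for illustration: the `toy_*` names, the fixed-size queue, and the spin limit are hypothetical, and this is not how Emscripten implements `sem_wait` or `__timedwait_cp`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_CAPACITY 8

typedef void (*toy_call)(void *arg);

/* Calls that worker threads have asked the main thread to run on their behalf. */
typedef struct {
    toy_call fn[TOY_QUEUE_CAPACITY];
    void *arg[TOY_QUEUE_CAPACITY];
    size_t head, tail; /* head == tail means the queue is empty */
} toy_queue;

static bool toy_enqueue(toy_queue *q, toy_call fn, void *arg)
{
    assert(fn != NULL);
    if (q->tail == TOY_QUEUE_CAPACITY)
        return false;
    q->fn[q->tail] = fn;
    q->arg[q->tail] = arg;
    q->tail++;
    return true;
}

/* Runs every call queued so far and returns how many ran. */
static size_t toy_drain(toy_queue *q)
{
    size_t ran = 0;
    while (q->head < q->tail) {
        size_t i = q->head++;
        q->fn[i](q->arg[i]);
        ran++;
    }
    return ran;
}

typedef struct { int value; } toy_sem;

/* Stand-in for whoever resumes the suspended thread. */
static void toy_sem_post(void *sem) { ((toy_sem *)sem)->value++; }

/* Stand-in for a call a worker needs the main thread to service. */
static void toy_count(void *counter) { ++*(int *)counter; }

/* Busy wait: poll the semaphore and drain queued calls between polls.
   Returns false if the semaphore is still unavailable after max_spins polls. */
static bool toy_busy_sem_wait(toy_sem *sem, toy_queue *q, int max_spins)
{
    for (int spin = 0; spin < max_spins; spin++) {
        if (sem->value > 0) {
            sem->value--;
            return true;
        }
        toy_drain(q);
    }
    return false;
}
```

In this model the waiting thread services `toy_count` while it is still blocked on the semaphore, which is the property the commit message relies on when it drops the deferred suspend of the main thread.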
…_GC flag
The diagnostic server worker spends most of its time in the JS event loop waiting for messages. After we attach to the runtime, we need to switch to GC Safe mode because the diagnostic server may not ever reach a safepoint (for example if no more DS events arrive). Conversely, when we call from JS into the C diagnostic server, we need to enter GC Unsafe mode (and potentially safepoint).
Also mark the diagnostic server threads with the NO_GC flag - this thread does not manipulate managed objects so it doesn't need to stop for GC STW.
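The state transitions in this commit message can be pictured with a toy model. It is a sketch under assumptions, not the runtime's code: the `toy_*` types and functions are hypothetical, and the rule in `toy_stw_must_wait_for` is a simplification of how a stop-the-world treats GC Safe threads and threads marked NO_GC.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TOY_GC_UNSAFE, TOY_GC_SAFE } toy_gc_mode;

typedef struct {
    toy_gc_mode mode;
    bool no_gc; /* models the NO_GC flag */
} toy_ds_thread;

/* After attaching, the worker goes back to the JS event loop, so it
   switches to GC Safe: it may never reach a safepoint on its own. */
static void toy_attach_and_idle(toy_ds_thread *t, bool no_gc)
{
    t->no_gc = no_gc;
    t->mode = TOY_GC_SAFE;
}

/* Simplified rule: a stop-the-world only has to wait for threads that
   are in GC Unsafe mode and are not marked NO_GC. */
static bool toy_stw_must_wait_for(const toy_ds_thread *t)
{
    return !t->no_gc && t->mode == TOY_GC_UNSAFE;
}

typedef void (*toy_ds_handler)(toy_ds_thread *t, void *arg);

/* A call from JS into the C diagnostic server: enter GC Unsafe for the
   duration of the handler, then return to GC Safe before going idle. */
static void toy_call_from_js(toy_ds_thread *t, toy_ds_handler handler, void *arg)
{
    assert(t->mode == TOY_GC_SAFE);
    t->mode = TOY_GC_UNSAFE;
    handler(t, arg);
    t->mode = TOY_GC_SAFE;
}

/* Example handler: records the mode it observes while running. */
static void toy_record_mode(toy_ds_thread *t, void *arg)
{
    *(toy_gc_mode *)arg = t->mode;
}
```

In the model, an idle diagnostic server thread never holds up a stop-the-world, and a handler invoked from JS always runs in GC Unsafe mode.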
Mistake from a previous merge
when I repro this locally, I get a stack overflow: https://gist.github.com/lambdageek/8c07263665415982c3b793150a1626b4 (I ran the tests with Checking Update
866504e to 8a8ab6c
I don't know the threading infra well enough to firmly approve the logic changes here, but the diffs all look clean to me and it makes sense
Co-authored-by: Katelyn Gadd <kg@luminance.org>
/azp run runtime-wasm
Azure Pipelines successfully started running 1 pipeline(s).
Grab bag of threading fixes:
… `sem_wait`, it is still processing queued calls from other threads. So there is no need to first suspend the worker threads and then suspend the main thread. The implementation of two-phase suspend had a bug where it would suspend worker threads twice, increasing the suspend count by 2. Since resume only decremented the count by 1, this led to a suspend count overflow. Fixes "[wasm-mt] Sampling thread requests resume for a thread with !(suspend_count > 0)" #72857
… `cwraps.mono_wasm_event_pipe_enable` due to a mistake in a previous merge
… `browser-threads` sample