
Handle thread exit #1869

Merged

Conversation

eloparco
Contributor

@eloparco eloparco commented Jan 6, 2023

Implementation of what's described here: WebAssembly/wasi-threads#17:

  • a trap or proc_exit in the main thread should stop all the threads in the process
  • a thread created with wasi_thread_spawn that calls proc_exit should terminate all the threads in the process

The sample has been modified to test the combinations of termination by trap or by proc_exit, triggered either in the main thread or in a spawned thread (a sketch of the test matrix is shown below).

Part of #1790.
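
For reference, the shape of the test matrix is roughly the following; this is a minimal sketch rather than the actual sample, assuming wasi-libc's __wasi_proc_exit and Clang's __builtin_trap, with terminate_process being an illustrative name:

#include <wasi/api.h>

#define TEST_TERMINATION_BY_TRAP 1        /* otherwise terminate via proc_exit */
#define TEST_TERMINATION_IN_MAIN_THREAD 1 /* otherwise terminate from a spawned thread */

/* Illustrative helper: both termination variants funnel through here,
 * called either from main() or from a spawned thread's entry point. */
static void
terminate_process(void)
{
#if TEST_TERMINATION_BY_TRAP
    __builtin_trap();    /* compiles to a wasm `unreachable`, i.e. a trap */
#else
    __wasi_proc_exit(1); /* proc_exit: must stop every thread in the process */
#endif
}

While one thread terminates the process, the remaining threads spin in run_long_task() and are expected to be stopped within TIMEOUT_SECONDS.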

@eloparco eloparco force-pushed the eloparco/wasi-thread-exit branch 2 times, most recently from 6c6cd45 to b5b142a on January 6, 2023 17:30
core/iwasm/interpreter/wasm_runtime.c (review thread: outdated, resolved)
core/iwasm/common/wasm_application.c (review thread: outdated, resolved)
@eloparco eloparco force-pushed the eloparco/wasi-thread-exit branch 2 times, most recently from cf0c05d to 7cc4fc5 on January 10, 2023 09:57
@eloparco eloparco requested a review from wenyongh January 10, 2023 10:14
Contributor

@wenyongh wenyongh left a comment


LGTM

@@ -27,7 +30,7 @@ typedef struct {
 void
 run_long_task()
 {
-    // Busy waiting to be interruptible by exception
+    // Busy waiting to be interruptible by trap or `proc_exit`
     for (int i = 0; i < TIMEOUT_SECONDS; i++)
Contributor Author

@wenyongh btw, I noticed something while testing: if I replace this for loop with __builtin_wasm_memory_atomic_wait32(0, 0, -1); or even while(1);, the thread doesn't get interrupted even if the exception is propagated (in the first case it gets stuck here: https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/shared/platform/common/posix/posix_thread.c#L192). Is that expected? I'll investigate to understand what's happening.

Collaborator

It's what I expect to happen with the current implementation.

To terminate busy-looping threads, something like wasm_cluster_thread_send_signal would work.
I guess we eventually need to implement something (e.g. sending a signal on POSIX-like platforms) to terminate the blocking threads.
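
For illustration, the POSIX-signal idea could look roughly like this; a generic sketch, not WAMR code, and the choice of SIGUSR1 and the helper names are assumptions:

#include <pthread.h>
#include <signal.h>

#define TERM_SIGNAL SIGUSR1 /* illustrative choice of signal */

static void
term_handler(int sig)
{
    (void)sig; /* deliberately empty: delivery alone interrupts blocking calls */
}

static void
install_term_handler(void)
{
    struct sigaction sa = { 0 };
    sa.sa_handler = term_handler; /* no SA_RESTART: blocked syscalls return EINTR */
    sigemptyset(&sa.sa_mask);
    sigaction(TERM_SIGNAL, &sa, NULL);
}

/* The terminating thread would then pthread_kill(tid, TERM_SIGNAL) each
 * blocked thread; the woken threads re-check their suspend flags and exit. */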

Contributor

@wenyongh btw, I noticed something while testing: if I replace this for loop with __builtin_wasm_memory_atomic_wait32(0, 0, -1); or even while(1);, the thread doesn't get interrupted even if the exception is propagated (in the first case it gets stuck here: https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/shared/platform/common/posix/posix_thread.c#L192). Is that expected? I'll investigate to understand what's happening.

Normally the runtime should terminate all threads when an exception is thrown; I remember that is the expected behavior of the Java VM. It seems to be an issue in the main branch, which I'll try to fix with the patch below for the interpreter. Another issue in the main branch may be that the main thread doesn't spread the exception to the other threads when an exception is thrown, like the change this PR makes in wasm_runtime_call_wasm. I will discuss with @xujuntwt95329 how to fix them.

diff --git a/core/iwasm/interpreter/wasm_interp_fast.c b/core/iwasm/interpreter/wasm_interp_fast.c
index 3109b0c8..36863bf2 100644
--- a/core/iwasm/interpreter/wasm_interp_fast.c
+++ b/core/iwasm/interpreter/wasm_interp_fast.c
@@ -1054,6 +1054,8 @@ wasm_interp_call_func_import(WASMModuleInstance *module_inst,
 #if WASM_ENABLE_THREAD_MGR != 0
 #define CHECK_SUSPEND_FLAGS()                           \
     do {                                                \
+        if (wasm_get_exception(module))                 \
+            goto got_exception;                         \
         if (exec_env->suspend_flags.flags != 0) {       \
             if (exec_env->suspend_flags.flags & 0x01) { \
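
To see why this helps a busy loop but not a blocked thread: CHECK_SUSPEND_FLAGS() is expanded at branch and call opcodes, i.e. on every back-edge of a wasm-level loop. An illustrative model of the control flow (execute_loop_body is hypothetical; this is not the actual WAMR source):

/* What a wasm-level `while(1)` looks like to the fast interpreter: */
static void
interpret_loop(WASMExecEnv *exec_env, WASMModuleInstance *module)
{
    for (;;) {
        /* back-edge: CHECK_SUSPEND_FLAGS() expands to roughly this */
        if (wasm_get_exception(module)) /* the check added by the patch */
            goto got_exception;
        if (exec_env->suspend_flags.flags & 0x01)
            goto got_exception;
        execute_loop_body(); /* hypothetical; if this blocks in the host
                                (sleep, atomic.wait, ...), the checks
                                above are never reached again */
    }
got_exception:
    return; /* unwind and propagate the exception */
}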

Contributor

@eloparco I discussed the issues with @xujuntwt95329; he agreed that they are issues and will submit a patch to fix them on the main branch. Could you wait until we fix them and merge branch main into dev/wasi_threads, so as to sync up the code and avoid conflicts in the future?

Contributor Author

@eloparco eloparco Jan 11, 2023

As proposed by @yamt, I tried adding wasm_cluster_send_signal_all(cluster, WAMR_SIG_TERM); after this line that spreads the exception:

traverse_list(&cluster->exec_env_list, set_exception_visitor, exec_env);

It solves the while(1); case, but not other cases like sleep(TIMEOUT_SECONDS); or __builtin_wasm_memory_atomic_wait32(0, 0, -1);.
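
That matches the back-edge model above: sleep() and atomic.wait block inside the host, where no suspend-flag check runs. One possible direction (a generic sketch with assumed names, not an existing WAMR API) is to track where threads block and wake them all when termination starts:

#include <pthread.h>
#include <stdbool.h>

/* Assumed bookkeeping: one slot per thread blocked in the host. */
typedef struct wait_slot {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool terminated;
    struct wait_slot *next;
} wait_slot;

static wait_slot *blocked_threads; /* cluster-wide list, assumed */

/* Called by the terminating thread before joining the others. */
static void
wake_all_blocked_threads(void)
{
    for (wait_slot *w = blocked_threads; w != NULL; w = w->next) {
        pthread_mutex_lock(&w->lock);
        w->terminated = true; /* woken waiters re-check this and unwind */
        pthread_cond_broadcast(&w->cond);
        pthread_mutex_unlock(&w->lock);
    }
}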

Contributor

@loganek Thanks for the review, I changed the wasm_runtime_deinstantiate_internal in thread_manager_start_routine. But for the wasm_cluster_add_exec_env, I'm not sure why ret can be removed.
I pushed the changes, see c714189.

Collaborator

But for the wasm_cluster_add_exec_env, I'm not sure why ret can be removed.

It can be replaced with return statements (we no longer have a lock to unlock). As I said, it's not a big deal though. The rest looks OK to me.
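
For illustration, the suggestion amounts to the following before/after shape (types and helpers here are hypothetical, only the control flow matters):

/* before: `ret` plus a single exit point, so the lock was always released */
static bool
add_exec_env_with_lock(Cluster *cluster, ExecEnv *exec_env)
{
    bool ret = true;
    lock(&cluster->lock);
    if (!list_insert(&cluster->exec_env_list, exec_env))
        ret = false;
    unlock(&cluster->lock);
    return ret;
}

/* after: no lock to unlock, so `ret` can become direct return statements */
static bool
add_exec_env(Cluster *cluster, ExecEnv *exec_env)
{
    if (!list_insert(&cluster->exec_env_list, exec_env))
        return false;
    return true;
}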

Collaborator

I agree with @yamt that we may need some more complex mechanism to terminate the threads when they are blocked in the kernel.

@xujuntwt95329 / @wenyongh is anybody on your side actively working on that? If not, we'll take on that work.

Contributor

@loganek not yet

Collaborator

OK, thanks, we'll look into that then.

@eloparco eloparco force-pushed the eloparco/wasi-thread-exit branch 5 times, most recently from c4adc3a to e355799 on January 11, 2023 15:44
-#if WASM_ENABLE_LIBC_WASI != 0
+/* With WASI threads, proc_exit exception is not cleared because it has to
+ * be detected and propagated to the other threads in the process */
+#if WASM_ENABLE_LIBC_WASI != 0 && WASM_ENABLE_LIB_WASI_THREADS == 0
Collaborator

It seems lib-pthread also needs to spread the proc_exit exception; I'll modify this line to #if WASM_ENABLE_LIBC_WASI != 0 && WASM_ENABLE_THREAD_MGR == 0 in the main branch.

wenyongh pushed a commit that referenced this pull request Jan 20, 2023
And refactor clear_wasi_proc_exit_exception, refer to #1869
@eloparco eloparco force-pushed the eloparco/wasi-thread-exit branch from e355799 to c2a3961 on January 20, 2023 14:18
@eloparco eloparco force-pushed the eloparco/wasi-thread-exit branch from c2a3961 to 80b0257 on January 20, 2023 14:23
@eloparco
Contributor Author

Rebased onto dev/wasi_threads, so now the only difference with respect to dev/wasi_threads is the example for trap/proc_exit propagation.
I was trying the different combinations of TEST_TERMINATION_BY_TRAP and TEST_TERMINATION_IN_MAIN_THREAD, and with this config

#define TEST_TERMINATION_BY_TRAP 1 // Otherwise test `proc_exit` termination
#define TEST_TERMINATION_IN_MAIN_THREAD 0

at one point I got:

./iwasm -v=5 wasm-apps/thread_termination.wasm
ninja: no work to do.
[05:30:26:276 - 102E6C580]: Load type section success.
[05:30:26:298 - 102E6C580]: Load import section success.
[05:30:26:332 - 102E6C580]: Load function section success.
[05:30:26:335 - 102E6C580]: Load table section success.
[05:30:26:337 - 102E6C580]: Load memory section success.
[05:30:26:340 - 102E6C580]: Load global section success.
[05:30:26:344 - 102E6C580]: Load export section success.
[05:30:26:346 - 102E6C580]: Load start section success.
[05:30:26:349 - 102E6C580]: Load table segment section success.
[05:30:26:351 - 102E6C580]: Load datacount section success.
[05:30:26:353 - 102E6C580]: Load code segment section success.
[05:30:26:356 - 102E6C580]: Load data segment section success.
[05:30:26:358 - 102E6C580]: Ignore custom section [.debug_info].
[05:30:26:360 - 102E6C580]: Ignore custom section [.debug_loc].
[05:30:26:361 - 102E6C580]: Ignore custom section [.debug_ranges].
[05:30:26:363 - 102E6C580]: Ignore custom section [.debug_aranges].
[05:30:26:364 - 102E6C580]: Ignore custom section [.debug_abbrev].
[05:30:26:365 - 102E6C580]: Ignore custom section [.debug_line].
[05:30:26:367 - 102E6C580]: Ignore custom section [.debug_str].
[05:30:26:384 - 102E6C580]: Ignore custom section [name].
[05:30:26:389 - 102E6C580]: Ignore custom section [producers].
[05:30:26:391 - 102E6C580]: Ignore custom section [target_features].
[05:30:26:393 - 102E6C580]: Found aux __heap_base global, value: 38560
[05:30:26:394 - 102E6C580]: Found aux __data_end global, value: 5788
[05:30:26:396 - 102E6C580]: Found aux stack top global, value: 38560, global index: 0, stack size: 32768
[05:30:26:399 - 102E6C580]: Found malloc function, name: malloc, index: 19
[05:30:26:401 - 102E6C580]: Found free function, name: free, index: 21
[05:30:26:444 - 102E6C580]: wasm-micro-runtime/core/iwasm/interpreter/wasm_loader.c, line 715, can not find an export 0 named _initialize in the module
[05:30:26:450 - 102E6C580]: Load module success.
[05:30:26:456 - 102E6C580]: Memory instantiate:
[05:30:26:458 - 102E6C580]:   page bytes: 65536, init pages: 1, max pages: 30
[05:30:26:459 - 102E6C580]:   heap offset: 65536, heap size: 0
[05:30:26:673 - 102E6C580]: Memory instantiate success.
Thread running
Waiting before terminating
Main thread running
Thread running

Assertion failed: false && "Unreachable" (wasm-micro-runtime/samples/wasi-threads/wasm-apps/thread_termination.c: __wasi_thread_start_C: 61)
[05:30:36:007 - 16D3AF000]: wasm-micro-runtime/core/iwasm/interpreter/wasm_interp_fast.c, line 3970, meet an exception Exception: unreachable
[05:30:36:570 - 16D39B000]: wasm-micro-runtime/core/iwasm/interpreter/wasm_interp_fast.c, line 3970, meet an exception Exception: unreachable
[05:30:36:541 - 102E6C580]: wasm-micro-runtime/core/iwasm/interpreter/wasm_interp_fast.c, line 3970, meet an exception Exception: unreachable

i.e. the example fails because the trap generated in one of the spawned threads is not propagated within TIMEOUT_SECONDS (10 seconds).

@eloparco
Contributor Author

We can probably merge this PR; I'll open a separate issue to track the possible race condition that I documented above (which I've never been able to reproduce since).

@wenyongh wenyongh merged commit 9cf55f9 into bytecodealliance:dev/wasi_threads Jan 23, 2023
victoryang00 pushed a commit to victoryang00/wamr-aot-gc-checkpoint-restore that referenced this pull request May 27, 2024