-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exception Handling - exception on the wrong thread when using pthreads #12035
Comments
According to #11233, exception handling is not current thread safe. So what you are seeing makes sense. Also according to #11233, it seems that it was considered not worth fixing. However, if you would like to work on it, its certainly something that we could accept fixes for. Since you are at least the third person to run into this issue I think we should disable exception handling when threading is enabled. There is clearly no way it works today. |
We had read all of #11233 and the thread safety concerned us and we figured we were running into that very issue. Disabling exceptions doesn't really help us as the code will still be broken without the exceptions being thrown and caught. What we don't have a good feel for is what to do next. Is the native exception handling (-fwasm-exceptions) going to be any time soon (it has been 3 months since the "several more months" comment in 11233) and we should wait/help with that rather than spending time on the current exception handling? sbc100 mentioned here and in 11233 about accepting fixes to make the current exception handling thread safe in the meantime, but is it worth that effort if it is about to be replaced? |
I updated that issue, now - #11518 fixed most of the known problem there. If you still see an issue, maybe you are hitting the specific corner case mentioned in the comment in the source there that is not handled yet? If so, fixing that specific issue may be worth considering. cc @aheejin for the status of |
We had seen #11518 and waited for EMSDK 2.0.0 in hopes that it would fix the issue we were having, but alas, it has not.
While we have 3 pthreads, the one pthread is the primary thread and the only thread that is really using the exception handling. Therefore, we wouldn't have the same exception object being thrown on multiple threads - it might be thrown multiple times but it would always be on that primary pthread. Therefore, I wouldn't think the comment would apply, but perhaps I'm misreading it. However, it does bring up a separate question. We have cases where an exception is thrown and in the catch clause, something is done and then the exception is thrown again (and caught again). Is throwing an exception from a catch block a safe operation? Would it matter if it was a new exception or the same/caught exception? Would it matter if we saved a reference to the exception (or the fact that we needed an exception and created a new exception), got out of the catch block, and then rethrow the exception? |
I think those are safe, but I'm not enough of a C++ exceptions expert to know for sure. A quick thing you can do is build with In general it sounds like you may be hitting undefined behavior or a unknown bug. It might be good to create a small standalone testcase for investigation, could be an easy fix given that. |
Wasm native exception handling, enabled in clang by That being said, but, I'd still like to know what the full error message in clang you encountered was (if the source code is not available), and the version of toolchain you are using, just in case I can get any info from that. If the error message is from the debug build toolchain it'd be better, but I'm not sure if EMSDK provides that. Does it? @kripken
It's hard to know what the exact situation is; I don't see why rethrowing or throwing from a catch block can be a problem, but I think a source code snippet can help. Can you show a small example code that does what you said? |
I am working on a sample code that I can share that recreates the problem. I am trying out the #12039 fix. There were two -fwasm-exceptions errors with clang. I'll post them as two comments. Here is the first.
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT: |
Second....
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT: |
I tested the #12039 fix by replacing ./upstream/emscripten/src/library_exceptions.js, rebuilding the project, confirming my primary.js file has the changed code inserted into it, and running the program. Alas, it has the same errors as before. While I work on a test program for this, I did want to mention one other detail in case it matters. The primary pthread is the one who calls to create the other pthreads - they are not all created from main at startup. Perhaps there is some inherent hierarchy based on who creates the pthreads? For example, pthread A creates pthreads B and C; pthread A continues running and throws the exception which is given to pthread B (the first pthread that pthread A created) instead of pthread A. |
Both clang errors seem to be caused by using the new wasm EH ( And about the test case, if you already have it I'd appreciate it, but if not, please don't spend time on creating it; I think I know what's going on. |
I don't think that could matter to exceptions (but I may be missing something). Meanwhile, btw, talking to @sbc100 we realized that #7203 and followups to that file may have regressed this. When they were in JS, they were essentially thread-local. However when in C they are linked once so all threads use the same location. Those should probably be marked |
For the -fsanitize=undefined, the build gets this right after startup. We had tried the sanitizers before (at least with EMSDK 1.39.20 but I think we tried them with 1.39.17 too) and hit similar errors.
|
With the files replaced/patched from #12039 and #12056 (except the /test files - I didn't both to patch them), I still see the exception on the wrong thread. As a sanity check, I grepped all of our build outputs for "exception_builtin" and didn't get any hits (I do get the expected hits when grepping for "exceptionThrowBuf"). I don't know that any of the changes should have ended up in the outputs versus just being part of the build process, but I wanted to mention it in case it pointed to something be missing in my patching and building. I am still working on coming up with a test case. |
Just an update - I have written the test case and played with all sorts of things to recreate the error and nothing recreates the error so far. As I have gone back to our real code to look for hints on things to do differently (complexities to add to the test case), I am disabling more and more of our real code to try to narrow down what I need to simulate/recreate in the test case. So far, no luck on recreating the issue in the test case. But, the bright side of that is the basic threading and exception is probably working and there is more likely something else in our real code that might be corrupting memory or some other side effect where the symptom is these exceptions on the seemingly incorrect threads. I am still working on it. |
I was finally able to create a test case that reproduces the issue. Here is the test case which includes comments on the errors:
|
Starting with EMSDK 2.0.3, I modified the files from the 3 fixes, rebuilt my test program, ran the test program using node, and still hit the errors:
|
I see, thanks @rsielken I spent some time looking at this now. I do see That happens because This happens from It's possible that's a red herring, but it seems plausible it isn't. In that case, we should really fix that issue, which is blocked on a TLS fix in upstream LLVM, #12056 (comment) @tlively @aheejin any updates on the status there? If there isn't a quick fix, I think we have no choice but to go back to (slow) JS helper variables for |
I'll go take a look at it now. Sorry for dropping that without a clear owner. |
This adds the thread-local annotation to those globals. Previously they were in JS, which is effectively thread-local, but then we moved them to C which meant they were stored in the same shared memory for all threads. A race could happen if threads (or longjmp) operations happened at just the right time, one writing to those globals and another reading. Also make compiler-rt now build with a multithreaded variation, as the implementation of those globals is in that library. Also add a testcase that runs exceptions on multiple threads. It's not a guaranteed way to notice a race, but it may help, and this is an area we didn't have coverage of. Fixes #12035 This has been a possible race condition for a very long time. I think it went unnoticed because exceptions and longjmp tend to be used for exceptional things, and not constantly being executed. And also until we get wasm exceptions support they are also slow, so people have been trying to avoid them as much as possible anyhow.
I updated to 2.0.5 to pick up this fix and the test case passes for me also. Thank you for the fix. |
We have a large project which now runs pretty well except for the one module that uses C++ exceptions extensively. We generally have 3 pthreads running and what we see is that the expection on pthread A is thrown and then caught by any of the 3 pthreads (A, B or C) when it should always be A. Having pthread B or C don't have catches for that exception type, so those fall out (worker.js onmessage() captured an uncaught exception: RuntimeError: unreachable executed) and pthread A which needed the exception goes down error paths and often gets an index out of bounds exception trying to run code it shouldn't be. If I remove the other pthreads and only have the one, main pthread (losing the functionality of those other threads for the sake of this investigation - not a viable solution), we don't seem to have this issue but we are not convinced that that conclusion is not just circumstantial.
We have seen these issues with EMSDK 1.39.17, 1.39.20 and 2.0.0. We are building with -O1 which then requires -s DISABLE_EXCEPTION_CATCHING=0 per https://emscripten.org/docs/optimizing/Optimizing-Code.html#optimizing-code-exception-catching, but we have seen the issue with -O0 too.
Per ajeejin's May 28th comment from #11233, we tried to build with -s DISABLE_EXCEPTION_CATCHING=0 removed and -fwasm-exceptions. 2 modules got clang++ errors saying to submit a but to bugs.llvm.org including some files, but our company legal hasn't cleared us to do that because the files are basically the source code. I did try those 2 modules without -fwasm-exceptions and all the other modules with -fwasm-exceptions, but that didn't run:
wasm streaming compile failed: CompileError: wasm validation error: at offset 146711: unknown section before code section falling back to ArrayBuffer instantiation failed to asynchronously prepare wasm: CompileError: wasm validation error: at offset 146711: expected code section CompileError: wasm validation error: at offset 146711: expected code section
These exceptions are thrown and should be caught in the C++ code (not javascript), so https://emscripten.org/docs/porting/Debugging.html#handling-c-exceptions-from-javascript should not apply.
Are there additional settings that we need to have these exceptions be thrown and caught by the correct (same) thread (I have read through the Emscripten doc and tried all sorts of combinations of -S and other settings - all to no avail)? Is there anything to do until the new exception handling (-fwasm-exceptions) is done and ready? Is this a known issue for the existing exception handling that will be fixed? I don't have a test case outside of our real code that I can share - is it worth my time to work on a test case that I can share or is this already known and will be addressed (I don't want to waste time on the test case if it isn't needed but if this can/should be fixed on the existing exception handling infrastructure, I'm certainly will to go spend that time)?
The text was updated successfully, but these errors were encountered: