EXC_BAD_ACCESS signal received executing GPT2 llvm-cpu on iOS Simulator #12369
Comments
Following the suggestion by @benvanik, I set a breakpoint in the program just before the point where the signal is raised. It seems that in the above screenshot, it was the debugger that cleared a0, a1, and a2. Here is a screenshot of what they were before the signal happened.
I also followed another suggestion by @benvanik to use the command-line option
Unfortunately, it seems that the IREE runtime cannot find the dynamic library.
Darn - this may be something in the simulator sandbox. You can add an and then and if you put a breakpoint here at the failure: https://github.com/openxla/iree/blob/main/runtime/src/iree/base/internal/dynamic_library_posix.c#L160 you can see if the paths it's trying are valid on your filesystem. If we can get the system linker working then it's possible to use ASAN/etc to diagnose the memory accesses inside the executable by setting
Ahh it may be a dlopen issue. Wondering if this is the simulator actually preventing dlopen or something. It's silly we don't print the actual here - I'll send a PR in a sec but you can apply this to see if you get a better error message:
diff --git a/runtime/src/iree/base/internal/dynamic_library_posix.c b/runtime/src/iree/base/internal/dynamic_library_posix.c
index f4261d09b..4581e143f 100644
--- a/runtime/src/iree/base/internal/dynamic_library_posix.c
+++ b/runtime/src/iree/base/internal/dynamic_library_posix.c
@@ -158,7 +158,9 @@ iree_status_t iree_dynamic_library_load_from_files(
   if (!handle) {
     IREE_TRACE_ZONE_END(z0);
     return iree_make_status(IREE_STATUS_NOT_FOUND,
-                            "dynamic library not found on any search path");
+                            "failed to load dynamic library (possibly not "
+                            "found on any search path): %s",
+                            dlerror());
   }
 
   iree_dynamic_library_t* library = NULL;
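For anyone who wants to check outside of IREE whether dlopen itself is being rejected, the idea in the diff above (surface the dlerror() text instead of a generic "not found") can be sketched as a small standalone helper. This is only an illustration under assumptions: try_load_library is a hypothetical name and is not part of the IREE runtime, and the message format is made up for this sketch.

```c
#include <dlfcn.h>
#include <stdio.h>

// Hypothetical helper (not an IREE API): attempts to dlopen `path` and writes
// a human-readable result into `message`. On failure the text reported by
// dlerror() is included, which is the same information the diff above adds
// to the IREE status message.
// Returns 1 if the library loaded, 0 otherwise.
static int try_load_library(const char* path, char* message, size_t capacity) {
  dlerror();  // Clear any stale error state left by earlier dl* calls.
  void* handle = dlopen(path, RTLD_LAZY | RTLD_LOCAL);
  if (!handle) {
    const char* error = dlerror();
    snprintf(message, capacity, "failed to load dynamic library '%s': %s",
             path, error ? error : "(no dlerror message)");
    return 0;
  }
  snprintf(message, capacity, "loaded '%s'", path);
  dlclose(handle);  // This sketch only probes loadability, so release it.
  return 1;
}
```

Calling it with each candidate path and printing `message` should show whether the loader cannot find the file or finds it and refuses to load it, which the generic "not found on any search path" status hides.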
I applied your diff, and it now prints
Ah hah!
However,
clang /tmp/a.c \
  --target=arm64-apple-ios-simulator \
  -isysroot $(xcodebuild -version -sdk iphonesimulator Path)
The following does not work.
It complains
We suspect that this patch is the culprit: cd92019. We confirmed that this was working in the candidate-20230222.438 release (possibly later, but we didn't finish a complete bisect). We narrowed it down to this minimal repro:
It can be compiled for Apple Silicon with:
And then running on Apple Silicon will hit that bad access:
There were some odd points in the analysis that don't make sense yet. The biggest one is that when run via the Python runtime bindings, the entire model (including this dispatch) works. There is no explanation for that except that something is randomly different about the address layout that plays in favor here.
Landing the revert now and assigning the issue to Diego to reference as he decides what to do about it. Will close once we manage a roll forward without the bug.
Thanks for reverting the PR! It looks like masking was just exposing an unrelated issue. #12460 fixed the problem and the masking changes have been rolled forward. Thanks!
This reverts commit cd92019. Reverting because this was identified as the root cause of iree-org#12369.
What happened?
Thanks to Lei (@antiagainst)'s quick review of my case, I have a general idea of what I should write about in this issue.
On my macOS/M1 Max system, I made a simple iOS app in Xcode that uses the IREE runtime to run, in the iOS Simulator, the llvm-cpu vmfb generated from the GPT-2 model by James and Rob (https://github.com/iree-org/iree-jax/blob/main/models/gpt2/export.py). Unfortunately, I got an EXC_BAD_ACCESS signal at the first call into a function in the MLIR module. Here's a screenshot of the stack to show you what it looks like.
I used the following command to turn the MLIR into a vmfb file.
iree-compile /tmp/gpt2.mlir --iree-hal-target-backends=llvm-cpu --iree-input-type=mhlo > /tmp/gpt2.vmfb
But if I change the backend from "llvm-cpu" to "vmvx", the iOS app works fine and the call to the MLIR function returns the right value.
iree-compile /tmp/gpt2.mlir --iree-hal-target-backends=vmvx --iree-input-type=mhlo > /tmp/gpt2.vmfb
The following is a screenshot of the successful run.
A few weeks ago, I was able to compile a simple JAX program that trains a linear regression model into llvm-cpu vmfb and then run it in the iOS simulator with success.
Steps to reproduce your issue
source ./build/.env
and export $PYTHONPATH
. PYTHONPATH
. /tmp/gpt2.mlir
and /tmp/gpt2.vmfb
of llvm-cpu
code.
What component(s) does this issue relate to?
Runtime
Version information
bdd679d
Additional context
macOS 13.2.1 (22D68)
Xcode 14.2 (14C18)
Apple clang version 14.0.0 (clang-1400.0.29.202)