Segfault in production Android app - facebook::hermes::debugger::EventObserver::breakpointResolved #1434
Hey @MFazio23, looking at the stack, it looks like it hasn't been symbolicated properly, which is why the stack trace doesn't make sense, and so many frames are annotated with that one function. Take a look at the steps here to get a proper stack trace: https://github.com/facebook/hermes/blob/main/doc/ReactNativeIntegration.md#react-native--071
Thanks @neildhar - I did this before but I'll give it another shot because it looks like I got something wrong.
We updated our app to use The stack trace was symbolicated by Sentry for us automatically, but when I uploaded the debug symbols to Crashlytics before, they were showing the same stack trace as Sentry. With the doc linked above, I'm still confused by "you can just invoke
cc @cortinico who is the expert on the intended workflow here. That said, you can also download the symbols directly from maven:
You'll have to point to your app build folder like:
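The elided example above might look something like this hedged sketch using the NDK's `ndk-stack` tool; both paths below are assumptions and vary by project, so adjust them to your NDK install and to whichever directory actually contains the unstripped `libhermes.so`:

```shell
# Hypothetical sketch: symbolicate a captured native crash with ndk-stack.
# $ANDROID_NDK and the -sym path are assumptions, not from this thread.
$ANDROID_NDK/ndk-stack \
  -sym app/build/intermediates/merged_native_libs/release/out/lib/arm64-v8a \
  -dump tombstone.txt
```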
Also there was a sporadic bug with
We have a new stack trace and it looks like the symbolication worked better this time, but let me know if it's not.
I've finally been able to reproduce this fairly frequently (but without any exact/consistent repro steps) on my Motorola Moto 5G 2023. Turning on Motorola's RAM Boost feature is critical to get our app to crash like this, and the crash stack traces reliably include Hades GC. We think the Hades GC is not interacting well with Motorola's implementation of virtual RAM in their "RAM Boost" feature. We have seen tens of thousands of these crashes in the wild in the last couple of months. The "user perceived crash rate" we see in Google Play for most Motorola devices is over 10%, which is just crazy. We'd love some help from the Hermes team on this. Here's another symbolicated stack trace: Crash dump
These stack traces make a lot more sense, thank you for sharing them. It is odd that this crash is so isolated to a particular type of device, but it is hard to say what might be happening without more information. Some things that would help:
We'll work on a repro repository and follow up in the meantime as we get new stack traces. Is there any way to swap out the GC at runtime, to switch from the default Hades to a different mode of Hades, or to GenGC? If so, we'd do so only on Motorola devices... If not, is it possible to do the above but not at runtime - i.e. just change the Hermes GC statically for a given app release?
There isn't any way to swap the GC in production right now, since GenGC was deleted some time ago. Some other things you could try along similar lines:
Knowing whether you observe the crash in these scenarios will be useful in narrowing things down.
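To make the "build from source" route concrete, a non-concurrent Hades configuration might look like this hedged sketch; the `HERMESVM_ALLOW_CONCURRENT_GC` option name is an assumption about the Hermes CMake setup, so verify it against your checkout's CMakeLists.txt before relying on it:

```shell
# Hedged sketch: configure a Hermes source build with the concurrent
# (background) Hades collector disabled. The option name is an
# assumption; confirm it in your checkout's CMakeLists.txt.
cmake -S hermes -B build_release -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DHERMESVM_ALLOW_CONCURRENT_GC=OFF
cmake --build build_release
```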
We were able to build from source and force Hades into non-concurrent mode, and still get the crash. Here's a symbolicated stack trace with the non-concurrent Hades GC: Crash dump:
@jtdaugh Could you also share the crash address? If you're building from source, could you try building with address sanitizer? An ASan report here would make it possible to understand why that write is failing.
Here is the full tombstone: Can you share how we can build with address sanitizer? We're open to trying/modifying any and all build flags to see if we can find & fix the root cause, so if you have any ideas beyond
@jtdaugh Could you share more tombstones? It would be interesting to see if there is a pattern in the crash addresses. To build Hermes with address sanitizer, you can set It would also be useful to build Hermes with asserts enabled (if you aren't already), which would hopefully catch this sooner. This should just be a matter of setting
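The exact flags were elided above, but a configuration along these lines is plausible; `HERMES_ENABLE_ADDRESS_SANITIZER` is an assumed option name and `CMAKE_BUILD_TYPE=Debug` is the standard CMake way to keep assertions on, so treat this as a sketch to check against the Hermes build docs:

```shell
# Hedged sketch: a Hermes source build with ASan and assertions.
# CMAKE_BUILD_TYPE=Debug keeps asserts enabled; the ASan option name
# is an assumption; confirm it in your checkout's CMakeLists.txt.
cmake -S hermes -B build_asan -G Ninja \
  -DCMAKE_BUILD_TYPE=Debug \
  -DHERMES_ENABLE_ADDRESS_SANITIZER=ON
cmake --build build_asan
```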
10 other tombstones attached. These all came from a different build (an actual play store distribution IIRC) without the custom hermes build - instead w/ the prepackaged hermes. Will follow up after getting some crashes w/ ASan and asserts enabled. tombstone_00.txt
@neildhar I've been struggling to get the app to build with ASan and CMAKE_BUILD_TYPE=Debug. In my setup I'm pointing RN to the custom hermes source and letting hermes be built from that custom source, I'm not separately directly cmaking the hermes binary. Is simply
@neildhar here is a crash and tombstone that I believe have a debug build type and ASan enabled:
Noticed the above (18) was not a Hades crash. Here is a newer crash that is back in Hades:
I would expect you to set them in the same place you set
I was hoping that this would provide some more information about the rogue access, but it doesn't look like there is an ASan report here. What fraction of your crashes on these Motorola devices have a Hermes stack trace? Is it possible that this is just general heap corruption happening somewhere else in your app?
@neildhar we've been able to create release builds with all debug symbols included and repro the crash on my motorola device, and have some more detailed stack traces w/ tombstones attached here. One of these is an outlier, the other 3 are the exact same stack trace:
crash30.txt |
@neildhar we made this change to the hermes source as a hail mary, and have not observed a crash yet on this build, on my device that was reliably crashing every minute or so on vanilla hermes. I'll keep playing with this build for a while longer, but am curious about your thoughts on this potential fix.
@jtdaugh Wow, this is extremely strange. Can you confirm that you are still building without
@jtdaugh One potential theory is that the memory trimming code is being invoked for some reason in these circumstances, and ends up calling into the GC unsafely from another thread. Do you see anything in the logcat output related to memory warnings? Two other interesting things to note:
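Checking for the memory warnings mentioned above could look like this hedged sketch; the tag and message patterns are guesses (they vary by Android version and OEM), so adjust them as needed:

```shell
# Hedged sketch: dump the device log and filter for memory-pressure
# messages that would suggest the trim-memory path is being hit.
# The grep patterns are assumptions, not confirmed tag names.
adb logcat -d | grep -iE "onTrimMemory|trim.?memory|low.?memory|lmkd"
```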
This app utilizes
Since this appears to be thread related, and happens even with single-threaded GC, I think we should at least carefully consider the possibility that the problem is related to Let's look at what we know: there is a crash that happens very often and reproducibly, and it appears to be thread-related. It only happens in this app. Hermes is deployed to hundreds of millions of devices; if this was happening commonly, we would know for sure. So, there is something that sets this particular app apart. We need to identify what this is. This doesn't mean that there is a bug in So, what is different?
You’re right that we should consider
@lukmccall There was another issue recently where someone mentioned that reanimated uses
That might be the case. I have a theory on why this happens, but I'm still investigating it. I will come back with my findings.
We finally determined the conflict in our application; the NewRelic library appears to be the source of these crashes. I've opened an issue with the NR team on the problem but the crashes have gone away in our latest release after disabling NewRelic. We did get some more detailed stack traces in case this is any help as to why NR is conflicting with something in the app:
Quick update: we were able to keep most of our NewRelic monitoring enabled and avoid crashes by turning off the
Thanks for the update @MFazio23, closing this issue since there's no action needed in Hermes.
Bug Description
We have a massive (15K) volume of crashes in our production React Native Android app which are shown to end in the `breakpointResolved` function inside Hermes. This crash occurs ~90% of the time on Motorola devices, though we've also seen it occasionally on other devices like Samsungs and Pixels.
It is also happening all over our app without a clear pattern as to something in particular causing it.
We ran `gradle clean` and confirmed this bug does not occur with JSC.
Hermes git revision (if applicable): N/A
React Native version: 0.72.1
OS: Android
Platform (most likely one of arm64-v8a, armeabi-v7a, x86, x86_64): arm64-v8a
Stack trace
Steps To Reproduce
We have yet to reproduce this on our side and are going off of multiple crash/app health apps (Sentry/Crashlytics) to let us know this is happening. We've tested dev and production builds on lots of devices (including the most common Motorola device in the list) without seeing anything.
The Expected Behavior
The app does not crash.