Set up Sentry for Android-native code #4996

gnprice · 2021-09-08T23:34:32Z

This means we'll start seeing reports in Sentry of both uncaught exceptions, and error or warning cases we explicitly catch, in our Kotlin code just as we do already in our RN-based JS code.

WesleyAC

I would do the Log.w in the helper, rather than manually. Other than that, looks good!

gnprice · 2021-09-09T20:05:40Z

Thanks for the review! Yeah, a good improvement to this code would be to unify things so that each error handler only has to make one call instead of two. Our logging.js functions provide something a lot like that. I may try making that change.

@chrisbobbe also pointed out (in the office yesterday) that we already have been getting some Sentry reports that look like they come from the Android-native side. For example here's one with a Java stack trace:
https://sentry.io/organizations/zulip/issues/1964700767/events/71cabef082ab476b9d55d6d11b2393f0/
And here's one reporting a segfault in libjsc, the JS implementation that RN supplies on Android:
https://sentry.io/organizations/zulip/issues/1968181388/events/1e76508e14aa44b5dd9686ac30fdc037/

I haven't actually done a negative test of this: I didn't try reporting a warning, or inducing an exception, before this "Enable Sentry" commit. I suspect what's happening is:

In the existing code, once we initialize Sentry from the JS side, the Sentry RN implementation also goes and initializes the Sentry Android or Sentry iOS implementation. That'd mean we're already covered… as long as the JS has already started running.
In order to have Sentry working when we're processing an incoming notification, though, as we want for notif: Require sender_id, server, and realm_uri, rejecting ancient servers' data #4967, we'll still need to make this change.
We probably also still need to initialize Sentry on the JS side (even on Android), in order to have it working there. Hopefully the redundancy is harmless.

I'll do some investigation to confirm.

gnprice · 2021-09-09T22:59:24Z

OK, some findings:

If I add a DSN in sentryConfig.js but not in AndroidManifest.xml, so that we initialize Sentry from JS only -- just as we do in the status quo for published releases -- then:
- Sentry.captureMessage in Java/Kotlin works fine after the JS has started up. In particular I added it in TextCompressionModule#decompress, and it showed up.
- OTOH it has no effect before the JS has started up. In particular I tried MainActivity#onCreate.
- An uncaught JVM exception doesn't get reported to Sentry, even after the JS has started up. I tried the same places as for Sentry.captureMessage.
OTOH if I add a DSN in both places, following the instructions in this branch, then:
- Sentry.captureMessage in Java/Kotlin now works fine both before and after the JS has started up.
- An uncaught exception from MainActivity#onCreate now gets reported just fine.
- … Hmm, but an uncaught exception in TextCompressionModule#decompress still doesn't!
  
  So that seems like a limitation of the Sentry library for React Native. That method is one that's exposed to JS via the React Native API, with the @ReactMethod annotation. An uncaught exception there causes a red screen in dev; in prod I expect it would cause a crash.
  
  In any case, that's not the situation that this PR is concerned with. AFAIK we've never had a bug of this form (and an uncaught exception in one of these methods is always a bug, in the method or something it calls -- exceptions should be turned into promise rejections, in order to propagate properly to the JS caller), so it's not a priority.

gnprice · 2021-09-09T23:10:23Z

OTOH if I add a DSN in both places, following the instructions in this branch, then: […]

An uncaught exception from MainActivity#onCreate now gets reported just fine.

… Hmm, but an uncaught exception in TextCompressionModule#decompress still doesn't!

So that seems like a limitation of the Sentry library for React Native. That method is one that's exposed to JS via the React Native API, with the @ReactMethod annotation. An uncaught exception there causes a red screen in dev; in prod I expect it would cause a crash.

Looks like this is probably getsentry/sentry-react-native#1102 . The person reporting that issue says it was limited to a debug build, and worked fine in a release build. (Easy to imagine how that could come about, with the red-screen mechanism interfering somehow.)

If that's the case then this is… still an annoying Sentry bug that they should really fix, because it causes confusion for developers, but it doesn't have a direct effect on error reporting in the field, which is nice.

chrisbobbe · 2021-09-09T23:59:55Z

Great, thanks for that investigation! I'm satisfied with it.

gnprice · 2021-09-10T00:13:00Z

Pushed a revision with Log.e and Log.w folded inside new wrappers ZLog.e and ZLog.w that cover both the device log and Sentry, as well as with commit-message edits to reflect that investigation. Please take a look!

chrisbobbe

LGTM, thanks! Feel free to merge after seeing a small thing or two below.

chrisbobbe · 2021-09-10T01:04:34Z

android/app/src/main/AndroidManifest.xml

@@ -87,5 +87,12 @@
                <action android:name="com.google.firebase.MESSAGING_EVENT" />
            </intent-filter>
        </service>
+


(IIUC)

I think it makes sense that getsentry/sentry-react-native#1102 would be the issue with TextCompressionModule#decompress, and that that report would be accurate about the problem being debug-mode-only. But I don't think we're 100% sure, right?

This should immediately mean that we start reporting to Sentry when an exception propagates uncaught and crashes the app.

So I wonder if this deserves an asterisk on the word "should", just in case we find an uncaught exception not being reported to Sentry in the future? Maybe just a link to #4996 (comment).

Sure; added a note about that.

chrisbobbe · 2021-09-10T01:07:51Z

android/app/src/main/java/com/zulipmobile/SentryUtils.kt

+        public fun warnException(e: Throwable) {
+            Sentry.withScope { scope ->
+                scope.level = SentryLevel.WARNING
+                Sentry.captureException(e)


Interesting, I think I'd expect that you'd have to pass the scope param somewhere in the Sentry.captureException call. But I guess you don't: https://docs.sentry.io/platforms/java/enriching-events/scopes/#local-scopes

Yeah. I think this API actually makes more sense than it would if you did have to pass it: basically the point is that there's always a current scope (there has to be, to handle uncaught exceptions where you don't explicitly invoke Sentry at all), and so Sentry.captureException uses that.

The withScope function is sort of doing two things at once: it pushes a new scope onto the scope stack (and undertakes to pop it at the end), and then it passes you the scope much like Sentry.configureScope does. I think that's a bit confusing, and if I were designing this API, I'd probably separate those, so that you'd write:

Sentry.pushScope { Sentry.configureScope { scope -> scope.level = SentryLevel.WARNING } Sentry.captureException(e) }

(Well, and I'd probably make configureScope a bit less cumbersome too, like:

Sentry.pushScope { Sentry.currentScope.level = SentryLevel.WARNING Sentry.captureException(e) }

)

This should immediately mean that we start reporting to Sentry when an exception propagates uncaught and crashes the app. It also means we can start explicitly logging things to Sentry when something goes wrong that we do anticipate and catch. It's actually already possible to invoke Sentry, but it only works when our JS code has already been started up, for example in a method exposed to JS with `@ReactMethod`. We want to use it from places like the notifications code, which can run without starting JS. Tested manually by following this comment's instructions with a DSN pointing to a test "project" in Sentry (as well as the existing instructions in `src/sentryConfig.js`), and inserting (into `MainActivity#onCreate`) both a Sentry warning and then just an uncaught exception. Both were successfully reported to Sentry, appearing in the test project's dashboard. There is an asterisk: when I tried inducing an uncaught exception inside a `@ReactMethod` method, I got (in a debug build) a red-screen modal as expected, but no report to Sentry. That seems like a Sentry bug. But it's a low-priority issue for us, in that AFAIK we've never had an issue that would cause one of those. Moreover, there looks to be a report of it in the Sentry tracker, which says it's specific to debug -- that it works fine in a release build. Discussion here: zulip#4996 (comment)

Like `Sentry.captureException`, but at warning level.

The one for malformed FCM messages may be especially informative as we start treating FCM messages from very old servers (that lack important pieces of information) as malformed. More generally this covers all of our call sites to `Log.w` and above. We should use `Zlog.w` and `Zlog.e` in place of those for any future call sites, too -- the `Log` call gets it to the device log, and that could be quite handy for development, but for issues encountered in a published build the Sentry call will get it to our Sentry dashboard, which is much more likely to be helpful. This change drops the message strings that were at each of these specific call sites. * In Sentry, there isn't a good place to put those -- the exception type and message are the most salient pieces of information -- and because that's our most valuable channel for debugging, it didn't seem like a good idea to have the `ZLog` API accept a message that we weren't going to be able to make good use of in Sentry. That's why we drop them. * In the device log, passing a throwable to `Log.*` causes the throwable's class name and message to get printed, as well as the stack trace. In each of these cases, that should cover all the information that was in these messages anyway. In particular, the most informative of these messages are really just restating the name of the class that we already know the exception is an instance of. So we're not losing anything material by dropping them, even when reading the device log rather than Sentry.

gnprice added a-Android a-tools P1 high-priority labels Sep 8, 2021

gnprice requested a review from chrisbobbe September 8, 2021 23:34

gnprice mentioned this pull request Sep 8, 2021

notif: Require sender_id, server, and realm_uri, rejecting ancient servers' data #4967

Merged

gnprice force-pushed the pr-sentry-android branch from e5d3e75 to eb20490 Compare September 8, 2021 23:39

WesleyAC reviewed Sep 9, 2021

View reviewed changes

gnprice force-pushed the pr-sentry-android branch from eb20490 to 31dd498 Compare September 10, 2021 00:11

chrisbobbe approved these changes Sep 10, 2021

View reviewed changes

gnprice and others added 4 commits September 10, 2021 16:06

android: Add "warnException" Sentry helper.

abce92b

Like `Sentry.captureException`, but at warning level.

android: Add ZLog methods, like android.util.Log but plus Sentry.

da28998

gnprice force-pushed the pr-sentry-android branch from 31dd498 to ab3ab62 Compare September 10, 2021 23:06

gnprice merged commit ab3ab62 into zulip:main Sep 10, 2021

gnprice deleted the pr-sentry-android branch September 10, 2021 23:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set up Sentry for Android-native code #4996

Set up Sentry for Android-native code #4996

gnprice commented Sep 8, 2021

WesleyAC left a comment

gnprice commented Sep 9, 2021

gnprice commented Sep 9, 2021

gnprice commented Sep 9, 2021

chrisbobbe commented Sep 9, 2021

gnprice commented Sep 10, 2021

chrisbobbe left a comment

chrisbobbe Sep 10, 2021 •

edited

Loading

gnprice Sep 10, 2021

chrisbobbe Sep 10, 2021

gnprice Sep 10, 2021

Set up Sentry for Android-native code #4996

Set up Sentry for Android-native code #4996

Conversation

gnprice commented Sep 8, 2021

WesleyAC left a comment

Choose a reason for hiding this comment

gnprice commented Sep 9, 2021

gnprice commented Sep 9, 2021

gnprice commented Sep 9, 2021

chrisbobbe commented Sep 9, 2021

gnprice commented Sep 10, 2021

chrisbobbe left a comment

Choose a reason for hiding this comment

chrisbobbe Sep 10, 2021 • edited Loading

Choose a reason for hiding this comment

gnprice Sep 10, 2021

Choose a reason for hiding this comment

chrisbobbe Sep 10, 2021

Choose a reason for hiding this comment

gnprice Sep 10, 2021

Choose a reason for hiding this comment

chrisbobbe Sep 10, 2021 •

edited

Loading