Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set up Sentry for Android-native code #4996

Merged
merged 4 commits into from
Sep 10, 2021
Merged

Conversation

gnprice
Copy link
Member

@gnprice gnprice commented Sep 8, 2021

This means we'll start seeing reports in Sentry of both uncaught exceptions, and error or warning cases we explicitly catch, in our Kotlin code just as we do already in our RN-based JS code.

Copy link
Contributor

@WesleyAC WesleyAC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do the Log.w in the helper, rather than manually. Other than that, looks good!

@gnprice
Copy link
Member Author

gnprice commented Sep 9, 2021

Thanks for the review! Yeah, a good improvement to this code would be to unify things so that each error handler only has to make one call instead of two. Our logging.js functions provide something a lot like that. I may try making that change.

@chrisbobbe also pointed out (in the office yesterday) that we already have been getting some Sentry reports that look like they come from the Android-native side. For example here's one with a Java stack trace:
https://sentry.io/organizations/zulip/issues/1964700767/events/71cabef082ab476b9d55d6d11b2393f0/
And here's one reporting a segfault in libjsc, the JS implementation that RN supplies on Android:
https://sentry.io/organizations/zulip/issues/1968181388/events/1e76508e14aa44b5dd9686ac30fdc037/

I haven't actually done a negative test of this: I didn't try reporting a warning, or inducing an exception, before this "Enable Sentry" commit. I suspect what's happening is:

  • In the existing code, once we initialize Sentry from the JS side, the Sentry RN implementation also goes and initializes the Sentry Android or Sentry iOS implementation. That'd mean we're already covered… as long as the JS has already started running.
  • In order to have Sentry working when we're processing an incoming notification, though, as we want for notif: Require sender_id, server, and realm_uri, rejecting ancient servers' data #4967, we'll still need to make this change.
  • We probably also still need to initialize Sentry on the JS side (even on Android), in order to have it working there. Hopefully the redundancy is harmless.

I'll do some investigation to confirm.

@gnprice
Copy link
Member Author

gnprice commented Sep 9, 2021

OK, some findings:

  • If I add a DSN in sentryConfig.js but not in AndroidManifest.xml, so that we initialize Sentry from JS only -- just as we do in the status quo for published releases -- then:
    • Sentry.captureMessage in Java/Kotlin works fine after the JS has started up. In particular I added it in TextCompressionModule#decompress, and it showed up.
    • OTOH it has no effect before the JS has started up. In particular I tried MainActivity#onCreate.
    • An uncaught JVM exception doesn't get reported to Sentry, even after the JS has started up. I tried the same places as for Sentry.captureMessage.
  • OTOH if I add a DSN in both places, following the instructions in this branch, then:
    • Sentry.captureMessage in Java/Kotlin now works fine both before and after the JS has started up.

    • An uncaught exception from MainActivity#onCreate now gets reported just fine.

    • … Hmm, but an uncaught exception in TextCompressionModule#decompress still doesn't!

      So that seems like a limitation of the Sentry library for React Native. That method is one that's exposed to JS via the React Native API, with the @ReactMethod annotation. An uncaught exception there causes a red screen in dev; in prod I expect it would cause a crash.

      In any case, that's not the situation that this PR is concerned with. AFAIK we've never had a bug of this form (and an uncaught exception in one of these methods is always a bug, in the method or something it calls -- exceptions should be turned into promise rejections, in order to propagate properly to the JS caller), so it's not a priority.

@gnprice
Copy link
Member Author

gnprice commented Sep 9, 2021

OTOH if I add a DSN in both places, following the instructions in this branch, then: […]

  • An uncaught exception from MainActivity#onCreate now gets reported just fine.

  • … Hmm, but an uncaught exception in TextCompressionModule#decompress still doesn't!

    So that seems like a limitation of the Sentry library for React Native. That method is one that's exposed to JS via the React Native API, with the @ReactMethod annotation. An uncaught exception there causes a red screen in dev; in prod I expect it would cause a crash.

Looks like this is probably getsentry/sentry-react-native#1102 . The person reporting that issue says it was limited to a debug build, and worked fine in a release build. (Easy to imagine how that could come about, with the red-screen mechanism interfering somehow.)

If that's the case then this is… still an annoying Sentry bug that they should really fix, because it causes confusion for developers, but it doesn't have a direct effect on error reporting in the field, which is nice.

@chrisbobbe
Copy link
Contributor

Great, thanks for that investigation! I'm satisfied with it.

@gnprice
Copy link
Member Author

gnprice commented Sep 10, 2021

Pushed a revision with Log.e and Log.w folded inside new wrappers ZLog.e and ZLog.w that cover both the device log and Sentry, as well as with commit-message edits to reflect that investigation. Please take a look!

Copy link
Contributor

@chrisbobbe chrisbobbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! Feel free to merge after seeing a small thing or two below.

@@ -87,5 +87,12 @@
<action android:name="com.google.firebase.MESSAGING_EVENT" />
</intent-filter>
</service>

Copy link
Contributor

@chrisbobbe chrisbobbe Sep 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(IIUC)

I think it makes sense that getsentry/sentry-react-native#1102 would be the issue with TextCompressionModule#decompress, and that that report would be accurate about the problem being debug-mode-only. But I don't think we're 100% sure, right?

This should immediately mean that we start reporting to Sentry when
an exception propagates uncaught and crashes the app.

So I wonder if this deserves an asterisk on the word "should", just in case we find an uncaught exception not being reported to Sentry in the future? Maybe just a link to #4996 (comment).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure; added a note about that.

public fun warnException(e: Throwable) {
Sentry.withScope { scope ->
scope.level = SentryLevel.WARNING
Sentry.captureException(e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I think I'd expect that you'd have to pass the scope param somewhere in the Sentry.captureException call. But I guess you don't: https://docs.sentry.io/platforms/java/enriching-events/scopes/#local-scopes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I think this API actually makes more sense than it would if you did have to pass it: basically the point is that there's always a current scope (there has to be, to handle uncaught exceptions where you don't explicitly invoke Sentry at all), and so Sentry.captureException uses that.

The withScope function is sort of doing two things at once: it pushes a new scope onto the scope stack (and undertakes to pop it at the end), and then it passes you the scope much like Sentry.configureScope does. I think that's a bit confusing, and if I were designing this API, I'd probably separate those, so that you'd write:

Sentry.pushScope {
  Sentry.configureScope { scope -> scope.level = SentryLevel.WARNING }
  Sentry.captureException(e)
}

(Well, and I'd probably make configureScope a bit less cumbersome too, like:

Sentry.pushScope {
  Sentry.currentScope.level = SentryLevel.WARNING
  Sentry.captureException(e)
}

)

gnprice and others added 4 commits September 10, 2021 16:06
This should immediately mean that we start reporting to Sentry when
an exception propagates uncaught and crashes the app.  It also means
we can start explicitly logging things to Sentry when something goes
wrong that we do anticipate and catch.

It's actually already possible to invoke Sentry, but it only works
when our JS code has already been started up, for example in a method
exposed to JS with `@ReactMethod`.  We want to use it from places like
the notifications code, which can run without starting JS.

Tested manually by following this comment's instructions with a DSN
pointing to a test "project" in Sentry (as well as the existing
instructions in `src/sentryConfig.js`), and inserting (into
`MainActivity#onCreate`) both a Sentry warning and then just an
uncaught exception.  Both were successfully reported to Sentry,
appearing in the test project's dashboard.

There is an asterisk: when I tried inducing an uncaught exception
inside a `@ReactMethod` method, I got (in a debug build) a red-screen
modal as expected, but no report to Sentry.  That seems like a Sentry
bug.  But it's a low-priority issue for us, in that AFAIK we've never
had an issue that would cause one of those.  Moreover, there looks to
be a report of it in the Sentry tracker, which says it's specific to
debug -- that it works fine in a release build.  Discussion here:
  zulip#4996 (comment)
Like `Sentry.captureException`, but at warning level.
The one for malformed FCM messages may be especially informative
as we start treating FCM messages from very old servers (that lack
important pieces of information) as malformed.

More generally this covers all of our call sites to `Log.w` and
above.  We should use `Zlog.w` and `Zlog.e` in place of those for
any future call sites, too -- the `Log` call gets it to the device
log, and that could be quite handy for development, but for issues
encountered in a published build the Sentry call will get it to
our Sentry dashboard, which is much more likely to be helpful.

This change drops the message strings that were at each of these
specific call sites.

 * In Sentry, there isn't a good place to put those -- the exception
   type and message are the most salient pieces of information -- and
   because that's our most valuable channel for debugging, it didn't
   seem like a good idea to have the `ZLog` API accept a message that
   we weren't going to be able to make good use of in Sentry.  That's
   why we drop them.

 * In the device log, passing a throwable to `Log.*` causes the
   throwable's class name and message to get printed, as well as the
   stack trace.  In each of these cases, that should cover all the
   information that was in these messages anyway.  In particular,
   the most informative of these messages are really just restating
   the name of the class that we already know the exception is an
   instance of.  So we're not losing anything material by dropping
   them, even when reading the device log rather than Sentry.
@gnprice gnprice merged commit ab3ab62 into zulip:main Sep 10, 2021
@gnprice gnprice deleted the pr-sentry-android branch September 10, 2021 23:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants