Fix `handled` data and differentiate between unhandled errors and crashes #6073

lobsterkatie · 2022-10-27T15:40:37Z

Problem Statement

TL;DR: We send wrong data in the handled field, we conflate unhandled errors and crashes, and even if we fixed the former we'd still need to find effective ways to differentiate the latter from one another.

Stolen from my comment on getsentry/rfcs#10:

Note (just so the second part doesn't get missed): Frustratingly, the GH UI gives no indication of this, but this quoted comment "continues below the fold" (i.e., is scrollable).

This is decidedly broken in JS, in a number of different ways.

On the browser side of things:

True crashes (the kind that land you here: chrome://crash/) are pretty rare. That's fortunate, because at that point the JS engine has gone down in flames (taking Sentry along with it) so we have no good way to record what's happened. (There's no publicly-available native browser API to capture such an event the way you can with, say, a minidump.)

Nonetheless, we mark some errors as unhandled, specifically those which bubble up to the global handlers. While it's true they haven't been handled (unless the user has add a global onerror handler themselves, which is not something we currently try to detect), they're also not crashes. (They show up in release health metrics that way, though.)

Errors which are caught by our auto-instrumentation (1, 2, 3, 4, 5) are marked as handled, even though they have not, in fact, been handled by the user.

Events which are handled by the user are correctly marked as being handled.

So the data is wrong in many cases, but it's also kind of arbitrary and divorced from reality, in both directions: It's totally possible for some browser extension the user has installed to be blowing up the console with unhandled errors, causing all of the sessions in the main web app to be marked as crashed, even though the user is blissfully unaware that anything's wrong (and the erroring code has nothing to do with the app integrating sentry). It's also possible for, say, a click handler to error, the site to therefore become unresponsive to at least some user interactions, and for our auto-instrumentation to catch it and mark it handled (and therefore a healthy session), even though the user's experience is of an app which has frozen up.

On the node side, we have both the same (too much handled) and the opposite (too much unhandled) problem:

True crashes (anything which causes the main process to exit) do happen, but only sometimes trigger the beforeExit event. We also sometimes purposefully exit the process.

Similar to the browser SDK, the node SDK marks unhandled errors and rejections which make it to the global handler as unhandled, but depending on Sentry.init settings, this may or may not result in the process being force-exited. Also similar to browser-land, all other errors are marked handled, whether they were caught by the user or by auto-instrumentation.

If the node SDK is being run inside the serverless SDK, though, all errors are considered unhandled, regardless of whether or not the user catches the error.

Ultimately, there are really three problems/challenges/considerations/whatever that we're dealing with:

We'd like some way for the data model to distinguish between true crashes and unhandled errors (the main subject of this RFC).

As has been alluded to in other comments, making changes here has consequences for both the UI (the red 💀 badge) and downstream data (session statuses and the resulting crash-free rate for a given release).

We should be emitting values which better reflect reality (by itself not a hard change, but one which needs the above two issues to get figured out first).

As a result, anything we do is going to have to involve not only SDK folks but also product folks (together making a decision) and then design/UI folks, and possibly ingest folks as well (in order to fully implement any changes).

Relevant issues:
#5408
#5375

Other relevant bits of that thread:
getsentry/rfcs#10 (comment)
getsentry/rfcs#10 (comment)

Solution Brainstorm

There are a few separate but intertwined threads of what has to happen to really fix this:

We need the sentry server and UI to be able to handle three options rather than two.
- Takes resources outside of our team and therefore commitment to actually make this better from a Decider™.
- In the meantime, we agreed to start sending the correct data somewhere else in the issue, so we can run analytics on just how wrong things are, and so whatever future team picks up that side of things will have ready-made data to play with.
We need to correct our blatantly incorrect booleans
- Browser
- Node
- Other frameworks TBA
We need to find effective ways to differentiate between unhandled errors which don't really block users and ones which actually break stuff (the closest we're going to get to actual "crashes" in browser-land, at least until a real minidump-like API is introduced for browser tabs - for obvious reasons this doesn't count).
- Interesting ideas about midway through this deck on how to tell which errors negatively affect users: https://speakerdeck.com/bluesmoon/1000-javascript-errors-are-as-good-as-none
- Is this a place where session replay could help?

Note: As per getsentry/sentry-dart#1116 (comment), we should coordinate with the mobile folks when we actually start working on this. EDIT: Turns out they can already do the data gathering for the dart SDK. Might nice to try sending correct data from RN, though.

The text was updated successfully, but these errors were encountered:

lobsterkatie · 2022-12-06T05:38:15Z

We agreed that as a first step, we'd send better data in a different spot, so a) we can compare it to the bad data to see how often we're wrong, and b) we can give whoever might end up working on this eventually correct data to play with.

Tasks for this first step:

Figure out where to send it (right now, relay doesn't seem to be accepting other or meta)
Add correct value to existing instrumentation with mechanisms
Add mechanisms where they don't exist, and include correct data (replay, vue, serverless, etc)
Make sure all internal use of captureException comes from @sentry/core (ref(core): Switch all internal uses of captureException to use @sentry/core #6488)
Fix mechanism for manually-caught errors

Lms24 · 2022-12-21T15:05:44Z

heads-up @mydea just assigning you b/c you've taken a look at the PR. Doesn't mean we have to continue with this.

taj-p · 2023-08-18T00:07:15Z

Hey all!

I recently had to take a look at Sentry source code for an unrelated issue and to my surprise I discovered that handled: true tags do not mean that an error is handled as pointed out in this issue:

Errors which are caught by our auto-instrumentation (1, 2, 3, 4, 5) are marked as handled, even though they have not, in fact, been handled by the user.

When triaging errors, we focus our efforts first on unhandled errors before considering other issues that are handled (since, nominally, these have been correctly handled from the POV of the user).

I noticed there were a couple PRs that started to address this but have since been closed. Am I misunderstanding the expected semantics for the handled tag? Does handled: true mean that the error has been logged by some mechanism in Sentry prior to global event listeners and not that is has been handled from application code?

Lms24 · 2023-08-23T07:48:32Z

Hi @taj-p thanks for reminding us about this. We had to abandon changing behaviour here due to prioritization of other projects but I this again in our team and we talked about going with a pragmatic solution here - basically a subset of the proposed changes from the RC:

Set handled: true by default
make all our internal integrations/places where we capture errors report handled: false

This way, when users manually call captureException their errors are considered handled. When we capture an error, it's considered unhandled.

We're not gonna get to this this week but I'll bring it up in planning for next week(s). Can't promise any ETA though.

Lms24 · 2023-08-30T10:59:28Z

As can be seen by the linked PRs above, we had to adjust quite a few of our internal handlers/wrappers to always mark errors caught by us (i.e. automatically from the SDK) as unhandled. Here's a list for future reference:

@sentry/browser
- feat(browser): Mark errors caught from TryCatch as unhandled #8890
- feat(browser): Mark errors caught in wrap as unhandled #8898
- this will affect all SDKs inheriting from Browser (Angular, Vue, React, Svelte, etc)
@sentry/react
- feat(react): Mark errors captured from ErrorBoundary as unhandled #8914
@sentry/vue
- feat(vue): Mark errors caught in Vue wrappers as unhandled #8905
@sentry/nextjs
- feat(nextjs): Mark errors caught from NextJS wrappers as unhandled #8893
@sentry/remix
- feat(remix): Mark errors caught from Remix instrumentation as unhandled #8894
@sentry/serverless
- feat(serverless): Mark errors caught in Serverless handlers as unhandled #8907
@sentry/integrations
- feat(integrations): Mark errors caught from HttpClient and CaptureConsole integrations as unhandled #8891

…8893) Mark errors caught in * `captureUnderscoreException` * `wrapApiHandlerWithSentry` * `callDataFetcherTraced` * `withErrorInstrumentation` * `wrapServerComponentWithSentry` as unhandled. For more details, see #8890, #6073 Co-authored-by: Luca Forstner <luca.forstner@sentry.io>

…ed (#8894) Mark errors caught from remix instrumentation as unhandled: * `captureRemixErrorBoundaryError` * `captureRemixServerException` For more details see #8890, #6073

…Console` integrations as unhandled (#8891) Mark errors caught from optional integrations * `HttpClient` * `CaptureConsole` as unhandled. For more details see #8890, #6073

Lms24 · 2023-09-05T09:47:38Z

I'm going to close this issue as we're releasing the changes listed above as I'm typing this.
If you have concrete issues around specific mechanism adjustments please open a new issue. If you have general comments, feel free to comment here.

lobsterkatie added the Type: Improvement label Oct 27, 2022

lobsterkatie mentioned this issue Oct 27, 2022

Introduce additional types of exceptions next to mechanism.handled exceptions getsentry/rfcs#10

Open

lobsterkatie added Feature: Errors Status: Backlog Package: browser Issues related to the Sentry Browser SDK Package: node Issues related to the Sentry Node SDK labels Oct 27, 2022

lobsterkatie self-assigned this Oct 27, 2022

ueman mentioned this issue Nov 8, 2022

Fix handled data and differentiate between unhandled errors and crashes getsentry/sentry-dart#1116

Closed

lobsterkatie mentioned this issue Dec 9, 2022

ref(core): Switch all internal uses of captureException to use @sentry/core #6488

Closed

mydea mentioned this issue Dec 15, 2022

ref(core): Ensure we use captureException() from @sentry/core where possible #6546

Closed

Lms24 assigned mydea and unassigned lobsterkatie Dec 21, 2022

hubertdeng123 removed the Status: Backlog label Jul 26, 2023

getsantry bot added the Waiting for: Product Owner label Aug 18, 2023

getsantry bot added this to GitHub Issues with 👀 Aug 18, 2023

getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 Aug 18, 2023

getsantry bot removed the Waiting for: Product Owner label Aug 23, 2023

getsantry bot removed the status in GitHub Issues with 👀 Aug 23, 2023

Lms24 assigned Lms24 and unassigned mydea Aug 29, 2023

Lms24 mentioned this issue Aug 30, 2023

feat(serverless): Mark errors caught in Serverless handlers as unhandled #8907

Merged

Lms24 mentioned this issue Aug 30, 2023

feat(react): Mark errors captured from ErrorBoundary as unhandled #8914

Merged

Lms24 closed this as completed Sep 5, 2023

Lms24 mentioned this issue Mar 5, 2024

ErrorBoundary errors are being incorrectly marked as unhandled since 5.9.2 #10985

Closed

11 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix `handled` data and differentiate between unhandled errors and crashes #6073

Fix `handled` data and differentiate between unhandled errors and crashes #6073

lobsterkatie commented Oct 27, 2022 •

edited

Loading

lobsterkatie commented Dec 6, 2022 •

edited

Loading

Lms24 commented Dec 21, 2022

taj-p commented Aug 18, 2023 •

edited

Loading

Lms24 commented Aug 23, 2023

Lms24 commented Aug 30, 2023 •

edited

Loading

Lms24 commented Sep 5, 2023

Fix handled data and differentiate between unhandled errors and crashes #6073

Fix handled data and differentiate between unhandled errors and crashes #6073

Comments

lobsterkatie commented Oct 27, 2022 • edited Loading

Problem Statement

Solution Brainstorm

lobsterkatie commented Dec 6, 2022 • edited Loading

Lms24 commented Dec 21, 2022

taj-p commented Aug 18, 2023 • edited Loading

Lms24 commented Aug 23, 2023

Lms24 commented Aug 30, 2023 • edited Loading

Lms24 commented Sep 5, 2023

Fix `handled` data and differentiate between unhandled errors and crashes #6073

Fix `handled` data and differentiate between unhandled errors and crashes #6073

lobsterkatie commented Oct 27, 2022 •

edited

Loading

lobsterkatie commented Dec 6, 2022 •

edited

Loading

taj-p commented Aug 18, 2023 •

edited

Loading

Lms24 commented Aug 30, 2023 •

edited

Loading