Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

domains: fix issues with abort on uncaught #8666

Closed
wants to merge 2 commits into from

Conversation

misterdjules
Copy link

Do not abort the process if an error is thrown from within a domain, an
error handler is setup for the domain and --abort-on-uncaught-exception
was passed on the command line.

However, if an error is thrown from within the top-level domain's error
handler and --abort-on-uncaught-exception was passed on the command
line, make the process abort.

Fixes #8631. Also fixes #8630.

@misterdjules misterdjules force-pushed the fix-issue-8631 branch 2 times, most recently from 82a8d75 to fd61ee3 Compare November 3, 2014 20:33
@misterdjules
Copy link
Author

Tested on Linux, MacOS, SmartOS and Windows.

Initial pre-review done by @tjfontaine, cc @trevnorris @dap.

@misterdjules
Copy link
Author

Sorry @dap, I meant @davepacheco

@trevnorris
Copy link

Wow, this is a massive piece of work. Thanks for all the effort.

There are some issues. Like, we're not going to float a V8 patch for better domain support. Even more so since this is for the v0.10 branch.

@tjfontaine
Copy link

I think we need to float this patch, as it is today, people who are using domains heavily, but also want to be able to use --abort-on-uncaught-exception cannot do that. This patch to V8 is pretty minimal and doesn't change the ultimate behavior of V8.

We're obviously not going to be able to upstream this particular change to V8 as it's not the most appropriate, it is fairly reasonable for the V0.10 branch.

The upstream change to V8 is probably to remove the need for V8 to know about --abort-on-uncaught-exception altogether and instead upstream a more generic solution like SetUncaughtExceptionHandler that logically happens in the same location which would allow the embedder to determine the semantics of where and when it can/should abort.

@tjfontaine
Copy link

@geek would you like to comment on this?

@geek
Copy link
Member

geek commented Nov 5, 2014

@trevnorris without this patch you have to take your pick between post mortem debugging with core files and being able to use domains.

@trevnorris
Copy link

@tjfontaine @geek While this might fix the individual issue for v0.10, don't forget it still won't work for all cases in v0.12. V8's Promises will swallow errors, not allowing them to propagate to a domain.

@geek
Copy link
Member

geek commented Nov 6, 2014

@trevnorris good point.

@tjfontaine can we get this merged for 0.10.34!?!?

@geek
Copy link
Member

geek commented Nov 13, 2014

@tjfontaine or @trevnorris can this get merged soon?

@chrisdickinson
Copy link

This LGTM for v0.10, and floating a patch seems like a necessary evil given the nature of the problem ("domains or postmortem debugging, pick one"). My questions:

  • My assumption is that we're not planning any V8 upgrades to v0.10, given v0.12 is coming out soon -- is this correct?
  • Do we know what the timeframe is for V8 to make the necessary upstream changes to enable SetUncaughtExceptionHandler?
  • Is there a ticket link for that work?
  • If that works makes it into V8 in a timely fashion, is v0.12 in a state where we can either upgrade V8 to the necessary version, or backport that change?

In short, I could see this problem blocking v0.12; but this patch (assuming we're not planning on upgrading v0.10's V8 in the future) seems like a sane solution.

@trevnorris
Copy link

Node already uses v8::V8::SetFatalErrorHandler() to catch uncaught exceptions, and I'm not sure why we can't just tie into this.

@trevnorris
Copy link

Additional info from a chat on IRC:

<trevnorris> V8::SetFatalErrorHandler() exists in v0.10.
<trevnorris> it is the reason process._fatalException() gets called.
<trevnorris> simply checking a runtime flag and aborting() the moment SetFatalErrorHandle() would be enough.
<trevnorris> chrisdickinson: it might also be good if we fprintf(stderr) information about the failure, a lot like what V8 does in a Debug build.
<trevnorris> i.e. print out the stack trace, etc.

I.E. There's no reason to add anything to V8.

@chrisdickinson
Copy link

Node already uses v8::V8::SetFatalErrorHandler() to catch uncaught exceptions, and I'm not sure why we can't just tie into this.

Re: SetFatalErrorHandler does not handle uncaught exceptions. The code that uses the error handler set there is only called by FatalProcessOutOfMemory, ReportApiFailure, ReportV8Dead, and ReportEmptyHandle. These are not used to report unhandled exceptions, which are handled by the Isolate::DoThrow method.


To summarize the behaviors we're examining (please point out any mistakes you see here!):

The current behavior is:

Any time an exception propagates to the top of the JS stack, the process aborts. If there are uncaughtException listeners, or domains active, they will not be run. The core dump includes JS frames, usable with mdb.

This patch's behavior:

Domains will be run if active, uncaughtException listeners will as well. The process will abort if the last domain emits an unhandled error. The core file contains JS frames, usable from mdb. If domains are active, these JS frames will be low-value, because they'll represent domain handling frames, not the frames that generated the exception. If domains are inactive, the JS frames will represent the code that threw the exception.


The proposed fix from IRC, as I understand it, is as follows:

Check for --abort-on-uncaught-exceptions in node::FatalException after running the fatal exception handler. If the flag is true, and either the TryCatch has caught or the return value of the handler is false, run ReportException, then abort(). If desired, uncaughtException handlers may be skipped by modifying process._fatalException to return instead of re-running -- per:

[14:54:15]  <chrisdickinson>    jgi: and your proposed patch doesn't exactly restore the status quo so much as it improves it -- only aborting if the final domain throws?
[14:55:14]  <jgi>   chrisdickinson: yes, the proposed patch basically allows domains to work if —abort-on-uncaught-exception is passed to node
[14:55:28]  <jgi>   chrisdickinson: if the top level domain throws, node will abort and dump core
[14:55:33]  <trevnorris>    chrisdickinson: that can be done by returning immediately from https://github.com/joyent/node/blob/v0.10/src/node.js#L264-L268
[14:55:45]  <trevnorris>    that will bypass any uncaughtException callbacks

This approach's behavior:

If domains are active, they will be run. If the last domain emits an unhandled error, the process will abort.
Uncaught exception handlers may or may not be run, depending on whether the above node.js modification is made. Core dumps from these aborts will not contain JS frames, regardless of whether domains are or are not active.

EDIT:

  1. Re: promises, it seems like promises and domains will forever be mutually exclusive with respects to error handling -- promises won't throw exceptions for domains to handle.
  2. @misterdjules helpfully pointed out that the issue wasn't the flag being removed, rather that the flag was causing V8 to abort() too eagerly.

@tjfontaine
Copy link

There are unfortunately two issues being solved by this fix, mostly because it was difficult to separate them. I will enumerate the two problems such that's more clear what's happening.

Problem 1: code explicitly require('domain') and sets an error handler for a domain, node is run with --abort-on-uncaught-exception code path throws, node immediately aborts without running any domain error handlers

Problem 2: if the last error handler in a stack of domains throws node reports (to stderr) the original error that was propagating through the domain stack and not the error that actually threw.

The two problems are not actually related, to clarify what we're trying to solve for in problem 1 is that code that is not covered by a domain in a process that is run with --abort-on-uncaught-exception should still throw, and result in a core file. To accommodate that we must change v8 to call back into node so we can know the state of domains.

While testing it, it's easiest to achieve that by simply throwing in the last error handler in the stack, which is a case that is clearly broken by re-throwing the wrong error object. For people doing post-mortem debugging (mdb, lldb, gdb -- whatever) it's not as important, because what's important for them is that the stack is preserved not about what error object is preserved.

However for those who may hit this issue, where the last domain handler is throwing, and you're trying to debug without post-mortem help, it becomes difficult to understand the behavior because you're seeing the wrong error output on stderr. As a concrete example of this -- try debugging the crypto domain test transient failure that I fixed in: 2afa3d8 without also having this change.

@tjfontaine
Copy link

Also, to be clear, the abort must happen as triggered from the isolate and can never work in the catch block for a trycatch, because by definition the stack has already been unwound and state changed at that point.

The point of aborting here is to preserve as much of the "crime scene" as possible, where debugging your application is actually much harder than simply looking at the Error object's message and stack trace.

@trevnorris
Copy link

Here's a fugly patch that covers all the cases I can think of, and doesn't require V8 to be patched: https://gist.github.com/trevnorris/be1dee9d3fb4bc467b1c

EDIT: I was lazy about it, and there are two tests that fail but doesn't look like anything that can't be worked out.

@tjfontaine
Copy link

There are two kinds of exception handling that can exist in the V8 world.

  • internal try catch'es (internal to the vm)
    • What you're used to writing as try {} catch (e) {}
  • external try catches (external to the vm)
    • What you use from C++, written as TryCatch try_catch;

Because you can have nested exception handling, v8 keeps track of this by inspecting the call stack to decide if an exception "can be caught" and what type of exception handler is being used to catch it.

When --abort-on-uncaught-exception is set in v8, the runtime will abort on one of two conditions:

  • The exception cannot be caught
    • That is to say, there are no exception handlers in the call stack
  • The exception will be caught by an external exception handler

The purpose of --abort-on-uncaught-exception is to provide functionality equivalent to the behavior of a segmentation fault in your traditional C or C++ application. In order to provide that functionality, it must abort before any of the call stack was unwound.

So currently if the most recent try catch handler pushed onto the call stack is an external try catch, v8 will trigger the abort.

  internal
    external
      throw();
      // will abort

  internal
    throw();
    // will not abort

What we want to happen is for v8 to only abort in this case if and only if there is no active domain.

There are only two ways to try and handle this:

  1. Ensure that all JavaScript functions that are associated with domains, are always wrapped first with internal exception handlers
  2. Modify V8 to ask the embedder if it should indeed abort

The first solution is attractive, which is what your change does, but is not sufficient. Consider that the code may call into C++, and that code may put an external handler on the stack. If the code beyond that then throws, it will also abort despite being covered by a domain.

The only way to guarantee this for all code paths, with the least amount of impact for those involved, is to modify v8 such that it's asking the embedder if it should abort.

@misterdjules
Copy link
Author

I just pushed a new commit that does two things:

  • It prevents test-domain-with-abort-on-uncaught-exception.js from dumping core to avoid taking too much disk space on continuous testing and developers' machines. Thanks for the suggestion @trevnorris!
  • It throws errors from different types of callbacks to make sure that we cover most (if not all) of them.

I haven't squashed it yet so that we can review this addition separately. I will be glad to squash these commits into one if we feel that they should both be landed.

@@ -1534,7 +1546,8 @@ Isolate::Isolate()
date_cache_(NULL),
context_exit_happened_(false),
deferred_handles_head_(NULL),
optimizing_compiler_thread_(this) {
optimizing_compiler_thread_(this),
abort_on_uncaught_exception_callback_(0) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: change to NULL.

Julien Gilli added 2 commits November 18, 2014 16:05
Do not abort the process if an error is thrown from within a domain, an
error handler is setup for the domain and --abort-on-uncaught-exception
was passed on the command line.

However, if an error is thrown from within the top-level domain's error
handler and --abort-on-uncaught-exception was passed on the command
line, make the process abort.

Fixes nodejs#8631. Also fixes nodejs#8630.
@misterdjules
Copy link
Author

@trevnorris Updated the changes according to your comments, thanks again!

@trevnorris
Copy link

Thanks. Landed in fbff705 and caeb677.

trevnorris added a commit that referenced this pull request Nov 19, 2014
Add v8::Isolate::SetAbortOnUncaughtException() so the user can be
notified when an uncaught exception has bubbled.

PR-URL: #8666
Reviewed-by: Trevor Norris <trev.norris@gmail.com>
misterdjules pushed a commit that referenced this pull request Nov 19, 2014
Do not abort the process if an error is thrown from within a domain, an
error handler is setup for the domain and --abort-on-uncaught-exception
was passed on the command line.

However, if an error is thrown from within the top-level domain's error
handler and --abort-on-uncaught-exception was passed on the command
line, make the process abort.

Fixes: #8631
Fixes: #8630
PR-URL: #8666
Reviewed-by: Trevor Norris <trev.norris@gmail.com>
@trevnorris trevnorris closed this Nov 19, 2014
richardlau referenced this pull request in ibmruntimes/node Dec 22, 2014
Add v8::Isolate::SetAbortOnUncaughtException() so the user can be
notified when an uncaught exception has bubbled.

PR-URL: joyent#8666
Reviewed-by: Trevor Norris <trev.norris@gmail.com>

Port fbff705 to deps/v8ppc and deps/v8z
mscdex pushed a commit to mscdex/node that referenced this pull request Dec 25, 2014
Add v8::Isolate::SetAbortOnUncaughtException() so the user can be
notified when an uncaught exception has bubbled.

PR-URL: nodejs#8666
Reviewed-by: Trevor Norris <trev.norris@gmail.com>
mscdex pushed a commit to mscdex/node that referenced this pull request Dec 25, 2014
Do not abort the process if an error is thrown from within a domain, an
error handler is setup for the domain and --abort-on-uncaught-exception
was passed on the command line.

However, if an error is thrown from within the top-level domain's error
handler and --abort-on-uncaught-exception was passed on the command
line, make the process abort.

Fixes: nodejs#8631
Fixes: nodejs#8630
PR-URL: nodejs#8666
Reviewed-by: Trevor Norris <trev.norris@gmail.com>
@chrisdickinson chrisdickinson mentioned this pull request Jan 12, 2015
26 tasks
jeroen pushed a commit to v8-314/v8 that referenced this pull request Jun 30, 2016
Add v8::Isolate::SetAbortOnUncaughtException() so the user can be
notified when an uncaught exception has bubbled.

PR-URL: nodejs/node-v0.x-archive#8666
Reviewed-by: Trevor Norris <trev.norris@gmail.com>

Signed-off-by: Jeroen Ooms <jeroenooms@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants