src: fix a crashing issue in Error::ThrowAsJavaScriptException #902

Closed

@ghost ghost commented Feb 7, 2021

When terminating an environment (e.g., by calling worker.terminate), napi_throw, which is called by Error::ThrowAsJavaScriptException, returns napi_pending_exception, which is then incorrectly treated as a fatal error resulting in a crash.

This is relatively easy to trigger with a native addon that runs a long synchronous operation, which then throws a JavaScript exception, but not before the environment has begun to terminate.

The other instances where errors are treated as fatal in node-addon-api are unlikely to cause similar issues. Unlike napi_throw, none of the respective napi functions use the NAPI_PREAMBLE macro (which returns napi_pending_exception in a terminating environment), and the few functions that handle V8 maybe types are tested with the test cases I added.
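
To make the failure mode concrete, here is a minimal, self-contained sketch. `MockEnv`, `MockThrow`, and the two `WouldAbort...` helpers are hypothetical stand-ins written for this illustration, not the real Node-API or node-addon-api code. They only model the behavior described above, where the throw call reports a pending exception once the environment can no longer call into JavaScript.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; NOT the real Node-API types.
enum class Status { Ok, PendingException };

struct MockEnv {
  bool can_call_into_js = true;  // becomes false once termination begins
  bool has_exception = false;    // a regular JavaScript exception is pending
};

// Models the described behavior of napi_throw: the preamble rejects the call
// when an exception is already pending or the environment is terminating.
Status MockThrow(MockEnv* env) {
  assert(env != nullptr);
  if (env->has_exception || !env->can_call_into_js) {
    return Status::PendingException;
  }
  env->has_exception = true;
  return Status::Ok;
}

// Models the crashing behavior: any failure is treated as fatal.
// Returns true where the real code would abort the process.
bool WouldAbortTreatingAllFailuresAsFatal(MockEnv* env) {
  return MockThrow(env) != Status::Ok;
}

// Models the idea of this PR: PendingException is tolerated because the
// environment may be terminating.
bool WouldAbortToleratingPendingException(MockEnv* env) {
  const Status status = MockThrow(env);
  return status != Status::Ok && status != Status::PendingException;
}
```

With `can_call_into_js` set to false, the first helper reports an abort while the second does not, which is the difference this change is after.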

@ghost
Author

ghost commented Feb 12, 2021

Any thoughts on this? :)

I'm new to Node.js, so I would appreciate any feedback you guys have. :)

I thought about creating a dedicated .cc for the test cases, but seeing how I would basically replicate all the relevant cases from error.cc, I chose to modify the existing ones instead. Not sure which is preferred here.

I also thought about testing node::Stop on the main environment (relevant for embedders), but since worker.terminate calls node::Stop eventually, I think it's safe to ignore it. It could be done by including node.h and then doing something like node::Stop(node::GetCurrentEnvironment(v8::Isolate::GetCurrent()->GetCurrentContext()));

@mhdawson
Member

I'm wondering if this should be fixed in napi_throw instead as I assume the same problem could occur in direct use of the API as well?

@ghost
Author

ghost commented Feb 14, 2021

@mhdawson Note that the behavior in napi_throw is caused by the NAPI_PREAMBLE macro, which is used widely throughout the API. Dozens of napi functions may trigger issues similar to this one, depending on how addons and node-addon-api handle the error code it returns.

Furthermore, I think the behavior itself is sound. As most operations will fail in a terminating environment, there must be a way to detect that in order to implement error handling correctly. In napi functions, this is done by checking (env)->can_call_into_js in the NAPI_PREAMBLE macro. In addons and node-addon-api, it is done by checking for napi_pending_exception.

The way regular and termination exceptions are combined into napi_pending_exception is really confusing, though, and I feel like they should be separated. Something like napi_environment_is_terminating would be quite self-explanatory. At the very least, there should be a visible mention of the behavior in the N-API documentation, which currently doesn't really seem to say anything about this.

Anyway. Regardless of what error code is used, I think it makes sense to have one, which is why I don't believe this is an issue with N-API, but with how node-addon-api handles the error code.

@ghost
Author

ghost commented Feb 16, 2021

@mhdawson Just to be clear, these are the options I'm seeing here:

  1. Do nothing and leave the crashing behavior in node-addon-api. Yikes.

  2. Modify napi_throw to return napi_ok even in a terminating environment. This could work for napi_throw, as it doesn't really affect the state in a meaningful way in this context, and you are expected to return to JavaScript right after calling it. The same cannot be said for the dozens of other napi functions using the NAPI_PREAMBLE macro, though. This would leave addons and node-addon-api vulnerable to this issue reappearing in another context.

  3. Modify NAPI_PREAMBLE macro to not react to a terminating environment. This would, I would imagine, make napi functions fail in a variety of ways and return unspecific error codes or maybe even crash. How would addons and node-addon-api know whether the returned error code is due to a terminating environment or due to something that should be interpreted as a programming error?

  4. Modify Error::ThrowAsJavaScriptException to handle the error code returned by napi_throw correctly and then modify N-API documentation to clarify that napi_pending_exception is used in a terminating environment as well. Optionally create a dedicated error code for it to make the behavior less confusing.

Is there something I'm not seeing here? Tell me what you think. Thanks. :)

@mhdawson
Member

@rudolftam thanks for the detailed response. I'll have to find a longer block of time to think it through.

@mhdawson
Member

mhdawson commented Feb 17, 2021

I've looked at this in a bit more detail but I'm not sure avoiding the fatal error is the right behavior. We have code that is not doing what the author asked/wanted it to do. Maybe handling that exception is important for cleanup, and so just ignoring it will lead to a leak every time a thread is terminated.

Terminating a running thread is a bit of a messy situation, but I don't see any way for the native module to know that it is terminating and to do the right thing.

@mhdawson
Member

I do agree that having napi_pending_exception returned when there is no exception, but instead things are shutting down is confusing. I can't remember why that made sense at the time.

@mhdawson
Member

mhdawson commented Feb 18, 2021

I don't see anything in the worker API that can let the code running in the Worker know that there was a request to terminate and try to do an orderly shutdown.

EDIT: the answer might be EnvironmentCleanupHooks https://nodejs.org/api/addons.html#addons_worker_support, https://nodejs.org/api/n-api.html#n_api_napi_add_env_cleanup_hook

If that hook gets run when the worker is terminated, the right answer might be that the addon should be written to avoid doing anything after that which might try to call into JavaScript.

In the test case described it would be good to find out if the cleanup hook runs before/after the Exception is thrown which causes the reported issue.

@mhdawson
Member

During discussion in the Node-API team meeting, @legendecas pointed out that Env::SetInstanceData can include a finalizer, which the addon could possibly use to avoid calling JS after termination, as this should give an indication of when the thread is terminating.

@ghost
Author

ghost commented Feb 20, 2021

@mhdawson I tested both napi_add_env_cleanup_hook and napi_set_instance_data, and in both cases, the callback is called from the same thread that runs the environment, meaning it won't be called in time to help with any of this.

If that were not the case, then they could indeed be used to improve error handling. Addons could use them to set a termination flag to be checked in between calls to N-API and node-addon-api. This still wouldn't prevent Error::ThrowAsJavaScriptException from crashing, though, if the execution happens to be inside it when the environment is terminated.
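
As a rough illustration of the termination-flag idea, and only for the hypothetical case where the hook could fire while the addon is still running, a sketch might look like this. `AddonState`, `OnEnvCleanup`, and `RunSteps` are names invented for this example. In a real addon the callback would be registered through napi_add_env_cleanup_hook, while here it is invoked directly because the sketch does not link against Node.js.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical addon-side state; not part of Node-API or node-addon-api.
struct AddonState {
  std::atomic<bool> terminating{false};
};

// Has the shape of a cleanup-hook callback, void(*)(void*).
// It only records that the environment is going away.
void OnEnvCleanup(void* arg) {
  assert(arg != nullptr);
  static_cast<AddonState*>(arg)->terminating.store(true);
}

// Runs up to `steps` units of work and checks the flag between units.
// `stop_at` simulates the hook firing just before that unit
// (a negative value means it never fires).
int RunSteps(AddonState& state, int steps, int stop_at) {
  int done = 0;
  for (int i = 0; i < steps; ++i) {
    if (i == stop_at) OnEnvCleanup(&state);
    if (state.terminating.load()) break;  // skip further calls into JS
    ++done;
  }
  return done;
}
```

As noted above, in my tests the real hooks ran on the environment's own thread, so the flag would not actually be set in time; the sketch only shows what the pattern would look like if it were.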

The next logical step to address that would be to use these hooks in node-addon-api as well, and that could work, but I think the proper way to fix this is at the source. N-API should use a dedicated error code for a terminating environment, making workarounds like these unnecessary. For that to work, though, there's yet another thing that may need attention. It looks to me as if N-API doesn't really consider the possibility that the environment can be terminated, not just before but during an N-API call.

Let's take napi_has_property as an example. It uses four macros that are affected by a terminating environment:

  1. NAPI_PREAMBLE, our old friend, returns napi_pending_exception when (env)->can_call_into_js evaluates to false.

  2. CHECK_TO_OBJECT returns napi_object_expected when it fails to convert the provided napi_value into v8::Local<v8::Object>.

  3. CHECK_MAYBE_NOTHING returns napi_generic_failure when the provided maybe is nothing.

  4. GET_RETURN_STATUS returns napi_pending_exception when an exception (either a regular JavaScript exception or a TerminateExecution exception) has been caught.

Note how there are three unique error codes in play here. In the second and third cases, the error code is returned when encountering a maybe value that is either empty or nothing. The problem here is that the values, I presume, will be such in a terminating environment as well.

Here's what https://github.com/nodejs/node/blob/master/deps/v8/include/v8.h says about it:

A simple Maybe type, representing an object which may or may not have a
value, see https://hackage.haskell.org/package/base/docs/Data-Maybe.html.

If an API method returns a Maybe<>, the API method can potentially fail
either because an exception is thrown, or because an exception is pending,
e.g. because a previous API call threw an exception that hasn't been caught
yet, or because a TerminateExecution exception was thrown. In that case, a
"Nothing" value is returned.

Here's what https://github.com/nodejs/node/blob/master/src/README.md#exception-handling says about it:

Calls that invoke any JavaScript code, including JavaScript code that is provided from Node.js internals or V8 internals, will fail when JavaScript execution is being terminated. This typically happens inside Workers when worker.terminate() is called, but it can also affect the main thread when e.g. Node.js is used as an embedded library. These exceptions can happen at any point. It is not always obvious whether a V8 call will enter JavaScript. In addition to unexpected getters and setters, accessing some types of built-in objects like Maps and Sets can also run V8-internal JavaScript code.

So, depending on when the environment is terminated, napi_has_property will return either napi_pending_exception, napi_object_expected, or napi_generic_failure. Yikes!
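
The following sketch models that outcome under my presumption about how the checks behave. `MockHasProperty` and `TerminatedAt` are hypothetical names for this illustration and do not reproduce the real napi_has_property; each branch stands in for one of the first three macros listed above.

```cpp
#include <cassert>

// Hypothetical status codes that mirror the names used in this discussion.
enum class Status { Ok, PendingException, ObjectExpected, GenericFailure };

// The point at which termination begins, relative to the checks.
enum class TerminatedAt { Never, BeforeCall, DuringToObject, DuringHasProperty };

Status MockHasProperty(TerminatedAt when, bool* result) {
  assert(result != nullptr);
  // 1. NAPI_PREAMBLE-like check: the environment can no longer call into JS.
  if (when == TerminatedAt::BeforeCall) return Status::PendingException;
  // 2. CHECK_TO_OBJECT-like check: the conversion yields an empty handle.
  if (when == TerminatedAt::DuringToObject) return Status::ObjectExpected;
  // 3. CHECK_MAYBE_NOTHING-like check: the V8 call yields Nothing.
  if (when == TerminatedAt::DuringHasProperty) return Status::GenericFailure;
  *result = true;  // pretend the property exists
  return Status::Ok;
}
```

The same underlying event (termination) surfaces as three different codes, which is what makes it hard for callers to handle consistently.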

In light of all this, I'm thinking of the following:

  1. Modify N-API to use a dedicated error code for a terminating environment (e.g., napi_environment_is_terminating), considering the possibility that the environment can be terminated, not just before but during an N-API call.

  2. Modify N-API documentation to match the new behavior and add a mention to each function that may return the code.

  3. Modify node-addon-api to take the new behavior into account. At the very least, Error::ThrowAsJavaScriptException must be modified to avoid crashing in a terminating environment. In addition to that, I think it would be wise to add something like Env::IsTerminating based on a flag that is set when encountering the corresponding error code from an N-API call.

Edit: I need more sleep

@ghost
Author

ghost commented Feb 22, 2021

One more issue related to napi_pending_exception. Turns out napi_is_exception_pending does not share the behavior with the other napi functions in a terminating environment. While napi_throw and others return napi_pending_exception, napi_is_exception_pending instead sets the result to false. This could affect error handling as neither napi_is_exception_pending nor Env::IsExceptionPending can be used to detect failures caused by a terminating environment.
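
A small model of the mismatch, using hypothetical stand-ins (`MockEnv`, `MockThrow`, `MockIsExceptionPending`) rather than the real functions: the throw goes through a preamble-style check, while the pending-exception query looks only at the stored exception.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; NOT the real Node-API types.
enum class Status { Ok, PendingException };

struct MockEnv {
  bool can_call_into_js = true;  // becomes false once termination begins
  bool has_exception = false;    // a regular JavaScript exception is pending
};

// Preamble-style check first, as described for napi_throw.
Status MockThrow(MockEnv* env) {
  assert(env != nullptr);
  if (env->has_exception || !env->can_call_into_js) {
    return Status::PendingException;
  }
  env->has_exception = true;
  return Status::Ok;
}

// No preamble, as described for napi_is_exception_pending: it reports only
// real exceptions and succeeds even while the environment is terminating.
Status MockIsExceptionPending(const MockEnv* env, bool* result) {
  assert(env != nullptr && result != nullptr);
  *result = env->has_exception;
  return Status::Ok;
}
```

In a terminating `MockEnv`, the throw fails with `PendingException` while the query still answers `false`, so the query alone cannot reveal why the throw failed.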

@ghost
Author

ghost commented Feb 23, 2021

@mhdawson In a bit more detail, the fix could look something like this:

  1. Add napi_is_environment_terminating. Environment::can_call_into_js would probably work here. It's used in the NAPI_PREAMBLE macro as well.

  2. Modify napi functions to return napi_environment_is_terminating when needed. The relevant ones, I presume, are the same as the ones using the NAPI_PREAMBLE macro. The modifications would involve all the macros that may return error codes to check for napi_is_environment_terminating before returning. This would also guarantee that the error code is used consistently regardless of when the environment is terminated.

  3. Modify napi documentation to match the new behavior and add a mention to each function that may return the new error code.

  4. Add Env::IsTerminating. Make use of napi_is_environment_terminating.

  5. Modify Error::ThrowAsJavaScriptException to ignore napi_environment_is_terminating from napi_throw.

  6. Modify node-addon-api documentation to match the new behavior.

Scenarios regarding Error::ThrowAsJavaScriptException would go as follows:

  1. When C++ exceptions are enabled, and Error::ThrowAsJavaScriptException is called indirectly right before returning to JavaScript, no harm is done by ignoring the fact that it fails. No JavaScript code is run afterward. And the addon is destroyed before any further calls are made to it from JavaScript.

  2. When C++ exceptions are disabled, and Error::ThrowAsJavaScriptException is called directly, the user is expected to return to JavaScript right afterward, making it mostly the same scenario as the previous one. Env::IsTerminating can still be used, though, if needed.

  3. When C++ exceptions are disabled, and a call to node-addon-api fails, the user checks for Env::IsExceptionPending and Env::IsTerminating and performs the necessary cleanup, if any. Alternatively, a wrapper can be used to combine these two, as the actions following them are usually the same (i.e., return to JavaScript).
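
A minimal sketch of how the pieces above could fit together. Everything here is hypothetical: `Status::EnvironmentIsTerminating` stands in for the proposed napi_environment_is_terminating, `MockIsTerminating` for the proposed Env::IsTerminating, and `ShouldReturnToJs` for the combined wrapper mentioned in the third scenario. None of these exist in Node-API or node-addon-api today.

```cpp
#include <cassert>

// Hypothetical status codes; EnvironmentIsTerminating is the proposed one.
enum class Status { Ok, PendingException, EnvironmentIsTerminating };

struct MockEnv {
  bool can_call_into_js = true;  // becomes false once termination begins
  bool has_exception = false;    // a regular JavaScript exception is pending
};

// Throw with the two failure causes reported separately (step 2).
Status MockThrow(MockEnv* env) {
  assert(env != nullptr);
  if (!env->can_call_into_js) return Status::EnvironmentIsTerminating;
  if (env->has_exception) return Status::PendingException;
  env->has_exception = true;
  return Status::Ok;
}

// Stand-in for the proposed Env::IsTerminating (step 4).
bool MockIsTerminating(const MockEnv* env) {
  assert(env != nullptr);
  return !env->can_call_into_js;
}

// Wrapper for scenario 3: one check covering both reasons to return to JS.
bool ShouldReturnToJs(const MockEnv* env) {
  assert(env != nullptr);
  return env->has_exception || MockIsTerminating(env);
}

// Step 5: returns true where a fatal error would still be raised.
bool ThrowWouldBeFatal(MockEnv* env) {
  const Status status = MockThrow(env);
  return status != Status::Ok && status != Status::EnvironmentIsTerminating;
}
```

With a dedicated code, throwing on top of an already pending exception can stay fatal, while termination is ignored without guessing.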

Is there anything I've overlooked here? What do you think? Thanks. :)

@ghost
Author

ghost commented Mar 4, 2021

Any update on this? Is what I suggested sound? Have I overlooked something? Is another approach preferred?

I can make the changes I suggested, but I could use some feedback first. Also, if someone else wishes to fix these issues, that's cool. I'm happy as long as there's a viable way of handling these scenarios. Thanks. :)

@mhdawson
Member

mhdawson commented Mar 8, 2021

Sorry I've not had time to look at this further, and have not yet convinced myself the proposed fix is the right thing to do.

@mhdawson
Member

mhdawson commented Mar 11, 2021

Longer term I'm thinking we would want to figure out how to tease apart a pending exception and napi_environment_is_terminating as two different cases. This will likely need to be "opt-in" in order to be non SemVer major. Something along the lines of what you describe in:

Modify napi functions to return napi_environment_is_terminating when needed. The relevant ones, I presume, are the same as the ones using the NAPI_PREAMBLE macro. The modifications would involve all the macros that may return error codes to check for napi_is_environment_terminating before returning. This would also guarantee that the error code is used consistently regardless of when the environment is terminated.

but only enabled if the addon "opts-in", possibly through a new API call.

In the short term the following might be a work around in Error::ThrowAsJavaScriptException():

Before calling napi_throw(), add a call to napi_status napi_is_exception_pending(napi_env env, bool* result)

The code is as follows:

  // NAPI_PREAMBLE is not used here: this function must execute when there is a
  // pending exception.
  CHECK_ENV(env);
  CHECK_ARG(env, result);

  *result = !env->last_exception.IsEmpty();
  return napi_clear_last_error(env);

It should tell us whether an exception is pending before we even try to throw. This is the same check that will be done in NAPI_PREAMBLE, except that it does not check (env)->can_call_into_js().

If that indicates there is a pending exception, then just set status to napi_pending_exception. Otherwise, call napi_throw and return if it reports napi_pending_exception, as that must mean that the (env)->can_call_into_js() check failed.

Something like

inline void Error::ThrowAsJavaScriptException() const {
  HandleScope scope(_env);
  if (!IsEmpty()) {
    bool pendingException = false;

    // check if there is already a pending exception. If so don't try to throw a new
    // one as that is not allowed/possible
    napi_status status = napi_is_exception_pending(_env, &pendingException);

    if ((status == napi_ok) && (pendingException == false)) {
      // We intentionally don't use `NAPI_THROW_*` macros here to ensure
      // that there is no possible recursion as `ThrowAsJavaScriptException`
      // is part of `NAPI_THROW_*` macro definition for noexcept.

      status = napi_throw(_env, Value());

      if (status == napi_pending_exception) {
        // The environment must be terminating as we checked earlier and there
        // was no pending exception. In this case continuing will result
        // in a fatal error and there is nothing the author has done incorrectly
        // in their code that is worth flagging through a fatal error
        return;
      }
    } else {
      status = napi_pending_exception;
    }

#ifdef NAPI_CPP_EXCEPTIONS
    if (status != napi_ok) {
      throw Error::New(_env);
    }
#else // NAPI_CPP_EXCEPTIONS
    NAPI_FATAL_IF_FAILED(status, "Error::ThrowAsJavaScriptException", "napi_throw");
#endif // NAPI_CPP_EXCEPTIONS
  }
}

I think that will preserve the existing behavior for the non-terminating case, while avoiding the fatal exception during termination.

@rudolftam what do you think? If it seems reasonable to you, I'll check in the weekly node-api team meeting whether it seems reasonable to the other team members as well.

@ghost
Author

ghost commented Mar 11, 2021

@mhdawson Looks good to me!

As you probably noticed, though, the third scenario regarding Error::ThrowAsJavaScriptException I described earlier does not work as expected. Any code using Env::IsExceptionPending may run into trouble as it gives no indication that the previous operation may have failed due to a terminating environment.

https://github.com/nodejs/node-addon-api/blob/main/doc/error_handling.md#handling-a-n-api-js-exception

It's not as critical as Error::ThrowAsJavaScriptException crashing, though, so it can be fixed later, but just something to keep in mind.

@mhdawson
Member

@KevinEady I think you had some thoughts on this in the last team meeting. Once you have time to take a closer look, can you comment?

@mhdawson
Member

The following is a real-world case where we see an exception/fatal error during termination (in this case normal termination of the Node.js runtime). We should take a closer look to see if it is a bug or not, as that might help inform the discussion in this issue.

To recreate:

This seems to recreate with the earliest version of 14.x and the latest version of 12.x.

I see

1616019511323:INFO: mainline:createDbFixtures(resolvedPromise): Script finished at: 3/17/2021, 6:18:31 PM
FATAL ERROR: Error::ThrowAsJavaScriptException napi_throw
 1: 0xa18150 node::Abort() [node]
 2: 0xa1855c node::OnFatalError(char const*, char const*) [node]
 3: 0xa185f9  [node]
 4: 0x9ec06b napi_fatal_error [node]
 5: 0x7f537e42b92d  [/home/midawson/learningpath/IBM-Developer/Node.js/Course/Unit-6/node_modules/sqlite3/lib/binding/napi-v3-linux-x64/node_sqlite3.node]
 6: 0x7f537e42cd63 Napi::Error::ThrowAsJavaScriptException() const [/home/midawson/learningpath/IBM-Developer/Node.js/Course/Unit-6/node_modules/sqlite3/lib/binding/napi-v3-linux-x64/node_sqlite3.node]
 7: 0x7f537e432f5d Napi::Function::MakeCallback(napi_value__*, unsigned long, napi_value__* const*, napi_async_context__*) const [/home/midawson/learningpath/IBM-Developer/Node.js/Course/Unit-6/node_modules/sqlite3/lib/binding/napi-v3-linux-x64/node_sqlite3.node]
 8: 0x7f537e437a07 node_sqlite3::Database::Work_AfterClose(napi_env__*, napi_status, void*) [/home/midawson/learningpath/IBM-Developer/Node.js/Course/Unit-6/node_modules/sqlite3/lib/binding/napi-v3-linux-x64/node_sqlite3.node]
 9: 0x9eab13  [node]
10: 0x136bfa5  [node]
11: 0x13706be  [node]
12: 0x1383755  [node]
13: 0x1370eff uv_run [node]
14: 0x9c70d9 node::Environment::CleanupHandles() [node]
15: 0x9c718f node::Environment::RunCleanup() [node]
16: 0x986d67 node::FreeEnvironment(node::Environment*) [node]
17: 0xa5b1af node::NodeMainInstance::Run() [node]
18: 0x9e8a3c node::Start(int, char**) [node]
19: 0x7f53851ac7b3 __libc_start_main [/lib64/libc.so.6]
20: 0x981d35  [node]
Aborted (core dumped)

@mhdawson
Member

mhdawson commented Mar 20, 2021

So in the case I mentioned in #902 (comment), the problem was that db.close() was being called in process.on('exit',...). In this case it was either a problem in the user of the db API, or in the API itself, depending on whether the API is supposed to allow calling during shutdown. Moving the call to before the program terminated fixed the problem, and I think the crash did identify a real problem (database not being closed properly) that was possible to fix.

@KevinEady
Contributor

Hi @mhdawson ,

Regarding this... The thing that I mentioned in the previous meeting is that V8 has a restriction that only objects can have references created off them:

new WeakRef("hello")
VM101:1 Uncaught TypeError: WeakRef: target must be an object
    at new WeakRef (<anonymous>)
    at <anonymous>:1:1

and this is probably the reason we have the underlying restriction as well and perform a check... but I think that is more related to #912

From what I recall in the meeting, we were debating a few solutions: (1) introducing a new napi_status for "cannot call into js", (2) "swallowing" the error by not calling into JS ourselves, (3) introducing some wrapper object that gets thrown instead and wraps the primitive... but again, maybe I am confusing the two issues.

I do not think we decided on the best approach? But I personally think (1) has the best merit, like @gabrielschulhof mentioned there are going to be more-and-more instances coming up where we need to differentiate between environment shutdown and it may be best to explicitly catch that condition.

@mhdawson
Member

@KevinEady I agree that 1) has merit, but I think we can deal with this issue separately as I suggested a work around that effectively figures out if the return code would be "cannot call into js" in the future.

The key thing is whether we agree that not forcing the fatal exception is the right thing to do if the return code is "cannot call into js". If we agree that makes sense, then we could fix Error::ThrowAsJavaScriptException now with that workaround and update it later, once we have an update that more generally returns "cannot call into js" and have backported it across the LTS release lines.

@mhdawson
Member

We discussed again today in the team meeting and the consensus seemed to be:

  • Add a #define that can control whether the Fatal exception is thrown in the cannot call into JS case or not.
  • This will allow the maintainer to select the behavior they believe is appropriate for their package.
  • The default will continue to be throwing the fatal exception.

@ghost
Author

ghost commented Mar 28, 2021

I'm wondering, is there really any reason to keep the crashing behavior? Even as an option? Let alone as the default behavior?

Wouldn't this be like having a web browser that crashes on you when you close a tab in which the execution happens to be in the right spot? And to avoid the crash, you would have to find out that there's actually an option called "do not crash when closing a tab" that needs to be enabled?

Like, why would there even be an option such as that? And why wouldn't it be enabled by default? Why would anyone want to crash in a scenario like that?

To put it in another way, wouldn't it be like having a car in which the brakes are connected to a stick of dynamite by default? What's the use of Environment::ExitEnv, node::Stop, or Worker::StopThread (i.e., Worker.terminate) if node-addon-api treats these scenarios as fatal and crashes the whole program?

@mhdawson
Member

@rudolftam the reason we had it originally and would plan to keep it as a default is that otherwise there is no way for the developer/user to know that something is potentially wrong in the application. In the example I outlined in #902 (comment), the code was written such that the database was not being shut down correctly. The only reason that was visible was because of the fatal error. Because of that I was able to identify there was a problem and fix the code in order to avoid it. If the exception was silently ignored that would not be the case.

We can see that there will be cases where you may not be able to fix the code (most likely in the arbitrary case of killing threads), so an option to suppress the exception makes sense for developers who are sure they are doing the right thing. We plan to leave the default to mirror current behavior, because we believe that it is safer to make it a conscious decision to ignore the cases where the API can't do what you asked and there is no other way to surface that.

@ghost
Author

ghost commented Mar 31, 2021

@mhdawson I agree that the user should be notified when something is potentially wrong in the application, but it's still unclear to me why it is necessary to crash to do so. Consider the following:

  1. Modify Error::ThrowAsJavaScriptException to set Env::_terminating when napi_throw returns napi_pending_exception and napi_is_exception_pending sets the result to false. Then write a warning message to stderr and return without crashing.

  2. Modify Env::IsExceptionPending to check for Env::_terminating.

  3. Add Env::IsTerminating to enable users to differentiate the two cases. (Optional)

Wouldn't this be a viable compromise?

It would require only minimal modifications and should be backward compatible. Any code using Env::IsExceptionPending should now work in the terminating case as well. Env::GetAndClearPendingException doesn't clear the termination, of course, but it shouldn't break the error handling. And the warning message to stderr is a standard procedure in scenarios where something potentially dangerous is happening but doesn't make sense to stop the program.

@gabrielschulhof
Contributor

Setting the flag on the environment is a good idea, however, it likely needs to be done in core, for the following two reasons:

  • If Napi::Env::_terminating is a static flag, then it is not thread-safe.
  • If it is an instance flag on Napi::Env, then it will not be preserved across stacks, because a Napi::Env instance is constructed from the underlying napi_env value every time control passes back into node-addon-api.
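
The second point can be illustrated with a small sketch. `RawEnv` and `EnvWrapper` are hypothetical stand-ins: `RawEnv` plays the role of the long-lived napi_env owned by core, and `EnvWrapper` the role of a short-lived Napi::Env value that is rebuilt from it on each entry.

```cpp
#include <cassert>

// Stand-in for the long-lived environment owned by core (hypothetical).
struct RawEnv {
  bool terminating = false;
};

// Stand-in for a short-lived wrapper that is re-created from the raw
// environment every time control enters the wrapper library (hypothetical).
class EnvWrapper {
 public:
  explicit EnvWrapper(RawEnv* raw) : raw_(raw) { assert(raw != nullptr); }

  // Stores the flag on this wrapper instance only.
  void MarkTerminatingOnWrapper() { wrapper_flag_ = true; }
  // Stores the flag on the underlying environment.
  void MarkTerminatingOnRaw() { raw_->terminating = true; }

  bool WrapperFlag() const { return wrapper_flag_; }
  bool RawFlag() const { return raw_->terminating; }

 private:
  RawEnv* raw_;
  bool wrapper_flag_ = false;
};
```

A flag set through `MarkTerminatingOnWrapper` disappears as soon as a fresh `EnvWrapper` is built from the same `RawEnv`, whereas one set through `MarkTerminatingOnRaw` is visible to every later wrapper. That is the reason for keeping the state in core.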

@ghost
Author

ghost commented Apr 4, 2021

@gabrielschulhof Thanks for pointing that out! I meant it as an instance variable. I somehow recalled that Napi::Env is passed around as a reference, which is not the case.

There are ways around that, but it's probably best to modify napi_is_exception_pending directly to check for Environment::can_call_into_js and then backport it to previous versions. This is what I assumed the behavior to be when I created this PR. The difference would be the additional warning message to stderr to warn the users about potentially dangerous behavior that can help catch issues like the one @mhdawson outlined earlier.

Further improvements related to the terminating case could then be made later, the most critical issues having already been fixed (i.e., crashing and hiding errors).

Would this be a sensible approach?

@KevinEady
Contributor

Hi @mhdawson , @rudolftam ,

As discussed in the Node-API meeting, we wanted to expose this as an opt-in feature. I went with the name NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS.

I was able to modify https://github.com/rudolftam/node-addon-api/commit/fd4e33aaeb7d3e312834312101f0fa58aff405a5 with this simple patch, incorporating the changes in #902 (comment):

diff --git a/common.gypi b/common.gypi
index 9be254f..4955dc9 100644
--- a/common.gypi
+++ b/common.gypi
@@ -15,6 +15,7 @@
       }
     }]
   ],
+  'defines': [ 'NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS' ],
   'include_dirs': ["<!(node -p \"require('../').include_dir\")"],
   'cflags': [ '-Werror', '-Wall', '-Wextra', '-Wpedantic', '-Wunused-parameter' ],
   'cflags_cc': [ '-Werror', '-Wall', '-Wextra', '-Wpedantic', '-Wunused-parameter' ]
diff --git a/napi-inl.h b/napi-inl.h
index 5da37d6..40e3f5b 100644
--- a/napi-inl.h
+++ b/napi-inl.h
@@ -2366,12 +2366,37 @@ inline const std::string& Error::Message() const NAPI_NOEXCEPT {
 inline void Error::ThrowAsJavaScriptException() const {
   HandleScope scope(_env);
   if (!IsEmpty()) {
-
+#ifdef NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS
+    bool pendingException = false;
+
+    // check if there is already a pending exception. If so don't try to throw a new
+    // one as that is not allowed/possible
+    napi_status status = napi_is_exception_pending(_env, &pendingException);
+
+    if ((status == napi_ok) && (pendingException == false)) {
+      // We intentionally don't use `NAPI_THROW_*` macros here to ensure
+      // that there is no possible recursion as `ThrowAsJavaScriptException`
+      // is part of `NAPI_THROW_*` macro definition for noexcept.
+
+      status = napi_throw(_env, Value());
+
+      if (status == napi_pending_exception) {
+        // The environment must be terminating as we checked earlier and there
+        // was no pending exception. In this case continuing will result
+        // in a fatal error and there is nothing the author has done incorrectly
+        // in their code that is worth flagging through a fatal error
+        return;
+      }
+    } else {
+      status = napi_pending_exception;
+    }
+#else
     // We intentionally don't use `NAPI_THROW_*` macros here to ensure
     // that there is no possible recursion as `ThrowAsJavaScriptException`
     // is part of `NAPI_THROW_*` macro definition for noexcept.
 
     napi_status status = napi_throw(_env, Value());
+#endif
 
     if (status == napi_pending_exception) {
       // The environment could be terminating.

This continues to keep the newly added error_terminating_environment test passing.

However, I was unable to make a test where the define is not enabled. I tried:

  1. Not having the define in binding.gyp and using #define prior to #include <napi.h>: I was not able to opt-in.
  2. Having the define in binding.gyp and using #undef prior to #include <napi.h>: I was not able to opt-out.
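
For reference, the mechanism being attempted in the first approach looks roughly like the sketch below. It is a single-file stand-in: the section after the `#define` plays the role of the header contents, and `SwallowsUnthrowableExceptions` is a hypothetical helper invented for this example, not something node-addon-api provides.

```cpp
// Opt in BEFORE the header section, as one would before #include <napi.h>.
#define NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS

// ---- stand-in for the contents of a header such as napi-inl.h ----
// Hypothetical helper; the real header would instead select between two
// code paths inside Error::ThrowAsJavaScriptException.
constexpr bool SwallowsUnthrowableExceptions() {
#ifdef NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS
  return true;   // the macro was visible when this code was compiled
#else
  return false;  // default: keep the fatal-error behavior
#endif
}
```

One general C++ caveat that may be relevant here: because inline functions from a header are compiled into every translation unit that includes it, the macro would presumably need to be defined the same way in all of them, since C++ does not permit an inline function to have differing definitions across translation units in one binary.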

I could create a whole new target like @legendecas does in https://github.com/nodejs/node-addon-api/pull/927/files#diff-8587ff6bc921f72a4730fb056f3a1d2b02e0406604c058bef93a09b722d072dbR84-R91 but that would compile everything, and I just want one source to be opt-in or opt-out.

So the questions are:

  1. Is the name NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS acceptable?
  2. Do we need a test where we opt-out of the feature?

@rudolftam would you like to continue working on the PR with this patch? Since it is an option, it now requires creating documentation. I can also take over if you want.

Thanks, Kevin

@ghost
Author

ghost commented Apr 12, 2021

@KevinEady I can continue working on the PR, but I still haven't understood the reasoning behind the crashing. Please bear with me. Let's use the example @mhdawson provided earlier in #902 (comment).

Here's what https://nodejs.org/api/process.html#process_event_exit says:

Listener functions must only perform synchronous operations. The Node.js process will exit immediately after calling the 'exit' event listeners causing any additional work still queued in the event loop to be abandoned. In the following example, for instance, the timeout will never occur:

process.on('exit', (code) => {
  setTimeout(() => {
    console.log('This will not run');
  }, 0);
});

This is why calls to node-sqlite3, which is asynchronous by design, will not work correctly in the exit handler. Here's the code for the close function: https://github.com/mapbox/node-sqlite3/blob/593c9d498be2510d286349134537e3bf89401c4a/src/database.cc#L224

Node.js, however, will not crash or even warn the user if the caution regarding the exit handler is ignored and there is still work queued in the event loop when exiting.
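To see this in isolation, here is a minimal sketch that runs the documented snippet in a child process and inspects how it ends. Using `spawnSync` with `node -e`, and the names `childScript` and `result`, are choices made for this illustration only; they do not come from the PR or from the Node.js documentation.

```javascript
const { spawnSync } = require('child_process');

// Script for the child process: the 'exit' listener queues a timer,
// i.e. asynchronous work that the docs say will be abandoned.
const childScript = `
process.on('exit', (code) => {
  setTimeout(() => {
    console.log('This will not run');
  }, 0);
});
`;

// Run the script in a fresh Node.js process and wait for it to finish.
const result = spawnSync(process.execPath, ['-e', childScript], {
  encoding: 'utf8',
});

// Report how the child ended and what it wrote.
console.log('exit status:', result.status);
console.log('stdout:', JSON.stringify(result.stdout));
console.log('stderr:', JSON.stringify(result.stderr));
```

The child should exit normally with the timer callback dropped, so nothing in the exit status or the output reveals that queued work was abandoned.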

The question is: why is that?

Why isn't Node.js crashing like node-addon-api? I mean, some of the events could be important, right? Like the one meant to close the database? Isn't this the reasoning behind crashing node-addon-api? To help catch programming errors?

If the same logic is applied to Node.js, then shouldn't it also crash by default? And the user should either ensure that there are no asynchronous operations left in the event queue before exiting the thread or explicitly disable the crashing behavior?

Which one has the correct or preferred behavior? Node.js or node-addon-api?

I'm of the opinion that neither of them is correct or preferred, and they should instead write a warning message to stderr. That would make the user aware of potential programming errors without causing all the trouble with crashing and additional flags that need to be enabled to avoid it.

I hope I'm not being too abrasive about this. I'm just concerned that node-addon-api may be taking a step in the wrong direction with this decision. What do you guys think about all this? Thanks.

@mhdawson
Member

@KevinEady, regarding:

"Not having the define in binding.gyp and using #define prior to #include <napi.h>: I was not able to opt-in."

That should work and I think we use something similar in other cases.

@KevinEady
Contributor

@mhdawson that is what I thought... I think perhaps my setup was wrong. I will revisit.

@rudolftam does bring up a good point regarding the exit handler and timeouts: the Node.js behavior is to silently "fail" by not executing the handler. Does the same behavior also make sense here, when attempting to throw an error on env shutdown?

@mhdawson
Member

@rudolftam re: 'taking a step in the wrong direction with this decision'.

You could also look at it as a first step instead. Adding an option that people can opt into, instead of not having that option, is a step in the direction you are advocating for. If everybody opts in and asks why it's not the default, that could trigger switching the default.

To be honest I'm still not sure what the right final answer is (and only have so much time to think about it). I'm happy to take the first step, not ready to make it the default. I'd prefer to take the first step and hopefully get it into our next release versus having the discussion drag out and not including it in the release we are planning soon.

@ghost
Author

ghost commented Apr 13, 2021

@mhdawson You're right, of course. Having the option is infinitely better than not having it, so it's definitely a step in the right direction. I didn't phrase that quite right. :) When are you guys planning on making the new release?

@KevinEady Would you like to continue working on this, or should I take over? If the schedule is tight, it might be better if you guys finish this PR, as you know best what to do.

@mhdawson
Member

@rudolftam we are looking to do one as soon as we can close out #906

@KevinEady correct me if I'm wrong, but I think you were already working to incorporate the original suggestion, some of what I'd suggested, and input from our discussions into a PR.

@mhdawson
Member

This is going to be addressed through #975 instead. Closing. Please let us know if you think that was not the right thing to do.

@mhdawson closed this Jun 14, 2021