-
Notifications
You must be signed in to change notification settings - Fork 467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
src: fix a crashing issue in Error::ThrowAsJavaScriptException #902
Conversation
When terminating an environment (e.g., by calling worker.terminate), napi_throw, which is called by Error::ThrowAsJavaScriptException, returns napi_pending_exception, which is then incorrectly treated as a fatal error resulting in a crash.
Any thoughts on this? :) I'm new to Node.js, so I would appreciate any feedback you guys have. :) I thought about creating a dedicated .cc for the test cases, but seeing how I would basically replicate all the relevant cases from error.cc, I chose to modify the existing ones instead. Not sure which is preferred here. I also thought about testing |
I'm wondering if this should be fixed in napi_throw instead as I assume the same problem could occur in direct use of the API as well? |
@mhdawson Note that the behavior in Furthermore, I think the behavior itself is sound. As most operations will fail in a terminating environment, there must be a way to detect that, to implement error handling correctly. In napi functions, this is done by checking The way regular and termination exceptions are combined into Anyway. Regardless of what error code is used, I think it makes sense to have one, which is why I don't believe this is an issue with N-API, but with how node-addon-api handles the error code. |
@mhdawson Just to be clear, these are the options I'm seeing here:
Is there something I'm not seeing here? Tell me what you think. Thanks. :) |
@rudolftam thanks for the detailed response. I'll have to find a longer block of time to think it through. |
I've looked at this in a bit more detail but I'm not sure avoiding the fatal error is the right behavior. We have have code that is not doing what the author asked/wanted it to do. Maybe handling that exception is important for cleanup and so just ignoring it will lead to a leak every time a thread is terminated. Terminating a running thread is a bit of of a messy situation, but what I don't see any way for the native module to know that it is terminating and to do the right thing. |
I do agree that having |
I don't see anything in the worker API that can let the code running in the Worker know that there was a request to terminate and try to do an orderly shutdown. EDIT the answer might be EnvironmentCleanupHooks https://nodejs.org/api/addons.html#addons_worker_support, https://nodejs.org/api/n-api.html#n_api_napi_add_env_cleanup_hook If that hook gets run when the worker is terminated, the right answer might be that the addon should be written to avoid doing anything after that which might try to call into JavaScript. In the test case described it would be good to find out if the cleanup hook runs before/after the Exception is thrown which causes the reported issue. |
During discussion in the Node-API team meeting @legendecas pointed out that Env::SetInstanceData |
@mhdawson I tested both If that were not the case, then they could indeed be used to improve error handling. Addons could use them to set a termination flag to be checked in between calls to N-API and node-addon-api. This still wouldn't prevent The next logical step to address that would be to use these hooks in node-addon-api as well, and that could work, but I think the proper way to fix this is at the source. N-API should use a dedicated error code for a terminating environment, making workarounds like these unnecessary. For that to work, though, there's yet another thing that may need attention. It looks to me as if N-API doesn't really consider the possibility that the environment can be terminated, not just before but during an N-API call. Let's take
Note how there are three unique error codes in play here. In the second and third cases, the error code is returned when encountering a maybe value that is either empty or nothing. The problem here is that the values, I presume, will be such in a terminating environment as well. Here's what https://github.com/nodejs/node/blob/master/deps/v8/include/v8.h says about it:
Here's what https://github.com/nodejs/node/blob/master/src/README.md#exception-handling says about it:
So, depending on when the environment is terminated, In light of all this, I'm thinking of the following:
Edit: I need more sleep |
One more issue related to |
@mhdawson In a bit more detail, the fix could look something like this:
Scenarios regarding
Is there anything I've overlooked here? What do you think? Thanks. :) |
Any update on this? Is what I suggested sound? Have I overlooked something? Is another approach preferred? I can make the changes I suggested, but I could use some feedback first. Also, if someone else wishes to fix these issues, that's cool. I'm happy as long as there's a viable way of handling these scenarios. Thanks. :) |
Sorry I've not had time to look at this further, and have not yet convinced myself the proposed fix is the right thing to do. |
Longer term I'm thinking we would want to figure out how to tease apart a pending exception and napi_environment_is_terminating as two different cases. This will likely need to be "opt-in" in order to be non SemVer major. Something along what you describe in:
but only enabled if the addon "opts-in", possibly through a new API call. In the short term the following might be a work around in Error::ThrowAsJavaScriptException(): Before calling napi_throw(), add a call to napi_status napi_is_exception_pending(napi_env env, bool* result) The code is as follows: // NAPI_PREAMBLE is not used here: this function must execute when there is a
// pending exception.
CHECK_ENV(env);
CHECK_ARG(env, result);
*result = !env->last_exception.IsEmpty();
return napi_clear_last_error(env); It should tell us if there is an exception pending or not before we even try to throw. This is the same check as will be done if that indicates there is a pending exception then just set status to Something like inline void Error::ThrowAsJavaScriptException() const {
HandleScope scope(_env);
if (!IsEmpty()) {
bool pendingException = false;
// check if there is already a pending exception. If so don't try to throw a new
// one as that is not allowed/possible
napi_status status = napi_is_exception_pending(_env, &pendingException)
if ((status == napi_ok) && (pendingException == false)) {
// We intentionally don't use `NAPI_THROW_*` macros here to ensure
// that there is no possible recursion as `ThrowAsJavaScriptException`
// is part of `NAPI_THROW_*` macro definition for noexcept.
status = napi_throw(_env, Value());
if (status == napi_pending_exception) {
// The environment must be terminating as we checked earlier and there
// was no pending exception. In this case continuing will result
// in a fatal error and there is nothing the author has done incorrectly
// in their code that is worth flagging through a fatal error
return;
}
} else {
status = napi_pending_exception;
}
#ifdef NAPI_CPP_EXCEPTIONS
if (status != napi_ok) {
throw Error::New(_env);
}
#else // NAPI_CPP_EXCEPTIONS
NAPI_FATAL_IF_FAILED(status, "Error::ThrowAsJavaScriptException", "napi_throw");
#endif // NAPI_CPP_EXCEPTIONS
}
} I think that will preserve the existing behavior for the non-terminating case, while avoiding the fatal exception during termination. @rudolftam what do you think. If it seems reasonable to you I'll see in the weekly node-api team meeting if it seems reasonable to the other team members as well. |
@mhdawson Looks good to me! As you probably noticed, though, the third scenario regarding It's not as critical as |
@KevinEady I think you had some thoughts on this in the last team meeting. Once you have time to take a closer look can you comment. |
The following is a real-world case were we see an exception/fatal error during termination (in this case normal termination of the Node.js runtime). We should take a closer look to see if it is a bug or not as that might help inform the the discussion in this issue. To recreate:
This seems to recreate with the earliest version of 14.x and the latest version of 12.x. I see
|
So in the case I mentioned in #902 (comment), the problem was that that db.close() was being called in |
Hi @mhdawson , Regarding this... The thing that I mentioned in the previous meeting is that V8 has a restriction that only objects can have references created off them: new WeakRef("hello")
and this is probably the reason we have the underlying restriction as well and perform a check... but I think that is more related to #912 From what I recall in the meeting, we were debating a few solutions: (1) introducing a new napi_status for "cannot call into js", (2) "swallow" the error by not by not calling into JS ourselves, (3) introduce some wrapper object that gets thrown instead that wraps the primitive... but again, maybe I am confusing the two issues. I do not think we decided on the best approach? But I personally think (1) has the best merit, like @gabrielschulhof mentioned there are going to be more-and-more instances coming up where we need to differentiate between environment shutdown and it may be best to explicitly catch that condition. |
@KevinEady I agree that 1) has merit, but I think we can deal with this issue separately as I suggested a work around that effectively figures out if the return code would be "cannot call into js" in the future. The key thing is whether we agree that not forcing the fatal exception is the right thing to do if the return code is "cannot call into js". If we agree that makes sense then we could fix the Error::ThrowAsJavaScriptException now with that work around and then update later once we have update to more generally return "cannot call into js" and have backported it across the LTS release lines. |
We discussed again today in the team meeting and the consensus seemed to be:
|
I'm wondering, is there really any reason to keep the crashing behavior? Even as an option? Let alone as the default behavior? Wouldn't this be like having a web browser that crashes on you when you close a tab in which the execution happens to be in the right spot? And to avoid the crash, you would have to find out that there's actually an option called "do not crash when closing a tab" that needs to be enabled? Like, why would there even be an option such as that? And why wouldn't it be enabled by default? Why would anyone want to crash in a scenario like that? To put it in another way, wouldn't it be like having a car in which the brakes are connected to a stick of dynamite by default? What's the use of Environment::ExitEnv, node::Stop, or Worker::StopThread (i.e., Worker.terminate) if node-addon-api treats these scenarios as fatal and crashes the whole program? |
@rudolftam the reason we had it originally and would plan to keep as a default is that otherwise there is no way for the developer/user to know that something potentially wrong in the application. In the example I outlined in #902 (comment) the code was written such that the database was not being shutdown correctly. The only reason that was visible was because of the fatal error. Because of that I was able to identify there was a problem and fix the code in order to avoid it. If the exception was silently ignored that would not be the case. We can see that there will be cases where you may not be able to fix the code (mostly likely in the arbitrary case of killing threads) so an option to suppress the exception makes sense for developers who are sure they are doing the right thing. We plan to leave the default to mirror current behavior and because we believe that it is safer to make it a concious decision to ignore the cases were the API can't do what you asked and there is no other way to surface that. |
@mhdawson I agree that the user should be notified when something is potentially wrong in the application, but it's still unclear to me why it is necessary to crash to do so. Consider the following:
Wouldn't this be a viable compromise? It would require only minimal modifications and should be backward compatible. Any code using |
Setting the flag on the environment is a good idea, however, it likely needs to be done in core, for the following two reasons:
|
@gabrielschulhof Thanks for pointing that out! I meant it as an instance variable. I somehow recalled that There are ways around that, but it's probably best to modify Further improvements related to the terminating case could then be made later, the most critical issues having already been fixed (i.e., crashing and hiding errors). Would this be a sensible approach? |
Hi @mhdawson , @rudolftam , As discussed in the Node-API meeting, we wanted to expose this as an opt-in feature. I went with the name I was able to modify https://github.com/rudolftam/node-addon-api/commit/fd4e33aaeb7d3e312834312101f0fa58aff405a5 with this simple patch, incorporating the changes in #902 (comment): diff --git a/common.gypi b/common.gypi
index 9be254f..4955dc9 100644
--- a/common.gypi
+++ b/common.gypi
@@ -15,6 +15,7 @@
}
}]
],
+ 'defines': [ 'NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS' ],
'include_dirs': ["<!(node -p \"require('../').include_dir\")"],
'cflags': [ '-Werror', '-Wall', '-Wextra', '-Wpedantic', '-Wunused-parameter' ],
'cflags_cc': [ '-Werror', '-Wall', '-Wextra', '-Wpedantic', '-Wunused-parameter' ]
diff --git a/napi-inl.h b/napi-inl.h
index 5da37d6..40e3f5b 100644
--- a/napi-inl.h
+++ b/napi-inl.h
@@ -2366,12 +2366,37 @@ inline const std::string& Error::Message() const NAPI_NOEXCEPT {
inline void Error::ThrowAsJavaScriptException() const {
HandleScope scope(_env);
if (!IsEmpty()) {
-
+#ifdef NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS
+ bool pendingException = false;
+
+ // check if there is already a pending exception. If so don't try to throw a new
+ // one as that is not allowed/possible
+ napi_status status = napi_is_exception_pending(_env, &pendingException);
+
+ if ((status == napi_ok) && (pendingException == false)) {
+ // We intentionally don't use `NAPI_THROW_*` macros here to ensure
+ // that there is no possible recursion as `ThrowAsJavaScriptException`
+ // is part of `NAPI_THROW_*` macro definition for noexcept.
+
+ status = napi_throw(_env, Value());
+
+ if (status == napi_pending_exception) {
+ // The environment must be terminating as we checked earlier and there
+ // was no pending exception. In this case continuing will result
+ // in a fatal error and there is nothing the author has done incorrectly
+ // in their code that is worth flagging through a fatal error
+ return;
+ }
+ } else {
+ status = napi_pending_exception;
+ }
+#else
// We intentionally don't use `NAPI_THROW_*` macros here to ensure
// that there is no possible recursion as `ThrowAsJavaScriptException`
// is part of `NAPI_THROW_*` macro definition for noexcept.
napi_status status = napi_throw(_env, Value());
+#endif
if (status == napi_pending_exception) {
// The environment could be terminating.
These continues to keep the newly added However, I was unable to make test where the define is not enabled. I tried:
I could create a whole new target like @legendecas does in https://github.com/nodejs/node-addon-api/pull/927/files#diff-8587ff6bc921f72a4730fb056f3a1d2b02e0406604c058bef93a09b722d072dbR84-R91 but that would compile everything, and I just want one source to be opt-in or opt-out. So the questions are:
@rudolftam would you like to continue working on the PR with this patch? Since it is an option, it now requires creating documentation. I can also take over if you want. Thanks, Kevin |
@KevinEady I can continue working on the PR, but I still haven't understood the reasoning behind the crashing. Please bear with me. Let's use the example @mhdawson provided earlier in #902 (comment). Here's what https://nodejs.org/api/process.html#process_event_exit says:
This is why calls to node-sqlite3, which is asynchronous by design, will not work correctly in the exit handler. Here's the code for the close function: https://github.com/mapbox/node-sqlite3/blob/593c9d498be2510d286349134537e3bf89401c4a/src/database.cc#L224 Node.js, however, will not crash or even warn the user if the caution regarding the exit handler is ignored and there is still work queued in the event loop when exiting. The question is: why is that? Why isn't Node.js crashing like node-addon-api? I mean, some of the events could be important, right? Like the one meant to close the database? Isn't this the reasoning behind crashing node-addon-api? To help catch programming errors? If the same logic is applied to Node.js, then shouldn't it also crash by default? And the user should either ensure that there are no asynchronous operations left in the event queue before exiting the thread or explicitly disable the crashing behavior? Which one has the correct or preferred behavior? Node.js or node-addon-api? I'm of the opinion that neither of them is correct or preferred, and they should instead write a warning message to stderr. That would make the user aware of potential programming errors without causing all the trouble with crashing and additional flags that need to be enabled to avoid it. I hope I'm not being too abrasive about this. I'm just concerned that node-addon-api may be taking a step in the wrong direction with this decision. What do you guys think about all this? Thanks. |
@KevinEady, I think "Not having the define in binding.gyp and using #define prior to #include <napi.h>: I was not able to opt-in." That should work and I think we use something similar in other cases. |
@mhdawson that is what I thought... I think perhaps my setup was wrong. I will revisit. @rudolftam does bring up a good point regarding |
@rudolftam re: 'taking a step in the wrong direction with this decision'. You could also look at it as a first step instead. Adding an option that people can opt into, instead of not having that option is a step in they direction you are advocating for. If everybody opts in and asks why its not the default that could trigger switching the default. To be honest I'm still not sure what the right final answer is (and only have so much time to think about it). I'm happy to take the first step, not ready to make it the default. I'd prefer to take the first step and hopefully get it into our next release versus having the discussion drag out and not including it in the release we are planning soon. |
@mhdawson You're right, of course. Having the option is infinitely better than not having it, so it's definitely a step in the right direction. I didn't phrase that quite right. :) When are you guys planning on making the new release? @KevinEady Would you like to continue working on this, or should I take over? I'm wondering if the schedule is tight, then it might be better if you guys finish this PR as you know best what to do. |
@rudolftam we are looking to do one as soon as we can close out #906 @KevinEady correct me if I'm wrong but I think you were already working to incorporate the original suggestion, some of what I'd suggested and input from our discusions into a PR |
This is going to be addressed through #975 instead. Closing. Please let us know if you think that was not the right thing to do. |
When terminating an environment (e.g., by calling
worker.terminate
),napi_throw
, which is called byError::ThrowAsJavaScriptException
, returnsnapi_pending_exception
, which is then incorrectly treated as a fatal error resulting in a crash.This is relatively easy to trigger with a native addon that runs a long synchronous operation, which then throws a JavaScript exception, but not before the environment has begun to terminate.
The other instances where errors are treated fatal in node-addon-api are unlikely to cause similar issues. Unlike
napi_throw
, none of the respective napi functions useNAPI_PREAMBLE
macro (which returnsnapi_pending_exception
in a terminating environment), and the few functions that handle V8 maybe types are tested with the test cases I added.