Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unhandledRejection does not have the correct stack trace. #5084

Closed
Raynos opened this issue Feb 4, 2016 · 64 comments
Closed

unhandledRejection does not have the correct stack trace. #5084

Raynos opened this issue Feb 4, 2016 · 64 comments
Labels
lib / src Issues and PRs related to general changes in the lib or src directory. post-mortem Issues and PRs related to the post-mortem diagnostics of Node.js. v8 engine Issues and PRs related to the V8 dependency.

Comments

@Raynos
Copy link
Contributor

Raynos commented Feb 4, 2016

Given the recent PR ( #5020 ) I wanted to see if promises could enable core dump based debugging.

It looks like currently when an unhandledRejection occurs the stack is gone. This is very similar to how the stack is gone when an uncaughtException occurs.

We've added --abort-on-uncaught-exception flag to allow us to abort the process with the correct stack in C++. We need to implement --abort-on-unhandled-rejection flag as well.

See foo.js example

process.on('unhandledRejection', function onError(err) {
     throw err;
});

function foo() {
    new Promise(function reject(re, rj) { rj(new Error('sad')) });
}
foo();
raynos at raynos-SVS15127PXB  ~
$ node foo.js 
/home/raynos/foo.js:2
     throw err;
     ^

Error: sad
    at reject (/home/raynos/foo.js:6:46)
    at foo (/home/raynos/foo.js:6:5)
    at Object.<anonymous> (/home/raynos/foo.js:8:1)
    at Module._compile (module.js:435:26)
    at Object.Module._extensions..js (module.js:442:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:313:12)
    at Function.Module.runMain (module.js:467:10)
    at startup (node.js:136:18)
    at node.js:963:3

Note the error has the correct stacktrace including foo and reject

raynos at raynos-SVS15127PXB  ~
$ node --abort-on-uncaught-exception foo.js 
Uncaught Error: sad

FROM
process.onError (/home/raynos/foo.js:2:6)
emitTwo (events.js:87:13)
process.emit (events.js:172:7)
emitPendingUnhandledRejections (node.js:500:24)
runMicrotasksCallback (node.js:329:11)
doNTCallback0 (node.js:419:9)
process._tickCallback (node.js:348:13)
Function.Module.runMain (module.js:469:11)
startup (node.js:136:18)
node.js:963:3
Illegal instruction

Note that the actual stack of the program when in the unhandledRejection listener is wrong. This means we cannot just process.abort() in there since that's absolutely useless, I want to stack that mattered.

I considered this a MUSTFIX before we land promises into core, if we cannot debug promises then there is not point in using it. The only good feature nodejs has is extreme debuggability.

cc: @chrisdickinson @misterdjules

@bnoordhuis
Copy link
Member

isolate->SetPromiseRejectCallback(reinterpret_cast<v8::PromiseRejectCallback>(abort)) is theoretically all you need. The call chain leading up to abort() is the stack trace you're looking for, if I understand you right.

@Raynos
Copy link
Contributor Author

Raynos commented Feb 4, 2016

Correct. The stack leading upto abort() should include the line on which the error was thrown.

See below an example of errors and --abort-on-uncaught-exception notice, how the abort stack contains foo function.

raynos at raynos-SVS15127PXB  ~
$ node foo.js 
/home/raynos/foo.js:6
    throw new Error('sad');
    ^

Error: sad
    at foo (/home/raynos/foo.js:6:11)
    at Object.<anonymous> (/home/raynos/foo.js:8:1)
    at Module._compile (module.js:435:26)
    at Object.Module._extensions..js (module.js:442:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:313:12)
    at Function.Module.runMain (module.js:467:10)
    at startup (node.js:136:18)
    at node.js:963:3
raynos at raynos-SVS15127PXB  ~
$ node --abort-on-uncaught-exception foo.js 
Uncaught Error: sad

FROM
foo (/home/raynos/foo.js:6:5)
Object.<anonymous> (/home/raynos/foo.js:8:1)
Module._compile (module.js:435:26)
Object.Module._extensions..js (module.js:442:10)
Module.load (module.js:356:32)
Function.Module._load (module.js:313:12)
Function.Module.runMain (module.js:467:10)
startup (node.js:136:18)
node.js:963:3
Illegal instruction
raynos at raynos-SVS15127PXB  ~
$ cat foo.js 
process.on('unhandledException', function onError(err) {
     throw err;
});

function foo() {
    throw new Error('sad');
}
foo();

@misterdjules
Copy link

Thank you @Raynos for opening this issue! My plan was to give a shot at implementing that to validate/invalidate the concerns I expressed earlier. It shouldn't be too long.

@misterdjules
Copy link

Just to make sure we're on the same page, and that I'm not missing anything, here's the tiny change that I made to node's master:

diff --git a/src/node.cc b/src/node.cc
index ab58627..a91ece3 100644
--- a/src/node.cc
+++ b/src/node.cc
@@ -1011,6 +1011,9 @@ static bool ShouldAbortOnUncaughtException(Isolate* isolate) {
   return isEmittingTopLevelDomainError || !DomainsStackHasErrorHandler(env);
 }

+static void AbortOnPromiseRejection(PromiseRejectMessage message) {
+  abort();
+}

 void SetupDomainUse(const FunctionCallbackInfo<Value>& args) {
   Environment* env = Environment::GetCurrent(args);
@@ -1117,7 +1120,6 @@ void SetupPromises(const FunctionCallbackInfo<Value>& args) {

   CHECK(args[0]->IsFunction());

-  isolate->SetPromiseRejectCallback(PromiseRejectCallback);
   env->set_promise_reject_function(args[0].As<Function>());

   env->process_object()->Delete(
@@ -4136,6 +4138,8 @@ static void StartNodeInstance(void* arg) {

     isolate->SetAbortOnUncaughtExceptionCallback(
         ShouldAbortOnUncaughtException);
+    isolate->SetPromiseRejectCallback(
+        AbortOnPromiseRejection);

     // Start debug agent when argv has --debug
     if (instance_data->use_debug_agent())

and here's what I ran:

[jgilli@dev ~/node]$ cat /tmp/test-promise-throw.js 
function bar() {
  console.log('boom!');
  throw new Error('boom');
}

function foo() {
  bar();
}

function boom() {
  return new Promise(function (fulfill, reject) {
    foo(); 
  });
}

boom();
[jgilli@dev ~/node]$ ./node /tmp/test-promise-throw.js 
boom!
Abort (core dumped)
[jgilli@dev ~/node]$

and finally here's the callstack that I get from the core file that was just generated:

[root@00-0c-29-44-bf-9f ~]# mdb /zones/e1a453d7-9b7b-4721-aea6-791d38603cee/cores/core.node.38463
Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
> ::load /root/mdb_v8_amd64.so
mdb_v8 version: 1.1.1 (release, from 28cedf2)
V8 version: 4.8.271.17
Autoconfigured V8 support from target
C++ symbol demangling enabled
> ::jsstack
native: libc.so.1`_lwp_kill+0xa
native: libc.so.1`raise+0x20
native: libc.so.1`abort+0x98
        (1 internal frame elided)
native: v8::internal::Isolate::ReportPromiseReject+0x59
native: v8::internal::Runtime_PromiseRejectEvent+0x110
        (1 internal frame elided)
js:     rejectPromise
js:     Promise
        (1 internal frame elided)
js:     boom
js:     <anonymous> (as <anon>)
        (1 internal frame elided)
js:     <anonymous> (as Module._compile)
js:     <anonymous> (as Module._extensions..js)
js:     <anonymous> (as Module.load)
js:     <anonymous> (as Module._load)
js:     <anonymous> (as Module.runMain)
js:     startup
js:     <anonymous> (as <anon>)
        (1 internal frame elided)
        (1 internal frame elided)
native: v8::internal::_GLOBAL__N_1::Invoke+0xb3
native: v8::internal::Execution::Call+0x62
native: v8::Function::Call+0xf6
native: v8::Function::Call+0x41
native: node::LoadEnvironment+0x1e8
native: node::Start+0x518
native: _start+0x6c
> 

The problem as I see it is that foo and bar should be in that call stack for post-mortem debugging to be useful.

@Raynos
Copy link
Contributor Author

Raynos commented Feb 5, 2016

Correct, foo and bar need to be in that call stack.

The reason they need to be in the call stack is far more evident if we use ::jsstack -v

Seeing the arguments to functions on the stack is absolutely critical to debugging what is going on in bar and foo.

Take for example the classic recursion problem, your trying to debug an infinite recursion exception about "Maximum call stack" you really want to see the functions that are being called recursively. Seeing the nearest promise that caught the recursion gives you less information then preferable.

In fact seeing the stack to the nearest promise is probably as good as looking at the err.stack string in terms of debugging ability.

@chrisdickinson
Copy link
Contributor

Thanks for doing this!

This touches on something interesting. You're instrumenting the executor, which runs before new Promise is resolved and returned, and catches any thrown exception and turns it into a rejection — essentially Promise runs function Promise(executor) { try { executor(resolve, reject) } catch (err) { reject(err) }. While I don't think that's going to change (I could be wrong, it might be something to bring up with the V8 team), it's not quite the same thing as a rejection that happens explicitly, or one that happens as the result of a throw in a success handler. The latter is probably the most important to test, since it's the case that stands to become the most common should the promises API land in core.

For reference, that would look something like this:

'use strict'
const fs = require('fs')

function bar () {
  throw new Error('unhandled rejection');
}

function foo () {
  return bar();
}

const p = new Promise(function (resolve, reject) {
  fs.readFile(__filename, (err, data) => { err ? reject(err) : resolve(data) })
}).then(() => {
  foo()
});

I am curious: do you think the overlap between the post mortem debugging and promise-using audiences is such that there are a lot of folks that desire this debugging to be perfect? Or can the promise-using audience rely on advances in the Chrome dev tools, while non-promise-users rely on lldb and mdb + core dumps?

@Raynos
Copy link
Contributor Author

Raynos commented Feb 5, 2016

@chrisdickinson in terms of overlap and usages of nodejs in companies:

do you think the overlap between the post mortem debugging and promise-using audiences is such that there are a lot of folks that desire this debugging to be perfect

I think the overlap probably exists in enterprises.

Take for example Uber. We probably have about ~5 nodejs engineers that use mdb and we have ~200 nodejs engineers in total. For the five of us to be able to use mdb we have to tell 200 engineers that do not care about mdb that they cannot use promises.

This definitely shows itself when you have 1 or 2 people writing for example, an "rpc" library or a high performance "json api" library that want to be able to do post-mortem debugging on their library which is embedded in >50 services and touched by >200 engineers.

This becomes especially difficult if an application developer uses a database ORM library that has promises and then calls response.send(obj) inside a then() block and complains to a library engineer that implemented response about a bug or a crash.

Our current response would be to try and debug their issue and then tell them that there is not enough information and we have to add a change their application to use libraries and techniques that enable post mortem debugging.

This particular dynamic of application developers and library authors wrt Promises has not happened at Uber yet, but it will be far more likely to happen if promises are in core.

@misterdjules
Copy link

You're instrumenting the executor

I thought this was what the promisify function is currently doing in #5020. So for instance, if fs.readFile throws on an incorrect argument, my understanding is that the current implementation of promisify will catch that error (and thus unwind the stack) and then turn it into an operational error. So it seems to me it's a use case worth considering.

it's not quite the same thing as a rejection that happens explicitly, or one that happens as the result of a throw in a success handler. The latter is probably the most important to test, since it's the case that stands to become the most common should the promises API land in core.

I would think it's another data point and a great test to add to document and try to improve the post-mortem debugging support for promises. EDIT: since it suffers from similar issues as the previous test case.

I am curious: do you think the overlap between the post mortem debugging and promise-using audiences is such that there are a lot of folks that desire this debugging to be perfect?

I don't think it's about "perfect" debugging, it's about being able to debug anything at all using post-mortem debugging. As a lot of applications are built on top of third-party npm modules, it seems very likely (if not certain) that users of post-mortem debuggers will need to debug applications that use the promises-based APIs of node core using post-mortem debugging tools, or they'll need to make code changes to these modules. Making that more likely to happen before thinking about solutions to the issues mentioned in this thread would seem to be going against the work done by the @nodejs/post-mortem WG.

Or can the promise-using audience rely on advances in the Chrome dev tools

What do Chrome dev tools provide in terms of debugging promises? That sounds interesting!

while non-promise-users rely on lldb and mdb + core dumps?

I don't think post-mortem debugging users will have the luxury of not using any module making use of the new promises-based node core APIs.

@Raynos
Copy link
Contributor Author

Raynos commented Feb 5, 2016

I don't think post-mortem debugging users will have the luxury of not using any module making use of the new promises-based node core APIs.

Even if post mortem debugging users have that luxury and control over their dependency tree (some of us do). The moment a caller wraps the callsite entry point into their code with a try/catch it hinders the ability to debug any bugs that occur in production using post mortem techniques.

@mscdex mscdex added v8 engine Issues and PRs related to the V8 dependency. lib / src Issues and PRs related to general changes in the lib or src directory. labels Feb 5, 2016
@gx0r
Copy link

gx0r commented Feb 5, 2016

Sincere thanks to you all for taking the time to dig into this. Promises in core with good dumps would of course be incredible.

@petkaantonov
Copy link
Contributor

promisify will catch that error (and thus unwind the stack)

What do you mean by this? Throwing/catching doesn't affect an error's .stack. It's created when you do new Error() (which does Error.captureStackTrace) and can only changed by the user explicitly after that.

@chrisdickinson
Copy link
Contributor

@petkaantonov "Unwinding the stack" is a different case than the captured stack trace — it means that when the process aborts, the abort call is at the top of stack so that the resulting core file can be inspected to find the original stack leading to the exception.

@petkaantonov
Copy link
Contributor

Oh I see. Btw the PromiseRejectMessage used in unhandled promise rejection has a stack_trace_ property (GetStackTrace()) that we are not using. That's probably why the stack trace reported is wrong/not as helpful as possible.

@petkaantonov
Copy link
Contributor

We need to implement --abort-on-unhandled-rejection flag as well.

This flag doesn't make any sense with code you don't control. One needs to use a workaround for certain cases to avoid false positives, that cannot be expected from "low quality" npm modules.

@vkurchatkin
Copy link
Contributor

I think it's not possible to implement this. Thrown errors are caught no matter what. Explicitly rejecting should work, but it would result in false positives. For example Promise.reject().catch() would abort instantly

@misterdjules
Copy link

@chrisdickinson

the abort call is at the top of stack so that the resulting core file can be inspected to find the original stack leading to the exception.

The abort() call (or whatever we use to make the process dump core) will always be at the top of the stack, or near the top, because that's what makes a process dump core. What we want to make sure of is that the function that threw the uncaught error is at the top of the stack.

@petkaantonov

Oh I see. Btw the PromiseRejectMessage used in unhandled promise rejection has a stack_trace_ property (GetStackTrace()) that we are not using. That's probably why the stack trace reported is wrong/not as helpful as possible.

The first message in this issue illustrates that one cannot use a process' unhandledRejection event listener to create core dumps that are usable for post-mortem debugging, not that the stack trace is incorrectly reported. The stack trace is correctly reported, it's just that at the time the process' unhandledRejection event listener run, the information that would allow one to use a post-mortem debugger to root cause a problem, such as which functions were on the stack and their parameters, is gone.

@petkaantonov

We need to implement --abort-on-unhandled-rejection flag as well.

This flag doesn't make any sense with code you don't control. One needs to use a workaround for certain cases to avoid false positives, that cannot be expected from "low quality" npm modules.

Could you please be more explicit? What "certain cases" do you have in mind? Is it the Promise.reject().catch() that @vkurchatkin mentioned?

@petkaantonov
Copy link
Contributor

If you don't immediately call .catch() on a promise, and it rejects before you do, then that rejection will be falsely reported as "unhandledRejection". That's why there is the event "rejectionHandled".

@benjamingr
Copy link
Member

Right, this is impossible, it's not an artifact and it'll be the same way with any asynchronous error - those just get silenced with callbacks where promises provide a unhandledRejection hook to help spot the vast majority of them so you're not left in the dark.

Of course, you can access the stack inside the unhandledRejection event handler. You can just log the stack there and then throw.

@misterdjules
Copy link

@petkaantonov

If you don't immediately call .catch() on a promise, and it rejects before you do, then that rejection will be falsely reported as "unhandledRejection". That's why there is the event "rejectionHandled".

Thank you, that's helpful to know! I'll definitely add that to the elements to consider when writing the results of my research.

@chrisdickinson
Copy link
Contributor

As @misterdjules notes, yes, let's keep this conversation polite & productive.

It looks like there are a couple misunderstandings on both sides, so I'd like to define some terms here:

unhandledRejection: This term refers to an event fired by Node on behalf of V8. Unlike uncaughtException, it does not represent an unrecoverable failure. A promise will trigger this event when it is rejected if there are no handlers for rejection. Other code may still have a reference to this promise when it becomes rejected, and may add handlers at some point later. At this point, the rejection becomes handled and the rejectionHandled event is fired. In a program using promises, it's an accepted practice to track a Set of rejected promises, noting when they resolved. Many promise users use this technique, other promise users prefer to avoid triggering unhandled rejection — it's possible to use promises in both fashions, and there are trade offs and use cases for both approaches. This is why Node's default unhandledRejection handler does not terminate the program — the user needs to tell Node which tradeoffs are more acceptable for their program.

"Unwinding the Stack": This is separate from a stack trace — which gives an indication of "where" the error occurred, but less information about "how" an control passed to that point. When abort is called, the stack up until the abort call will be present in the resulting core dump file. Debuggers like lldb and mdb can inspect this stack, including being able to pull out the arguments that were passed to the frame — they get more of an idea of "how" the error occurred after the fact than what a trace provides. When an exception is thrown, it unwinds the stack to the nearest catch clause. If an abort is called as a descendent stack frame from the catch clause, despite receiving trace information, the argument information is lost.

@petkaantonov
Copy link
Contributor

I understand post-mortem debugging very well, and I care for its users, which is why I wanted to put some effort in understanding how the project could understand better how to use them together, and as a result help users of promises too.

Promise rejection in general case is not an indication of "mortem".

@misterdjules
Copy link

@vkurchatkin Continuing the discussion from #5020 (comment):

The code sample you pasted uses try/catch explicitly, and thus doesn't abort.

No, it's code of the lib. User doesn't explicitly catch anything. So there is no guarantee that lib doesn't break post mortem even if it doesn't use promises

Right, the point is not necessarily that it's in a library or not, it's about differentiating programmer errors and operational errors. Users of post-mortem debugging handle programmer errors by generating a core file that can be inspected later to root-cause the problem.

The issue with promises is that so far and If I understand them correctly, programmer errors and operational errors are not handled differently, and so it's impossible to differentiate them. Thus it seems promises in their current implementation are not compatible with post-mortem debugging.

Adding a promises-based API in core means that core accepts and supports not being able to differentiate between the two types of errors, which I feel contradicts the work of the @nodejs/post-mortem WG and the current support for post-mortem debugging.

Similar (not identical) issues have been caused by the addition of domains in core. Luckily, some work has been done to mitigate these issues, maybe the same work can be done with promises at least to inform the project on whether or not we want to accept promises-based API in core?

@misterdjules
Copy link

@petkaantonov

Promise rejection in general case is not an indication of "mortem".

Right, and that's not what I suggested. See my response above to @vkurchatkin:

[Post-mortem debugging is] about differentiating programmer errors and operational errors. Users of post-mortem debugging handle programmer errors by generating a core file that can be inspected later to root-cause the problem.

The issue with promises is that so far and If I understand them correctly, programmer errors and operational errors are not handled differently, and so it's impossible to differentiate them.

@petkaantonov
Copy link
Contributor

The issue with promises is that so far and If I understand them correctly, programmer errors and operational errors are not handled differently, and so it's impossible to differentiate them.

You can do different kind of error handling depending on the context, the errors type and properties of the error.

Suppose the program is leaking database handles. You will get handle allocation error, in this case you would abort the process and restart, to free all the leaked handles..

Suppose you get TypeError or ReferenceError or such. There might not be any need to abort, you can drop whatever context was related to this (e.g. one request).

Suppose you get typical operational error like TimeoutError, in this case you would retry. Or if you have already done e.g. 5 retries, you would report it to the user.

@chrisdickinson
Copy link
Contributor

@misterdjules:

[Post-mortem debugging is] about differentiating programmer errors and operational errors. Users of post-mortem debugging handle programmer errors by generating a core file that can be inspected later to root-cause the problem.

I sort of disagree with this definition: this describes an aspect of post-mortem debugging, but not post-mortem debugging as whole. I would broaden the definition — post-mortem debugging is about having the necessary tools and information available to piece back the series of events leading to an unexpected state in the program after the program has terminated. Determining the provenance of an error is a task one should be able to perform, but it's not the whole of the thing.

@petkaantonov
Copy link
Contributor

post-mortem debugging is about having the necessary tools and information available to piece back the series of events leading to an unexpected state in the program after the program has terminated

But even currently you have no such thing since you only have the last event? In that case I'd argue the current promise setups in userland are already doing this much better with long stack traces that you can log in the unhandledRejection handler. This is currently unimplemented by Node.js for native promises, but it's in DevTools so I assume there is API for it.

@misterdjules
Copy link

@vkurchatkin

when using promises, I haven't found a way so far to call abort() (or any other API that makes a process generate a core dump) in a way that generates a core dump that preserves the call stack _at the exact same state when the error occurred.

You can't do that reliably with callbacks either

Yes, that's what's done in Isolate::Throw, and that's how --abort-on-uncaught-exception is implemented.

@vkurchatkin
Copy link
Contributor

Yes, that's what's done in Isolate::Throw, and that's how --abort-on-uncaught-exception is implemented.

This doesn't work if someone catches and rethrows an error

@petkaantonov
Copy link
Contributor

You can handle the Exception DebugEvent and manually PrintStackTrace and abort() there

@vkurchatkin
Copy link
Contributor

I don't think it's something that can be turned on in production. It looks like it debugger analyses every throw. If it happens in a context of a promise it tries to predict if it will be caught: https://github.com/nodejs/node/blob/master/deps/v8/src/debug/debug.cc#L1799

@misterdjules
Copy link

Yes, that's what's done in Isolate::Throw, and that's how --abort-on-uncaught-exception is implemented.

This doesn't work if someone catches and rethrows an error

Absolutely, which is why users of post-mortem debugging carefully use try/catch blocks and throw statements. The problem that we're describing here is that promises implicitly add that try/catch block, and thus don't give you a choice of what errors are caught or uncaught.

@chrisdickinson
Copy link
Contributor

@misterdjules: How feasible do you think it would be for the V8 stack trace capture mechanism to additionally capture parameter values? If feasible, do you think that would make the case where the core contains an error object (but an undesirable stack state) more tenable?

@petkaantonov
Copy link
Contributor

Absolutely, which is why users of post-mortem debugging carefully use try/catch blocks and throw statements

If you need to carefully use try catch and throw anyway, then what is the point of this issue? You dont need to be particularly careful about not using promises.

@Raynos
Copy link
Contributor Author

Raynos commented Feb 6, 2016

@ChrisDickson @mrdjules @vkurchatin

As an aside, Uber has a hacked up private fork of mdb that kind of works on Linux. Might be worth starting a conversation with post-mortem about Linux support.

@misterdjules
Copy link

@petkaantonov

Absolutely, which is why users of post-mortem debugging carefully use try/catch blocks and throw statements

If you need to carefully use try catch and throw anyway, then what is the point of this issue? You dont need to be particularly careful about not using promises.

Because so far a lot of node's core (domains is a counter example of that) has been written in a way that core APIs throw when an error cannot be dealt with (on a programmer error, on an 'error' event being emitted but no handler being set, etc.). Users of post-mortem debugging rely on that convention to make their process abort and generate a core dump when such an error is thrown, and they can't handle it themselves with a try/catch block.

Because promises add an implicit try/catch block around users' code, it seems that convention isn't respected anymore.

@misterdjules
Copy link

@Raynos

As an aside, Uber has a hacked up private fork of mdb that kind of works on Linux. Might be worth starting a conversation with post-mortem about Linux support.

That would be great! Would you mind opening an issue in nodejs/post-mortem?

@petkaantonov
Copy link
Contributor

Because promises add an implicit try/catch block around users' code, it seems that convention isn't respected anymore.

Yes but you need to use promises in order for this to happen. If you dont use promises then you are not affected. So non promise users who are not affected shouldnt block promise progress for thos who actually use them.

@misterdjules
Copy link

@petkaantonov

Because promises add an implicit try/catch block around users' code, it seems that convention isn't respected anymore.

Yes but you need to use promises in order for this to happen. If you dont use promises then you are not affected.

What if I want to use the new promises-based core APIs and post-mortem debugging? How do we communicate that the two are basically incompatible? Like I said before, domains have similar issues with post-mortem debugging. I've fixed some of these issues and had to explain inconsistencies to many users. I think having two incompatible tools in the core runtime brings a lot of confusion and in the long term doesn't serve users and the project. Does it hurt to try to find a way to make them work together? That's all I'm trying to do here.

And what if I can't avoid using the new promises-based core APIs? If core provides a promises-based API for most of its modules, then I would expect at least a significant part of the modules ecosystem to use them, otherwise, why bother doing it? A lot of node programs are built on top of small third-party modules, and post-mortem debugging users like myself unfortunately won't have the time to rewrite code that migrates to using core's promises based APIs.

Of course any of those third-party modules could today wrap all their code in a giant try/catch, thus having a negative impact on post-mortem debugging too, but that's not what most of them are doing, because it's not the convention that has been built so far. By moving to using core's promises based APIs though, it seems that these modules, and thus the applications that use them, would get closer to that. Is it worth it to discuss and investigate how we can understand that better and maybe find solutions if we determine there are problems to consider? I think so.

So non promise users who are not affected shouldnt block promise progress for thos who actually use them.

I am not blocking anything, in fact I cannot block anything because that's not how the project works.

I have raised questions around the impact of #5020 on post-mortem debugging, and I've provided some facts and code to illustrate these questions. We've already made progress in this discussion, and I'm sure we can make a lot more progress in the weeks to come to answers some of these questions. I believe answering these questions only helps the project and its users.

Like @chrisdickinson mentioned before, the worse thing that can happen by doing that is that we understand better the potential problems, and by doing so better serve our users and the project.

@misterdjules misterdjules added the post-mortem Issues and PRs related to the post-mortem diagnostics of Node.js. label Feb 6, 2016
@chrisdickinson
Copy link
Contributor

@misterdjules Indeed, I'm happy you broached this concern! I'm looking into what information V8 is privy to when generating stack traces, and it looks like it may contain the parameter values for each frame. If this true it opens an interesting venue for investigation: if we can preserve this information on the error object somehow instead of throwing it away when capturing the trace, this might greatly improve post-mortem capabilities in the general case, while resolving many of the concerns post-mortem has with the Promises PR.

Specifically, this would relax the need for a process to abort at a given location on the stack — if the process aborts anywhere with an Error object on-stack, one could inspect the Error object for the frame information that would previously be discovered via ::jsstack. Aside from parameter information, is there any other per-frame info that is necessary from the point of view of the post-mortem WG?

@misterdjules
Copy link

@chrisdickinson

Aside from parameter information, is there any other per-frame info that is necessary from the point of view of the post-mortem WG?

It's not really that post-mortem debuggers need a specific set of data and we could try to provide that data in different ways, but more that the core files they inspect need to be generated as soon as possible after the error happened, and without loosing any state.

For instance, what if I want to inspect the content of a buffer instance? That content is in the heap, how do I know that that content has not been overwritten by some code that runs after the error has happened and before the process aborts if between these two events node and V8 can basically do a lot of extra processing, including garbage collection, memory allocation/deallocation, etc?

What happens if I want to inspect the content of the stack frames but this content has been overwritten by calls to functions that handle errors after the uncaught error was thrown and before the process aborts?

What if I want to inspect the state of globals objects that may have changed in the meantime too? These are just examples of the same idea that core files should reflect the exact state of the process at the time of the error.

Does that answer your question?

@misterdjules
Copy link

Just for fun, I experimented with a small hack that provides us with the ability to dump a core file at the time an uncaught error is thrown from within a promise's resolver.

This hack currently covers only one specific use case that was mentioned in this issue, but I believe if we determine it's an approach worth considering, it wouldn't be too difficult to try to cover all other use cases.

As seen in node's output when running this script and in mdb's ::jsstack's output, the core file was generated at the time the error was thrown, and thus all the frames that are needed are on the stack.

I don't know yet if this hack can be considered a valid proof of concept, most importantly because contrary to the current behavior of --abort-on-uncaught-exception and throwing outside of promises, this would make throwing from within promises completely different depending on whether the flag is on or not.

At the very least, all other cases where an error can be thrown from a promise (not only from the resolver) would need to be handled, but it can be a starting point for discussing a potential fix for this problem.

@bnoordhuis
Copy link
Member

@misterdjules Maybe I missed it in the discussion but is there a reason something like this wouldn't work?

diff --git a/src/node.cc b/src/node.cc
index 20be309..dde2be1 100644
--- a/src/node.cc
+++ b/src/node.cc
@@ -145,6 +145,7 @@ static bool use_debug_agent = false;
 static bool debug_wait_connect = false;
 static int debug_port = 5858;
 static bool prof_process = false;
+static bool v8_abort_on_uncaught_exception = false;
 static bool v8_is_profiling = false;
 static bool node_is_initialized = false;
 static node_module* modpending;
@@ -1094,6 +1095,9 @@ void SetupNextTick(const FunctionCallbackInfo<Value>& args) {
 }

 void PromiseRejectCallback(PromiseRejectMessage message) {
+  if (v8_abort_on_uncaught_exception)
+    abort();
+
   Local<Promise> promise = message.GetPromise();
   Isolate* isolate = promise->GetIsolate();
   Local<Value> value = message.GetValue();
@@ -3864,10 +3868,12 @@ void Init(int* argc,
   // behavior but it could also interfere with the user's intentions in ways
   // we fail to anticipate.  Dillema.
   for (int i = 1; i < v8_argc; ++i) {
-    if (strncmp(v8_argv[i], "--prof", sizeof("--prof") - 1) == 0) {
-      v8_is_profiling = true;
-      break;
-    }
+    const char* const arg = v8_argv[i];
+#define V(flag, var) if (strncmp(arg, flag, sizeof(flag) - 1) == 0) var = true
+    V("--abort-on-uncaught-exception", v8_abort_on_uncaught_exception);
+    V("--abort_on_uncaught_exception", v8_abort_on_uncaught_exception);
+    V("--prof", v8_is_profiling);
+#undef V
   }

 #ifdef __POSIX__

The issue I see with your changes to promise.js is that they look to be breaking promise debugging when --abort_on_uncaught_exception is on.

@petkaantonov
Copy link
Contributor

@bnoordhuis That would abort() even when the rejection originated from explicit reject() call and not just from thrown exceptions. Such reject calls are normally coming from "operational errors" where abort() is usually inappropriate.

@misterdjules
Copy link

@bnoordhuis

Maybe I missed it in the discussion but is there a reason something like this wouldn't work?

In addition to what @petkaantonov said, I had experimented with such a solution earlier and described why it isn't suitable.

The issue I see with your changes to promise.js is that they look to be breaking promise debugging when --abort_on_uncaught_exception is on.

OK, I'll look into that when I have more time tomorrow. Thank you for taking a look and giving your feedback!

@chrisdickinson
Copy link
Contributor

@misterdjules: Yes — though it seems to me that in many cases, aborting at the top of stack may not be possible. I suspect that this will be true of most promise-based programs. However, perhaps we can broaden the applicability of post-mortem debugging techniques to include programs that use catch (or promises) by recording more information when generating error objects.

The ephemeral stack data would be preserved and persisted by the error object, and the heap should be kept static for the most part. Only a small gap is introduced between entering the exceptional state and the abort process, while relaxing the need to exit at top of stack somewhat.

In a promise chain like so:

A = promise()
B = A.then(op)
C = B.then(op)

If A rejects, the error will be propagated on next tick to B. If B rejects, the error will be propagated on next tick to C. C, having no handlers, calls the unhandledRejection event handler. By this time, the stack from A has unwound. However, we are left with an Error object, and that Error object could be made to record the additional stack information we desire at the VM level.

(I posted to this effect in the post-mortem WG thread as well.)

@hashseed
Copy link
Member

V8 fires "unhandled rejection" and "rejection handled" events even when the debugger is not active. When the debugger is active however, V8 collects additional information when Promises are executed. I believe the problems you are talking about are all solved problems if the debugger is active. See https://github.com/v8/v8/tree/master/test/mjsunit/es6/debug-promises

Short explanation: promise rejections are treated the same way as exception events in the debug listener. Rejected promises that have a reject handler are considered caught exceptions, those without are considered uncaught. The exception event is fired at the point of the rejection, so the stack has not been unwound yet. The promise object itself can be retrieved from the event data object.

You can also use the promise mirror to get some reflection: https://github.com/v8/v8/blob/master/test/mjsunit/es6/mirror-promises.js

Another gem you might be interested in: Chrome DevTools offers async stack traces. With that, you can actually see the asynchronous call chain leading up to Promise rejections.

@misterdjules
Copy link

@hashseed

I believe the problems you are talking about are all solved problems if the debugger is active

This issue is about being able to investigate production issues using core dumps. Enabling the debugger in production is, as far as I know, not standard practice. I would imagine that the impact on performance would not be acceptable, but I'd welcome any data that suggests otherwise.

Another gem you might be interested in: Chrome DevTools offers async stack traces. With that, you can actually see the asynchronous call chain leading up to Promise rejections.

Does that imply live debugging? If so that is outside of the scope of this issue, which is about post-mortem debugging in production.

@hashseed
Copy link
Member

hashseed commented Mar 1, 2016

I see. Then the debugger indeed won't be a help. Note however that I don't expect the debugger to impact performance aside from calling to the callback on exception events though. JIT code only run slow if a break point has been set to it.

@vkurchatkin
Copy link
Contributor

Closing in favour of nodejs/post-mortem#16. For posterity, issue title is misleading: unhandled rejections in fact DO have the correct stack traces (that is, error.stack properties).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lib / src Issues and PRs related to general changes in the lib or src directory. post-mortem Issues and PRs related to the post-mortem diagnostics of Node.js. v8 engine Issues and PRs related to the V8 dependency.
Projects
None yet
Development

No branches or pull requests

10 participants