Skip to content
This repository has been archived by the owner on Aug 7, 2023. It is now read-only.

promises and post-mortem debugging #16

Closed
misterdjules opened this issue Feb 6, 2016 · 123 comments
Closed

promises and post-mortem debugging #16

misterdjules opened this issue Feb 6, 2016 · 123 comments

Comments

@misterdjules
Copy link

A PR that aims at adding a promises-based API to most node core APIs was recently submitted to nodejs/node.

I have expressed my concerns regarding the impact of such a change on post-mortem debugging, and a separate issue was created by @Raynos to further discuss these concerns.

My concerns are more questions that I asked because I felt that the way promises handle errors is incompatible, at least in some ways, with post-mortem debugging. So far my investigations only confirmed these concerns, but I'm not done with them. One part of these investigations is also to find a solution to the problems that I found so far.

The discussion in nodejs/node has been productive so far, but I represent only one post-mortem debugging user/maintainer. I feel it would help if some people from this working group would share their opinion on these concerns with the rest of the group here in this issue, if only to confirm or infirm that they have the same questions/reserves as I do about the impact of nodejs/node#5020 on post-mortem debugging. For instance, if some of you already worked on that problem and did a similar investigation, I would be interested if you could share your results.

In any case, I will share my findings with the group as soon as I have some more time to continue my investigations.

Thank you!

@davepacheco
Copy link

For what it's worth, I have all the same concerns and reservations, but nothing additional to add to the discussion that you didn't already mention. Thanks for weighing in on the PR.

@mikeal
Copy link

mikeal commented Feb 6, 2016

To what extent are these incompatibilities an implementation detail of V8's Promises vs. an issue with any Promise implementation based on the spec in ES2015?

I ask because we have to approach fixing them very different depending.

@misterdjules
Copy link
Author

@mikeal That's a great question. I would think that we would try to come up with a solution similar to what has been done for --abort-on-uncaught-exception. That is: it would be behind a flag, and should be considered as a debugging tool specific to the Node.js runtime/platform.

I don't think aborting on an uncaught exception when passing --abort-on-uncaught-exception on the command line conforms to the ES 2015 spec, in the sense that I'm not sure if it's in the scope of the spec. If that's a valid interpretation of the spec, I would think the same reasoning could apply to a similar solution to this specific problem.

@misterdjules
Copy link
Author

Also, see nodejs/node#5084 (comment) for a quick experiment that I've done.

@mikeal
Copy link

mikeal commented Feb 6, 2016

Maybe we should ask @domenic about this.

@domenic
Copy link

domenic commented Feb 6, 2016

What is the exact question?

@jclulow
Copy link

jclulow commented Feb 6, 2016

I suspect it mostly boils down to this: Given the way that promises are currently specified, is it possible for the interpreter to know a priori whether or not a thrown exception will be handled as a rejection later?

If it is possible to know, we can abort the process at the correct moment because we know the thrown exception will not be handled as a rejection at a later time. This preserves the state of the process at the moment the fault occurs; stack arguments will be available, and the heap will have been preserved in its entirety by the operating system.

If it is not possible to know for all cases, then the interpreter must necessarily unwind the stack and continue executing until a subsequent determination can be made as to whether the rejection is unhandled. If this is true, then promises are anathema to post mortem debugging -- the model prescribes the destruction of the very state we are trying so hard to preserve for later study.

@domenic
Copy link

domenic commented Feb 6, 2016

I suspect it mostly boils down to this: Given the way that promises are currently specified, is it possible for the interpreter to know a priori whether or not a thrown exception will be handled as a rejection later?

No. Just like there's no way for the "interpreter" to know whether or not a thrown exception will be catch-ed later.

The entire point of promises is to be able to handle errors, similar to try/catch. If I understand you correctly, post-mortem debugging is anathema to the JavaScript language and its try/catch feature (as well as the promises feature that depends builds on that). A better approach would be to run a source transform to replace throw (as well as promise reject calls) with process.abort(), or use a language which does not have catchable exceptions.

@jclulow
Copy link

jclulow commented Feb 6, 2016

No. Just like there's no way for the "interpreter" to know whether or not a thrown exception will be catch-ed later.

This is manifestly false. Whether a throw will be caught by a try/catch or not depends entirely on whether the interpreter is handling code within a (potentially nested) try block or not. The call frame itself can make this coarse grained information available, and the interpreter can walk stack frames without dismantling them.

If nothing else, there is proof by construction: the Node --abort-on-uncaught-exception flag does this today.

The entire point of promises is to be able to handle errors, similar to try/catch. If I understand you correctly, post-mortem debugging is anathema to the JavaScript language and its try/catch feature (as well as the promises feature that depends builds on that).

It's certainly possible to make software less debuggable with blanket usage of try and catch, but you at least have to opt in to doing so. One of the more unfortunate aspects of Javascript is that catch is a binary construct: either you catch everything, or you catch nothing. On this basis, we tend to recommend that you avoid throwing or catching at all. Thrown exceptions should be left to serious programmer errors, where the only recourse is to abort an apparently logically inconsistent program.

Promises, like catch, do not appear to afford the programmer the ability to catch only a subset of expected exceptions. They also apparently lack any way to turn off the indiscriminate swallowing of all classes of exceptions. That is what makes them undesirable, at least to those of us working on systems software.

A better approach would be to run a source transform to replace throw (as well as promise reject calls) with process.abort(), or use a language which does not have catchable exceptions.

Even if reaching into arbitrary software and replacing random statements was a good idea, it would not work for a ReferenceError or a TypeError, or any other fault that arises implicitly due to an errant program. It would be similarly unhelpful with exceptions thrown from functions with a native implementation; e.g., JSON.parse().

This is not about aborting the process every time something is thrown, but rather when something is thrown but not handled.

@domenic
Copy link

domenic commented Feb 6, 2016

Given the tone of this conversation, I do not feel comfortable continuing to participate in it, and am unsubscribing from this thread. I would ask that do you do not mention me again in it.

@chrisdickinson
Copy link

@jclulow: To answer your original question — "Given the way that promises are currently specified, is it possible for the interpreter to know a priori whether or not a thrown exception will be handled as a rejection later?" The answer is no — or at least, not in a way that is compatible with the behavior you're looking for. Rejections are not treated like panics because it is possible that the rejection will become handled later in execution, possibly on a subsequent tick. Note, this is not the same thing as "on error resume" behavior, like domains enable — a rejection cannot "jump out of" the dependency graph of async ops like domains can.

This is not to say promises are anathema to post mortem debugging: as Domenic suggested, relying on throws to panic means that authors need to avoid try and catch where possible. While this group recommends that approach in order to make existing post-mortem tools work optimally, others find value in using try/catch (and promises) for exceptional control flow; and the introduction of constructs like process.on('uncaughtException'), and domains themselves show that the "on error panic" approach is not universally acceptable for Node users — some users and applications will want to make different tradeoffs than systems programmers.

Given that, even absent promises, our existing post-mortem tools rely on specific approaches to writing JS, and given that it's unlikely that generalizing a single approach to writing JS will work for all Node users, it seems to me that finding a solution that addresses the majority of both audience's concerns would be a good outcome to work towards.

For example: the post-mortem working group currently requires panic behavior to happen at the apex of the stack, which means that any approach to writing JS that involves catch clauses reduces the efficacy of the post mortem tools we have available. This happens even if the error is rethrown from within the handler immediately. @misterdjules has noted that the reason for this invariant is the need to reduce the number of confounding factors when examining the state of the program at crash — to be able to rely on the fact that the content of the heap and stack sections did not change between entering the error state and the process exiting.

However, it seems like the most valuable ephemeral information is the stack information — the functions, receivers, and arguments that get destroyed by unwinding the stack. This is also the most difficult requirement imposed by our post-mortem tools — it means that Node users that choose to use catch or promises are opting out of post-mortem techniques, and also requires that constructs like domains are called in a specific manner that can be awkward to ensure.

The Error.captureStackTrace API is backed by a stack frame iterator that has access to all of the information discarded by stack unwinding, and it captures data at precisely the point we need it to. Right now it doesn't record all of the information we're interested in — namely, parameter values. I understand that it doesn't enforce the ideal assertion — that no operations happened between unexpected state and abort — but it brings us from "no information" on catch and rethrow (or promises) to "useful partial information," and allows the post-mortem WG to relax the requirement that panic must happen at top of stack somewhat.

With regards to promises, because the unhandledRejection hook is exposed at application level, users who wish to treat unhandled rejections as panics are free to do so, and may abort providing a useful error object on-stack for post-mortem debugging.

@davepacheco
Copy link

This is not to say promises are anathema to post mortem debugging: as Domenic suggested, relying on throws to panic means that authors need to avoid try and catch where possible.

This point has been repeated a few times in a few threads, so it's worth clarifying. It seems to imply that developers that care about postmortem debugging have to take special care in their code (and, by logical extension, the modules that they use). But the vast majority of operational errors in Node and Node modules are already asynchronous and so are not thrown and not caught with try/catch anyway. As a result, most of the time, things work regardless of whether a module author considered postmortem support in the way they designed an interface or whether a program author considered that when they added a new module dependency.

Unlike try/catch, promises are used for handling asynchronous operational errors. That's why a promises-based core API is significantly more problematic for postmortem debugging than try/catch is.

This is also the most difficult requirement imposed by our post-mortem tools
...
For example: the post-mortem working group currently requires panic behavior to happen at the apex of the stack, which means that any approach to writing JS that involves catch clauses reduces the efficacy of the post mortem tools we have available.

I think it's fairer to say that these requirements derive from the unfortunate behavior of software that sometimes fails in ways that requires that data to understand it. For the reasons I mentioned above, I think in practice the way "catch" is used is not a major problem.

@chrisdickinson
Copy link

@davepacheco:

This point has been repeated a few times in a few threads, so it's worth clarifying. It seems to imply that developers that care about postmortem debugging have to take special care in their code (and, by logical extension, the modules that they use). But the vast majority of operational errors in Node and Node modules are already asynchronous and so are not thrown and not caught with try/catch anyway. As a result, most of the time, things work regardless of whether a module author considered postmortem support in the way they designed an interface or whether a program author considered that when they added a new module dependency.

Thanks for your response. This is a fair point: at the moment most errors generated by the core API are propagated through means other than throwing. However, I don't believe this precludes users from having to take special care in order to make post-mortem debugging viable for their programs.

When I say that special care has to be taken, I am referring to the recommendations on the Joyent error document as well as the recent recommendations from the Node error symposium, which stipulate that users should avoid using catch wherever possible in order to avoid losing stack information. In the case of promises, their existing use in the ecosystem would seem to preclude the application of post-mortem tools to those packages (& users of those packages.) Whether Core provides a promise-based API or not, this is a problem with the current state of post-mortem WG, if indeed we wish all Node users to be post-mortem WG users, as suggested elsewhere.

While I understand that core providing a promise-based API highlights this problem, I don't think it introduces it, and indeed it gives us an opportunity to find workable solutions for users that prefer to use catch + rethrow as well as users that prefer promises. Note that this isn't a suggestion to change the practices involved with callbacks — they remain as they are, and the existing tools and techniques used to debug them still work. This is a purely additive suggestion — that we start to look into instrumenting error objects more at stack trace capture time — that makes all programs, callback-based or not, produce more detailed information in core dumps. In other words, this is an opportunity to make the post-mortem debugging story a little better for all Node users, not just the ones that are able (or willing) to make the tradeoffs inherent in the recommendations we've given them thus far.

@stevemao
Copy link

stevemao commented Feb 7, 2016

Related issue about error handling mcollina/split2#8

@vkurchatkin
Copy link

I suspect it mostly boils down to this: Given the way that promises are currently specified, is it possible for the interpreter to know a priori whether or not a thrown exception will be handled as a rejection later?

The answer is yes, it's possible and v8 is capable of doing it, at least in debug mode.

@chrisdickinson
Copy link

@vkurchatkin:

The answer is yes, it's possible and v8 is capable of doing it, at least in debug mode.

Hm, I don't think it's possible to know at error-generation time — with the stack that led to error generation — that the promise will be unhandled.

For example:

const A = createPromiseSomehow()
const B = A.then(handlerB)
const C = B.then(handlerC)

If handlerB throws an exception, it will call PromiseReject on the promise. At this point, PromiseDone is called, and since there are pending tasks (B.then(handlerC)), PromiseEnqueue is called. Once the microtask fires, it will run PromiseIdRejectHandler which re-throws the error. This causes C to trigger an unhandledRejection event, however the stack leading to the original error has already been unwound. That said, the information that stack contained could be made available on the err object by decorating that object with more (private-to-VM) information about the stack frames it must iterate to generate the err.stack property.

@vkurchatkin
Copy link

@chrisdickinson

If B rejects its possible to traverse it's children and find out that there is no catch in the chain, so we can immediately abort

@chrisdickinson
Copy link

@vkurchatkin: An excellent point — right now all handlers A, B, C have a catch, it's just the PromiseIdRejectHandler (Edit: but in a patch, we could check for that.) The only other case it doesn't handle is an "eventual" handle of C, where elsewhere code calls C.catch after the initial rejection event, though I think that's probably not a deal breaker since application-level users can decide to disable that behavior explicitly. Perhaps we should talk to someone from the V8 team that focuses on promises about the feasibility of a patch?

@vkurchatkin
Copy link

The only other case it doesn't handle is an "eventual" handle of C, where elsewhere code calls C.catch after the initial rejection event,

Well that's obviously not possible if you choose to abort on unhandled promise

An excellent point — right now all handlers A, B, C have a catch, it's just the PromiseIdRejectHandler (Edit: but in a patch, we could check for that)

This is actually already implemented in PromiseHasUserDefinedRejectHandlerRecursive function: https://github.com/v8/v8/blob/master/src/js/promise.js#L406

@chrisdickinson
Copy link

@vkurchatkin:

Well that's obviously not possible if you choose to abort on unhandled promise

Yep! I agree we can safely ignore that case.

This is actually already implemented in PromiseHasUserDefinedRejectHandlerRecursive function: https://github.com/v8/v8/blob/master/src/js/promise.js#L406

Oh wow, I totally missed this. That's definitely useful. A patch could make it so that the perf hit for checking is only imposed on programs that are run with --abort-on-unhandled-rejection. That's exciting!

@groundwater
Copy link

Just want to add that regardless of whether V8 can maintain the stack on an unhandled Promise rejection, promise rejection is meant as a control flow. Many promise APIs reject promises for operational errors, meaning errors must be inspected, and unknown errors rethrown. By the time we know if our error is unexpected, the stack is unwound.

If promises are rejecting for well-defined reasons, such as a resource not existing or a timeout occurring we want to separate those cases from ReferenceErrors where the program is just broken.

@jclulow
Copy link

jclulow commented Feb 8, 2016

If promises are rejecting for well-defined reasons, such as a resource not existing or a timeout occurring we want to separate those cases from ReferenceErrors where the program is just broken.

Right. This is important, and I think speaks to a critical flaw in the model: implicitly thrown exceptions (ReferenceError, etc) are shoe-horned into the same error-handling channel as any other source of rejection.

Promises would ideally not have a set of edge cases or weird behaviour when somebody enables whatever becomes the promise equivalent of --abort-on-uncaught-exception. One of the great parts about --abort-on-uncaught-exception is that it only aborts when Node would have otherwise exited in error. It doesn't exit because a callback was called with an Error as the first argument, rather only when an exception is thrown but not caught.

@chrisdickinson
Copy link

@groundwater A good point, and many promise-based programs will be doing this sort of "does this error apply to me?" check-and-rethrow. There are a couple of avenues of investigation, here:

  1. Introduce an --abort-on-unhandled-rejection flag to V8, one that preserves stack when no user-defined handlers have been added.
  2. Look into recording frame parameter info onto error objects in a private-to-VM fashion. At the very least this offsets the costs of receiving a core with an unwound stack but with an error object present on-stack — a case that, at least anecdotally, I've run into in the past even in callback heavy codebases.
  3. On the language-level track, look into what it would take to add checked .catch handlers to promises, as an analogue (or as a part of) a proposal to add checked catch to the language to address this concern in the long run.

@chrisdickinson
Copy link

@jclulow:

This is a lever with which to turn the language. We can't change promises' existing behavior with regard to catching ReferenceErrors, since userland code relies on it. We can't prevent Node users from choosing to use them — they are a lot of good reasons to use them. Leaving them out of the Node core API won't fix the problem — some subset of users already want to use promises, leaving promises out won't change that.

However, we can press for better support from the language for being able to tell if a thrown error will simply be rethrown — both for catch clauses as well as for Promise .catch() methods. This should make the cores they produce suitable for use with the existing post-mortem tools. Leaving objections with the model aside for the moment, we can make it so that another package author's choice to use promises somewhere in your dependency chain doesn't unduly affect your ability to debug your application.

While we do that, we can improve the story for short stacks by recording more information on the Error object, making the existing cases where cores are produced with partially unwound stacks more workable — if not ideal — and making existing promise use less of an issue.

From what I've heard it seems like this should address most of the concerns voiced by the post-mortem WG. I am curious if you would find this acceptable — it is a compromise, but it is a compromise that expands the post-mortem story to include promises which means a larger user base is covered, even if not ideally.

@benjamingr
Copy link
Member

@jclulow

Right. This is important, and I think speaks to a critical flaw in the model: implicitly thrown exceptions (ReferenceError, etc) are shoe-horned into the same error-handling channel as any other source of rejection.

Right, but those are still reported - this is exactly why we did nodejs/node#256 you simply don't catch ReferenceErrors in the handler and then report them in the global hook.


I don't understand something, the use cases post-mortem debugging applies to like: "out of memory", "C++ module did something bad" or "I'm out of DB handles because something leaks" are all things you can inspect synchronously - they get thrown outside the chain and you can keep using the same techniques to debug them.

The other stuff - promises make things a lot easier to debug. You have throw safety, cascading error management like in synchronous code and when async/await lands it'll look synchronous too. You can have a good stack trace where the promise is thrown (userland libraries are better at this ATM though) and you can debug your code much like you would debug synchronous code.

Basically, every asynchronous case would have this problem with callbacks anyway and every synchronous case would not be affected.

@davepacheco
Copy link

@chrisdickinson

Thanks for your response. This is a fair point: at the moment most errors generated by the core API are propagated through means other than throwing. However, I don't believe this precludes users from having to take special care in order to make post-mortem debugging viable for their programs.

When I say that special care has to be taken, I am referring to the recommendations on the Joyent error document as well as the recent recommendations from the Node error symposium, which stipulate that users should avoid using catch wherever possible in order to avoid losing stack information.

It feels like my point has been acknowledged, but ignored. That document you linked to says:

In fact, the only commonly-used case where you'd use try/catch is JSON.parse and other user-input validation functions.
...
The general rule is that a function may deliver operational errors synchronously (e.g., by throwing) or asynchronously (by passing them to a callback or emitting error on an EventEmitter), but it should not do both. This way, a user can handle errors by either handling them in the callback or using try/catch, but they never need to do both.
...
In general, using throw and expecting a caller to use try/catch is pretty rare, since it's not common in Node.js for synchronous functions to have operational errors. (The main exception are user input validation functions like JSON.parse.)

I don't see anything that says that one should never use try/catch on the grounds that it interferes with postmortem debugging, because indeed: using try/catch to handle operational errors has no impact on postmortem debugging. Many of us have argued against using try/catch for handling programmer errors. But you don't usually have to worry about how modules handle programmer errors because modules generally don't attempt to deal with programmer errors at all, let alone those that arise from outside themselves. The results of the recent symposium do suggest avoiding try/catch -- but again, it's clearly talking about operational errors.

In the case of promises, their existing use in the ecosystem would seem to preclude the application of post-mortem tools to those packages (& users of those packages.) Whether Core provides a promise-based API or not, this is a problem with the current state of post-mortem WG, if indeed we wish all Node users to be post-mortem WG users, as suggested elsewhere.

The working group surely cannot prevent people from making design choices in their own modules that interfere with their ability to debug their own software postmortem (nor is that part of their charge, except perhaps to educate people on the issue). There's a big difference between that and landing changes to Node core APIs which, if used (and quite possibly used without understanding the impact), interfere with postmortem debugging.

Here and elsewhere, a false dichotomy is presented between accepting the PR and opposing a complete solution for postmortem debugging with promises. I don't believe anybody opposes such a solution, but many of us believe that before a new public API is created in platform, particularly one for which API stability is a stated goal, the implications of the API should be well-understood, and the known problems with it should be worked out. Landing an API is not the way to solve problems with it.

@davepacheco
Copy link

@benjamingr

I don't understand something, the use cases post-mortem debugging applies to like: "out of memory", "C++ module did something bad" or "I'm out of DB handles because something leaks" are all things you can inspect synchronously - they get thrown outside the chain and you can keep using the same techniques to debug them.
...
Basically, every asynchronous case would have this problem with callbacks anyway and every synchronous case would not be affected.

I'm not sure what you mean by by this -- what's "outside the chain" and what are "the same techniques"?
The use-cases are much broader than the ones you listed. A prototypical use-case for postmortem debugging would be: during an http 'request' handler that calls through several different modules while handling the request, the program ends up calling a JavaScript function that attempts to access property "foo" of a value that's undefined. The program crashes and dumps core. The core file allows the developer to see all of the information about the http request (method, url, headers, bytes read), all of the handlers on the stack, the arguments to each of these handlers, the "this" argument to each function on the stack, and so on. The value is that sometimes, the stack trace just isn't enough, and this other information helps nail the problem. Besides that, all this works even if the problem is not reproducible even once, and it does not require you to stop the program in a debugger (so you can use this in production).

@chrisdickinson
Copy link

@davepacheco:

Thanks for your response. I'd like to highlight a couple points:

But you don't usually have to worry about how modules handle programmer errors because modules generally don't attempt to deal with programmer errors at all, let alone those that arise from outside themselves.

The working group surely cannot prevent people from making design choices in their own modules that interfere with their ability to debug their own software postmortem (nor is that part of their charge, except perhaps to educate people on the issue).

I think the above points represent where we start to diverge. Specifically, I believe that the primary consumers of PMWG tooling are a subset of application developers — folks working directly on or in service of applications. Not all application developers consume PMWG tooling — indeed, though I have used the PMWG tooling in the past, in my current position I do not. Using myself as an example, as a member of the set of application developers, this would imply that the set of all application devs is not the set of all PMWG tooling consumers. PMWG tooling consumers use packages from the larger ecosystem — that is, PMWG users use packages authored by non-PMWG-using authors.

Some subset of non-PMWG-using package authors prefer to use promises. Those promises may cross package boundaries — that is, they may be returned to calling packages. This is an existing state in the ecosystem — see knex or sequelize for examples of this. By using and returning promises, these packages do deal with programmer errors. PMWG consumers may indirectly consume these packages as part of their application's dependency chain. It is likely that more package authors will start using promises as async/await lands sometime this year. Thus, the promises in the ecosystem already pose a problem for PMWG tooling consumers — this is a problem that exists now, and is likely to get worse over time if not addressed. As you said, I do not think that this group can prevent this problem via recommendation.

In that respect, I believe that introducing a promise API to core is a separate concern from the existing problem, and is unlikely to affect the presence or absence of the problem. I realize that not everyone agrees with me on this point — but if we agree on the prior chain of reasoning, that seems to imply that the post mortem WG will need to address promise use in tooling sooner rather than later. The timeline on the promise API is long, with go/no-go checkpoints along the way — nearly a year before it is officially supported and unflagged — which means that a solution can be developed in the interim. We can use the information we gather while this API is flagged as empirical evidence of problems with promises in practice, to be given to TC39, to help move that spec in a direction that is more amenable to PMWG.

Again, the API proposed does not expose the promise API in a cavalier fashion — it is a measured, long-term feature, with two major versions before official support + a flag for initial use, which gives us two major versions worth of information and avoids landing it in LTS until we know that the solution is tenable. We can and should solve the interoperation with promises and PMWG tooling, but as that problem already exists, I don't see why we can't do this in parallel, while using information from the flagged experiment to inform our research (as well as the standards bodies that can affect lasting change.)

@davepacheco
Copy link

@chrisdickinson Indeed, not all Node users use postmortem debugging tooling. We agree on that. I think there's agreement as well that some form of postmortem debugging support is something the Node community wants to keep.

Thus, the promises in the ecosystem already pose a problem for PMWG tooling consumers — this is a problem that exists now, and is likely to get worse over time if not addressed.... In that respect, I believe that introducing a promise API to core is a separate concern from the existing problem, and is unlikely to affect the presence or absence of the problem.

I believe several of us have repeatedly reported that we have not experienced this issue at all. That's not to say that it doesn't exist, but it appears to not yet be a widespread issue. (I'd welcome data points from other members of the working group on this.) Do you believe that the presence of a promise-based core API in Node will not contribute to the more widespread of use promises, and particularly in situations where module authors may have used the API without even knowing that it would impact other users' ability to use postmortem debugging?

@vkurchatkin
Copy link

@benjamingr well, we can't know the future) although I think it's theoretically possible to dump core and then continue execution until we know for sure that the process shouldn't be running

@benjamingr
Copy link
Member

@vkurchatkin that would also be a very interesting path to explore. If we can generate the core dump and keep running under the flag for a while that could be fantastic, --abort-on-rejection-unhandled-for=1000 for generating a core dump at the error but running for another second until aborting. I wonder if that's possible - it's definitely an interesting idea.

I'm personally OK with catch handlers being attached synchronously for libraries and for apps that use core dumps but I'm just one person and I'd like to see someone argue against it a little (there sure are people who hold the opposing view).

@bnoordhuis
Copy link
Member

If we can generate the core dump and keep running under the flag for a while that could be fantastic, --abort-on-rejection-unhandled-for=1000 for generating a core dump at the error but running for another second until aborting. I wonder if that's possible - it's definitely an interesting idea.

#18 (comment) (tl;dr fork + abort)

@vkurchatkin
Copy link

@bnoordhuis we need to somehow communicate the pid of the forked process to the tooling, otherwise it would be impossible identify core file. Is that correct?

@bnoordhuis
Copy link
Member

Correct. fork() returns the pid of the child process to the parent (i.e. original) process.

@davepacheco
Copy link

For what it's worth, there are a bunch of issues with forking in the face of a potentially fatal error:

  • The child process is not an exact copy of the parent. File descriptors marked CLOEXEC will be missing, for example, as will all the other threads. A core dump from the child will be missing a lot of important program state.
  • Forking at arbitrary points of execution is not necessarily safe, depending on what other threads are doing, what locks they've got held, and what resources they're holding onto. It's especially fraught if the program has actually encountered a fatal error because it's even harder to make any assertions about what other threads are doing, what locks are held by the current thread, and the like.
  • The fork() itself may not succeed, especially in these cases. No small number of core files we've seen are a result of out-of-memory conditions, and there are other cases where fork() can fail where a normal abort() will not.

@bnoordhuis
Copy link
Member

File descriptors marked CLOEXEC will be missing, for example, as will all the other threads.

File descriptors are gone anyway when you open the core dump so I don't see how that is relevant. It's theoretically possible to record file descriptor metadata in a PT_NOTE section but I'm not aware of platforms that actually do that.

You're right about threads but I thought the goal was to collect C+JS stack traces for the main thread.

Forking at arbitrary points of execution is not necessarily safe

That's why the fork should call abort() immediately afterwards.

The fork() itself may not succeed

True. You can opt to call abort() in the main process in that case. There may be no real going forward anyway once resource limits are hit.

@davepacheco
Copy link

File descriptors are gone anyway when you open the core dump so I don't see how that is relevant. It's theoretically possible to record file descriptor metadata in a PT_NOTE section but I'm not aware of platforms that actually do that.

illumos systems do that, and it's extremely useful. That said, I was mistaken -- those file descriptors will remain open unless you call exec(2).

@misterdjules
Copy link
Author

@vkurchatkin

Thank you very much for taking the time to respond to my questions and taking a look at the latest potential solution I explored. This is exactly the kind of feedback I was looking for, and I'm hopeful that we're starting to be on the right path to find a way that would be acceptable for post-mortem debugging users :)

Isolate::Throw should just be concerned by whether there's a try/catch handler in the stack above the current stack frame.

The problem is that there is ALWAYS try/catch handler in the stack above in case of promises

Absolutely, I think that's well understood, and that's the reason why in my suggested approach I explored a way to not setup a try/catch handler when passing a flag that aims at enabling a more post-mortem debugging compatible behavior.

so we have to special case it

Right, the difference between our potential solutions is where this special casing happens.

This sample code should not abort when run with --abort-on-uncaught-exception.

This is debatable. This patch doesn't allow to throw out of promise handlers and executor regardless of outer try catches. The semantics is weird, indeed, but try-catch can't affect this: promise constructor doesn't throw anything ever.

Would that work as designed with async/await? I'm even less familiar with async/await than with promises, so I may very well be wrong, but from what I understand one could write code such as:

async function boom() {
  await promiseThatThrows();
}

try {
  boom();
} catch () {
}

Is that correct? If so, I would think that even when enabling post-mortem debugging specific behaviors that abort early, that code would not abort.

your approach is interesting, but it clearly violates standard.

Would you mind referring to the section in the standard that indicates that? I'm not familiar with it and it can sometimes be difficult for me to understand e.g what is a hard requirement and what is undefined behavior.

Also I think not catching errors from microtasks can have unforeseen consequences.

Right, thank you for bringing that up. Are microtasks used for anything else than promises and Object.observe? Are there some npm modules using microtasks outside of these two use cases?

@vkurchatkin
Copy link

@misterdjules

async functions are basically just functions that return promises, although it is implicitly wrapped in try-catch as well, so it can only reject, not throw.

Would you mind referring to the section in the standard that indicates that?

Section 25.4.3.1 specifies how promise constructor works. As you can see, it always returns normal completion (step 11) expect for 10b, which should't happen. Normal completion means that function does't throw (throwing is an abrupt completion. If new Promise can't throw, wrapping it in try-catch can't catch anything.

Are microtasks used for anything else than promises and Object.observe

no, as far as I know

Are there some npm modules using microtasks outside of these two use cases

there are no good reasons to use them, they are no better than nextTick, but require native code compilation, so most likely the answer is no

@spion
Copy link

spion commented Feb 16, 2016

@misterdjules What about generators and transpilers for async/await that compile to generators? The generator based runner must attach a catch handler to the yielded promise, even if there is no try-catch block. Unless it can ask the generator iterator something like generatorIterator.isFrameWithinTryBlock(). Or am I missing something again?

@misterdjules
Copy link
Author

@vkurchatkin

async functions are basically just functions that return promises, although it is implicitly wrapped in try-catch as well, so it can only reject, not throw.

What about this sample code (same as the previous one, without the use of the async keyword to define an async function):

try {
  await promiseThatThrows();
} catch () {
}

Wouldn't that need to not abort regardless of the use of a flag to make uncaught errors in promises abort?

Or maybe it's designed/speced to be rewritten as a promiseThatThrows().catch() by all implementations and thus your approach would work as designed even in this case?

@misterdjules
Copy link
Author

@spion

What about generators and transpilers for async/await that compile to generators? The generator based runner must attach a catch handler to the yielded promise, even if there is no try-catch block. Unless it can ask the generator iterator something like generatorIterator.isFrameWithinTryBlock(). Or am I missing something again?

Can you be more specific? I don't understand in your comment what the question is. There's a lot of parallel discussions in this thread, and thus I'm not sure what you're referring to. Would you mind quoting the text you're responding to and proving details about what code doesn't work as you would expect and why?

Thank you!

@spion
Copy link

spion commented Feb 17, 2016

@misterdjules sorry, I forgot to give an example.

With this change, I'm assuming we would expect this to throw:

function makeRejectedPromise() {
  return Promise.resolve().then(_ => { throw new Error("oops") })
}

var asyncFn = async function f() {
  var result = await makeRejectedPromise();
  return result;
}

f()

because the await call is not within a try-catch block.

However, most existing async function compilers transform async functions to generators using a generator wrapper like this one:

function async(gen) {
  return function() {
    var iterator = gen.call(this, arguments)
    function process(val) {
      // simplified version without sync error handling
      var next = iterator.next(val);
      return next.done ? next.value
                       : next.value.catch(e => iterator.throw(e)).then(process)
    }
    return process()
  }
}

In order for the wrapper to be able to forward errors to the generator iterator, it has to catch the exception. Since we don't have generatorIterator.isFrameWithinTryBlock(), we can't ask it before we attach the catch handler.

Which as far as I can tell means that we wont get an abort (the catch handler is there and checks out), not until iterator.throw(e) throws us the error back (because the generator doesn't handle it). By that point, however, the stack changed completely.

@refack
Copy link

refack commented Jun 4, 2017

Cross-ref: nodejs/node#13437

@misterdjules
Copy link
Author

@refack Can you expand on why nodejs/node#13437 is relevant to this discussion?

@refack
Copy link

refack commented Jun 5, 2017

@misterdjules I believe it's more the other way around, this discussion is relevant to the one going on at nodejs/node#13437

@misterdjules
Copy link
Author

@refack The question still stands: how is it relevant? Otherwise it's really not clear to me, and I suspect for others, why the two are linked.

@refack
Copy link

refack commented Jun 5, 2017

In the most basic sense both discussions deal with Promises and diagnostics (although pre vs post mortem).
In a more specific sense async_hooks enable implementing trace mechanisms that may be beneficial for post-mortem diagnostics, but only if done well.
IMHO this discussion has raised good points, and personally helped me when thinking about pre-mortem Promise tracing. And so I believed cross referencing the two might help others.

as an aside; I feel that there is a huge body of knowledge in the nodejs org, but it's quite fragmented, causing mistakes to be repeated or decisions to be made without taking into account points that were already riesed but forgotten. So I feel defragmenting, and cross referencing, more often than not has more benefit than cost

@AndreasMadsen
Copy link
Member

as an aside; I feel that there is a huge body of knowledge in the nodejs org, but it's quite fragmented, causing mistakes to be repeated or decisions to be made without taking into account points that were already riesed but forgotten. So I feel defragmenting, and cross referencing, more often than not has more benefit than cost

Indeed, I had no idea this thread existed :o

@mmarchini
Copy link

Discussion on this topic has moved to nodejs/promises-debugging.

@richardlau
Copy link
Member

As @mmarchini says, this work is now being done in https://github.com/nodejs/promises-debugging. If this needs to be tracked outside of that repository, please open an issue over in https://github.com/nodejs/diagnostics.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests