Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When using handleUncaughtExceptions, if an error is thrown within a stream callback in the uncaughtException handler, restify enters an infinite loop. #1749

Open
3 tasks done
cprussin opened this issue Feb 13, 2019 · 10 comments

Comments

@cprussin
Copy link
Contributor

  • Used appropriate template for the issue type
  • Searched both open and closed issues for duplicates of this issue
  • Title adequately and concisely reflects the feature or the bug

Bug Report

Restify Version

Tested on 7.7.0, almost definitely present in all versions due to underlying Node issue.

Node.js Version

v10.15.1, likely present in all versions of Node.

Expected behaviour

No infinite loop

Actual behaviour

Infinite loop

Repro case

const restify = require('restify');
const http = require('http');

const server = restify.createServer({ handleUncaughtExceptions: true });
server.on('uncaughtException', () => {
    console.log('catching exception!');
    http.get('http://google.com', () => { throw new Error('foobar'); });
});
server.get('/', () => { throw new Error('fail'); });
server.listen(2000, () => console.log('Server now up on port 2000'));

Cause

This is caused by this issue in node.

Are you willing and able to fix this?

Yes.

There is a workaround available for this issue, which would be to change on to once here. If the restify team is on board with this change, I'm happy to submit a PR.

@misterdjules
Copy link
Contributor

One of my concerns with this is that it makes this problematic behavior silent. Would it be useful to know whether the error handler actually throws or emit an error than is not handled?

If so, would setting up a custom error event handler that:

  1. wraps the error event handler and somehow tries to report an error (publish a metric somewhere) first, assuming that error reporting code would be simpler/more robust than the original error handling code
  2. if called again, turns to a no-op

be a better trade-off between silencing all errors that occur in the error handler and consuming resources without bounds?

Another question that I have is: is it possible for an error to be emitted or thrown in the domain error handler before resources associated with the request/response are released? If so, would those resources ever be released if the error handler is set using once?

@cprussin
Copy link
Contributor Author

Would it be useful to know whether the error handler actually throws or emit an error than is not handled?

Yes, I definitely think that's ideal.

If so, would setting up a custom error event handler that:

  1. wraps the error event handler and somehow tries to report an error (publish a metric somewhere) first, assuming that error reporting code would be simpler/more robust than the original error handling code
  2. if called again, turns to a no-op

be a better trade-off between silencing all errors that occur in the error handler and consuming resources without bounds?

That does sound like a reasonable proposal to me. Any idea what you'd think such an API would look like?

Another question that I have is: is it possible for an error to be emitted or thrown in the domain error handler before resources associated with the request/response are released? If so, would those resources ever be released if the error handler is set using once?

That I'm not sure of. I could do more testing on Node to get a lower-level understanding. Regardless, I think that it would be ideal to solve this such that we can ensure that Restify always releases resources, if possible.

@misterdjules
Copy link
Contributor

@cprussin I'm thinking that prescribing/enforcing any behavior in restify is going to be tricky, because it seems difficult to determine what the best way is for restify consumers to handle those edge cases.

Is it possible for you to implement those ideas in a new domain within that restify domain error handler?

@cprussin
Copy link
Contributor Author

We can definitely do that @misterdjules. Being that that's the case, Restify using once instead of on would cause restify to not enter an infinite loop, but would still allow consumers to write code that catch this case. So, should we change restify or leave it as is?

@misterdjules
Copy link
Contributor

@cprussin I'm not sure anymore that once is actually desirable in all cases.

For instance, it seems like a legitimate use case for restify's uncaughtException handler is to log errors that are emitted on the req and res objects, or any other emitter created in the context of req/res' domain. Using once would allow only one error event to be logged.

Thoughts on that?

@cprussin
Copy link
Contributor Author

cprussin commented Feb 15, 2019

@misterdjules yeah, that's a good point.

Depending on what happens with the upstream node issue, we could also consider changing restify to create a new domain and call _onHandlerError within that domain. That is, change

node-restify/lib/server.js

Lines 829 to 831 in 4900d6b

handlerDomain.on('error', function onError(err) {
self._onHandlerError(err, req, res, true);
});
to:

handlerDomain.on('error', function onError(err) {
    var errorDomain = domain.create();
    errorDomain.on('error', function onError(err) {
        // Do something with err, probably including telling app
        // authors how they can handle this more gracefully...
    });
    errorDomain.run(function run() {
        self._onHandlerError(err, req, res, true);
    });
});

That way, we avoid an infinite loop in restify, but apps can still do error detection within the uncaught exception handler (by creating yet another new domain...).

@misterdjules
Copy link
Contributor

@cprussin Very interesting!

Would the errorDomain's error handler include potentially making sure that the request/response's resources are closed if err is a thrown error (as opposed to an emitted error)?

I presume it could at the very least log an error too, even though I wonder if that would drown into a sea of logs and turn out to be not that useful.

@cprussin
Copy link
Contributor Author

@misterdjules I would think it likely should do both of those, but OTOH is there an argument to be had that in this scenario, we should kill the process?

@misterdjules
Copy link
Contributor

@cprussin I don't know about killing the process, since the goal of using domains in the first place is to not exit on uncaught exceptions. I'm starting to think that all this should be left to the consumer of Restify because they have the most context on how they want their application to behave.

That is valid for nodejs/node#26211 too, and so I'm starting to think that is not a desirable change too.

Regardless, we could definitely improve Restify's and domain's documentation by documenting those edge cases and available workarounds.

Thoughts?

@cprussin
Copy link
Contributor Author

@cprussin I don't know about killing the process, since the goal of using domains in the first place is to not exit on uncaught exceptions. I'm starting to think that all this should be left to the consumer of Restify because they have the most context on how they want their application to behave.

That is valid for nodejs/node#26211 too, and so I'm starting to think that is not a desirable change too.

@misterdjules I agree that the ultimate insights into that an exception occurred should be up to the app, but I think that having a mode where the tool can enter an infinite loop unexpectedly (or at least, due to a condition that isn't obvious) is a problem--entering an unexpected infinite loop is a worse condition than crashing the process anyways. However, the more we chat about this, the more I think that the problem isn't Restify's to solve. Because of that, I'm actually really in favor of your change to Node, but I still think we should document those edge cases when using the restify flag so that consuming apps know how to account for these cases.

So my viewpoint on this is:

  1. Node shouldn't have a condition where it enters an infinite loop due to some nonobvious condition. Your PR will fix that, but will cause cases where handleUncaughtExceptions isn't sufficient to keep the server from crashing.
  2. I believe the purpose of the handleUncaughtExceptions flag is to indicate that uncaught exceptions on the request should be handled. I don't think it's up to restify to handle all possible exceptions. There will be edge cases to what conditions restify can handle exceptions under (for instance, what if an exception is thrown in Restify before we even enter the domain?). Point 1 will add such an edge case.
  3. To combat 2, we should add documentation in Restify for whatever edge cases exist in uncaught exception handling, and how to write a robust and stable error handler while using that flag.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants