Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement context isolation for Node.js #711

Merged
merged 8 commits into from
Mar 9, 2022
Merged

Conversation

shalvah
Copy link
Contributor

@shalvah shalvah commented Mar 5, 2022

Status

READY

Description

This PR fixes #708 by storing the context in AsyncLocalStorage for web requests.

Implementation

Thanks to the earlier work by @subzero10, the core implementation was relatively straightforward:

  • First, when the agent is created, it uses the GlobalStore (renamed from DefaultStore so its behaviour is more obvious). This allows us to support simple single-request use cases like CLI jobs.
  • We're exposing the hb.run(cb) method, which sets the store to the AsyncStore and runs the given code with it. Each request will have its own context, thanks to ALS. Users can wrap their request handlers in hb.run() to properly capture the context for each request. For supported frameworks like Express, they can just add the Honeybadger.requestHandler middleware, which is a wrapper over hb.run.

However, because of Node.js' oddities, I had to do several more investigations. Here's my tale:

The existing Honeybadger.requestHandler() simply wrapped the code in a domain and set the next handler as the domain's error handler. At first, it seemed strange, but I realized that the advantage of domains is that they can capture asynchronous errors triggered from the same context. In the example below, the error will be caught by someErrorHandler.:

functionThatErrors = () => { throw new Error("Oh no") };

handler = () => {
  try {
    setTimeout(functionThatErrors, 100);
  } catch (e) {
    // This won't catch the error!
  }
}

dom = require("domain").create();
dom.on("error", someErrorHandler); // <- This will
dom.run(handler);

Without wrapping in a domain, there is no way to catch the error (unless you fix functionThatErrors). The error would bubble up to process.on("uncaughtException") (or crashes if you don't have one registered)`:

try {
  handler ();
} catch (e) {
  // Doesn't work
}

process.on("uncaughtException", (err) => {
  // Catches it
});

Your uncaughtException handler will catch this, yes, but you've lost the context. In an Express app, for example, you'd no longer have access to the res object, so you can't send a response to the user. But with domains, you can create a new one for every request and give it an error handler that captures res and anything else you need. So, domains are sorta mini-worlds.

app.get("/", (req, res, next) => {
  dom = domain.create()
  dom.on("error", () => res.send("Oh no"))
  dom.run(() => doStuff())
})

As you can see, this is often desirable for a webapp. You don't want your server to crash without sending a friendly error response (and then you should let it crash).

For this reason, the hb.run function runs the user's code in a domain. However, there are problems with domains, The docs have an entire article on it. Additionally:

  • domains are an easy way to swallow an exception and cause a memory leak due to unclaimed resources.
  • nested domains can lead to unexpected behaviour. We don't know if the end user or the framework is also wrapping the request handling in a domain.

The biggest mindfuck with domains, though, is that the domain error handler will NOT be called if there's a try/catch anywhere in the stack, even if it's way outside the domain.

dom.on("error", errorHandler);
dom.run(() => {
  try {
    throw new Error("ga");
  } catch (e) {
    // The error is caught here, as expected.
    // `errorHandler` is not called.
  }
});

try {
  dom.on("error", errorHandler);
  dom.run(() => {
    throw new Error("ga");
  });
} catch (e) {
    // The error is caught here. WUT?
    // `errorHandler` is still ignored.
}

I suspect similar happens with Promises, because the docs say domains don't catch Promise rejections, but they did in my tests.

There's also the fact that you can't resume execution. Once the domain error handler is triggered, nothing else executes after that, so the error handler has to call the next function to be executed. These two behaviours limited the tests I could write.

Unforunately, there's no way to catch every possible scenario. Error handling in Node.js is a mess, and we want to limit our changes to the user's execution environment.

To mitigate these, hb.run takes an errorHandler argument, and will only use a domain if this value is supplied. This way, frameworks which are more promise-friendly can run with minimal interference, and users can opt out of domains.


Will send in a docs PR when this is merged.

@shalvah
Copy link
Contributor Author

shalvah commented Mar 5, 2022

If anyone would like to test this out, I've updated the Express example to easily test this. You can switch between the old and new versions of this package by commenting one out and leaving the other:

// const Honeybadger = require('@honeybadger-io/js'); // Old version
const Honeybadger = require('../../dist/server/honeybadger.js'); // Current

Start the server with node index.js, then try making 3 simultaneous requests to /test-context in a different tab. With autocannon:

autocannon --connections 3 --amount 3 http://localhost:3000/test-context

Look at the console logs from the server, and the error that is logged.

When we're done, I can change the example back to something more user-friendly.

@shalvah shalvah requested review from joshuap and subzero10 and removed request for joshuap March 5, 2022 15:59
Copy link
Member

@joshuap joshuap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good to me. Thanks for the writeup!

It looks like there's an issue with the integration tests:

TypeError: Unable to get property '__breadcrumbs' of undefined or null reference

@subzero10, can you review and approve if you agree with everything?

Edit: Oh, this will also need a changelog entry! 👍

test/unit/server/middleware.server.test.ts Show resolved Hide resolved
const dom = domain.create()
dom.on('error', next)
dom.run(next)
this.run(next, next)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

src/core/store.ts Outdated Show resolved Hide resolved
@@ -97,11 +100,14 @@ export default class Client {
this.config.__plugins.forEach((plugin) => plugin.load(this))
}

this.__store = opts.store || new DefaultStore({ context: {}, breadcrumbs: [] })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason you moved this from here to the constructor? At first, I was conflicted as well, but then I realized that Honeybadger.configure is a required method call for it to work (you need to at least call it to set the apiKey). And since we have other initializations happening here, I decided to include the store initialization.

We can leave it as you want, it doesn't make much difference, I just want to know how you thought about this.

Finally, if we are keeping this version, we should remove the store property from the Config interface (types.ts).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For one, the .configure method is typically for things the end user should set, and for now, the store is not meant to be set by them. It's too critical and tied to implementation details. And they don't need to set it either, as hb.run will set it as needed.

Also, the store is too important to not be set, IMO. It's better to construct the object straightaway with as many key details set as possible, avoiding any invalid states.

Finally, in our tests, we don't always call .configure, and having to add that to get things working was a pain.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For one, the .configure method is typically for things the end user should set, and for now, the store is not meant to be set by them. It's too critical and tied to implementation details. And they don't need to set it either, as hb.run will set it as needed.

Good point.

Also, the store is too important to not be set, IMO. It's better to construct the object straightaway with as many key details set as possible, avoiding any invalid states.

I agree, but the client should not work without a call to configure. Perhaps we should enforce this 🤔 (i.e. throw an error if configure is not called)? Or go the other way make the configure method completely optional (i.e. read the apiKey from .env)?

Finally, in our tests, we don't always call .configure, and having to add that to get things working was a pain.

Hehe, I faced the same scenario. I went ahead and added configure because otherwise we were not initializing the client correctly.

Anyway, I'm convinced 😄 and I think @joshuap is too, because he suggested adding the store initialization in the constructor too. I just didn't go for it to keep initializations in one place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're an error-reporting library, we can't afford to throw an error😅

At best we can write a log if there's no API key when we try to report, but I think we maybe already do that.

I don't think we need to change anything here at the moment. I think expecting the user to call configure is fair. It's always in the usage instructions, right next to the require() part.

// Note that this doesn't still handle all cases. `domain` has its own problems:
// See https://github.com/honeybadger-io/honeybadger-js/pull/711
const dom = domain.create();
const onErrorWithContext = (err) => this.__store.run(storeObject, () => onError(err));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this create a new context? I think that every time we call ALS.run() it creates a new isolated context. So if something was added to the store (in the handler) before throwing the error event is raised, we will not see it here because the storeObject is the initial object we retrieved at the beginning (before calling the handler).
Do you agree? Can we write a test for this?
Maybe it would be better if we wrapped all of this inside this.store.run() ? Like:

const storeObject = this.__getStoreOrDefaultObject();
this.__setStore(AsyncStore)

return this.__store.run(storeObject, () => {
  if (onError) {
    // ALS is fine for context-tracking, but `domain` allows us to catch errors
    // thrown asynchronously (timers, event emitters)
    // We can't use unhandledRejection/uncaughtException listeners; they're global and shared across all requests
    // But the `onError` handler might be request-specific.
    // Note that this doesn't still handle all cases. `domain` has its own problems:
    // See https://github.com/honeybadger-io/honeybadger-js/pull/711
    const dom = domain.create();
    dom.on('error', onError);
    handler = dom.bind(handler);
  }

  return handler();
});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehe, I went through the same thought patterns and iterations too.😅

Your suggestion doesn't work because it seems the domain error handler is called outside the context of the ALS context (even though we ran the domain inside it). A simple script you can test with:

const als = new (require("async_hooks").AsyncLocalStorage);

als.run({a: true}, () => {
    const dom = require("domain").create();
    onError = () => console.log(als.getStore())
    dom.on('error', onError);
    handler = dom.bind(() => { throw new Error });
    handler();
});

Logs undefined.

So apparently, the ALS context ends before the domain error handler is called.

As for your first question, the context is an object, so it is preserved, since we're passing the same object to both cases of als.run. Test:

const als = new (require("async_hooks").AsyncLocalStorage);

ctx = {a: true}
als.run(ctx, () => {
    const dom = require("domain").create();
    onError = () => console.log(als.getStore())
    dom.on('error', (e) => als.run(ctx, onError));
    handler = dom.bind(() => { 
        als.getStore().b = true;
        throw new Error
    });
    handler();
});

Logs:

{ a: true, b: true }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, re tests: the captures errors in timers with the right context below tests for this. The context object is set in the handler, then we grab it in the error handler and assert that it's correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thorough explanation, thanks!

Copy link
Member

@subzero10 subzero10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All my questions have been answered/resolved, thanks @shalvah !

@shalvah shalvah merged commit 20b8bd0 into master Mar 9, 2022
@shalvah shalvah deleted the context-isolation branch March 10, 2022 00:28
@shalvah
Copy link
Contributor Author

shalvah commented Mar 10, 2022

@joshuap could we tag a new beta release so I can test this out on more examples?

@joshuap
Copy link
Member

joshuap commented Mar 14, 2022

@joshuap could we tag a new beta release so I can test this out on more examples?

Yeah I'll release a new beta today

@shalvah
Copy link
Contributor Author

shalvah commented Mar 15, 2022

@joshuap 👀

@joshuap
Copy link
Member

joshuap commented Mar 16, 2022

@joshuap 👀

yeah sorry, got tied up. I'll do it today... 😅

@joshuap
Copy link
Member

joshuap commented Mar 16, 2022

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Node.js server integration - isolated context per request
3 participants