Implement context isolation for Node.js #711

shalvah · 2022-03-05T15:32:30Z

Status

READY

Description

This PR fixes #708 by storing the context in AsyncLocalStorage for web requests.

Implementation

Thanks to the earlier work by @subzero10, the core implementation was relatively straightforward:

First, when the agent is created, it uses the GlobalStore (renamed from DefaultStore so its behaviour is more obvious). This allows us to support simple single-request use cases like CLI jobs.
We're exposing the hb.run(cb) method, which sets the store to the AsyncStore and runs the given code with it. Each request will have its own context, thanks to ALS. Users can wrap their request handlers in hb.run() to properly capture the context for each request. For supported frameworks like Express, they can just add the Honeybadger.requestHandler middleware, which is a wrapper over hb.run.

However, because of Node.js' oddities, I had to do several more investigations. Here's my tale:

The existing Honeybadger.requestHandler() simply wrapped the code in a domain and set the next handler as the domain's error handler. At first, it seemed strange, but I realized that the advantage of domains is that they can capture asynchronous errors triggered from the same context. In the example below, the error will be caught by someErrorHandler.:

functionThatErrors = () => { throw new Error("Oh no") };

handler = () => {
  try {
    setTimeout(functionThatErrors, 100);
  } catch (e) {
    // This won't catch the error!
  }
}

dom = require("domain").create();
dom.on("error", someErrorHandler); // <- This will
dom.run(handler);

Without wrapping in a domain, there is no way to catch the error (unless you fix functionThatErrors). The error would bubble up to process.on("uncaughtException") (or crashes if you don't have one registered)`:

try {
  handler ();
} catch (e) {
  // Doesn't work
}

process.on("uncaughtException", (err) => {
  // Catches it
});

Your uncaughtException handler will catch this, yes, but you've lost the context. In an Express app, for example, you'd no longer have access to the res object, so you can't send a response to the user. But with domains, you can create a new one for every request and give it an error handler that captures res and anything else you need. So, domains are sorta mini-worlds.

app.get("/", (req, res, next) => {
  dom = domain.create()
  dom.on("error", () => res.send("Oh no"))
  dom.run(() => doStuff())
})

As you can see, this is often desirable for a webapp. You don't want your server to crash without sending a friendly error response (and then you should let it crash).

For this reason, the hb.run function runs the user's code in a domain. However, there are problems with domains, The docs have an entire article on it. Additionally:

domains are an easy way to swallow an exception and cause a memory leak due to unclaimed resources.
nested domains can lead to unexpected behaviour. We don't know if the end user or the framework is also wrapping the request handling in a domain.

The biggest mindfuck with domains, though, is that the domain error handler will NOT be called if there's a try/catch anywhere in the stack, even if it's way outside the domain.

dom.on("error", errorHandler);
dom.run(() => {
  try {
    throw new Error("ga");
  } catch (e) {
    // The error is caught here, as expected.
    // `errorHandler` is not called.
  }
});

try {
  dom.on("error", errorHandler);
  dom.run(() => {
    throw new Error("ga");
  });
} catch (e) {
    // The error is caught here. WUT?
    // `errorHandler` is still ignored.
}

I suspect similar happens with Promises, because the docs say domains don't catch Promise rejections, but they did in my tests.

There's also the fact that you can't resume execution. Once the domain error handler is triggered, nothing else executes after that, so the error handler has to call the next function to be executed. These two behaviours limited the tests I could write.

Unforunately, there's no way to catch every possible scenario. Error handling in Node.js is a mess, and we want to limit our changes to the user's execution environment.

To mitigate these, hb.run takes an errorHandler argument, and will only use a domain if this value is supplied. This way, frameworks which are more promise-friendly can run with minimal interference, and users can opt out of domains.

Will send in a docs PR when this is merged.

shalvah · 2022-03-05T15:53:39Z

If anyone would like to test this out, I've updated the Express example to easily test this. You can switch between the old and new versions of this package by commenting one out and leaving the other:

// const Honeybadger = require('@honeybadger-io/js'); // Old version
const Honeybadger = require('../../dist/server/honeybadger.js'); // Current

Start the server with node index.js, then try making 3 simultaneous requests to /test-context in a different tab. With autocannon:

autocannon --connections 3 --amount 3 http://localhost:3000/test-context

Look at the console logs from the server, and the error that is logged.

When we're done, I can change the example back to something more user-friendly.

joshuap

This is looking good to me. Thanks for the writeup!

It looks like there's an issue with the integration tests:

TypeError: Unable to get property '__breadcrumbs' of undefined or null reference

@subzero10, can you review and approve if you agree with everything?

Edit: Oh, this will also need a changelog entry! 👍

test/unit/server/middleware.server.test.ts

subzero10 · 2022-03-09T20:28:24Z

src/server/middleware.ts

-  const dom = domain.create()
-  dom.on('error', next)
-  dom.run(next)
+  this.run(next, next)


src/core/store.ts

subzero10 · 2022-03-09T20:41:28Z

src/core/client.ts

@@ -97,11 +100,14 @@ export default class Client {
      this.config.__plugins.forEach((plugin) => plugin.load(this))
    }

-    this.__store = opts.store || new DefaultStore({ context: {}, breadcrumbs: [] })


Is there any reason you moved this from here to the constructor? At first, I was conflicted as well, but then I realized that Honeybadger.configure is a required method call for it to work (you need to at least call it to set the apiKey). And since we have other initializations happening here, I decided to include the store initialization.

We can leave it as you want, it doesn't make much difference, I just want to know how you thought about this.

Finally, if we are keeping this version, we should remove the store property from the Config interface (types.ts).

For one, the .configure method is typically for things the end user should set, and for now, the store is not meant to be set by them. It's too critical and tied to implementation details. And they don't need to set it either, as hb.run will set it as needed.

Also, the store is too important to not be set, IMO. It's better to construct the object straightaway with as many key details set as possible, avoiding any invalid states.

Finally, in our tests, we don't always call .configure, and having to add that to get things working was a pain.

For one, the .configure method is typically for things the end user should set, and for now, the store is not meant to be set by them. It's too critical and tied to implementation details. And they don't need to set it either, as hb.run will set it as needed.

Good point.

Also, the store is too important to not be set, IMO. It's better to construct the object straightaway with as many key details set as possible, avoiding any invalid states.

I agree, but the client should not work without a call to configure. Perhaps we should enforce this 🤔 (i.e. throw an error if configure is not called)? Or go the other way make the configure method completely optional (i.e. read the apiKey from .env)?

Finally, in our tests, we don't always call .configure, and having to add that to get things working was a pain.

Hehe, I faced the same scenario. I went ahead and added configure because otherwise we were not initializing the client correctly.

Anyway, I'm convinced 😄 and I think @joshuap is too, because he suggested adding the store initialization in the constructor too. I just didn't go for it to keep initializations in one place.

Since we're an error-reporting library, we can't afford to throw an error😅

At best we can write a log if there's no API key when we try to report, but I think we maybe already do that.

I don't think we need to change anything here at the moment. I think expecting the user to call configure is fair. It's always in the usage instructions, right next to the require() part.

subzero10 · 2022-03-09T20:56:57Z

src/server.ts

+      // Note that this doesn't still handle all cases. `domain` has its own problems:
+      // See https://github.com/honeybadger-io/honeybadger-js/pull/711
+      const dom = domain.create();
+      const onErrorWithContext = (err) => this.__store.run(storeObject, () => onError(err));


Doesn't this create a new context? I think that every time we call ALS.run() it creates a new isolated context. So if something was added to the store (in the handler) before throwing the error event is raised, we will not see it here because the storeObject is the initial object we retrieved at the beginning (before calling the handler).
Do you agree? Can we write a test for this?
Maybe it would be better if we wrapped all of this inside this.store.run() ? Like:

const storeObject = this.__getStoreOrDefaultObject(); this.__setStore(AsyncStore) return this.__store.run(storeObject, () => { if (onError) { // ALS is fine for context-tracking, but `domain` allows us to catch errors // thrown asynchronously (timers, event emitters) // We can't use unhandledRejection/uncaughtException listeners; they're global and shared across all requests // But the `onError` handler might be request-specific. // Note that this doesn't still handle all cases. `domain` has its own problems: // See https://github.com/honeybadger-io/honeybadger-js/pull/711 const dom = domain.create(); dom.on('error', onError); handler = dom.bind(handler); } return handler(); });

Hehe, I went through the same thought patterns and iterations too.😅

Your suggestion doesn't work because it seems the domain error handler is called outside the context of the ALS context (even though we ran the domain inside it). A simple script you can test with:

const als = new (require("async_hooks").AsyncLocalStorage); als.run({a: true}, () => { const dom = require("domain").create(); onError = () => console.log(als.getStore()) dom.on('error', onError); handler = dom.bind(() => { throw new Error }); handler(); });

Logs undefined.

So apparently, the ALS context ends before the domain error handler is called.

As for your first question, the context is an object, so it is preserved, since we're passing the same object to both cases of als.run. Test:

const als = new (require("async_hooks").AsyncLocalStorage); ctx = {a: true} als.run(ctx, () => { const dom = require("domain").create(); onError = () => console.log(als.getStore()) dom.on('error', (e) => als.run(ctx, onError)); handler = dom.bind(() => { als.getStore().b = true; throw new Error }); handler(); });

Logs:

{ a: true, b: true }

Also, re tests: the captures errors in timers with the right context below tests for this. The context object is set in the handler, then we grab it in the error handler and assert that it's correct.

Thorough explanation, thanks!

subzero10

All my questions have been answered/resolved, thanks @shalvah !

shalvah · 2022-03-10T18:03:50Z

@joshuap could we tag a new beta release so I can test this out on more examples?

joshuap · 2022-03-14T16:42:52Z

@joshuap could we tag a new beta release so I can test this out on more examples?

Yeah I'll release a new beta today

shalvah · 2022-03-15T15:06:08Z

@joshuap 👀

joshuap · 2022-03-16T16:47:02Z

@joshuap 👀

yeah sorry, got tied up. I'll do it today... 😅

joshuap · 2022-03-16T17:08:11Z

Released v4.0.0-beta.1 https://www.npmjs.com/package/@honeybadger-io/js/v/4.0.0-beta.1

shalvah added 3 commits March 4, 2022 19:55

Implement context isolation for Node.js

0b9f259

Rename DefaultStore to GlobalStore

394def3

More testing and reworking 😖

dc88358

Update example for use as test

a8d67e1

shalvah requested review from joshuap and subzero10 and removed request for joshuap March 5, 2022 15:59

joshuap reviewed Mar 8, 2022

View reviewed changes

shalvah added 3 commits March 9, 2022 11:51

Merge remote-tracking branch 'origin/master' into context-isolation

5bca2e4

Fix tests

8363bba

Update changelog

74778e7

subzero10 reviewed Mar 9, 2022

View reviewed changes

Clean up

e170c37

subzero10 approved these changes Mar 9, 2022

View reviewed changes

shalvah merged commit 20b8bd0 into master Mar 9, 2022

shalvah deleted the context-isolation branch March 10, 2022 00:28

shalvah mentioned this pull request Mar 21, 2022

Node.js Context isolation, Part 2 #717

Merged

subzero10 added this to the 4.0.0 milestone Apr 24, 2022

subzero10 mentioned this pull request Aug 11, 2023

uncaughtException handler not working as expected with latest release #960

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement context isolation for Node.js #711

Implement context isolation for Node.js #711

shalvah commented Mar 5, 2022 •

edited

Loading

shalvah commented Mar 5, 2022

joshuap left a comment •

edited

Loading

subzero10 Mar 9, 2022

subzero10 Mar 9, 2022

shalvah Mar 9, 2022

subzero10 Mar 9, 2022

shalvah Mar 9, 2022

subzero10 Mar 9, 2022

shalvah Mar 9, 2022

shalvah Mar 9, 2022

subzero10 Mar 9, 2022

subzero10 left a comment

shalvah commented Mar 10, 2022

joshuap commented Mar 14, 2022

shalvah commented Mar 15, 2022

joshuap commented Mar 16, 2022

joshuap commented Mar 16, 2022

Implement context isolation for Node.js #711

Implement context isolation for Node.js #711

Conversation

shalvah commented Mar 5, 2022 • edited Loading

Status

Description

Implementation

shalvah commented Mar 5, 2022

joshuap left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

subzero10 left a comment

Choose a reason for hiding this comment

shalvah commented Mar 10, 2022

joshuap commented Mar 14, 2022

shalvah commented Mar 15, 2022

joshuap commented Mar 16, 2022

joshuap commented Mar 16, 2022

shalvah commented Mar 5, 2022 •

edited

Loading

joshuap left a comment •

edited

Loading