AsyncLocalStorage kills 97% of performance in an `async` environment #34493

danielgindi · 2020-07-23T12:10:19Z

Version: 12.18.3
Platform: macos
Subsystem: async_hooks

What steps will reproduce the bug?

const { AsyncLocalStorage } = require('async_hooks');
const asyncLocalStorage = new AsyncLocalStorage();

let fn = async () => /test/.test('test');

let runWithExpiry = async (expiry, fn) => {
    let iterations = 0;
    while (Date.now() < expiry) {
        await fn();
        iterations++;
    }
    return iterations;
};

(async () => {

    console.log(`Performed ${await runWithExpiry(Date.now() + 100, fn)} iterations to warmup`);

    asyncLocalStorage.run({}, () => {});
    let withAls = await runWithExpiry(Date.now() + 1000, fn);
    console.log(`Performed ${withAls} iterations (with ALS enabled)`);

    asyncLocalStorage.disable();
    let withoutAls = await runWithExpiry(Date.now() + 1000, fn);
    console.log(`Performed ${withoutAls} iterations (with ALS disabled)`);

    console.log('ALS penalty: ' + Math.round((1 - (withAls / withoutAls)) * 10000) / 100 + '%');
})();

Output:

Performed 180407 iterations to warmup
Performed 205741 iterations (with ALS enabled)
Performed 6446728 iterations (with ALS disabled)
ALS penalty: 96.8%

How often does it reproduce? Is there a required condition?

In any case that await is used.

What is the expected behavior?

Should be around 10% penalty.

What do you see instead?

I see 97% reduction in performance.

Additional information

I've played in the past with a Zone polyfill of my own (zone-polyfill), and at the beginning I tried using the stack traces that's generated in latest V8 versions, where they keep their context after await calls.
Combining that with the basic technique to track context, I was able to track the zone but due to the string nature of the stack traces - I had to map them in the memory but there was no WeakMap available. Using a NAN solution I got to about 15% penalty.
Now assuming that these capabilities are available on the c++ side and much more natively, with the option to directly leverage the compiler instead of generating stack traces to query for info - I'd assume max 10% penalty with a native implementation.

The text was updated successfully, but these errors were encountered:

puzpuzpuz · 2020-07-28T12:01:24Z

I've tried to run this test on the latest master and got the following result:

Performed 769807 iterations to warmup
Performed 747431 iterations (with ALS enabled)
Performed 8319900 iterations (with ALS disabled)

So, it's slightly better than the reported result, yet ALS (or async_hooks to be more precise) have a significant overhead in such simple synthetical benchmarks where are promises are used heavily and there are no significant computation or I/O operations going on.

I've also tried to run benchmark/async_hooks/promises.js with modifications from #34523 with the following no-op hook:

static void FastPromiseHook(PromiseHookType type, Local<Promise> promise,
                            Local<Value> parent) {
  Local<Context> context = promise->CreationContext();
  Environment* env = Environment::GetCurrent(context);
  if (env == nullptr) return;

  env->new_async_id();
}

and got the following result:

$ ./node benchmark/async_hooks/promises.js
async_hooks/promises.js asyncHooks="enabled" n=1000000: 885,253.5326989443
async_hooks/promises.js asyncHooks="enabledWithDestroy" n=1000000: 251,938.54719793968
async_hooks/promises.js asyncHooks="enabledWithInitOnly" n=1000000: 787,156.9010261587
async_hooks/promises.js asyncHooks="disabled" n=1000000: 1,583,621.8483830877

As you may see in this benchmark with get around 50% of the disabled performance in situations when the no-op hook is installed. And that means that 50% is the best overhead we can get in such benchmarks with the current V8 PromiseHook integration. In reality the overhead will be higher, as we need to do many operations, not only the env->new_async_id() one.

BTW I've also experimented with internal fields (a couple of them) instead of symbol properties and got no significant performance improvements. Mainly that's because of the fact that with this approach the hook has to do more operations on average when compared with the current implementation.

danielgindi · 2020-07-28T13:43:38Z

I've tried to run this test on the latest master and got the following result:
Performed 769807 iterations to warmup
Performed 747431 iterations (with ALS enabled)
Performed 8319900 iterations (with ALS disabled)
So, it's slightly better than the reported result, yet ALS (or async_hooks to be more precise) have a significant overhead in such simple synthetical benchmarks where are promises are used heavily and there are no significant computation or I/O operations going on.

I've also tried to run benchmark/async_hooks/promises.js with modifications from #34523 with the following no-op hook:
static void FastPromiseHook(PromiseHookType type, Local<Promise> promise,
                            Local<Value> parent) {
  Local<Context> context = promise->CreationContext();
  Environment* env = Environment::GetCurrent(context);
  if (env == nullptr) return;

  env->new_async_id();
}
and got the following result:
$ ./node benchmark/async_hooks/promises.js
async_hooks/promises.js asyncHooks="enabled" n=1000000: 885,253.5326989443
async_hooks/promises.js asyncHooks="enabledWithDestroy" n=1000000: 251,938.54719793968
async_hooks/promises.js asyncHooks="enabledWithInitOnly" n=1000000: 787,156.9010261587
async_hooks/promises.js asyncHooks="disabled" n=1000000: 1,583,621.8483830877
As you may see in this benchmark with get around 50% of the disabled performance in situations when the no-op hook is installed. And that means that 50% is the best overhead we can get in such benchmarks with the current V8 PromiseHook integration. In reality the overhead will be higher, as we need to do many operations, not only the env->new_async_id() one.

BTW I've also experimented with internal fields (a couple of them) instead of symbol properties and got no significant performance improvements. Mainly that's because of the fact that with this approach the hook has to do more operations on average when compared with the current implementation.

Well, only slightly better. with ALS enabled is 8.9% of with ALS disabled, which means that 91% is lost, and not 97% (which was the consistent number I got on all machines).

If there was a way to only get a hook installed for a case of an await returning to parent context - that would help us have a js-land implementation with much less overhead.

puzpuzpuz · 2020-07-28T14:08:41Z

Well, only slightly better.

Yes and that aligns with the benchmark results I got for #34523. There is still a lot to wish for in terms of the PromiseHook overhead. However, in real-world applications the overhead will be much smaller.

If there was a way to only get a hook installed for a case of an await returning to parent context - that would help us have a js-land implementation with much less overhead.

That's what PromiseHook used by async_hooks allows you to do. Yet, it assumes native code.

Qard · 2020-07-31T03:35:10Z

I'd consider this is a somewhat unrealistic micro-benchmark as it's awaiting a non-promise in a loop which is naturally going to have a lot more machinery in the barrier marking and tracking then in the non-async function call it is giving to the await.

I did a few runs locally, replacing the function with some more realistic variations:

let fn = () => /test/.test('test'); - the original code, for baseline comparison:

Performed 744791 iterations to warmup
Performed 790158 iterations (with ALS enabled)
Performed 7946914 iterations (with ALS disabled)

let fn = () => Promise.resolve('test'); - at least creates an actual promise, though no actual asynchronous behaviour beyond deferring to the next microtask queue tick:

Performed 873666 iterations to warmup
Performed 962120 iterations (with ALS enabled)
Performed 8148736 iterations (with ALS disabled)

let fn = () => new Promise(setImmediate); - somewhat more realistic representation of a promise you'd actually typically see in reality:

Performed 5477 iterations to warmup
Performed 56758 iterations (with ALS enabled)
Performed 62029 iterations (with ALS disabled)

let fn = () => new Promise(process.nextTick); - reasonably realistic representation of higher-priority promises you might see in reality:

Performed 158118 iterations to warmup
Performed 582238 iterations (with ALS enabled)
Performed 2683626 iterations (with ALS disabled)

Most promises would be I/O related and would sit somewhere between the last two tests.

It's fair to want the performance to be better, but the path your code tested is actually quite different, according to the spec, from what an actual typical promise-based await would look like. I don't think it's too helpful to rely on micro-benchmarks, especially with a system this complex. I do think there may be optimization potential with AsyncLocalStorage to use the underlying machinery of async_hooks directly in a way that can avoid the numerous barrier crosses of the hook callback functions, though that would be a fairly complex change. 🤔

danielgindi · 2020-07-31T04:34:18Z

I agree that micro-benchmarks do not present the whole picture.
But it does present the performance difference of this specific feature, and how it will affect your code if you have a lot of awaits/Promises. And if you do have a loop over little-work async code somewhere in your app, something that happens a lot of time, you will be surprised to see a major hit in performance.

So if the hit in the micro benchmark was 20%, that would be acceptable. But that's not the case.

Qard · 2020-07-31T04:38:09Z

As I said, what you were testing was not async. The performance difference would almost never look like that in reality because an actual async promise has a bunch more overhead than the non-async test function you awaited in your test code.

danielgindi · 2020-07-31T06:41:57Z

So you're saying the numbers I've posted are not bad enough?
And this is intentional, as I wanted to measure the impact on the await.

puzpuzpuz · 2020-07-31T07:55:43Z

@Qard thanks for these experiments. I also agree that ALS is meant to be used in real-world applications and it makes sense to benchmark it in a different way. Micro-benchmarks are certainly valuable for various optimizations, however web applications do more than what's done in the original snippet and ALS' overhead will be much lower for them.

@danielgindi however, this doesn't mean that async_hooks + promise integration doesn't need any performance improvements. But, as it was already discussed, all low-hanging fruit optimizations are already implemented. So, it may take some time and significant efforts to come up with more advanced optimizations.

mcollina · 2020-07-31T08:21:32Z

So you're saying the numbers I've posted are not bad enough?

They are bad. However it is very rare that await is the bottleneck of any application. In order for await to be the bottleneck, what being await would have to be synchronous or fake-async.

By using a setTimeout of 2ms:

const { AsyncLocalStorage } = require('async_hooks');
const asyncLocalStorage = new AsyncLocalStorage();
const { promisify } = require('util')

let fn = promisify(setTimeout).bind(null, 2);

let runWithExpiry = async (expiry, fn) => {
      let iterations = 0;
      while (Date.now() < expiry) {
                await fn();
                iterations++;
            }
      return iterations;
};

(async () => {

      console.log(`Performed ${await runWithExpiry(Date.now() + 100, fn)} iterations to warmup`);

      asyncLocalStorage.run({}, () => {});
      let withAls = await runWithExpiry(Date.now() + 10000, fn);
      console.log(`Performed ${withAls} iterations (with ALS enabled)`);

      asyncLocalStorage.disable();
      let withoutAls = await runWithExpiry(Date.now() + 10000, fn);
      console.log(`Performed ${withoutAls} iterations (with ALS disabled)`);

      console.log('ALS penalty: ' + Math.round((1 - (withAls / withoutAls)) * 10000) / 100 + '%');
})();

We get:

$ node a
Performed 44 iterations to warmup
Performed 4214 iterations (with ALS enabled)
Performed 4400 iterations (with ALS disabled)
ALS penalty: 4.23%

This is totally acceptable and well within range.

danielgindi · 2020-08-02T07:14:19Z

@mcollina I get that. Although some async operations may take less than 2ms, still being async and taking advantage of libuv threads.

I'm going to do a wet test today on a large scale application that has all imaginable scenarios implemented, and that should service hundreds of millions of users. I'll let you guys know how it goes :-)

puzpuzpuz · 2021-05-20T12:58:35Z

Since #36394 has landed and got shipped in v16.2.0, I tried running the original script with that release. Here is the result:

Performed 655186 iterations to warmup
Performed 3084236 iterations (with ALS enabled)
Performed 6097186 iterations (with ALS disabled)
ALS penalty: 49.42%

From 96.8% down to 49.42% here.

The script from #34493 (comment) (the 2ms setTimeout() one) gives the following result:

Performed 45 iterations to warmup
Performed 4202 iterations (with ALS enabled)
Performed 4277 iterations (with ALS disabled)
ALS penalty: 1.75%

From 4.23% down to 1.75% here.

The improvement is quite dramatic (thanks to @Qard) especially considering that the original script does almost nothing on the hot path, but creating promises. I think this issue can be now closed.

danielgindi · 2021-05-20T17:08:22Z

This is great! Real dramatic improvement. 😀

HarshithaKP added the async_hooks Issues and PRs related to the async hooks subsystem. label Jul 24, 2020

addaleax mentioned this issue Jul 25, 2020

[Experiment] async_hooks: optimize fast-path promise hook for ALS #34512

Closed

4 tasks

puzpuzpuz added the performance Issues and PRs related to the performance of Node.js. label Jul 26, 2020

puzpuzpuz mentioned this issue Jul 29, 2020

Proposal for reworking promise integration into async_hooks nodejs/diagnostics#389

Open

1 task

thetutlage mentioned this issue Dec 18, 2020

feat: add async HTTP context adonisjs/http-server#18

Merged

SimonX200 mentioned this issue Mar 26, 2021

Context is preserved between two runPromise calls Jeff-Lewis/cls-hooked#63

Open

devsnek closed this as completed May 20, 2021

subzero10 mentioned this issue Jan 6, 2022

Improve Honeybadger.lambdaHandler() to return async handler honeybadger-io/honeybadger-js#680

Merged

10 tasks

FoxxMD mentioned this issue Mar 14, 2022

Implement tracking api usage FoxxMD/context-mod#81

Open

Char2sGu mentioned this issue Apr 7, 2022

[Feature] Optional Alternative to Scoped Providers deepkit/deepkit-framework#208

Closed

This was referenced May 16, 2022

feat: use AsyncLocalStorage to track logger context seek-oss/skuba#864

Merged

feat(RequestLogging): Add createContextStorage seek-oss/koala#153

Merged

vmarchaud mentioned this issue Jun 16, 2022

Reorg manual tracing JS docs based on SDK open-telemetry/opentelemetry.io#1445

Merged

macobo mentioned this issue Aug 17, 2022

chore(plugin-server): disable tracing for a bit PostHog/posthog#11355

Closed

temich mentioned this issue Feb 25, 2024

refactor(exposition): remove express toa-io/toa#525

Merged

Brooooooklyn mentioned this issue Mar 12, 2024

fix(server): wrap read-modify-write apis with distributed lock toeverything/AFFiNE#5979

Merged

winterggg mentioned this issue Mar 30, 2024

AsyncLocalStorage 源码分析 winterggg/blog#7

Open

HardenSG mentioned this issue Apr 8, 2024

cls-hooked异步上下文存储实现与PromiseHook在node层源码解析 HardenSG/blog#9

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AsyncLocalStorage kills 97% of performance in an `async` environment #34493

AsyncLocalStorage kills 97% of performance in an `async` environment #34493

danielgindi commented Jul 23, 2020 •

edited

Loading

puzpuzpuz commented Jul 28, 2020

danielgindi commented Jul 28, 2020 •

edited

Loading

puzpuzpuz commented Jul 28, 2020

Qard commented Jul 31, 2020 •

edited

Loading

danielgindi commented Jul 31, 2020

Qard commented Jul 31, 2020

danielgindi commented Jul 31, 2020

puzpuzpuz commented Jul 31, 2020

mcollina commented Jul 31, 2020

danielgindi commented Aug 2, 2020

puzpuzpuz commented May 20, 2021

danielgindi commented May 20, 2021

AsyncLocalStorage kills 97% of performance in an async environment #34493

AsyncLocalStorage kills 97% of performance in an async environment #34493

Comments

danielgindi commented Jul 23, 2020 • edited Loading

What steps will reproduce the bug?

How often does it reproduce? Is there a required condition?

What is the expected behavior?

What do you see instead?

Additional information

puzpuzpuz commented Jul 28, 2020

danielgindi commented Jul 28, 2020 • edited Loading

puzpuzpuz commented Jul 28, 2020

Qard commented Jul 31, 2020 • edited Loading

danielgindi commented Jul 31, 2020

Qard commented Jul 31, 2020

danielgindi commented Jul 31, 2020

puzpuzpuz commented Jul 31, 2020

mcollina commented Jul 31, 2020

danielgindi commented Aug 2, 2020

puzpuzpuz commented May 20, 2021

danielgindi commented May 20, 2021

AsyncLocalStorage kills 97% of performance in an `async` environment #34493

AsyncLocalStorage kills 97% of performance in an `async` environment #34493

danielgindi commented Jul 23, 2020 •

edited

Loading

danielgindi commented Jul 28, 2020 •

edited

Loading

Qard commented Jul 31, 2020 •

edited

Loading