Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] test: duplicate async_hooks tests in esm #44323

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

GeoffreyBooth
Copy link
Member

Spinning off from #43408 (comment) and following comments, in this PR I ported the async_hooks tests in test/async-hooks/*.js into ESM. I left the original CommonJS versions side-by-side with the ESM versions, so the diff shows just 90 new .mjs files; but the tests themselves are unchanged other than require becoming import and similar expected changes related to conversion to ESM (declaring __filename, replacing top-level return with process.exit(), etc.). This PR isn’t meant to be merged, which is why I’ve opened it as a draft; it’s just meant to provide a branch to test issues related to async_hooks and ESM. (Branches off of and includes #44322.)

On my machine, running python tools/test.py -J --mode=release async-hooks for this branch yields 58 failing tests. Arguably there should be no failing tests here; async_hooks should behave identically when used in ESM code as when used in CommonJS code, like most (all?) other Node APIs. So this raises a few questions:

  1. Are the failing tests simply tests that need rewriting? The tests are very specific in the assertions they validate, looking for things like the number of both user-created and internal Node async resources created over the lifetime of the Node process. Perhaps the different results are expected, and the tests should simply be changed so that the ESM versions have different expectations to validate; or the validation should become less specific, such as by filtering out all the Node internal async activity somehow.

  2. Is something about the ESM loader broken? Is it creating async resources before it should, or not resolving all startup async resources before user code runs, denying async_hooks the clean environment it expects when the first line of user code runs? Is there some change that we can make to the ESM loader to make it behave more similarly to the CommonJS loader, from async_hooks‘s perspective, without that change breaking anything else?

  3. Is async_hooks itself somehow broken or incompatible under ESM? And if so, can it be fixed? In the worst case, something like the async activity of import statements themselves would be confusing async_hooks, presenting us with what I assume is no way to make the async_hooks API behave equivalently in ESM; or user code needs to somehow filter out or accommodate all these “module loading” async resources that don’t exist in the sync CommonJS.

cc @nodejs/async_hooks @nodejs/modules @nodejs/loaders

@GeoffreyBooth GeoffreyBooth added wip Issues and PRs that are still a work in progress. test Issues and PRs related to the tests. async_hooks Issues and PRs related to the async hooks subsystem. esm Issues and PRs related to the ECMAScript Modules implementation. labels Aug 21, 2022
@nodejs-github-bot nodejs-github-bot added the needs-ci PRs that need a full CI run. label Aug 21, 2022
@nodejs-github-bot
Copy link
Collaborator

aduh95

This comment was marked as off-topic.

@GeoffreyBooth
Copy link
Member Author

There are a lot of places where this PR uses named imports at the cost of decreasing the readability of the code

I converted all these files using an automated process. This isn't a PR that's intended to be merged; it's a branch to discuss async_hooks under ESM (see top comment). If we decide that this is something we actually want in the codebase, sure, let's clean it up; but I doubt we will. I would think that we would probably want a handful of async_hooks tests to validate certain things in ESM that are expected to be different from CommonJS and that's it.

@mcollina
Copy link
Member

Here is the full list of failing tests, we should check them one-by-one and see if the differences are correct or there are actually bugs to fix:

  • test-async-await
  • test-async-exec-resource-match
  • test-crypto-pbkdf2
  • test-crypto-randomBytes
  • test-disable-in-init
  • test-embedder.api.async-resource
  • test-embedder.api.async-resource-no-type
  • test-emit-init
  • test-emit-after-on-destroyed
  • test-enable-disable
  • test-enable-in-init
  • test-emit-before-on-destroyed
  • test-fseventwrap
  • test-filehandle-no-reuse
  • test-fsreqcallback-access
  • test-fsreqcallback-readFile
  • test-getaddrinforeqwrap
  • test-getnameinforeqwrap
  • test-graph.fsreq-readFile
  • test-graph.http
  • test-graph.intervals
  • test-graph.pipe
  • test-graph.pipeconnect
  • test-graph.shutdown
  • test-graph.signal
  • test-graph.statwatcher
  • test-graph.tcp
  • test-graph.timeouts
  • test-graph.tls-write-12
  • test-http-agent-handle-reuse-parallel
  • test-graph.tls-write
  • test-http-agent-handle-reuse-serial
  • test-httpparser.request
  • test-httpparser.response
  • test-immediate
  • test-improper-order
  • test-nexttick-default-trigger
  • test-pipeconnectwrap
  • test-pipewrap
  • test-promise
  • test-promise.chain-promise-before-init-hooks
  • test-promise.promise-before-init-hooks
  • test-queue-microtask
  • test-querywrap
  • test-shutdownwrap
  • test-signalwrap
  • test-statwatcher
  • test-tcpwrap
  • test-timers.setInterval
  • test-timers.setTimeout
  • test-tlswrap
  • test-ttywrap.readstream
  • test-udpwrap
  • test-udpsendwrap
  • test-unhandled-exception-valid-ids
  • test-unhandled-rejection-context
  • test-writewrap
  • test-zlib.zlib-binding.deflate

@mcollina
Copy link
Member

I think the following test shows the problem:

$ ./node test/async-hooks/test-async-await.mjs
Error: Found a handle whose promiseResolve hook was invoked but not its init hook
    at ActivityCollector._getActivity (file:///Users/matteo/Repositories/node/test/async-hooks/init-hooks.mjs:168:19)
    at ActivityCollector._promiseResolve (file:///Users/matteo/Repositories/node/test/async-hooks/init-hooks.mjs:213:20)
    at emitHook (node:internal/async_hooks:235:38)
    at promiseResolveHook (node:internal/async_hooks:364:3)
    at ModuleJob.run (node:internal/modules/esm/module_job:193:25)
    at async Promise.all (index 0)
    at async ESMLoader.import (node:internal/modules/esm/loader:528:24)
    at async loadESM (node:internal/process/esm_loader:102:5)
    at async handleMainPromise (node:internal/modules/run_main:70:12)

We are already in the microtask queue when user code is executed, i.e. when we can start async_hooks. Therefore we have a spurious execution flow that we cannot track. The consequences of this are a bit unpredictable. AsynchronousLocalStorage uses the init hook to keep the context, so we are potentially missing an execution flow and a context failure. However, I don't think that's the case as I guess that the "spurious" promise originated from the loader chain, and it is not something useful for the user. @nodejs/diagnostics @nodejs/loaders wdyt?

FWIW The following test pass:

import '../common/index.mjs';
import { strictEqual } from 'assert';
import { AsyncLocalStorage } from 'async_hooks';

const asyncLocalStorage = new AsyncLocalStorage();

async function test() {
  asyncLocalStorage.getStore().set('foo', 'bar');
  await Promise.resolve();
  strictEqual(asyncLocalStorage.getStore().get('foo'), 'bar');
}

await asyncLocalStorage.run(new Map(), test);
strictEqual(asyncLocalStorage.getStore(), undefined);

@Flarna
Copy link
Member

Flarna commented Aug 28, 2022

I don't think the promises created by the loader chain are of any interest for classic APM products monitoring HTTP, GRPC, database,... requests. They start transactions in the HTTP/GRPC/... request handlers and track the context from this point on to link client requests. Even if a server would be created before the AsyncLocalStore instance has been created it's not an issue as long as incoming request are seen.

But there might be other tools which would like to monitor the loading itself. To my understanding this can get quite complex if one chains serveral loaders which may fetch source from somewhere, transplile, cache,...
If someone want's to monitor this it's likely needed to track promises created by these loaders.

Usually the solution is to start the monitoring tool earlier then the application. In CJS this works fine by loading the tool via -r or put it as first require in the main script file.
For ESM that's likely more complex as a loader is already needed that time.

I'm not an expert on ESM and I have followed the loaders topics only from time to time. If I remember correct the plan is to rename them to hooks and make them more generic. Adding a hook to load tooling before any application code would be nice and maybe solve such issues.
But I fear it's not that simple as one tool may want to monitor the other tool. So something like a startup priority or runlevel or similar might be needed.

@GeoffreyBooth
Copy link
Member Author

I’m not an expert on ESM and I have followed the loaders topics only from time to time. If I remember correct the plan is to rename them to hooks and make them more generic. Adding a hook to load tooling before any application code would be nice and maybe solve such issues.

The plan for loaders is in https://github.com/nodejs/loaders#status and the renaming discussion is in nodejs/loaders#95. I’d like to keep the discussion in this PR focused on the async_hooks issue, but I’m happy to discuss these other topics somewhere else (such as in an issue in the loaders repo).

Usually the solution is to start the monitoring tool earlier then the application. In CJS this works fine by loading the tool via -r or put it as first require in the main script file. For ESM that’s likely more complex as a loader is already needed that time.

Since we added --import, there’s a way to run arbitrary ESM code before the main entry point (not just the module-loading hooks) similar to --require. That doesn’t solve the problem presented by the broken tests on this PR’s branch, however, because there are still these internally created async resources already existing before the first line of the first --import code is run.

I don’t think the promises created by the loader chain are of any interest for classic APM products monitoring HTTP, GRPC, database,… requests.

Yes, that’s a potential solution, in a way—”won’t fix,” basically. We document that users of async_hooks shouldn’t expect a clean slate when the first line of ESM code is run, and leave the users to handle the consequences. I’m told that this is likely unacceptable, however, by comments like #43408 (comment). Maybe that viewpoint is an outlier, but I feel like it’s not worth debating unless we have no other choices (as in, until we exhaust all our options for getting the tests in this PR to pass). So first I want to see what our brain trust can come up with in terms of understanding what we’re setting with the tests in this PR, and second deciding what to do about it.

@mcollina
Copy link
Member

@GeoffreyBooth What are the next steps here? I think we have diagnosed the problem.

From my point of view we have three solutions:

  1. accept that we cannot have feature parity between CJS and ESM, and recommend to kick off async_hooks with CJS and then use import() to load an esm application - or use --require. I do not like this situation, but it's the status quo.
  2. make loading modules via ESM synchronous, minus TLA support. This will require all loaders to be moved to a separate worker thread and possibly more significant changes.
  3. do a significant refactor of async_hooks to make all this not a problem. I'm not sure if it's possible at all.

@GeoffreyBooth
Copy link
Member Author

I think we have diagnosed the problem.

I’m not sure we’ve fully gotten to the bottom of it. We know that an async resource is pending when the first line of user ESM code runs, but not what that async resource is. I’d guess it’s a promise created by the ESM loader, but where? If we can identify the code that creates the promise, perhaps we can find a way to either resolve it before user code is evaluated or delay its creation until after user code is evaluated. Either option, if possible, should also solve the problem, I’d think; and would be the most precise fix.

@GeoffreyBooth
Copy link
Member Author

I did a little experiment where I added some code to run_main.js that registered an async hooks callback, before any modules are parsed or loaded. It’s in this branch: https://github.com/GeoffreyBooth/node/tree/debug-esm-async-hooks. My hope was that I could maybe use the async hooks machinery itself to find the source of the mystery async resource that’s pending when user ESM code begins running.

If you check out that branch and build Node, then run

./node ./debug-async-hooks.mjs

you’ll get a minimal reproduction of the async hooks ESM issue. There’s only one pending async resource when user code starts running, something with ID 51. I managed to find its parent, ID 47, which I marked in a comment in that branch’s commit; but I can’t seem to find 51. If we can find 51, maybe we can find a way to make it not be pending when user code runs, and then async hooks in ESM is fixed.

I tried adding trace: new Error().trace into the async hooks init and I get this (partial):

    '    at AsyncHook.init (node:internal/modules/run_main:27:14)\n' +
    '    at emitInitNative (node:internal/async_hooks:202:43)\n' +
    '    at emitInitScript (node:internal/async_hooks:505:3)\n' +
    '    at promiseInitHook (node:internal/async_hooks:324:3)\n' +
    '    at ModuleJob.run (node:internal/modules/esm/module_job:189:16)\n' +
    '    at node:internal/modules/esm/loader:527:28\n' +

I’ve been going through the files in this trace and changing code from like await fn() to const prom = fn(); prom.tagged = 'hi'; await prom() and then running console.log([...debugAsyncHooks].filter(resource => resource.tagged)) to print the promises I’ve tagged. Async hooks registered by user code have only one pending promise when ESM user code begins, with ID 51. I can’t seem to find that one, but I have found the promise that’s its triggerAsyncId, 47, from module_job.js _instantiate(). I’ve been repeating this tagging method on every await or promise I can find inside or called from _instantiate(), without luck.

So anyway, can anyone find what line of code triggers async resource 51? Corollary to that, if there’s a way to do this detective work that’s better than the method I’ve outlined in this comment, I’d love recommendations 😄

@targos
Copy link
Member

targos commented Sep 29, 2022

There’s only one pending async resource when user code starts running, something with ID 51.

What if that resource is the user's module ? I think there has to be a pending promise to support top-level await.

@GeoffreyBooth
Copy link
Member Author

What if that resource is the user’s module ? I think there has to be a pending promise to support top-level await.

That would make a lot of sense. First I’d like to test this theory to confirm that it’s accurate; I’ve tried various tricks as discussed in my previous comment and I’ve looked all around ModuleJob, so while I agree the user’s module is the most logical candidate I haven’t been able to confirm that it is indeed the culprit.

Once we find the source then we can discuss if it’s possible to make it not pending at the time user code begins, and if it’s the user module that might be really hard; but I think first we need to confirm that it is the user’s module and find what line of code creates that promise.

@targos
Copy link
Member

targos commented Sep 29, 2022

./node --inspect-brk-node debug-async-hooks.mjs May be a way to debug this.
I tried with a debugger statement in your init hook in run_main and it seems to point to a promise created in addJobsToDependencyGraph, but I don't know exactly which one. Maybe the promise is created on the C++ side and we can't precisely point to it from JS.

@mcollina
Copy link
Member

This looks like a problem because I'm experiencing a few random crashes when testing with tap an esm module (tap uses domain, which uses async_hooks internally).

@GeoffreyBooth
Copy link
Member Author

We got to the bottom of the problem. In my test file debug-async-hooks.mjs, replacing console.log with process._rawDebug eliminated the mystery pending promises.

Basically, in Node console.log creates async resources under the hood; whereas process._rawDebug does not. The console.log async resources in this case had a trigger async ID that corresponded with the user code of my debug-async-hooks.mjs file. That async resource, that represents the module itself, was created before that module’s code could run and register the async hooks, which is why the init hook never gets fired for that module.

However, we can call async_hooks.executionAsyncId() within the user code right before async_hooks.createHook, to capture the async resource ID of the user code module itself. Then any resources created from that module would have a known parent that we could connect it to. Something like this is probably what instrumentation libraries should do.

@mcollina
Copy link
Member

mcollina commented Oct 9, 2022

@GeoffreyBooth can you please include the example we worked on in this issue?

@mcollina
Copy link
Member

mcollina commented Oct 9, 2022

On further analysis, our work during the collab summit was incomplete.

Consider this preload.js file:

const async_hooks = require('async_hooks')

async_hooks.createHook({
  init: (asyncId, type, triggerAsyncId, resource) => {
    process._rawDebug('>>> ', asyncId, type, triggerAsyncId)
  }
}).enable()

and this run.mjs:

import asyncHooks from 'async_hooks'

process._rawDebug('The current asyncId:', asyncHooks.executionAsyncId())

Running this combo with ./node -r ./preoload.js a.mjs, gives the following output (tailed):

...
>>>  136 PROMISE 51
The current asyncId: 51
>>>  137 PROMISE 136

As you can see, the asyncId where our script is run is 51 but there is a 136 promise that is a child of the current one that we can never track because it starts before us.
This promise is the one started from

await this.module.evaluate(timeout, breakOnSigint);
, which comes from V8 natively and we can't do much about.

I think our best option is to recommend that async_hooks is set up via preloarders/commonjs. We should also document that using async_hooks started from ESM will not track this specific promise.

I'm not 100% sure it's innocuous ignoring it.


On a side note, do anyone know why executionAsyncId() is 0 for ESM but 1 for CJS at startup (without any async_hooks active)?

@Flarna
Copy link
Member

Flarna commented Oct 13, 2022

On a side note, do anyone know why executionAsyncId() is 0 for ESM but 1 for CJS at startup (without any async_hooks active)?

Took a fast look into this and it seems this is because startup in CJS is sync. The id 1 is created here in StartExecution. The main script is called a bit later within ExecuteBootstrapper.

In ESM mode ExecuteBootstrapper is not calling the main script. Don't know what exactly it calls but I guess there are various async operations running till the ESM loader has the main script ready to be executed. All these async operations doesn't seem to create new AsyncResources therefore the default (0) is used. Not sure if 0 is used here because these operations come from "nowhere" or just because no context tracking is active at this time.

Not sure if this is a bug/missing feature in the ESM startup. One might expect that all async operations done within node core should be trackable which might require the use of AsyncResource there.

@Qard
Copy link
Member

Qard commented Oct 13, 2022

Yep, that's basically what is going on. The 0 value is a special value meaning it doesn't have a path back to the root, which should not be the case here. It should be able to resolve that path, but there seems to be a bunch of async stuff that's not handled appropriately so it just marks everything as unrooted activity.

@mcollina
Copy link
Member

@Qard this looks like a bug we should be fixing.

@Qard
Copy link
Member

Qard commented Oct 14, 2022

Yep. I'll try to find some time to look into it when I can.

@GeoffreyBooth GeoffreyBooth changed the title test: duplicate async_hooks tests in esm [DO NOT MERGE] test: duplicate async_hooks tests in esm Oct 21, 2023
@GeoffreyBooth
Copy link
Member Author

GeoffreyBooth commented Oct 22, 2023

I’d like to take another crack at trying to get the ESM loader handling all entry points, rather than being opted into by running an ESM entry or using a flag like --import. If we can get the ESM loader handling all modules, then it could become the only module loader, which would make the modules code far more maintainable and will also provide greater opportunities for performance improvements like #50322.

Changing shouldUseESMLoader in run_main.js to just always return true leads to 154 failing tests today, similar to what we saw last year. Based on the discussion above, I assume that many of these failures are due to async resources being created during startup before user code can run and register async hooks. I was wondering if potentially we could refactor the ESM loader to avoid creating these async resources.

I wrote the following test file:

const { createHook } = require('node:async_hooks');
const { writeSync } = require('node:fs');

const resources = new Set();

createHook({
  init(asyncId, type, triggerAsyncId, resource) {
    resources.add(asyncId);
  },
  promiseResolve(asyncId) {
    if (!resources.has(asyncId)) {
      writeSync(1, `orphaned promise: ${asyncId}\n`);
    }
  }
}).enable();

When I run this in the “always use the ESM loader” mode, it prints:

orphaned promise: 9
orphaned promise: 10
orphaned promise: 11
orphaned promise: 12

So presumably there are four promises created during startup that resolve after my script runs.

I traced the flow of running a CommonJS entry point in this “always use the ESM loader” mode. This is what I found, with async functions marked:

Module._compile (node:internal/modules/cjs/loader:1376:14)
Module._extensions..js (node:internal/modules/cjs/loader:1435:10)
Module.load (node:internal/modules/cjs/loader:1207:32)
Module._load (node:internal/modules/cjs/loader:1023:12)
cjsLoader (node:internal/modules/esm/translators:345:17) ❌ async
ModuleWrap.<anonymous> (node:internal/modules/esm/translators:294:7)
ModuleJob.run (node:internal/modules/esm/module_job:218:25) ❌ async
async ModuleLoader.import (node:internal/modules/esm/loader:329:24) ❌ async
async loadESM (node:internal/process/esm_loader:34:7) ❌ async
async handleMainPromise (node:internal/modules/run_main:114:12) ❌ async

Each async function returns a promise, so that’s at least five async resources right there? Though not four, curiously. Perhaps if promises are resolved before user code runs, that’s good enough?

The questions I have at this point are:

  • Does this even matter? Do we need to find a solution for this problem, in order to preserve some behavior of async hooks that our tests are checking for (asserting that an exact number of async resources are created) that may not matter to real world code? If we shipped a semver-major change that made the ESM loader handle all entry points, and just updated all the broken tests accordingly, what would be the consequences? (Aside from breaking monkey patching, which is a whole separate discussion.)

  • If ignoring the issue isn’t an option, how could we eliminate these async resources? Assuming the async resources are created by the async functions I found and the promises they implicitly and explicitly create, would refactoring that code into a .then style prevent the creation of async resources? Or refactoring into a callback style? Or are there other solutions?

@Qard
Copy link
Member

Qard commented Oct 23, 2023

The four extra promises seen by promiseResolve don't matter. The presence of an asyncId indicates that async_hooks saw their initialization. So it seems likely to me that handleMainPromise is running before the async_hooks setup somehow.

I mostly agree that the async_hooks tests suck and we should probably be just updating them, but I am a bit worried that changing the tests to match the behaviour could conceal a change that actually is breaking.

My preference would be if this work be done after #48528 lands as at that point the risk of breaking async_hooks will be much reduced as the core use of it will be moved off. Ideally we'll be on a path to deprecate or make it internal, so I'd be less worried about breakage when AsyncLocalStorage is not using it anymore.

@Flarna
Copy link
Member

Flarna commented Oct 24, 2023

I agree with @Qard that these promises don't matter. the async hooks tests simply assert to strict in this area.

The main problem I see with removing/ignoring async hooks once ALS has moved away from it is that actual testcoverage in this regard is lost. As of now ALS works as good/bad as async hooks do and there are quite some tests for it. Once ALS is not based on it anymore we loose also this coverage.
But this is more an issue for ALS not really ESM related.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
async_hooks Issues and PRs related to the async hooks subsystem. esm Issues and PRs related to the ECMAScript Modules implementation. needs-ci PRs that need a full CI run. test Issues and PRs related to the tests. wip Issues and PRs that are still a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants