Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESM Hook deadlocks since 21.2.0 a3e09b3 #50948

Open
isaacs opened this issue Nov 28, 2023 · 19 comments
Open

ESM Hook deadlocks since 21.2.0 a3e09b3 #50948

isaacs opened this issue Nov 28, 2023 · 19 comments
Labels
confirmed-bug Issues with confirmed bugs. esm Issues and PRs related to the ECMAScript Modules implementation. loaders-agenda Issues and PRs to discuss during the meetings of the Loaders team module Issues and PRs related to the module subsystem.

Comments

@isaacs
Copy link
Contributor

isaacs commented Nov 28, 2023

Version

21.2.0

Bisected to a3e09b3

Platform

Darwin moxy.lan 23.1.0 Darwin Kernel Version 23.1.0: Mon Oct 9 21:27:24 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6000 arm64

Subsystem

esm hooks

What steps will reproduce the bug?

echo 'console.log("hello")' > x.js
echo '{}' > package.json
npm i tap
node --import=./node_modules/@tapjs/mock/dist/esm/import.mjs --import=./node_modules/@tapjs/processinfo/dist/esm/import.mjs x.js

How often does it reproduce? Is there a required condition?

Always reproduces

What is the expected behavior? Why is that the expected behavior?

Expect that it loads the two import arguments, and then runs the file.

What do you see instead?

Deadlocks at the Module.register() call in the second import script.

Additional information

Discussion from slack:

isaacs
7 hours ago
@aduh95
Since a3e09b3 landed, node-tap hangs whenever I run it. It seems like it's getting stuck at:

      // Sleep until worker responds.
      AtomicsWait(this.#lock, WORKER_TO_MAIN_THREAD_NOTIFICATION, this.#workerNotificationLastId);

I haven't been able to narrow it down to a satisfyingly minimal reproduction, and for extra 😬 it's order-dependent, but this will reproduce it in any project that has the latest tap installed:

node --import=./node_modules/@tapjs/mock/dist/esm/import.mjs --import=./node_modules/@tapjs/processinfo/dist/esm/import.mjs x.js

(where x.js is just console.log('hello'))

5 replies
Jacob Smith
4 hours ago
Oof. These are very difficult to troubleshoot.
I think i'll have time on Wednesday to take a look (edited)
aduh95
3 hours ago
For reference: a3e09b3
🙏
1

isaacs
9 minutes ago
It seems like what's going on is that the @tapjs/mock needs to make a request on its MessageChannel port in order to finish its resolve hook.
Then it loads @tapjs/processinfo/import, which calls Module.register. This triggers a sync request to the worker thread for 'register', which then triggers the mock's resolve hook to resolve the loader.mjs file, but because it's already awaiting a sync message (the register), it'll never come, because that sync message is waiting for another sync message (the resolve).
isaacs
4 minutes ago
My working theory at the moment is that it started at the "run imports sequentially" because prior to that, it wasn't blocking on Module.register() so any blocking async-made-sync actions in the process of the register call weren't a problem.
isaacs
1 minute ago
Also: seems like it's related to the use of the MessageChannel specifically. Just a timeout doesn't trigger it, but the busywait message processing seems to keep the other message channel from proceeding.

@isaacs isaacs changed the title ESM Hook deadlocks since 21.2.0 a3e09b3fdd3908a6cc82afa319596d3889513a51 ESM Hook deadlocks since 21.2.0 a3e09b3 Nov 28, 2023
@JakobJingleheimer
Copy link
Member

JakobJingleheimer commented Dec 3, 2023

Also from the slack convo (me):

The problem here occurs when register tries to register loader "B", and its resolution goes through loader "A", and A then tries to communicate with the Module Worker (which is busy registering B).

I can definitely see this being surprising.

As it currently is, I think we can't protect against this in node because that part of node is sleeping.

A potential mitigation is switching from MessageChannel (which is one-to-one), to BroadcastChannel so that "anyone" trying to make a request to the worker can at least be informed that the worker is busy (so don't).

I can repro the issue. I'm not sure if this is strictly a bug or just an oversight/limitation. I added the confirmed-bug label to denote it is confirmed.

@JakobJingleheimer JakobJingleheimer added module Issues and PRs related to the module subsystem. esm Issues and PRs related to the ECMAScript Modules implementation. confirmed-bug Issues with confirmed bugs. labels Dec 3, 2023
@JakobJingleheimer JakobJingleheimer added the loaders-agenda Issues and PRs to discuss during the meetings of the Loaders team label Dec 3, 2023
@isaacs
Copy link
Contributor Author

isaacs commented Dec 4, 2023

Does Module.register() have to be sync? It seems like if not, then Hooks.register() could take the resolve/load result as its argument, and do that bit before the thread is paused.

@GeoffreyBooth
Copy link
Member

Does Module.register() have to be sync?

I don’t think so, though we’re hoping to avoid any more breaking changes at this point before we go stable.

Please don’t forget to ping @nodejs/loaders. Just adding the label doesn’t do anything.

@JakobJingleheimer
Copy link
Member

There was a reason to make it sync. IIR, it was I who advocated for it because there was no scenario we could think of where it being async was desirable.

@isaacs
Copy link
Contributor Author

isaacs commented Dec 7, 2023

there was no scenario we could think of where it being async was desirable

Hooray for running code! It has a nasty way of finding edge cases that even very smart people can't think of ;)

Idk if it's desirable for it to be async, per se, but it seems like the machinery required to prevent deadlocks here would be quite a bit more elaborate. Maybe there's some clever approach I'm not thinking of, but it feels like it might be expensive. And certainly, not deadlocking is preferable. I guess the question is whether taking on that change (making register async) is more prudent than combing through the edge cases and supporting that machinery long term.

@isaacs
Copy link
Contributor Author

isaacs commented Dec 7, 2023

I suppose another way to approach this would be to inform the hook methods that the main thread is blocked, some flag on context or something, so at least they can avoid doing any main-thread comms for that request.

@GeoffreyBooth
Copy link
Member

In the meeting on Tuesday @JakobJingleheimer thought that the potential for deadlock might be eliminated if we changed the first argument of register from any import specifier (which needs to go through the resolve chain of hooks) to only allow an absolute or relative URL (which we can synchronously convert into an absolute URL without involving any hooks, thereby avoiding any cross-thread communication and its deadlock potential). This would be a very minor change for most people; anyone currently doing register('bare-specifier') could instead do register(import.meta.resolve('bare-specifier')) to preserve the current behavior.

@aduh95
Copy link
Contributor

aduh95 commented Dec 7, 2023

anyone currently doing register('bare-specifier') could instead do register(import.meta.resolve('bare-specifier')) to preserve the current behavior.

import.meta.resolve is not available in CJS modules.

@JakobJingleheimer
Copy link
Member

JakobJingleheimer commented Dec 7, 2023

Is there no equivalent?

When I was suggesting import.meta.resolve as a potential solution, I was mentioning there must be a reason because it's too easy.

@GeoffreyBooth
Copy link
Member

import.meta.resolve is not available in CJS modules.

pathToFileURL(require.resolve('bare-specifier'))

@aduh95
Copy link
Contributor

aduh95 commented Dec 7, 2023

require.resolve('bare-specifier')

require.resolve returns the value for the "require" condition of the "exports" map, while import.meta.resolve returns the "import" condition.

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Dec 7, 2023

require.resolve returns the value for the "require" condition of the "exports" map, while import.meta.resolve returns the "import" condition.

Yeah. And if that’s the case for a particular package, then the user just needs to create an ESM file for this, or define the URL explicitly without relying on require.resolve. It’s not the end of the world. The point is that it’s the user’s responsibility to resolve the URL to the hooks file.

Also a hooks file must be ESM, so it wouldn’t make much sense for there to be a "require" condition for it.

@aduh95
Copy link
Contributor

aduh95 commented Dec 7, 2023

I don't think that's reasonable, and it wouldn't solve the problem anyway (the main thread still need to be blocked while the 'load' hook executes and the module is parsed and executed).
(Also if the user calls import.meta.resolve, it's also blocking the main thread while it's calling the resolve hook chain, so it's exactly the same situation as we have today)

@JakobJingleheimer
Copy link
Member

So having the communication channel is just impossible to support?

@GeoffreyBooth
Copy link
Member

First I think we need to see if it solves the issue. If it doesn’t, there’s no point in debating whether it’s desirable UX.

The thinking was that we don’t block the main thread for load since load is async. It’s only resolve that’s potentially sync. (Or is load potentially sync for require?)

@aduh95
Copy link
Contributor

aduh95 commented Dec 8, 2023

module.register is sync, that’s what’s blocking the thread. It doesn’t change anything how many hooks it calls, it stays sync and blocks the main thread because that’s how it’s implemented. Similarly, dynamic imports are async, yet they don’t block the main thread at all despite calling both resolve and load chains. If the idea was that skipping resolve would somehow make it async, that’s simply not a good lead, one thing has nothing to do with the other.

module.register has to stay sync, otherwise we would introduce race conditions that will certainly be more problematic than the issue at hand (because harder to debug, and very frustrating for the users to have flakiness in Node.js itself – at least the current behavior reproduces consistently). It would also be a breaking change to an API which is at the RC stage, so something we should try to avoid anyway.

So having the communication channel is just impossible to support?

Not possible during sync operations ( module.register, import.meta.resolve, faux-CJS). During static and dynamic imports it should be fine though.

@GeoffreyBooth
Copy link
Member

Not possible during sync operations ( module.register, import.meta.resolve, faux-CJS).

Can we error on sending a cross thread message while the main thread is blocked? So that we throw rather than deadlock.

@JakobJingleheimer
Copy link
Member

JakobJingleheimer commented Dec 8, 2023

Can we error on sending a cross thread message while the main thread is blocked? So that we throw rather than deadlock.

Maaaybe. For register, I think yes because there will be an outstanding request and that is always sync. Others I think would currently be more difficult because nothing identifies whether the main is blocked. An easy solution may be to merely expand the types (ex 'resolve' → 'resolve-async' & 'resolve-sync'), or add another property to the request (sync: boolean).

@GeoffreyBooth
Copy link
Member

Is this still an issue, or has it been fixed by #52706? (Try on latest main.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed-bug Issues with confirmed bugs. esm Issues and PRs related to the ECMAScript Modules implementation. loaders-agenda Issues and PRs to discuss during the meetings of the Loaders team module Issues and PRs related to the module subsystem.
Projects
None yet
Development

No branches or pull requests

4 participants