fix: Add globalPreload to ts-node/esm for node 20 #2009

Open: wants to merge 14 commits into main

isaacs
Contributor

@isaacs isaacs commented May 6, 2023

As of node v20, loader hooks are executed in a separate isolated thread environment. As a result, they are unable to register the require.extensions hooks in a way that would (in other node versions) make both CJS and ESM work as expected.

By adding a globalPreload method, which does execute in the main script environment (but with very limited capabilities), these hooks can be attached properly, and --loader=ts-node/esm will once again make both cjs and esm typescript programs work properly.
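A minimal sketch of the idea, assuming the Node 16.12 to 20.x `globalPreload` contract: the hook returns a string of source code that Node runs as a sloppy-mode script in the main thread, where `require` is unavailable but `getBuiltin` and (with off-thread loaders) `port` are in scope. `REGISTER_MODULE` and its `install()` export are hypothetical stand-ins for whatever installs the `require.extensions` hooks; this is not the PR's actual implementation.

```javascript
// Hypothetical absolute path to a CommonJS module that installs the
// require.extensions hooks (an assumption for this sketch).
const REGISTER_MODULE = '/abs/path/to/register.js';

// In a real loader this would be `export function globalPreload(context)`
// in the .mjs loader file; it is a plain function here so the sketch runs
// as an ordinary script.
function globalPreload() {
  // This string is evaluated in the main thread. `require` is not defined
  // there, so a require function is built from the `getBuiltin` helper.
  return `
    const { createRequire } = getBuiltin('module');
    const requireHooks = createRequire(${JSON.stringify(REGISTER_MODULE)});
    requireHooks(${JSON.stringify(REGISTER_MODULE)})
      .install(typeof port !== 'undefined' ? port : undefined);
  `;
}
```

Building the string with `JSON.stringify` keeps paths containing backslashes or quotes intact inside the generated source.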

@isaacs
Contributor Author

isaacs commented May 6, 2023

Not sure what kind of test should be added for this, since it just returns a string. Just comparing against a fixture feels kind of redundant?

@isaacs
Contributor Author

isaacs commented May 6, 2023

Hm, this definitely needs a bit more work, because it breaks on node versions that don't have the off-thread loader hooks (i.e., everything before v20).

I'm not sure what the idiomatic approach to that is in this project, and I'm not sure how to detect the situation where it's required other than by sniffing the version.

@ljharb

ljharb commented May 6, 2023

If it helps, in node 19 vs node 20, this file:

export function globalPreload() {
	console.log('globalPreload');
	return '';
}

export function getGlobalPreloadCode() {
	console.log('getGlobalPreloadCode');
	return '';
}

logs, in node 19:

(node:43737) ExperimentalWarning: Custom ESM Loaders is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
globalPreload
(node:43737) Warning: Loader hook "getGlobalPreloadCode" has been renamed to "globalPreload"
(node:43737) Warning: Loader hook "getGlobalPreloadCode" has been renamed to "globalPreload"

and in node 20:

(node:43215) ExperimentalWarning: Custom ESM Loaders is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
globalPreload

… so, you could probably intercept console.log at module level, catch the experimental warning (and suppress it), and use it to indicate whether you were in 20+ or not?

@isaacs
Contributor Author

isaacs commented May 6, 2023

@ljharb ooh sneaky, that might work. I'll look into that.

@ssalbdivad

Would this allow ts-node to be used directly again in Node 20, or would it still require it to be run through node --loader?

Eagerly awaiting the fix on this so we can support development on Node 20 🙏

@cspotcode
Collaborator

Historically I've used version sniffing for this stuff, so I'd go with that. It's happened several times before that ts-node needs to implement multiple behaviors depending on the version of node or ts.
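Version sniffing for the off-thread case could look roughly like this. `versionGteLt` mirrors the name of the helper quoted in the esm.mjs review excerpt, but this body is an illustrative reimplementation, not ts-node's own; it only handles `major.minor.patch` strings and ignores a leading `v` or a prerelease suffix.

```javascript
// Split "v20.3.1" or "20.3.1-pre" into numeric [major, minor, patch].
function parseVersion(v) {
  return v.replace(/^v/, '').split('-')[0].split('.').map((n) => Number(n));
}

// Returns -1, 0 or 1, comparing major, then minor, then patch.
function compareVersions(a, b) {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// True when version >= gteRequirement and, if given, version < ltRequirement.
function versionGteLt(version, gteRequirement, ltRequirement) {
  if (compareVersions(version, gteRequirement) < 0) return false;
  return ltRequirement === undefined || compareVersions(version, ltRequirement) < 0;
}

// Loader hooks run in a separate thread from Node 20.0.0 onward.
const offThreadLoader = versionGteLt(process.versions.node, '20.0.0');
```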

Something to keep in mind: ts-node --esm launches a subprocess sorta like node --require <foo> --loader <bar> <args>. So this change will need to be compatible with both ts-node --esm and node --loader ts-node/esm.

In typechecking mode, is the typechecking work being repeated on both threads? Keeping in mind that typechecking one file involves parsing the others, so CJS files typecheck with type info from ESM files and vice-versa. Repeated typechecking work is not a dealbreaker, but it's something we should at least document in this thread.

@codecov

codecov bot commented May 6, 2023

Codecov Report

Merging #2009 (3fd7b4f) into main (47d4f45) will increase coverage by 0.25%.
The diff coverage is 62.50%.

❗ Current head 3fd7b4f differs from pull request most recent head b614b1b. Consider uploading reports for the commit b614b1b to get more accurate results

| Files Changed | Coverage Δ |
|---|---|
| src/transpilers/swc.ts | 81.81% <ø> (ø) |
| src/child/child-loader.ts | 54.54% <38.46%> (-23.24%) ⬇️ |
| src/esm.ts | 78.90% <62.06%> (-3.67%) ⬇️ |
| src/bin.ts | 89.83% <66.66%> (-0.52%) ⬇️ |
| src/index.ts | 80.58% <87.50%> (+0.48%) ⬆️ |
| src/child/spawn-child.ts | 84.21% <100.00%> (-3.29%) ⬇️ |

... and 3 files with indirect coverage changes


@isaacs
Contributor Author

isaacs commented May 7, 2023

Added a commit so that the globalPreload registration is skipped on node versions below 20.0.0.

@cspotcode As far as I can tell, with this change, ts-node --esm file.ts is just as broken on node 20 (but at least not any more broken). I'll look into that; there might be a straightforward way to work around it.

I haven't looked into type checking, but my guess is, if registerAndCreateEsmHooks triggers the type checking, then yes, it'd happen twice. Though, if it aborts on failure, it'd only be done once in the failure case, because the loader runs to completion before the globalPreload is executed. It is definitely registering extensions twice, but in isolated environments.

@isaacs
Contributor Author

isaacs commented May 23, 2023

Got some time to dig into this. The issue with ts-node --esm blah.mts seems to have something to do with lateBindHooks. In node before v20, this works fine, but in v20, it doesn't pick them up. Still tracing through to try to figure out why that is.

@isaacs
Contributor Author

isaacs commented May 23, 2023

Aha, of course.

The callInChild is running node --loader=child-loader.js child-entrypoint.js <args>.

child-loader.js sets up the proxy hooks, and child-entrypoint.js assigns the values to them by calling bootstrap() from ../bin, which calls lateBindHooks. But child-entrypoint.js and child-loader.js are in separate isolated threads.

The only way this can work on node 20 is for child-loader.js to set the actual hooks itself, rather than late-binding them from the main thread. child-entrypoint.js should still register the require.extensions handlers, however, which avoids the need for a globalPreload.
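For readers unfamiliar with `require.extensions`, this sketch shows the kind of main-thread hook being discussed, assuming a CommonJS context. `registerExtension` and the `transpile` callback are hypothetical; `module._compile` is an undocumented Node internal that require hooks (ts-node among them) rely on, so treat this as illustrative only.

```javascript
const fs = require('node:fs');

// Hypothetical helper: install a require.extensions handler for `ext` that
// runs `transpile(source, filename)` before handing the JS to Node.
function registerExtension(ext, transpile) {
  const previous = require.extensions[ext];
  require.extensions[ext] = function (module, filename) {
    const source = fs.readFileSync(filename, 'utf8');
    // Module#_compile is undocumented; require hooks use it to evaluate
    // the transformed code as the body of `module`.
    module._compile(transpile(source, filename), filename);
  };
  // Undo function restores whatever handler was registered before.
  return function unregister() {
    if (previous) require.extensions[ext] = previous;
    else delete require.extensions[ext];
  };
}
```

Because this handler lives in the main thread's module system, it is unaffected by where the ESM `load()` hook runs.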

@isaacs
Contributor Author

isaacs commented May 23, 2023

Ok, got ts-node --esm foo.mts working, albeit in a somewhat unfortunate copypasta way. I suggest refactoring to remove the late-binding loaders entirely on all Node versions and setting them up only in child-loader.mjs. child-entrypoint.ts would then only be needed to munge process.argv, which could be done in a globalPreload on node v20 and higher (until that's replaced with main-thread assignment of loaders via --import), or directly in the loader on earlier versions. I held off on doing that for now, on the assumption that there might be other side-effects I'm not aware of.

Verified that typechecking does not get run twice in the presence of an off-thread loader, which in hindsight makes sense, since either the source is being loaded and transpiled once in the loader thread, or once in the main (only) thread, but never both. The whole point is that the load() function is never called in the main thread on node v20, which is what will eventually enable synchronous-looking behavior of loaders, in accordance with the browser specs.

@cspotcode PTAL when you get a chance :)

@cspotcode
Collaborator

Thanks, I haven't had a chance to take a look yet, but WRT the double-typechecking:
I imagine it would happen when mixing CJS and ESM. E.g. you have some .mts files and some .cts files. And since typechecking one file relies on type info from others, the compiler repeats a bunch of work on both threads.

@isaacs
Contributor Author

isaacs commented May 25, 2023

I've been trying to throw some complicated scenarios at it, but I'm not sure how to go about triggering the situation you're thinking of here.

The userland program isn't ever loaded or typechecked by TS in the loader thread. So if there's double-typechecking happening, it seems to me that it'd be either (a) the loading of ts files from within ts-node itself (or its deps), or (b) already a problem without a loader thread (i.e., if double-checking is happening in a single-threaded loader environment).

Maybe there's something I'm missing? But it seems like this can't possibly make the problem significantly worse. If you have a test or example I can poke at, I'm happy to try to dig in further.

@isaacs
Contributor Author

isaacs commented May 25, 2023

Er, rather, it's not executed on the loader thread. Obviously the code is loaded in the loader thread.

But the typechecking happens when it compiles it to JS, and that only happens in one place. What actually ends up in the main thread is JavaScript going straight to the node VM.

@cspotcode
Collaborator

cspotcode commented May 25, 2023

If double-typechecking is happening, the only visible side effect would be higher CPU usage. So it's not a problem per se; it just means a potential performance regression on node 20.

If code on the main thread does require('./other-file') then that TS->JS transformation and typecheck happens on the main thread, right?

The scenario I imagine is:

ts-node --esm ./entrypoint.mts

Where ./entrypoint.mts has require('./other.cts')

The loader thread's load() hooks ask the TS compiler for TS diagnostics on entrypoint.mts.
This means an instance of the TS compiler inside the loader thread does the work of parsing and computing type information for entrypoint.mts, other.cts, and every other file they (transitively) reference.

Then the main thread does require('./other.cts'), which asks the TS compiler for TS diagnostics on other.cts
This means another instance of the TS compiler on the main thread does the work of parsing and computing type information for other.cts and every other file it (transitively) references.

@cspotcode
Collaborator

cspotcode commented May 25, 2023

The scenario above could also happen if entrypoint.mts does import './other.cts', which does import './another.cts' (which secretly compiles to require('./another.cts')). So it could happen in a mixed CTS / MTS codebase where source code exclusively uses import but some of it gets compiled to require().

Collaborator

@cspotcode cspotcode left a comment


Looks good, thanks. I wanted to give a single, complete review after testing this locally, but I've been busy. These are the notes I have so far.

  • We should move as much of the logic into child-loader.ts and esm.ts, out of the .mjs files which are meant to be shims as thin as possible.
  • globalPreload should be part of the hooks returned by our createEsmHooks API

esm.mjs Outdated

// Affordance for node 20, where load() happens in an isolated thread
const offThreadLoader = versionGteLt(process.versions.node, '20.0.0');
export const globalPreload = () => {
Collaborator


Should this hook be included in the return value of https://typestrong.org/ts-node/api/index.html#createEsmHooks?

That API is meant for anyone wanting to wrap our loader in their own logic. They'd do --loader my-loader.mjs with their my-loader.mjs exporting all the functions they get from calling createEsmHooks(), perhaps wrapping them to implement new behavior.

This API predates support for multiple loaders, but also it should be possible to compose loaders in code so that end-users don't have to pass --loader twice.

Contributor Author

@isaacs isaacs May 29, 2023


Everything is just as composable as it was, with the exception that globalPreload is a bit weird to stack. (You have to append all the strings together, each wrapped in an IIFE so they don't clobber each other's variables.)

But multiple loaders have been supported longer than globalPreload was needed, so that might not matter.
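A sketch of the stacking problem described above: each hook's code string is wrapped in its own arrow-function IIFE so that top-level `const`/`let` declarations cannot collide, while `getBuiltin` and `port` stay visible through the enclosing scope. `composeGlobalPreload` is a hypothetical helper, not part of ts-node's `createEsmHooks` API.

```javascript
// Hypothetical helper: merge several globalPreload hooks into one.
function composeGlobalPreload(...hooks) {
  return function globalPreload(context) {
    return hooks
      // Each hook's code gets its own function scope, so two hooks can
      // both declare `const x` without a redeclaration error.
      .map((hook) => `(() => {\n${hook(context)}\n})();`)
      .join('\n');
  };
}
```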

child-loader.mjs Outdated
const require = createRequire(fileURLToPath(import.meta.url));

// TODO why use require() here? I think we can just `import`
/** @type {import('./dist/child-loader')} */
const childLoader = require('./dist/child/child-loader');
export const { resolve, load, getFormat, transformSource } = childLoader;

// On node v20, we cannot lateBind the hooks from outside the loader thread
// so it has to be done here.
Collaborator


Technically, can we lateBind by sending config through the MessagePort?

Contributor Author


Yeah, that's effectively what's happening here. It's calling bootstrap to late-bind the hooks with the state passed in on the loader URL.

I suppose it could get that information via the message port, but if we have the config we need on the URL, I'm not sure what that gets us?
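A rough sketch of carrying the bootstrap state on the loader URL, under stated assumptions: the `state` query parameter name and the brotli-plus-base64url encoding are illustrative choices, not necessarily what the PR does. On the loader side, `import.meta.url` would be the argument passed to `readStateParam`.

```javascript
const { pathToFileURL } = require('node:url');
const zlib = require('node:zlib');

// Encode a JSON-serialisable state object onto the loader's file URL.
function withStateParam(loaderPath, state) {
  const url = pathToFileURL(loaderPath);
  const json = Buffer.from(JSON.stringify(state), 'utf8');
  const packed = zlib.brotliCompressSync(json).toString('base64url');
  url.searchParams.set('state', packed);
  return url.toString();
}

// Decode the state again; returns undefined if the URL carries none.
function readStateParam(loaderUrl) {
  const packed = new URL(loaderUrl).searchParams.get('state');
  if (packed === null) return undefined;
  const json = zlib.brotliDecompressSync(Buffer.from(packed, 'base64url'));
  return JSON.parse(json.toString('utf8'));
}
```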

child-loader.mjs Outdated, resolved
child-loader.mjs Outdated, resolved
esm.mjs Outdated, resolved
src/child/spawn-child.ts Outdated
const child = spawn(
process.execPath,
[
'--require',
require.resolve('./child-require.js'),
'--loader',
// Node on Windows doesn't like `c:\` absolute paths here; must be `file:///c:/`
pathToFileURL(require.resolve('../../child-loader.mjs')).toString(),
loaderURL.toString(),
require.resolve('./child-entrypoint.js'),
`${argPrefix}${compress(state)}`,
Collaborator


Note to self: can we include state only once, to reduce the risk of hitting the command-line length limit on Windows?
Consider how this will look with forthcoming register() API. If register() will allow sending the state payload in-memory, akin to current lateBind logic, then consider using a pattern that looks similar today.

Contributor Author


Yeah good call, I guess child-entrypoint should just pull it from the loader URL if it needs it.

Contributor Author


Ah, so the tricky bit here is that the child-entrypoint is modifying and re-stashing the value in the state's argv for child processes, so storing it only in the loader url execArgv involves a bit more refactoring. Can definitely be done, but I'd recommend putting it off for a second commit (if not second PR) just to ensure edge cases are handled properly.

@isaacs
Contributor Author

isaacs commented May 29, 2023

Thinking through the double-checking scenario, I think you're right, but I don't think there's much to be done about it. The good news is, once source-returning commonjs loaders land in node, and ts-node starts using that instead, then the problem goes away (along with several others!), since require() will also be going through the same paths on the loader thread.

Cleaned up where the globalPreload logic lives, and a few other things.

@cspotcode
Collaborator

The ideal until node supports CJS-via-loaders is that we use the message port to delegate all compilation into the loader thread. service.compile() in the main thread and all worker threads can be a shim that makes a blocking RPC call into the loader thread.

But that requires more work and it's not necessary to get this merged.

@isaacs
Contributor Author

isaacs commented May 30, 2023

The ideal until node supports CJS-via-loaders is that we use the message port to delegate all compilation into the loader thread. service.compile() in the main thread and all worker threads can be a shim that makes a blocking RPC call into the loader thread.

I was actually going to suggest something similar, having been in this code a little bit now. Using the globalPreload context.port is a bit tricky, since it means putting more logic in the sloppy mode string-literal code, and you have to feature- or version-detect to know whether the port is even going to be there. The approach I'm using with @tapjs/processinfo is to use diagnostics_channel, which has been available for quite a bit longer: https://github.com/tapjs/processinfo/blob/main/lib/esm.mts

Then at least on node versions from 14.17 up, you could use the same approach for all loaders, whether they're running in a separate thread or not. It does add a tiny bit of unnecessary serialization overhead if loaders are already on the main thread and can just call the function directly, but not much. If the intermediate child process spawned by ts-node --esm had an IPC channel, you could probably have it work in the same way.

@cspotcode
Collaborator

Does diagnostics_channel support blocking calls across thread boundaries? So that require('./something.ts') can be compiled off-thread?

since it means putting more logic in the sloppy mode string-literal code

I'm not too worried about that. We should use the same approach as with the .js/.mjs shims, where all logic lives in separate .ts files: require('/abs/path/to/something-else.js').doTheWork(typeof port !== 'undefined' ? port : undefined)

you have to feature- or version-detect to know whether the port is even going to be there

When will the port be absent? My understanding is that it's always present w/off-thread loaders. Are there any non-EOLed node versions that don't have it?

you could use the same approach for all loaders, whether they're running in a separate thread or not

This doesn't seem like a big deal. Main and worker threads are making a blocking call to .compile(); whether that call is handled on-thread or RPC'd across thread boundaries can be swapped out transparently.

@isaacs
Contributor Author

isaacs commented May 31, 2023

When will the port be absent? My understanding is that it's always present w/off-thread loaders. Are there any non-EOLed node versions that don't have it?

It is always present with off-thread loaders.

But it seems I was completely mistaken about this, and while diagnostics_channel is synchronous, it doesn't (as yet) cross over to the loader thread, so you do still need to proxy through the globalPreload context.port, at least until we get import { register } from 'node:module', and that is not currently synchronous. So, at least for the near term, require.extensions and the possibility of double-typechecking is unavoidable.
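The distinction drawn here can be illustrated with the public `diagnostics_channel` API (`diagnostics_channel.subscribe`/`unsubscribe` need Node 18.7+ or 16.17+): `publish()` runs subscribers synchronously, but only those registered in the same thread, so a subscriber living in the loader thread would never see these messages. The channel name and helper functions are hypothetical.

```javascript
const dc = require('node:diagnostics_channel');

// Hypothetical channel name for this illustration.
const CHANNEL = 'ts-node:compile-request';

function onCompileRequest(message) {
  // Runs synchronously inside publish(), and only in the publishing thread.
  message.handled = true;
}

dc.subscribe(CHANNEL, onCompileRequest);

// Reports whether a same-thread subscriber handled the message before
// publish() returned.
function publishAndReport(filename) {
  const message = { filename, handled: false };
  dc.channel(CHANNEL).publish(message);
  return message.handled;
}
```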

@isaacs
Contributor Author

isaacs commented May 31, 2023

When will the port be absent?

Technically speaking, it's not present on 16.0 through 16.11, which doesn't EOL until September 2023, but in those cases we don't bother to use a globalPreload, so it's not an issue.

@GeoffreyBooth

import { register } from 'node:module', and that is not currently synchronous.

It is synchronous: nodejs/node#46826

@jlenon7

jlenon7 commented May 31, 2023

import { register } from 'node:module', and that is not currently synchronous.

It is synchronous: nodejs/node#46826

He is right, register fn is not totally synchronous. Behind the scenes it's still async because of the communication with loaders worker thread.

@cspotcode
Collaborator

For the sake of anyone else reading along:

A combination of MessageChannel & SharedArrayBuffer & Atomics can be used to make blocking RPC calls between threads.

The goal is to make blocking calls from the main thread to the loader thread, using MessageChannel / SharedArrayBuffer / Atomics. We want to block the main thread / worker thread, but the loader thread can answer asynchronously.

So it is possible today for require() hooks to call another thread for module resolution and compilation.
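A minimal sketch of that blocking-RPC pattern, assuming a plain `worker_threads` Worker as a stand-in for the loader thread and an uppercase transform as a placeholder for compilation. The caller posts a request carrying a `SharedArrayBuffer` flag, sleeps in `Atomics.wait`, and once woken pulls the reply synchronously with `receiveMessageOnPort`; the helper thread is free to do its work asynchronously. The names are hypothetical, not ts-node APIs.

```javascript
const { Worker, MessageChannel, receiveMessageOnPort } = require('node:worker_threads');

// Code for the helper thread. It may work asynchronously; it only has to
// post the reply before flipping the flag and waking the caller.
const helperSource = `
const { parentPort } = require('node:worker_threads');
parentPort.once('message', ({ port }) => {
  port.on('message', async ({ signal, source }) => {
    // Placeholder for real work such as TS->JS compilation.
    const output = await Promise.resolve(source.toUpperCase());
    port.postMessage(output);
    Atomics.store(signal, 0, 1);
    Atomics.notify(signal, 0);
  });
});
`;

function createBlockingCompiler() {
  const { port1, port2 } = new MessageChannel();
  const worker = new Worker(helperSource, { eval: true });
  worker.unref(); // don't keep the process alive just for the helper
  worker.postMessage({ port: port2 }, [port2]);

  // Synchronous from the caller's point of view.
  function compileSync(source) {
    const signal = new Int32Array(new SharedArrayBuffer(4));
    port1.postMessage({ signal, source });
    Atomics.wait(signal, 0, 0); // sleeps while signal[0] is still 0
    const reply = receiveMessageOnPort(port1);
    if (reply === undefined) throw new Error('woken without a reply');
    return reply.message;
  }

  function close() {
    port1.close();
    return worker.terminate();
  }

  return { compileSync, close };
}
```

`Atomics.wait` is permitted on Node's main thread (unlike browser main threads), which is what lets a synchronous `require()` hook wait on another thread. The reply is posted before the flag is set, so it is already queued when the caller wakes.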


Sounds like the question is whether bootstrapping must be async. Do we have to await register() before Module.runMain();? Does register() give us a promise which is guaranteed to resolve after a port exists in the main thread which can talk to the loader thread? As soon as the promise resolves, we'll synchronously attempt to use the port.

@GeoffreyBooth

Do we have to await register() before Module.runMain();? Does register() give us a promise

At least according to the docs in the PR, register is sync. It does not return a promise.

isaacs added a commit to isaacs/ts-node that referenced this pull request Sep 6, 2023
@isaacs
Contributor Author

isaacs commented Sep 6, 2023

Updated to remove line number source map tests for repl and eval. (If you'd rather take on the refactoring to try to get it to work, that's fine, but imo that should be a separate PR at least.)

Added import.mjs and import-loader.mjs to support --import ts-node/import, which is similar to --loader ts-node/esm, but:

  • no warnings
  • no need for a globalPreload script string
  • sees process.stderr.isTTY properly, so no need for a port message to get pretty error reporting

Would be good to add some tests for --import, I can also back that out and make it a separate PR. Otherwise, I think this is ready to land.

isaacs and others added 13 commits September 7, 2023 14:54
This removes support for keeping import assertions, which were broken in
swc at some point, and unconditionally transpiled into import
attributes. (Ie, `import/with` instead of `import/assert`.)

No version of node supports import attributes with this syntax yet, so
anyone using swc to import json in ESM is out of luck no matter what.

And swc 1.3.83 broke the option that ts-node was using. The position of
the swc project is that experimental features are not supported, and may
change in patch versions without warning, making them unsafe to rely on
(as evidenced here, and the reason why this behavior changed
unexpectedly in the first place).

Better to just not use experimental swc features, and let it remove
import assertions rather than transpile them into something that node
can't run.

Fix: TypeStrong#2056

As of node v20, loader hooks are executed in a separate isolated thread
environment.  As a result, they are unable to register the
`require.extensions` hooks in a way that would (in other node versions)
make both CJS and ESM work as expected.

By adding a `globalPreload` method, which *does* execute in the main
script environment (but with very limited capabilities), these hooks can
be attached properly, and `--loader=ts-node/esm` will once again make
both cjs and esm typescript programs work properly.

When running `ts-node --esm`, a child process is spawned with the
`child-loader.mjs` loader, `dist/child/child-entrypoint.js` main,
and `argv[2]` set to the base64 encoded compressed configuration
payload.

`child-loader.mjs` imports and re-exports the functions defined
in `src/child/child-loader.ts`. These are initially set to empty
loader hooks which call the next hook in line until they are
defined by calling `lateBindHooks()`.

`child-entrypoint.ts` reads the config payload from argv, and
bootstraps the registration process, which then calls
`lateBindHooks()`.

Presumably, the reason for this hand-off is because `--loader`
hooks do not have access to `process.argv`.  Unfortunately, in
node 20, they don't have access to anything else, either, so
calling `lateBindHooks` is effectively a no-op; the
`child-loader.ts` where the hooks end up getting bound is not the
same one that is being used as the actual loader.

To solve this, the following changes are added:

1. An `isLoaderThread` flag is added to the BootstrapState. If
   this flag is set, then no further processing is performed
   beyond binding the loader hooks.
2. `callInChild` adds the config payload to _both_ the argv and
   the loader URL as a query param.
3. In the `child-loader.mjs` loader, only on node v20 and higher,
   the config payload is read from `import.meta.url`, and
   `bootstrap` is called, setting the `isLoaderThread` flag.

I'm not super enthusiastic about this implementation. It
definitely feels like there's a refactoring opportunity to clean
it up, as it adds some copypasta between child-entrypoint.ts and
child-loader.mjs. A further improvement would be to remove the
late-binding handoff complexity entirely, and _always_ pass the
config payload on the loader URL rather than on process.argv.

When an error is thrown in the loader thread, it must be passed through
the comms channel to be printed in the main thread.

Node has some heuristics to try to reconstitute errors properly, but
they don't function very well if the error has a custom inspect method,
or properties that are not compatible with JSON.stringify, so the
TSErrors raised by the source transforms don't get printed in any sort
of useful way.

This catches those errors, and creates a new error that can go through
the comms channel intact.

Another possible approach would be to update the shape of the errors
raised by source transforms, but that would be a much more extensive
change with further reaching consequences.
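A minimal sketch of the "create a new error" approach, assuming the receiving side prints whatever `message`/`stack` survives structured cloning: the thrown value is formatted with `util.inspect()` (which honours custom inspect methods) before it crosses the thread boundary. `toTransferableError` is a hypothetical name, not the PR's function.

```javascript
const util = require('node:util');

// Convert any thrown value into a plain Error whose text is already fully
// formatted, so nothing depends on the receiving thread reconstituting a
// custom inspect method or non-cloneable properties.
function toTransferableError(thrown) {
  const formatted = util.inspect(thrown);
  const plain = new Error(formatted);
  // Printing an Error uses .stack, so make it match the formatted text.
  plain.stack = formatted;
  return plain;
}
```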
Set the default `options.pretty` value based on stderr rather than
stdout, as this is where errors are printed.

The loader thread does not get a process.stderr.isTTY set, because its
"stderr" is actually a pipe. If `options.pretty` is not set explicitly,
the GlobalPreload's `context.port` is used to send a message from the
main thread indicating the state of stderr.isTTY.

Adds `Service.setPrettyErrors` method to enable setting this value when
needed.

The @cspotcode/source-map-support module does not function properly on
Node 20, resulting in incorrect stack traces. Fortunately, the built-in
source map support in Node is now quite reliable. This does have the
following (somewhat subtle) changes to error output:

- When a call site is in a method defined within a constructor function,
  it picks up the function name *as well as* the type name and method
  name. So, in tests where a method is called and throws within a
  function constructor, we see `Foo.Foo.bar` instead of `Foo.bar`.
- Call site displays show filenames instead of file URLs.
- The call site display puts the `^` character under the `throw` rather
  than the construction point of the error object. This is closer to how
  normal un-transpiled JavaScript behaves, and thus somewhat
  preferable, but isn't possible when all we have to go on is the Error
  stack property, so it is a change.

I haven't been able to figure out why exactly, but the call sites appear
to be somewhat different in the repl/eval contexts as a result of this
change. It almost seems like the @cspotcode/source-map-support was
applying source maps to the vm-evaluated scripts, but I don't see how
that could be, and in fact, there's a comment in the code stating that
that *isn't* the case. But the line number showing up in an Error.stack
property is `1` prior to this change (matching the location in the TS
source) and is `2` afterwards (matching the location in the compiled
JS).

An argument could be made that specific line numbers are a bit
meaningless in a REPL anyway, and the best approach is to just make
those tests accept either result. One possible approach to provide
built-in source map support for the repl would be to refactor the
`appendCompileAndEvalInput` to diff and append the *input* TS, and
compile within the `runInContext` method. If the transpiled code was
prepended with `process.setSourceMapsEnabled(true);`, then Error stacks
and call sites would be properly source mapped by Node internally.

This also adds a type for the loader hooks API v3, as globalPreload is
scheduled for removal in node v21, at which point '--loader ts-node/esm'
will no longer work, and '--import ts-node/import' will be the way
forward.
@isaacs isaacs mentioned this pull request Sep 10, 2023
cspotcode added a commit that referenced this pull request Sep 15, 2023
#2063)

* bump min ts version to 4.4 to match definitelytyped at time of writing

* update lockfile

* fix a bit of noise in test failures, also addressed by #2009, sneaking it in here to quiet the CI on other PRs

* Sneak in improvement to assertion logging
@piotr-cz

This fix (or at least @isaacs/ts-node-temp-fork-for-pr-2009) seems to fix all ts-node issues related to node 18.19.x, 20.x and 22.x
