-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
doc: add synchronous hooks proposal #198
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! This seems close to being ready to implement.
Co-authored-by: Geoffrey Booth <webadmin@geoffreybooth.com>
After some thoughts I think the higher level function exports(url, context) { ... }
function requires(specifier, context, nextRequires) {
// requires hook can choose to run something here, exports can't.
// If hooks want to simply hijack a request and return early, skipping
// resolve and load, they can do it here. They also don't need to care
// about how different specifiers can be mapped to the same URL.
if (specifier === 'pnpapi') {
return { url: 'some://dummy/url', exports: pnpApiObject };
}
const result = nextRequires(specifier, context); // Invokes resolve and load steps
exports(result.url, result);
} Or to re-implement the after-load function linkAfterLoad(url, context) { ... }
const pnpApiModule = new vm.SyntheticModule([...], () => {
pnpApiModule.setExports('default', pnpApiObject);
});
function link(specifier, context, nextLink) {
// link hook can choose to run something here, linkAfterLoad can't.
// If hooks want to simply hijack a request and return early, skipping
// resolve and load, they can do it here. They also don't need to care
// about how different specifiers can be mapped to the same URL.
if (specifier === 'pnpapi') {
return { url: 'some://dummy/url', module: pnpApiModule };
}
const result = nextLink(specifier, context); // Invokes resolve and load steps
linkAfterLoad(result.url, result);
} In addition, if these hooks need to invoke say |
I like this assessment. |
Let’s say I want to write CoffeeScript and have it instrumented. The CoffeeScript hooks might implement just As an author, personally what I find easiest to follow is hooks that correspond to steps in the process. So if there’s a step before resolution, and a step after loading, then those can be additional hooks that still run in the sequence: something before resolution, resolve, load, something after loading. |
I am suggesting to have |
After attempting to prototype the high-level wrappers in the ESM loader I realized that there are some nuances in the current design - the
|
How does that work for loaders that are prerequisite for subsequent loaders to work? For example let's say a TS loader needing to resolve & load files from within zip archives? |
I like this idea. I find the |
The alternative is to provide the |
Can you describe how the module request would look like in this case? Would that just override the resolution/loading completely or still use part of the default logic? For example assuming that the request is function beforeResolve(specifier, context) {
if (specifier === 'foo') {
const url = 'file:///path/to/package.zip?file=foo.ts'; // Custom resolution
return { format: 'zip+typescript', url, shortCircuit: true };
}
return { }; // do nothing, and the default resolution logic gets used.
}
function beforeLoad(url, context) {
if (context.format === 'zip+typescript') {
const source = unzipAndTranspile(url); // Unzip path/to/package.zip, and transpile foo.ts
return { source, format: 'commonjs', shortCircuit: true };
}
return { }; // do nothing, and the default loading logic gets called.
} (Not sure if the search params can be preserved in CJS loader yet, given the compat issue of Or if you are thinking about composing a zip loader + a typescript loader: // Zip loader
function beforeResolve(specifier, context) {
if (specifier === 'foo') {
const url = 'file:///path/to/package.zip/foo.ts'; // Custom resolution
return { format: 'zip', url, shortCircuit: true };
}
return { }; // do nothing, and the default resolution logic gets used.
}
function beforeLoad(url, context) {
if (context.format === 'zip') {
const source = unzip(url); // Unzip path/to/package.zip, and locate foo.ts
return { source, shortCircuit: true };
}
return { }; // do nothing, and the default loading logic gets called.
}
// TypeScript loader
function afterLoad(url, source, context) {
if (url.endsWith('.ts')) {
return { source: transpile(source), format: 'commonjs' };
}
return { }; // do nothing, and the default loading logic gets called.
} (I think we might need to separate |
Yeah one alternative is to provide function load(url, context, nextLoad) {
if (url.startsWith('http')) {
const source = fetchSync(url);
return { source, shortCircuit: true };
}
return nextLoad(url, context);
} ..which needs to be ensured to be at the bottom of the hook chain (next to the default one). One downside of this is the fetching would be sequential and blocking even for ESM which is designed to make parallel loading of module requests in the same module possible. Theoretically with a slightly different design I think it's still possible to make the fetching part almost parallel: function load(url, context, nextLoad) {
if (url.startsWith('http')) {
const syncResponse = fetchSyncPooled(url); // This adds a fetch request to a pool
return {
// This blocks until the result is fetched; by default if this is the only load hook, the ESM loader
// runs the hooks to get the source provider functions first (then the requests are all started)
// and then run over the source provider functions to get the source (and start blocking on the
// first module request).
getSource() { return syncResponse.readAsBuffer(); } ,
shortCircuit: true
};
}
return nextLoad(url, context);
} This would require significant refactoring in the ESM loader to make the creation of ModuleWraps lazy but it allows the loader to start the fetch requests sequentially fairly quickly in a pool (without waiting for the previous module request to actually complete), and then the worker pool can handle the requests in parallel. Any On the other hand I think this might be a more compelling reason to split the hooks - with the // This needs to be the last of the beforeLoad chain:
function beforeLoad(url, context) {
if (url.startsWith('http')) {
const syncResponse = fetchSyncPooled(url); // This adds a fetch request to a pool
responseMap.set(url, syncResponse);
return {
source: 'dummy',
shortCircuit: true
};
}
return {};
}
// Node.js can run the beforeLoad eagerly for the module requests, and
// only start running afterLoad when the all beforeLoads of parallel module
// requests are run
// This needs to be the first of the afterLoad chain:
function afterLoad(url, source, context, nextLoad) {
if (url.startsWith('http')) {
// source was dummy at this point.
const syncResponse = responseMap.get(url);
return {
source: syncResponse.readAsBuffer(),
};
}
return {};
} |
Would it make sense to have a think like the "link" hook that works for both module formats? I think we would generally prefer a single interface to work with, even if it's a bit sub-optimal for certain cases. We want to be able to apply CJS-based patches to ESM rewrites when they happen without needing to rewrite the patches. That's not a huge deal though. |
I don't think so, because this is where one of the key difference of CJS and ESM lies: ESM linking (dependency resolution) is designed to be done before evaluation, for all module requests in the module, which relies on static analysis based on the import syntax, and it's also what allows parallel loading of ESM dependencies; CJS dependency resolution is done during evaluation (in the form of |
I'm aware of the difference in order/timing. I was thinking more if some interface could exist which abstracts that away. Like if you allow the CJS side to see what the export names are without directly accessing them you could do a similar "replace when available" kind of semantic as we have to do with ESM. I feel like a similar shaped API should be possible between the two, it just might not be quite as friendly. 🤔 Anyway, two APIs are probably liveable, we'll just have to make sure our instrumentations can handle both scenarios with (hopefully) minimal breakage. |
That sounds cool but I think it would be more flexible if the built-in loader API provides the low-level hooks, and if a unified hook is desired, some user-land package can be developed on top of them (like a package that just unify import-in-the-middle and require-in-the-middle). Otherwise by only providing high-level unified hooks in core we will deprive the ability to do low-level customization completely from developers who need them, and only cater to the needs of developers who don't care about the differences. |
Fair point. I'd be happy to help with constructing whatever that higher-level thing is later, but probably would want that to live in the Node.js org just to have an unbiased home for it that all the APM vendors can contribute to it. (Similar to what we're trying to do by donating IITM at the moment.) |
} | ||
``` | ||
|
||
## `exports` (require-only): from compiled CJS wrapper to exports object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some experiments with this, and I think this can probably be repurposed as an "execute"/"evaluate" hook.
/**
* @typedef {{
* thisValue: object,
* parameters: { exports: object, require: function, module: object, __filename: string, __dirname: string }
* }} ModuleExecuteContext
*/
/**
* @typedef {{
* exports: object,
* shortCircuit?: boolean
* }} ModuleExecuteResult
*/
function execute(url, context, nextExecute) {
// ...
}
This would still work basically the same for the example mock of require-in-the-middle
below, but it also allows users to override how the CJS wrapper gets invoked. This is still CJS-only, because in the case of ESM, V8 doesn't give us a way to hook into the evaluation of every module in the graph - we only get to control the evaluation of the root module.
In any case this might need a bit more thought, so I'll keep it off the initial PR.
The latest prototype is in https://github.com/joyeecheung/node/tree/sync-hooks-6 - I've got it working for ESM too, still need a bunch of testing to be opened as PR tough. |
This lays the foundation for adding synchronous hooks proposed in nodejs/loaders#198. In addition this corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO.
This lays the foundation for supporting synchronous hooks proposed in nodejs/loaders#198 for ESM. In addition this corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO.
This lays the foundation for supporting synchronous hooks proposed in nodejs/loaders#198 for ESM. - Corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO. - The moduleProvider passed into ModuleJob is replaced as moduleOrModulePromise, we call the translators directly in the ESM loader and verify it right after loading for clarity. - Reuse a few refactored out helpers for require(esm) in getModuleJobForRequire().
This lays the foundation for supporting synchronous hooks proposed in nodejs/loaders#198 for ESM. - Corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO. - The moduleProvider passed into ModuleJob is replaced as moduleOrModulePromise, we call the translators directly in the ESM loader and verify it right after loading for clarity. - Reuse a few refactored out helpers for require(esm) in getModuleJobForRequire().
This lays the foundation for supporting synchronous hooks proposed in nodejs/loaders#198 for ESM. - Corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO. - The moduleProvider passed into ModuleJob is replaced as moduleOrModulePromise, we call the translators directly in the ESM loader and verify it right after loading for clarity. - Reuse a few refactored out helpers for require(esm) in getModuleJobForRequire().
This lays the foundation for supporting synchronous hooks proposed in nodejs/loaders#198 for ESM. - Corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO. - The moduleProvider passed into ModuleJob is replaced as moduleOrModulePromise, we call the translators directly in the ESM loader and verify it right after loading for clarity. - Reuse a few refactored out helpers for require(esm) in getModuleJobForRequire().
This lays the foundation for supporting synchronous hooks proposed in nodejs/loaders#198 for ESM. - Corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO. - The moduleProvider passed into ModuleJob is replaced as moduleOrModulePromise, we call the translators directly in the ESM loader and verify it right after loading for clarity. - Reuse a few refactored out helpers for require(esm) in getModuleJobForRequire().
This lays the foundation for supporting synchronous hooks proposed in nodejs/loaders#198 for ESM. - Corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO. - The moduleProvider passed into ModuleJob is replaced as moduleOrModulePromise, we call the translators directly in the ESM loader and verify it right after loading for clarity. - Reuse a few refactored out helpers for require(esm) in getModuleJobForRequire(). PR-URL: #54769 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Stephen Belanger <admin@stephenbelanger.com> Reviewed-By: James M Snell <jasnell@gmail.com>
This lays the foundation for supporting synchronous hooks proposed in nodejs/loaders#198 for ESM. - Corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO. - The moduleProvider passed into ModuleJob is replaced as moduleOrModulePromise, we call the translators directly in the ESM loader and verify it right after loading for clarity. - Reuse a few refactored out helpers for require(esm) in getModuleJobForRequire(). PR-URL: nodejs#54769 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Stephen Belanger <admin@stephenbelanger.com> Reviewed-By: James M Snell <jasnell@gmail.com>
This lays the foundation for supporting synchronous hooks proposed in nodejs/loaders#198 for ESM. - Corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO. - The moduleProvider passed into ModuleJob is replaced as moduleOrModulePromise, we call the translators directly in the ESM loader and verify it right after loading for clarity. - Reuse a few refactored out helpers for require(esm) in getModuleJobForRequire(). PR-URL: #54769 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Stephen Belanger <admin@stephenbelanger.com> Reviewed-By: James M Snell <jasnell@gmail.com>
This comment was marked as off-topic.
This comment was marked as off-topic.
This refactors the CommonJS loading a bit to create a center point that handles source loading (`loadSource`) and make format detection more consistent to pave the way for future synchronous hooks. - Handle .mjs in the .js handler, similar to how .cjs has been handled. - Generate the legacy ERR_REQUIRE_ESM in a getRequireESMError() for both .mts and require(esm) handling (when it's disabled). PR-URL: #55590 Refs: nodejs/loaders#198 Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
This refactors the CommonJS loading a bit to create a center point that handles source loading (`loadSource`) and make format detection more consistent to pave the way for future synchronous hooks. - Handle .mjs in the .js handler, similar to how .cjs has been handled. - Generate the legacy ERR_REQUIRE_ESM in a getRequireESMError() for both .mts and require(esm) handling (when it's disabled). PR-URL: #55590 Refs: nodejs/loaders#198 Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
This lays the foundation for supporting synchronous hooks proposed in nodejs/loaders#198 for ESM. - Corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO. - The moduleProvider passed into ModuleJob is replaced as moduleOrModulePromise, we call the translators directly in the ESM loader and verify it right after loading for clarity. - Reuse a few refactored out helpers for require(esm) in getModuleJobForRequire(). PR-URL: nodejs#54769 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Stephen Belanger <admin@stephenbelanger.com> Reviewed-By: James M Snell <jasnell@gmail.com>
This refactors the CommonJS loading a bit to create a center point that handles source loading (`loadSource`) and make format detection more consistent to pave the way for future synchronous hooks. - Handle .mjs in the .js handler, similar to how .cjs has been handled. - Generate the legacy ERR_REQUIRE_ESM in a getRequireESMError() for both .mts and require(esm) handling (when it's disabled). PR-URL: nodejs#55590 Refs: nodejs/loaders#198 Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this has been open long enough and the raised notes have either all been addressed or could be addressed via follow-up PRs to edit this document. I think it’s time to land this and continue iterating in follow-ups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙌 LGTM and nicely aligns with existing conventions/APIs/signatures.
This lays the foundation for supporting synchronous hooks proposed in nodejs/loaders#198 for ESM. - Corrects and adds several JSDoc comments for internal functions of the ESM loader, as well as explaining how require() for import CJS work in the special resolve/load paths. This doesn't consolidate it with import in require(esm) yet due to caching differences, which is left as a TODO. - The moduleProvider passed into ModuleJob is replaced as moduleOrModulePromise, we call the translators directly in the ESM loader and verify it right after loading for clarity. - Reuse a few refactored out helpers for require(esm) in getModuleJobForRequire(). PR-URL: nodejs#54769 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Stephen Belanger <admin@stephenbelanger.com> Reviewed-By: James M Snell <jasnell@gmail.com>
This refactors the CommonJS loading a bit to create a center point that handles source loading (`loadSource`) and make format detection more consistent to pave the way for future synchronous hooks. - Handle .mjs in the .js handler, similar to how .cjs has been handled. - Generate the legacy ERR_REQUIRE_ESM in a getRequireESMError() for both .mts and require(esm) handling (when it's disabled). PR-URL: nodejs#55590 Refs: nodejs/loaders#198 Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
cc @nodejs/loaders
Spinned from nodejs/node#52219 - I ended up writing it down as a proposal here, because there are some high-level questions and re-design ideas after I looked into existing usages of CJS monkey-patching. While
resolve
andload
hooks work well enough for some cases, it also seems common in the wild to just overridespecifier -> module
mapping directly, so I sketched out some higher level hooks as alternatives here as well.