Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: refactor ES module loader for readability (JS side) #16579

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions doc/api/errors.md
Original file line number Diff line number Diff line change
Expand Up @@ -1178,6 +1178,14 @@ for strict compliance with the API specification (which in some cases may accept
`func(undefined)` and `func()` are treated identically, and the
[`ERR_INVALID_ARG_TYPE`][] error code may be used instead.

<a id="ERR_MISSING_DYNAMIC_INSTANTIATE_HOOK"></a>
### ERR_MISSING_DYNAMIC_INSTANTIATE_HOOK

> Stability: 1 - Experimental

Used when an [ES6 module][] loader hook specifies `format: 'dynamic` but does
not provide a `dynamicInstantiate` hook.

<a id="ERR_MISSING_MODULE"></a>
### ERR_MISSING_MODULE

Expand Down
3 changes: 3 additions & 0 deletions lib/internal/errors.js
Original file line number Diff line number Diff line change
Expand Up @@ -298,6 +298,9 @@ E('ERR_IPC_ONE_PIPE', 'Child process can have only one IPC pipe');
E('ERR_IPC_SYNC_FORK', 'IPC cannot be used with synchronous forks');
E('ERR_METHOD_NOT_IMPLEMENTED', 'The %s method is not implemented');
E('ERR_MISSING_ARGS', missingArgs);
E('ERR_MISSING_DYNAMIC_INSTANTIATE_HOOK',
'The ES Module loader may not return a format of \'dynamic\' when no ' +
'dynamicInstantiate function was provided');
E('ERR_MISSING_MODULE', 'Cannot find module %s');
E('ERR_MODULE_RESOLUTION_LEGACY', '%s not found by import in %s.' +
' Legacy behavior in require() would have found it at %s');
Expand Down
45 changes: 38 additions & 7 deletions lib/internal/loader/Loader.js
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,8 @@ const ModuleRequest = require('internal/loader/ModuleRequest');
const errors = require('internal/errors');
const debug = require('util').debuglog('esm');

function getBase() {
// Returns a file URL for the current working directory.
function getURLStringForCwd() {
try {
return getURLFromFilePath(`${process.cwd()}/`).href;
} catch (e) {
Expand All @@ -23,22 +24,44 @@ function getBase() {
}
}

/* A Loader instance is used as the main entry point for loading ES modules.
* Currently, this is a singleton -- there is only one used for loading
* the main module and everything in its dependency graph. */
class Loader {
constructor(base = getBase()) {
this.moduleMap = new ModuleMap();
constructor(base = getURLStringForCwd()) {
if (typeof base !== 'string') {
throw new errors.TypeError('ERR_INVALID_ARG_TYPE', 'base', 'string');
}

this.moduleMap = new ModuleMap();
this.base = base;
this.resolver = ModuleRequest.resolve.bind(null);
// The resolver has the signature
// (specifier : string, parentURL : string, defaultResolve)
// -> Promise<{ url : string,
// format: anything in Loader.validFormats }>
// where defaultResolve is ModuleRequest.resolve (having the same
// signature itself).
// If `.format` on the returned value is 'dynamic', .dynamicInstantiate
// will be used as described below.
this.resolver = ModuleRequest.resolve;
// This hook is only called when resolve(...).format is 'dynamic' and has
// the signature
// (url : string) -> Promise<{ exports: { ... }, execute: function }>
// Where `exports` is an object whose property names define the exported
// names of the generated module. `execute` is a function that receives
// an object with the same keys as `exports`, whose values are get/set
// functions for the actual exported values.
this.dynamicInstantiate = undefined;
}

hook({ resolve = ModuleRequest.resolve, dynamicInstantiate }) {
// Use .bind() to avoid giving access to the Loader instance when it is
// called as this.resolver(...);
this.resolver = resolve.bind(null);
this.dynamicInstantiate = dynamicInstantiate;
}

// Typechecking wrapper around .resolver().
async resolve(specifier, parentURL = this.base) {
if (typeof parentURL !== 'string') {
throw new errors.TypeError('ERR_INVALID_ARG_TYPE',
Expand All @@ -48,10 +71,11 @@ class Loader {
const { url, format } = await this.resolver(specifier, parentURL,
ModuleRequest.resolve);

if (typeof format !== 'string') {
if (!Loader.validFormats.includes(format)) {
throw new errors.TypeError('ERR_INVALID_ARG_TYPE', 'format',
['esm', 'cjs', 'builtin', 'addon', 'json']);
Loader.validFormats);
}

if (typeof url !== 'string') {
throw new errors.TypeError('ERR_INVALID_ARG_TYPE', 'url', 'string');
}
Expand All @@ -72,14 +96,20 @@ class Loader {
return { url, format };
}

// May create a new ModuleJob instance if one did not already exist.
async getModuleJob(specifier, parentURL = this.base) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your comment makes me wonder if this would be better called getOrCreateModuleJob.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine, get kind of leaves the details open of how the returned module job got into existence :)

const { url, format } = await this.resolve(specifier, parentURL);
let job = this.moduleMap.get(url);
if (job === undefined) {
let loaderInstance;
if (format === 'dynamic') {
const { dynamicInstantiate } = this;
if (typeof dynamicInstantiate !== 'function') {
throw new errors.Error('ERR_MISSING_DYNAMIC_INSTANTIATE_HOOK');
}

loaderInstance = async (url) => {
const { exports, execute } = await this.dynamicInstantiate(url);
const { exports, execute } = await dynamicInstantiate(url);
return createDynamicModule(exports, url, (reflect) => {
debug(`Loading custom loader ${url}`);
execute(reflect.exports);
Expand All @@ -100,5 +130,6 @@ class Loader {
return module.namespace();
}
}
Loader.validFormats = ['esm', 'cjs', 'builtin', 'addon', 'json', 'dynamic'];
Object.setPrototypeOf(Loader.prototype, null);
module.exports = Loader;
110 changes: 56 additions & 54 deletions lib/internal/loader/ModuleJob.js
Original file line number Diff line number Diff line change
@@ -1,91 +1,93 @@
'use strict';

const { ModuleWrap } = internalBinding('module_wrap');
const { SafeSet, SafePromise } = require('internal/safe_globals');
const assert = require('assert');
const resolvedPromise = SafePromise.resolve();

const enableDebug = (process.env.NODE_DEBUG || '').match(/\besm\b/) ||
process.features.debug;

/* A ModuleJob tracks the loading of a single Module, and the ModuleJobs of
* its dependencies, over time. */
class ModuleJob {
/**
* @param {module: ModuleWrap?, compiled: Promise} moduleProvider
*/
// `loader` is the Loader instance used for loading dependencies.
// `moduleProvider` is a function
constructor(loader, url, moduleProvider) {
this.loader = loader;
this.error = null;
this.hadError = false;

// linked == promise for dependency jobs, with module populated,
// module wrapper linked
this.moduleProvider = moduleProvider;
this.modulePromise = this.moduleProvider(url);
// This is a Promise<{ module, reflect }>, whose fields will be copied
// onto `this` by `link()` below once it has been resolved.
this.modulePromise = moduleProvider(url);
this.module = undefined;
this.reflect = undefined;
const linked = async () => {

// Wait for the ModuleWrap instance being linked with all dependencies.
const link = async () => {
const dependencyJobs = [];
({ module: this.module,
reflect: this.reflect } = await this.modulePromise);
assert(this.module instanceof ModuleWrap);
this.module.link(async (dependencySpecifier) => {
const dependencyJobPromise =
this.loader.getModuleJob(dependencySpecifier, url);
dependencyJobs.push(dependencyJobPromise);
const dependencyJob = await dependencyJobPromise;
return (await dependencyJob.modulePromise).module;
});
if (enableDebug) {
// Make sure all dependencies are entered into the list synchronously.
Object.freeze(dependencyJobs);
}
return SafePromise.all(dependencyJobs);
};
this.linked = linked();
// Promise for the list of all dependencyJobs.
this.linked = link();

// instantiated == deep dependency jobs wrappers instantiated,
// module wrapper instantiated
this.instantiated = undefined;
}

instantiate() {
async instantiate() {
if (this.instantiated) {
return this.instantiated;
}
return this.instantiated = new Promise(async (resolve, reject) => {
const jobsInGraph = new SafeSet();
let jobsReadyToInstantiate = 0;
// (this must be sync for counter to work)
const queueJob = (moduleJob) => {
if (jobsInGraph.has(moduleJob)) {
return;
}
jobsInGraph.add(moduleJob);
moduleJob.linked.then((dependencyJobs) => {
for (const dependencyJob of dependencyJobs) {
queueJob(dependencyJob);
}
checkComplete();
}, (e) => {
if (!this.hadError) {
this.error = e;
this.hadError = true;
}
checkComplete();
});
};
const checkComplete = () => {
if (++jobsReadyToInstantiate === jobsInGraph.size) {
// I believe we only throw once the whole tree is finished loading?
// or should the error bail early, leaving entire tree to still load?
if (this.hadError) {
reject(this.error);
} else {
try {
this.module.instantiate();
for (const dependencyJob of jobsInGraph) {
dependencyJob.instantiated = resolvedPromise;
}
resolve(this.module);
} catch (e) {
e.stack;
reject(e);
}
}
}
};
queueJob(this);
});
return this.instantiated = this._instantiate();
}

// This method instantiates the module associated with this job and its
// entire dependency graph, i.e. creates all the module namespaces and the
// exported/imported variables.
async _instantiate() {
const jobsInGraph = new SafeSet();

const addJobsToDependencyGraph = async (moduleJob) => {
if (jobsInGraph.has(moduleJob)) {
return;
}
jobsInGraph.add(moduleJob);
const dependencyJobs = await moduleJob.linked;
return Promise.all(dependencyJobs.map(addJobsToDependencyGraph));
};
try {
await addJobsToDependencyGraph(this);
} catch (e) {
if (!this.hadError) {
this.error = e;
this.hadError = true;
}
throw e;
}
this.module.instantiate();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing to see this simplification. Not having to call instantiate for each module in the right order is a much nicer v8 API!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case it wasn’t clear, we only had one call to instantiate() before as well, but maybe this is a bit more obvious now :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, it seems I was simply trying to post-justify my less elegant approach :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, didn’t know this was your code ;)

for (const dependencyJob of jobsInGraph) {
// Calling `this.module.instantiate()` instantiates not only the
// ModuleWrap in this module, but all modules in the graph.
dependencyJob.instantiated = resolvedPromise;
}
return this.module;
}

async run() {
Expand Down
52 changes: 31 additions & 21 deletions lib/internal/loader/ModuleWrap.js
Original file line number Diff line number Diff line change
Expand Up @@ -10,39 +10,49 @@ const createDynamicModule = (exports, url = '', evaluate) => {
`creating ESM facade for ${url} with exports: ${ArrayJoin(exports, ', ')}`
);
const names = ArrayMap(exports, (name) => `${name}`);
// sanitized ESM for reflection purposes
const src = `export let executor;
${ArrayJoin(ArrayMap(names, (name) => `export let $${name}`), ';\n')}
;(() => [
fn => executor = fn,
{ exports: { ${
ArrayJoin(ArrayMap(names, (name) => `${name}: {
get: () => $${name},
set: v => $${name} = v
}`), ',\n')
} } }
]);
`;
// Create two modules: One whose exports are get- and set-able ('reflective'),
// and one which re-exports all of these but additionally may
// run an executor function once everything is set up.
const src = `
export let executor;
${ArrayJoin(ArrayMap(names, (name) => `export let $${name};`), '\n')}
/* This function is implicitly returned as the module's completion value */
(() => ({
setExecutor: fn => executor = fn,
reflect: {
exports: { ${
ArrayJoin(ArrayMap(names, (name) => `
${name}: {
get: () => $${name},
set: v => $${name} = v
}`), ', \n')}
}
}
}));`;
const reflectiveModule = new ModuleWrap(src, `cjs-facade:${url}`);
reflectiveModule.instantiate();
const [setExecutor, reflect] = reflectiveModule.evaluate()();
const { setExecutor, reflect } = reflectiveModule.evaluate()();
// public exposed ESM
const reexports = `import { executor,
const reexports = `
import {
executor,
${ArrayMap(names, (name) => `$${name}`)}
} from "";
export {
${ArrayJoin(ArrayMap(names, (name) => `$${name} as ${name}`), ', ')}
}
// add await to this later if top level await comes along
typeof executor === "function" ? executor() : void 0;`;
if (typeof executor === "function") {
// add await to this later if top level await comes along
executor()
}`;
if (typeof evaluate === 'function') {
setExecutor(() => evaluate(reflect));
}
const runner = new ModuleWrap(reexports, `${url}`);
runner.link(async () => reflectiveModule);
runner.instantiate();
const module = new ModuleWrap(reexports, `${url}`);
module.link(async () => reflectiveModule);
module.instantiate();
return {
module: runner,
module,
reflect
};
};
Expand Down