Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loader: refactor loader #16874

Closed
wants to merge 1 commit into from
Closed

Conversation

devsnek
Copy link
Member

@devsnek devsnek commented Nov 7, 2017

The loader was refactored to use some better terms and methods, mostly
inspired by the whatwg loader spec. Namely there is now translators
instead of loaders and registry instead of moduleMap. The loader
is also exposed as a builtin, allowing users to make their own loaders
and hook the main loader.

I also found a test that wanted to check for the values of exports being unchanged, but I think thats not really correct, and only succeeded as an edge case, and since it was failing after the refactor I removed it. If this was erroneous please let me know.

I'm going to avoid documenting the whole loader until this gets some backing.

cc @bmeck @guybedford @addaleax

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

loader

@nodejs-github-bot nodejs-github-bot added the lib / src Issues and PRs related to general changes in the lib or src directory. label Nov 7, 2017
@devsnek devsnek changed the title loader: refactor loader and expose it as a module loader: refactor loader and expose it as a builtin Nov 7, 2017
@devsnek
Copy link
Member Author

devsnek commented Nov 8, 2017

one thing i wonder, as ModuleJob#reflect isn't used should it be removed? It basically allows the equivalent of messing with require.cache

const debug = require('util').debuglog('esm');

const kDynamicTranslate = Symbol('dynamicTranslate');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a symbol if it is essentially a publicly mutable property?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to force people to use hook, in case we have any internal structure changes in the future, and I should probably do the same with resolve

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can get this via Object.getOwnPropertySymbols, I don't think this is private.

Copy link
Member Author

@devsnek devsnek Nov 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bmeck i think i'll use a closure but if you have a better idea lemme know. technically the loader spec wants them to be as symbols (which is why i did Loader.resolve) but i'm not too sure about that

this.resolver = resolve.bind(null);
this.dynamicInstantiate = dynamicInstantiate;
this.base = base;
this.registry = new Registry();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, the code should keep using that terminology. It sucks that that really makes it less comprehensible, though … so I would really not mind any kind of comments to explain what’s happening

@bmeck
Copy link
Member

bmeck commented Nov 8, 2017

Please use the standard terminology from published specs like module map. The author of the loader spec recommends excluding it from implementations for now.

import one from './esm-snapshot';
import assert from 'assert';

assert.strictEqual(one, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file is done to test timing of snapshotting. It should not be removed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this test is checking if mutating require.cache affects the imported modules? Seeing as module._load is used wouldn't this be the expected behavior?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't this be the expected behavior?

I don't know what "this" is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry about that, i meant wouldn't

  1. importing the mutator
  2. importing the mutatee (i hope thats a word)
  3. mutatee being mutated

be the expected behavior

Copy link
Member

@bmeck bmeck Nov 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we walk thought the code the following happens:

  • require('./esm-snapshot-mutator')
    • Module Record is formed for ./esm-snapshot-mutator
    • require('./esm-snapshot')
      • Module Record is formed for ./esm-snapshot
      • Evaluation of ./esm-snapshot ends, module.exports === 1
      • Snapshot takes effect, Module Record saves as default = 1
    • Evaluation of ./esm-snapshot-mutator ends

The snapshot is taken before ./esm-snapshot-mutator changes the entry of require.cache.

This snapshot ensures the Module Namespace values of ./esm-snapshot is the same no matter when it is imported.

lib/loader.js Outdated

const Loader = require('internal/loader/Loader');

module.exports = new Loader();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not comfortable exposing the loader when hooks are not even finished. Please remove this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if its marked as experimental is that ok?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still lean on no. I'm not entirely sure why you want it exposed if you want people to use hooks to mutate it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bmeck Node’s lack of APIs for its internals has always been a huge pain point. We should definitely expose the Loader at some point.

if you want people to use hooks to mutate it.

That makes sense if you think of the loader as Node’s way of loading its users’ modules from disk. Node definitely should have some kind of API for people to create and build modules. Think of it like the vm module, just more complicated because modules are more complicated than scripts.

Copy link
Member

@addaleax addaleax Nov 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, this is going to conflict with @JacksonTian’s https://npm.im/loader

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devsnek I'd need to do a code review of it since I didn't design it for public exposure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devsnek is there a use case you are trying to get by doing this? I keep asking the question but don't understand right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to expose it to allow for more control over node's loader, which will be a very important system moving forward. In the same way that require can be messed with, I think that loader should be able to be messed with. I agree that there should be control over what stuff can be messed with, but the loader having a public api is, in my opinion, a must have moving forward. I think that the hooks idea is good as a controlled system (and should remain in), but that it should also be possible to make your own loaders, change them up, etc. More to this point is why i suggested that at least exposing the class by default while allowing access to the instance with a flag would be a good choice to keep the system save and extensible at the same time.

Copy link
Member

@bmeck bmeck Nov 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would need to understand various aspects of this. I see you want it, but don't see why except an opinion of "I think that loader should be able to be messed with". Various parts of mutating require like NODE_PATH and require.extensions were deprecated and I don't feel that saying require is mutable so the future should be mutable is a strong position. Exposing individual use cases like hooking into how specifier resolution works is very different from creating multiple disconnected loaders.

In particular, this exposure is problematic since the Module Map is not shared. I would not expose this without a clearer reasoning to need/want multiple completely disconnected loaders.

I also feel like the Realms API should be looked at to ensure we don't have problems if/when it arrives. The people who worked on the WHATWG Loader API are involved in that. It doesn't look to be exposing a Loader like object from the last few TC39 meetings on this topic.

If you can explain the concrete use case of having multiple loaders and explain how it does not suffer from the problems of mutation in the same way that require has that seems fine. Right now this seems too rushed and hasn't looked at compatibility or scope of what exposing it means.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Messing with" the loader won't just have effects for you and your app, it could very very easily leak out and have effects on the ecosystem.

Whether this is a good idea ever is a separate discussion, but allowing this kind of experimentation prior to everything being stable and finished is, imo, a very dangerous and bad idea.

lib/loader.js Outdated

const Loader = require('internal/loader/Loader');

module.exports = new Loader();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#16874 (comment) remains, commenting since default github UI is hiding it

}
// A closure is used here because we really really really don't want users
// touching the hooks without using `hook()`.
function Loader(base = getURLStringForCwd()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not a class?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment explains

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can still be a class with the this.hook = in the constructor

@devsnek devsnek changed the title loader: refactor loader and expose it as a builtin loader: refactor loader Nov 8, 2017
@devsnek
Copy link
Member Author

devsnek commented Nov 8, 2017

@bmeck how do requires inside cjs modules loaded with loader.import get linked as esm deps?
Can they just do the normal cjs require? (if this isn't possible then 16675 will have to get closed)

@bmeck
Copy link
Member

bmeck commented Nov 8, 2017

@devsnek It has 2 parts:

When importing CJS:

During HostResolveImportedModule the specifier is resolved. It is then transformed into a module record during the phase of Instantiate for the ModuleJob that is loading the CJS. This registers a function in the DymanicModule that is used during InnerModuleEvaluation. This function will call module._load during that time (which may have had behavior mutated by things like babel-node).

When requireing CJS:

After _load finished for a CJS module it checks to see if an entry is in the ModuleMap and adds it if it doesn't exist. It grabs a snapshot of module.exports using a regular property access to module.exports firing any getters that may be present.

This is not the most performant approach since we can delay actually creating the ESM record and incurring the parse cost of the DynamicModule. We could hold a reference to what the snapshot would be instead. Lacking that lazy behavior the main reason CJS load time suffers when using --experimental-modules.

@devsnek
Copy link
Member Author

devsnek commented Nov 8, 2017

@bmeck i meant uh

// file.js
require('fs');

// main.mjs
import 'file.js';

when running link on file.js's ModuleWrap instance, fsshows up as a dep. shouldn't it be handled by require, not loader? this is separate from it getting added to module map.

@bmeck
Copy link
Member

bmeck commented Nov 8, 2017

@devsnek it shouldn't. What is the output with NODE_DEBUG=esm node --experimental-modules main.mjs

@devsnek
Copy link
Member Author

devsnek commented Nov 8, 2017

yea i tried that earlier, got really confused:

ESM 10912: Storing file:///main.mjs in ModuleMap
ESM 10912: Loading StandardModule file:///main.mjs
ESM 10912: creating ESM facade for file://file.js with exports: default
ESM 10912: Storing file:///file.js in ModuleMap
ESM 10912: Loading CJSModule file:///file.js

@bmeck
Copy link
Member

bmeck commented Nov 8, 2017

Ah, it looks like

debug(`Loading StandardModule ${url}`);
is logging at a different time than the other Module types. Thats a bug that should be fixed.

I don't see node:fs anywhere to imply the fs module was wrapped into an ESM.

@devsnek
Copy link
Member Author

devsnek commented Nov 9, 2017

@bmeck on my latest commit i added buildAtInstantiate which should allow some more freedom with translating modules, but i'm having some issues.
using the example above, i get
Error: linking error, dependency promises must be resolved on instantiate
when calling this.module.instantiate() for main.mjs
they should all be resolved at that point, i think i just need another pair of eyes. could you take a look? (i've left some debug stuff in)

Copy link
Member

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really a review, just a quick glance.

@@ -86,7 +86,7 @@ const builtinLibs = [
'assert', 'async_hooks', 'buffer', 'child_process', 'cluster', 'crypto',
'dgram', 'dns', 'domain', 'events', 'fs', 'http', 'http2', 'https', 'net',
'os', 'path', 'perf_hooks', 'punycode', 'querystring', 'readline', 'repl',
'stream', 'string_decoder', 'tls', 'tty', 'url', 'util', 'v8', 'vm', 'zlib'
'stream', 'string_decoder', 'tls', 'tty', 'url', 'util', 'v8', 'vm', 'zlib',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change?

@@ -1,5 +1,6 @@
// Flags: --experimental-modules
/* eslint-disable required-modules */

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise.

@@ -5,7 +5,7 @@ const debug = require('util').debuglog('esm');
const ArrayJoin = Function.call.bind(Array.prototype.join);
const ArrayMap = Function.call.bind(Array.prototype.map);

const createDynamicModule = (exports, url = '', evaluate) => {
function createDynamicModule(exports, url = '', evaluate) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please change this back to an arrow function, it lacks a .prototype and protects against new

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do, also did you see #16874 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did, but this PR is a lot of changes which makes me a bit slow to re-review so quickly

@devsnek
Copy link
Member Author

devsnek commented Nov 9, 2017

as far as i can tell at this point, module_wrap wants to have the full module at instantiate time, not just the dependencies. is this safe to change? technically if the module has no deps, it shouldn't need the full module until evaluation time.

@bmeck
Copy link
Member

bmeck commented Nov 9, 2017

@devsnek InnerModuleInstantiation requires the full Module since it does instantiate the hoistable declarations using ModuleDeclarationEnvironmentSetup

@bmeck
Copy link
Member

bmeck commented Nov 9, 2017

I think export * from also would require the full source text as well... can't remember why though. Collisions?

@devsnek
Copy link
Member Author

devsnek commented Nov 9, 2017

@bmeck but if there aren't any hoistable declarations there's no need to have the full module right?

@bmeck
Copy link
Member

bmeck commented Nov 9, 2017

@devsnek you still need it whenever a parent is instantiating. The whole graph Instantiates prior to any evaluation. Not sure I understand

@devsnek
Copy link
Member Author

devsnek commented Nov 9, 2017

being able to defer translation until ModuleJob#run would be a nice way to unblock #16675

// Use .bind() to avoid giving access to the Loader instance when it is
// called as this.resolver(...);
this._resolve = resolve.bind(null);
this._dynamicTranslate = dynamicTranslate;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bind(null) here too

@guybedford
Copy link
Contributor

Note this still needs a documentation change that the "cjs" name in the loader format has been changed to "commonjs". //cc @devsnek

@jdalton
Copy link
Member

jdalton commented Jan 15, 2018

I missed the "commonjs" bikeshed. Just curious, what's the reason for using a short form name for "esm" but a longer form name for "commonjs" instead of "cjs"?

@jdalton
Copy link
Member

jdalton commented Jan 15, 2018

@devsnek

commonjs is still short, not really any reason to make it cjs; ecmascript modules is a bit longer

It's also smooshing case though as it's not "Commonjs". Using "cjs" avoids that and keeps symmetry with "esm" mode as both short name forms. Even aligns better with the other form, "json".

@addaleax addaleax removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Jan 18, 2018
@MylesBorins
Copy link
Contributor

This is not landing cleanly on v9.x, would someone be willing to open a backport?

@devsnek
Copy link
Member Author

devsnek commented Feb 20, 2018

I can do that when I get home in a few hours. i should open a pr targeting the 9.x branch right?

@MylesBorins
Copy link
Contributor

@devsnek nevermind, I got it to land after another backport landed!

MylesBorins pushed a commit that referenced this pull request Feb 20, 2018
PR-URL: #16874
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Bradley Farias <bradley.meck@gmail.com>
@MylesBorins MylesBorins mentioned this pull request Feb 21, 2018
@MylesBorins
Copy link
Contributor

Should this be backported to v8.x-staging? If yes please follow the guide and raise a backport PR, if not let me know or add the dont-land-on label.

@devsnek
Copy link
Member Author

devsnek commented Mar 20, 2018

this sits on top of #18085, so i'm not sure

@guybedford
Copy link
Contributor

If this is backported, it should be backported along with #18596.

ChALkeR added a commit that referenced this pull request Jun 23, 2018
This restores a broken and erroneously removed error, which was
accidentially renamed to ERR_MISSING_DYNAMIC_INTSTANTIATE_HOOK (notice
the "INTST" vs "INST") in 921fb84
(PR #16874) and then had documentation and implementation removed under
the old name in 6e1c25c (PR #18857),
as it appeared unused.

This error code never worked or was documented under the mistyped name
ERR_MISSING_DYNAMIC_INTSTANTIATE_HOOK, so renaming it back to
ERR_MISSING_DYNAMIC_INSTANTIATE_HOOK is a semver-patch fix.

Refs: #21440
Refs: #21470
Refs: #16874
Refs: #18857
targos pushed a commit to ChALkeR/io.js that referenced this pull request Jun 30, 2018
This restores a broken and erroneously removed error, which was
accidentially renamed to ERR_MISSING_DYNAMIC_INTSTANTIATE_HOOK (notice
the "INTST" vs "INST") in 921fb84
(PR nodejs#16874) and then had documentation and implementation removed under
the old name in 6e1c25c (PR nodejs#18857),
as it appeared unused.

This error code never worked or was documented under the mistyped name
ERR_MISSING_DYNAMIC_INTSTANTIATE_HOOK, so renaming it back to
ERR_MISSING_DYNAMIC_INSTANTIATE_HOOK is a semver-patch fix.

Refs: nodejs#21440
Refs: nodejs#21470
Refs: nodejs#16874
Refs: nodejs#18857
@ChALkeR
Copy link
Member

ChALkeR commented Jun 30, 2018

#21493 also should be included with any backport.

targos pushed a commit that referenced this pull request Jun 30, 2018
This restores a broken and erroneously removed error, which was
accidentially renamed to ERR_MISSING_DYNAMIC_INTSTANTIATE_HOOK (notice
the "INTST" vs "INST") in 921fb84
(PR #16874) and then had documentation and implementation removed under
the old name in 6e1c25c (PR #18857),
as it appeared unused.

This error code never worked or was documented under the mistyped name
ERR_MISSING_DYNAMIC_INTSTANTIATE_HOOK, so renaming it back to
ERR_MISSING_DYNAMIC_INSTANTIATE_HOOK is a semver-patch fix.

Refs: #21440
Refs: #21470
Refs: #16874
Refs: #18857

PR-URL: #21493
Reviewed-By: Gus Caplan <me@gus.host>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ron Korving <ron@ronkorving.nl>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
targos pushed a commit that referenced this pull request Jul 3, 2018
This restores a broken and erroneously removed error, which was
accidentially renamed to ERR_MISSING_DYNAMIC_INTSTANTIATE_HOOK (notice
the "INTST" vs "INST") in 921fb84
(PR #16874) and then had documentation and implementation removed under
the old name in 6e1c25c (PR #18857),
as it appeared unused.

This error code never worked or was documented under the mistyped name
ERR_MISSING_DYNAMIC_INTSTANTIATE_HOOK, so renaming it back to
ERR_MISSING_DYNAMIC_INSTANTIATE_HOOK is a semver-patch fix.

Refs: #21440
Refs: #21470
Refs: #16874
Refs: #18857

PR-URL: #21493
Reviewed-By: Gus Caplan <me@gus.host>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ron Korving <ron@ronkorving.nl>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. lib / src Issues and PRs related to general changes in the lib or src directory.
Projects
None yet
Development

Successfully merging this pull request may close these issues.