Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

esm: source hooks #30986

Closed
wants to merge 16 commits into from
Closed

Conversation

GeoffreyBooth
Copy link
Member

@GeoffreyBooth GeoffreyBooth commented Dec 16, 2019

This PR adds additional hooks to the ESM --loader implementation:

  • getSource allows a user to override the function that Node’s ESM loader uses to retrieve the source code for a specifier (e.g. loading a JavaScript file from disk). A loader that overrides this can do things like load from cache or memory (like tink or yarn pnp) or from other sources like https:// URLs. An example of the former is in the tests, and the latter is in the docs.

  • transformSource allows a user to mutate the source code string or buffer after it’s been retrieved via getSource but before Node does anything else with it. This allows for things like a transpiler loader (TypeScript, CoffeeScript, JSX, Babel, etc.). This is separate, and intentionally smaller in scope, than a potential translate hook which could override one of the existing translators (such as for module, commonjs, wasm etc.; see lib/internal/modules/esm/translators.js).

See discussion in nodejs/modules#351. cc @nodejs/modules-active-members

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

bmeck
bmeck previously requested changes Dec 16, 2019
Copy link
Member

@bmeck bmeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some changes, but I'd be hesitant to land these changes if we want to do a more serious overhaul.

doc/api/esm.md Outdated Show resolved Hide resolved
lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved
doc/api/esm.md Outdated Show resolved Hide resolved
test/es-module/test-esm-get-source-loader.mjs Show resolved Hide resolved
@MylesBorins MylesBorins added the esm Issues and PRs related to the ECMAScript Modules implementation. label Dec 16, 2019
@bmeck
Copy link
Member

bmeck commented Dec 16, 2019

I talked with @GeoffreyBooth and am fine with landing this as long as we make the potential for change much more visible as we have discussed other hooks and I synced up with him about how these hooks are different from the design docs hooks we have previously discussed. We should sync up about the design of these hooks being explicitly written out, but this is fine as long as we go with an iterative approach and do not commit to these as final hooks.

Copy link
Member

@bmeck bmeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dismissing except for nit, seems ok as we are being more explicit as these are likely to change and merging this PR does not cement the hooks.

@bmeck bmeck dismissed their stale review December 16, 2019 22:17

agreed to iterate elsewhere afterwards/document instability

@devsnek
Copy link
Member

devsnek commented Dec 16, 2019

Do you think you could set up a benchmark? It would be good to keep a handle on how our loader performance is, and this seems like it might add a lot of overhead (not that I'm saying we shouldn't land it, but rather we should just be aware).

@devsnek devsnek added the experimental Issues and PRs related to experimental features. label Dec 16, 2019
@GeoffreyBooth
Copy link
Member Author

Do you think you could set up a benchmark?

That would be great, but I’ve never done benchmarks in the Node codebase. Is there something you can point me to to show me how? Can that be its own PR?

Also more broadly how would that work? Like I would assume that the benchmarked case is “no custom loaders,” so is what we’re benchmarking just how much slower that case becomes as more hooks are added? Or is your thinking to benchmark “no loaders” vs “with custom loader A” or B or C etc.?

@devsnek
Copy link
Member

devsnek commented Dec 16, 2019

@GeoffreyBooth benchmarks are here: https://github.com/nodejs/node/tree/master/test/benchmark. From what I can tell they run a specified test a bunch of times. The case I have in mind at the moment is "resolve big tree without hooks", specifically I'm eyeing the added await and two branches in each translator path, which is why I thought it would be good to add the benchmark now. Adding it later also works too.

@GeoffreyBooth
Copy link
Member Author

Adding it later also works too.

Thanks. I’m not sure when I’ll have time for that, so if you don’t mind I’d prefer to do that as a follow-up PR (or anyone else is welcome to).

Copy link
Contributor

@guybedford guybedford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work moving this forward!

Would be nice to use the direct hook functions like with resolve, instead of having the custom wrappers, see the review comments.

I would still prefer to only have a getSource hook and not also add a transform hook, since the ability to transform is enabled by the hook composition model, and I don't like the idea of endorsing the concept of transform hooks in the API explicitly, as opposed to simply enabling them.

doc/api/esm.md Outdated Show resolved Hide resolved
lib/internal/modules/esm/translators.js Outdated Show resolved Hide resolved
lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved
lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved
lib/internal/modules/esm/translators.js Outdated Show resolved Hide resolved
lib/internal/modules/esm/translators.js Outdated Show resolved Hide resolved
Copy link
Contributor

@guybedford guybedford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this a little further, I think this is a useful opportunity to also move the format return to the getSource hook.

The reason for this is that format being returned by the resolver was always due to us not having a source hook, and having the source hook return the format is much more useful in the loader workflows (like HTTP loaders).

I think it's important to get that in as part of this.

@guybedford
Copy link
Contributor

As discussed further with @GeoffreyBooth I'd be happy to have a getFormat hook that can be called after resolve, and before getSource.

@GeoffreyBooth
Copy link
Member Author

GeoffreyBooth commented Dec 17, 2019

As discussed further with @GeoffreyBooth I'd be happy to have a getFormat hook that can be called after resolve, and before getSource.

So the current flow works like this:

  1. The ESM loader receives a specifier, e.g. 'some-pkg/some-export'
  2. resolve converts that specifier into a file URL, e.g. 'file:///.../node_modules/some-pkg/lib/some-export.js'
  3. resolve additionally determines the format by looking at the file extension of the URL in 2 and possibly referencing package type. format in this case is the name of one of the translators: module, commonjs, builtins, wasm, json, dynamic
  4. Based on the format returned, one of the translators is chosen and passed the URL
  5. Inside the translator, getSource retrieves file contents for the given URL (and transformSource possibly mutates the contents)

So a getFormat hook would split out that logic from inside resolve (i.e. make step 3 above its own hook). I think that would be all we do for the first version of the hook.

Later on we probably want the “determine format” logic to be possibly set or overridden by the source returned in getSource, for example if getSource makes an HTTP call and we want the format set based on the headers in the response. I would still want getFormat to be its own hook, but we’d have to reorder the steps of the flow or make it optional when the format is determined. That’s a bigger refactor, possibly a redesign of this whole flow, that I think would be a separate effort.

@coreyfarrell
Copy link
Member

I would still prefer to only have a getSource hook and not also add a transform hook, since the ability to transform is enabled by the hook composition model, and I don't like the idea of endorsing the concept of transform hooks in the API explicitly, as opposed to simply enabling them.

Combining them would cause problems once adding multiple hooks is supported. getSource callbacks are not necessarily expected to call the defaultGetSource but every registered transform hook should be run on every source. Sorry that I'm not understanding, can you tell me the reason for opposing direct support for transform hooks?

@guybedford
Copy link
Contributor

Combining them would cause problems once adding multiple hooks is supported. getSource callbacks are not necessarily expected to call the defaultGetSource but every registered transform hook should be run on every source.

Yes, with the getSource model, we can still have both classes of hooks, but they follow different rules - source hooks ignore defaultGetSource, while transform hooks implemented via getSource always chain with defaultGetSource (which under loader composition would be the parent loader return value).

Sorry that I'm not understanding, can you tell me the reason for opposing direct support for transform hooks?

Great question - the primary reason is a human / branding one. Including a transform hook explicitly in the Node.js documentation acts as an advertisement to users to write their own transform loaders, and that leads to a bad time unless loader authors are highly skilled at implementing caching, from my experience with SystemJS.

@bmeck
Copy link
Member

bmeck commented Dec 18, 2019

I've extracted a getFormat hook based upon conversations. There is some dancing going on so that old resolve hooks are properly maintained to minimize migration breakage. Ideally we can remove some of the odd ordering of operations in the future when we fully remove format from resolve.

Copy link
Contributor

@guybedford guybedford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good.

doc/api/esm.md Outdated Show resolved Hide resolved
doc/api/esm.md Outdated Show resolved Hide resolved
lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved
'ExperimentalWarning');
format = legacyExtensionFormatMap[ext];
}
return { format: format || null };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to make this an object, or can we just permit a string return?

Copy link
Member Author

@GeoffreyBooth GeoffreyBooth Dec 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it depends on if we might extend this in the future to return more data? Perhaps data or functions to pass between hooks?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For getFormat in particular I feel that this should be an object while we sort out how things like bodies can be ferried between to avoid double parsing/fetching.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we perhaps still support the direct string as a sugar case though?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not the biggest fan of polymorphic return types since that means each stage has to do branching marshaling to/from strings. If we keep it the same as the value returned by getDefaultFormat that seems better than branching to me. I could be swayed otherwise but would rather start throwing on invalid returns like the documentation example has/had rather than trying to marshal multiple types. I think an alternative of moving getFormat to be combined with getSource would remove the need for this data ferrying as well and might be more palpable to myself.

Copy link
Member

@coreyfarrell coreyfarrell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the defaultGetFormat / defaultGetSource passed to the hook, I think these will eventually need to be async functions. Right now only a single loader hook is supported but when they can be stacked the defaultFn argument will not be the node.js function but will be an async function provided by another loader hook.

doc/api/esm.md Outdated Show resolved Hide resolved
doc/api/esm.md Outdated Show resolved Hide resolved
lib/internal/modules/esm/default_resolve.js Outdated Show resolved Hide resolved
@coreyfarrell
Copy link
Member

Sorry that I'm not understanding, can you tell me the reason for opposing direct support for transform hooks?

Great question - the primary reason is a human / branding one. Including a transform hook explicitly in the Node.js documentation acts as an advertisement to users to write their own transform loaders, and that leads to a bad time unless loader authors are highly skilled at implementing caching, from my experience with SystemJS.

@guybedford I'm worried that if transformSource is omitted transform hooks will get blocked/ignored by fetch hooks. Specifically I'm concerned that this will introduce a situation where a fetch hook added in the incorrect order will cause the nyc transform (instrument) function to simply never run. No error will be provided to the user, they just won't get coverage.

I appreciate your not wanting to encourage unnecessary transform hooks but can this be solved by documentation warnings? In addition I have not seen firm plans for an API to add loader hooks. At one point I was hoping for an API to enable require('./install-a-loader-hook.js') but your comment about SystemJS has given me reason to reconsider this. If --loader will be the only way to add a hook this is essentially a permanently flagged feature.

@arcanis
Copy link
Contributor

arcanis commented Dec 21, 2019

When running yarn install, we generate a single .pnp.js file that contains the CJS loader and inject it in the environment through --require. For the loader API to be viable for us we'd need this .pnp.js file to keep working as it does while also being a valid loader. I'm not sure that's the case: the .pnp.js file will be CJS (and need to, for backward compat), whereas the loader has to be ESM from what I gather.

Why are loaders required to be ESM? As it been considered to also support CJS-exported loaders?

@frank-dspeed
Copy link
Contributor

maybe look into my implamentation https://github.com/direktspeed/esm-loader it works without hooks and is highly configurable via parsers

@GeoffreyBooth
Copy link
Member Author

I pushed some commits last night but this is still WIP. A few TODOs:

  • getFormat currently needs to be synchronous, but we should allow a getFormat hook to be async.

  • Since getFormat always returns an object, e.g. {format: 'module'}, the other new hooks should also always return objects, e.g. {source: '...'}.

  • We should probably remove loader from all of the hook signatures.

  • How do people feel about the unified signatures across all hooks? They’re now all hook(options, defaultHook) e.g. resolve({ specifier, parentURL }, defaultResolve) and so on. And all would return objects.

@GeoffreyBooth
Copy link
Member Author

I updated the docs to move the loader examples into their own section, where I think they make much more sense rather than haphazardly spread across multiple hooks’ sections. I think that’s everything, @guybedford please let me know if there’s anything else.

@nodejs-github-bot
Copy link
Collaborator

@guybedford
Copy link
Contributor

Once CI completes it looks like we will be good to land this. Further final reviews welcome.

guybedford pushed a commit that referenced this pull request Jan 6, 2020
PR-URL: #30986
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Bradley Farias <bradley.meck@gmail.com>
@guybedford
Copy link
Contributor

Landed in 2551a21.

@guybedford guybedford closed this Jan 6, 2020
@GeoffreyBooth GeoffreyBooth deleted the source-hooks branch January 6, 2020 23:32
@coreyfarrell coreyfarrell mentioned this pull request Jan 7, 2020
4 tasks
MylesBorins pushed a commit that referenced this pull request Jan 16, 2020
PR-URL: #30986
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Bradley Farias <bradley.meck@gmail.com>
@codebytere codebytere mentioned this pull request Jan 16, 2020
codebytere added a commit that referenced this pull request Jan 20, 2020
Notable changes:

* deps:
  * upgrade to libuv 1.34.1 (cjihrig) #31332
  * upgrade npm to 6.13.6 (Ruy Adorno) #31304
* doc:
  * add GeoffreyBooth to collaborators (Geoffrey Booth) #31306
* module
  * add API for interacting with source maps (bcoe) #31132
  * loader getSource, getFormat, transform hooks (Geoffrey Booth) #30986
  * logical conditional exports ordering (Guy Bedford) #31008
  * unflag conditional exports (Guy Bedford) #31001
* process:
  * allow monitoring uncaughtException (Gerhard Stoebich) #31257

PR-URL: #31382
codebytere added a commit that referenced this pull request Jan 21, 2020
Notable changes:

* deps:
  * upgrade to libuv 1.34.1 (cjihrig) #31332
  * upgrade npm to 6.13.6 (Ruy Adorno) #31304
* module
  * add API for interacting with source maps (bcoe) #31132
  * loader getSource, getFormat, transform hooks (Geoffrey Booth) #30986
  * logical conditional exports ordering (Guy Bedford) #31008
  * unflag conditional exports (Guy Bedford) #31001
* process:
  * allow monitoring uncaughtException (Gerhard Stoebich) #31257
* Added new collaborators:
  * [GeoffreyBooth](https://github.com/GeoffreyBooth) - Geoffrey Booth. #31306

PR-URL: #31382
codebytere added a commit that referenced this pull request Jan 21, 2020
Notable changes:

* deps:
  * upgrade to libuv 1.34.1 (cjihrig) #31332
  * upgrade npm to 6.13.6 (Ruy Adorno) #31304
* module
  * add API for interacting with source maps (bcoe) #31132
  * loader getSource, getFormat, transform hooks (Geoffrey Booth) #30986
  * logical conditional exports ordering (Guy Bedford) #31008
  * unflag conditional exports (Guy Bedford) #31001
* process:
  * allow monitoring uncaughtException (Gerhard Stoebich) #31257
* Added new collaborators:
  * [GeoffreyBooth](https://github.com/GeoffreyBooth) - Geoffrey Booth. #31306

PR-URL: #31382
MylesBorins pushed a commit that referenced this pull request Jan 28, 2020
PR-URL: #30986
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Bradley Farias <bradley.meck@gmail.com>
BethGriggs pushed a commit that referenced this pull request Feb 6, 2020
PR-URL: #30986
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Bradley Farias <bradley.meck@gmail.com>
@MylesBorins MylesBorins mentioned this pull request Feb 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. experimental Issues and PRs related to experimental features.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants