proposal: Staged require() with lifecycle hooks #56

Qard · 2017-05-06T05:04:14Z

I'm still working on this, but I think it's time to get some more eyes on it.

This is the EPS continuation of my work in nodejs/node#12349 to attempt to enable an AST transformation pipeline in Node.js core. There's definitely some stuff missing still, and I'll add more over the weekend, but this is where I'm at today. 😸

@nodejs/diagnostics

sam-github · 2017-05-06T09:26:16Z

I wonder if a simpler approach of only a single hook would be better, but with Module offering APIs that can be used to implement that hook - APIs like _resolveLookupPaths() (but documented).

The 3-part API reminds me a bit of the mid-layer anti-pattern: https://lwn.net/Articles/336262/

It's modelled very closely on current behaviour, but if I wanted to implement a resolver like the one used in https://github.com/zeit/pkg, I might want to completely replace all 3 stages with a different implementation. I would want just one hook, plus a library of useful methods (resolve, load, parse, rewrite, ...) that I can use if I want, without the assumption that those 3 stages always occur being baked into the extension hook API.

^--- not a rejection, just raising the possibility of a different approach that might be worth considering

cc: @watson @nodejs/diagnostics

sam-github · 2017-05-06T08:34:53Z

XXX-staged-require-with-lifecycle-hooks.md

+### Safety
+
+Monkey-patching can be unsafe for many reasons. Sometimes code the behaviour
+of code changes based on parameter length of input functions--express is a good


I know what you are trying to say about n-arity of functions, but the grammar is a bit garbled here.

I'll work on it. Thanks for the input. 😅

sam-github · 2017-05-06T08:45:45Z

XXX-staged-require-with-lifecycle-hooks.md

+
+A big issue on the horizon is that, if ES Modules are adopted, modules become
+immutable at build time. This means that the current monkey-patching approach
+will not work anymore. TC39 has already suggested using AST transformation


do you have a link to this discussion, or other ref?

It came up in one-on-one discussion with @bmeck at one point. Not sure if that is doc'd somewhere.

relevant links:
https://tc39.github.io/ecma262/#sec-module-namespace-exotic-objects
https://tc39.github.io/ecma262/#sec-createimportbinding

sam-github · 2017-05-06T08:50:22Z

XXX-staged-require-with-lifecycle-hooks.md

+
+### `require.resolvers`
+
+It is the responsibility of `require.resolvers` extension handlers to receive


Reminds me of https://www.lua.org/manual/5.1/manual.html#pdf-package.loaders

sam-github · 2017-05-06T09:17:43Z

XXX-staged-require-with-lifecycle-hooks.md

+```
+
+By defining named behaviour specific to `require()`, we open the door for
+supporting the different behaviour of `import` in the future.


maybe should be an array of resolvers, where if a resolver fails to resolve, the next one is tried?

also, the API only makes sense if in the future there will be a key that is not "require", which is a bit YAGNI, unless you actually demonstrate that the API will work for a future use-case ("import" is what I assume you are leaving the door open for)

Good idea on the failover design. 👍
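
A minimal sketch of the failover idea, assuming resolvers live in a plain array and signal "cannot resolve" by returning `undefined`. The names `resolveWithFailover`, `zipResolver`, and `defaultResolver` are hypothetical and are not part of the proposal or of Node.js.

```js
// Hypothetical sketch: try each resolver in order until one returns a path.
// A resolver returns a string on success or undefined to defer to the next.
function resolveWithFailover(resolvers, request) {
  for (const resolver of resolvers) {
    const resolved = resolver(request)
    if (resolved !== undefined) return resolved
  }
  throw new Error(`Cannot find module '${request.name}'`)
}

// Illustrative resolvers; neither touches the real filesystem.
const bundled = new Map([['left-pad', '/bundle/left-pad.js']])
const zipResolver = ({ name }) => bundled.get(name)
const defaultResolver = ({ name }) =>
  name.startsWith('./') ? `/app/${name.slice(2)}.js` : undefined

const resolvers = [zipResolver, defaultResolver]
```

With this shape, a packager could register its own resolver ahead of the default one and fall through to normal lookup for anything it does not bundle.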

Import is indeed what I'm thinking of with the named list design. That one is much more of a "maybe" to me than the other bits right now. I feel like there's something in here that ES Modules can benefit from in regards to the loader, but I'm not quite sure what yet.

I'd go for simpler and more generic unless we are certain this will be helpful for imports, and actually, as much as es mods are a WIP, it's probably not worth doing this unless we are certain it works for them, or it's guaranteed to be "legacy" in a couple years.

And isn't it 2am in Vancouver? ;-)

Almost 3AM. 😅

Also, my thinking was to put the import stuff in there just to spark some discussion around that. I'm definitely not tied to that part being in the final proposal. I'd really love to hear from other people more deeply involved in the ES Modules discussion than I, since both things are deeply tied to the module loading systems and I'd really like to get some coordination going on here.

sam-github · 2017-05-06T09:24:22Z

XXX-staged-require-with-lifecycle-hooks.md

+    'exports = module.exports ='
+  )
+}
+```


will c++ addons go through the loader?

Not sure. My thinking was the single line extensions handler currently used would become a compiler and it'd skip the loader step. Not sure if that makes sense though.

sam-github · 2017-05-06T09:28:01Z

XXX-staged-require-with-lifecycle-hooks.md

+require.compilers['.yaml'] = function yamlCompiler(request) {
+  request.module.exports = yaml.parse(request.contents)
+}
+```


also seems like it should be a stack. If the code was .ts, it might need compilation to .js, but after it's compiled to js, an APM that instruments js would want to get it so it can be rewritten for instrumentation

Yeah, my original PR went with a stacked design. I've been going back and forth on whether that's much better than just wrapping the existing function repeatedly. It's a trade-off between making all requires a bit slower with stack checks/iteration and making only patched environments slightly slower.

Might be possible to optimize inside module. If the array has only one member, the function used to iterate the array can be replaced directly with the only function in the array. For arrays longer than one, we might even be able to pre-compose a function out of the members of the array (basically, caching them to avoid repeated iteration).
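
A rough sketch of that optimization, assuming each handler is a plain function from source string to source string. `composeTransformers`, `stripTypes`, and `instrument` are hypothetical names used only for illustration.

```js
// Hypothetical sketch: collapse a list of source transformers into one function.
// Each transformer takes a source string and returns a source string.
function composeTransformers(transformers) {
  if (transformers.length === 0) return (source) => source
  // Fast path: a single handler is returned directly, with no iteration cost.
  if (transformers.length === 1) return transformers[0]
  // General case: build the chain once, so later calls do not iterate the array.
  return transformers.reduce((composed, next) => (source) => next(composed(source)))
}

// Toy transformers standing in for a transpiler and an APM instrumenter.
const stripTypes = (source) => source.replace(/: number/g, '')
const instrument = (source) => `/* traced */ ${source}`

const pipeline = composeTransformers([stripTypes, instrument])
const output = pipeline('let x: number = 1')
```

The composed function would need to be rebuilt whenever the list changes, so this only pays off if handlers are registered rarely and modules are required often.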

I definitely feel like the stacked interface is friendlier. Maybe I'll put that in tomorrow and have a couple alternative suggestions to discuss and narrow down on.

Qard · 2017-05-06T09:28:49Z

Yep, I had similar thinking. Hadn't thought yet about how to encode it into the proposal, but basically my thinking was just that the three-stage thing would be what the internals do, but higher levels could be wrapped too, if that's more suitable.

sam-github · 2017-05-06T09:32:00Z

It might be clearer to remove refs to AST, because no AST is exposed anywhere here. APMs would have to use esprima (or just regex or text substitution) to directly rewrite the js source. Whether they do that by transforming to an AST, modifying it, and re-encoding it as js is an implementation detail, if I understand the proposal.

I am generally warming to the idea of exposing hooks into the module require/import system, but I am also a bit worried about tying the hooks too closely to what we do today, which may be counterproductive if the intention is to enable extensibility into future innovative use cases.

sam-github · 2017-05-06T09:35:28Z

@igorklopov would a proposal such as this help with pkg in implementing a loader from the deps compiled into the executable?

Qard · 2017-05-06T20:37:53Z

Ok, I rewrote most of the doc and added an alternative implementation. (I kind of prefer the alternative, to be honest)

sam-github · 2017-05-09T16:15:18Z

Still looks to me like the mid-layer anti-pattern: it assumes a 4-stage life-cycle, which makes it inflexible when it could be flexible.

What about something more like:

const fs = require('fs');
const m = require('module');
// ts stands for some TypeScript compiler object; ts.compile is a placeholder.
// m.register, m.resolveRelative and m.whatever are the APIs being suggested here.

m.register(tsRequire);

function tsRequire(name, parent) {
  // name is the module name passed to require, parent is the Module doing the require
  // return value is a Module

  if (!/\.ts$/.test(name)) return;
  const found = m.resolveRelative(name, parent); // could look in a zip file or on a web server, or ...
  const src = fs.readFileSync(found, 'utf8');
  const xfrm = ts.compile(src);
  return m.whatever(xfrm, parent); // however given js source one creates a Module to return
}

So there is only one hook, it's given what require always knows (name and parent), there are some library functions it can use to do its job (essentially the internal functions that are used by require now), but it can do something completely different if it wants to, as long as the end result is "here is a Module for that name".

Qard · 2017-05-09T16:40:44Z

The problem with that approach is that you can't reasonably intercept the data between the stages. If, for example, I wanted to apply another transform to that code after the ts transpile, I'd have to monkey-patch the ts module itself along with the register function to ensure it only applies my extra transform when using ts through register handlers.

sam-github · 2017-05-09T18:03:11Z

@Qard Right, I see your point, and take back my suggestion.

Qard · 2017-05-09T18:28:24Z

I definitely get where you're coming from on wanting a more composable interface though.

The proposal as it stands tries to balance having enough power to cover the particular set of needs we know we have now against adding too much complexity and potential performance burden.

I'm definitely open to suggestions on how to improve the proposal. It is an EPS after all.

sam-github · 2017-05-09T18:47:50Z

At this point, I think we need more feedback from prospective users.

mhdawson · 2017-05-09T21:51:49Z

Nice write-up with a good overview of the rationale. @sam-github do you have suggestions on who specifically we should ask for feedback?

bmeck · 2017-05-09T21:54:05Z

If I get time, I would like to review it. That is unlikely prior to June though :(

sam-github · 2017-05-09T22:51:20Z

@mhdawson people who are working in the target use-cases, to quote the proposal:

Tracing, debugging and monitoring
Code coverage
Mocking
Transpiling

So, APM implementors (New Relic, AppDynamics, Opbeat, appmetrics, etc), someone from Istanbul (are there other coverage implementations?), mocking... no idea, babel/typescript/...

novemberborn · 2017-05-10T11:09:59Z

AVA and (nominally) nyc/Istanbul maintainer here. Thank you @Qard for raising this proposal. We've ended up using packages like append-transform and caching-transform, which required a lot of effort to make work correctly. It'd be great to have official support from Node.js itself, especially as ES modules come into play.

nyc has a use case where it needs to apply its instrumentation transform last. This is tricky though, since nyc bootstraps itself in a child process so that it's loaded first, and it is not typically aware of, say, a user-supplied babel-register hook. append-transform goes to great lengths to achieve this.

Loader objects should make this easier, since it means there's only one list whose order has to be controlled. Ideally the require.loaders property would not be configurable or writable. Perhaps it could be a custom collection that controls where in the pipeline a loader is inserted and that prevents arbitrary mutation.
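
One way such a collection could look, as a sketch only. `LoaderList`, its `add` method, and the `position` option are hypothetical names, not anything defined by the proposal; the example assumes loaders are plain functions and a runtime with private class fields.

```js
// Hypothetical sketch: a loader collection that owns ordering and hides its arrays.
class LoaderList {
  #first = []
  #normal = []
  #last = []

  // position is 'first', 'last', or omitted for the default middle group.
  add(loader, { position } = {}) {
    if (typeof loader !== 'function') throw new TypeError('loader must be a function')
    if (position === 'first') this.#first.unshift(loader)
    else if (position === 'last') this.#last.push(loader)
    else this.#normal.push(loader)
    return this
  }

  // Iteration yields a copy in pipeline order, so callers cannot mutate the internals.
  [Symbol.iterator]() {
    return [...this.#first, ...this.#normal, ...this.#last][Symbol.iterator]()
  }
}

const loaders = Object.freeze(new LoaderList())

// A coverage tool registers early but asks to run last; a transpiler registers later.
const coverageLoader = (source) => source
const babelLoader = (source) => source
loaders.add(coverageLoader, { position: 'last' })
loaders.add(babelLoader)
const order = [...loaders]
```

Here the coverage loader still ends up after the transpiler even though it was registered first, which is the ordering nyc needs without having to know about other hooks.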

jkrems · 2017-05-10T15:47:10Z

One note about the API: In JS module land, it will also be valuable to control linking. That's the mechanism that would still allow "monkey-patching" (since source transforms have quite a few downsides re: access to dependencies).

bmeck · 2017-05-10T15:59:34Z

@jkrems you need to be careful here, as interrupting linking generally removes exports from being live. There is no notification when exports update and no getter-like mechanism upon accessing an import.

jkrems · 2017-05-10T16:06:51Z

@bmeck Interesting! The whole "when is a binding live" thing still trips me up. What I was thinking was something along the lines of:

// original.js
export function f() {}
export function g() {}

// monkey-patch.js
export { g } from '::magic'; // or whatever that syntax is
import { f as originalF } from '::magic';
export function f() {
   // Do typical monkey stuff
   return originalF.apply(this, arguments);
}

I definitely have to do more experimentation/reading and it fails terribly if you don't have a full list of the original module's exports (and there currently is no API exposing that info).

bmeck · 2017-05-10T16:08:58Z

@jkrems also, your example is using a fn, which is safe-ish since it evaluates on each invocation, but things like:

export let now = Date.now();
setInterval(() => now = Date.now());

where you can't execute code upon access are the real problems.

Note: V8 has rejected implementing getters for variable access for a number of reasons.

watson · 2017-05-17T23:43:56Z

XXX-staged-require-with-lifecycle-hooks.md

+- For the object-based style, it might also be a good idea to, rather than have
+the `transform` method, have `preCompile` and `postCompile` transformers.
+
+[1]: https://github.com/nodejs/node/blob/master/lib/_tls_wrap.js#L829


You probably wanna future proof this link by tying it to a specific git ref (it's already broken) - tip: press y while on the github page and it will update the URL for you with a ref

Good catch, I'll fix that soon.

watson · 2017-05-18T01:34:49Z

XXX-staged-require-with-lifecycle-hooks.md

+### `require.resolvers`
+
+It is the responsibility of `require.resolvers` extension handlers to receive
+the in-process `Module` object and use the metadata on it to figure out how to


When referencing the Module object, did you mean ModuleRequest?

Yep. I'll make some revisions soon. Pretty busy for the next few days! 😱

watson · 2017-05-18T01:39:50Z

XXX-staged-require-with-lifecycle-hooks.md

+
+```js
+require.transformers.push(({ contents, resolvedPath }) => {
+  if (!/\.js$/.test(resolvedPath)) return


shouldn't it return contents? Or is returning undefined the same as not modifying the contents?

Yep, returning undefined means no changes. Could work either way though.
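
A small sketch of how a runner could honour the "undefined means no change" convention. `runTransformers` and the local `transformers` array are hypothetical stand-ins for whatever would drive `require.transformers`; the request shape (`contents`, `resolvedPath`) follows the snippet under review.

```js
// Hypothetical sketch: run transformers in order; undefined means "leave as is".
function runTransformers(transformers, request) {
  let contents = request.contents
  for (const transform of transformers) {
    // Each transformer sees the output of the previous one.
    const result = transform({ ...request, contents })
    if (result !== undefined) contents = result
  }
  return contents
}

const transformers = []
transformers.push(({ contents, resolvedPath }) => {
  if (!/\.js$/.test(resolvedPath)) return // not a .js file: no change
  return `'use strict';\n${contents}`
})

const js = runTransformers(transformers, { resolvedPath: '/app/a.js', contents: 'x = 1' })
const json = runTransformers(transformers, { resolvedPath: '/app/a.json', contents: '{}' })
```

Requiring an explicit return of `contents` would work equally well; the early-return form just keeps filter-style handlers short.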

matthewloring · 2017-05-18T01:51:41Z

How will these lifecycle events work with the module cache? Would you expect to do the module cache lookup after resolution as is done currently and if so would only the resolution event be emitted on cache hits?

watson · 2017-05-18T02:06:08Z

@matthewloring Personally I'd expect the cache to be populated with the output of the pipeline. So the cache would then continue to function just as today. I can't currently see a use-case where you'd like any of the hooks to fire if the module is being loaded from the cache... but I might have overlooked something.

matthewloring · 2017-05-18T02:41:08Z

@watson The cache is currently keyed on the result of the resolution step, so I think the user-provided resolver would still be run before the cache lookup could occur. The use case that comes to mind for this would be module wrapping, where you want the wrapper module to be given when the parent is not the wrapper module itself (which needs a handle to the module being wrapped). To my knowledge, the cache does not currently take the parent into account, which could interfere with this behavior.

Qard · 2017-05-22T23:33:14Z

I made a few revisions. The resolve hook should always run, even when there's a cache entry, since the cache entries are currently structured as resolved paths for keys.

@matthewloring That should be possible by looking at the parent property on the ModuleRequest.
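
A sketch of the flow described here, assuming the resolve hook runs on every call and the cache is keyed on the resolved path. `stagedRequire`, `hooks`, and the paths are hypothetical, and the request is reduced to `name` and `parent` for brevity.

```js
// Hypothetical sketch: resolve always runs; the cache is keyed on the resolved path.
const cache = new Map()
const calls = { resolve: 0, load: 0 }

function stagedRequire(name, parent, hooks) {
  calls.resolve++
  // The resolver sees the parent, so it can pick a wrapper or the wrapped module.
  const resolvedPath = hooks.resolve({ name, parent })
  if (cache.has(resolvedPath)) return cache.get(resolvedPath)

  // Only cache misses reach the later stages.
  calls.load++
  const mod = { id: resolvedPath, parent, exports: hooks.load(resolvedPath) }
  cache.set(resolvedPath, mod)
  return mod
}

const hooks = {
  // Hand out the wrapper unless the wrapper itself is asking for the original.
  resolve: ({ name, parent }) =>
    name === 'http' && parent !== '/wrap/http.js' ? '/wrap/http.js' : `/core/${name}.js`,
  load: (resolvedPath) => ({ loadedFrom: resolvedPath })
}

const a = stagedRequire('http', '/app/index.js', hooks)
const b = stagedRequire('http', '/app/other.js', hooks)
const c = stagedRequire('http', '/wrap/http.js', hooks)
```

Because the parent only influences resolution, two different parents that resolve to the same path share one cached module, which is why per-parent source transforms are not possible in this design.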

matthewloring · 2017-05-23T00:10:03Z

Ok, that sounds good. It does exclude applying different source transformations based on the parent module, but I'm not sure how compelling that use case is.

mcollina · 2017-05-23T07:15:20Z

I do not think we should expose this API through the standard require function. Let's attach it somewhere more "hidden", as it is a relatively advanced topic, if this lands.

I think we should state that this new API will not cause any slowdown in load time if it is not enabled. Node.js's fast boot time is one of its main features, and we should keep that. Moreover, it should be fast even if it is enabled, as there should be little overhead in passing the data.

ES6 modules are loaded at a different time compared to the standard require: they are loaded during parsing, rather than at runtime. Could this transformation pipeline work with ES6 modules? In that case, would we be able to implement something like http://npm.im/proxyquire for ES6 modules?

BridgeAR · 2017-09-12T20:13:34Z

There was no progress for a long while here and I do not see any conclusion. Is there any progress here?

Qard · 2017-09-26T23:38:42Z

I'm planning on revisiting this soon and making some updates related to supporting es modules. :)

Trott · 2018-06-12T23:27:44Z

Closing, but feel free to re-open or move to a more appropriate/higher-visibility repository.

proposal: Staged require() with lifecycle hooks

375b0dc

sam-github reviewed May 6, 2017


Qard added 2 commits May 6, 2017 11:38

elaborate on motivations a bit

3664146

rewrote hook-style a bit and added object-based style

0e1f17b

watson reviewed May 17, 2017


watson reviewed May 18, 2017


minor revisions

19247a4

Qard mentioned this pull request Jun 21, 2017

proposal: JS tracing APIs #48

Closed

Trott closed this Jun 12, 2018


