Skip to content
This repository has been archived by the owner on Jul 31, 2018. It is now read-only.

proposal: Staged require() with lifecycle hooks #56

Closed
wants to merge 4 commits into from

Conversation

Qard
Copy link
Member

@Qard Qard commented May 6, 2017

I'm still working on this, but I think it's time to get some more eyes on it.

This is the EPS continuation of my work in nodejs/node#12349 to attempt to enable an AST transformation pipeline in Node.js core. There's definitely some stuff missing still, and I'll add more over the weekend, but this is where I'm at today. 😸

@nodejs/diagnostics

@sam-github
Copy link

I wonder if a simpler approach of only a single hook would be better, but with Module offering APIs that can be used to implement that hook - APIs like _resolveLookupPaths() (but doced).

The 3-part API reminds me a bit of the mid-layer anti-pattern: https://lwn.net/Articles/336262/

Its very modelled on current behaviour, but if I wanted to implement a resolve like that used in https://github.com/zeit/pkg, I might want to completely replace all 3 stages with a different implementation, and would want just one hook, and then have a library of useful methods (resolve, load, parse, rewrite, ...) that I can use if I want, but no have the assumption that those 3 stages always occur baked into the extension hook API.

^--- not a rejection, just raising the possibility of a different approach that might be worth considering

cc: @watson @nodejs/diagnostics

### Safety

Monkey-patching can be unsafe for many reasons. Sometimes code the behaviour
of code changes based on parameter length of input functions--express is a good

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know what you are trying to say about n-arity of functions, but the grammar is a bit garbled here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll work on it. Thanks for the input. 😅


A big issue on the horizon is that, if ES Modules are adopted, modules become
immutable at build time. This means that the current monkey-patching approach
will not work anymore. TC39 has already suggested using AST transformation

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have a link to this discussion, or other ref?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It came up in one-on-one discussion with @bmeck at one point. Not sure if that is doc'd somewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


### `require.resolvers`

It is the responsibility of `require.resolvers` extension handlers to receive

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

```

By defining named behaviour specific to `require()`, we open the door for
supporting the different behaviour of `import` in the future.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe should be an array of resolvers, where if a resolver fails to resolve, the next one is tried?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, the API only makes sense if in the future there will be a key that is not "require", which is a bit YAGNI, unless you actually demonstrate that the API will work for a future use-case ("import" is what I assume you are leaving the door open for)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea on the failover design. 👍

Import is indeed what I'm thinking of with the named list design. That one is much more of a "maybe" to me than the other bits right now. I feel like there's something in here that ES Modules can benefit from in regards to the loader, but I'm not quite sure what yet.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go for simpler and more generic unless we are certain this will be helpful for imports, and actually, as much as es mods are a WIP, its probably not worth doing this unless we are certain it works for them, or its guaranteed to be "legacy" in a couple years.

And isn't it 2am in Vancouver? ;-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost 3AM. 😅

Also, my thinking was to put the import stuff in there just to spark some discussion around that. I'm definitely not tied to that part being in the final proposal. I'd really love to hear from other people more deeply involved in the ES Modules discussion than I, since both things are deeply tied to the module loading systems and I'd really like to get some coordination going on here.

'exports = module.exports ='
)
}
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will c++ addons go through the loader?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. My thinking was the single line extensions handler currently used would become a compiler and it'd skip the loader step. Not sure if that makes sense though.

require.compilers['.yaml'] = function yamlCompiler(request) {
request.module.exports = yaml.parse(request.contents)
}
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also seems like it should be a stack. If the code was .ts, it might need compilation to .js, but after it's compiled to js, an APM that instruments js would want to get it so it can be rewritten for instrumentation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my original PR went with a stacked design. I've been going back and forth and if that's much better than just wrapping the existing function repeatedly. It's a trade off of making all requires a bit slower with stack checks/iteration or just making patched environments slightly slower.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be possible to optimize inside module. If the array has only one member, the function used to iterate the array can be replaced directly with the only function in the array. For arrays longer than one, we might even be able to pre-compose a function out of the members of the array (basically, cacheing them to avoid repeated iteration).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely feel like the stacked interface is friendlier. Maybe I'll put that in tomorrow and have a couple alternative suggestions to discuss and narrow down on.

@Qard
Copy link
Member Author

Qard commented May 6, 2017

Yep, I had similar thinking. Hadn't thought yet about how to encode it into the proposal, but basically my thinking was just that the three-stage thing would be what the internals does, but higher levels could be wrapped too, if that's more suitable.

@sam-github
Copy link

It might be clearer to remove refs to AST, because no AST is exposed anywhere here. APMs would have to use esprima (or just regex or text substitution) to directly rewrite the js source, whether they do that by transforming to AST, modifying, and reencoding as js is an implementation detail if I understand the proposal.

I am generally warming to the idea of exposing hooks into the module require/import system, but also worried a bit about tying the hooks too closely to what we do today, which may be counter productive if the intention is to enable extensibility into future innovative use cases.

@sam-github
Copy link

@igorklopov would a proposal such as this help with pkg in implementing a loader from the deps compiled into the executable?

@Qard
Copy link
Member Author

Qard commented May 6, 2017

Ok, I rewrote most of the doc and added an alternative implementation. (I kind of prefer the alternative, to be honest)

@sam-github
Copy link

Still looks to me like the mid-layer anti-pattern: it assumes a 4-stage life-cycle, so makes this inflexible when it could be flexible.

What about something more like:

const m = require('module');


m.register(tsRequire);

function tsRequire(name, parent) {
  // name is the module name passed to require, parent is the Module doing the require
  // return value is a Module

  if (!name.test(/\.ts$/)) return;
  const found = m.resolveRelative(name, parent); // could look in a zip file or on a web server, or ...
  const src = fs.readFileSync(found);
  const xfrm = ts.compile(src);
  return  m.whatever(xfrm, parent) // however given js source one creates a Module to return
}

So there is only one hook, its given what require always knows (name and parent), there are some library functions it can use to do its job (essentially the internal functions that are used by require now), but it can do something completely different if it wants to, as long as the end result is "here is a Module for that name".

@Qard
Copy link
Member Author

Qard commented May 9, 2017

The problem with that approach is that you can't reasonably intercept the data between the stages. If, for example, I wanted to apply another transform to that code after the ts transpile, I'd have to monkey-patch the ts module itself along with the register function to ensure it only applies my extra transform when using ts through register handlers.

@sam-github
Copy link

@Qard Right, I see your point, and take back my suggestion.

@Qard
Copy link
Member Author

Qard commented May 9, 2017

I definitely get where you're coming from on wanting a more composable interface though.

Where the proposal is at now tries to balance enough power to cover the particular set of needs we know we have now without sacrificing too much in complexity and potentially performance burden.

I'm definitely open to suggestions on how to improve the proposal. It is an EPS after all.

@sam-github
Copy link

At this point, I think we need more feedback from prospective users.

@mhdawson
Copy link
Member

mhdawson commented May 9, 2017

Nice write-up with a good overview of the rational. @sam-github do you have suggestions on who specifically we should ask for feedback ?

@bmeck
Copy link
Member

bmeck commented May 9, 2017

If I get time, I would like to review it. That is unlikely prior to June though :(

@sam-github
Copy link

@mhdawson people who are working in the target use-cases, to quote the proposal:

  • Tracing, debugging and monitoring
  • Code coverage
  • Mocking
  • Transpiling

So, APM implementors (new relic, appdynamics, opsbeat, appmetrics, etc), someone from Istanbul (are there other coverage implementations?), mocking... no idea, babel/typescript/...

@novemberborn
Copy link

AVA and (nominally) nyc/Istanbul maintainer here. Thank you @Qard for raising this proposal, we've ended up using packages like append-transform and caching-transform which required a lot of effort to make work correctly. It'd be great to have official support from Node.js itself, especially as ES modules come into play.

nyc has a use case where it needs to apply its instrumentation transform last. This is tricky though, since nyc bootstraps itself in child process so that it's loaded first, and it is not typically aware of say a user-supplied babel-register hook. append-transform goes to great lengths to achieve this.

Loader objects should make this easier, since it means there's only one list who's order has to be controlled. Ideally the require.loaders property would not be configurable or writable. Perhaps it could be a custom collection that controls where in the pipeline a loader is inserted and that prevents arbitrary mutation.

@jkrems
Copy link

jkrems commented May 10, 2017

One note about the API: In JS module land, it will also be valuable to control linking. That's the mechanism that would still allow "monkey-patching" (since source transforms have quite a few downsides re: access to dependencies).

@bmeck
Copy link
Member

bmeck commented May 10, 2017

@jkrems you need to be careful here, as interrupting linking generally removes exports from being live. There is no notification when exports update and no getter like mechanism upon accessing an import.

@jkrems
Copy link

jkrems commented May 10, 2017

@bmeck Interesting! The whole "when is a binding live live" thing still trips me up. What I was thinking was something along the lines of:

// original.js
export function f() {}
export function g() {}
// monkey-patch.js
export { g } from '::magic'; // or whatever that syntax is
import { f as originalF } from '::magic';
export function f() {
   // Do typical monkey stuff
   return originalF.apply(this, arguments);
}

I definitely have to do more experimentation/reading and it fails terribly if you don't have a full list of the original module's exports (and there currently is no API exposing that info).

@bmeck
Copy link
Member

bmeck commented May 10, 2017

@jkrems also, your example is using a fn, which is safe-ish since it evaluates on each invocation, but things like:

export let now = Date.now();
setInterval(() => now = Date.now());

where you can't execute code upon access are the real problems.

Note: v8 has rejected implementing getters for variable access due to a number of reasons.

- For the object-based style, it might also be a good idea to, rather than have
the `transform` method, have `preCompile` and `postCompile` transformers.

[1]: https://github.com/nodejs/node/blob/master/lib/_tls_wrap.js#L829
Copy link
Member

@watson watson May 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably wanna future proof this link by tying it to a specific git ref (it's already broken) - tip: press y while on the github page and it will update the URL for you with a ref

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I'll fix that soon.

### `require.resolvers`

It is the responsibility of `require.resolvers` extension handlers to receive
the in-process `Module` object and use the metadata on it to figure out how to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When referencing the Module object, did you mean ModuleRequest?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I'll make some revisions soon. Pretty busy for the next few days! 😱


```js
require.transformers.push(({ contents, resolvedPath }) => {
if (!/\.js$/.test(resolvedPath)) return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't it return contents? Or is returning undefined the same as not modifying the contents?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, returning undefined means no changes. Could work either way though.

@matthewloring
Copy link

How will these lifecycle events work with the module cache? Would you expect to do the module cache lookup after resolution as is done currently and if so would only the resolution event be emitted on cache hits?

@watson
Copy link
Member

watson commented May 18, 2017

@matthewloring Personally I'd expect the cache to be populated with the output of the pipeline. So the cache would then continue to function just as today. I can't currently see a use-case where you'd like any of the hooks to fire if the module is being loaded from the cache... but I might have over looked something

@matthewloring
Copy link

@watson The cache is currently keyed on the result of the resolution step so I think the user provided resolver would still be run before the cache lookup could occur. The use case that comes to mind for this would be module wrapping where you want the wrapper module to be given when the parent is not the wrapper module itself (which needs a handle to the module being wrapped). The cache does not currently take the parent into account to my knowledge which could interfere with this behavior.

@Qard
Copy link
Member Author

Qard commented May 22, 2017

I made a few revisions. The resolve hook should always run, even when there's a cache entry, since the cache entries are currently structured as resolved paths for keys.

@matthewloring That should be possible by looking at the parent property on the ModuleRequest.

@matthewloring
Copy link

Ok, that sound good. It does exclude applying different source transformations based on the parent module but I'm not sure how compelling that use case is.

@mcollina
Copy link
Member

I do not think we should expose this API through the standard require function. Let's attach it somewhere more "hidden", as it is a relative advanced topic, if this lands.

I think we should state that this new API will not have any slowdown in load time if it is not enabled. Node.js fast boot time is one of its main features, and we should keep that. Moreover, it should be fast even it is enabled, as there should be little overhead in passing the data.

Given that ES6 modules are loaded at a different time compared to the standard require. ES6 modules are loaded during parsing, rather than at runtime. Could this transformation pipeline work with ES6 modules? In that case, would we be able to implement something like http://npm.im/proxyquire for ES6 modules?

@Qard Qard mentioned this pull request Jun 21, 2017
@BridgeAR
Copy link
Member

There was no progress for a long while here and I do not see any conclusion. Is there any progress here?

@Qard
Copy link
Member Author

Qard commented Sep 26, 2017

I'm planning on revisiting this soon and making some updates related to supporting es modules. :)

@Trott
Copy link
Member

Trott commented Jun 12, 2018

Closing, but feel free to re-open or move to a more appropriate/higher-visibility repository.

@Trott Trott closed this Jun 12, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.