Loader Hooks #351
Some use cases I've encountered: I'm working on a custom dependency bundler and loader designed to improve cold-start times by transparently loading from a bundle to avoid file-system overhead. Currently, I have to monkey-patch I also want to load modules from the V8 code cache, similar to v8-compile-cache. Again I have to re-implement Some other use cases that would benefit:
|
I think the current exposed hooks are the right hooks to expose, but we definitely need to work on polishing the API:
|
Very excited to see interest in this! I believe that @bmeck has a POC that has memory leaks that need to be fixed. @guybedford may know about this too |
@devsnek I think there are more things missing from that list. E.g. providing resource content, not just format. Or the question of whether we can extend aspects of this feature to CommonJS (e.g. for the tink/entropic/yarn case that currently requires monkey-patching the CommonJS loader or even the |
@jkrems i think cjs loader hooks are outside the realm of our design. cjs can only deal with local files and it uses filenames, not urls. Providing resource content is an interesting idea though. I wonder if we could just figure out a way to pass vm modules to the loader. |
@devsnek we discussed and even implemented a PoC of intercepting CJS in the middle of last year and had a talk on the how/why in both These would only allow for files for |
@bmeck i don't doubt it can be done, i'm just less convinced it makes sense to include with the esm loader hooks given the large differences in the systems. |
@a-lxe thanks for opening this discussion. It was interesting to hear you say that multiple loaders were one of the features you find important here. The PR at nodejs/node#18914 could certainly be revived. Is this something you are thinking of working on? I'd be glad to collaborate on this work if you would like to discuss it further at all. |
@guybedford Yeah! At least to me it seems right now the singular I think a node loader hook api needs to provide mechanisms for compounding on and falling back to already registered hooks (or the default cjs/esm behavior). I'm not really sure yet where to focus work, but I definitely want to collaborate :) |
@a-lxe agreed we need a way to chain loaders. Would the approach in nodejs/node#18914 work for you, or if not, how would you want to go about it differently? One way to start might be to get that rebased and working again and then to iterate on it from there. |
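To make the chaining idea concrete, here is a minimal sketch of loaders that compound on, and fall back to, previously registered ones. It is not the API from nodejs/node#18914: `composeResolvers`, the `(specifier, parentURL, next)` signature, and the `app:` scheme are all hypothetical names chosen for illustration.

```javascript
// Hypothetical sketch: each resolver receives a `next` function that defers
// to the previously registered resolver (or the default at the bottom).
function composeResolvers(defaultResolve, resolvers) {
  // Later registrations wrap earlier ones, so the last one registered runs first.
  return resolvers.reduce(
    (next, resolver) => (specifier, parentURL) => resolver(specifier, parentURL, next),
    defaultResolve
  );
}

// Stand-in for the built-in behavior (assumption: plain URL resolution).
const defaultResolve = (specifier, parentURL) => ({
  url: new URL(specifier, parentURL).href,
});

// Rewrites a made-up `app:` scheme, then falls back to the rest of the chain.
const aliasLoader = (specifier, parentURL, next) =>
  specifier.startsWith('app:')
    ? next('./src/' + specifier.slice(4), parentURL)
    : next(specifier, parentURL);

// Observes every resolution without changing it.
const seen = [];
const loggingLoader = (specifier, parentURL, next) => {
  seen.push(specifier);
  return next(specifier, parentURL);
};

const resolve = composeResolvers(defaultResolve, [aliasLoader, loggingLoader]);
const result = resolve('app:main.js', 'file:///project/index.js');
console.log(result.url);
```

A loader that never calls `next` short-circuits the chain, which is one way a loader could prevent others from acting.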
@guybedford I like the way nodejs/node#18914 chains the loaders and provides Some gripes which are more relevant to the initial
Also, your last comment on nodejs/node#18914 hints at another loaders implementation by @bmeck. Does this exist in an actionable state?
|
Loaders are a higher-level feature of the environment, kind of like a boot system feature. They sit at the root of the security model for the application, so there are some security concerns here. In addition to that, hooking loaders during runtime can lead to unpredictable results, since any already-loaded modules will not get loaders applied. I'm sure @bmeck can clarify on these points, but those are the two I remember on this discussion offhand.
There is nothing to say we won't have CJS loader hooks or a generalized hook system, but our priority to date has been getting the ESM loader worked out. In addition, the ESM hooks allow async functions, while CJS hooks would need some manipulation to support async calls. There's also the problem of the loaders running in different resolution spaces (URLs vs. paths), as discussed. Once we have our base ESM loader API finalized I'm sure we could extend it to CJS with some extra resolution metadata and handling of the resolution spaces, but I very much feel that loader unification is a "nice to have" that is additive over the base-level ESM API, which should be the priority for us to consolidate and work towards first. That loader stability and architecture should take precedence in the development process. That said, if you want to work on CJS unification first, feel free, but there are no guarantees the loader API will be stable or even unflagged unless we work hard towards that singular goal right now. So what I'm saying is that chained loaders, whether the loader is off-thread, whether the API will be abstracted to deal with multi-realm and non-registry based API, and the translate hook all take precedence on the path to a stable API for me, over unifying ESM and CJS hooks. And that path is already very tenuous and unlikely, so we should focus our combined efforts on API stability first and foremost.
Implementing a
As mentioned above, this work can be done, but I would prefer to get the ground work done first. |
That all makes a lot of sense and I appreciate you describing it for me 🙂 I can start with pulling nodejs/node#18914 and getting that in a working state. |
Just to spark some discussion, here’s a wholly theoretical potential API that I could imagine being useful to me as a developer:

    import { registerHook } from 'module';
    import { promises as fs, constants as fsConstants } from 'fs';

    registerHook('beforeRead', async function automaticExtensionResolution (module) {
      const extensions = ['', '.mjs', '.js', '.cjs'];
      for (let i = 0; i < extensions.length; i++) {
        const resolvedPathWithExtension = `${module.resolvedPath}${extensions[i]}`;
        try {
          await fs.access(resolvedPathWithExtension, fsConstants.R_OK);
          module.originalResolvedPath = module.resolvedPath;
          module.resolvedPath = resolvedPathWithExtension;
          break;
        } catch {}
      }
      return module;
    }, 10);

The new
In the first example, my

Another example:

    import { registerHook } from 'module';
    import CoffeeScript from 'coffeescript';

    registerHook('afterRead', async function transpileCoffeeScript (module) {
      if (/\.coffee$|\.litcoffee$|\.coffee\.md$/.test(module.resolvedPath)) {
        module.source = CoffeeScript.compile(module.source);
      }
      return module;
    }, 10);

This hook is registered after Node has loaded the file contents from disk ( And so on. I have no idea how close or far any of the above is from the actual implementation of the module machinery; hopefully it’s not so distant as to be useless. Most of the loader use cases in our README could be satisfied by an API like this:
Anyway this is just to start a discussion of what kind of public-facing API we would want, and the kind of use cases it would support. I’m not at all married to any of the above, I’m just hoping that we come up with something that has roughly the same versatility as this. |
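For discussion purposes, the registry behind such an API might look roughly like the sketch below. Everything here is an assumption: `registerHook` is not an existing export of `module`, the "lower number runs first" rule is a guess at what the priority argument would mean, and the hooks are synchronous only to keep the sketch short.

```javascript
// Hypothetical registry; `registerHook` is not an existing export of 'module'.
const hooks = new Map(); // phase name -> array of { fn, priority, order }
let registrationCount = 0;

function registerHook(phase, fn, priority = 0) {
  const list = hooks.get(phase) || [];
  list.push({ fn, priority, order: registrationCount++ });
  // Assumption: lower priority numbers run first; ties keep registration order.
  list.sort((a, b) => a.priority - b.priority || a.order - b.order);
  hooks.set(phase, list);
}

// Passes the in-progress module record through every hook for a phase.
// Real hooks would likely be async; this sketch is synchronous for brevity.
function runHooks(phase, moduleRecord) {
  return (hooks.get(phase) || []).reduce((record, { fn }) => fn(record), moduleRecord);
}

// Registered first, but runs second because of its higher priority number.
registerHook('afterRead', (m) => ({ ...m, source: m.source + '\n// second' }), 20);
registerHook('afterRead', (m) => ({ ...m, source: m.source.trim() }), 10);

const output = runHooks('afterRead', {
  resolvedPath: '/app/a.js',
  source: '  let x = 1;  ',
});
console.log(output.source);
```

Dropping the priority argument and relying only on registration order would reduce this to a plain ordered list, which is the alternative raised later in the thread.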
Thanks for this write-up @GeoffreyBooth! Some thoughts to add to the discussion: To me this looks like a transformer architecture, which exposes the entire in-progress module object to each hook, as opposed to the current This api also doesn't allow for a loader to prevent other loaders from acting, which the wip multiple loader implementation at nodejs/node#18914 does. I don't think that's a bad thing, and I would be interested in hearing what people think on that front. I'm not sure about the optional priority parameter. I don't think loaders should know much about what other loaders are registered or be making decisions about which order they're executed in. The user controls the order by choosing the order in which they register the loaders. |
These are all good points. I would err on the side of exposing a lot of surface area, though, as that’s what users are used to from CommonJS. A lot of the power of things like stubs are because the surface area is huge. In particular, I think we do want to allow reading a new file to override the old, or at least modifying the loaded string (which of course could be modified by loading the contents of a new file); otherwise we can’t have transpilers, for example, or stubs that are conditional based on the contents of a file rather than just the name in the specifier. The priority option is just a convenience, so that the user doesn’t need to be careful about the order that they register hooks. One thing that I thought of after posting was to add the concept of package scope to this. A lot of loaders will only be useful in the app’s package scope, not the combined scope of the app plus all its dependencies. We probably want some easy way to limit the callbacks to just the package scope around |
On the |
Node could simply ignore any changes to “shouldn’t be modified” properties. That’s probably better than trying to lock them down or removing them from the |
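A minimal sketch of the "ignore changes" approach, assuming a hypothetical module record in which only `resolvedPath` and `source` are hook-writable (the field names and the `applyHookResult` helper are illustrative, not part of any existing API):

```javascript
// Assumption: only these fields of the hypothetical module record are writable.
const WRITABLE_FIELDS = ['resolvedPath', 'source'];

// Copies writable fields from a hook's return value and silently drops the rest.
function applyHookResult(record, hookResult) {
  const next = { ...record };
  if (hookResult === null || typeof hookResult !== 'object') return next;
  for (const key of WRITABLE_FIELDS) {
    if (key in hookResult) next[key] = hookResult[key];
  }
  return next; // everything else, e.g. `specifier`, keeps its original value
}

const before = { specifier: './a', resolvedPath: '/app/a.js', source: 'old' };
const after = applyHookResult(before, { specifier: 'tampered', source: 'new' });
console.log(after);
```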
So there is a lot to talk about on loaders. We have had multiple meetings discussing some design constraints to keep in mind. I think setting up another meeting just to review things from the past would be helpful. |
At the moment, from PnP's perspective:
|
the first one is already possible with our current design (not including cjs). the second one is interesting and should probably exist, but it is unlikely that a cjs version can be added without breaking some modules that are loaded by it. |
We're currently monkey-patching the I think for cjs it would be doable if the |
Part of my worry and reason why I feel we need to expand loader hooks as best we can for CJS is exactly that we don't guarantee this to work currently or in the future. Even if we cannot support C++ modules (the main problem with this FS patching approach that has been known since at least 2014 when I spoke on it) we can cover most situations and WASM at least can begin to replace some C++ module needs. I see this as a strong indication that we need to solve this somehow or provide some level of discretion for what is supported. |
We have a mostly stable design document, please feel free to comment or request edit access as needed, at https://docs.google.com/document/d/1J0zDFkwxojLXc36t2gcv1gZ-QnoTXSzK1O6mNAMlync/edit#heading=h.xzp5p5pt8hlq . The main contention is towards the bottom, around potential implementations, but the sections before that summarize and explain a lot of different ideas from historical threads and research over the past few years. |
Great to see work moving here! I really like the overall model, we maybe just have a few disagreements about the exact APIs. I’ve already stated my feedback in the doc, but will summarize it again here:
Most of the above is relatively superficial though - the core of the model seems good to me. (1) means having two-phase messaging with loaders, so is slightly architectural though. |
Per "separation": I agree there needs to be a "fetch"/"retrieve" hook of some kind, but not that resolve should not be able to return a body. The problem you explain above is about passing data to parent loaders, such as a list of extensions, but as far as I can tell it is not fixed by separating loaders. Per APIs, we can argue about which APIs to use, but we should start making lists of what features are desirable rather than bikeshedding without purpose. To that end I'd like to posit the following:
If that sounds fine, we can add constraints and a data structure to the design document. Overall, I do not think streaming is necessarily the best first pass given how little I expect it to be useful currently. I found Error stacks are able to be serialized properly, but it depends on what you are seeking from a debugging experience. They are a leak technically, but I do not consider them a fatal leak since a loader can throw their own object instead of a JS I would be wary about user actionability on these messages as |
As another example, consider a loader which applies a security policy that only certain modules on the file system can be loaded. This loader is added last in the chain, and basically provides a filter on the resolver, throwing for resolutions that are not permitted. The issue then with the model is that by the time the permission loader throws, the file might have already been opened by the underlying parent loader. This is the sort of separation of concerns that concerns me.
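The concern can be illustrated with a small sketch in which all resolve-phase hooks run before anything is fetched, so a policy hook can reject a URL before the file is opened. The two-phase `load` function, `fakeFetch`, and the `file:///app/` policy are hypothetical stand-ins, not part of any proposed API.

```javascript
// Hypothetical two-phase pipeline: every resolve-phase hook runs before any
// fetch, so a policy hook can reject a URL before the file is opened.
const fetched = []; // records which URLs the fetch phase actually touched

function fakeFetch(url) {
  fetched.push(url); // stand-in for opening and reading the file
  return `// contents of ${url}`;
}

const ALLOWED_PREFIX = 'file:///app/'; // made-up policy for illustration

function permissionHook(url) {
  if (!url.startsWith(ALLOWED_PREFIX)) {
    throw new Error(`Resolution not permitted: ${url}`);
  }
  return url;
}

function load(specifier, parentURL, resolveHooks) {
  let url = new URL(specifier, parentURL).href;
  for (const hook of resolveHooks) url = hook(url); // phase 1: resolve only
  return { url, source: fakeFetch(url) };           // phase 2: fetch
}

const ok = load('./lib.js', 'file:///app/main.js', [permissionHook]);

let blocked = null;
try {
  load('../etc/secret.js', 'file:///app/main.js', [permissionHook]);
} catch (err) {
  blocked = err; // thrown during phase 1, before fakeFetch is reached
}
console.log(fetched);
```

If resolve and fetch were fused in a parent loader, the `fetched` list would already contain the rejected URL by the time the policy hook ran.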
The basic requirement is being able to determine what buffer to execute, and how to execute it in the module system. The simplest interface that captures this requirement is:

    interface Output {
      source: String | Buffer;
      format: 'wasm' | 'module' | 'addon' | 'commonjs' | 'json';
    }

The above could be extended to support streams by supporting source as an async iterator as well, but I'm certainly not pushing streams support yet either.
Thanks for the clarifications re error stacks, we should just make sure we are aware of the debugging experience implications and properly support these workflows. Just getting the sync stack copied across as a string should be fine I guess. |
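As a rough illustration of how a runtime might check a loader's result against the `Output` shape quoted above, here is a minimal sketch. The `validateOutput` helper is hypothetical; only the five format names come from the interface.

```javascript
// Format names taken from the Output interface quoted above.
const FORMATS = new Set(['wasm', 'module', 'addon', 'commonjs', 'json']);

// Hypothetical helper: checks that a loader result matches the Output shape.
function validateOutput(output) {
  const { source, format } = output;
  const sourceOk = typeof source === 'string' || Buffer.isBuffer(source);
  if (!sourceOk) throw new TypeError('source must be a string or Buffer');
  if (!FORMATS.has(format)) throw new TypeError(`unknown format: ${format}`);
  return { source, format }; // any extra properties are dropped
}

const valid = validateOutput({ source: Buffer.from('{"a":1}'), format: 'json', extra: true });

let rejected = null;
try {
  validateOutput({ source: 'export default 1', format: 'esm' });
} catch (err) {
  rejected = err; // 'esm' is not one of the listed formats
}
console.log(valid.format, rejected && rejected.message);
```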
This has been something I've been thinking about as well. I think for a loader to handle both ESM and CommonJS files, it also needs to hook into Another thing we might want to consider is making the ESM loader hooks simply the loader hooks, to apply to both CommonJS and ESM, finally replacing the deprecated |
@GeoffreyBooth the problem with unifying the hooks is that Right now, when a user installs ts-node's ESM hooks, we know they can't have installed any other hooks. Depending on how you think about it, we are responsible for implementing some basic features that would ideally be implemented by third-party ESM hooks. If a third-party library installs |
Hmmmm. That is a problem. @jkrems or @weswigham is there any hope here, or would hooks that work with CommonJS run into the same issues that Wes’ “ I suppose one could write synchronous hooks? CoffeeScript transpilation is synchronous, for example; I dunno if TypeScript’s is? Obviously there couldn’t be a sync HTTP loader but there are plenty of useful loader cases that don’t need async. Anyway applying these hooks to CommonJS as well is a long-term maybe goal. AFAIK CommonJS wasn’t designed to be hookable/customizable, and the current ways people do it are monkey-patched hacks more or less. Perhaps it’s best left as is. |
@GeoffreyBooth one of the motivations of moving loading off thread/isolated was we have proof of concept that we can use a SharedArrayBuffer to sleep the main thread while the loader does async tasks but looks as if it were blocking via |
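A minimal sketch of that blocking pattern, assuming it is built on `Atomics.wait` over a `SharedArrayBuffer` (the comment above does not spell out the exact mechanism, and this is not the proof of concept it refers to). The worker stands in for an off-thread loader, and `callLoaderSync` is a hypothetical name.

```javascript
const { Worker } = require('node:worker_threads');

// Source for the worker; it plays the role of an off-thread loader.
const workerSource = `
  const { workerData } = require('node:worker_threads');
  const cells = new Int32Array(workerData.shared);
  // Simulate asynchronous work (e.g. a network fetch) with a timer.
  setTimeout(() => {
    Atomics.store(cells, 1, workerData.input * 2); // write the result
    Atomics.store(cells, 0, 1);                    // mark as done
    Atomics.notify(cells, 0);                      // wake the main thread
  }, 10);
`;

function callLoaderSync(input) {
  const shared = new SharedArrayBuffer(8);
  const cells = new Int32Array(shared); // cells[0]: done flag, cells[1]: result
  new Worker(workerSource, { eval: true, workerData: { shared, input } });
  // Sleep the main thread until the flag changes from 0, or 5 seconds pass.
  const status = Atomics.wait(cells, 0, 0, 5000);
  if (status === 'timed-out') throw new Error('loader worker did not answer');
  return Atomics.load(cells, 1);
}

const doubled = callLoaderSync(21);
console.log(doubled);
```

From the caller's point of view `callLoaderSync` is an ordinary synchronous function, which is what would let a CommonJS `require` path consult hooks that do asynchronous work internally.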
What's the benefit of moving to a threaded loader in terms of user story? Do we get improved performance? Improved security? New functionality that wouldn't have been possible in a single-threaded loader? |
Decreased resource/memory usage is the most expected outcome, especially with multiple contexts or threads that all need hooks. If the APIs for hooks aren't designed to run in isolation, this would be hard or impossible to achieve in userland (since none of the hook implementations would be compatible with those assumptions by default). It's much more realistic to run one TSC or babel instance than 100 per process. Another aspect is increased stability: The loader can't accidentally be broken by or break the application, at least not as easily as when they run in the same global scope. |
So we have 2 users effectively:
I think for application code not much would be visibly affected, and/or it varies too much per application to state much about it. I think for loader code there are a lot of pros and cons to consider, but I believe the pros outweigh the cons significantly. In theory, a portion of the pros could be left to users to do things like spin up workers inside of their own loader, but a variety of things are not feasible or have higher advantages if done by the runtime itself. The overall story is a bit complicated in terms of performance, but overall I'd say that for simple workflows putting them on a thread is worse, while for complex workflows and applications it is better.
Security has a bunch of discussions about what your security model is but for some simple statements that aren't really controversial:
Per features, if loaders are not on the same thread:
I do strongly think we need to solve the object reference problem, but we haven't really spent time looking into it to my knowledge. Even if we don't move off thread we likely need to solve the object reference problem in order to allow a solution for saving variables in |
Another use-case for custom loaders. Stubbing CSS-imports in UI libraries. Currently there are similar solutions for CommonJS
I tried to implement the same functionality using the new Node.js loader API: https://gist.github.com/just-boris/b07d66e306c94cf42db41b010231fbbf Works well for such cases. |
solved the "machine-level store and symlinked node_modules" problem |
Hi, I am defining a new module packaging standard. It is an ECMAScript Module based standard for packaging modules for reuse, so I want to define some fundamentals and conventions for properties that express important metadata, such as integrity-related content hashes, the hash algorithm used, and the verification algorithm. How can I pull that off chained with the loader hooks API, to maybe directly support the needed functions? I call it the web-module standard. It is designed to form a world-wide-web-scale, shared, p2p-distributed, content-addressed module build cache and module-based package system, so that "Web 4.0" apps can be composed out of modules without even needing Node.js; any web platform works for that. We could then directly link remote contexts, with no round trips via packaging, if we agree on a module standard that can be used directly even when the module is relatively zero-trust, as is the case with any npm package. What do you think? |
Closing as we have https://github.com/nodejs/loaders now devoted to this. |
Hooking into the dependency loading steps in Node.js should be easy, efficient, and reliable across CJS+ESM. Loader hooks would allow developers to make systematic changes to dependency loading without breaking other systems.
It looks like discussion on this topic has died down, but I'm really interested in loader hooks and would be excited to work on an implementation! There's a lot of prior discussion to parse through, and with this issue I'm hoping to reignite discussion and to create a place for feedback.
Some of that prior discussion:
require()
edit (mylesborins)
here is a link to the design doc
https://docs.google.com/document/d/1J0zDFkwxojLXc36t2gcv1gZ-QnoTXSzK1O6mNAMlync/edit#heading=h.xzp5p5pt8hlq