Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sharing package.json and stat cache between CJS and ESM loaders #30674

Closed
guybedford opened this issue Nov 26, 2019 · 11 comments
Closed

Sharing package.json and stat cache between CJS and ESM loaders #30674

guybedford opened this issue Nov 26, 2019 · 11 comments
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. help wanted Issues that need assistance from volunteers or PRs that need help to proceed.

Comments

@guybedford
Copy link
Contributor

Currently the ES module resolver and CJS resolver do not share their stat cache or package configuration cache.

Sharing these caches would significantly improve boot performance of applications that contain mixed CommonJS and ESM (currently very few, but soon to be most, we hope!)

I wanted to create this issue to track that optimization work.

@guybedford guybedford added esm Issues and PRs related to the ECMAScript Modules implementation. help wanted Issues that need assistance from volunteers or PRs that need help to proceed. labels Nov 26, 2019
@juanarbol
Copy link
Member

Could you explainme a bit better how cache thing works? I've read something on cjs loader, but no understood very well.

@shackijj
Copy link
Contributor

shackijj commented Apr 27, 2020

Currently the ES module resolver and CJS resolver do not share their stat cache or package configuration cache.

@guybedford Should the cache sharing be implemented in a scope of this issue? If yes, I'd be glad to help.

@guybedford
Copy link
Contributor Author

@shackijj help would be amazing here. The work has already been simplified by the fact that both resolvers are now in JS wheras previously the ESM resolver was C++.

Should the cache sharing be implemented in a scope of this issue?

Can you clarify the question here a bit more?

@shackijj
Copy link
Contributor

@guybedford
In the original message you wrote:

I wanted to create this issue to track that optimization work.

So, I wanted to clarify if we need to make the optimization (write some code) or this issue was created for tracking purpose only (e.g. after a certain release we should check that performance got better)

I hope this makes sense.

@shackijj
Copy link
Contributor

shackijj commented Apr 28, 2020

Spent some time on researching the code base and thinking of the implementation. The diagram below represents the way we can make it. @guybedford could you please share thoughts about it?
If it's OK, please assign the issue to me and I will work on that.
Screenshot 2020-04-28 at 09 04 19

@guybedford
Copy link
Contributor Author

@shackijj there's no other optimizations or blockers here, this issue is directly for this work of joining the caches.

Glad to hear you managed to take a look at the codebase already. That looks like a great summary. I would suggest tackling the package config cache and stat cache separately either as separate PRs or at least as separate commits. This will also help me for review in keeping the diffs straightforward.

Joining the package cache should be as simple as calling the package cache function https://github.com/nodejs/node/blob/master/lib/internal/modules/cjs/loader.js#L279 from the ESM loader. Note it is a slightly different API to what the ESM loader users, but refactoring the APIs to be similar is an important simplification as part of this work.

As we generalize these routines between both loaders we may also want to move them out to a shared resolution file as well in the process. Focusing on keeping the diff down and ensuring simplifications in the refactoring would be great to see.

Let me know if I can help further at all here, otherwise would be great to see what you come up with.

shackijj added a commit to shackijj/node that referenced this issue May 10, 2020
add test for ERR_INVALID_PACKAGE_CONFIG error

Refs: nodejs#30674
shackijj added a commit to shackijj/node that referenced this issue May 11, 2020
test/fixtures/node_modules/invalid-pjson/package.json

Refs: nodejs#30674
shackijj added a commit to shackijj/node that referenced this issue May 18, 2020
shackijj added a commit to shackijj/node that referenced this issue May 20, 2020
GeoffreyBooth pushed a commit that referenced this issue May 24, 2020
Refs: #30674

PR-URL: #33229
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Geoffrey Booth <webmaster@geoffreybooth.com>
@shackijj
Copy link
Contributor

shackijj commented Jun 5, 2020

will start working on stat cache next week.

@shackijj
Copy link
Contributor

shackijj commented Jun 14, 2020

I spent some time researching how the stat cache works now.

There is no a stat cache in ESM loader. So, adding it there won't be a problem.

In CJS loader, the stat cache's lifetime is limited. It seems to me that the cache is used during a single requre call and then it's cleared.

if (requireDepth === 0) statCache = null;

As to share the cache between CJS and ESM loaders and make it work effectively, we probably need to remove the cache clearing. I'm not sure if it's a good idea because a stat cache entries will be stored in RAM all the time while node process is running.

@guybedford @addaleax @BridgeAR Why the CJS's stat cache is cleared? To avoid memory leak?
What do you think about removing the cache clearing and sharing the cache with ESM loader?

@addaleax
Copy link
Member

@shackjjj I don’t know why it’s cleared myself, but yes, I would assume that there’s a memory consumption aspect to it. It’s probably also relevant for accounting for file system modifications done by the running program itself – those can affect the results of stat() calls, so it’s not reasonable to assume that the stat cache remains valid forever. (Technically, it’s also not valid to assume that it remains valid for the duration of the outermost require() call either.)

@guybedford
Copy link
Contributor Author

The stat cache was only added relatively recently and yes I believe it was short-lived to try to reduce the scope for errors as far as I'm aware given that Node.js never previously had anything like this and liveness with the file system is expected between separate module loads.

Given this nature of the stat cache it likely doesn't actually make sense to share unless we make the separate decision to use a persistent stat cache in Node.js.

Personally I would be quite weary of a persistent stat cache, since it might preclude some dynamic package management use cases. Perhaps we should at least consider this with some input from package managers like Yarn and Pnpm maintainers.

A similar per-request stat cache technique may or may not be a possible performance improvement to the ES module resolution. Perhaps this issue should be closed for more general profiler-based performance work on the loader further?

codebytere pushed a commit that referenced this issue Jun 18, 2020
Refs: #30674

PR-URL: #33229
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Geoffrey Booth <webmaster@geoffreybooth.com>
codebytere pushed a commit that referenced this issue Jun 30, 2020
Refs: #30674

PR-URL: #33229
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Geoffrey Booth <webmaster@geoffreybooth.com>
codebytere pushed a commit that referenced this issue Jul 8, 2020
Refs: #30674

PR-URL: #33229
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Geoffrey Booth <webmaster@geoffreybooth.com>
@guybedford
Copy link
Contributor Author

This was implemented by @shackijj in #33229.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. help wanted Issues that need assistance from volunteers or PRs that need help to proceed.
Projects
None yet
Development

No branches or pull requests

4 participants