Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Potential optimizations #4

Closed
wants to merge 12 commits into from

Conversation

RuslanZavacky
Copy link

@RuslanZavacky RuslanZavacky commented Feb 1, 2022

I am trying to find ways if possible to improve performance in our monorepo, and debugging running ember s got me to the fs-updater, merge-trees and broccoli-merge-trees.

Here are some data for the context:

Show screenshot of debug and functions time Screenshot 2022-01-26 at 12 10 18 Screenshot 2022-01-26 at 12 09 15

From these screenshots, couple things that are interesting:

  1. A lot of time spent in mergeDirectories (hence my investigation)
  2. 30% of time spent in GC - which is due to the size of project and a lot of recursion to read directories

This change that I've identified should potentially help minimize fs.stat calls, and on our monorepo, it does seem like it is a lot, here are some numbers:

  { lstat: 159039, dirent: 363220 }

dirent number shows how many times instead of fs.stat we were able to derive information required to determine directory/file/symbolic link from the additional info from the fs.readdir. This is particularly important on the systems with software that is actively monitor/backup all the changes (in enterprises).

I'd greatly appreciate any feedback/hints/help to figure out if there is more that I can look at to get much better results and optimizations in general.

I know there was this effort broccolijs/broccoli-merge-trees#43 which potentially can lead to quite a good improvement on bigger projects as well.

Reference to other PRs that might improve performance in general:
broccolijs/node-merge-trees#6
broccolijs/broccoli-persistent-filter#188
https://github.com/broccolijs/fs-merger

Ruslan Zavacky added 2 commits February 1, 2022 09:38
BREAKING CHANGE: as it is a major node bump, requires a breaking change
`fs.readdirSync(p, { withFileTypes: true })` - return the Dirent object, that conveniently has isDirectory, isFile, isSymbolic link methods, which in return allow us not to do additional lstat when deciding which class to create.

In some cases, File class do require stats to be calculated, which will happen as soon as needed and will be cached.
@@ -226,7 +298,7 @@ function fileStatsEqual(a, b) {
// Note that stats.mtimeMs is only available as of Node 9
return (
a.ino === b.ino &&
a.mtime.getTime() === b.mtime.getTime() &&
// a.mtime.getTime() === b.mtime.getTime() &&

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is/was this important?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NullVoxPopuli in real life it's really rare case where file is changed without it's changed size

}

const now = Date.now();
const isCacheValid = (this._cacheBuildAt + CACHE_TTL) > now;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this affect the dev-time reload of linked packages that are maybe ctrl+s'd rapidly?
Like, does this mean that when working on a library, we can only receive updates at quickest, every 5s?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now yes, it was more of an experiment and solution to our problem.. with this - we have issues in tests for the merge-trees, so need to figure out some other solution.. but to say, it does add a big speed boost, so caching is a must.. need to think how to pass it probably from the consuming packages.

throw new Error("File has unexpected type: " + p);
}
} else {
let stats = fs.lstatSync(p);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any difference in perf in sync vs async?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likely yes (for SSD drives, because there is more file threads available), but this package is sync

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but for stat it's not clear - nodejs/node#38006

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants