Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs.Dir.read() is very slow #29941

Closed
paulmillr opened this issue Oct 12, 2019 · 12 comments
Closed

fs.Dir.read() is very slow #29941

paulmillr opened this issue Oct 12, 2019 · 12 comments
Assignees
Labels
fs Issues and PRs related to the fs subsystem / file system. performance Issues and PRs related to the performance of Node.js.

Comments

@paulmillr
Copy link

paulmillr commented Oct 12, 2019

  • Version: 12.12.0
  • Platform: macOS catalina
  • Subsystem: FS

I maintain the popular readdirp module. Decided to try fs.opendir from node 12.12. The results are suboptimal: it's 3x slower to walk file trees when compared to fs.readdir.

To reproduce:

  1. Check out the gist: https://gist.github.com/paulmillr/4e3a4da2ad8e98cf0fe7dc819242532a
  2. By switching logic from if (opendir) to if (!opendir) I get 3x speedup, which means opendir is 3x slower than readdir.

Refs: #583, #29349

@addaleax addaleax added fs Issues and PRs related to the fs subsystem / file system. performance Issues and PRs related to the performance of Node.js. labels Oct 12, 2019
@addaleax
Copy link
Member

addaleax commented Oct 12, 2019

Fwiw, #29893 isn’t released yet and is going to improve performance significantly.

That being said, the big advantage of fs.opendir() is that it has a low upper limit on the memory necessary to read a directory, and not necessarily being faster than fs.readdir().

@paulmillr
Copy link
Author

I think it must have comparable performance (-+ 30%), because it's reasonable to assume the software doesn't know how big a directory is ahead of time. Would be especially useful for readdirp.

@addaleax
Copy link
Member

With that PR, I’m getting something closer to an 80 % perf difference.

One possibility that was suggested in it was making the number of entries read in one go configurable. I think that would bring this into acceptable parameters for you.

@paulmillr
Copy link
Author

Great!

@Fishrock123 Fishrock123 changed the title fs.opendir is very slow fs.Dir.read() is very slow Oct 12, 2019
@Fishrock123
Copy link
Contributor

Fishrock123 commented Oct 12, 2019

So, this will always be slower than fs.readdir(). Always, because to use it reasonably, you're going to need an async iterator (Promises) or an object stream (ugh).

You'd be able to get much closer by having a different but frankly weird/awful api: an async iteratable of sync iterables...

That being said that kind of subverts the point of the API. Sure, it should buffer to some extent because performance can reasonably be substantially improved without significant drawback. However, the large idea of such an API is to allow scheduled work on said directory entries to be started/return as iterator progresses, and to avoid such a large memory allocation on large directories.

@piyukore06
Copy link
Contributor

Would you like a PR for that?

@addaleax
Copy link
Member

@Fishrock123 We have a highWaterMark option for streams, so adding something similar here seems fine to me. In the end it just gives the user more control over the memory-vs-time tradeoff that’s involved here.

@piyukore06 Unless there’s a strong objection, I’d say go for it. :)

@Fishrock123
Copy link
Contributor

Oh, I wasn't clear but I certainly support being able to configure this. In fact, I was thinking about that as I was reviewing your PR. 😄

@Fishrock123
Copy link
Contributor

Fishrock123 commented Oct 15, 2019

@piyukore06 Please let us know if you need any help making that PR. It'l require a bit of C++, but only a little bit.

@piyukore06
Copy link
Contributor

Yes @Fishrock123 I feel a bit stuck, I will push my changes as soon as possible. Maybe you people could take a look and point me in right direction..

Thanks in advance :)

@piyukore06
Copy link
Contributor

@addaleax @Fishrock123 I don't think I completely get the whole building, compiling and testing process. It's taking more time than expected, so If anyone else wants to go ahead and finish it, please do. I will try and understand more about the ecosystem and try to find another issue to contribute to. Thank you for your patience.

@addaleax addaleax self-assigned this Oct 23, 2019
@cclauss cclauss added the macos Issues and PRs related to the macOS platform / OSX. label Oct 23, 2019
@Fishrock123 Fishrock123 removed the macos Issues and PRs related to the macOS platform / OSX. label Oct 24, 2019
addaleax added a commit to addaleax/node that referenced this issue Oct 25, 2019
Add an option that controls the size of the internal
buffer.

Fixes: nodejs#29941
targos pushed a commit that referenced this issue Nov 5, 2019
Add an option that controls the size of the internal
buffer.

Fixes: #29941

PR-URL: #30114
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
MylesBorins pushed a commit that referenced this issue Jan 12, 2020
Add an option that controls the size of the internal
buffer.

Fixes: #29941

PR-URL: #30114
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
@paulmillr
Copy link
Author

@piyukore06 I don't see any performance improvements in the test from first post. Basically no changes with node 13-latest and bufferSize: 1024 / 4096 / whatevs. BufferSize: 1 makes this slower, so the arg works.

BethGriggs pushed a commit that referenced this issue Feb 6, 2020
Add an option that controls the size of the internal
buffer.

Fixes: #29941

PR-URL: #30114
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system. performance Issues and PRs related to the performance of Node.js.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants