Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time to find a non glob path increase with number of files in the same directory #60

Closed
pvdlg opened this issue Feb 14, 2018 · 8 comments
Assignees

Comments

@pvdlg
Copy link

pvdlg commented Feb 14, 2018

Environment

  • OS Version: OSX 10.12.6
  • Node.js Version: 9.50

Actual behavior

When searching for a single file (no glob parts in the path), the time increase with as the number of files in the same directory increase. With node-glob this time is constant.

Find 1 file in a directory with 1 file (ran 500 times):

  • node-glob: 48.868ms
  • fast-glob: 118.218ms

Find 1 file in a directory with 500 file (ran 500 times):

  • node-glob: 45.647ms
  • fast-glob: 1935.108ms

Find 1 file in a directory with 2000 file (ran 500 times):

  • node-glob: 48.535ms
  • fast-glob: 7538.522ms

Expected behavior

Globbing a static path (with no wildcards) is a something that can happen sometimes in modules that get the glob patterns from user input, as users can pass a list of file path (e.g, mymodule file1.js file2.js file3.js ).

It seems that node-glob has a way to detect those cases and optimize them. There is potential gain of perf in fast-glob by implementing the same type of optimization.

Steps to reproduce

Use the code below and change the TOTAL_FILES to measure the changes in the globbing time between fast-glob and node-glob.

Code sample

const fs = require('fs');
const path = require('path');
const glob = require('glob');
const fg = require('fast-glob');

const TOTAL_FILES = 2000;

fs.mkdirSync('test')
for (let i = 0; i < TOTAL_FILES; i++) {
  const filepath = path.join('test', `file-${i}`);
  fs.writeFileSync(filepath, '');
}

console.time('node-glob');
for (let i = 0; i < 500; i++) {
  glob.sync('test/file-1')
}
console.timeEnd('node-glob');

console.time('fast-glob');
for (let i = 0; i < 500; i++) {
  fg.sync('test/file-1')
}
console.timeEnd('fast-glob');
@mrmlnc
Copy link
Owner

mrmlnc commented Feb 14, 2018

Hello, @pvdlg,

Good catch! Yeap, this is due to the fact that we first try to read the directory, and then try to apply patterns.

So we need fix this behaviour. Thanks for reporting!

@mrmlnc
Copy link
Owner

mrmlnc commented Feb 14, 2018

Just a few thoughts.

We can extract only constant patterns from all patterns, for example, using a is-glob package and run them as a separate task with different types (something like glob and regular). This requires changes in the task manager, creating adapters (glob, regular) and some magic in the base reader class. But this is first view.

@skarfacegc
Copy link

Would a check for file existence on each glob work?

const strippedGlobs = []
const results = [];
globs.forEach(glob =>{
  fs.access(glob, fs.constants.R_OK, err => {

    // file doesn't exist, add to the list of globs
    if(err){
      strippedGlobs.push(glob);
    } else {
      // file does exist, add to the result set
      results.push(glob);
    }
  })
})

@mrmlnc
Copy link
Owner

mrmlnc commented Feb 15, 2018

@skarfacegc,

Yeap, good idea!

I think it's possible. Now the task manager doesn't know anything about the File System and i think he shouldn't know anything about it. We can create a new manager, provider or something and check whether the pattern exists on the file system. We need to think about how to solve this problem correctly.

I'll invest time to this issue in the near future.

@skarfacegc
Copy link

If you can point me in the right direction I can take a look as well. I looked around but I couldn't wrap my head around how all the pieces fit together.

@pvdlg
Copy link
Author

pvdlg commented Feb 15, 2018

By checking for each string passed in the patterns wouldn't we loose performances for cases in which the pattern is an actual glob?

Passing a non glob path is still less common than passing a glob.
Maybe we should still use is-glob before testing on the FS.

@mrmlnc
Copy link
Owner

mrmlnc commented Feb 15, 2018

I prefer solution with is-glob with strict mode. In this case we needs have a separated tasks for static and dynamic patterns (just like the dynamic property in the ITask interface). Then, when processing a task, simply pass them to the correct adapters (fs and readdir-enhanced).

I've already started working on this issue.

@mrmlnc
Copy link
Owner

mrmlnc commented Feb 25, 2018

Must be fixed by fast-glob@2.1.0. Please, read release notes for more details.

Thanks for reporting ❤️!

@mrmlnc mrmlnc closed this as completed Feb 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants