[WIP] Perf improvement #68
`['a/*', '!a/**']` is not really a relevant pattern to test, as it wouldn't select any file. The pattern `['b/*', 'a/*', '!a/**']` still tests the performance of ignoring a full directory, but is more realistic because it still selects other files.
For further improvements we could drop the
It seems that |
After doing a bit more testing, it turns out the current solution doesn't work properly for negative patterns in I added a failing test to demonstrate that.
If someone has an idea how to solve this, let me know! Otherwise we'll have to revert to the old method. |
@pvdlg Just a rough idea: What if we combine both approaches, like this.
|
That sounds like a good idea. I'm going to try that. For info I did some tests with the current
Not directly related, but I noticed that in most performance issues reported for XO, the problem comes from what appears to be a bug in
glob.sync('**/*.js', {ignore: ['**/node_modules/**']})
// => Works as expected, content of node_modules is excluded
glob.sync('./**/*.js', {ignore: ['**/node_modules/**']})
// => Content of node_modules is NOT excluded
This problem doesn't happen with If we switch to
=> Long story short, keeping the code as is and just replacing |
Nah, we shouldn't map it. We need to do a major bump for this regardless.
I would be ok with that. |
Awesome work on investigating this @pvdlg :) |
@pvdlg Is there something worth merging from this PR? I could at least see the additional benchmarks and tests being useful. |
Yes we could merge the benchmark and the tests. |
Alright. Sounds good. |
Closing this, but would really appreciate the mentioned PR with test/bench improvements whenever/if you have time. |
See benchmark results
And a recap of the gains:
Performance improvements
Use fast-glob instead of node-glob.
Handle `.gitignore` differently: parse the `.gitignore` files and transform their content into globs, then negate them and add them to the `patterns`. This allows globbing and reading the `.gitignore` files only once per call (previously it was done once per task). In addition, it avoids retrieving files that would be excluded afterwards by the gitignoreFilter, which makes it possible to take advantage of `fast-glob` optimizations. That should help (or probably fix) xojs/xo#65
Micro optimizations in the code (fewer branches, avoiding the creation of unnecessary functions, etc.).
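The `.gitignore` handling described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `gitignoreToNegatedGlobs` and its `baseDir` parameter are hypothetical, and the conversion is deliberately simplified (it skips comments and blank lines, ignores `!` re-include rules, and does not implement the full gitignore matching rules).

```javascript
// Hypothetical helper (not part of globby): convert the text of a .gitignore
// file into negated glob patterns that can be appended to `patterns`.
// Simplifications: `!` re-include rules are skipped, escapes are not handled,
// and rules containing an inner slash are not treated as anchored.
function gitignoreToNegatedGlobs(content, baseDir = '') {
	const prefix = baseDir ? `${baseDir.replace(/\/+$/, '')}/` : '';

	return content
		.split(/\r?\n/)
		.map(line => line.trim())
		.filter(line => line !== '' && !line.startsWith('#') && !line.startsWith('!'))
		.map(line => {
			// A leading slash anchors the rule to the directory of the .gitignore;
			// otherwise the rule may match at any depth.
			const anchored = line.startsWith('/');
			const body = line.replace(/^\/+/, '').replace(/\/+$/, '');
			const glob = anchored ? `${prefix}${body}` : `${prefix}**/${body}`;
			// Exclude both the entry itself and everything below it.
			return [`!${glob}`, `!${glob}/**`];
		})
		.reduce((all, pair) => all.concat(pair), []);
}

// The negated globs are added after the positive patterns, so a single
// globbing pass never has to return the ignored files.
const patterns = ['**/*.js'].concat(
	gitignoreToNegatedGlobs('# deps\nnode_modules\n/dist\n')
);
console.log(patterns);
```

Because the exclusions are ordinary negative patterns, the glob library can skip ignored directories up front instead of listing their contents and filtering them out afterwards.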
Code changes
`nodir` option is replaced by `onlyFiles` (as it's the name used by `fast-glob`). We could map `nodir` to `onlyFiles` to avoid a breaking change. Not sure if it's useful though, as some `node-glob` options are not available yet in `fast-glob`, so it would be a breaking change anyway.
When using `expandDir`, directories directly under `cwd` are not included. See mrmlnc/fast-glob#47.
glob('tmp/**') => ['tmp', 'tmp/a.tmp', 'tmp/b.tmp', 'tmp/c.tmp', 'tmp/d.tmp', 'tmp/e.tmp']
fast-glob('tmp/**') => ['tmp/a.tmp', 'tmp/b.tmp', 'tmp/c.tmp', 'tmp/d.tmp', 'tmp/e.tmp']
Not sure which one is the expected behavior though.
I implemented a workaround for mrmlnc/fast-glob#45 to avoid trailing slashes for directories. Fixed in `fast-glob@2.02`.
I added a default list of ignored directories when globbing the `.gitignore` files, to work around mrmlnc/fast-glob#42. Even when the bug is fixed it's probably a good idea to keep the workaround, as there is no reason to look for a `.gitignore` file in `node_modules`, `bower_components`, etc. mrmlnc/fast-glob#42 can also appear for users with directories containing a large number of files. We should probably wait for it to be fixed. Fixed in `fast-glob` 2.0.2.
I dropped `globby.gitignore.sync([options])` for now. My understanding is that it was used for checking whether a file is ignored in a `.gitignore`. With the new implementation it would add additional code more or less disconnected from the main features. We could recommend doing `globby(['file_to_check'], {gitignore: true})` instead. If necessary I can add the functionality again.
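For reference, the `nodir` to `onlyFiles` mapping floated in the list above could look something like the sketch below. It is purely illustrative: `mapNodirToOnlyFiles` is a hypothetical helper, not something this PR adds, and it assumes an explicitly passed `onlyFiles` should take precedence over `nodir`.

```javascript
// Hypothetical helper (not part of globby): translate the node-glob style
// `nodir` option into fast-glob's `onlyFiles`, leaving other options as-is.
// Assumption: an explicit `onlyFiles` wins over `nodir`.
function mapNodirToOnlyFiles(options = {}) {
	const {nodir, ...rest} = options;

	if (nodir !== undefined && rest.onlyFiles === undefined) {
		rest.onlyFiles = Boolean(nodir);
	}

	return rest;
}

console.log(mapNodirToOnlyFiles({nodir: true, cwd: 'src'}));
```

Such a shim only covers this one option; other `node-glob` options without a `fast-glob` counterpart would still break, which is the concern raised in the description.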
Benchmark changes
I added a benchmark with `.gitignore`. The comparison is relevant only between `globby (working directory)` and `globby (upstream/master)`, as other tools do not support reading the exclusions from a `.gitignore` file.
I changed the `negative globs (whole dir)` bench to use a more realistic pattern. The pattern `['a/*', '!a/**']` doesn't make much sense, as the two patterns just contradict each other. It also happens to be a very specific edge case that only `node-glob` optimizes very well. See mrmlnc/fast-glob#45
I replaced it with `['b/*', 'a/*', '!a/**']`, which is a more realistic (less edgy) use case and gives more valuable information regarding the performance of each solution.
TODO
`.gitignore` files (exclude comments for example)
`globby.gitignore.sync([options])`?
`nodir` to `onlyFiles`?