Switch to faster micromatch lib #332

Closed
AndyOGo opened this issue Apr 12, 2017 · 5 comments

Comments

@AndyOGo

AndyOGo commented Apr 12, 2017

Why switch to micromatch?

  • Native support for multiple glob patterns, no need for wrappers like multimatch
  • 10-55x faster and more performant than minimatch and multimatch. This is achieved through a combination of caching and regex optimization strategies, a fundamentally different approach than minimatch.
  • More extensive support for the Bash 4.3 specification
  • More complete extglob support
  • Extensive unit tests (approx. 1,300 tests). Minimatch fails many of the tests.

See: https://github.com/jonschlinkert/micromatch#why-switch-to-micromatch
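
For readers unfamiliar with the multimatch-style workflow, here is a minimal sketch of what matching one path against several patterns involves. Everything in it is hypothetical: `globToRegExp` and `compileMatcher` are illustrative helpers, not micromatch's or multimatch's API. The converter only understands `*`, `?` and `**`, applies no dotfile rules, and treats a leading `!` as an exclusion applied after all inclusions (an assumed semantics).

```typescript
// Hypothetical helper: convert one glob into an anchored RegExp.
// Supports only `*` (within a segment), `?` (one non-slash character) and
// `**` (`**/` = zero or more directories, a bare `**` = anything). No dotfile rules.
function globToRegExp(glob: string): RegExp {
  let src = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*" && glob[i + 1] === "*") {
      if (glob[i + 2] === "/") {
        src += "(?:.*/)?"; // zero or more leading directories
        i += 2;
      } else {
        src += ".*"; // bare `**`: match across slashes
        i += 1;
      }
    } else if (c === "*") {
      src += "[^/]*";
    } else if (c === "?") {
      src += "[^/]";
    } else {
      src += c.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp("^" + src + "$");
}

// Compile every pattern once. A path passes if it matches at least one
// inclusion pattern and no `!`-prefixed exclusion pattern.
function compileMatcher(patterns: string[]): (path: string) => boolean {
  const include: RegExp[] = [];
  const exclude: RegExp[] = [];
  for (const p of patterns) {
    if (p.startsWith("!")) exclude.push(globToRegExp(p.slice(1)));
    else include.push(globToRegExp(p));
  }
  return (path) =>
    include.some((re) => re.test(path)) && !exclude.some((re) => re.test(path));
}

const isSource = compileMatcher(["src/**/*.js", "!**/*.test.js"]);
const files = ["src/a.js", "src/lib/b.js", "src/a.test.js", "README.md"];
console.log(files.filter(isSource)); // src/a.js and src/lib/b.js
```

Compiling the pattern list once into a single predicate is what a "native" multi-pattern API saves over calling a single-pattern matcher in a loop.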

@curbengh

curbengh commented Oct 5, 2019

Check out https://github.com/micromatch/glob-fs and https://github.com/micromatch/bash-glob, both created by micromatch's author.

@isaacs
Owner

isaacs commented Feb 28, 2023

No.

There are several cases where micromatch does not implement bash semantics as faithfully as this library requires. These are explained in detail in the readme. Also, the benefit of using minimatch is that I maintain it, so when it needs to change to support glob, that's very easy for me to do. Either of these alone would be a dealbreaker, regardless of performance.

Furthermore, please be aware that benchmarks must be read very carefully, and can easily be misleading, even if the creator of those benchmarks isn't intending them to be.

The actual matching in glob is (a) not the slow part, and (b) not significantly improved by using micromatch's patterns rather than minimatch's. Micromatch creates patterns faster, but that happens exactly once per pattern per glob operation. The n for a glob operation is the number of files it has to test, the number of directories that need to be read, etc. If a module can generate a million patterns per ms, that's not significantly different from a module that can only generate 1 pattern per ms, because the glob walk takes tens (or, in large folder trees with complicated patterns, hundreds) of ms to complete, with variance well above ±1ms.
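
To make this cost model concrete, here is a toy sketch (not glob's or minimatch's code) in which a simulated walk counts how often it compiles versus how often it tests. `compileGlob`, `globWalk` and `WalkResult` are hypothetical, the compiler only understands `*`, and the list of entries stands in for directory reads, which in a real walk dominate the running time.

```typescript
// Toy glob compiler: only `*` (any run of non-slash characters) is special.
function compileGlob(glob: string): RegExp {
  const body = glob
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&")) // escape literals
    .join("[^/]*");
  return new RegExp("^" + body + "$");
}

interface WalkResult {
  matches: string[];
  compiles: number; // pattern compilations during this operation
  tests: number; // regex tests during this operation
}

// Stand-in for one glob operation. A real walk would call readdir/stat to
// produce `entries`; that I/O is the expensive part and is not modeled here.
function globWalk(pattern: string, entries: string[]): WalkResult {
  let compiles = 0;
  const re = compileGlob(pattern); // happens once per operation
  compiles++;
  let tests = 0;
  const matches = entries.filter((entry) => {
    tests++; // happens once per entry visited
    return re.test(entry);
  });
  return { matches, compiles, tests };
}

const entries = Array.from({ length: 10000 }, (_, i) => `file${i}.${i % 2 ? "js" : "ts"}`);
const result = globWalk("*.js", entries);
console.log(result.matches.length, result.compiles, result.tests); // 5000 1 10000
```

A compiler a thousand times faster would only shrink the single compile step; the per-entry tests and, above all, the file system work are untouched.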

I had wondered whether minimatch's anchoring and dot-preventing lookaheads imposed a performance penalty. To that end, I've tested extensively how they impact actual glob performance, and whether there's any benefit to using the regular expressions generated by picomatch or micromatch. There really isn't. Even running them in a tight loop, calling pattern.test(path) over and over again, the difference is not measurable. (There is often a difference, but it's owing to noise; it is inconsistent and not statistically significant.)
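
A minimal sketch of the kind of comparison described here. The two regexes are hand-written compilations of `*.js` for illustration (not the exact output of minimatch, micromatch or picomatch): one anchored with a `(?!\.)` dotfile lookahead, one anchored without it. `disagreements` and `timeLoop` are hypothetical helpers, and `Date.now()` timing is deliberately naive.

```typescript
// `guarded` is anchored and adds a `(?!\.)` lookahead so the leading `*`
// cannot match a dotfile; `bare` is anchored but has no lookahead.
const guarded = /^(?!\.)[^/]*\.js$/;
const bare = /^[^/]*\.js$/;

// Names on which the two regexes disagree (only names with a leading dot).
function disagreements(names: string[]): string[] {
  return names.filter((n) => guarded.test(n) !== bare.test(n));
}

// Naive tight-loop timing. Single runs like this are noisy; they need to be
// repeated and compared statistically before any difference means anything.
function timeLoop(re: RegExp, names: string[], iterations: number): { ms: number; hits: number } {
  const start = Date.now();
  let hits = 0;
  for (let i = 0; i < iterations; i++) {
    for (const n of names) {
      if (re.test(n)) hits++;
    }
  }
  // Returning `hits` keeps the loop's work observable to the caller.
  return { ms: Date.now() - start, hits };
}

const names = ["index.js", "lib.test.js", ".eslintrc.js", "README.md", "a/b.js"];
console.log(disagreements(names)); // only ".eslintrc.js"
console.log(timeLoop(guarded, names, 100000), timeLoop(bare, names, 100000));
```

The lookahead changes correctness (dotfiles) rather than cost in any way a loop like this can reliably detect, which is consistent with the observation above.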

Also, since by far the most common patterns in glob expressions are **, *.<some extension>, *, and some number of question marks followed by an extension, minimatch now has fast-path shortcuts for those patterns in particular. This means that, in a very practical sense, using minimatch with glob is both more correct and faster than using micromatch or picomatch, even though those libraries are faster when used directly.
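
A sketch of what such fast paths can look like for a single path segment, under stated assumptions: `compileSegment` and `Matcher` are hypothetical, this is not minimatch's implementation, and `**` is left out because it is usually handled by the directory walker rather than by a per-name check. Common shapes become plain string checks; everything else falls back to a regex.

```typescript
type Matcher = (name: string) => boolean;

// Hypothetical fast-path compiler for one path segment (a file name).
// Like a default glob, patterns that do not start with "." never match dotfiles.
function compileSegment(pattern: string): Matcher {
  // `*`: any non-empty name that is not a dotfile.
  if (pattern === "*") {
    return (name) => name.length > 0 && !name.startsWith(".");
  }

  // `*.ext`, where `.ext` has no glob, brace, extglob or escape characters.
  const starExt = /^\*(\.[^*?[\]{}()\\]+)$/.exec(pattern);
  if (starExt) {
    const ext = starExt[1];
    return (name) => !name.startsWith(".") && name.endsWith(ext);
  }

  // `???.ext`: exactly N characters before the extension. `length` counts
  // UTF-16 code units, a simplification for characters outside the BMP.
  const qExt = /^(\?+)(\.[^*?[\]{}()\\]+)$/.exec(pattern);
  if (qExt) {
    const ext = qExt[2];
    const total = qExt[1].length + ext.length;
    return (name) => name.length === total && !name.startsWith(".") && name.endsWith(ext);
  }

  // General case: translate `*` and `?` into a regex, guarding against a
  // leading dot unless the pattern itself starts with one.
  const guard = pattern.startsWith(".") ? "" : "(?!\\.)";
  const body = pattern
    .replace(/[.+^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, "[^/]*")
    .replace(/\?/g, "[^/]");
  const re = new RegExp("^" + guard + body + "$");
  return (name) => re.test(name);
}

const isJs = compileSegment("*.js");
console.log(["index.js", ".eslintrc.js", "index.ts"].map(isJs)); // true, false, false
```

The design choice is simply that `startsWith`/`endsWith`/`length` checks avoid regex evaluation entirely for the shapes that dominate real-world usage.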

@isaacs isaacs closed this as completed Feb 28, 2023
@AndyOGo
Author

AndyOGo commented Mar 1, 2023

Thank you for your answer @isaacs.

I wonder, then, why these statements are made.
May I ask you to provide concrete sources?

https://github.com/micromatch/micromatch#why-use-micromatch

How does micromatch handle security concerns?

  • Safe - Micromatch is not subject to DoS with brace patterns like minimatch and multimatch.
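
The brace-pattern DoS concern comes from combinatorial growth: each {a,b} group multiplies the number of expanded patterns, and a numeric range such as {1..1000000} contributes a million on its own. Below is a minimal sketch of a guard that estimates the expansion size before expanding anything. `braceExpansionCount`, `assertSafeBraces` and the limit are hypothetical; only flat (non-nested) groups are handled, and it does not describe how either library actually handles this.

```typescript
// Estimate how many strings a brace pattern expands to, without expanding it.
// Handles flat `{a,b,c}` lists and `{x..y}` integer ranges only; nested groups,
// letter ranges like `{a..z}` and step values are NOT handled (undercounted).
function braceExpansionCount(pattern: string): number {
  const group = /\{([^{}]*)\}/g;
  let total = 1;
  let m: RegExpExecArray | null;
  while ((m = group.exec(pattern)) !== null) {
    const body = m[1];
    const range = /^(-?\d+)\.\.(-?\d+)$/.exec(body);
    if (range) total *= Math.abs(Number(range[2]) - Number(range[1])) + 1;
    else if (body.includes(",")) total *= body.split(",").length;
    // A group with neither a comma nor a range (e.g. `{a}`) stays literal in bash.
  }
  return total;
}

const MAX_EXPANSIONS = 10000; // arbitrary illustrative limit

function assertSafeBraces(pattern: string): void {
  const n = braceExpansionCount(pattern);
  if (n > MAX_EXPANSIONS) {
    throw new RangeError(`brace pattern expands to ${n} strings (limit ${MAX_EXPANSIONS})`);
  }
}

console.log(braceExpansionCount("src/{a,b}/*.{js,ts}")); // 4
console.log(braceExpansionCount("{a,b}".repeat(20))); // 1048576
try {
  assertSafeBraces("file{1..1000000}.txt");
} catch (e) {
  console.log((e as Error).message); // brace pattern expands to 1000000 strings (limit 10000)
}
```

Counting before expanding keeps the check cheap: it is linear in the pattern length, while the expansion itself can be exponential.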

@isaacs
Owner

isaacs commented Mar 1, 2023

I'm not really the one to ask about those statements, since I didn't make them ;)

The differences I've found between Bash and micromatch/picomatch are explained in some detail in the readme on this repo, under the "Comparisons with Other JavaScript Glob Implementations" heading. (One point missing there: their POSIX character classes are not as accurate as minimatch's; for example, [[:alpha:]] won't match 'é'. But minimatch supporting those at all is a very recent development.) Micromatch passes the spec tests in bash 4.3's test suite, but there are clear examples where it doesn't behave the same as bash, as shown by the extensive test suite that minimatch and glob use, which runs a battery of extreme examples through bash 5 and verifies that they are all handled the same way. Minimatch isn't 100% correct either, though; some edge cases are nearly impossible to handle well in a regular expression, but it's much closer.
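
To illustrate the [[:alpha:]] point: a POSIX class translated to an ASCII-only range misses letters such as 'é', while a Unicode property escape matches them. Both translations below are assumptions for illustration, not the exact regexes minimatch or micromatch generate.

```typescript
// ASCII-only translation of [[:alpha:]].
const asciiAlpha = new RegExp("^[a-zA-Z]$");
// Unicode-aware translation; property escapes require the `u` flag.
const unicodeAlpha = new RegExp("^\\p{Alphabetic}$", "u");

// "\u00e9" is the precomposed "é"; "\u00df" is "ß".
for (const ch of ["a", "Z", "\u00e9", "\u00df", "1"]) {
  console.log(JSON.stringify(ch), asciiAlpha.test(ch), unicodeAlpha.test(ch));
}
```

"é" and "ß" fail the ASCII version and pass the Unicode one; "1" fails both.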

Micromatch is faster, as I said, but not in any way that matters for glob's use case. If you're evaluating a lot of patterns that you essentially test once and throw away, especially in a hot path, then yes, it'll be worthwhile to use micromatch or picomatch, if bash coherence isn't a dealbreaker for you (and there are a lot of cases like that). But glob doesn't do that: it parses the pattern once, in well under 1ms (whether using micromatch or minimatch), then spends absolute ages by comparison doing a ton of file system operations.
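
In the hot-path case, compile cost does matter, and caching (which the micromatch readme quoted at the top of this issue credits for part of its speed) helps whenever the same patterns recur. Below is a minimal sketch of memoized compilation; `compile`, `cachedTest` and the unbounded `Map` cache are illustrative assumptions, not either library's implementation. For truly one-off patterns a cache does nothing, and raw compile speed is what counts.

```typescript
// Counts real compilations, only to make the caching visible.
let compileCount = 0;
const cache = new Map<string, RegExp>();

// Toy compiler: only `*` (any run of non-slash characters) is special.
function compile(glob: string): RegExp {
  compileCount++;
  const body = glob
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&")) // escape literals
    .join("[^/]*");
  return new RegExp("^" + body + "$");
}

// Each distinct pattern string is compiled once; repeats reuse the cached RegExp.
// The cache is unbounded here; a long-running process would want a size limit.
function cachedTest(glob: string, path: string): boolean {
  let re = cache.get(glob);
  if (re === undefined) {
    re = compile(glob);
    cache.set(glob, re);
  }
  return re.test(path);
}

const paths = ["a.js", "b.ts", "c.js"];
for (let i = 0; i < 1000; i++) {
  for (const p of paths) cachedTest("*.js", p);
}
console.log(compileCount); // 1
```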

The security thing is sort of weird to bring up. Minimatch had a ReDoS vulnerability that was reported and swiftly fixed. I assume that micromatch would have done the same thing (or may even have done so in the past), as this is a pretty normal thing in any software library that processes strings.

So I'm not saying one is "better" than the other, just that the tradeoffs of using minimatch make it a much better, um, match for this library.

@AndyOGo
Author

AndyOGo commented Mar 1, 2023

Alright.

Thank you for taking the time to give a thorough answer.
