-
Notifications
You must be signed in to change notification settings - Fork 29.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deps: swap minimatch
with micromatch
#53547
Conversation
Review requested:
|
I’m not entirely certain why I am getting The tests pass if I do |
You always need to use full paths, starting from (That being said I haven't checked the source of the error, but rather the message itself, so it's possible a different issue is causing the error) |
First of all thanks for your contribution. cc @MoLow A few issues:
(As a side note https://github.com/jonschlinkert/maintainers-guide-to-staying-positive @jonschlinkert this is neat) |
That means the tests broke :]
Note that internally, Node.js does not use the same loader for its own code. JavaScript in core gets bundled into the Node executable itself - so not all CommonJS loader options are supported (check the code under lib/ for examples) |
@benjamingr Sure, he literally requested it though 😆 |
Ah lol, I didn't realize he left the original comment :D |
Ok, so am I right in thinking we are talking about the relative paths within # update-micromatch.sh
"$NODE" "$NPM" pkg set scripts.node-build="esbuild ./index.js --bundle --platform=node --outfile=index.js --allow-overwrite" |
if the motivation for this change is mainly performance can you please add a benchmark so we can compare the numbers? |
FWIW According to https://github.com/micromatch/picomatch#benchmarks:
|
Yes, those benchmarks exist, but I expect this to come first with benchmark files inside this project so we can use the same methodology and infrastructure we use for other benchmarks in the project |
minimatch
with picomatch
minimatch
with micromatch
Not only performance, but also other reasons, best outlined here: Why use micromatch? It’s basically a faster, more reliable superset of minimatch. My personal motivation is picomatch (and micro if/when I realised we should use micromatch here over pico, to maintain brace expansion… See also: library-comparisons.
@RedYetiDev See also micromatch benchmarks: # .makeRe star
micromatch x 2,232,802 ops/sec ±2.34% (89 runs sampled))
minimatch x 781,018 ops/sec ±6.74% (92 runs sampled))
# .makeRe star; dot=true
micromatch x 1,863,453 ops/sec ±0.74% (93 runs sampled)
minimatch x 723,105 ops/sec ±0.75% (93 runs sampled)
# .makeRe globstar
micromatch x 1,624,179 ops/sec ±2.22% (91 runs sampled)
minimatch x 1,117,230 ops/sec ±2.78% (86 runs sampled))
# .makeRe globstars
micromatch x 1,658,642 ops/sec ±0.86% (92 runs sampled)
minimatch x 741,224 ops/sec ±1.24% (89 runs sampled))
# .makeRe with leading star
micromatch x 1,525,014 ops/sec ±1.63% (90 runs sampled)
minimatch x 561,074 ops/sec ±3.07% (89 runs sampled)
# .makeRe - braces
micromatch x 172,478 ops/sec ±2.37% (78 runs sampled)
minimatch x 96,087 ops/sec ±2.34% (88 runs sampled)))
# .makeRe braces - range (expanded)
micromatch x 26,973 ops/sec ±0.84% (89 runs sampled)
minimatch x 3,023 ops/sec ±0.99% (90 runs sampled))
# .makeRe braces - range (compiled)
micromatch x 152,892 ops/sec ±1.67% (83 runs sampled)
minimatch x 992 ops/sec ±3.50% (89 runs sampled)d))
# .makeRe braces - nested ranges (expanded)
micromatch x 15,816 ops/sec ±13.05% (80 runs sampled)
minimatch x 2,953 ops/sec ±1.64% (91 runs sampled)
# .makeRe braces - nested ranges (compiled)
micromatch x 110,881 ops/sec ±1.85% (82 runs sampled)
minimatch x 1,008 ops/sec ±1.51% (91 runs sampled)
# .makeRe braces - set (compiled)
micromatch x 134,930 ops/sec ±3.54% (63 runs sampled))
minimatch x 43,242 ops/sec ±0.60% (93 runs sampled)
# .makeRe braces - nested sets (compiled)
micromatch x 94,455 ops/sec ±1.74% (69 runs sampled))
minimatch x 27,720 ops/sec ±1.84% (93 runs sampled))
@MoLow I’ll look into adding these as the last step of this PR… Presumably following writing-and-running-benchmarks. Also, looks like this will have to be rebased, and take into account #52881, which just landed… cc: @jonschlinkert |
@@ -133,7 +132,7 @@ class Pattern { | |||
isLast(isDirectory) { | |||
return this.indexes.has(this.last) || | |||
(this.at(-1) === '' && isDirectory && | |||
this.indexes.has(this.last - 1) && this.at(-2) === lazyMinimatch().GLOBSTAR); | |||
this.indexes.has(this.last - 1) && this.at(-2) === lazyMicromatch().GLOBSTAR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think micromatch has a .GLOBSTAR
export.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think micromatch has a
.GLOBSTAR
export.
Yeah, do you think I should I add a definition somewhere to fs
constants, or in an upstream PR to micromatch? Something like…
const GLOBSTAR = Symbol('globstar **');
What do you think? @RedYetiDev @jonschlinkert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, I'm not a core collaborator, but https://github.com/nodejs/node/blob/main/doc/contributing/using-symbols.md may help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, do you think I should I add a definition somewhere to
fs
constants, or in an upstream PR to micromatch? Something like…
Let me know if you want me to help with anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a core collaborator, and these are my suggestions, but they aren't blocking.
(From here I'll leave the reviewing to the collaborators, I just wanted to get involved because this involved a change I recently landed)
lib/internal/fs/glob.js
Outdated
let micromatch; | ||
function lazyMicromatch() { | ||
micromatch ??= require('internal/deps/micromatch/index'); | ||
micromatch.GLOBSTAR = Symbol('globstar **'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this be the redudant creation of a Symbol?
You are redefining it everytime we call Micromatch.
A better approach would be:
if (micromatch == null) {
micromatch = require('internal/deps/micromatch/index');
micromatch.GLOBSTAR = Symbol('globstar **');
}
Or maybe, define GLOBSTAR
somewhere else for use here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this be the redudant creation of a Symbol? You are redefining it everytime we call Micromatch.
A better approach would be:
if (micromatch == null) { micromatch = require('internal/deps/micromatch/index'); micromatch.GLOBSTAR = Symbol('globstar **'); }
Maybe, or just ??=
.
Or maybe, define
GLOBSTAR
somewhere else for use here?
I thought about that, as a constant elsewhere. But not yet sure if it should instead be a PR to micro/picomatch, or even if it’s strictly necessary to expose as a Symbol
at all…
So the above implementation is most likely temporary, anyway.
Thank you!
This definitely has my blessing. I'm willing to put in extra time or add collaborators to make sure it's maintained. If there are specific things you want my help answering/resolving let me know. |
I have a few questions of my own, if you don't mind:
|
There's some context that seems to be missing in this conversation. PerformanceIt doesn't matter. Really, in this case, it does not rise to the level that anyone should care about the performance one way or another. Making something a million or even a billion times faster won't make a difference if that thing is less than 0.01% of the overall time spent. (And picomatch is not 1M times faster than minimatch, anyway.) As far as I know, glob pattern parsing is not directly exposed. It's only exposed in parsing the arguments to the test runner, and The benchmarks referenced here are for the The If the goal is to make fs.glob faster, then the algorithm needs to be changed to be more like that of If there's no benchmark showing a performance improvement for node (ie, not just a microbenchmark of a single arbitrary function pulled out of the API) then there is no performance improvement, and nothing to discuss. Correctness/CompletenessThere does not exist a 100% fully complete and correct "glob" implementation in JavaScript, for the simple reason that there is no 100% agreed-upon spec or definition of what exactly that means. Thus, it is important (as I argued in the fs.glob PR) for any given platform to precisely identify how its globs work, rather than leaving it up to users' intuitions. Minimatch (and really, glob) defines Bash 5.latest as the targeted reference implementation. There are a small number of cases where it cannot match Bash's behavior exactly, either due to performance considerations or just fundamental limitations of JavaScript regular expressions. Those cases are explicitly defined and tested for. I considered using picomatch or micromatch in the glob 7 rewrite. I abandoned the idea when it became clear at the time that there was no way to correctly support every combination of braces, extglobs, and globstar patterns, with identical semantics to bash, without rewriting it entirely anyway. So instead, I rewrote minimatch to support these things properly and expose the interfaces glob needed. If the goal of the project is to allow Do not assume that "the tests pass" means all edge cases are fully covered. The tests that node-glob and minimatch use to ensure coherence with Bash semantics are much more extensive than the glob tests in bash itself, for example. See also:
So does Minimatch.
Citation needed. I don't care about OSS rivalry; too many people use my code as it is. If micromatch or picomatch are a better fit for node's goals, then they can use those with my blessing, of course. Please present information fairly and with full context so that the project maintainers can make rational decisions. |
I'm sorry to hear that you feel that way. My goal is to focus on the technical aspects, not to debate. Instead, I'll just focus on the point I made. Minimatch is inefficient, starting with the smallest brace pattern, and it gets exponentially more inefficient with each character added to the string. Here is a simple "benchmark" that people can try for themselves with the latest minimatch, v9.0.4, and the current micromatch. const minimatch = require('minimatch');
const micromatch = require('..');
let start = Date.now();
console.log(micromatch.makeRe('foo/{1..1000000}/bar')); //=> 35 byte regex -> /^(?:foo[\\/][1-1000000][\\/]bar)$/
console.log(`micromatch.makeRe: ${Date.now() - start}`);
// micromatch.makeRe: 5ms
start = Date.now();
console.log(minimatch.makeRe('foo/{1..1000000}/bar')); //=> 16.9 MB regex
console.log(`minimatch.makeRe: ${Date.now() - start}`);
// minimatch.makeRe: 5099ms (~85s) The output from minimatch was too large to paste into the issue. If you add one more zero, minimatch freezes: const minimatch = require('minimatch');
const micromatch = require('..');
let start = Date.now();
console.log(micromatch.makeRe('foo/{1..10000000}/bar')); //=> 35 byte regex -> /^(?:foo[\\/][1-10000000][\\/]bar)$/
console.log(`micromatch.makeRe: ${Date.now() - start}`);
// micromatch.makeRe: 8ms
start = Date.now();
console.log(minimatch.makeRe('foo/{1..10000000}/bar'));
console.log(`minimatch.makeRe: ${Date.now() - start}`);
// FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
I agree, at least partially, with your perspective on this in a general sense. I've been frustrated with vulnerability-hunters like Snyk because they create contrived and very unlikely scenarios to demonstrate what they claim to be a vulnerability. Honestly, I had a vulnerability report from them on Enquirer, a prompting library for the terminal. What exactly is the attack vector? And why would anyone loop over a prompt 10,000 times? I think those cases are silly. But this one isn't, because I'm demonstrating that when minimatch receives valid input without any loops, trickery, or external contrived scenarios, it causes a JavaScript heap out of memory error due to ineffective mark-compacts near the heap limit resulting in allocation failures. Anyone who actually sees the generated regular expressions firsthand will understand immediately what the problem is. Sure, we could lop some zeros off of the pattern, but that misses the point: Minimatch and micromatch take completely different approaches to generating patterns, and this is just one convenient example to demonstrate it.
No, it's not that much faster. There might even be some benchmarks where minimatch is faster if you want to do a PR to add them. Since you mentioned something about being outdated, here is minimatch@9.0.4 (latest AFAIK) against the current picomatch@4.0.2:
Again, minimatch freezes on the range expansion. Yes, that might represent a small number of cases, or be extremely unlikely to happen, but the benchmark isn't about the pattern. The point is to demonstrate how the underlying code was designed. I think we should continue exploring this. It seems there is no good reason not to, even if there are "minor" differences in performance. Small optimizations compound geometrically, especially in large dependency trees. |
In my opinion, this should first be opened as an issue, where the potential change can be discussed. @benjamingr @MoLow WDYT? Additionally, in my opinion, this is more complicated than a "drop in replacement", and I think that, if this is to be implemented, @jonschlinkert should be one to do it, as he knows the most about the micromatch. (I don't mean this in any way against @danielbayley, your work has been excellent, and truly appreciated, I just think that a maintainer's touch is helpful) |
I appreciate your perspective @RedYetiDev, I'm here, and offering my help so far as it's needed. I'm willing to do patches, or a major bump with whatever changes are necessary. But I probably wouldn't repeat the hard work that's already been done on this PR. Changes can be made, and commits can be squashed, and I'm happy to help, so long as the core team is open to exploring this. I |
@jonschlinkert but this benchmark is still testing a known-limited convenience method that node doesn't use or expose. makeRe is irrelevant. Show a meaningful difference in a benchmark of what node is actually doing, in the way it is doing it. Until and unless that is done, "performance" should not be a part of this conversation. Optimizing a function that accounts for virtually none of the overhead, is just polishing doorknobs. Optimizing a function it doesn't even call is even sillier. |
Honestly this just sounds like an emotionally charged conversation, and I don't think it's going to benefit anyone for me to get into a battle with you over this. The community deserves better than this attitude. If someone wants to talk about how we can make node more performant, I'm interested in that conversation. |
as I said, the actual benchmarks we should look at are benchmarks for |
Hi, everyone. I don't know the history here, but I'm really uncomfortable with all the talk about "lies" and so on and would really appreciate if we could focus on the content here. Getting back to the content: I basically agree that if there's no relevant benchmark for Node.js showing a performance boost for the way we use minimatch, no existing ReDoS or other vulnerability, and no bug that this change fixes, then this seems like a very large (and therefore dangerous) change for no clear benefit. It does seem very unlikely that this will affect Node.js performance in any measurable way. Microbenchmarks of a single API don't seem relevant. I won't oppose something like this if there is consensus among collaborators, but I'm not seeing that either. I appreciate the effort extended to make this happen, and I hope it's not too discouraging, but I don't expect this to land. The big thing that would change my mind would be a benchmark for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making my -1 explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1. I'd be down if this had relevant benchmarks of Node.js with the two modules, showing an improvement of even >=10%. So far I've not seen that, happy to reconsider if those do show up. Otherwise, I'm wary of introducing dependency changes that our end-users won't obviously benefit from.
This has two hard blocks (mcollina, bnb) and at least two soft blocks (MoLow, me) with no collaborators expressing enthusiasm. Given that, I'm going to close it. However, if I'm just being hasty and you're putting together benchmarks right now and you feel like a persuasive case can be made, feel free to re-open (if the GitHub interface lets you) or leave a comment requesting that it be re-opened and I or some other collaborator or triager will re-open it. Thanks again for the PR. I know it's frustrating when you do a lot of work and a bunch of people who didn't help much are all "Nah, sorry." But in this case, I'm not sure what to do about that. |
Given the close, I've removed some of the additional labels associated with the file changes. This is so finding this issue may be simpler. (Keeping Feel free to re-add labels |
As mentioned in #51912 (comment) by @jonschlinkert:
This PR aims to implement what should be a drop-in replacement of minimatch with micromatch, for reasons stated above.
This might be a
notable-change
?