Copying an RE2 without recompiling / how best to replace inline regex literals #162
-
When you create a new RE2 from an old one, as in the README:

```javascript
var re1 = new RE2(/ab*/ig); // from a RegExp object
var re2 = new RE2(re1);     // from another RE2 object
```

does that still recompile the regex, or is it a very cheap operation?

The reason I ask is that I'm trying to find the best way to replace an inline regex literal with RE2. To avoid the cost of recompiling the regex on every execution, I can define it as a top-level constant. But then the shared object is mutable. If copying is cheap, then

```javascript
var re = new RE2(...);

function foo() {
  new RE2(re).test(...);
}
```

would be a pattern that avoids both the mutability and the recompilation. If not (or even if so), is there a better approach? Thanks in advance.
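A minimal sketch of the "compile once, copy per call" pattern from the question. It uses the built-in `RegExp` so it runs without the native `re2` module; the assumption is that `RE2`, which aims to mirror the `RegExp` interface, would work the same way with `new RE2(WORD)` in place of `new RegExp(WORD)`. The names `WORD` and `allWords` are hypothetical.

```javascript
// Hypothetical module-level pattern, created once.
// With the `re2` package this would be something like `new RE2(/\w+/g)`.
const WORD = /\w+/g;

// Collect every word in `text` without advancing the shared WORD object.
function allWords(text) {
  // Per-call copy: it starts with lastIndex 0 and tracks its own lastIndex,
  // so interleaved or reentrant callers never see each other's state.
  const re = new RegExp(WORD); // assumed RE2 analogue: new RE2(WORD)
  const words = [];
  let match;
  while ((match = re.exec(text)) !== null) {
    words.push(match[0]);
  }
  return words;
}

console.log(allWords("copy without recompiling")); // [ 'copy', 'without', 'recompiling' ]
console.log(WORD.lastIndex); // 0: the shared pattern was never advanced
```

Whether the copy is actually cheaper than compiling from a string is exactly what the benchmark in the reply below measures.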
Replies: 2 comments
-
It is a good question. Creating an RE2 object from a different RE2 object copies all properties understood by the constructor (…). But let me do some performance testing and get back to you.
-
I finally got around to measuring, and copying an RE2 object is faster because it saves on internal translations:

```javascript
const re = /A(B|C+)+D/,
  re2 = new RE2(re);

new RE2('A(B|C+)+D');
new RE2(re);
new RE2(re2); // the fastest of these three
```

If you can safely reuse the object, code like this can be the fastest:

```javascript
const lastIndex = re2.lastIndex;
re2.exec(string);
re2.lastIndex = lastIndex;
```

Raw data

The test:

```javascript
import show from 'nano-bench/show.js';
import RE2 from 're2';

const re = /A(B|C+)+D/,
  re2 = new RE2(re),
  string = 'ACCCCCCCCCCCCCCCCX';

console.log('RegExp:');

await show({
  'regexp-inline': n => {
    for (let i = 0; i < n; ++i) /A(B|C+)+D/.test(string);
  },
  'regexp-copy': n => {
    for (let i = 0; i < n; ++i) new RegExp(re).test(string);
  },
  'regexp-reuse': n => {
    for (let i = 0; i < n; ++i) {
      const lastIndex = re.lastIndex;
      re.test(string);
      re.lastIndex = lastIndex;
    }
  }
});

console.log('\nRE2:');

await show({
  're2-inline': n => {
    for (let i = 0; i < n; ++i) new RE2('A(B|C+)+D').test(string);
  },
  're2-copy-regexp': n => {
    for (let i = 0; i < n; ++i) new RE2(re).test(string);
  },
  're2-copy-re2': n => {
    for (let i = 0; i < n; ++i) new RE2(re2).test(string);
  },
  're2-reuse': n => {
    for (let i = 0; i < n; ++i) {
      const lastIndex = re2.lastIndex;
      re2.test(string);
      re2.lastIndex = lastIndex;
    }
  }
});
```

I used an "evil" regular expression and a specially tailored string to give …

Results:

```
RegExp:
measuring (confidence interval: 95%, series: 100, bootstrap samples: 1,000) ...
"regexp-inline": median 260.9μs +10.1μs -5.5μs for 100 iterations in 100 series
"regexp-copy": median 259.3μs +9.6μs -4.4μs for 100 iterations in 100 series
"regexp-reuse": median 260.9μs +7.2μs -5.4μs for 100 iterations in 100 series
The difference is STATISTICALLY SIGNIFICANT!
Statistically significant difference between groups:
"regexp-inline" and "regexp-copy"
"regexp-copy" and "regexp-reuse"

RE2:
measuring (confidence interval: 95%, series: 100, bootstrap samples: 1,000) ...
"re2-inline": median 4.99μs +0.29μs -0.19μs for 5,000 iterations in 100 series
"re2-copy-regexp": median 4.95μs +0.31μs -0.20μs for 5,000 iterations in 100 series
"re2-copy-re2": median 4.86μs +0.27μs -0.21μs for 5,000 iterations in 100 series
"re2-reuse": median 247.6ns +4.5ns -1.9ns for 100k iterations in 100 series
The difference is STATISTICALLY SIGNIFICANT!
Statistically significant difference between groups:
"re2-inline" and "re2-copy-re2"
"re2-inline" and "re2-reuse"
"re2-copy-regexp" and "re2-copy-re2"
"re2-copy-regexp" and "re2-reuse"
"re2-copy-re2" and "re2-reuse"
```

Notice the difference between "μs" and "ns". When using trivial regular expressions and a simple string …
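The save-and-restore idiom used in `re2-reuse` can be wrapped in a small helper so a shared, module-level object never leaks `lastIndex` changes between calls. This is a sketch, not part of the `re2` API: `execPreservingLastIndex` and `shared` are hypothetical names, and the demo uses a plain `RegExp` so it runs without the native module. The assumption is that an RE2 object, which exposes the same `lastIndex`/`exec` members, behaves the same way.

```javascript
// Hypothetical helper: run exec() on a shared regex object, then put its
// lastIndex back so the next caller starts from the same state.
// Works with RegExp; assumed to work with RE2 objects too (same members).
function execPreservingLastIndex(re, string) {
  const lastIndex = re.lastIndex; // remember the shared state
  try {
    return re.exec(string);
  } finally {
    re.lastIndex = lastIndex; // restore it even if exec throws
  }
}

const shared = /a+/g; // stand-in for a module-level `new RE2(...)`
const first = execPreservingLastIndex(shared, "xxaaa");
const second = execPreservingLastIndex(shared, "xxaaa");
console.log(first.index, second.index, shared.lastIndex); // 2 2 0
```

Without the restore, the first call would leave `lastIndex` at 5 (the `g` flag makes `exec` advance it), and the second call would start searching past the match and return `null`.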