I've been investigating recently why `first-mate` takes so long to tokenize files with very long lines. For reference, here's the current performance (in milliseconds):

As you can see, it takes around 23 minutes to fully tokenize `jquery.min.js`, which is absolutely unacceptable.

It turns out the reason for this is that we haven't been utilizing the caching that Oniguruma offers. Here's a breakdown of the history:

- `node-oniguruma` is created, seemingly without any caching mechanisms.
- September 2015: alexandrudima fixes the caching mechanism, which is published in `oniguruma@4.2.4` and drops tokenization time down to slightly less than double that of `jquery.js`. The artificial line limit is kept in place in Atom (it doesn't look like anyone noticed the PR).
- December 2015: alexandrudima fundamentally changes the caching mechanism. Unfortunately I cannot give a benchmark comparison here, as he is now using `vscode-textmate` instead of `first-mate` for benchmarking, but it appears to provide a 30x speedup for files containing multibyte characters. As this is a breaking change, it is published as `oniguruma@6`. `first-mate` is kept at `oniguruma@5`.
- August 2016: `first-mate` is upgraded to `oniguruma@6` without implementing the necessary changes to enable caching. Result: we are back to insane tokenization times for minified files.
In order to enable caching, it appears that we need to send Oniguruma an `OnigString` of the line we want to tokenize, rather than a JavaScript `String`. Unfortunately, I have thus far been unable to make this work, as I get differing results depending on whether I pass in an `OnigString` or a `String`.
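As an illustration (this is a minimal sketch, not the actual first-mate code, and the patterns and input line are made up), the intended usage with node-oniguruma's `OnigScanner`/`OnigString` API would look roughly like this:

```js
// Minimal sketch: scan a single line repeatedly, passing an OnigString so
// Oniguruma can reuse its search cache across findNextMatchSync calls on the
// same line, instead of re-converting a plain JavaScript String every call.
// The patterns and the input line below are illustrative only.
const {OnigScanner, OnigString} = require('oniguruma');

const scanner = new OnigScanner(['\\bvar\\b', '\\bfunction\\b']);
const line = new OnigString('var f = function () {};');

let position = 0;
let match;
while ((match = scanner.findNextMatchSync(line, position)) !== null) {
  // match.index is the index of the pattern that matched;
  // captureIndices[0] spans the whole match.
  const {start, end} = match.captureIndices[0];
  console.log(`pattern ${match.index} matched at [${start}, ${end})`);
  position = end;
}
```

In principle this should return exactly the same `captureIndices` as passing the plain string; the differing results described above are the part that still needs to be tracked down.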
/cc: @nathansobo