I've been investigating recently why `first-mate` takes so long to tokenize files with very long lines. For reference, here's the current performance (in milliseconds):

As you can see, it takes around 23 minutes to fully tokenize `jquery.min.js`, which is absolutely unacceptable.

It turns out the reason for this is that we haven't been utilizing the caching that Oniguruma offers. Here's a breakdown of the history:

- `node-oniguruma` is created, seemingly without any caching mechanisms.
- September 2015: alexandrudima fixes the caching mechanism, which is published in `oniguruma@4.2.4` and drops tokenization time down to slightly less than double that of `jquery.js`. The artificial line limit is kept in place in Atom (it doesn't look like anyone noticed the PR).
- December 2015: alexandrudima fundamentally changes the caching mechanism. Unfortunately I cannot give a benchmark comparison here, as he is now using `vscode-textmate` instead of `first-mate` for benchmarking, but it appears to provide a 30x speedup for files containing multibyte characters. As this is a breaking change, it is published as `oniguruma@6`. `first-mate` is kept at `oniguruma@5`.
- August 2016: `first-mate` is upgraded to `oniguruma@6` without implementing the necessary changes to enable caching. Result: we are back to insane tokenization times for minified files.
In order to enable caching, it appears that we need to send Oniguruma an `OnigString` of the line we want to tokenize, rather than a JavaScript `String`. Unfortunately, I have thus far been unable to make this work, as I get differing results depending on whether I pass in an `OnigString` or a `String`.
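As an illustration (this is a minimal sketch, not the actual first-mate code, and the patterns and input line are made up), the intended usage with node-oniguruma's `OnigScanner`/`OnigString` API would look roughly like this:

```js
// Minimal sketch: scan a single line repeatedly, passing an OnigString so
// Oniguruma can reuse its search cache across findNextMatchSync calls on the
// same line, instead of re-converting a plain JavaScript String every call.
// The patterns and the input line below are illustrative only.
const {OnigScanner, OnigString} = require('oniguruma');

const scanner = new OnigScanner(['\\bvar\\b', '\\bfunction\\b']);
const line = new OnigString('var f = function () {};');

let position = 0;
let match;
while ((match = scanner.findNextMatchSync(line, position)) !== null) {
  // match.index is the index of the pattern that matched;
  // captureIndices[0] spans the whole match.
  const {start, end} = match.captureIndices[0];
  console.log(`pattern ${match.index} matched at [${start}, ${end})`);
  position = end;
}
```

In principle this should return exactly the same `captureIndices` as passing the plain string; the differing results described above are the part that still needs to be tracked down.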
/cc: @nathansobo