Performance issue with long lines and repeated indexOf('\n') searches
#298
Comments
Thank you! That's a beautifully investigated and reported bug. I'm pretty sure that the lexer's So as a first step, I added caching to @MikeWeller Would you be able to verify those results on your machine by testing the current head of the
I'll test it out soon. One thing to be careful of:
I think the cached
I can confirm these changes show a similar improvement to what I reported with my "hacks".
Good catch. Took me a while to figure out why the stream tests weren't complaining about this, and it's because @MikeWeller, please close this issue if you think that the performance issue is solved by these changes?
Problem solved, thanks for the quick turnaround!
Describe the bug
While parsing long lines with no line breaks, an excessive amount of time is spent in the lexer performing repeated this.buffer.indexOf('\n') operations.
We are parsing some very large (1MB+) lines of minified YAML containing the "flow" style of arrays/maps. This takes far longer than it should.
To Reproduce
Expected behaviour
This should parse in just a second or so (on my machine). It actually takes 28s.
Versions (please complete the following information):
v2.0.0-7
Additional context
If I run the above code and profile with the Chrome profiler I see ~28s spent in 'parseDocument':
The majority of this time is spent in parseFlowCollection.
Specifically, we spend a lot of time in two indexOf calls in getLine and parseQuotedScalar:
Note: the lines are offset by one due to some issue on my end.
What appears to be happening is that every time we lex one of the {} flow collections, we search for the final newline on the line, which is millions of characters ahead. We do this for every flow collection on the line and don't cache the result of the search. This results in many expensive indexOf searches.
I hacked some memoization/caching of the newline search into the code (I won't share it because it was very ugly and only meant to verify the performance issue), and after my changes the total parse time drops to just ~1s (from 28s):
And the time spent in 'parseFlowCollection' is now just 177ms (down from 26s):
I'm guessing there are other places in the code that may have similar indexOf-type searches, but I happened to hit this one due to parseFlowCollection.
I'm not really sure what the best solution to this is, but I'm happy to work on a fix if we can agree on a best approach/design/whatever. Thanks!
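To make the idea concrete, here is a minimal sketch of the kind of newline-search caching described above. It is not the library's lexer and not the hack I used: the LineScanner class and its lineEndUncached, lineEnd, and advance methods are hypothetical names invented for illustration, and the sketch assumes the buffer is never appended to or trimmed while scanning.

```javascript
// Hypothetical sketch: LineScanner and its methods are illustrative names,
// not part of the yaml library's actual lexer.
class LineScanner {
  constructor(buffer) {
    this.buffer = buffer
    this.pos = 0
    // Cached index of the next '\n' at or after pos; null means "not computed yet".
    this.cachedLineEnd = null
    // Counts real indexOf scans, for demonstration only.
    this.searches = 0
  }

  // Uncached: every call rescans from pos towards the end of the line.
  lineEndUncached() {
    this.searches += 1
    const i = this.buffer.indexOf('\n', this.pos)
    return i === -1 ? this.buffer.length : i
  }

  // Cached: rescan only once pos has moved past the cached line end.
  lineEnd() {
    if (this.cachedLineEnd === null || this.cachedLineEnd < this.pos) {
      this.searches += 1
      const i = this.buffer.indexOf('\n', this.pos)
      this.cachedLineEnd = i === -1 ? this.buffer.length : i
    }
    return this.cachedLineEnd
  }

  advance(n) {
    this.pos += n
  }
}

// One long line holding 1000 small flow collections, then a single newline.
const line = '{a: 1}, '.repeat(1000) + '\n'
const naive = new LineScanner(line)
const cached = new LineScanner(line)
for (let i = 0; i < 1000; i++) {
  naive.lineEndUncached()
  naive.advance(8)
  cached.lineEnd()
  cached.advance(8)
}
console.log(naive.searches, cached.searches) // prints "1000 1"
```

Each uncached call scans all the way to the end of the line, so the total work grows roughly quadratically with the line length, while the cached version scans the line once. A lexer that receives its input in chunks would also need to reset the cached value whenever the buffer changes, which this sketch does not handle.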