Caching lookaheads for speed gains #1822
Running a rather large markdown file through the demo page and looking at the Google Inspector profiler, I noticed a lot of time spent on the paragraph regex. Given that paragraphs are probably one of the most common elements in a typical document, I wonder if we can speed this up.
The regex relies on a lot of lookaheads to make sure the paragraph isn't interrupted by various block elements. In the case where it does detect one of those elements, does it make sense to cache that result and immediately apply it as the next token? Would that even give a noticeable boost at all?
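For context, here is a minimal sketch of the kind of pattern being described (illustrative only, not marked's actual paragraph rule): the paragraph keeps consuming lines as long as a negative lookahead confirms the next line isn't an interrupter, so the interrupter is matched on every line but its text is thrown away.

```js
// Illustrative only -- not marked's real paragraph regex. Each new line is
// accepted only if the negative lookahead confirms it isn't an interrupter
// (here just a heading, a blockquote, or an hr). The lookahead work is redone
// on every line and its result is discarded, which is the cost in question.
const paragraph = /^[^\n]+(?:\n(?!#{1,6} |> |(?:-{3,}|\*{3,})(?:\n|$))[^\n]+)*/;

const src = 'line one\nline two\n# interrupting heading\n';
console.log(paragraph.exec(src)[0]); // 'line one\nline two'
```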
-
I'm not sure how you can capture a lookahead to cache it.
-
It seems you can directly capture a positive lookahead with `(?=(capture))`, but not a negative one, since by definition the text it rules out is never matched. So... could we maybe swap the negative lookaheads to positive ones? Or... just add another normal capture group? If an interrupter is found, the paragraph token is just the first part of the regex, and the next token will be the second part. If none is found, it's just a normal uninterrupted paragraph token. ...Something like this?
I can already see some potential flaws here (what if the third paragraph has an interrupter?), but maybe it's a starting point... Or... we could just limit the paragraph regex to one newline at a time, and then in the tokenizer group together any paragraph tokens that end up right next to each other.
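As a rough illustration of the capturing idea (a toy pattern under the assumption that the only interrupter is an ATX heading, not marked's actual rules): a capture group placed inside a positive lookahead lets the interrupter text be read without being consumed, so it could be pushed as the next token.

```js
// Toy sketch, assuming the only interrupter is an ATX heading. Group 1 is the
// paragraph body; group 2, captured inside a positive lookahead, is the
// interrupter (or undefined if the paragraph simply ended on its own).
const paragraph = /^([^\n]+(?:\n(?!#{1,6} )[^\n]+)*)\n*(?=(#{1,6} [^\n]*)?)/;

const match = paragraph.exec('some text\nmore text\n# interrupting heading\n');
console.log(match[1]); // 'some text\nmore text'     -> paragraph token
console.log(match[2]); // '# interrupting heading'   -> could become the next token
```

If no interrupter follows, `match[2]` is simply `undefined` and only the paragraph token would be pushed.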
-
The Lexer groups together text tokens in a similar way: https://github.com/markedjs/marked/blob/master/src/Lexer.js#L237
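For comparison, a hedged sketch of what that grouping could look like if applied to paragraph tokens (the function name and token shape here are illustrative, not marked's internals):

```js
// Merge runs of adjacent tokens of the same type, in the spirit of how the
// Lexer concatenates consecutive 'text' tokens. Purely illustrative.
function mergeAdjacent(tokens, type = 'paragraph') {
  const out = [];
  for (const token of tokens) {
    const last = out[out.length - 1];
    if (last && last.type === type && token.type === type) {
      last.raw += token.raw;          // keep the raw source contiguous
      last.text += '\n' + token.text; // join the visible text with a newline
    } else {
      out.push({ ...token });         // copy so the input array is untouched
    }
  }
  return out;
}

console.log(mergeAdjacent([
  { type: 'paragraph', raw: 'one\n', text: 'one' },
  { type: 'paragraph', raw: 'two\n', text: 'two' },
  { type: 'heading', raw: '# hi\n', text: 'hi' },
]));
// -> two tokens: a merged paragraph ('one\ntwo') and the heading
```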
-
FYI: redesigning the regex to capture the interrupters and immediately push them to the token array doesn't seem to give any noticeable speedup. https://github.com/calculuschild/marked/tree/refactorParagraphs Just for fun, I'm going to try out the other method of tokenizing paragraphs individually.