-
Notifications
You must be signed in to change notification settings - Fork 57
Atom hangs on 100% if given a particular expression #70
Comments
TL;DR: Regular expression matching is slow, hard, and horribly complex behind the scenes. Oh, and you found a potential malicious regular expression resulting from this complexity. You have discovered a potential malicious regular expression. That file would be a partial match of all of these, with multiple matches for each (note that
With each of these, there's a lot of loops involved that are hard to optimize away. Here's a quick diagram of just the last one as a DFA, the simplest by far, and excluding captures:
(Captures only make this more complicated, so that's why I omitted them.) And for your particular source, just trace it with that flow chart. See how long it takes each time. Also, try tracing it by testing your word regex I'll also note that this is far more optimized than what you'd get out of most engines: they usually just perform basic optimizations and use either on the fly code generation or just generate a tree as an NFA (theoretically equivalent to an DFA), since it takes linear time to create the NFA and polynomial time to match worst case, compared to the exponential time and memory best and worst case to go from the NFA to the DFA, just for linear time matching. (My above example is an NFA.) Or, long story short, regular expression matching is slow, hard, and horribly complex behind the scenes. |
First of all, I know damn well GitHub won't make this look as good as it does with my ligature-endowed editor font, but that didn't stop me from sprucing up your diagram with box-drawing glyphs:
Frivolous crap aside, I'm well-aware of ReDoS (and its slightly less dread-inspiring namesake, MS-DOS). I'm afraid your answer doesn't pinpoint exactly which code in First-Mate is at fault, though. |
Also, I swear wherever |
@Alhadis as a maintainer of a grammar package, I encounter a lot of issues regarding to this and some other packages so I follow everything that is going on at certain repositories. As a person who has general interest in formal languages and parsing, I also appreciate a good automaton from time to time 😄. My two cents on this issue is that it seems more like an implementation limitation that regex craftsmen and women have to take into account. Textmate grammars generally benefit from shorter-matching expressions however that's not always possible if the language is rather context sensitive. But if that expression is fast enough elsewhere (other regex engines), then it might be an issue on oniguruma's side that is worth looking into. You might be able to work around that by matching the general case first, effectively limiting length of the input and then working on the shorter case with greater confidence, but that's what first-mate somewhat does already on the lines. |
This is somewhat difficult to explain, so please bear with me...
Last night while working on a grammar, I added a pattern-block that made Atom hang at 100% CPU usage. After force-quitting the app twice, I scrutinised the last expression for anything that might've triggered infinite recursion. I was drawing blanks until I narrowed it down from this...
... to this:
All of a sudden, Atom wasn't hanging whenever I switched to the tab that had the grammar's language in it. The patterns themselves didn't match anything. What the patterns were was irrelevant. What's important is that, somehow, this particular capturing group causes Atom to freeze.
I know how good First-Mate is at avoiding infinite loops, which is why this bug is concerning.
I've isolated it to a separate branch in my repo for closer examination.
Reproduction
tests/life.apl
. It should be fine.make
in the package's base directory to toggle the problematic capturing group.tests/life.apl
and restart Atomtests/life.apl
I noticed Lightshow also has troubles with this particular ruleset, although its handling is somewhat less predictable than Atom's.
The text was updated successfully, but these errors were encountered: