node type in a syntax tree for implementing NFA is described as follow:
symbol | node type |
---|---|
dot(.) |
Leaf Node |
^ |
Head Node |
$ |
Tail Node |
[] |
O-Node, option node. |
* |
Star Node |
+ |
Star Node |
? |
Star Node |
{} |
Star Node |
r1r2 |
Cat Node |
r1|r2 |
Or Node |
() |
represent a unit, it will be set as an attribute in the tree node. |
\1 |
leaf node, for back reference. |
char |
leaf node represents normal character. |
syntaxs of the regular expression supported by my implementation.
- all the syntaxs supported are as mentioned in the previous section.
- for [], support range and negation, [a-f], [2-9], [^abc].
- for back reference, referencing index starts from 0, this is quite different from what you usually see. eg, given pattern: ((ab)cd), for testing string abcd, we get: \0 == ab, \1 == abcd. so, group capturing works inside out.
- use {2, 4} to make repetition of the previous unit. eg, (ab){2, 4} means repeating ab 2 to 4 times(inclusive)