Lexer & parser generator and grammar toolkit written in Java
- accepts a regex-like grammar (EBNF)
- lexer generator
- Recursive descent parser generator that supports left recursion
- LR(1),LALR(1) parser generator
- DFA minimization
- Outputs CST
- dot graph of NFA, DFA, LR(1), LALR(1)
- left recursion remover (direct and indirect)
- precedence remover
- ebnf to bnf
- epsilon remover
Examples are in the examples folder
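As a rough illustration of what the left recursion remover listed above does, here is a minimal sketch of the textbook transformation for direct left recursion only. The class and method names (`LeftRecursionSketch`, `removeDirect`) are hypothetical and are not this toolkit's API; the tool's own implementation may differ and also handles the indirect case.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LeftRecursionSketch {
    /**
     * Rewrites  A: A alpha | beta  into  A: beta A' ;  A': alpha A' | epsilon.
     * Each alternative is a list of symbol names; an empty list stands for epsilon.
     */
    static Map<String, List<List<String>>> removeDirect(String name, List<List<String>> alts) {
        List<List<String>> recursive = new ArrayList<>(); // the alpha parts
        List<List<String>> others = new ArrayList<>();    // the beta parts
        for (List<String> alt : alts) {
            if (!alt.isEmpty() && alt.get(0).equals(name)) {
                recursive.add(alt.subList(1, alt.size()));
            } else {
                others.add(alt);
            }
        }
        Map<String, List<List<String>>> out = new LinkedHashMap<>();
        if (recursive.isEmpty()) { // rule is not left recursive, keep it unchanged
            out.put(name, alts);
            return out;
        }
        String tail = name + "'"; // name of the helper rule (assumed naming scheme)
        List<List<String>> head = new ArrayList<>();
        for (List<String> beta : others) {
            List<String> alt = new ArrayList<>(beta);
            alt.add(tail);
            head.add(alt);
        }
        List<List<String>> rest = new ArrayList<>();
        for (List<String> alpha : recursive) {
            List<String> alt = new ArrayList<>(alpha);
            alt.add(tail);
            rest.add(alt);
        }
        rest.add(new ArrayList<>()); // epsilon alternative ends the repetition
        out.put(name, head);
        out.put(tail, rest);
        return out;
    }
}
```

For `E: E "+" T | T;` the sketch yields `E: T E';` and `E': "+" T E' | %epsilon;`.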
//this is a line comment
/* this is a
multiline comment */
to include another grammar, use:
include "<grammar_name>"
e.g include "lexer.g"
token{
<TOKEN_NAME> : <regex> ;
}
e.g
token{
#LETTER: [a-zA-Z];
#DIGIT: [0-9];
NUMBER: DIGIT+;
IDENT: (LETTER | '_') (LETTER | DIGIT | '_')*;
}
prefixing a token name with '#' makes that token a fragment, so it can only be used as a reference inside other token definitions
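To get a feel for what the token definitions above match, the same patterns can be written with `java.util.regex`. This is only an illustration with hypothetical names (`TokenPatterns`, `isIdent`, `isNumber`), not the lexer this tool generates.

```java
import java.util.regex.Pattern;

final class TokenPatterns {
    // Fragments (#LETTER, #DIGIT) are building blocks that other tokens refer to.
    static final String LETTER = "[a-zA-Z]";
    static final String DIGIT = "[0-9]";

    // NUMBER: DIGIT+;
    static final Pattern NUMBER = Pattern.compile(DIGIT + "+");

    // IDENT: (LETTER | '_') (LETTER | DIGIT | '_')*;
    static final Pattern IDENT = Pattern.compile(
            "(" + LETTER + "|_)(" + LETTER + "|" + DIGIT + "|_)*");

    static boolean isNumber(String s) { return NUMBER.matcher(s).matches(); }

    static boolean isIdent(String s) { return IDENT.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(isIdent("_tmp1")); // true
        System.out.println(isIdent("1abc"));  // false, cannot start with a digit
        System.out.println(isNumber("2024")); // true
    }
}
```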
<RULE_NAME> : <regex> ;
e.g
assign: left "=" right;
left: IDENT;
right: IDENT | LITERAL;
r1 | r2 | r3
r1 r2 r3
r*
= zero or more times (Kleene star)
r+
= one or more times (Kleene plus)
r?
= zero or one time (optional)
(r)
you can group complex regexes in tokens and rules
e.g a (b | c+)
use %empty, %epsilon or ε for epsilon
e.g rule: a (b | c | %epsilon);
place ranges or single characters inside brackets (without quotes)
[start-end single]
e.g id: [a-zA-Z0-9_];
escape sequences are also supported
e.g ws: [\u00A0\u000A\t];
negation e.g lc: "//" [^\n]*;
use single or double quotes for your strings
e.g stmt: "if" "(" expr ")" stmt;
e.g stmt: 'if' '(' expr ')' stmt;
strings in rules are replaced with references to the tokens declared in the token block,
so the strings in the example above need to be declared like this:
token{
IF: "if";
LP: "(";
RP: ")";
}
in LR parsing you have to specify the start rule with %start
e.g %start: expr;
use %left or %right to specify associativity
E: E "*" E %left | E "+" E %right | NUM;
precedence follows declaration order: an alternative declared earlier takes precedence over one declared later
e.g E: E "*" E | E "+" E | NUM;
multiplication takes precedence over addition in the example
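The effect of these declarations on tree shape can be sketched with a small precedence-climbing parser. It mirrors `E: E "*" E %left | E "+" E %right | NUM;` and prints the grouping as parentheses. Precedence climbing is a stand-in technique chosen for brevity, not necessarily what this tool generates, and the names `PrecedenceDemo` and `group` are hypothetical.

```java
final class PrecedenceDemo {
    private final String src;
    private int pos = 0;

    private PrecedenceDemo(String src) { this.src = src.replace(" ", ""); }

    // Higher number binds tighter; "*" is declared first in the grammar, so it wins.
    private static int prec(char op) {
        switch (op) {
            case '*': return 2;
            case '+': return 1;
            default:  return -1; // not an operator
        }
    }

    // Mirrors the %left / %right markers: "*" is left, "+" is right associative.
    private static boolean rightAssoc(char op) { return op == '+'; }

    /** Returns the input with explicit parentheses showing how it is grouped. */
    static String group(String input) { return new PrecedenceDemo(input).expr(1); }

    private String expr(int minPrec) {
        String left = number();
        while (pos < src.length() && prec(src.charAt(pos)) >= minPrec) {
            char op = src.charAt(pos++);
            // Left-associative operators must not absorb another operator of the same level.
            int next = rightAssoc(op) ? prec(op) : prec(op) + 1;
            String right = expr(next);
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    private String number() {
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("number expected at " + start);
        return src.substring(start, pos);
    }

    public static void main(String[] args) {
        System.out.println(group("1 + 2 * 3")); // (1 + (2 * 3))
        System.out.println(group("2 * 3 * 4")); // ((2 * 3) * 4)
        System.out.println(group("1 + 2 + 3")); // (1 + (2 + 3))
    }
}
```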
you can use modes to create more complex lexers
token{
LT: "<" -> attr;
attr{
TAG_NAME: [:ident:] -> attr;
}
attr{
WS: [\r\n\t ] -> skip;
GT: ">" -> DEFAULT;
SLASH_GT: "/>" -> DEFAULT;
ATTR_NAME: [:ident:] -> attr_eq;
}
attr_eq{
EQ: "=" -> attr_val;
}
attr_val{
VAL: [:string:] -> attr;
}
}
note: the DEFAULT mode is used to exit from modes
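How mode switching behaves at runtime can be sketched with a hand-written lexer. This is a reduced variant of the example above (three modes, a `TEXT` token added for content outside tags, `[:ident:]` and `[:string:]` approximated with plain regexes), and every name in it is an assumption made for illustration rather than the code this tool generates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ModeLexerSketch {
    enum Mode { DEFAULT, ATTR, ATTR_VAL }

    // One lexer rule: token name, pattern, mode to switch to (null = stay), skip flag.
    static final class Rule {
        final String name;
        final Pattern pattern;
        final Mode next;
        final boolean skip;

        Rule(String name, String regex, Mode next, boolean skip) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
            this.next = next;
            this.skip = skip;
        }
    }

    private static List<Rule> rulesFor(Mode mode) {
        switch (mode) {
            case DEFAULT:
                return Arrays.asList(
                        new Rule("LT", "<", Mode.ATTR, false),
                        new Rule("TEXT", "[^<]+", null, false));
            case ATTR:
                return Arrays.asList(
                        new Rule("WS", "[\\r\\n\\t ]+", null, true),
                        new Rule("SLASH_GT", "/>", Mode.DEFAULT, false),
                        new Rule("GT", ">", Mode.DEFAULT, false),
                        new Rule("EQ", "=", Mode.ATTR_VAL, false),
                        new Rule("NAME", "[A-Za-z_][A-Za-z0-9_]*", null, false));
            default: // ATTR_VAL
                return Arrays.asList(new Rule("VAL", "\"[^\"]*\"", Mode.ATTR, false));
        }
    }

    /** Returns the names of the emitted tokens; skipped tokens are dropped. */
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        Mode mode = Mode.DEFAULT;
        int pos = 0;
        while (pos < input.length()) {
            Rule hit = null;
            int end = -1;
            for (Rule r : rulesFor(mode)) { // only rules of the current mode are tried
                Matcher m = r.pattern.matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() > pos) {
                    hit = r;
                    end = m.end();
                    break;
                }
            }
            if (hit == null) {
                throw new IllegalArgumentException("no rule matches at " + pos + " in mode " + mode);
            }
            if (!hit.skip) out.add(hit.name);
            if (hit.next != null) mode = hit.next; // the "-> mode" part of a rule
            pos = end;
        }
        return out;
    }
}
```

Calling `ModeLexerSketch.tokenize("<a href=\"x\">hi")` gives `[LT, NAME, NAME, EQ, VAL, GT, TEXT]`: the whitespace is skipped, and `GT` returns the lexer to the default mode so that `hi` is read as text.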
tokens marked with skip are ignored by the parser, so you can use it for comments and whitespace
token{
comment: "//" [^\n]* -> skip;
ws: [ \r\n\t]+ -> skip;
}