v2 was released in November 2020. It contains the following changes, some of which are backwards-incompatible:
- Added optional `LexString()` and `LexBytes()` methods that lexer definitions can implement to fast-path lexing of bytes and strings.
- A new stateful lexer has been added.
- A `filename` must now be passed to all `Parse*()` and `Lex*()` methods.
- The `text/scanner` lexer no longer automatically unquotes strings or supports arbitrary-length single-quoted strings. The tokens it produces are identical to those of the `text/scanner` package. Use `Unquote()` to remove quotes.
- `Tok` and `EndTok` will no longer be populated.
- If a field named `Token []lexer.Token` exists, it will be populated with the raw tokens that the node parsed from the lexer.
- Support capturing directly into `lexer.Token` fields, e.g.

      type ast struct {
          Head lexer.Token   `@Ident`
          Tail []lexer.Token `@(Ident*)`
      }
- Added `experimental/codegen` for stateful lexers. This provides a ~10x performance improvement with zero garbage when lexing strings.
- The `regex` lexer has been removed.
- The `ebnf` lexer has been removed.
- All future work on lexing will be put into the stateful lexer.
- The need for `DropToken` has been removed.
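
The `text/scanner` change above is the one most likely to affect existing grammars, so here is a minimal sketch of what "identical to the `text/scanner` package" means in practice. It uses only the Go standard library, not this library's API. The helpers `scanTokens` and `unquote` are hypothetical names made up for the illustration, and `strconv.Unquote` is used as a stand-in for the `Unquote()` option on the assumption that the two behave similarly for double-quoted strings.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"text/scanner"
)

// scanTokens (hypothetical helper) returns the raw token texts that the
// standard text/scanner package produces for src, together with the number
// of errors the scanner reported while scanning.
func scanTokens(src string) ([]string, int) {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	// Init resets Error to nil, so install the handler afterwards.
	// A no-op handler keeps errors counted without printing to stderr.
	s.Error = func(*scanner.Scanner, string) {}

	var toks []string
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		toks = append(toks, s.TokenText())
	}
	return toks, s.ErrorCount
}

// unquote (hypothetical helper) strips the surrounding quotes and resolves
// escape sequences, using strconv.Unquote as a stand-in for Unquote().
func unquote(tok string) (string, error) {
	return strconv.Unquote(tok)
}

func main() {
	// String tokens keep their quotes and their escape sequences.
	toks, _ := scanTokens(`greeting = "hello\tworld"`)
	fmt.Printf("raw tokens: %q\n", toks)

	value, err := unquote(toks[2])
	if err != nil {
		fmt.Println("unquote failed:", err)
		return
	}
	fmt.Printf("unquoted:   %q\n", value)

	// text/scanner treats single quotes as character literals, so a
	// multi-character single-quoted string is reported as an error.
	_, errs := scanTokens(`'too long'`)
	fmt.Println("errors for single-quoted string:", errs)
}
```

Grammars that relied on the old automatic unquoting, or on single-quoted strings of arbitrary length, would need either explicit unquoting or a different lexer definition, such as the stateful lexer.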