A bootstrap of RBNF.hs to generate standalone parsers targeting multiple programming languages.
Standalone: the generated code can run without runtime dependencies other than the language and standard libraries.
You can build and install the binaries with the Haskell Tool Stack:
sh> stack install .
Otherwise, prebuilt binaries for several platforms (Win64, generic Linux, macOS 10.13-10.15) are released on GitHub.
Download them from Releases and add fff-lex and fff-pgen to your PATH.
frontend-for-free currently provides a wrapper for Python only:
sh> pip install frontend-for-free
Alternatively, install it from GitHub.
sh> fff <xxx>.rbnf --trace [--lexer_out <xxx>_lex.py] [--parser_out <xxx>_parser.py]
sh> # note that you should also provide a <xxx>.rlex file
sh> ls | grep <xxx>
<xxx>_parser.py <xxx>_lex.py
See examples at runtest.
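If the generator is driven from a Python build script, the command line shown above can be assembled programmatically. The following is only a minimal sketch and not part of frontend-for-free: the helper name fff_command is hypothetical, it uses only the flags shown above, and it assumes the output naming seen in the ls listing (<xxx>_lex.py and <xxx>_parser.py).

```python
from pathlib import Path
from typing import List


def fff_command(grammar: str) -> List[str]:
    """Build the argument list for `fff` (hypothetical helper).

    Assumes `<xxx>.rlex` sits next to `<xxx>.rbnf`; this is not checked here.
    """
    path = Path(grammar)
    stem = path.with_suffix("")  # "<xxx>.rbnf" -> "<xxx>"
    return [
        "fff",
        str(path),
        "--trace",
        "--lexer_out", f"{stem}_lex.py",
        "--parser_out", f"{stem}_parser.py",
    ]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run(...) to execute it.
    print(" ".join(fff_command("arith.rbnf")))
```

Building the argument list separately from running it keeps the sketch usable on machines where fff is not installed.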
A framework for generating context-free parsers with the following features:
- cross-language
- distributed with a lexer generator, but feel free to use your own lexers.
- LL(k) capability
- efficient left recursion
- standalone: the generated code introduces no third-party libraries, although the generator itself requires Python 3.6+ with a few dependencies.
- defined with an intuitive and expressive BNF derivative
  - action/rewrite:
    pair := a b { ($1, $2) }
  - parameterised polymorphisms for productions:
    nonEmpty[A] := A { [$1] } | hd=A tl=nonEmpty[A] { tl.append(hd); tl }
    where append shall be provided by the user code.
- Currently,
  - the parser generator support for a programming language is hard coded in src/RBNF/BackEnds/<LanguageName>.hs.
  - the lexer generator support for a programming language is hard coded in ffflex.py.
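The action and parameterised-production examples above can be read as ordinary functions over the matched values. The Python sketch below mimics what those two actions compute. It is not generator output: the function names are hypothetical, and it assumes that $1, $2 bind to the matched components in order and that append behaves like Python's list.append.

```python
from typing import List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def pair_action(a: A, b: B) -> Tuple[A, B]:
    # pair := a b { ($1, $2) }
    return (a, b)


def non_empty_action(items: List[A]) -> List[A]:
    """Evaluate the actions of nonEmpty[A] over already-matched items.

    nonEmpty[A] := A { [$1] } | hd=A tl=nonEmpty[A] { tl.append(hd); tl }
    """
    if not items:
        raise ValueError("nonEmpty requires at least one item")
    hd, rest = items[0], items[1:]
    if not rest:
        return [hd]  # first alternative: A { [$1] }
    tl = non_empty_action(rest)  # second alternative: hd=A tl=nonEmpty[A]
    tl.append(hd)  # the action appends the head after its tail
    return tl


if __name__ == "__main__":
    print(pair_action(1, "x"))
    print(non_empty_action([1, 2, 3]))
```

Under this reading, each head is appended after its tail, so the resulting list holds the items in reverse order of appearance. Reverse it in user code if source order matters.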
- Parsing JSON
- Parser as Interpreter: Implementing a Programming Language within 20 Minutes
- Parsing LaTeX
  - lexer: gkdtex.rlex
  - parser: gkdtex.gg
- Parsing LLVM IR (a major subset)
  - lexer: llvmir.rlex
  - parser: llvmir.rbnf
- Parsing nested arithmetic expressions
  - lexer: arith.rlex
  - parser: arith.rbnf
- Parsing the BNF derivative used by FFF (bootstrap)
  - lexer: fffbnf.rlex
  - parser: fffbnf.rbnf
- Parsing ML syntax:
- (OLD VER 0) Parsing ML syntax and converting it to DrRacket
  - lexer: yesml.rlex
  - parser: yesml.rbnf
- (OLD VER 1) Muridesu: building, the Mulan way, a language in three hours that is as powerful as Python and looks like GoLang
  - lexer: muridesu.rlex
  - parser: muridesu.exrbnf
- (OLD VER 2) Parsing Python ASDL files
  - lexer: asdl.rlex
  - parser: asdl.exrbnf
OLD VER 2, OLD VER 1 and OLD VER 0 are out of date, so code generation for them does not work with the master branch.
However, the code already generated from them is unaffected and still works.
Further, OLD VER 2 can easily be brought up to date by manually performing the following transformations:
- changing slots $0, $1, $2, ... to $1, $2, $3, ...
- changing list(rule) to list[rule], and providing the definition of the list production:
  list[p] ::= p { [$1] } | list[p] p { $1.append($2); $1 }
- changing separated_list(sep, rule) to separated_list[sep, rule], and providing the definition of the separated_list production:
  separated_list[sep, p] ::= p { [$1] } | separated_list[sep, p] sep p { $1.append($3); $1 }
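The first transformation, shifting every slot index up by one, is mechanical and can be scripted. The following is a minimal sketch under stated assumptions, not a tool shipped with FFF: the name shift_slots is hypothetical, and the rewrite is purely textual, so it would also change a $n that happens to appear inside a string literal of an action.

```python
import re

# A slot is a dollar sign followed by one or more digits, e.g. $0 or $12.
_SLOT = re.compile(r"\$(\d+)")


def shift_slots(action: str) -> str:
    """Rewrite 0-based slots ($0, $1, ...) to 1-based ones ($1, $2, ...)."""
    # A single pass with a callback ensures no slot is shifted twice.
    return _SLOT.sub(lambda m: "$" + str(int(m.group(1)) + 1), action)


if __name__ == "__main__":
    print(shift_slots("{ $0.append($1); $0 }"))
```

Using one regex pass rather than chained string replacements matters here: replacing $0 with $1 and then $1 with $2 in sequence would shift the first slot twice.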
In most cases, you don't need to understand parsing components such as lexers, token tables or states.
In fact, you can use your generated parser through a single function, parse(text, filename="unknown"):
from <the generated parser module> import *
from <the generated lexer module> import lexer

__all__ = ["parse"]
_parse = mk_parser()


def parse(text: str, filename: str = "unknown"):
    tokens = lexer(filename, text)
    status, res_or_err = _parse(None, Tokens(tokens))
    if status:
        return res_or_err

    # Parsing failed: report the first recorded error as a SyntaxError.
    lineno = None
    colno = None
    offset = 0
    msg = ""
    for each in res_or_err:
        i, msg = each
        token = tokens[i]
        lineno = token.lineno + 1
        colno = token.colno
        offset = token.offset
        filename = token.filename
        break
    e = SyntaxError(msg)
    e.lineno = lineno
    e.colno = colno
    e.filename = filename
    # str.find returns -1 when the offending token is on the last line.
    line_end = text.find('\n', offset)
    if line_end == -1:
        line_end = len(text)
    e.text = text[offset - (colno or 0):line_end]
    e.offset = colno
    raise e
Calling parse returns the expected result, or raises a SyntaxError with a readable error message.
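The slice that fills in the offending source line for the error message can be tried in isolation. The sketch below is an illustration only: error_line is a hypothetical helper, and it assumes that a token's offset is a 0-based index into the source text and that colno is its 0-based column, as the slicing in parse suggests. It also covers the case where the token sits on the last line without a trailing newline, for which str.find returns -1.

```python
def error_line(text: str, offset: int, colno: int) -> str:
    """Return the source line containing the token at `offset`."""
    start = offset - colno  # index of the first character of the line
    end = text.find("\n", offset)
    if end == -1:  # last line, no trailing newline
        end = len(text)
    return text[start:end]


if __name__ == "__main__":
    source = "let x = 1\nlet y = = 2\n"
    offset = source.index("= 2")  # position of the unexpected token
    colno = offset - (source.rfind("\n", 0, offset) + 1)
    print(error_line(source, offset, colno))
```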