python PEG grammar #177

adsharma · 2021-02-18T06:02:38Z

Python has a PEG grammar here:

https://github.com/python/cpython/blob/master/Grammar/python.gram

That grammar uses a slightly different format. I'm looking to parse it using parsimonious. My script massages the grammar above to something close to what this module expects. But two issues remain:

cpython uses ':' for rules and you seem to use '='
cpython uses '|' for alternatives and you seem to use '/'

Has anyone looked into reconciling these two and using the package to parse python code itself?

adsharma · 2021-02-18T06:03:57Z

Cleaned up grammar produced by my script:

https://paste.ubuntu.com/p/ftbMmhB5fV/

goodmami · 2021-02-23T08:45:36Z

Python's new PEG parser ("pegen") and its syntax are described here: https://www.python.org/dev/peps/pep-0617/#syntax

The syntax is based on that of the older LL(1) parser generator ("pgen"); it was retained and extended for pegen because, apparently, GvR likes it (source). So pegen's : corresponds to = in parsimonious, and pegen's | corresponds to /.
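As a rough illustration of that correspondence, here is a minimal sketch that rewrites a single pegen-style rule into the = and / notation. The function name pegen_rule_to_parsimonious is hypothetical, and the sketch assumes a simplified rule with no grammar actions ({ ... }), no return-type annotations, and no pegen-only operators; it is not a complete converter for python.gram.

```python
import re

# Split a rule body into quoted literals, bare '|' separators, and everything else,
# so that a '|' inside a string literal such as '|' is left untouched.
_PIECE = re.compile(r"""'[^']*'|"[^"]*"|\||[^'"|]+""")


def pegen_rule_to_parsimonious(rule: str) -> str:
    """Rewrite 'name: a | b' as 'name = a / b' (hypothetical helper, simplified input)."""
    name, sep, body = rule.partition(":")
    if not sep:
        raise ValueError("expected a rule of the form 'name: alternatives'")

    pieces = []
    for piece in _PIECE.findall(body):
        if piece == "|":
            pieces.append("/")  # alternative separator
        elif piece[0] in "'\"":
            pieces.append(piece)  # quoted literal, keep verbatim
        else:
            pieces.append(re.sub(r"\s+", " ", piece))  # collapse newlines and indentation

    converted = "".join(pieces).strip()
    # pegen allows a leading '|' before the first alternative; drop it.
    if converted.startswith("/"):
        converted = converted[1:].lstrip()
    return f"{name.strip()} = {converted}"


print(pegen_rule_to_parsimonious("atom: NAME | NUMBER | '|'"))
print(pegen_rule_to_parsimonious("sum:\n    | term '+' term\n    | term"))
```

Even after this rewrite, names such as NAME and NUMBER would still have to be defined as rules before parsimonious could use the grammar.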

More interesting is that pegen is not a scannerless PEG parser (e.g., note that NAME is not defined by the grammar). It must first tokenize the input, then it uses the PEG rules to parse the tokens. See https://docs.python.org/3/library/token.html for the valid tokens. If you want to parse Python character by character, you'll need to write rules for those tokens as well.
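To see what the token stream looks like, here is a small sketch using the standard library's tokenize module. Note the assumption: tokenize is a separate module from the tokenizer CPython's parser uses internally, so this only illustrates the kind of tokens (NAME, NUMBER, OP, ...) the grammar rules operate on. The helper name token_names is made up for this example.

```python
import io
import token
import tokenize


def token_names(source: str) -> list[tuple[str, str]]:
    """Return (token type name, token text) pairs for a piece of Python source."""
    readline = io.StringIO(source).readline
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


for kind, text in token_names("x = 1\n"):
    print(kind, repr(text))
```

A scannerless grammar for parsimonious would need its own rules (for example, regex-based ones) to recognize what shows up here as NAME and NUMBER, plus some handling of indentation, which the tokenizer reports as INDENT and DEDENT tokens.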
