python PEG grammar #177

Open
adsharma opened this issue Feb 18, 2021 · 2 comments

Comments

@adsharma

Python has a PEG grammar here:

https://github.com/python/cpython/blob/master/Grammar/python.gram

That grammar uses a slightly different format. I'd like to parse it with parsimonious. My script converts the grammar above into something close to what this module expects, but two differences remain:

CPython uses ':' to define rules, and you seem to use '='
CPython uses '|' to separate alternatives, and you seem to use '/'

Has anyone looked into reconciling these two differences and using the package to parse Python code itself?

@adsharma
Author

Cleaned-up grammar produced by my script:

https://paste.ubuntu.com/p/ftbMmhB5fV/

@goodmami

Python's new PEG parser ("pegen") and its grammar syntax are described here: https://www.python.org/dev/peps/pep-0617/#syntax

The syntax is based on that of the older LL(1) parser ("pgen"), and it was retained and extended for pegen because, apparently, GvR likes it (source). So : is equivalent to = and | is equivalent to /.

More interesting is that pegen is not a scannerless PEG parser (note, e.g., that NAME is not defined in the grammar). It first tokenizes the input and then applies the PEG rules to the resulting tokens. See https://docs.python.org/3/library/token.html for the valid tokens. If you want to parse Python character by character, you'll need to write rules for those tokens as well.
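To see what kind of token stream a grammar like this is written against, the standard library's `tokenize` module is a convenient stand-in. It is not the exact interface pegen uses internally, so treat this as an approximation; the helper name `token_stream` is made up for illustration.

```python
import io
import token
import tokenize


def token_stream(source: str) -> list[tuple[str, str]]:
    """Return (token type name, token text) pairs for a piece of Python source."""
    readline = io.StringIO(source).readline
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


if __name__ == "__main__":
    for name, text in token_stream("x = 1 + foo\n"):
        print(f"{name:<10} {text!r}")
```

Names such as `x` and `foo` arrive as ready-made NAME tokens, which is why the grammar never defines NAME itself. `tokenize` reports every operator with the generic `OP` type; `tok.exact_type` gives the specific one (for example `EQUAL` for `=`). A scannerless parsimonious grammar would instead need its own character-level rules, for instance an ASCII-only approximation such as `NAME = ~r"[A-Za-z_][A-Za-z0-9_]*"` (real Python identifiers also allow non-ASCII letters).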
