Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refine Python grammar #310

Merged
merged 2 commits into from
Mar 28, 2022
Merged

refine Python grammar #310

merged 2 commits into from
Mar 28, 2022

Conversation

mw66
Copy link
Contributor

@mw66 mw66 commented Mar 26, 2022

A couple of fixes:

-- <- to <
-- https://docs.python.org/3/reference/grammar.html in many places use:

  ','.expression+

I'm not sure if this is standard PEG syntax, anyway, translated into

  ((expression ("," expression)*)

Actually the grammar is still not working, the parsing output in the unittest is wrong:

Running python-test-library 
Python[0, 0][]
 +-Python.file[0, 0][]

However, I noticed Pegged is really slow, without 'lowmem' it run out of memory (after consume ~32GB); and with lowmem, it took ~5min to build. This make the trial-and-error process to debug very time consuming, esp. considering I'm a Pegged newbie.

I also tried to use the Python library parsimonious to help, and found it's really fast and helpful to identify errors; but unfortunately, due to this issue that parsimonious does not consume (skip) the white spaces by default:

erikrose/parsimonious#193

it would make no sense if I try to add meaningless whitespaces rules _ to the current grammar, so I stopped here. I.e. parsimonious no longer complain about any errors, but I won't able to parse any real Python code to test.

@veelo can you help from here if you have time? maybe it's easier for your expert eyes to see the problem.

Also, I'm wondering if this compile-time parser generator worth it? or does Pegged have a library mode, i.e. generate parsing tree at runtime on the given grammar file. Will this be faster? For comparison, parsimonious parse (and report error) almost instantly in seconds, which makes the development process much faster.

Thanks.

@mw66
Copy link
Contributor Author

mw66 commented Mar 26, 2022

BTW, I found a working Python PEG parser here:

https://github.com/we-like-parsers/pegen/blob/main/data/python.gram

but it uses a different PEG syntax.

It's a working parser, and I have tried it to parse really complex py source code.

BTW, generate from Python PEG grammar to python_parser.py only take less than a second:

pegen/src$ time python -m pegen  ../data/python.gram -o ../data/python_parser.py
...

real	0m0.560s
user	0m0.531s
sys	0m0.017s

Once again, I'm amazed by how much resource (memory & cpu time) that Pegged requires.

@veelo
Copy link
Collaborator

veelo commented Mar 28, 2022

For practical parsers, IMO you should always use https://github.com/PhilippeSigaud/Pegged/wiki/Grammars-as-D-Modules. Compile-time parsing is rarely useful. See https://github.com/PhilippeSigaud/Pegged/wiki/Using-a-Grammar#runtime-parsing for runtime parsing.

Also, I think your grammar lacks eoi, which it might need, see https://github.com/PhilippeSigaud/Pegged/wiki/Predefined-Parsers.

On speed, to things. Firstly, since the test and the grammar are in the same file, you are measuring both the time it takes to generate the parser and the time it takes to parse the input. Using asModule will help you separate the two. Secondly,
Pegged is known for not being fast, sadly. I suspect that a big reason for that is that it is based on string-based switch statements. It would be very interesting to consider a redesign of Pegged to switch on enums, but I don't have that kind of time...

You might find some help in https://github.com/PhilippeSigaud/Pegged/wiki/Optimizations. You can also enable tracing to get an insight into the parsing process.

@veelo veelo merged commit e81ac46 into dlang-community:master Mar 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants