
Exploring Performant PEG Parsing #2354

Closed

austin-schick opened this issue Jan 3, 2024 · 17 comments

@austin-schick
Contributor

What was I exploring, and why?

Pierre has expressed interest in replacing the current Brython parser with a PEG parser generated from Python's official grammar. He's implemented a version of this -- a parser which takes a grammar as a JS object -- but it's significantly slower than the current parser.

Recently, he said:

I didn't make performance tests but when running the built-in test suite it is obvious that it's still significantly slower . . .

It's a shame really because it would make the implementation much easier to maintain, and more compliant with standard Python...

I agree! I think an automatically generated parser has strong pros. Like Pierre said, it'd help with maintainability and compliance with CPython. I also think a generated parser would have fewer bugs and make the process of adding new syntax less error-prone. I think maintainability, correctness, and stability are all great goals.

So I wanted to explore these questions:

  • How much slower is Brython's PEG parser than its handmade parser, really?
  • Why is the current PEG parser implementation slow?
  • What is a reasonable path forward for a PEG parser?

I'm hoping my exploration gets somebody excited about continuing the great progress Pierre has already made in this arena.

How much slower is Brython's PEG parser, really?

I wrote a script (based on the existing node_bridge.js script) to test the parser's performance. It takes a Brython implementation as an argument and uses that implementation to parse most of the pyperformance benchmarks. This seemed like a nice choice because they're Python-official, varied, and meant to look like real world code.
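To give a sense of the setup, here's a minimal sketch of that kind of harness (the paths, CLI arguments, and the __BRYTHON__.py2js entry point are my assumptions here, not the actual script):

```javascript
// bench.js -- hypothetical harness in the spirit of node_bridge.js.
// Usage: node bench.js ../www/src/brython.js path/to/benchmarks
const fs = require('fs');
const path = require('path');
const { performance } = require('perf_hooks');

require(path.resolve(process.argv[2])); // load the Brython build under test
const benchDir = process.argv[3];

let total = 0;
for (const file of fs.readdirSync(benchDir)) {
  if (!file.endsWith('.py')) continue;
  const src = fs.readFileSync(path.join(benchDir, file), 'utf8');
  const t0 = performance.now();
  __BRYTHON__.py2js(src, file, file); // parse (and transpile) the source
  total += performance.now() - t0;
}
console.log(`total parse time: ${total.toFixed(1)} ms`);
```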

The performance results were:

Handmade parser:     1.2s
PEG parser:          3.9s

So a possible answer to this question is: Brython's PEG parser is 3-3.5x slower than its handmade parser. Noticeably slower, maybe slow enough to be an argument against adopting the current PEG parser.

Why is the current PEG parser implementation slow?

I ran my perf script with node's --prof flag and generated a profile from it. I noticed that a large chunk of time was spent in a Proxy used to lazily generate tokens. Replacing the Proxy with an equivalent function improved the performance quite a bit.
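To illustrate the shape of that change, here's a minimal sketch of the two patterns (hypothetical names, not Brython's actual tokenizer code). Every indexed read through the Proxy pays for a get trap, while the plain function is an ordinary call that engines can inline:

```javascript
// Lazy token access via a Proxy: every tokens[i] read dispatches a trap.
function makeLazyTokensProxy(tokenize) {
  const cache = [];
  return new Proxy(cache, {
    get(target, prop, receiver) {
      if (typeof prop === 'string') {
        const i = Number(prop);
        if (Number.isInteger(i) && i >= 0) {
          while (target.length <= i) target.push(tokenize()); // fill on demand
          return target[i];
        }
      }
      return Reflect.get(target, prop, receiver); // length, methods, symbols
    },
  });
}

// Equivalent plain function: same laziness, no trap machinery.
function makeGetToken(tokenize) {
  const cache = [];
  return function getToken(i) {
    while (cache.length <= i) cache.push(tokenize());
    return cache[i];
  };
}
```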

New performance results on the pyperformance suite:

Handmade parser:     1.2s
PEG parser:          3.9s
Improved PEG parser: 2.8s

Not bad! Unfortunately there was no more low-hanging fruit after this point. I traced the execution of the parser on a simple example to make sure it wasn't doing extra work (I once wrote a sometimes-accidentally-exponential typechecker), and it seemed to me to be working correctly.

I then looked at performance on individual pyperformance benchmarks to find the ones with the worst performance relative to the handmade parser. This led me to learn that parsing a file full of string literals is extra bad for the PEG parser. In fact, parsing even a single integer literal is pretty bad. From this point on, parsing a file containing just the literal 1 became a standard test for me.

I think parsing an integer is slow because it's a case where the parser has to visit many, many grammar rules. You can see all of the rules Brython's PEG parser visits to parse 1 here. It's a lot of rules! And each of those rules triggers many additional function calls.
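For a sense of scale, this is roughly the successful path for an expression statement like 1, using CPython's grammar rule names (abridged; Brython's generated rules follow the same grammar, and the failing alternatives tried along the way -- compound_stmt, assignment, and so on -- add even more calls):

file → statements → statement → simple_stmts → simple_stmt
     → star_expressions → expression → disjunction → conjunction → inversion
     → comparison → bitwise_or → bitwise_xor → bitwise_and → shift_expr
     → sum → term → factor → power → await_primary → primary → atom → NUMBER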

I'm certainly no JavaScript optimization expert, but here are my guesses for why the existing PEG parser is slower than the handmade parser, at a high level:

  1. The PEG parser is making a lot of function calls.
  2. All of the frequently-called functions are polymorphic, taking inputs of varying shapes, so JS engines may have a hard time optimizing them. There might be clues toward this in the node profile (if I'm reading it right), where ~10% of the time is spent doing megamorphic IC loads.

There may be another large optimization opportunity I'm missing here. But I think the current "generic parser" architecture comes with some inherent slowness. (I did try standardizing the grammar rules to all have the same keys, with some null values, but that didn't seem to make a difference).
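For what it's worth, here's a sketch of what that standardization attempt looks like (hypothetical rule objects, not Brython's actual grammar representation):

```javascript
// Objects created with different key sets get different hidden classes,
// so a hot function reading e.g. rule.alts across many shapes can go
// megamorphic and fall back to slow inline-cache lookups.
const seqRule    = { type: 'sequence', items: ['expr', 'NEWLINE'] }; // shape A
const choiceRule = { type: 'choice', alts: [['NAME'], ['NUMBER']] }; // shape B

// Standardizing on a single shape (null-filling unused fields) keeps
// property loads monomorphic -- though, as noted above, this alone
// didn't measurably help here.
function makeRule({ type, items = null, alts = null, action = null }) {
  return { type, items, alts, action };
}
```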

What is a reasonable path forward for a PEG parser?

I think much of Pierre's existing work on the PEG parser could be used to support a parser generator. Instead of storing the grammar as an object and visiting rules dynamically, we could pre-generate a function for each grammar rule, and inline some of that rule's evaluation so there are fewer function calls.

CPython takes this approach and uses this program to generate the C code for its parser. Brython could modify this file to produce JavaScript output instead.
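To make that concrete, here's a rough sketch of what one generated rule function might look like in JavaScript, loosely modeled on pegen's C output (the rule, helper names, and memo layout are all illustrative, not what CPython or Brython actually emits):

```javascript
// term: factor '*' term | factor
// One function per grammar rule, with packrat memoization keyed on
// (position, rule id). Stub helpers keep the sketch self-contained.
const TERM_TYPE = 42; // illustrative memo key for this rule

function expect(p, s) {               // consume a literal token, or fail
  if (p.tokens[p.mark] === s) { p.mark++; return true; }
  return false;
}

function factor_rule(p) {             // stand-in: match a number literal
  const t = p.tokens[p.mark];
  if (/^\d+$/.test(t ?? '')) { p.mark++; return { kind: 'Number', value: t }; }
  return null;
}

function term_rule(p) {
  const cached = p.memo[p.mark]?.[TERM_TYPE];
  if (cached) {                       // packrat hit (success or failure)
    p.mark = cached.end;
    return cached.node;
  }
  const start = p.mark;
  let node = null, lhs, rhs;
  // alternative 1: factor '*' term
  if ((lhs = factor_rule(p)) && expect(p, '*') && (rhs = term_rule(p))) {
    node = { kind: 'BinOp', op: '*', left: lhs, right: rhs };
  }
  if (node === null) {
    p.mark = start;                   // backtrack, try alternative 2: factor
    node = factor_rule(p);
  }
  (p.memo[start] ??= {})[TERM_TYPE] = { node, end: p.mark };
  return node;
}

// term_rule({ tokens: ['2', '*', '3'], mark: 0, memo: {} })
//   -> { kind: 'BinOp', op: '*', ... }
```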

I did a fast, ugly proof of concept that used some not-totally-correct JS utilities and a custom grammar with a JS subheader and trailer, whose actions are totally wrong but at least syntactically valid JS. To be clear, all of this works so poorly that it doesn't successfully parse 1. But it does at least visit the same set of rules as Brython's PEG parser, which I think was enough to learn something.

Here's how long it took each parser to parse 1 1,000 times:

Handmade parser:           22ms
PEG parser:                175ms
Improved PEG parser:       124ms
New, generated PEG parser: 54ms

Pretty good! That means a generated parser might perform only ~2x slower than the handmade parser in the worst cases.

The "generated parser" was also ~2.5x faster than the "improved parser" for this test. If that speedup applied to the average case, then a generated parser might be just as fast as the handmade parser on average.

There's really not a lot of data here. But I think this is an approach worth exploring. The performance shows some tentative promise; I was able to get pretty far with only about a day and a half of work; and the idea of using CPython's tooling to generate Brython's parser feels compelling to me. We'd be taking advantage of optimization and research that the CPython project has done for its own parser, and adding that to Brython almost for free. Generating "C-like" JavaScript code also feels like a good strategy for producing fast, optimization-friendly code.

CPython's goal was to keep the PEG parser's performance within 10% of the old parser in terms of speed and memory consumption. I'm not sure if Brython has to be that strict.

The benefits of a parser generator for this project feel large enough to me to justify some slowdown. Probably not a 3.5x slowdown though. I'm not sure where that line is for me.

Thanks for reading!

I had fun diving into this. A nice unintended side effect of this exploration was learning some of the nitty-gritty details of CPython's parser.

I hope what I've learned is helpful (or at least interesting!) to the Brython community. And I hope this research inspires somebody to build on Brython's PEG parser journey.

-- Austin

@denis-migdal
Contributor

denis-migdal commented Jan 4, 2024

Hi,

Seems nice ;)

I have a few comments below.

A. On the benchmark:

In Brython, we can say we have 3 steps:

  1. generation of the JS code, with the parser.
  2. parsing of the generated JS code by the browser.
  3. execution of the JS code.

I assume the generated JS code should be the same across the different parser versions, so steps 2 and 3 should take the same time. However, it is important to compare the execution speed of step 1 with that of steps 2 and 3.

Indeed, even if step 1 is 1,000x slower with your parser, we don't care if e.g. step 2 is 1,000,000x slower than step 1.
Indeed, the difference between 1,000,001ms and 1,001,000ms is negligible (<0.1% execution speed decrease).

B. Suggestion: WASM?

If we automatically generate the parser, maybe we could write it in e.g. C, then compile it to WASM?
This might give us more control over the internal data structures and would enable using multiple threads, e.g. parsing many modules/files at the same time.
Of course, it would come at a cost, as we would need to transfer the Python source code from JS to WASM and then the generated code from WASM back to JS.

C. Parsing: checking multiple rules at the same time?

Parsers aren't really my area of expertise, but I am a little surprised that we seem to check one rule after another.
Can't we check a set of rules in one operation (see the sketch after this list) using either:

  • a switch statement
  • a Map of functions
  • an Array of functions

Or even a binary (?) decision tree.
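Something like this sketch, for the Map-of-functions option (handler names are made up, just to illustrate the idea):

```javascript
// Illustrative stubs standing in for real parse functions.
const parseIf   = p => ({ kind: 'If' });
const parseFor  = p => ({ kind: 'For' });
const parseDef  = p => ({ kind: 'FunctionDef' });
const parseExpr = p => ({ kind: 'Expr' });

// One O(1) lookup on the current token instead of trying each
// alternative in sequence:
const statementDispatch = new Map([
  ['if',  parseIf],
  ['for', parseFor],
  ['def', parseDef],
]);

function parseStatement(p) {
  const handler = statementDispatch.get(p.tokens[p.mark]);
  return handler ? handler(p) : parseExpr(p); // fallback alternative
}

// parseStatement({ tokens: ['if'], mark: 0 }) -> { kind: 'If' }
```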

@austin-schick
Contributor Author

Thanks for the comments, Denis!

However, it is important to compare the execution speed of step 1 with that of steps 2 and 3.

I somewhat agree here, but a metric like "total runtime" is sometimes less important than "time to start execution". Brython users writing web apps will want their files parsed and event handlers installed as soon as possible, so users aren't waiting on an unresponsive site. My app loads a ~2kloc file to run user-created animations. Right now it takes ~150ms to parse, so there's an almost imperceptible delay before an animation starts. But if that were a ~500ms delay instead, the experience would be worse.

This program from the gallery is another good example. The Mandelbrot set takes a long time to draw. But it's nice that it doesn't take 2 seconds to start drawing.

Maybe we can write the parser in e.g. C, then convert it into WASM ?

I also find the idea of WASM exciting! But I don't think a C parser is the right next step. I think the biggest reason not to follow this approach is Pierre's lack of interest and experience in C programming. Since this is something Pierre would maintain and he's not comfortable with C or the WASM toolchain, I think a C parser is out. I'm also skeptical of how much of a speedup we'd even get from a different language, given how nicely optimizable the generated JavaScript could be.

checking multiple rules at the same time?

I don't think I'm totally following here. But the algorithm doesn't check every rule against every token. It does have some switch-like behavior -- returning early as soon as it finds a matching option for the current rule (mostly).

@PierreQuentel
Contributor

@austin-schick

A very well written and inspiring post, Austin, thanks!

I didn't think of generating a parser with the CPython tools. If it's only about 2 times slower than the hand-made parser, it's probably worth the work needed to adapt these tools to Javascript. I will work on this in the next days, unless you plan to do it yourself.

@denis-migdal
Contributor

denis-migdal commented Jan 5, 2024

However, it is important to compare the execution speed of step 1 with that of steps 2 and 3.

I somewhat agree here, but a metric like "total runtime" is sometimes less important than "time to start execution". Brython users writing web apps will want their files parsed and event handlers installed as soon as possible, so users aren't waiting on an unresponsive site. My app loads a ~2kloc file to run user-created animations. Right now it takes ~150ms to parse, so there's an almost imperceptible delay before an animation starts. But if that were a ~500ms delay instead, the experience would be worse.

This program from the gallery is another good example. The Mandelbrot set takes a long time to draw. But it's nice that it doesn't take 2 seconds to start drawing.

I was not really talking about the total execution time, but about the relative time between the steps.
The code will start executing only after steps 1 and 2. Indeed, there are two parsings:

  • Python code parsed by Brython in order to generate the JS (step 1).
  • JS parsing by the browser in order to then execute it (step 2).

We could also say there is a "step 0", which is the execution of brython.js, required before any parsing.

My app loads an ~2kloc file to run user-created animations. Right now it takes ~150ms to parse, so the there's an almost imperceptible delay before an animation starts.

150ms seems like a lot for "only" ~2 kloc.

Maybe there could be a way to make Brython work with <script async>, and start parsing scripts ASAP, then only execute them once the DOM is loaded. This could save us a little time during page loading?

checking multiple rules at the same time?

I don't think I'm totally following here. But the algorithm doesn't check every rule against every token. It does have some switch-like behavior -- returning early as soon as it finds a matching option for the current rule (mostly).

Normally a JS switch does its dispatch in O(1) and should be quicker than a series of conditions.
I think it internally relies on hashes, and on "goto" instructions.

After switch, the second best should be a Map of functions (for string keys), as it relies on hashes, or an Array of functions (for non-sparse integer keys).

@austin-schick
Contributor Author

@PierreQuentel Thank you!

I will work on this in the next days, unless you plan to do it yourself.

That would be great! My holiday break is over now, so I'm not available to pick up the task myself (though it does sound like a lot of fun!)

@PierreQuentel
Contributor

With the commits above, I could generate a Javascript PEG parser with the CPython toolchain, complete enough to successfully parse big Python files such as _pydatetime.py, and the result is very disappointing: it takes more than 6 seconds...

To test it, you have to

  • run the script scripts/downloads.py: this installs the Python grammar in scripts/python.3.12.gram
  • run the script scripts/make_javascript_gen_parse.py: it uses the module scripts/pegen/javascript_generator.py, adapted from c_generator.py, to generate the Javascript parser gen_parse.js, stored in /www/src. The generated script is around 820 kB
  • load the page /tests/test_gen_parse.html. It loads the parser, plus additional helper scripts, most of them already present since my first attempts at a PEG parser, plus a new one, pegen.js, adapted from CPython's pegen.c

After a lot of work (I am not very good at reading C code, which was necessary to understand the logic of the generated parser and produce equivalent Javascript code), I could finally parse many Python scripts and generate the correct AST (except for some positional information in some places).

The last step was to measure the time taken to parse a big module in the standard distribution, and I already spoiled the conclusion: the generated parser is awfully slow, much slower than the existing PEG parser, and consequently much, much slower than the current hand-made parser.

There might be possible improvements to this code, but I doubt that we can go from 6 seconds to a few hundred milliseconds...

@denis-migdal
Contributor

denis-migdal commented Jan 8, 2024

I tried to take a quick look at it.

Did you test the execution speed with all the console.log calls removed (I think I saw one in the commits)?
Is part of the parser's Python code converted to JS through Brython (I see some .py files)?

We'd need to do some profiling to see what takes a lot of time and see whether it can be optimized.

@PierreQuentel
Contributor

Yes, I removed the debug traces; the few console.log calls that remain don't impact the overall performance.

The Javascript parser code (/www/src/gen_parse.js) is generated by the Python script /scripts/pegen/javascript_generator.py, not by Brython.

@PierreQuentel
Contributor

There might be possible improvements to this code, but I doubt that we can go from 6 seconds to a few hundred milliseconds...

I was wrong to doubt: using the Chrome profiling tools, I found a bug in a function; fixing it dramatically reduces parsing time to around 2 or 3 times that of the "legacy" Brython parser. The generated PEG parser is now a little faster than my previous PEG parser version.

@austin-schick
Contributor Author

Woohoo! That's great news. I think I'll have some time this weekend to run the new implementation on my benchmarks and look into possible optimizations. I'm looking forward to jumping in.

PierreQuentel added a commit that referenced this issue Jan 19, 2024
…mpare current Brython parser, first PEG parser implementation and new generated PEG parser. Related to issue #2354.
@PierreQuentel
Contributor

I have added 3 pages to test the different parser versions:

  • tests/parse_tests/test_legacy_parser.html (current hand-made parser, the fastest so far)
  • tests/parse_tests/test_first_PEG_parser.html (previous PEG implementation)
  • tests/parse_tests/test_generated_parser.html (new generated PEG parser)

They all measure the time taken to parse the code of _pydecimal.py and generate the AST.

@austin-schick
Contributor Author

I opened a PR here #2358 with a few optimizations to the generated parser. After those changes, I'm measuring the parse times to be ~1.7x those of the legacy parser. I don't feel like I exhausted the optimization opportunities, so I bet we could get even faster with some more effort.

@PierreQuentel
Contributor

The version developed in branch generated_peg_parser now passes all the tests in the built-in test suite!

At the moment the size of brython.js in this branch is 1158 kB, vs 921 kB in the current version, because the generated parser (gen_parse.js) is much bigger than the "legacy" parser in py2js.js. It can certainly be reduced, but in any case, IMO the benefits of using the generated parser are worth the additional size.

@austin-schick
Contributor Author

It looks like after merging, there are now quite a few regressions in test_syntax.py. I just wanted to make sure those didn't go unnoticed! Happy to help out however I can there as well.

@PierreQuentel
Contributor

@austin-schick You are right, I should have checked that... I think I have fixed most of the regressions.

@PierreQuentel
Contributor

I had not fixed the issues generated by the docstring tests, and there were quite a few...

The Travis CI build now passes without errors. I will publish the first release with the new parser in a few days.

@austin-schick
Contributor Author

Woohoo! I see the release is now live. This is very exciting news. Thank you so much for the hard work you put into this, Pierre! It looks like getting the new parser working took some serious effort.

And now, as far as I can tell, Brython is more compliant with CPython than ever. Maybe I'll throw a PEG Parser Party to celebrate :-)

I think with that, it's time to close this issue. Merci encore, Pierre!
