Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to go about including pegen code in CPython? #7

Closed
lysnikolaou opened this issue Mar 9, 2020 · 12 comments
Closed

How to go about including pegen code in CPython? #7

lysnikolaou opened this issue Mar 9, 2020 · 12 comments

Comments

@lysnikolaou
Copy link

How would you go about moving all the pegen code into cpython? What I have done so far is the following.

  1. Copy the pegen directory into cpython/Parser (and remove all the stuff that's not needed).
  2. Move pegen.c and parse_string.c from cpython/Parser/pegen into cpython/Parser.
  3. Move pegen.h and parse_string.h from cpython/Parser/pegen into cpython/Include.
  4. Generate two new functions that call run_parser_* and can be called from C.
  5. Add a parse.h in cpython/Include that exports both the new generated functions.
  6. Generate the parser into Parser/parse.c.

All the names are just dummy names that I didn't give much thought, but do you think something like this would work?

@lysnikolaou lysnikolaou changed the title How to go about including pegen code in cpython? How to go about including pegen code in CPython? Mar 9, 2020
@gvanrossum
Copy link

Yeah, that's mostly reasonable, and it's good that we're thinking about this now.

One suggestion: I propose to move (copy) the pegen parser generator (for generating both Python and C) into the Tools directory, rather than following the 30-year-old example of Parser/pgen.

In that move, I would also propose to rename either the top-level directory (the repo root, corresponding to a specific subdirectory of Tools) or the actual module (pegen/pegen), so that there isn't the horrible ambiguity of what "pegen" refers to. Maybe we should keep "pegen" as the module name (else too many imports need to be changed) and put all the code in Tools/peg_generator?

I would move everything except the C code, which has has little or no business living in the same tree, and the "storyN" directories, which belong to my blog posts only. I would also probably weed out most of the data files (except the ones that are used by tests). We can weed out more later if we feel there is cruft.

I would then copy the hand-written .c and .h files into cpython/Parser and cpython/Include, as you indicate.

I would also copy the grammar (data/simpy.gram) into cpython/Grammar, renaming it to python.gram.

Then we need the two run functions like you say, and a module that allows calling these from Python (the functions themselves can be called from C).

I would like to come up with slightly better names for most files at runtime -- "parse" or "parser" is too generic, and "pegen" already is over-used.

Also, let's not overthink this -- we can start with some less-than-perfect scheme and iterate.

@gvanrossum
Copy link

Are you working on this (or any other aspect of the integration)? If not, I'd like to give it a try.

@lysnikolaou
Copy link
Author

I'm not currently working on this, but was also thinking of starting some of these days. Did you start with this? Can I help out in any way?

@gvanrossum
Copy link

I am working on getting the code generator copied into Tools. I've almost got it working. Will push to a new branch when I am happy with that step, hopefully later today. Be safe!

@gvanrossum
Copy link

Pushed to branch "radical". I only copied stuff that's part of the parser generator or the support for the generated C code, and scripts and tests. I moved the .c and .h files to peg_parser/, so they are no longer mixed in with the generator (but the support is still in pegen/, since it's used by the generator to read the grammar). I renamed simpy.gram to python.gram, and renamed two other data files from .txt to .py. I tested most Makefile targets, and they all seem to work except for the former simpy targets (which I've renamed to test_local and test_global). I'll see if I can fix those next. Once that all is done I'll see if I can actually generate an extension module that works with 3.9 (currently it still requires 3.8 :-).

@gvanrossum

This comment has been minimized.

@gvanrossum gvanrossum reopened this Mar 18, 2020
@gvanrossum
Copy link

In the process of moving to 3.9 I was reminded that the 3.9 grammar has very recently changed -- slice_ty no longer exists (python#9605).

@gvanrossum
Copy link

gvanrossum commented Mar 20, 2020

the 3.9 grammar has very recently changed

Fixed in the 'radical' branch.

  1. Copy the pegen directory into cpython/Parser (and remove all the stuff that's not needed).
  2. Move pegen.c and parse_string.c from cpython/Parser/pegen into cpython/Parser.
  3. Move pegen.h and parse_string.h from cpython/Parser/pegen into cpython/Include.

[Added]

  1. Generate the parser into Parser/parse.c.

These are done too (though slightly differently). I will make a PR and merge this into the 'pegen' branch as a single commit. [Done.]

@gvanrossum
Copy link

gvanrossum commented Mar 20, 2020

Next up: item 6.

@lysnikolaou
Copy link
Author

Is it okay to occasionally merge python/cpython:master into pegen? I think, that the Windows and MacOS tests were fixed upstream, so we might be able to turn those on again if we do so.

@gvanrossum
Copy link

Is it okay to occasionally merge python/cpython:master into pegen?

Absolutely! Go ahead.

@gvanrossum
Copy link

gvanrossum commented Mar 27, 2020

Everything here's done now. I'll open a new issue to track enabling the new parser using a command-line flag and/or environment variable. [UPDATE: We already have #4 and #5]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants