Switch to a new parser #2318

Open
JelleZijlstra opened this issue Jun 8, 2021 · 7 comments
Labels
C: parser How we parse code. Or fail to parse it. S: accepted The changes in this design / enhancement issue have been accepted and can be implemented T: enhancement New feature or request

Comments

@JelleZijlstra
Collaborator

JelleZijlstra commented Jun 8, 2021

Currently, Black uses a vendored version of lib2to3 for parsing. This works well for parsing Python 2 and early Python 3, but Python has now moved on to a PEG-based parser (PEP 617), and lib2to3 is no longer being maintained.

So we need a new parser. There are a few existing options that we could leverage (Parso, LibCST), but the migration is going to be a lot of work. We're doing some early brainstorming in a Google doc. This issue exists so that we have a public record that we know this is a problem.

Concrete pieces of syntax that are blocked by this new grammar include parenthesized context managers and the match statement in Python 3.10. (#2242 through #2586, #2667, #2758)
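For readers unfamiliar with the two constructs named above, here is a small sketch of what they look like. It feeds them to CPython's own parser through the standard `ast` module (PEG-based since Python 3.9), not to lib2to3 or Black's vendored copy, so it only illustrates the target syntax, not Black's behavior. The helper name `parses` is made up for this example, and both snippets are only accepted on Python 3.10 or newer.

```python
import ast
import sys

# Python 3.10 syntax named in the issue: a match statement and a
# parenthesized group of context managers.
MATCH_SRC = """\
match command:
    case ["go", direction]:
        pass
    case _:
        pass
"""

PAREN_WITH_SRC = """\
with (
    open("a") as f,
    open("b") as g,
):
    pass
"""


def parses(source: str) -> bool:
    """Return True if the running interpreter's own parser accepts `source`.

    ast.parse only builds a tree; nothing is executed and no files are opened.
    """
    try:
        ast.parse(source)
    except SyntaxError:  # also covers IndentationError
        return False
    return True


if __name__ == "__main__":
    print("Python", sys.version_info[:2])
    print("match statement parses:", parses(MATCH_SRC))
    print("parenthesized with parses:", parses(PAREN_WITH_SRC))
```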

@JelleZijlstra JelleZijlstra added S: accepted The changes in this design / enhancement issue have been accepted and can be implemented C: parser How we parse code. Or fail to parse it. labels Jun 8, 2021
@ichard26 ichard26 added C: blib2to3 T: enhancement New feature or request labels Jun 8, 2021
@kamahen

kamahen commented Jul 5, 2021

The main bug is: https://bugs.python.org/issue40360 (and also https://bugs.python.org/issue36541).

I think there's a fairly straightforward way to wrap the new Python parser so that it provides the functionality that Black (and other source-level tools) need. However, it's a non-trivial amount of work, and I'm loath to do it unless I'm sure it'll be used and that nobody else is already doing it. (There appears to be one existing wrapper, leoAst.py; I've looked at it a bit, but it seems much more complicated than necessary, which could make it both difficult to use and a maintenance burden.)

Some other discussion is at kamahen/pykythe#27, google/yapf#825 (comment), google/yapf#894 (comment), and elsewhere.
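To make the gap concrete: the tree returned by the standard `ast` module records positions (`lineno`, `end_lineno`, and so on) but discards comments and whitespace, which a formatter must preserve. The sketch below is a hypothetical illustration, neither leoAst.py nor the wrapper proposed above; it recovers comments with the `tokenize` module and pairs them with statements by line number. The helpers `comments_by_line` and `statements_with_comments` are invented for this example.

```python
import ast
import io
import tokenize
from typing import Dict, List, Optional, Tuple

SOURCE = """\
x = 1  # keep me
def f():
    # and me
    return x
"""


def comments_by_line(source: str) -> Dict[int, str]:
    """Map line number -> comment text. ast.parse() drops comments, so use tokenize."""
    comments: Dict[int, str] = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comments[tok.start[0]] = tok.string
    return comments


def statements_with_comments(source: str) -> List[Tuple[int, str, Optional[str]]]:
    """Pair every AST statement with a comment on its first line, if there is one."""
    tree = ast.parse(source)
    comments = comments_by_line(source)
    rows = []
    for node in ast.walk(tree):
        if isinstance(node, ast.stmt):
            rows.append((node.lineno, type(node).__name__, comments.get(node.lineno)))
    rows.sort(key=lambda row: row[0])  # ast.walk is breadth-first; order by line
    return rows


if __name__ == "__main__":
    for row in statements_with_comments(SOURCE):
        print(row)
```

Note that the own-line comment `# and me` ends up attached to no statement; deciding where such comments belong is the kind of work a real wrapper would have to take on.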

@ianliu

ianliu commented Nov 5, 2021

Has tree-sitter been considered? It already implements a parser for Python (https://github.com/tree-sitter/tree-sitter-python), and I think it allows building formatters on top of it.

@JelleZijlstra
Collaborator Author

JelleZijlstra commented Nov 5, 2021

@ianliu interesting, I hadn't heard of that!

Looking at the Python bindings (https://github.com/tree-sitter/py-tree-sitter), it might be hard to get it to work for us:

  • Installing it requires a C compiler on most platforms (there are wheels only for macOS/Python 3.8)
  • And that doesn't even give you a Python grammar: you have to clone a repo and build the grammar at runtime.

That sounds like it would lead to a lot of people with mildly exotic systems who'd be unable to install Black if it depended on this library.

@jakkdl
Contributor

jakkdl commented Nov 7, 2022

According to its README, LibCST now supports Python 3.0 through 3.11, though the README also says:

It is more difficult to implement tools that focus almost exclusively on whitespace on top of LibCST instead of lib2to3. For example, Black would need to modify whitespace nodes instead of prefix strings, making its implementation much more complex.
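The quoted distinction refers to how lib2to3 stores formatting: each leaf carries a `prefix` string holding the whitespace and comments that precede it, so reproducing the file amounts to concatenating prefix plus token text. The sketch below imitates that model with the standard `tokenize` module. It is an approximation that assumes `\n` line endings and a trailing newline, and `tokens_with_prefix` is a hypothetical function, not part of lib2to3, blib2to3, or LibCST.

```python
import io
import tokenize
from typing import List, Tuple

# Tokens whose text is folded into the *next* token's prefix instead of
# becoming an entry of their own (roughly how lib2to3 treats trivia).
TRIVIA = {tokenize.COMMENT, tokenize.NL, tokenize.INDENT, tokenize.DEDENT}


def tokens_with_prefix(source: str) -> List[Tuple[str, str]]:
    """Return (prefix, text) pairs; prefix is the whitespace/comments before text."""
    # Absolute offset of each line start, so a (row, col) position maps to an index.
    # StringIO.readlines() splits on "\n" exactly like the tokenizer's readline.
    line_starts = [0]
    for line in io.StringIO(source).readlines():
        line_starts.append(line_starts[-1] + len(line))

    def offset(pos: Tuple[int, int]) -> int:
        row, col = pos  # tokenize rows are 1-based, columns 0-based
        return line_starts[row - 1] + col

    pairs: List[Tuple[str, str]] = []
    prev_end = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in TRIVIA:
            continue  # its text will be picked up by the next slice below
        start = offset(tok.start)
        pairs.append((source[prev_end:start], tok.string))
        prev_end = offset(tok.end)
    return pairs


if __name__ == "__main__":
    src = "x = 1  # keep me\nif x:\n    # own line\n    y = 2\n"
    for prefix, text in tokens_with_prefix(src):
        print(repr(prefix), repr(text))
```

In this model, a formatter that only changes spacing can rewrite prefix strings and leave the tokens untouched. Per the README quote, LibCST instead spreads the same information across whitespace nodes in the tree, which is the extra complexity it warns about.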

@Udayraj123

Hi @JelleZijlstra, I wanted to know whether a resolution will be available any time soon. Are there any alternatives or workarounds for now?

@JelleZijlstra
Collaborator Author

There are no concrete plans to switch to a new parser, but we have full support for the latest Python grammar changes through some hacks on our existing parser. What do you need a workaround for?
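One reason the newer grammar needs such workarounds is that `match` and `case` are soft keywords: they are ordinary names except in specific positions. The sketch below shows two inputs that begin with the same tokens `match ( x )` and only diverge at the trailing `:`, so a parser that must decide after one token of lookahead (as lib2to3's LL(1) grammar does) cannot classify the line early. It uses CPython's `ast` module purely as an illustration and is not how Black's parser handles this; `first_statement_kind` is a made-up helper, and the match-statement half assumes Python 3.10 or newer.

```python
import ast
import keyword
import sys

# Both snippets start with exactly the same tokens: `match ( x )`.
# Only the trailing ':' reveals that the second one is a match statement.
CALL_SRC = "match(x)\n"
MATCH_SRC = "match(x):\n    case 1:\n        pass\n"


def first_statement_kind(source: str) -> str:
    """Return the AST node class name of the first statement in `source`."""
    return type(ast.parse(source).body[0]).__name__


if __name__ == "__main__":
    print(first_statement_kind(CALL_SRC))  # Expr (a call to a function named match)
    if sys.version_info >= (3, 10):
        print(first_statement_kind(MATCH_SRC))  # Match
        # `match` is a soft keyword, not a reserved one.
        print(keyword.iskeyword("match"), keyword.issoftkeyword("match"))
```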

@Udayraj123

Udayraj123 commented Mar 29, 2023

Oh I see. I was running into #2242 with the match/case syntax, so I guess it might be a configuration issue on my end.

Edit: An error shown in that discussion doesn't seem to cover match/case. Was it fixed later?

6 participants