Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python file indentation is wrong #763

Closed
sekirocc opened this issue Sep 18, 2021 · 22 comments · Fixed by #5332
Closed

Python file indentation is wrong #763

sekirocc opened this issue Sep 18, 2021 · 22 comments · Fixed by #5332
Labels
A-tree-sitter Area: Tree-sitter C-enhancement Category: Improvements E-hard Call for participation: Experience needed to fix: Hard / a lot

Comments

@sekirocc
Copy link

Reproduction steps

  1. download latest macOS release 0.4.1
  2. open a python file
  3. move cursor to some line
  4. press o to insert newline
  5. the cursor is at the beginning of line, without any indentation.

Environment

  • Platform: macOS
  • Helix version: 0.4.1

zero config.

@sekirocc sekirocc added the C-bug Category: This is a bug label Sep 18, 2021
@sudormrfbin
Copy link
Member

sudormrfbin commented Sep 18, 2021

Indent support requires an indents.toml file describing the tree-sitter syntax nodes where it should be automatically inserted, for example see https://github.com/helix-editor/helix/blob/master/runtime/queries/rust/indents.toml. Python doesn't yet have this file, so contributions are welcome !

If anyone wants to take this up, there's https://github.com/nvim-treesitter/nvim-treesitter/blob/master/queries/python/indents.scm and the tree-sitter python grammar for reference.

@sudormrfbin sudormrfbin added A-tree-sitter Area: Tree-sitter C-enhancement Category: Improvements E-easy Call for participation: Experience needed to fix: Easy / not much and removed C-bug Category: This is a bug labels Sep 18, 2021
@lisael
Copy link

lisael commented Sep 23, 2021

I think it's not easy (if feasible at all) with the current Tree-sitter only indentation logic. In most editors when I type:

if True:
    blah
    else:<ret>

I expect the whole else: line to be outdented and a new indented line added.

In helix, TreeSitter fails to parse else: as an else statement precisely because it's not outdented (another effect is that else is not highlighted as a keyword in my example). Therefore

outdent = [
    "else_clause"
]

doesn't work. I can't see another way than parsing the current line ourselves at each return

It's a chicken-and-egg problem.

Note nvimTS has the same issue, which is really annoying, but easier to deal with because of smarttabs, a single <backspace> outdents correctly.

@lisael
Copy link

lisael commented Sep 23, 2021

That said, a single

indent = [ ":", "list", "tuple", "dictionary", "set"]

does a good job at indenting.

@willparsons
Copy link

I've only had helix for a few days so I may not be understanding this correctly.

Is this not the indents config for Python? https://github.com/helix-editor/helix/blob/master/runtime/queries/python/indents.scm

Yet I still don't get auto-indent.

@thomasaarholt
Copy link
Contributor

I get autoindentation if I press Enter when the marker is at | in the following:

a = list(|)
->
a = list(
    |
)

but I do not get it with the if statement as hinted should happen in the the file @willparsons refers to above.

if a == 1:|
->
if a == 1:
|

@zSnails
Copy link

zSnails commented Apr 18, 2022

any news on this? I'm looking forward to switching to helix

@BonfireAtNight
Copy link

The lack of auto-indentation is the only thing that keeps me from fully embracing Helix. For the most part it's great (especially because most things work out of the box), but auto-indent is really the level of convenience every text-editor should offer. Maybe I'm doing something wrong (I've installed Helix via Pacman and pylsp via pip and downloaded the languages.toml file from the repo), but for me not even Indentation after ":" is working, let alone aligning parentheses or operators in multiline clauses.

Is the issue really as severe as @lisael is pointing out? Any hope for a feasible solution in the near to mid future?

Incidentally, what does the "Auto Indent" on the languages page refer to exactly? That it's part of the auto-formatting?

@sudormrfbin sudormrfbin added E-hard Call for participation: Experience needed to fix: Hard / a lot and removed E-easy Call for participation: Experience needed to fix: Easy / not much labels Apr 23, 2022
@paul-scott
Copy link
Contributor

Had a bit of a look into this. Thought I'd document my understanding of the issue.

def func():#<- this cursor position is within the "module" node, not the "function_definition" node

if True:#<- this cursor position is within the "module" node, not the "if_statement" node

Where node refers to tree-sitter tree node. You can have a play in the playground with this.

As a simplification the indents are calculated by traversing the tree and counting the number of nodes that have the capture type @indent. Because in the above positions we are just past the end of the relevant node we don't get the desired indentation. This explains why the following cases do work (we are inside the function_definition):

def func1(#):

def func2():#
    x = 1

I suspect there isn't any way to fix this with the current way helix does indentation. Neovim seems to support some additional capture types, e.g., @branch, @auto, @aligned_indent, which might be why things work better there:

https://github.com/nvim-treesitter/nvim-treesitter/blob/master/queries/python/indents.scm

@paul-scott
Copy link
Contributor

One idea to solve this is:

  • Add a new capture type called @indent-after
  • In indents.scm attach it to function_definition, class_definition, if_statement etc..
  • Implement handling in helix-core/src/indent.rs

Specifically when adding a new line, determine all nodes associated with one position back from the current cursor position. If any of those nodes are @indent-after and have a span that ends there, then count +1 indent. Note that this would only be counted when adding a new line, so @indent-after would be otherwise ignored.

I'm not sure if any other languages would potentially benefit from this (yaml?).

Another idea:

The above is reasonably convoluted given that the rule could probably be simplified down to: if the character at the end of the line is a :, inserting a new line should have the indentation of this line +1. So maybe there could be a capture @indent-after-line that is attached to ":" and on creating a new line if the previous / line ending character is : then copy the current line indentation using indent_level_for_line, and +1 to it.

@WindSoilder did you have any more thoughts on how to resolve this after doing the original implementation in #837?

@WindSoilder
Copy link
Contributor

@paul-scott Sorry, I don't have ideas, and actually I don't know much about treesitter scm file...

I just think that once the relative treesitter issue nvim-treesitter/nvim-treesitter#1136 issue is correctly handled, we can (maybe)adjust indents.scm file to make it works properly.

@paul-scott
Copy link
Contributor

No worries. When / if there is a solution for nvim it is possible that it gets solved specifically for nvim-treesitter in their wrapper / libraries rather than the upstream treesitter implementation. So while it would be useful to follow a similar approach, it could still use a bit of effort to implement for helix.

Neovim also has other non-treesitter ways to do indentation which in my experience work quite well. In helix I believe we just have the option of treesitter-based or copy-the-indentation-of-the-previous-line-based indentation. So there is more motivation at the moment to get a better solution working in helix whether tree-sitter-based or otherwise.

@Triton171
Copy link
Contributor

@paul-scott The indent-after idea is pretty much what I had in mind as well, just haven't gotten to implement it yet. There's a few things I have to add though:

  • In my opinion, this should simply extend the node that is captured by it (i.e. we pretend that the node contains more than what tree-sitter tells us it does). This should then be reasonably simple to implement: We have to increase the range of the query here in order to include any preceding node that should be extended. If there is an extended node, we have to use it here. This also has the advantage that we cannot only indent with this, but also outdent and do anything else we'll add to the indent system in the future.

  • What you suggest would be to extend the node by one character. This would improve the situation but not completely solve the problem: As soon as one leaves a line blank while writing a Python function, the indent would be lost, because now the end of the function node is more than one character away from the cursor:

    def func():
        # This line will be indented
    # This line will no longer be indented

    Any whitespace after func(): would break the indentation for the first line in the function as well.

    For python, we ideally want to extend the function node to everything that is currently indented more than the beginning of the function node. The indentation query depending on the current indentation is a bit circular, but that's just because, in Python, the syntax depends on the indentation. Similarly to this @extend-indented capture, it might make sense to have other captures. I believe, something like @extend-always would fix the lua indentation issues.

    Extending a node would of course always stop at the beginning of its next sibling or at the end of its parent.

Let me know what you think and if I've overlooked anything. If you (or anyone else) want to implement it, I'll be happy to help with any questions. Otherwise, I'll do it at some point, but it'll probably take 2 or 3 months until I get to it.

@paul-scott
Copy link
Contributor

What you suggest sounds like it would be cleaner.

For python, we ideally want to extend the function node to everything that is currently indented more than the beginning of the function node.

Yea that seems important in python.

Let me know what you think and if I've overlooked anything.

Nothing obvious to me at this point. I'm sure there will be some challenges when it actually comes to implementing it.

I'll have a go if I can find some spare time.

@pickfire
Copy link
Contributor

pickfire commented May 27, 2022

One thing I noticed as well when O on first line, it will indent the first line and your cursor will jump to second line

#(|)#import line

o


    #(|)#import time

archseer added a commit that referenced this issue Jul 1, 2022
There's been a lot of complaints about the state of python indentation
and the fallback actually works better until the solution proposed
in #763 (comment)
is implemented.
@mitsuhiko
Copy link
Contributor

I believe related to this issue is that I think the underlying tree-sitter grammar for Python is not great for interactive use.

Case in point:

def foo():
    try:
        1 + 

At this point foo is known as ["module", "function_definition", "identifier"]. However if I add an identifier after the 1 + to make it a full expression, all the sudden foo changes to ["module", "ERROR", "identifier"]. This can also be seen with how the syntax highlighting of the function name changes.

This relates to indentation because depending on the state of how it parses this, completely different rules apply. I reckon the underlying tree-sitter stuff for Python probably needs adjusting for the use here.

@zyklotomic
Copy link
Contributor

I believe related to this issue is that I think the underlying tree-sitter grammar for Python is not great for interactive use.

Case in point:

def foo():
    try:
        1 + 

At this point foo is known as ["module", "function_definition", "identifier"]. However if I add an identifier after the 1 + to make it a full expression, all the sudden foo changes to ["module", "ERROR", "identifier"]. This can also be seen with how the syntax highlighting of the function name changes.

This relates to indentation because depending on the state of how it parses this, completely different rules apply. I reckon the underlying tree-sitter stuff for Python probably needs adjusting for the use here.

@mitsuhiko good point, I want to mention another find. tree-sitter doesn't recognize the try statement until we put the corresponding except:. Semantically this is correct, but does not make sense for interactive use as you said.

Could we just also add an @branch capure type like nvim does here?

I could try that and see how feasible that is, and submit a PR if I succeed.

@Triton171
Copy link
Contributor

The @branch capture type of nvim-treesitter roughly corresponds to the @outdent capture type in helix (compare their docs with docs for helix). Adding an @outdent capture for the except_clause makes sense but it wouldn't solve this specific issue (which is caused by the except clause not being there). I would guess (although I haven't tested it) that in this scenario, nvim-treesitter has the same issues, which could be adressed in one of 2 ways:

  • Improve the tree-sitter grammar for python, as @mitsuhiko mentioned. Ideally, an incomplete try statement would always result in a try_statement node with an error at its end or after it. At the very least, it should be consistent and not depend on what exactly the contents of the try statement are.
  • Since improving the grammar is probably quite difficult, it might also be possible to work around it by writing queries for the specific situations where the indentation is currently broken. This is only a workaround and it will probably never work for all situations but it might be possible to at least fix the common cases.

@zyklotomic
Copy link
Contributor

My bad! I just realized I conflated two separate issues. When I was talking about @branch, I was referring to the else clause not retroactively outdenting, as you mentioned in your PR #3382. A branching capture type could be added to handle this retroactive case.

@zyklotomic
Copy link
Contributor

After playing around with tree-sitter a bit more, the first proposed option of modifying tree-sitter would actually require allowing for "invalid" programs since a try-statement can't be standalone without an except or finally clause. Why if works is because we don't need an elif or else to be grammatically correct. Thus, if we are adamant on going this route, we probably would have to fork the Python tree-sitter grammar instead of proposing broken grammar changes upstream. This whole approach feels very shoehorn-ey to me.

As for the second mentioned approach, I initially tried

(
  (ERROR) @try_clause
  (#match? @try_clause "try:.*")
) @indent @extend

which handles

try:

correctly, but not:

def fn(x):
  try:

If you try it out in the tree-sitter playground, you'll see that the (ERROR) doesn't even match "try" in the second case.

To remedy this, maybe we could add more predicates such that we can single out specific subcaptures with a regex, like

(
  (ERROR) @try_clause
  (#regex-capture? @try_clause "try:.*")
) @indent @extend

These two solutions aside, another solution might be with a plugin system (#3806). We might want a plugin system to handle the retroactive dedenting too. Here is how neovim does it.

@Triton171
Copy link
Contributor

Thanks for looking into this @zyklotomic. What I mean by adjusting the tree-sitter grammar is not to make it parse incorrect code without producing any ERROR nodes. It should still produce errors (after all, the code is incorrect) but ideally they would be a bit more local. For example, an incomplete try_statement shouldn't suddenly make the surrounding function an error as well. Maybe it could even produce a try_statement node and put the ERROR inside. This way, highlighting and indentation would still work everywhere except for the exact location of the error.

I had a look at the playground and came up with some queries which seem to work pretty well. The PR is #5332, if you want to try it out and report any issues, that would be a great help.

As for retroactive indenting/dedenting, I'm thinking if we can find a general, language-agnostic solution to this. It'd be nice if Helix could automatically rerun indent queries when the syntax tree changes and adjust the indent of existing lines. This will probably take some time though, as it has to be fast, accurate and it needs to detect when the user is trying to override the computed indent (for example when it is incorrect).

@zyklotomic
Copy link
Contributor

I like the idea of wanting to find a language-agnostic solution, but I'm also wondering how pragmatic it would be. What other languages depend on indentation as syntax / and or benefit from these changes?

@Triton171
Copy link
Contributor

I've definitely come across the need to retroactively update indentation in several languages. For example, in C/C++, labels are usually outdented:

int main() {
start:
    do_smth();
    goto start;
}

start being a label is only clear after typing :, so it would be nice if helix could outdent it then. For another example, some IDEs automatically indent code if you surround it with curly braces. This makes refactoring a lot more pleasant but is of course more complicated since we'd have to update the indentation for an (arbitrarily long) block of code and not just for a single line.

I do agree that python is probably one of the languages where such a feature is the most important.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tree-sitter Area: Tree-sitter C-enhancement Category: Improvements E-hard Call for participation: Experience needed to fix: Hard / a lot
Projects
None yet
Development

Successfully merging a pull request may close this issue.