Python file indentation is wrong #763

sekirocc · 2021-09-18T07:24:01Z

Reproduction steps

download latest macOS release 0.4.1
open a python file
move cursor to some line
press o to insert newline
the cursor is at the beginning of line, without any indentation.

Environment

Platform: macOS
Helix version: 0.4.1

zero config.

The text was updated successfully, but these errors were encountered:

sudormrfbin · 2021-09-18T11:37:59Z

Indent support requires an indents.toml file describing the tree-sitter syntax nodes where it should be automatically inserted, for example see https://github.com/helix-editor/helix/blob/master/runtime/queries/rust/indents.toml. Python doesn't yet have this file, so contributions are welcome !

If anyone wants to take this up, there's https://github.com/nvim-treesitter/nvim-treesitter/blob/master/queries/python/indents.scm and the tree-sitter python grammar for reference.

lisael · 2021-09-23T09:45:38Z

I think it's not easy (if feasible at all) with the current Tree-sitter only indentation logic. In most editors when I type:

if True:
    blah
    else:<ret>

I expect the whole else: line to be outdented and a new indented line added.

In helix, TreeSitter fails to parse else: as an else statement precisely because it's not outdented (another effect is that else is not highlighted as a keyword in my example). Therefore

outdent = [
    "else_clause"
]

doesn't work. I can't see another way than parsing the current line ourselves at each return

It's a chicken-and-egg problem.

Note nvimTS has the same issue, which is really annoying, but easier to deal with because of smarttabs, a single <backspace> outdents correctly.

lisael · 2021-09-23T09:47:24Z

That said, a single

indent = [ ":", "list", "tuple", "dictionary", "set"]

does a good job at indenting.

willparsons · 2022-04-04T22:17:44Z

I've only had helix for a few days so I may not be understanding this correctly.

Is this not the indents config for Python? https://github.com/helix-editor/helix/blob/master/runtime/queries/python/indents.scm

Yet I still don't get auto-indent.

thomasaarholt · 2022-04-07T11:01:57Z

I get autoindentation if I press Enter when the marker is at | in the following:

a = list(|)
->
a = list(
    |
)

but I do not get it with the if statement as hinted should happen in the the file @willparsons refers to above.

if a == 1:|
->
if a == 1:
|

zSnails · 2022-04-18T20:49:52Z

any news on this? I'm looking forward to switching to helix

BonfireAtNight · 2022-04-23T07:21:05Z

The lack of auto-indentation is the only thing that keeps me from fully embracing Helix. For the most part it's great (especially because most things work out of the box), but auto-indent is really the level of convenience every text-editor should offer. Maybe I'm doing something wrong (I've installed Helix via Pacman and pylsp via pip and downloaded the languages.toml file from the repo), but for me not even Indentation after ":" is working, let alone aligning parentheses or operators in multiline clauses.

Is the issue really as severe as @lisael is pointing out? Any hope for a feasible solution in the near to mid future?

Incidentally, what does the "Auto Indent" on the languages page refer to exactly? That it's part of the auto-formatting?

paul-scott · 2022-05-13T11:17:52Z

Had a bit of a look into this. Thought I'd document my understanding of the issue.

def func():#<- this cursor position is within the "module" node, not the "function_definition" node

if True:#<- this cursor position is within the "module" node, not the "if_statement" node

Where node refers to tree-sitter tree node. You can have a play in the playground with this.

As a simplification the indents are calculated by traversing the tree and counting the number of nodes that have the capture type @indent. Because in the above positions we are just past the end of the relevant node we don't get the desired indentation. This explains why the following cases do work (we are inside the function_definition):

def func1(#):

def func2():#
    x = 1

I suspect there isn't any way to fix this with the current way helix does indentation. Neovim seems to support some additional capture types, e.g., @branch, @auto, @aligned_indent, which might be why things work better there:

https://github.com/nvim-treesitter/nvim-treesitter/blob/master/queries/python/indents.scm

paul-scott · 2022-05-14T10:03:28Z

One idea to solve this is:

Add a new capture type called @indent-after
In indents.scm attach it to function_definition, class_definition, if_statement etc..
Implement handling in helix-core/src/indent.rs

Specifically when adding a new line, determine all nodes associated with one position back from the current cursor position. If any of those nodes are @indent-after and have a span that ends there, then count +1 indent. Note that this would only be counted when adding a new line, so @indent-after would be otherwise ignored.

I'm not sure if any other languages would potentially benefit from this (yaml?).

Another idea:

The above is reasonably convoluted given that the rule could probably be simplified down to: if the character at the end of the line is a :, inserting a new line should have the indentation of this line +1. So maybe there could be a capture @indent-after-line that is attached to ":" and on creating a new line if the previous / line ending character is : then copy the current line indentation using indent_level_for_line, and +1 to it.

@WindSoilder did you have any more thoughts on how to resolve this after doing the original implementation in #837?

WindSoilder · 2022-05-16T03:29:41Z

@paul-scott Sorry, I don't have ideas, and actually I don't know much about treesitter scm file...

I just think that once the relative treesitter issue nvim-treesitter/nvim-treesitter#1136 issue is correctly handled, we can (maybe)adjust indents.scm file to make it works properly.

paul-scott · 2022-05-16T04:57:03Z

No worries. When / if there is a solution for nvim it is possible that it gets solved specifically for nvim-treesitter in their wrapper / libraries rather than the upstream treesitter implementation. So while it would be useful to follow a similar approach, it could still use a bit of effort to implement for helix.

Neovim also has other non-treesitter ways to do indentation which in my experience work quite well. In helix I believe we just have the option of treesitter-based or copy-the-indentation-of-the-previous-line-based indentation. So there is more motivation at the moment to get a better solution working in helix whether tree-sitter-based or otherwise.

Triton171 · 2022-05-25T22:14:00Z

@paul-scott The indent-after idea is pretty much what I had in mind as well, just haven't gotten to implement it yet. There's a few things I have to add though:

In my opinion, this should simply extend the node that is captured by it (i.e. we pretend that the node contains more than what tree-sitter tells us it does). This should then be reasonably simple to implement: We have to increase the range of the query here in order to include any preceding node that should be extended. If there is an extended node, we have to use it here. This also has the advantage that we cannot only indent with this, but also outdent and do anything else we'll add to the indent system in the future.
What you suggest would be to extend the node by one character. This would improve the situation but not completely solve the problem: As soon as one leaves a line blank while writing a Python function, the indent would be lost, because now the end of the function node is more than one character away from the cursor:
```
def func():
    # This line will be indented
# This line will no longer be indented
```
Any whitespace after func(): would break the indentation for the first line in the function as well.

For python, we ideally want to extend the function node to everything that is currently indented more than the beginning of the function node. The indentation query depending on the current indentation is a bit circular, but that's just because, in Python, the syntax depends on the indentation. Similarly to this @extend-indented capture, it might make sense to have other captures. I believe, something like @extend-always would fix the lua indentation issues.

Extending a node would of course always stop at the beginning of its next sibling or at the end of its parent.

Let me know what you think and if I've overlooked anything. If you (or anyone else) want to implement it, I'll be happy to help with any questions. Otherwise, I'll do it at some point, but it'll probably take 2 or 3 months until I get to it.

paul-scott · 2022-05-25T23:19:28Z

What you suggest sounds like it would be cleaner.

For python, we ideally want to extend the function node to everything that is currently indented more than the beginning of the function node.

Yea that seems important in python.

Let me know what you think and if I've overlooked anything.

Nothing obvious to me at this point. I'm sure there will be some challenges when it actually comes to implementing it.

I'll have a go if I can find some spare time.

pickfire · 2022-05-27T01:45:27Z

One thing I noticed as well when O on first line, it will indent the first line and your cursor will jump to second line

#(|)#import line

o


    #(|)#import time

There's been a lot of complaints about the state of python indentation and the fallback actually works better until the solution proposed in #763 (comment) is implemented.

mitsuhiko · 2022-10-16T20:35:05Z

I believe related to this issue is that I think the underlying tree-sitter grammar for Python is not great for interactive use.

Case in point:

def foo():
    try:
        1 +

At this point foo is known as ["module", "function_definition", "identifier"]. However if I add an identifier after the 1 + to make it a full expression, all the sudden foo changes to ["module", "ERROR", "identifier"]. This can also be seen with how the syntax highlighting of the function name changes.

This relates to indentation because depending on the state of how it parses this, completely different rules apply. I reckon the underlying tree-sitter stuff for Python probably needs adjusting for the use here.

zyklotomic · 2022-12-23T06:02:24Z

I believe related to this issue is that I think the underlying tree-sitter grammar for Python is not great for interactive use.

Case in point:
def foo():
    try:
        1 + 
At this point foo is known as ["module", "function_definition", "identifier"]. However if I add an identifier after the 1 + to make it a full expression, all the sudden foo changes to ["module", "ERROR", "identifier"]. This can also be seen with how the syntax highlighting of the function name changes.

This relates to indentation because depending on the state of how it parses this, completely different rules apply. I reckon the underlying tree-sitter stuff for Python probably needs adjusting for the use here.

@mitsuhiko good point, I want to mention another find. tree-sitter doesn't recognize the try statement until we put the corresponding except:. Semantically this is correct, but does not make sense for interactive use as you said.

Could we just also add an @branch capure type like nvim does here?

I could try that and see how feasible that is, and submit a PR if I succeed.

Triton171 · 2022-12-23T12:56:55Z

The @branch capture type of nvim-treesitter roughly corresponds to the @outdent capture type in helix (compare their docs with docs for helix). Adding an @outdent capture for the except_clause makes sense but it wouldn't solve this specific issue (which is caused by the except clause not being there). I would guess (although I haven't tested it) that in this scenario, nvim-treesitter has the same issues, which could be adressed in one of 2 ways:

Improve the tree-sitter grammar for python, as @mitsuhiko mentioned. Ideally, an incomplete try statement would always result in a try_statement node with an error at its end or after it. At the very least, it should be consistent and not depend on what exactly the contents of the try statement are.
Since improving the grammar is probably quite difficult, it might also be possible to work around it by writing queries for the specific situations where the indentation is currently broken. This is only a workaround and it will probably never work for all situations but it might be possible to at least fix the common cases.

zyklotomic · 2022-12-24T08:06:28Z

My bad! I just realized I conflated two separate issues. When I was talking about @branch, I was referring to the else clause not retroactively outdenting, as you mentioned in your PR #3382. A branching capture type could be added to handle this retroactive case.

zyklotomic · 2022-12-28T15:39:45Z

After playing around with tree-sitter a bit more, the first proposed option of modifying tree-sitter would actually require allowing for "invalid" programs since a try-statement can't be standalone without an except or finally clause. Why if works is because we don't need an elif or else to be grammatically correct. Thus, if we are adamant on going this route, we probably would have to fork the Python tree-sitter grammar instead of proposing broken grammar changes upstream. This whole approach feels very shoehorn-ey to me.

As for the second mentioned approach, I initially tried

(
  (ERROR) @try_clause
  (#match? @try_clause "try:.*")
) @indent @extend

which handles

try:

correctly, but not:

def fn(x):
  try:

If you try it out in the tree-sitter playground, you'll see that the (ERROR) doesn't even match "try" in the second case.

To remedy this, maybe we could add more predicates such that we can single out specific subcaptures with a regex, like

(
  (ERROR) @try_clause
  (#regex-capture? @try_clause "try:.*")
) @indent @extend

These two solutions aside, another solution might be with a plugin system (#3806). We might want a plugin system to handle the retroactive dedenting too. Here is how neovim does it.

Triton171 · 2022-12-28T22:20:05Z

Thanks for looking into this @zyklotomic. What I mean by adjusting the tree-sitter grammar is not to make it parse incorrect code without producing any ERROR nodes. It should still produce errors (after all, the code is incorrect) but ideally they would be a bit more local. For example, an incomplete try_statement shouldn't suddenly make the surrounding function an error as well. Maybe it could even produce a try_statement node and put the ERROR inside. This way, highlighting and indentation would still work everywhere except for the exact location of the error.

I had a look at the playground and came up with some queries which seem to work pretty well. The PR is #5332, if you want to try it out and report any issues, that would be a great help.

As for retroactive indenting/dedenting, I'm thinking if we can find a general, language-agnostic solution to this. It'd be nice if Helix could automatically rerun indent queries when the syntax tree changes and adjust the indent of existing lines. This will probably take some time though, as it has to be fast, accurate and it needs to detect when the user is trying to override the computed indent (for example when it is incorrect).

zyklotomic · 2022-12-29T08:11:19Z

I like the idea of wanting to find a language-agnostic solution, but I'm also wondering how pragmatic it would be. What other languages depend on indentation as syntax / and or benefit from these changes?

Triton171 · 2022-12-30T11:29:37Z

I've definitely come across the need to retroactively update indentation in several languages. For example, in C/C++, labels are usually outdented:

int main() {
start:
    do_smth();
    goto start;
}

start being a label is only clear after typing :, so it would be nice if helix could outdent it then. For another example, some IDEs automatically indent code if you surround it with curly braces. This makes refactoring a lot more pleasant but is of course more complicated since we'd have to update the indentation for an (arbitrarily long) block of code and not just for a single line.

I do agree that python is probably one of the languages where such a feature is the most important.

sekirocc added the C-bug Category: This is a bug label Sep 18, 2021

sudormrfbin added A-tree-sitter Area: Tree-sitter C-enhancement Category: Improvements E-easy Call for participation: Experience needed to fix: Easy / not much and removed C-bug Category: This is a bug labels Sep 18, 2021

WindSoilder mentioned this issue Oct 14, 2021

Add indents for python, but it's not perfect. #837

Merged

archseer mentioned this issue Nov 7, 2021

Perl support #978

Merged

kirawi mentioned this issue Apr 2, 2022

Python auto indent not working #1935

Closed

sudormrfbin added E-hard Call for participation: Experience needed to fix: Hard / a lot and removed E-easy Call for participation: Experience needed to fix: Easy / not much labels Apr 23, 2022

the-mikedavis mentioned this issue Jun 2, 2022

Python indent not working #2651

Closed

Triton171 mentioned this issue Aug 10, 2022

Fix python indentation #3382

Merged

the-mikedavis mentioned this issue Aug 13, 2022

Grammars fetched and built but not in use #3407

Closed

archseer mentioned this issue Sep 18, 2022

tab in python #3883

Closed

tmke8 mentioned this issue Dec 9, 2022

python wrong indentation on try: #5045

Closed

Triton171 mentioned this issue Dec 28, 2022

Improve indent queries for python #5332

Merged

archseer closed this as completed in #5332 Jan 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python file indentation is wrong #763

Python file indentation is wrong #763

sekirocc commented Sep 18, 2021

sudormrfbin commented Sep 18, 2021 •

edited

Loading

lisael commented Sep 23, 2021 •

edited

Loading

lisael commented Sep 23, 2021 •

edited

Loading

willparsons commented Apr 4, 2022

thomasaarholt commented Apr 7, 2022

zSnails commented Apr 18, 2022

BonfireAtNight commented Apr 23, 2022

paul-scott commented May 13, 2022

paul-scott commented May 14, 2022

WindSoilder commented May 16, 2022

paul-scott commented May 16, 2022

Triton171 commented May 25, 2022

paul-scott commented May 25, 2022

pickfire commented May 27, 2022 •

edited

Loading

mitsuhiko commented Oct 16, 2022

zyklotomic commented Dec 23, 2022

Triton171 commented Dec 23, 2022

zyklotomic commented Dec 24, 2022

zyklotomic commented Dec 28, 2022

Triton171 commented Dec 28, 2022

zyklotomic commented Dec 29, 2022

Triton171 commented Dec 30, 2022

Python file indentation is wrong #763

Python file indentation is wrong #763

Comments

sekirocc commented Sep 18, 2021

Reproduction steps

Environment

sudormrfbin commented Sep 18, 2021 • edited Loading

lisael commented Sep 23, 2021 • edited Loading

lisael commented Sep 23, 2021 • edited Loading

willparsons commented Apr 4, 2022

thomasaarholt commented Apr 7, 2022

zSnails commented Apr 18, 2022

BonfireAtNight commented Apr 23, 2022

paul-scott commented May 13, 2022

paul-scott commented May 14, 2022

WindSoilder commented May 16, 2022

paul-scott commented May 16, 2022

Triton171 commented May 25, 2022

paul-scott commented May 25, 2022

pickfire commented May 27, 2022 • edited Loading

mitsuhiko commented Oct 16, 2022

zyklotomic commented Dec 23, 2022

Triton171 commented Dec 23, 2022

zyklotomic commented Dec 24, 2022

zyklotomic commented Dec 28, 2022

Triton171 commented Dec 28, 2022

zyklotomic commented Dec 29, 2022

Triton171 commented Dec 30, 2022

sudormrfbin commented Sep 18, 2021 •

edited

Loading

lisael commented Sep 23, 2021 •

edited

Loading

lisael commented Sep 23, 2021 •

edited

Loading

pickfire commented May 27, 2022 •

edited

Loading