Implement new indentation logic based on tree-sitter #594

chfritz · 2018-09-10T01:33:49Z

Enhancement

Description

As discussed in various places, including atom/atom#10384 (comment), there is an opportunity to implement a new indentation logic that heavily exploits the output from the new tree-sitter parser. This makes it possible to address a number of issues users have been seeing with indentation, including in other languages, e.g., atom/atom#6655.

Status

I've started developing this in my sane-indentation package, and version 0.9.3 already use it. It works very well so far and without having gone through the above linked list of issues in detail, it seems to fix a good number of them already. I'm showing an example of its current indentation below.

Proposed Approach

I plan to complete the work for this new logic in the package for JavaScript. Once that is done, i.e., it fixes all known indentation issues, so long as they are not mutually exclusive (for instance, because they were just a matter of opinion), then I propose we discuss porting that code into this package (language-javascript). If and when that is completed, i.e., we all agree that this is the right approach, then it should be relatively simple to port this to other languages as well.

Example

foo({
    sd,
    sdf
  },
  4
);

foo( 2, {
    sd,
    sdf
  },
  4
);

foo( 2,
  {
    sd,
    sdf
  });

foo(2,
  4);

var x = [
  3,
  4
];

if (true) {
  foo();
  bar();
} else {
  foo();
  bar();
}

if (true)
  foo();
else
  bar();

const jsx = (
  <div
    title='start'
    >
    good
    <a>
      link
    </a>
    <i>
      sdfg
    </i>
    <div>
      sdf
    </div>
  </div>
);

const two = (
  <div>
    <b>
      test
    </b>
    <b>
      test
    </b>
  </div>
);

const x = {
  g: {
    a: 1,
    b: 2
  },
  h: {
    c: 3
  }
}

/* multi-line expressions */
req
  .shouldBeOne();
too.
  more.
  shouldBeOneToo;

const a =
  long_expression;

b =
  long;

b =
  3 + 5;

/**
  Comments
*/

while (mycondition) {
  sdfsdfg();
}

chfritz · 2018-09-10T01:34:08Z

Feel free to assign this to me.

lee-dohm · 2018-09-11T20:19:20Z

@maxbrunsfeld Do you want to weigh in on this?

maxbrunsfeld · 2018-09-11T20:32:41Z

@chfritz This sounds really great; I'm glad that you're working on this! I like the way that you've already structured your package to separate the per-language configuration from the core logic. I always envisioned some system like this.

I propose we discuss porting that code into this package (language-javascript)

I think that once we're confident, we should try to generalize the system to one or two other fairly different languages, like Python, HTML, or Ruby. Then, once that is working fairly well, we should port the core logic to Atom core, and the configuration to the language packages.

In atom core, we could look for an optional indent-config section in the grammar. If it's there, we could use your new logic, and if it's not, we could fall back to the old logic.

chfritz · 2018-09-12T04:18:38Z

Sounds good @maxbrunsfeld .

One issue I'm struggling with right now is when the part of the program I'm in is in a state of invalid syntax. The tree-sitter parser then fails for this part. A part here is any section of code that starts at the file scope (indentation zero).

For instance, the following would result:

if (myTest) {
  const all = "good";
}

if (myTest2) {
const failsToIndent = "because_there_is_no_closing_bracket";
const object = {
andAlsoThisfails: "for_the_same_reason";

For now, when parsing fails, my code suggests to stay at the indentation of the previous line, but that's brittle.

So my question is, does tree-sitter support a "loose" parsing of the code, like acorn-loose? On https://github.com/tree-sitter/tree-sitter it says that it is "Robust enough to provide useful results even in the presence of syntax errors" and I've looked at some of the links regarding error recovery in LR parsers, but I wasn't sure whether that robustness is already being "used" in atom right now. If so, meaning we'll have to deal with how it is now since error recovery is obviously hard, then it might make sense to indicate the broken syntax to the user (e.g., using red underlining). That would make it easier to recover and also help him understand why indentation may be wrong at that point.

Any comments on this? I'm sure you've thought about this problem a fair bit already, especially since it is also an opportunity (help finding syntax errors would certainly be appreciated by the user).

maxbrunsfeld · 2018-09-12T16:10:46Z

I wasn't sure whether that robustness is already being "used" in atom right now.

Yeah, it already is; that's the feature that basically allows the majority of the file to parse while other parts have errors.

it might make sense to indicate the broken syntax to the user (e.g., using red underlining).

Yeah, I've experimented with this in the past, but I haven't chosen to go that route. The main reason is that in some languages like C, because you can do such unpredictable things with the preprocessor, we'll never be able to parse everything without errors; instead, we have to rely on error recovery for presenting useful results to the user.

I could imagine an approach where, if there's an error at the start of the current line or the end of the previous line, you inspect the tokens within the error, and use those tokens to compute the suggested indent level. For example, if you're in an error, but the previous line ends with a { or = token, you might want to indent.

The good news is that even when there's an error, we still have tokens to work with; we don't have to look at the raw text, so we don't have to worry about { or = characters occurring within strings, or that sort of thing. It's not fully thought through, but I think it may be worth exploring.

chfritz · 2018-09-22T23:27:50Z

OK, so I've made a bunch of progress within the package. It now has support to Javascript (incl. JSX), HTML, Python, C and C++. The latter three could use some more testing, especially some of the C++11 features, which I'm not yet familiar with. But I think the overall structure is more or less complete and it is just a matter of configuration now, at least for these languages. You can check out how it currently indents by looking at the sample files in https://github.com/chfritz/atom-sane-indentation/tree/master/spec/fixtures. The current indentation in these files is how the package currently suggests it.

Are there any other languages you think we should try before we think about creating a PR into atom core? I haven't done Ruby yet because I don't know it and can't quite judge what's right from wrong. But if someone can provide me with a reasonably exhaustive sample file (exhaustive in the sense of containing all language features) that is already indented correctly, then I'd be happy to try it.

- previously developed and tested in the sane-indentation package (> 0.9). Refer to atom/language-javascript#594 (comment) By itself this does nothing. The new logic is only used if the language package for the current language contains the necessary configuration (e.g., which scopes to indent on). So this PR goes together with, e.g., FILL-ME-IN in language-javascript.

…core See this PR in atom/atom: FILL-ME-IN. Once the other PR is merged, this one then fixes atom#594.

chfritz · 2018-10-25T04:49:26Z

@maxbrunsfeld I've gone ahead and created a PR in core and a corresponding PR in language-javascript with the configuration required for JavaScript. If and when that is reviewed and merged, I can port the configuration I already have for the other languages I'm listing above (and which are already in the sane-indentation package code).

- previously developed and tested in the sane-indentation package (> 0.9). Refer to atom/language-javascript#594 (comment) By itself this does nothing. The new logic is only used if the language package for the current language contains the necessary configuration (e.g., which scopes to indent on). So this PR goes together with, e.g., FILL-ME-IN in language-javascript.

…core See this PR in atom/atom: FILL-ME-IN. Once the other PR is merged, this one then fixes atom#594.

- previously developed and tested in the sane-indentation package (> 0.9). Refer to atom/language-javascript#594 (comment) By itself this does nothing. The new logic is only used if the language package for the current language contains the necessary configuration (e.g., which scopes to indent on). So this PR goes together with, e.g., FILL-ME-IN in language-javascript.

…core See this PR in atom/atom: atom/atom#18321. Once the other PR is merged, this one then fixes atom#594. Updated: now without the need for a callback function as part of configuration

- previously developed and tested in the sane-indentation package (> 0.9). Refer to atom/language-javascript#594 (comment) By itself this does nothing. The new logic is only used if the language package for the current language contains the necessary configuration (e.g., which scopes to indent on). So this PR goes together with, e.g., FILL-ME-IN in language-javascript. Updated: now without the need for 'precedingRowCondition' callbacks in the languages-specific configuration

…core See this PR in atom/atom: atom/atom#18321. Once the other PR is merged, this one then fixes atom#594. Updated: now without the need for a callback function as part of configuration

- previously developed and tested in the sane-indentation package (> 0.9). Refer to atom/language-javascript#594 (comment) By itself this does nothing. The new logic is only used if the language package for the current language contains the necessary configuration (e.g., which scopes to indent on). So this PR goes together with, e.g., FILL-ME-IN in language-javascript. Updated: now without the need for 'precedingRowCondition' callbacks in the languages-specific configuration

chfritz mentioned this issue Sep 10, 2018

Updated regex's for indentation #301

Closed

chfritz mentioned this issue Sep 27, 2018

Incorrect indentation after case statement #298

Open

chfritz added a commit to chfritz/language-javascript that referenced this issue Oct 25, 2018

Adding configuration for new, tree-sitter based indentation logic in …

43dd402

…core See this PR in atom/atom: FILL-ME-IN. Once the other PR is merged, this one then fixes atom#594.

This was referenced Oct 25, 2018

Adding configuration for new, tree-sitter based indentation logic #608

Open

Implementing tree-sitter based indentation logic atom/atom#18321

Merged

chfritz added a commit to chfritz/language-javascript that referenced this issue Jan 7, 2019

Adding configuration for new, tree-sitter based indentation logic in …

955bc72

…core See this PR in atom/atom: FILL-ME-IN. Once the other PR is merged, this one then fixes atom#594.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement new indentation logic based on tree-sitter #594

Implement new indentation logic based on tree-sitter #594

chfritz commented Sep 10, 2018

chfritz commented Sep 10, 2018 •

edited

Loading

lee-dohm commented Sep 11, 2018

maxbrunsfeld commented Sep 11, 2018

chfritz commented Sep 12, 2018

maxbrunsfeld commented Sep 12, 2018 •

edited

Loading

chfritz commented Sep 22, 2018 •

edited

Loading

chfritz commented Oct 25, 2018 •

edited

Loading

Implement new indentation logic based on tree-sitter #594

Implement new indentation logic based on tree-sitter #594

Comments

chfritz commented Sep 10, 2018

Enhancement

Description

Status

Proposed Approach

Example

chfritz commented Sep 10, 2018 • edited Loading

lee-dohm commented Sep 11, 2018

maxbrunsfeld commented Sep 11, 2018

chfritz commented Sep 12, 2018

maxbrunsfeld commented Sep 12, 2018 • edited Loading

chfritz commented Sep 22, 2018 • edited Loading

chfritz commented Oct 25, 2018 • edited Loading

chfritz commented Sep 10, 2018 •

edited

Loading

maxbrunsfeld commented Sep 12, 2018 •

edited

Loading

chfritz commented Sep 22, 2018 •

edited

Loading

chfritz commented Oct 25, 2018 •

edited

Loading