Skip to content
This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Implement new indentation logic based on tree-sitter #594

Open
chfritz opened this issue Sep 10, 2018 · 7 comments · May be fixed by #608
Open

Implement new indentation logic based on tree-sitter #594

chfritz opened this issue Sep 10, 2018 · 7 comments · May be fixed by #608

Comments

@chfritz
Copy link

chfritz commented Sep 10, 2018

Enhancement

Description

As discussed in various places, including atom/atom#10384 (comment), there is an opportunity to implement a new indentation logic that heavily exploits the output from the new tree-sitter parser. This makes it possible to address a number of issues users have been seeing with indentation, including in other languages, e.g., atom/atom#6655.

Status

I've started developing this in my sane-indentation package, and version 0.9.3 already use it. It works very well so far and without having gone through the above linked list of issues in detail, it seems to fix a good number of them already. I'm showing an example of its current indentation below.

Proposed Approach

I plan to complete the work for this new logic in the package for JavaScript. Once that is done, i.e., it fixes all known indentation issues, so long as they are not mutually exclusive (for instance, because they were just a matter of opinion), then I propose we discuss porting that code into this package (language-javascript). If and when that is completed, i.e., we all agree that this is the right approach, then it should be relatively simple to port this to other languages as well.

Example

foo({
    sd,
    sdf
  },
  4
);

foo( 2, {
    sd,
    sdf
  },
  4
);

foo( 2,
  {
    sd,
    sdf
  });

foo(2,
  4);

var x = [
  3,
  4
];

if (true) {
  foo();
  bar();
} else {
  foo();
  bar();
}

if (true)
  foo();
else
  bar();

const jsx = (
  <div
    title='start'
    >
    good
    <a>
      link
    </a>
    <i>
      sdfg
    </i>
    <div>
      sdf
    </div>
  </div>
);

const two = (
  <div>
    <b>
      test
    </b>
    <b>
      test
    </b>
  </div>
);

const x = {
  g: {
    a: 1,
    b: 2
  },
  h: {
    c: 3
  }
}

/* multi-line expressions */
req
  .shouldBeOne();
too.
  more.
  shouldBeOneToo;

const a =
  long_expression;

b =
  long;

b =
  3 + 5;

/**
  Comments
*/

while (mycondition) {
  sdfsdfg();
}
@chfritz
Copy link
Author

chfritz commented Sep 10, 2018

Feel free to assign this to me.

@lee-dohm
Copy link
Contributor

@maxbrunsfeld Do you want to weigh in on this?

@maxbrunsfeld
Copy link
Contributor

@chfritz This sounds really great; I'm glad that you're working on this! I like the way that you've already structured your package to separate the per-language configuration from the core logic. I always envisioned some system like this.

I propose we discuss porting that code into this package (language-javascript)

I think that once we're confident, we should try to generalize the system to one or two other fairly different languages, like Python, HTML, or Ruby. Then, once that is working fairly well, we should port the core logic to Atom core, and the configuration to the language packages.

In atom core, we could look for an optional indent-config section in the grammar. If it's there, we could use your new logic, and if it's not, we could fall back to the old logic.

@chfritz
Copy link
Author

chfritz commented Sep 12, 2018

Sounds good @maxbrunsfeld .

One issue I'm struggling with right now is when the part of the program I'm in is in a state of invalid syntax. The tree-sitter parser then fails for this part. A part here is any section of code that starts at the file scope (indentation zero).

For instance, the following would result:

if (myTest) {
  const all = "good";
}

if (myTest2) {
const failsToIndent = "because_there_is_no_closing_bracket";
const object = {
andAlsoThisfails: "for_the_same_reason";

For now, when parsing fails, my code suggests to stay at the indentation of the previous line, but that's brittle.

So my question is, does tree-sitter support a "loose" parsing of the code, like acorn-loose? On https://github.com/tree-sitter/tree-sitter it says that it is "Robust enough to provide useful results even in the presence of syntax errors" and I've looked at some of the links regarding error recovery in LR parsers, but I wasn't sure whether that robustness is already being "used" in atom right now. If so, meaning we'll have to deal with how it is now since error recovery is obviously hard, then it might make sense to indicate the broken syntax to the user (e.g., using red underlining). That would make it easier to recover and also help him understand why indentation may be wrong at that point.

Any comments on this? I'm sure you've thought about this problem a fair bit already, especially since it is also an opportunity (help finding syntax errors would certainly be appreciated by the user).

@maxbrunsfeld
Copy link
Contributor

maxbrunsfeld commented Sep 12, 2018

I wasn't sure whether that robustness is already being "used" in atom right now.

Yeah, it already is; that's the feature that basically allows the majority of the file to parse while other parts have errors.

it might make sense to indicate the broken syntax to the user (e.g., using red underlining).

Yeah, I've experimented with this in the past, but I haven't chosen to go that route. The main reason is that in some languages like C, because you can do such unpredictable things with the preprocessor, we'll never be able to parse everything without errors; instead, we have to rely on error recovery for presenting useful results to the user.

I could imagine an approach where, if there's an error at the start of the current line or the end of the previous line, you inspect the tokens within the error, and use those tokens to compute the suggested indent level. For example, if you're in an error, but the previous line ends with a { or = token, you might want to indent.

The good news is that even when there's an error, we still have tokens to work with; we don't have to look at the raw text, so we don't have to worry about { or = characters occurring within strings, or that sort of thing. It's not fully thought through, but I think it may be worth exploring.

@chfritz
Copy link
Author

chfritz commented Sep 22, 2018

OK, so I've made a bunch of progress within the package. It now has support to Javascript (incl. JSX), HTML, Python, C and C++. The latter three could use some more testing, especially some of the C++11 features, which I'm not yet familiar with. But I think the overall structure is more or less complete and it is just a matter of configuration now, at least for these languages. You can check out how it currently indents by looking at the sample files in https://github.com/chfritz/atom-sane-indentation/tree/master/spec/fixtures. The current indentation in these files is how the package currently suggests it.

Are there any other languages you think we should try before we think about creating a PR into atom core? I haven't done Ruby yet because I don't know it and can't quite judge what's right from wrong. But if someone can provide me with a reasonably exhaustive sample file (exhaustive in the sense of containing all language features) that is already indented correctly, then I'd be happy to try it.

chfritz added a commit to chfritz/atom that referenced this issue Oct 25, 2018
- previously developed and tested in the sane-indentation package (> 0.9).

Refer to atom/language-javascript#594 (comment)

By itself this does nothing. The new logic is only used if the language package for the current language contains the necessary configuration (e.g., which scopes to indent on). So this PR goes together with, e.g., FILL-ME-IN in language-javascript.
chfritz added a commit to chfritz/language-javascript that referenced this issue Oct 25, 2018
…core

See this PR in atom/atom: FILL-ME-IN.

Once the other PR is merged, this one then fixes atom#594.
@chfritz
Copy link
Author

chfritz commented Oct 25, 2018

@maxbrunsfeld I've gone ahead and created a PR in core and a corresponding PR in language-javascript with the configuration required for JavaScript. If and when that is reviewed and merged, I can port the configuration I already have for the other languages I'm listing above (and which are already in the sane-indentation package code).

chfritz added a commit to chfritz/atom that referenced this issue Nov 2, 2018
- previously developed and tested in the sane-indentation package (> 0.9).

Refer to atom/language-javascript#594 (comment)

By itself this does nothing. The new logic is only used if the language package for the current language contains the necessary configuration (e.g., which scopes to indent on). So this PR goes together with, e.g., FILL-ME-IN in language-javascript.
chfritz added a commit to chfritz/atom that referenced this issue Nov 8, 2018
- previously developed and tested in the sane-indentation package (> 0.9).

Refer to atom/language-javascript#594 (comment)

By itself this does nothing. The new logic is only used if the language package for the current language contains the necessary configuration (e.g., which scopes to indent on). So this PR goes together with, e.g., FILL-ME-IN in language-javascript.
chfritz added a commit to chfritz/language-javascript that referenced this issue Jan 7, 2019
…core

See this PR in atom/atom: FILL-ME-IN.

Once the other PR is merged, this one then fixes atom#594.
chfritz added a commit to chfritz/atom that referenced this issue Jan 7, 2019
- previously developed and tested in the sane-indentation package (> 0.9).

Refer to atom/language-javascript#594 (comment)

By itself this does nothing. The new logic is only used if the language package for the current language contains the necessary configuration (e.g., which scopes to indent on). So this PR goes together with, e.g., FILL-ME-IN in language-javascript.
chfritz added a commit to chfritz/atom that referenced this issue Jan 7, 2019
- previously developed and tested in the sane-indentation package (> 0.9).

Refer to atom/language-javascript#594 (comment)

By itself this does nothing. The new logic is only used if the language package for the current language contains the necessary configuration (e.g., which scopes to indent on). So this PR goes together with, e.g., FILL-ME-IN in language-javascript.
chfritz added a commit to chfritz/language-javascript that referenced this issue Jan 12, 2019
…core

See this PR in atom/atom: atom/atom#18321.

Once the other PR is merged, this one then fixes atom#594.

Updated: now without the need for a callback function as part of configuration
chfritz added a commit to chfritz/atom that referenced this issue Jan 12, 2019
- previously developed and tested in the sane-indentation package (> 0.9).

Refer to atom/language-javascript#594 (comment)

By itself this does nothing. The new logic is only used if the language package for the current language contains the necessary configuration (e.g., which scopes to indent on). So this PR goes together with, e.g., FILL-ME-IN in language-javascript.

Updated: now without the need for 'precedingRowCondition' callbacks in the languages-specific configuration
chfritz added a commit to chfritz/atom that referenced this issue Jan 12, 2019
- previously developed and tested in the sane-indentation package (> 0.9).

Refer to atom/language-javascript#594 (comment)

By itself this does nothing. The new logic is only used if the language package for the current language contains the necessary configuration (e.g., which scopes to indent on). So this PR goes together with, e.g., FILL-ME-IN in language-javascript.

Updated: now without the need for 'precedingRowCondition' callbacks in the languages-specific configuration
chfritz added a commit to chfritz/atom that referenced this issue Jan 12, 2019
- previously developed and tested in the sane-indentation package (> 0.9).

Refer to atom/language-javascript#594 (comment)

By itself this does nothing. The new logic is only used if the language package for the current language contains the necessary configuration (e.g., which scopes to indent on). So this PR goes together with, e.g., FILL-ME-IN in language-javascript.

Updated: now without the need for 'precedingRowCondition' callbacks in the languages-specific configuration
chfritz added a commit to chfritz/atom that referenced this issue Jan 19, 2019
- previously developed and tested in the sane-indentation package (> 0.9).

Refer to atom/language-javascript#594 (comment)

By itself this does nothing. The new logic is only used if the language package for the current language contains the necessary configuration (e.g., which scopes to indent on). So this PR goes together with, e.g., FILL-ME-IN in language-javascript.

Updated: now without the need for 'precedingRowCondition' callbacks in the languages-specific configuration
chfritz added a commit to chfritz/language-javascript that referenced this issue Jan 21, 2019
…core

See this PR in atom/atom: atom/atom#18321.

Once the other PR is merged, this one then fixes atom#594.

Updated: now without the need for a callback function as part of configuration
sadick254 pushed a commit to chfritz/atom that referenced this issue May 20, 2021
- previously developed and tested in the sane-indentation package (> 0.9).

Refer to atom/language-javascript#594 (comment)

By itself this does nothing. The new logic is only used if the language package for the current language contains the necessary configuration (e.g., which scopes to indent on). So this PR goes together with, e.g., FILL-ME-IN in language-javascript.

Updated: now without the need for 'precedingRowCondition' callbacks in the languages-specific configuration
chfritz added a commit to chfritz/atom that referenced this issue May 25, 2021
- previously developed and tested in the sane-indentation package (> 0.9).

Refer to atom/language-javascript#594 (comment)

By itself this does nothing. The new logic is only used if the language package for the current language contains the necessary configuration (e.g., which scopes to indent on). So this PR goes together with, e.g., FILL-ME-IN in language-javascript.

Updated: now without the need for 'precedingRowCondition' callbacks in the languages-specific configuration
chfritz added a commit to chfritz/atom that referenced this issue May 27, 2021
- previously developed and tested in the sane-indentation package (> 0.9).

Refer to atom/language-javascript#594 (comment)

By itself this does nothing. The new logic is only used if the language package for the current language contains the necessary configuration (e.g., which scopes to indent on). So this PR goes together with, e.g., FILL-ME-IN in language-javascript.

Updated: now without the need for 'precedingRowCondition' callbacks in the languages-specific configuration
chfritz added a commit to chfritz/atom that referenced this issue May 28, 2021
- previously developed and tested in the sane-indentation package (> 0.9).

Refer to atom/language-javascript#594 (comment)

By itself this does nothing. The new logic is only used if the language package for the current language contains the necessary configuration (e.g., which scopes to indent on). So this PR goes together with, e.g., FILL-ME-IN in language-javascript.

Updated: now without the need for 'precedingRowCondition' callbacks in the languages-specific configuration
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants