Revamp the way tokens and comments are built into pieces. #1336

munificent · 2023-12-01T05:11:20Z

I recently ran into bugs where a line comment after some AST node will cause the node to split incorrectly. A simple example is:

var x = 1 + 2; // comment

Currently, the formatter splits that to:

var x =
    1 +
        2; // comment

It does that because the Piece tree it creates doesn't line up with the AST node boundaries. In particular, the current design appends tokens and comments to a preceding piece. So in this example, the piece tree looks like:

Var(
  `var`
  Assign(
    `x =`
    Infix(
      `1 +`
      `2; // comment`
    )
  )
)

Note how the ; and line comment are attached as part of the RHS of the +. That's why the formatter thinks the line comment's newline is inside the + expression and forces it to split.

We could fix this specific bug by making ExpressionStatement treat the ; as a separate piece, but I suspect that we will be playing whack-a-mole if we keep the current design. Instead, this unfortunately giant PR revamps the piece API. It has a couple of intermingled changes:

Split pieces at all AST boundaries

Whenever a visit___() returns, an implicit split is inserted so that no single TextPiece contains tokens from a parent and child AST node. This directly fixes the above bug and all similar bugs in that category.

Note that while we split the tokens into separate pieces, that doesn't mean they may split in the "line splitting" sense. The TextPieces go into AdjacentPiece objects that don't insert actual splits between the pieces. This means this change shouldn't significantly impact the performance of line splitting.

It's just about ensuring that the nesting structure of the piece tree mirrors the nesting structure of the AST. That way, when a newline in a child piece node invalidates an outer piece, that invalidation respecst the original syntax.

Revamp the API for creating pieces

The previous API had a DSL-like "push" API where the pieces created by PieceWriter were stored internally and exposed by a fairly confusing give()/take()/split() API. That was necessary because any given visit___() method might not be able to return a Piece for its node if that node just concatenated its tokens into some surrounding piece.

With the previous change where every AST node corresponds to a piece, we have that option. So this PR also makes that change. Every visit___() method is now required to return a piece. Likewise, all of the create___() methods in PieceFactory return the pieces they create. This avoids the need for a weird take() API.

Add an AdjacentBuilder and buildPiece() API

Getting rid of the implicit storage and dataflow for pieces is good for being able to easily reason about how the piece tree gets created out of child pieces. But it can come at the cost of making code that creates pieces very verbose with lots of local variables and List<Piece> objects to store the intermediate pieces being built.

To make that nicer, I wrote an AdjacentBuilder class with an imperative API for building an AdjacentPiece out of a series of tokens, nodes, and spaces. This API closely mirrors the original DSL-like API. Except now you know exactly what object the nodes and tokens are pushing their pieces into.

To make that even nicer, I added a buildPiece() method that takes a callback, invokes it with a new AdjacentBuilder, and return the built result. This gets most code for building pieces fairly close to the original push-based API but with hopefully clearer more explicit dataflow.

I'm deeply sorry for the giant size of this PR. If you really want, I can try to break it into a series of smaller commits (but likely still one PR), but doing so is pretty challenging given how intertwined these changes are. It's hard to change the return type of the visit methods without also getting rid of the implicit dataflow and at that point, almost all the changes are there.

Also, I added more tests to cover the cases around comments that were broken.

This PR is just a squashed version of the experimental branch. The latter has a completely bananas commit history because I was hunting for a solution and trying out several different API ideas.

I recently ran into bugs where a line comment after some AST node will cause the node to split incorrectly. A simple example is: ``` var x = 1 + 2; // comment ``` Currently, the formatter splits that to: ``` var x = 1 + 2; // comment ``` It does that because the Piece tree it creates doesn't line up with the AST node boundaries. In particular, the current design appends tokens and comments to a preceding piece. So in this example, the piece tree looks like: ``` Var( `var` Assign( `x =` Infix( `1 +` `2; // comment` ) ) ) ``` Note how the `;` and line comment are attached as part of the RHS of the `+`. That's why the formatter thinks the line comment's newline is inside the + expression and forces it to split. We could fix this specific bug by making ExpressionStatement treat the `;` as a separate piece, but I suspect that we will be playing whack-a-mole if we keep the current design. Instead, this unfortunately giant PR revamps the piece API. It has a couple of intermingled changes: ### Split pieces at all AST boundaries Whenever a `visit___()` returns, an implicit split is inserted so that no single `TextPiece` contains tokens from a parent and child AST node. This directly fixes the above bug and all similar bugs in that category. Note that while we split the tokens into separate pieces, that doesn't mean they may split in the "line splitting" sense. The TextPieces go into AdjacentPiece objects that don't insert actual splits between the pieces. This means this change shouldn't significantly impact the performance of line splitting. It's just about ensuring that the nesting structure of the piece tree mirrors the nesting structure of the AST. That way, when a newline in a child piece node invalidates an outer piece, that invalidation respecst the original syntax. ### Revamp the API for creating pieces The previous API had a DSL-like "push" API where the pieces created by PieceWriter were stored internally and exposed by a fairly confusing `give()`/`take()`/`split()` API. That was necessary because any given `visit___()` method might not be *able* to return a Piece for its node if that node just concatenated its tokens into some surrounding piece. With the previous change where every AST node corresponds to a piece, we have that option. So this PR also makes that change. Every `visit___()` method is now required to return a piece. Likewise, all of the `create___()` methods in PieceFactory return the pieces they create. This avoids the need for a weird `take()` API. ### Add an AdjacentBuilder and buildPiece() API Getting rid of the implicit storage and dataflow for pieces is good for being able to easily reason about how the piece tree gets created out of child pieces. But it can come at the cost of making code that creates pieces very verbose with lots of local variables and `List<Piece>` objects to store the intermediate pieces being built. To make that nicer, I wrote an AdjacentBuilder class with an imperative API for building an AdjacentPiece out of a series of tokens, nodes, and spaces. This API closely mirrors the original DSL-like API. Except now you know exactly what object the nodes and tokens are pushing their pieces into. To make that even nicer, I added a `buildPiece()` method that takes a callback, invokes it with a new AdjacentBuilder, and return the built result. This gets most code for building pieces fairly close to the original push-based API but with hopefully clearer more explicit dataflow. I'm really sorry for the giant size of this PR. If you want, I can try to break it into a series of smaller commits (but likely still one PR), but doing so is pretty challenging given how intertwined these changes are. It's hard to change the return type of the visit methods without also getting rid of the implicit dataflow and at that point, almost all the changes are there. Also, I added more tests to cover the cases around comments that were broken.

kallentu

This is huge haha. Seems good to me.

test/top_level/library_comment.unit

test/variable/local_comment.stmt

lib/src/piece/function.dart

lib/src/front_end/ast_node_visitor.dart

# Conflicts: # lib/src/front_end/ast_node_visitor.dart # lib/src/piece/adjacent.dart

lib/src/front_end/adjacent_builder.dart

lib/src/front_end/ast_node_visitor.dart

lib/src/front_end/comment_writer.dart

munificent requested review from natebosch and kallentu December 1, 2023 05:11

munificent mentioned this pull request Dec 1, 2023

Update to and fix latest lints #1335

Merged

kallentu approved these changes Dec 5, 2023

View reviewed changes

test/top_level/library_comment.unit Outdated Show resolved Hide resolved

test/variable/local_comment.stmt Show resolved Hide resolved

lib/src/piece/function.dart Show resolved Hide resolved

lib/src/front_end/ast_node_visitor.dart Show resolved Hide resolved

Merge branch 'main' into revamp-piece-api

ecea321

# Conflicts: # lib/src/front_end/ast_node_visitor.dart # lib/src/piece/adjacent.dart

natebosch approved these changes Dec 5, 2023

View reviewed changes

lib/src/front_end/adjacent_builder.dart Outdated Show resolved Hide resolved

lib/src/front_end/ast_node_visitor.dart Show resolved Hide resolved

lib/src/front_end/ast_node_visitor.dart Show resolved Hide resolved

lib/src/front_end/comment_writer.dart Outdated Show resolved Hide resolved

munificent added 2 commits December 5, 2023 16:08

Apply review feedback.

3658546

Remove unused import.

4592707

munificent merged commit 7ae652e into main Dec 6, 2023
7 checks passed

munificent deleted the revamp-piece-api branch December 6, 2023 00:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revamp the way tokens and comments are built into pieces. #1336

Revamp the way tokens and comments are built into pieces. #1336

munificent commented Dec 1, 2023

kallentu left a comment

Revamp the way tokens and comments are built into pieces. #1336

Revamp the way tokens and comments are built into pieces. #1336

Conversation

munificent commented Dec 1, 2023

Split pieces at all AST boundaries

Revamp the API for creating pieces

Add an AdjacentBuilder and buildPiece() API

kallentu left a comment

Choose a reason for hiding this comment