forc-fmt: Consider parsing with comments #3789

kayagokalp · 2023-01-16T15:47:15Z

We are facing lots of small issues regarding comment insertion. At the time of #2311 we were still exploring the issue and after maintaining this way of handling comments for couple of months, I think it seems like a better way of handling should be discussed. I am not linking all comment related hot-fix PRs but we had couple of them 😄

Proposal

Let's parse comments and produce a CommentedParseTree just like we are lexing comments (we have CommentedTokenStream vs TokenStream etc). This way re-constructing the comments section would not need:

Search the unformatted code
Search the formatted code

We would be simply formatting comments just like we are formatting other stuff like Tys, exprs etc. I believe this would be a little bit faster and most importantly it could lead to less problems. Currently once we have some edge case stuff happening related to the comments we tend to introduce rather "hacky" solutions as the interaction between formatter and comment handler is not simple enough.

With parsed comments, if we can also collect some sort of context we could still be sure that we are not altering the intentional comment styles (rust classifies comments like this)

Any thoughts on this? cc @FuelLabs/tooling

The text was updated successfully, but these errors were encountered:

eightfilms · 2023-01-16T16:22:59Z

+1 on this, swayfmt v2 seems mature enough to work well for most cases but it seems like the search-and-insert method is leading to a few weird behavior surrounding formatting with comments. Right now fixing these edge cases are just small changes but this could become hard to maintain over time.

mitchmindtree · 2023-01-17T04:04:21Z

Sounds good to me!

This would also address #2357.

Originally, we avoided trying to produce a Commented alternative to the sway_ast::Module due to the sheer number of places that comments can appear within the AST (between every leaf node!) however as we're discovering, the unstructured alternative leaves a lot of edge cases that are difficult to reliably handle in a non-destructive way.

I just had a chat with @bingcicle and after checking rustfmt again for better ideas, he'll start looking into how feasible this is.

The gist of the idea is to mirror the existing AST of sway_ast under a new commented module and replace each field with a commented alternative along the lines of the following:

pub struct Commented<T> {
    pub node: T,
    pub comments: Vec<Comment>,
}

pub struct Comment {
    pub span: Span,
}

Where comments is a list of comments that occur between this node and the previous node.

With parsed comments, if we can also collect some sort of context we could still be sure that we are not altering the intentional comment styles (rust classifies comments like this)

I think inferring the intended "style" of comments like this can be tricky to do reliably. While that CommentStyle enum looks handy, I'm skeptical that it can reliably describe a lot of cases. E.g. while I was chatting with Bing, I found a few cases where rustfmt automatically deletes comments entirely 😅

We might be better off starting by simply collecting comments that precede the node and re-adding them verbatim. If a node has a commented child where the comment style is at all ambiguous, we can avoid formatting the item for now.

Alternative

Thinking on this more, I think there's still a chance we can find a simpler alternative that doesn't require constructing an entire alternative commented AST. I'm imagining something like this:

Before formatting a module begins, construct a CommentMap, i.e. a BTreeMap from spans to comments (I think we already have a type for this somewhere).
Provide a reference to the CommentMap to the Format::format trait method (potentially via the Formatter type).
Provide a CommentMap method that makes it easy to check whether or not a node intersects a comment by checking the node's span against the btreemap.

This approach might be helped by splitting the Format trait into Visitor and Format traits, separating the logic for traversing the nodes from the logic for formatting them, e.g. https://refactoring.guru/design-patterns/visitor/rust/example. This way, before "visiting" a child node, the visitor could check whether or not the child directly follows a comment by checking it's span against the CommentMap, and if so, allow for formatting it as a Commented<ChildNode> rather than a ChildNode.

This might require a less of a significant refactor, but get close to the "structured"-ness of the full commented AST approach? @bingcicle if you're interested in this approach let me know and we can have another chat.

eightfilms · 2023-01-18T08:10:00Z

I'm starting to think the alternative approach might be preferred here for now, even though the commented AST approach would probably be a more structurally sound approach. I think the formatter as it is right now is good for most cases, and perhaps what we need to do is to rethink the heuristics when it comes to formatting commenting. The visitor pattern seems like a great idea - it seems to be what rustfmt is doing as well.

With that said, rustfmt seemingly has similar issues with comment formatting in the same way we do now, leading to a recurring issue of the formatter destructively removing comments without warning. In some cases, this destructive manner could severely impact the devEx of the formatter.

One additional thing we should probably consider here is to define clearly what is and what isn't a valid comment. I'm wary of trying to follow in rustfmt's footsteps here. In the rustfmt guide there is a recommendation to "avoid writing comments on the same line as the braces", and the general sense is that rustfmt is flexible about where you can place your comments, and it will do the rest for you. However, I think leaving options open make it much harder to have a good solution for a very minor improvement in user flexibility. This issue comes to mind and @Braqzen also made a good point there - perhaps we should also think about restricting what is valid in the context of formatting comments while tackling this issue. Doing this will probably help with reducing the destructive behavior we're seeing happen in rustfmt (and sometimes, already in swayfmt) while reducing the scope of work here as well, at the cost of slightly less flexibility when it comes to comment syntax.

mitchmindtree · 2023-01-19T02:46:33Z

I'm certainly open to restricting certain kinds of comments if it does indeed simplify our implementation, however this restriction should be imposed during parsing, and not only through forc fmt mysteriously deleting invalid comments.

Perhaps for now we can forge ahead with the "Alternative" visitor approach above and get as far as we can handling as many types of comments as we can. During this, we collect examples of comments that we are unable to support in a practical manner, and we propose to the lang team that we disallow these during parsing?

eightfilms · 2023-01-19T03:28:33Z

Yep, agreed that it should be imposed under parsing and not under formatting.

Perhaps for now we can forge ahead with the "Alternative" visitor approach above and get as far as we can handling as many types of comments as we can. During this, we collect examples of comments that we are unable to support in a practical manner, and we propose to the lang team that we disallow these during parsing?

This sounds like a good way to begin!

eightfilms · 2023-01-20T04:47:46Z

Provide a CommentMap method that makes it easy to check whether or not a node intersects a comment by checking the node's span against the btreemap

By 'intersect', do you mean any node's span that is in range with a comment's span? How would this play well with for eg. trailing comments that don't intersect?

This way, before "visiting" a child node, the visitor could check whether or not the child directly follows a comment by checking it's span against the CommentMap, and if so, allow for formatting it as a Commented rather than a ChildNode.

If we are incrementally implementing this for each known AST node, it means that parts of comment formatting will still be handled via insertion while we are migrating to a Visitor pattern where we format both the node and the comment in the same pass, correct? In that case, our goal would be to handle whatever comment formatting we can within the initial formatting phase, remove the formatted comments from the CommentMap, then pass it over to handle_comments() to do the remainder. If this is the case, we would also need some ability to 'look back' so we know that there are no other tokens in between the end of a comment span and the start of a node span.

I think the first PR here to make these changes should just be the skeleton for the new pattern, aka introduce an impl FmtVisitor for all nodes, and simply replace all usages of format() with visit(), and within visit() we just call the current implementations - and from there we open new issues to tackle all the missing formatting for the commented versions of the nodes. This would probably help with reviewing big diffs, since we know that there are no logical changes there.

kayagokalp · 2023-01-20T12:58:30Z

How would this play well with for eg. trailing comments that don't intersect?

I had a similar problem with initial implementation, since we are searching between spans for comments, a comment that is after (or before) all of the nodes in the tree was getting chopped off. I remember working around that by simply inserting two dummy spans one that points to the beginning of the file, and the other pointing to the end of the file.

eightfilms · 2023-01-20T16:38:43Z

I had a similar problem with initial implementation, since we are searching between spans for comments, a comment that is after (or before) all of the nodes in the tree was getting chopped off

This was also a problem I found while digging through old rustfmt issues - they also had problems with trying to format comments between rustc AST nodes...

I remember working around that by simply inserting two dummy spans one that points to the beginning of the file, and the other pointing to the end of the file.

This is good to know! Any other gotchas to be aware of here?

So far, this is what I'm doing:

we used to initialize the CommentMap at the comment insertion step. Instead, lets do it right before formatting now, and have the Formatter hold it as a &mut CommentMap
we then begin formatting. However, we do not implement FmtVisitor - I realized that it isn't really a useful abstraction at the moment. Instead I have a helper function that finds the comments within the comment map that are between a given start and end point - this can be between the end of a preceding span and a start of a following span. This has been particularly useful in certain cases that were hard to format before, like comments between } and else.
we format the node there and then with the comment, and remove the entry from the CommentMap. This mutated CommentMap is later passed to handle_comments() which will format the remaining comments without the already formatted comment. This makes it so we can incrementally introduce the new pattern while still allowing us to use our old way of comment insertion.

A potential downside i'm concerned about here is if we end up with a lot more code, since we're no longer doing everything in handle_comments(), but rather at each impl Format of each node. But I think this might be a clearer way to reason about formatting comments, and we also get access to other properties of Formatter like the indentation when doing it at this stage.

kayagokalp · 2023-01-20T17:45:44Z

If this would mean we would be looking for bugs in specific item's formatting function, it is still a big win imho :)

closes #3853 closes #3574 ## Overview Currently, comment formatting is done post-formatting of the source code, which can make it difficult to reason about formatting comments in certain scenarios. There have been a few cases (#3574, #2888, #2649) where comments are often wrongly indented. In #3789, there is extensive discussion on an alternative way of handling comment formatting. It seems like we're at a similar problem that rustfmt originally encountered since comments are not part of the AST. ## Approach Instead of formatting comments _after_ formatting the source code as a whole w/o comments, we now format comments locally within each `Format` implementation of each AST node. Few advantages of doing this: 1) We now have the context of the formatter at the point of which the comments are being added, e.g. we can use `formatter.shape.indent` to indent comments, especially in odd locations. 2) Easier handling of `LineStyle::Multiline` <> `LineStyle::Inline` conversions, when comments are involved. You can see this from how we can check if comments are present and we can decide which `LineStyle` to use in the `}` and `else` formatting of comments. To make sure that we don't do too much extra work, we use `comment_map_from_src` to init a `CommentMap` at the point where formatting begins instead of during `handle_comments` - we then reuse this `CommentMap` for the entire pass of the formatting. This is temporary and the end goal is to deprecate comment insertion post-formatting the source code, and to do it all in one pass. To ensure backward compatibility, we _still_ call `handle_comments` for now until we refactor everything to use the new method. Thankfully we have extensive tests that will prevent regression while this refactor is happening. This will be a gradual refactor and the above 2 issues were selected since they were difficult to fix with the old model of comment insertion, but much easier with this new model.

eightfilms · 2023-02-01T09:46:02Z

Closing this in favor of #3938

kayagokalp added enhancement New feature or request bikeshedding For bikeshedding trivialities code quality formatter labels Jan 16, 2023

kayagokalp changed the title ~~forc-fmt: Consider parsing comments~~ forc-fmt: Consider parsing with comments Jan 16, 2023

mitchmindtree assigned eightfilms Jan 19, 2023

eightfilms mentioned this issue Jan 23, 2023

feat(fmt): introduce formatting comments together with src #3854

Merged

eightfilms mentioned this issue Jan 31, 2023

[Tracking] feat(fmt): rewrite comment formatting #3938

Closed

39 tasks

eightfilms closed this as completed Feb 1, 2023

kayagokalp mentioned this issue Apr 24, 2023

Fix relationship between handle_comments & handle_newlines #2790

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

forc-fmt: Consider parsing with comments #3789

forc-fmt: Consider parsing with comments #3789

kayagokalp commented Jan 16, 2023

eightfilms commented Jan 16, 2023

mitchmindtree commented Jan 17, 2023

eightfilms commented Jan 18, 2023 •

edited

Loading

mitchmindtree commented Jan 19, 2023

eightfilms commented Jan 19, 2023

eightfilms commented Jan 20, 2023

kayagokalp commented Jan 20, 2023

eightfilms commented Jan 20, 2023 •

edited

Loading

kayagokalp commented Jan 20, 2023 •

edited

Loading

eightfilms commented Feb 1, 2023

forc-fmt: Consider parsing with comments #3789

forc-fmt: Consider parsing with comments #3789

Comments

kayagokalp commented Jan 16, 2023

Proposal

eightfilms commented Jan 16, 2023

mitchmindtree commented Jan 17, 2023

Alternative

eightfilms commented Jan 18, 2023 • edited Loading

mitchmindtree commented Jan 19, 2023

eightfilms commented Jan 19, 2023

eightfilms commented Jan 20, 2023

kayagokalp commented Jan 20, 2023

eightfilms commented Jan 20, 2023 • edited Loading

kayagokalp commented Jan 20, 2023 • edited Loading

eightfilms commented Feb 1, 2023

eightfilms commented Jan 18, 2023 •

edited

Loading

eightfilms commented Jan 20, 2023 •

edited

Loading

kayagokalp commented Jan 20, 2023 •

edited

Loading