forc-fmt: Consider parsing with comments #3789
Comments
+1 on this. swayfmt v2 seems mature enough to work well for most cases, but the search-and-insert method is leading to some weird behavior when formatting around comments. Right now, fixing these edge cases takes only small changes, but this could become hard to maintain over time.
Sounds good to me! This would also address #2357.
Originally, we avoided trying to produce a
I just had a chat with @bingcicle and after checking rustfmt again for better ideas, he'll start looking into how feasible this is. The gist of the idea is to mirror the existing AST of
pub struct Commented<T> {
    pub node: T,
    pub comments: Vec<Comment>,
}
pub struct Comment {
    pub span: Span,
}
Where
I think inferring the intended "style" of comments like this can be tricky to do reliably. While that
We might be better off starting by simply collecting comments that precede the node and re-adding them verbatim. If a node has a commented child where the comment style is at all ambiguous, we can avoid formatting the item for now.
Alternative
Thinking on this more, I think there's still a chance we can find a simpler alternative that doesn't require constructing an entire alternative commented AST. I'm imagining something like this:
This approach might be helped by splitting the
This might require less of a significant refactor, but get close to the "structured"-ness of the full commented AST approach? @bingcicle if you're interested in this approach let me know and we can have another chat.
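For reference, a minimal sketch of the "collect the comments that precede the node and re-add them verbatim" starting point, built on the `Commented<T>` shape above. The `Span` fields, the `format_commented` helper, and its `indent` and `format_node` parameters are assumptions made up for illustration, not existing swayfmt APIs.

```rust
/// Byte range into the source text (hypothetical stand-in for the real `Span`).
#[derive(Debug, Clone, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Comment {
    pub span: Span,
}

/// A node together with the comments that precede it.
pub struct Commented<T> {
    pub node: T,
    pub comments: Vec<Comment>,
}

/// Hypothetical helper: re-emit each preceding comment verbatim on its own
/// line, then append the formatted node. No attempt is made to infer or
/// change the comment's style.
pub fn format_commented<T>(
    src: &str,
    commented: &Commented<T>,
    indent: &str,
    format_node: impl Fn(&T) -> String,
) -> String {
    let mut out = String::new();
    for comment in &commented.comments {
        out.push_str(indent);
        // Copy the comment text straight out of the original source.
        out.push_str(&src[comment.span.start..comment.span.end]);
        out.push('\n');
    }
    out.push_str(indent);
    out.push_str(&format_node(&commented.node));
    out
}
```

Because the comment bytes are copied from the source unchanged, this cannot silently drop or rewrite a comment; the cost is that comments in ambiguous positions still need a separate decision.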
I'm starting to think the alternative approach might be preferred here for now, even though the commented AST approach would probably be more structurally sound. I think the formatter as it is right now is good for most cases, and perhaps what we need to do is rethink the heuristics when it comes to formatting comments. The visitor pattern seems like a great idea - it seems to be what rustfmt is doing as well.
With that said, rustfmt seemingly has similar issues with comment formatting in the same way we do now, leading to a recurring issue of the formatter destructively removing comments without warning. In some cases, this destructive behavior could severely impact the devEx of the formatter.
One additional thing we should probably consider here is to define clearly what is and what isn't a valid comment. I'm wary of trying to follow in rustfmt's footsteps here. In the rustfmt guide there is a recommendation to "avoid writing comments on the same line as the braces", and the general sense is that rustfmt is flexible about where you can place your comments, and it will do the rest for you. However, I think leaving options open makes it much harder to have a good solution, in exchange for a very minor improvement in user flexibility.
This issue comes to mind and @Braqzen also made a good point there - perhaps we should also think about restricting what is valid in the context of formatting comments while tackling this issue. Doing this will probably help with reducing the destructive behavior we're seeing happen in rustfmt (and sometimes, already in swayfmt) while reducing the scope of work here as well, at the cost of slightly less flexibility when it comes to comment syntax.
I'm certainly open to restricting certain kinds of comments if it does indeed simplify our implementation; however, this restriction should be imposed during parsing, and not only through
Perhaps for now we can forge ahead with the "Alternative" visitor approach above and get as far as we can, handling as many types of comments as we can. During this, we collect examples of comments that we are unable to support in a practical manner, and we propose to the lang team that we disallow these during parsing?
Yep, agreed that it should be imposed under parsing and not under formatting.
This sounds like a good way to begin!
By 'intersect', do you mean any node's span that is in range with a comment's span? How would this play with, e.g., trailing comments that don't intersect?
If we are incrementally implementing this for each known AST node, it means that parts of comment formatting will still be handled via insertion while we are migrating to a
I think the first PR here to make these changes should just be the skeleton for the new pattern, aka introduce an
I had a similar problem with the initial implementation: since we are searching between spans for comments, a comment that comes after (or before) all of the nodes in the tree was getting chopped off. I remember working around that by simply inserting two dummy spans, one pointing to the beginning of the file and the other pointing to the end of the file.
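A rough sketch of that dummy-span workaround, for anyone following along. The `Span` type and `comments_between_nodes` are hypothetical names, and the sketch assumes the node spans are sorted and non-overlapping; it is not the code that was in the formatter.

```rust
/// Half-open byte range into the source (hypothetical stand-in for the
/// formatter's own span type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// For each gap between consecutive node spans, collect the comments that
/// lie entirely inside that gap. `node_spans` is assumed to be sorted and
/// non-overlapping.
pub fn comments_between_nodes(
    src_len: usize,
    node_spans: &[Span],
    comment_spans: &[Span],
) -> Vec<Vec<Span>> {
    // Two zero-width dummy spans, one at the start of the file and one at
    // the end, so that comments before the first node or after the last
    // node still fall inside some gap.
    let mut bounds = Vec::with_capacity(node_spans.len() + 2);
    bounds.push(Span { start: 0, end: 0 });
    bounds.extend_from_slice(node_spans);
    bounds.push(Span { start: src_len, end: src_len });

    bounds
        .windows(2)
        .map(|pair| {
            let (gap_start, gap_end) = (pair[0].end, pair[1].start);
            comment_spans
                .iter()
                .copied()
                .filter(|c| c.start >= gap_start && c.end <= gap_end)
                .collect()
        })
        .collect()
}
```

Without the two sentinels, `windows(2)` over a single node yields no gaps at all, which is exactly how a leading or trailing comment gets chopped off.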
This was also a problem I found while digging through old rustfmt issues - they also had problems with trying to format comments between rustc AST nodes...
This is good to know! Any other gotchas to be aware of here? So far, this is what I'm doing:
A potential downside I'm concerned about here is if we end up with a lot more code, since we're no longer doing everything in
If this means we would be looking for bugs in a specific item's formatting function, it is still a big win imho :)
closes #3853
closes #3574

## Overview

Currently, comment formatting is done post-formatting of the source code, which can make it difficult to reason about formatting comments in certain scenarios. There have been a few cases (#3574, #2888, #2649) where comments are often wrongly indented.

In #3789, there is extensive discussion on an alternative way of handling comment formatting. It seems like we're at a similar problem that rustfmt originally encountered since comments are not part of the AST.

## Approach

Instead of formatting comments _after_ formatting the source code as a whole w/o comments, we now format comments locally within each `Format` implementation of each AST node. Few advantages of doing this:

1) We now have the context of the formatter at the point of which the comments are being added, e.g. we can use `formatter.shape.indent` to indent comments, especially in odd locations.
2) Easier handling of `LineStyle::Multiline` <> `LineStyle::Inline` conversions, when comments are involved. You can see this from how we can check if comments are present and we can decide which `LineStyle` to use in the `}` and `else` formatting of comments.

To make sure that we don't do too much extra work, we use `comment_map_from_src` to init a `CommentMap` at the point where formatting begins instead of during `handle_comments` - we then reuse this `CommentMap` for the entire pass of the formatting. This is temporary and the end goal is to deprecate comment insertion post-formatting the source code, and to do it all in one pass.

To ensure backward compatibility, we _still_ call `handle_comments` for now until we refactor everything to use the new method. Thankfully we have extensive tests that will prevent regression while this refactor is happening.

This will be a gradual refactor and the above 2 issues were selected since they were difficult to fix with the old model of comment insertion, but much easier with this new model.
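To make the approach above concrete, here is a small sketch of formatting comments locally inside a node's formatting function, with the current indentation available and the inline vs. multiline choice driven by whether comments are present. The layout of `CommentMap` as a `BTreeMap`, the simplified `Formatter` with a flat `indent` field, and the `write_comments` and `format_empty_block` helpers are all assumptions for illustration; the real types in swayfmt may differ.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// Hypothetical comment map: comment start offset -> (end offset, text).
pub type CommentMap = BTreeMap<usize, (usize, String)>;

/// Minimal stand-in for the formatter state; only indentation is modelled.
pub struct Formatter {
    pub indent: usize,
}

/// Write every comment whose span lies inside `range`, one per line, at the
/// formatter's current indentation. Returns true if anything was written so
/// the caller can pick a multiline layout.
pub fn write_comments(
    out: &mut String,
    range: Range<usize>,
    comments: &CommentMap,
    formatter: &Formatter,
) -> bool {
    let mut wrote = false;
    for (_, (end, text)) in comments.range(range.clone()) {
        if *end <= range.end {
            out.push_str(&" ".repeat(formatter.indent));
            out.push_str(text);
            out.push('\n');
            wrote = true;
        }
    }
    wrote
}

/// Example node formatter: a block with no statements stays inline as `{}`
/// unless comments were found between its braces.
pub fn format_empty_block(
    body: Range<usize>,
    comments: &CommentMap,
    formatter: &mut Formatter,
) -> String {
    let mut inner = String::new();
    formatter.indent += 4;
    let has_comments = write_comments(&mut inner, body, comments, formatter);
    formatter.indent -= 4;
    if has_comments {
        // Multiline: comments on their own lines, closing brace re-indented.
        format!("{{\n{}{}}}", inner, " ".repeat(formatter.indent))
    } else {
        // Inline: nothing to preserve, keep the compact form.
        "{}".to_string()
    }
}
```

The map is built once and only queried by range here, which matches the idea of reusing a single `CommentMap` for the whole formatting pass.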
Closing this in favor of #3938
We are facing lots of small issues regarding comment insertion. At the time of #2311 we were still exploring the issue, and after maintaining this way of handling comments for a couple of months, I think a better way of handling them should be discussed. I am not linking all comment-related hot-fix PRs, but we had a couple of them 😄
Proposal
Let's parse comments and produce a `CommentedParseTree`, just like we are lexing comments (we have `CommentedTokenStream` vs `TokenStream` etc). This way re-constructing the comments section would not need:
We would be simply formatting comments just like we are formatting other stuff like `Ty`s, `expr`s etc. I believe this would be a little bit faster and, most importantly, it could lead to fewer problems. Currently, once we have some edge case happening related to comments, we tend to introduce rather "hacky" solutions, as the interaction between the formatter and the comment handler is not simple enough.
With parsed comments, if we can also collect some sort of context, we could still be sure that we are not altering the intentional comment styles (rust classifies comments like this)
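To illustrate the lexing analogy, a toy sketch of a comment-preserving token stream next to the plain one. `CommentedToken`, `lex_commented`, and `strip_comments` are hypothetical names, and the lexer is deliberately naive (whitespace-split tokens, `//` line comments only, no string-literal handling); it is not how `CommentedTokenStream` is actually implemented.

```rust
/// Hypothetical token type for a comment-preserving lexer.
#[derive(Debug, Clone, PartialEq)]
pub enum CommentedToken {
    Comment(String),
    Token(String),
}

/// Toy lexer: splits code on whitespace and keeps everything from `//` to
/// the end of the line as a single comment token. It does not handle `//`
/// inside string literals or block comments.
pub fn lex_commented(src: &str) -> Vec<CommentedToken> {
    let mut tokens = Vec::new();
    for line in src.lines() {
        let (code, comment) = match line.find("//") {
            Some(idx) => (&line[..idx], Some(&line[idx..])),
            None => (line, None),
        };
        tokens.extend(
            code.split_whitespace()
                .map(|t| CommentedToken::Token(t.to_string())),
        );
        if let Some(text) = comment {
            tokens.push(CommentedToken::Comment(text.trim_end().to_string()));
        }
    }
    tokens
}

/// Dropping the comment tokens recovers a plain, comment-free stream.
pub fn strip_comments(tokens: &[CommentedToken]) -> Vec<String> {
    tokens
        .iter()
        .filter_map(|t| match t {
            CommentedToken::Token(s) => Some(s.clone()),
            CommentedToken::Comment(_) => None,
        })
        .collect()
}
```

A commented parse tree would play the same role one level up: the comment-free tree stays derivable from it, while the formatter gets to see comments in position.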
Any thoughts on this? cc @FuelLabs/tooling