Inserting Comments #2311

kayagokalp · 2022-07-12T13:41:24Z

About this PR

closes #2122.

Comments are stripped in the parsing step, so they disappear from the formatted output, because the formatter reconstructs the code from the parsed source. With this PR, sway-fmt-v2 will be able to preserve comments as the user wrote them. To accomplish that, the following concepts are introduced:

A mapping between Span -> Comment
A visitor that can visit and collect possible positions inside a parsed code piece.
Collecting some context while finding comments, so that user-given indentation etc. stays intact (we are not applying any formatting logic to the comments yet)

Mapping between `Span` -> `Comment`

This is used for fast retrieval of comments within a given range of spans. This is essential because, to insert comments, we must first locate them relative to items that are also present in the formatted output. To do that, we search for comments in the possible spots between those items.
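As a minimal sketch of this kind of range lookup (not the PR's actual code), the snippet below keys a BTreeMap by a simplified span and returns every comment that falls between two spans. The CommentSpan fields, the CommentMap alias, and the comments_between helper are assumptions made for illustration.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{Excluded, Included};

/// Simplified span: byte offsets into the source (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct CommentSpan {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical alias: comments ordered by where they appear in the source.
pub type CommentMap = BTreeMap<CommentSpan, String>;

/// Returns the comments whose spans sort at or after `from` and before `to`.
pub fn comments_between<'a>(
    map: &'a CommentMap,
    from: CommentSpan,
    to: CommentSpan,
) -> Vec<&'a str> {
    // `BTreeMap::range` panics on an inverted range, so guard against it.
    if from >= to {
        return Vec::new();
    }
    map.range((Included(from), Excluded(to)))
        .map(|(_, comment)| comment.as_str())
        .collect()
}
```

Because the map is ordered, the lookup only walks the entries inside the requested range instead of scanning every comment.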

Visitor that can visit and collect possible positions inside a parsed code piece.

To insert comments, we first need to know every position in the parsed code where a comment could appear.

To accomplish this, we would either need to implement searching logic for every possible spanned piece of code, like the following:

impl AddComments for ItemUse {
    // ...
}
impl AddComments for ItemFn {
    // ...
}
.
.
.

or we need generic logic for searching for and inserting comments, given the possible spans. This requires being able to visit each parsed code piece and collect the possible spans from it. A generic search-and-insert can then insert comments at every place a comment can appear. Since this seemed the cleaner and shorter way, I followed this path and introduced a CommentVisitor trait. If CommentVisitor is implemented for a type, we can get all of its possible Spans by simply calling collect_spans().
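The idea can be sketched as follows. Only the CommentVisitor name and its collect_spans() method come from the description above; the Token and Field types, the CommentSpan fields, and the blanket impls for Vec and Option are hypothetical, added to show how composite nodes can delegate to their parts.

```rust
/// Simplified span: byte offsets into the source (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CommentSpan {
    pub start: usize,
    pub end: usize,
}

/// Anything that can report the spans a comment could sit between.
pub trait CommentVisitor {
    fn collect_spans(&self) -> Vec<CommentSpan>;
}

/// Hypothetical leaf node standing in for a parsed token.
pub struct Token {
    pub span: CommentSpan,
}

impl CommentVisitor for Token {
    fn collect_spans(&self) -> Vec<CommentSpan> {
        vec![self.span]
    }
}

/// Containers delegate to their elements, so composite nodes stay short.
impl<T: CommentVisitor> CommentVisitor for Vec<T> {
    fn collect_spans(&self) -> Vec<CommentSpan> {
        self.iter().flat_map(|item| item.collect_spans()).collect()
    }
}

/// An absent optional node contributes no spans.
impl<T: CommentVisitor> CommentVisitor for Option<T> {
    fn collect_spans(&self) -> Vec<CommentSpan> {
        self.as_ref()
            .map(|item| item.collect_spans())
            .unwrap_or_default()
    }
}

/// Hypothetical composite node: `name: ty`.
pub struct Field {
    pub name: Token,
    pub ty: Token,
}

impl CommentVisitor for Field {
    fn collect_spans(&self) -> Vec<CommentSpan> {
        let mut spans = self.name.collect_spans();
        spans.extend(self.ty.collect_spans());
        spans
    }
}
```

With impls like these, the search-and-insert code only ever sees a flat Vec of spans and does not need to know which item type produced them.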

A little side note here: for a BTreeMap to keep an ordering of Span -> Comment, I needed Span to implement Ord. Since Span is heavily used in other parts of the code, and comment insertion only requires its start and end fields, I created a stripped-down version that also implements Ord (CommentSpan).
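A rough sketch of such a stripped-down type is shown below. The Span struct here is a stand-in with an assumed shape, not the compiler's real type, and deriving Ord (which compares start first, then end) is an assumption; the PR may order spans differently.

```rust
use std::sync::Arc;

/// Stand-in for the compiler's `Span`, which carries more than offsets.
/// (Hypothetical shape, for illustration only.)
pub struct Span {
    pub src: Arc<str>,
    pub start: usize,
    pub end: usize,
}

/// Stripped-down span that keeps only what comment insertion needs.
/// The derived `Ord` compares `start` first, then `end`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct CommentSpan {
    pub start: usize,
    pub end: usize,
}

impl CommentSpan {
    /// Drops everything except the byte offsets.
    pub fn from_span(span: Span) -> CommentSpan {
        CommentSpan {
            start: span.start,
            end: span.end,
        }
    }
}
```

Keeping the ordering on a separate type means the widely used Span does not have to commit to any particular notion of "less than".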

Collecting some context while finding comments from unformatted code

Since we find a comment by simply searching the mapping for a given range of spans, we only get the comment itself. Inserting just the comment discards the user's original alignment whitespace. To preserve that alignment, we collect some extra information (context) while searching for comments.
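One way to picture this context is as the whitespace that sits directly before and after the comment in the unformatted source. The sketch below captures exactly that; the CommentWithContext struct, its field names, and the collect_context helper are hypothetical and may not match what the PR collects.

```rust
/// A comment plus the whitespace the user placed around it.
/// (Hypothetical shape; the field names are assumptions.)
#[derive(Debug, PartialEq, Eq)]
pub struct CommentWithContext {
    pub pre_context: String,
    pub comment: String,
    pub post_context: String,
}

/// Captures the whitespace directly before and after `src[start..end]`.
/// `start` and `end` are byte offsets and must lie on char boundaries.
pub fn collect_context(src: &str, start: usize, end: usize) -> CommentWithContext {
    let before = &src[..start];
    let after = &src[end..];
    // Length of the whitespace run that ends right where the comment starts.
    let pre_len = before.len() - before.trim_end().len();
    // Length of the whitespace run that begins right after the comment.
    let post_len = after.len() - after.trim_start().len();
    CommentWithContext {
        pre_context: before[before.len() - pre_len..].to_string(),
        comment: src[start..end].to_string(),
        post_context: after[..post_len].to_string(),
    }
}
```

Re-emitting pre_context, comment, and post_context in that order reproduces the user's spacing around the comment.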

General structure of `Comment` insertion

We get the unformatted version of the code, and parse it to get the unformatted code's module.
We format the unformatted version of the code and parse it again to get the formatted code's module.
We collect spans for these two modules and traverse them together. This gives us a way of mapping a span range in the unformatted code to a span range in the formatted code. We need this because there is no other way of finding the correct spot for a comment in the formatted code: formatting can move items around considerably. For example, consider input with a large number of newlines between items. When such input is formatted, the number of newlines between items drops dramatically. Although we can easily find the comments in the unformatted code, placing them correctly w.r.t. the item they originally commented on would be an open problem if we didn't traverse the formatted and unformatted code together.
In each iteration of the traversal, we search for comments between the last CommentSpan and the current one. If we find any, we insert them, with their collected context, at the corresponding place in the formatted output.
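The steps above can be sketched roughly as follows. This is a deliberately simplified illustration, not the PR's implementation: it assumes both span lists are in source order and line up one-to-one, it puts each found comment on its own line directly before the matching formatted item, and it ignores the collected context. The insert_comments function and the CommentSpan fields are hypothetical.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{Excluded, Included};

/// Simplified span: byte offsets into the source (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct CommentSpan {
    pub start: usize,
    pub end: usize,
}

/// Walks the unformatted and formatted spans in lockstep and copies
/// comments found in the unformatted code into the formatted output.
pub fn insert_comments(
    formatted: &str,
    unformatted_spans: &[CommentSpan],
    formatted_spans: &[CommentSpan],
    comments: &BTreeMap<CommentSpan, String>,
) -> String {
    let mut out = String::with_capacity(formatted.len());
    let mut copied_up_to = 0; // bytes of `formatted` already written to `out`
    let mut previous = CommentSpan { start: 0, end: 0 };
    for (current, target) in unformatted_spans.iter().zip(formatted_spans) {
        // Guard against an inverted range, which `BTreeMap::range` rejects.
        if previous < *current {
            let found: Vec<&String> = comments
                .range((Included(previous), Excluded(*current)))
                .map(|(_, comment)| comment)
                .collect();
            if !found.is_empty() {
                // Copy formatted text up to the item the comments precede.
                out.push_str(&formatted[copied_up_to..target.start]);
                copied_up_to = target.start;
                for comment in found {
                    out.push_str(comment);
                    out.push('\n');
                }
            }
        }
        previous = *current;
    }
    // Comments after the last span are not handled in this sketch.
    out.push_str(&formatted[copied_up_to..]);
    out
}
```

The lockstep traversal is what ties a position in the unformatted code to a position in the formatted code; the comment map alone only knows offsets in the unformatted source.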

Current status of `Comment`s

This is a little tricky to write about since while working on comment stuff I had a couple of "a-ha moments" about comment insertion. I think this can be enough to reach MVP formatter but this part of the code will require a little bit more love. Some points to keep in mind around Comment stuff:

If a structure was multiline in the unformatted code and the formatter collapsed it to a single line, comment formatting needs to be aware of that, because single-line comments that are valid inside a multiline structure might not be valid in the single-line case. (In that case, we can either transform the comments from //... to something like /*...*/, or we can prevent single-line formatting while comments are present.)
We might want to collect more meaningful context for formatting the actual comment.
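For the first point, the //... to /*...*/ transformation could look something like the sketch below. The to_block_comment helper is hypothetical and is only one of the two options mentioned; this PR does not implement it.

```rust
/// Rewrites a `//` line comment as a `/* */` block comment so it can sit
/// in the middle of a single line. Other input is returned unchanged.
pub fn to_block_comment(comment: &str) -> String {
    if !comment.starts_with("//") {
        return comment.to_string();
    }
    // Drop the leading slashes and the surrounding whitespace.
    let body = comment.trim_start_matches('/').trim();
    // A stray `*/` in the body would close the block comment early.
    let body = body.replace("*/", "* /");
    format!("/* {} */", body)
}
```

The escaping of `*/` matters because a line comment may legally contain that sequence, while a block comment may not.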

A final note about the current status: until expr formatting lands, we will have duplicate comments inside exprs, since the formatter also adds them (it pushes the raw str directly until expr formatting is implemented). @eureka-cpu is brewing some cool stuff for that over at #2338.

Initial explanation of this PR (written while it was still a draft; leaving this here for visibility purposes)

This is very WIP, opening up for visibility and discussion.

~~Please mind that I created this branch from #2229 since it is not merged yet, GitHub diff shows that changes as a part of this one too.~~

Comments are not included in the AST, which requires us to construct a mapping from Span -> Comment and search for them in possible places. Ideally, we should do this as generically as possible, and while doing so we should preserve the whitespace around user comments, as it might be intentional (at least for now; we will explore how to format comments in a later PR).

This implementation tries to be as generic as possible. This is trivial at the Item level, as in the following example:
// This is a comment
struct Foo {
    baz: u64,
}
// This is another comment
struct Baz {
    foo: u64,
}
Finding and inserting this type of comment is easier, since they can be searched for between items that all have Spans, so we can have a generic implementation for finding and adding them. Currently, this is not implemented.

The problem is visiting all possible places inside an Item that might contain a comment. For example:
struct Foo {
    baz: u64, // This is another comment
    fooo: u64,
}
Finding (and adding) this type of comment requires us to visit each possible place in a struct (such as between fields, or after {). To search for the comments, we need the spans of the things we are searching between (for example, we should search for comments from the baz field to the fooo field).

To implement this as generically as possible, I created this draft PR to continue yesterday's discussion, @mitchmindtree @eureka-cpu.

With this type of implementation, we will need to implement the following portion of the code for items that we are going to be visiting (like enums, structs, fns ...):
impl CommentVisitor for ItemStruct {
    fn collect_spans(&self) -> Vec<CommentSpan> {
        let mut collected_spans = Vec::new();
        // Add visibility token's span if it exists
        if let Some(visibility) = &self.visibility {
            collected_spans.push(CommentSpan::from_span(visibility.span()));
        }
        // Add struct_token's span
        collected_spans.push(CommentSpan::from_span(self.struct_token.span()));
        // Add name's span
        collected_spans.push(CommentSpan::from_span(self.name.span()));
        // Add generics' span if it exists
        if let Some(generics) = &self.generics {
            collected_spans.push(CommentSpan::from_span(generics.parameters.span()));
        }
        // Collect fields' spans.
        collected_spans.append(&mut self.fields.collect_spans());
        collected_spans
    }
}
This implementation currently formats the following input:
contract;

struct Foo { // Here I am 4
    // Here I am
    barasdd: u64, // Here i am 2 
    baz: bool,
}
struct Foo2 {
    barasdd: u64,
    baz: bool,
}
to
contract;

struct Foo {// Here I am 4
// Here I am
    barasdd: u64,// Here i am 2 
    baz: bool,
}
struct Foo2 {
    barasdd: u64,
    baz: bool,
}
To Do

Collect more context regarding the user's whitespace around their comments

Implement handling of comments in-between Items

Implement CommentVisitor for other item types as well

~~I feel like this type of implementation (depending on collecting more context part's implementation) would be under 1000 lines which will be an improvement over my first approach 😄~~

Would love to hear your thoughts about this one and also we can still explore modifying the AST if this still seems crazy. But I am not sure if that is going to be easier than this one.

To the future reviewers of this PR: May the force be with you. 😄

kayagokalp · 2022-07-13T00:20:45Z

I solved the whitespace changes in the formatted output. I can handle regular whitespace correctly, but only under the assumption that the offset (assigned by the formatter) between a comment and its associated item is a ' ', not a \n. This is obviously not the case in general, but I am guessing that wouldn't be hard to tackle.

mitchmindtree · 2022-07-13T07:33:28Z

Would you mind merging in or rebasing onto master so that the diff no longer includes the CommentMap changes? It might make it a little easier to review the new comment insertion work.

kayagokalp · 2022-07-21T17:58:30Z

Thanks for the nice comments @JoshuaBatty @mitchmindtree 🙏. I applied the requested changes and wanted to comment on the current approach. As @mitchmindtree touched on in the review above this comment, although this works, it can be done more efficiently. But we need some ninja-level refactoring to get there, this PR is already big, and I think we need formatter v2 to replace the current one as soon as possible. So, as @mitchmindtree suggested in the review, I am creating a separate issue for improving the efficiency and will handle that later with a clean look.

Fix lexing of multiline comments to include the trailing '/'

Previously, both the implementation and the test missed the trailing slash during lexing. This fixes both, closes #2355 and unblocks #2311.

* Add missing slash to sway-fmt-v2 comment test

mitchmindtree

🚢

kayagokalp and others added 13 commits July 5, 2022 15:50

CommentedTokenTree::Comment case for span comment map implemented

ee9abdb

CommentSpan ordering fix and test added

6b6681a

clippy

be4511c

Handle in-block and next-to item cases for comment map generation

9dc606a

Comments in between items are added correctly.

ddfb007

remove premature comment addition

fe5c851

Merge branch 'master' into kayagokalp/2224

9d48927

removed FormatComment trait

06b5968

rename: get_comment_from_token_stream to collect_comments_from_token_stream, rename: construct_comment_map to comment_map_from_src

d9cb274

Merge branch 'master' into kayagokalp/2224

d2b61ce

Merge branch 'master' into kayagokalp/2224

2793a16

comment ordering corrected

bf9dcce

Comment handling for inside Struct block implemented

0f6d01e

kayagokalp added the enhancement (New feature or request), big (this task is hard and will take a while), and formatter labels Jul 12, 2022

kayagokalp self-assigned this Jul 12, 2022

kayagokalp added 4 commits July 13, 2022 02:30

More context collected for whitespace preserving

f8774ae

test for comments added

3bdc20b

Added forgetten final value for punctuated

86b4786

removed test since it requires all item types to implement CommentVisitor

d603a8f

Until CommentVisitor implemented another check added for tests to pass

934fb84

kayagokalp mentioned this pull request Jul 13, 2022

Storage formatting does not add = and Expr in sway-fmt-v2 #2321

Closed

kayagokalp added 5 commits July 13, 2022 14:18

Merge master

e94e123

CommentVisitor implemented for ItemEnum

d518b67

Add comment test

018effd

test extended to include enum

6cfc34e

CommentVisitor for ItemFn implemendted

40bb416

Prevent reparsing the unformatted_code as we already did it before formatting

9ae7aa8

kayagokalp mentioned this pull request Jul 21, 2022

Comment lexing misses trailing / for multi-line comments #2355

Closed

refactor insert_after_span

fd62f6d

kayagokalp dismissed JoshuaBatty’s stale review via fd62f6d July 21, 2022 15:06

kayagokalp mentioned this pull request Jul 21, 2022

Lexer misses the last comment after a const declaration #2356

Closed

kayagokalp and others added 3 commits July 21, 2022 20:40

Split test_comments to seperate tests for better visibility, code refactoring around insertion

4424ae8

removed comments regarding comments inside PathType

c8bc474

Merge branch 'master' into kayagokalp/comment_inserting

43e7650

kayagokalp mentioned this pull request Jul 21, 2022

Investigate and implement a more efficient way to insert comments and newlines for swayfmt #2357

Closed

kayagokalp enabled auto-merge (squash) July 21, 2022 18:02

kayagokalp requested review from mitchmindtree and JoshuaBatty July 21, 2022 18:02

mitchmindtree added a commit that referenced this pull request Jul 22, 2022

Fix lexing of multiline comments to include the trailing '/'

c27f9d4

Previously, both the implementation and the test missed the trailing slash during lexing. This fixes both, closes #2355 and unblocks #2311.

mitchmindtree mentioned this pull request Jul 22, 2022

Fix lexing of multiline comments to include the trailing '/' #2359

Merged

kayagokalp added 3 commits July 22, 2022 13:52

Merge branch 'master' into kayagokalp/comment_inserting

9d6c924

CommentVisitor -> LeafSpans and comment formatting fix

5328b3d

fmt fix

c118303

mitchmindtree approved these changes Jul 25, 2022

Merge branch 'master' into kayagokalp/comment_inserting

4bdb3b4

mitchmindtree requested a review from a team July 25, 2022 04:38

kayagokalp mentioned this pull request Jul 25, 2022

[Tracking] swayfmt-v2 MVP #1516

Closed

eureka-cpu reviewed Jul 25, 2022

sway-fmt-v2/src/error.rs

eureka-cpu self-requested a review July 25, 2022 22:08

eureka-cpu approved these changes Jul 25, 2022

kayagokalp merged commit 7aa44cd into master Jul 25, 2022

kayagokalp deleted the kayagokalp/comment_inserting branch July 25, 2022 22:08

eureka-cpu mentioned this pull request Aug 5, 2022

Add expr formatting to sway-fmt-v2 #2338

Merged

21 tasks

kayagokalp mentioned this pull request Jan 16, 2023

forc-fmt: Consider parsing with comments #3789

Closed
