Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inserting Comments #2311

Merged
merged 53 commits into from
Jul 25, 2022
Merged

Inserting Comments #2311

merged 53 commits into from
Jul 25, 2022

Conversation

kayagokalp
Copy link
Member

@kayagokalp kayagokalp commented Jul 12, 2022

About this PR

closes #2122.

Comments are stripped in the parsing step which causes them to disappear in the formatted output because the formatter reconstructs the code from the parsed source code. With this PR sway-fmt-v2will be able to conserve the comments as they were inserted by the user. To accomplish that, the following concepts are introduced:

  1. A mapping between Span -> Comment
  2. A visitor that can visit and collect possible positions inside a parsed code piece.
  3. While finding comments, collecting some context so that user-given indentations, etc are intact (we are not applying any formatting logic to the Comments yet)

Mapping between Span -> Comment

This is used for fast retrieval of comments with a given range of spans. This is essential since to insert comments we should first find them with respect to items that are also present in the formatted output. To do that we will be searching for comments in possible spots

Visitor that can visit and collect possible positions inside a parsed code piece.

To do that we will be searching for comments in possible spots for possible positions.

To be able to accomplish this goal of ours we would either need to implement searching logic for all possible spanned pieces in the code like the following:

impl AddComments for ItemUse {
    // ...
}
impl AddComments for ItemFn {
    // ...
}
.
.
.

or we should have a generic logic for searching & insertion of comments with given possible spans. This requires us to be able to visit each possible parsed code piece and collect possible spans from that code piece. Then a generic search & insert would be able to insert all the comments for all possible places that a comment can be seen. Since this seemed a cleaner and shorter way, I followed this path and introduced a CommentVisitor trait. If CommentVisitor is implemented for a type we can get all of the possible Spans by simply calling collect_spans().

A little side note here: Since for BTreeMap to keep an ordering of Span -> Comment, I needed Span to implement Ord. Since Span is heavily used in the other parts of the code as well and for comment insertion, I do only require the start and end fields of it, I created a stripped version that also implements Ord (CommentSpan).

Collecting some context while finding comments from unformatted code

Since we are finding the comment by simply searching in the mapping for the given range of spans, we are only getting the comment itself. Inserting just the comment itself deletes the user's original alignment whitespaces. We are collecting some extra information (context) while searching for comments to conserve the user's alignment.

General structure of Comment insertion

  1. We get the unformatted version of the code, and parse it to get the unformatted code's module.
  2. We format the unformatted version of the code and parse it again to get the formatted code's module.
  3. We collect spans for these two modules and traverse them together. (This way we have a way of mapping a span range from unformatted code to a span range in formatted code. This is required since we do not have another way of finding the correct spot for the comment in the formatted code because when the code is formatted places of items etc can heavily change in various scenarios. For example, input with a large number of new lines between items can be considered. When such input is formatted, the number of newlines between items will reduce dramatically. Although we will be able to find the comments easily in the unformatted code placing them in the correct places w.t.r to the item they were originally commenting on would be an open problem if we didn't traverse both formatted and unformatted code together.).
  4. In each iteration of the traverse we search comments between the last CommentSpan and the current one. If we can find some we are inserting them with their collected context to the corresponding place in the formatted output.

Current status of Comments

This is a little tricky to write about since while working on comment stuff I had a couple of "a-ha moments" about comment insertion. I think this can be enough to reach MVP formatter but this part of the code will require a little bit more love. Some points to keep in mind around Comment stuff:

  1. If a structure was multiline in unformatted code and formatter transformed that stuff to a single line, comment formatting needs to be aware of that because single-line comments that are valid in the form of a multiline structure might not be valid in a single-line case. (In that case, we can either transform the comments to something like /*...*/ from //... or we can prevent single-line formatting while we have comments.)
  2. We might want to collect more meaningful context for formatting the actual comment.

A final note about the current status: until expr formatting lands we will have duplicate comments inside expr's since the formatter also adds them (because it is directly pushing raw str until it is implemented. @eureka-cpu brewing some cool stuff for that over #2338)

Initial explanation of this PR (written while it was still a draft leaving this here for visibility purposes)

This is very WIP, opening up for visibility and discussion.

Please mind that I created this branch from #2229 since it is not merged yet, GitHub diff shows that changes as a part of this one too.

Comments are not included in the AST which requires us to construct a mapping from Span -> Comment and search for them in possible places. Ideally, we should be able to do it as generic as possible and while doing so we should be preserving the whitespace around user comments as they might be intentional (at least for now, we will be exploring how to format them in a later PR).

This implementation tries to be as generic as possible this is trivial once we are in the Item level like the following example:

// This is a comment
struct Foo {
    baz: u64,
}
// This is another comment
struct Baz {
    foo: u64,
}

Finding and inserting these types of comments are easier since they can be searched between items that all have Spans so we can have a generic implementation for finding and adding them easily. Currently, this is not implemented.

The problem is visiting all possible places in the Item which might contain a comment in between. For example:

struct Foo {
    baz: u64, // This is another comment
    fooo: u64,
}

Finding out (and adding) these type of comments require us to visit each possible place in a struct (like in between fields or after {, etc.). For searching for the comments we require the spans of the things we are going to be searching in between (like from the baz field to the fooo field we should be searching for comments.)

To be able to implement this as generic as possible I created this draft PR to further our previous discussion yesterday @mitchmindtree @eureka-cpu.

With this type of implementation, we will need to implement the following portion of the code for items that we are going to be visiting (like enums, structs, fns ...):

impl CommentVisitor for ItemStruct {
    fn collect_spans(&self) -> Vec<CommentSpan> {
        let mut collected_spans = Vec::new();
        // Add visibility token's span if it exists
        if let Some(visibility) = &self.visibility {
            collected_spans.push(CommentSpan::from_span(visibility.span()));
        }
        // Add struct_token's span
        collected_spans.push(CommentSpan::from_span(self.struct_token.span()));
        // Add name's span
        collected_spans.push(CommentSpan::from_span(self.name.span()));
        // Add generics' span if it exist
        if let Some(generics) = &self.generics {
            collected_spans.push(CommentSpan::from_span(generics.parameters.span()))
        }
        // Collect fields' spans.
        collected_spans.append(&mut self.fields.collect_spans());
        collected_spans
    }
}

This implementation currently formats the following input:

contract;

struct Foo { // Here I am 4
    // Here I am
    barasdd: u64, // Here i am 2 
    baz: bool,
}
struct Foo2 {
    barasdd: u64,
    baz: bool,
}

to

contract;

struct Foo {// Here I am 4
// Here I am
    barasdd: u64,// Here i am 2 
    baz: bool,
}
struct Foo2 {
    barasdd: u64,
    baz: bool,
}

To Do

  • Collect more context regarding the user's whitespace around their comments
  • Implement handling of comments in-between Items
  • Implement CommentVisitor for other item type's as well

I feel like this type of implementation (depending on collecting more context part's implementation) would be under 1000 lines which will be an improvement over my first approach 😄

Would love to hear your thoughts about this one and also we can still explore modifying the AST if this still seems crazy. But I am not sure if that is going to be easier than this one.

To the future reviewers of this PR: May the force be with you. 😄

@kayagokalp kayagokalp added enhancement New feature or request big this task is hard and will take a while formatter labels Jul 12, 2022
@kayagokalp kayagokalp self-assigned this Jul 12, 2022
@kayagokalp
Copy link
Member Author

kayagokalp commented Jul 13, 2022

I solved the whitespace changes in the formatted output. I can handle regular whitespace correctly but with the assumption that the offsetting between the comment and the item associated with the comment (assigned by the formatter) is separated by ' ', not \n. This is obviously not the case in general but I am guessing that wouldn't be hard to tackle.

@mitchmindtree
Copy link
Contributor

Would you mind merging in or rebasing onto master so that the diff no longer includes the CommentMap changes? It might make it a little easier to review the new comment insertion work.

@kayagokalp
Copy link
Member Author

Thanks for the nice comments @JoshuaBatty @mitchmindtree 🙏 . I applied the changes requested and wanted to comment about the current approach. As @mitchmindtree touched in the review above this comment, although this works this can be done more efficiently. But we need some ninja level refactoring to reach there and this PR is already big and I think we need the formatter v2 to replace the current one as soon as possible. So as @mitchmindtree suggested in the review, I am creating a separate issue for improving the efficiency and handle that later on with a clean look.

@kayagokalp kayagokalp enabled auto-merge (squash) July 21, 2022 18:02
mitchmindtree added a commit that referenced this pull request Jul 22, 2022
Previously, both the implementation and the test missed the trailing
slash during lexing. This fixes both, closes #2355 and unblocks #2311.
mitchmindtree added a commit that referenced this pull request Jul 22, 2022
* Fix lexing of multiline comments to include the trailing '/'

Previously, both the implementation and the test missed the trailing
slash during lexing. This fixes both, closes #2355 and unblocks #2311.

* Add missing slash to sway-fmt-v2 comment test
Copy link
Contributor

@mitchmindtree mitchmindtree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

@mitchmindtree mitchmindtree requested a review from a team July 25, 2022 04:38
@eureka-cpu eureka-cpu self-requested a review July 25, 2022 22:08
@kayagokalp kayagokalp merged commit 7aa44cd into master Jul 25, 2022
@kayagokalp kayagokalp deleted the kayagokalp/comment_inserting branch July 25, 2022 22:08
@eureka-cpu eureka-cpu mentioned this pull request Aug 5, 2022
21 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
big this task is hard and will take a while enhancement New feature or request formatter
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Handle comments in sway-fmt-v2
4 participants