parser: weird formatting in a trailing comment causes all comments for next element to be ignored #10648

jhump · 2022-09-23T13:32:05Z

Here's an example source:

message Foo {
  optional string s = 1; /* is this a trailer
                            comment? */ // is this detached?

  // this is clearly a detached comment
  
  // and this is clearly a leading comment
  optional int32 i = 2;
}

Because the trailer comment above does not have a newline between it and the subsequent "is this detached?" comment, the parser gets confused and gives up on all of the rest of the comments from there to the next element.

So the above source code results in no comments in source code info in the resulting descriptor.

The issue is that it gets confused here, thinking that the next token is on the same line (even though the next "token" is actually another comment).

If it were actually a non-comment token, the behavior makes sense: it is not clear whether the comment should be attributed as a trailing comment for s or a leading comment for the next element, so it gets dropped.

But since the next thing is actually another comment, I'd suggest that the correct interpretation is:

s trailing comment "is this a trailer\ncomment?"
i detached comments
- "is this detached?"
- "this is clearly a detached comment"
i leading comment "and this is clearly a leading comment"

I could see an argument that the first two comments are weird and might be dropped, instead resulting in the following:

s has no trailing comment
i detached comment "this is clearly a detached comment"
i leading comment "and this is clearly a leading comment"

But I think the current behavior, where it fails to attribute any detached or leading comments to i is clearly wrong.

FWIW, I think the first interpretation above is the easier one to implement (and seems useful to preserve more comments).

The text was updated successfully, but these errors were encountered:

fowles · 2022-09-23T16:50:54Z

I agree with your assessment and we would be happy to accept a PR that implements your first suggestion.

jhump · 2022-09-27T17:38:51Z

@fowles, in making this change, I think it also makes sense -- for the sake of consistency -- to handle the following as annotated:

message Foo
  string s = 1; /* detached (because ambiguous) */ uint64 i = 2;
}

message Foo
  string s = 1; /* this multi-line
                  comment is actually
                 ambiguous and thus
                will be detached */ uint64 i = 2;
}

message Bar {
  string s = 1; /* trailing */ /* leading */ uint64 i = 2;
}

message Baz {
  string s = 1; /* trailing */ /* detached */ /* leading */ uint64 i = 2;
}

message Baz {
  string s = 1; /* trailing */ /* detached #1 */ /* detached #2 */ /* leading */ uint64 i = 2;
}

WDYT? Preserving the current behavior (specifically the current intended behavior, not buggy behavior) would mean that all of these are discarded. So does it make more sense to classify the comments as above or to drop them?

jhump · 2022-09-27T17:44:13Z

FWIW, the PR I just pushed proceeds with the handling in the previous comment. But it's trivial to adapt to drop the less clear attributions. Just let me know what you think.

jhump added the untriaged auto added to all issues by default when created. label Sep 23, 2022

fowles added protoc release notes: no and removed untriaged auto added to all issues by default when created. labels Sep 23, 2022

fowles added release notes: yes and removed release notes: no labels Sep 23, 2022

jhump mentioned this issue Sep 27, 2022

gracefully handle weird placement of linebreaks around comments #10660

Merged

fowles closed this as completed in #10660 Sep 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

parser: weird formatting in a trailing comment causes all comments for next element to be ignored #10648

parser: weird formatting in a trailing comment causes all comments for next element to be ignored #10648

jhump commented Sep 23, 2022 •

edited

Loading

fowles commented Sep 23, 2022

jhump commented Sep 27, 2022

jhump commented Sep 27, 2022

parser: weird formatting in a trailing comment causes all comments for next element to be ignored #10648

parser: weird formatting in a trailing comment causes all comments for next element to be ignored #10648

Comments

jhump commented Sep 23, 2022 • edited Loading

fowles commented Sep 23, 2022

jhump commented Sep 27, 2022

jhump commented Sep 27, 2022

jhump commented Sep 23, 2022 •

edited

Loading