Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add line and column information to scanner and visitor #17

Merged
merged 3 commits into from
Mar 21, 2019
Merged

add line and column information to scanner and visitor #17

merged 3 commits into from
Mar 21, 2019

Conversation

P0lip
Copy link
Contributor

@P0lip P0lip commented Mar 14, 2019

The PR implements an extra and optional onLineBreak function.
At the moment, jsonc-parser make use of offset and length for positional info, which is totally fine.
Unfortunately, quite a few tools/editors operate on lines and columns, therefore integrating jsonc-parser is a bit more troublesome, since you need to fall back to scanner/tokenizer and perform the entire parsing process (At least, I couldn't find any reasonable way to implement positional info based on columns and lines without the use of scanner).
Hope the above reasoning makes sense.
I am not sure about the presence of lineNumber and whether it should be zero-based or not. I believe we could get rid of it and let the consumer implement it if needed.
Moreover, I'm afraid we cannot really handle line breaks in multi-line comments due to the way these comments are parsed.

@msftclas
Copy link

msftclas commented Mar 14, 2019

CLA assistant check
All CLA requirements met.

README.md Outdated Show resolved Hide resolved
@aeschli
Copy link
Contributor

aeschli commented Mar 18, 2019

Cool, thanks a lot!

@aeschli
Copy link
Contributor

aeschli commented Mar 18, 2019

Not having new-lines reported in comment is a problem. Especially as the scanner always scans and consumes comments. Its only the parser that ignores it and reports and error, if not allowed.
So if there happens to be a /* in the content, all line counting is off.

So your original idea of reporting the line number with the onSeparator is probably the right idea.

I'm also not opposed to returning linenumber/character in every call back (onObjectBegin, ...).
It should be 0-based.

@P0lip
Copy link
Contributor Author

P0lip commented Mar 19, 2019

Thanks a lot for your feedback!

I decided to implement an extra getTokenLineNumber method in JSONScanner.

At the moment, the value BlockCommentTrivia token contains line-break characters, therefore line-breaks occurring within a block/multi-line comment aren't treated as separate tokens/trivias.
Changing representation of BlockCommentTrivia is potentially possible as we could break it into multiple tokens, i.e. BlockCommentSegmentTrivia and LineBreakTrivia, but I am not sure whether it's worth it.
In my opinion, the above approach would make more harm than good. Obviously, the change would be breaking, as the scanner would need to be suspended on line-break in block comment and then resumed in the same place. Due to this, offsets and lengths would be altered. In general, handling block comments would most likely be more challenging, since they wouldn't be represented as a single token.
I believe accessing a single line of multi-line comment is not a common need and can be achieved on demand (for instance by splitting the value of token) if actually needed.
In most cases, having the start/end line number of block comment is sufficient.
As onLineBreak visit method receives zero-based line number, implementing range/position for block comments would be trivial and the current usage wouldn't be affected.

The only bit I'm worried about is the confusion the onLineBreak visit method may introduce. One may expect onLineBreak to be executed on every single break line, while in fact it's invoked only when an actual line-break token (LineBreakTrivia) is encountered. I included that information in README, yet it may not be quite obvious when line-break is represented as LineBreakTrivia and when not.

We could also remove onLineBreak and simply add line number as an argument to every visit method.
Happy to hear your feedback. Don't really know which way is more ergonomic/clear.
Both approaches would be backward-compatible, so they are worth considering.
If we go with the latter (no onLineBreak + line passed to each visit method), we might be obliged to provide the column/character as well.

I hope I took into account all line breaks that may occur in the scanning process.

@aeschli
Copy link
Contributor

aeschli commented Mar 19, 2019

We could also remove onLineBreak and simply add line number as an argument to every visit method.

That would be my favourite as its the most consistent.

@P0lip
Copy link
Contributor Author

P0lip commented Mar 20, 2019

That would be my favourite as its the most consistent.

Should we pass a zero-based column alongside the current line as well?

@aeschli
Copy link
Contributor

aeschli commented Mar 20, 2019

Yes, line and character (following the naming in the LSP)
In theory it should be startLine/startColum & endLine/endColumn.
But for all tokens the end position is easy to compute with the length, except for multi line comments.
I'd be ok to not have endLine/endColumn for the moment.

@P0lip
Copy link
Contributor Author

P0lip commented Mar 20, 2019

But for all tokens the end position is easy to compute with the length, except for multi line comments.
I'd be ok to not have endLine/endColumn for the moment.

Sounds good to me.
If one really needs to compute end-line for multi-line comments, it's still feasible - it's just a matter of performing scanning process or trying to consume other visit methods (that might be a bit troublesome / less reliable, though, as other tokens might be placed on the same line as a multi-line comment).

README.md Outdated Show resolved Hide resolved
src/impl/scanner.ts Outdated Show resolved Hide resolved
@aeschli
Copy link
Contributor

aeschli commented Mar 21, 2019

Looks good!

@aeschli aeschli merged commit b9ecc3b into microsoft:master Mar 21, 2019
@aeschli aeschli added this to the March 2019 milestone Mar 29, 2019
@aeschli
Copy link
Contributor

aeschli commented Mar 29, 2019

published as 2.1

@aeschli aeschli changed the title add onLineBreak visit function add line and column information to scanner and visitor Mar 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants