Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only allow U+000A and U+000D as line break tokens #475

Closed
marcelofabri opened this issue Jan 29, 2018 · 2 comments
Closed

Only allow U+000A and U+000D as line break tokens #475

marcelofabri opened this issue Jan 29, 2018 · 2 comments

Comments

@marcelofabri
Copy link
Contributor

https://github.com/jpsim/SourceKitten/blob/master/Source/SourceKittenFramework/String%2BSourceKitten.swift#L89

This is probably the cause of realm/SwiftLint#2009 (comment), as when "u{2028}" is used the ranges are incorrect.

That's because that character is a line separator, but it takes 3 bytes to represent it in utf8:

p "\u{2028}".utf8.count
(Int) $R1 = 3
@SDGGiesbrecht
Copy link

@marcelofabri, your example with U+2028 demonstrates that SourceKit’s behaviour is actually at odds with Swift’s official grammar (4.0.3).

  • SourceKit apparently follows the Unicode line‐breaking algorithm, while...
  • the Swift grammar makes U+2028 illegal (except in comments and string literals, where it is treated as a printable character with no newline semantics).

One of those two is the real source of the bug. SourceKitten actually follows the Swift grammar accurately as far as U+2028 is concerned.

However, the carriage return (U+000D) variants are clear and consistent in Swift and appear to be mishandled by SourceKitten. They could be fixed without needing to choose between two differing “correct” behaviours.

Note that String+SourceKitten.swift, line 66 is also incompatible with the Swift grammar, since .newlines includes characters Swift does not consider to be newlines. (This may be the very same error SourceKit makes.)

@marcelofabri
Copy link
Contributor Author

Actually, as @SDGGiesbrecht pointed out, we shouldn't consider U+2028 as a valid new line token.

I think SourceKit is actually doing the right here here, we're not.

@marcelofabri marcelofabri changed the title Stop assuming \n when calculating line range Only allow U+000A and U+000D as line break tokens Feb 3, 2018
@jpsim jpsim closed this as completed in 11ca107 Mar 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants