Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Surrogate halves #10440

Closed
HertzDevil opened this issue Feb 24, 2021 · 2 comments · Fixed by #10451
Closed

Surrogate halves #10440

HertzDevil opened this issue Feb 24, 2021 · 2 comments · Fixed by #10451

Comments

@HertzDevil
Copy link
Contributor

Should surrogate halves be allowed in string literals? (This isn't about surrogate halves physically stored in the source files, since all Crystal source files must be encoded in UTF-8.)

"\uD834" # invalid UTF-8 or UTF-16

# U+1D11E for UTF-16, but all string literals represent UTF-8 sequences according to
# https://crystal-lang.org/reference/syntax_and_semantics/literals/string.html
"\uD834\uDD1E"

String.new(Bytes[0xD8, 0x34, 0xDD, 0x1E], "UTF-16BE") # this is always allowed

Second, should Int#chr allow surrogate halves?

0xD800.chr # invalid UTF-8 codepoint

0xD800.unsafe_chr # this is always allowed

In Ruby surrogate halves within string literals are a syntax error, whereas 0xD800.chr(Encoding::UTF_8) raises a RangeError. So my guess is Crystal should disallow these too.

@HertzDevil
Copy link
Contributor Author

HertzDevil commented Feb 25, 2021

More generally, should we also protect against surrogate halves in other methods such as Char#each_byte (which already check for ord <= 0x10FFFF, but not ord >= 0)?

@straight-shoota
Copy link
Member

UTF-8 codepoints in the surrogate pair range are considered invalid. I'm not sure we need special handling for that in Char, though. I'd prefer to assume Char to always represent a valid codepoint (and thus allow to remove ord <= 0x10FFFF checks). You can only end up with an invalid codepoint with an unchecked unsafe_chr.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants