Should surrogate halves be allowed in string literals? (This isn't about surrogate halves physically stored in the source files, since all Crystal source files must be encoded in UTF-8.)
"\uD834" # invalid UTF-8 or UTF-16
# U+1D11E for UTF-16, but all string literals represent UTF-8 sequences according to
# https://crystal-lang.org/reference/syntax_and_semantics/literals/string.html
"\uD834\uDD1E"
String.new(Bytes[0xD8, 0x34, 0xDD, 0x1E], "UTF-16BE") # this is always allowed
Second, should Int#chr allow surrogate halves?
0xD800.chr # invalid UTF-8 codepoint
0xD800.unsafe_chr # this is always allowed
In Ruby, surrogate halves within string literals are a syntax error, whereas 0xD800.chr(Encoding::UTF_8) raises a RangeError. So my guess is that Crystal should disallow these too.
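For reference, the Ruby behavior described above can be observed with a short sketch. The helper names (`surrogate_half?`, `chr_error`, `literal_error`) are made up for illustration and are not part of Ruby or Crystal. The literal has to go through `eval`, because a surrogate escape would otherwise stop the whole file from parsing.

```ruby
# Hypothetical helper: true if the code point is a UTF-16 surrogate half
# (U+D800..U+DFFF), which is not a valid Unicode scalar value.
def surrogate_half?(codepoint)
  codepoint.between?(0xD800, 0xDFFF)
end

# Hypothetical helper: returns the error class raised by Integer#chr with an
# explicit UTF-8 encoding, or nil if the conversion succeeds.
def chr_error(codepoint)
  codepoint.chr(Encoding::UTF_8)
  nil
rescue RangeError => e
  e.class
end

# Hypothetical helper: returns the error class raised while parsing the given
# source text, or nil if it parses and evaluates without error.
# SyntaxError is not a StandardError, so it must be rescued explicitly.
def literal_error(source)
  eval(source)
  nil
rescue SyntaxError => e
  e.class
end

# Single quotes keep the backslash intact, so eval sees the \u escape itself.
p chr_error(0xD800)
p literal_error('"\uD834"')
p literal_error('"\u{1D11E}"')
```

A check along the lines of `surrogate_half?` is what a stricter Int#chr in Crystal would presumably need, but that is an assumption about a possible fix, not a description of the current implementation.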
More generally, should we also protect against surrogate halves in other methods such as Char#each_byte (which already checks for ord <= 0x10FFFF, but not ord >= 0)?
UTF-8 codepoints in the surrogate pair range are considered invalid. I'm not sure we need special handling for that in Char, though. I'd prefer to assume that a Char always represents a valid codepoint (which would also allow removing the ord <= 0x10FFFF checks). You can only end up with an invalid codepoint through an unchecked unsafe_chr.