Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make IO#read_char's default behaviour UTF-8-strict #10446

Merged
merged 4 commits into from
Aug 25, 2021

Conversation

HertzDevil
Copy link
Contributor

A lot of invalid UTF-8 byte sequences go through IO#read_char unnoticed on the default encoding settings where decoding support is provided by Crystal itself instead of Iconv. For example:

io = IO::Memory.new(Bytes[0x81, 0x00])
io.read_char # => '@'
io.pos       # => 2

io = IO::Memory.new(Bytes[0xED, 0xA0, 0x80]) # U+D800
io.read_char # => '�'
io.pos       # => 3

io = IO::Memory.new(Bytes[0xF7, 0xBF, 0xBF, 0xBF]) # U+1FFFFF
io.read_char.not_nil!.ord # => 2097151
io.pos                    # => 4

This PR adds the same checks as those in Char::Reader#decode_char_at which conform to UTF-8.

@straight-shoota
Copy link
Member

Would it be possible to combine both methods with a little refactoring? They seem pretty similar, so it might be relatively easy to do.

Alternative, there should be comments referencing the respective other methods.

@straight-shoota
Copy link
Member

Oh, I actually meant Char::Reader#decode_char_at and IO#read_char. This doesn't look too bad, though, either...

@straight-shoota straight-shoota added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:files labels Apr 16, 2021
@beta-ziliani beta-ziliani added this to the 1.2.0 milestone Aug 20, 2021
@HertzDevil HertzDevil changed the title Make IO#read_char's default behaviour UTF-8-strict Make IO#read_char's default behaviour UTF-8-strict Aug 23, 2021
@straight-shoota straight-shoota merged commit d12190c into crystal-lang:master Aug 25, 2021
@HertzDevil HertzDevil deleted the bug/io-read-char-utf8 branch August 26, 2021 05:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:files
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants