StringIO#ungetbyte stores incorrectly #2436

Nakilon · 2021-09-02T09:14:33Z

$ rbenv shell 2.3.8
$ ruby -e "s = StringIO.new; s.ungetbyte(255); p [s.string, s.string.encoding, s.string.bytes, s.getc]"
["\xFF", #<Encoding:UTF-8>, [255], "\xFF"]

$ rbenv shell 3.0.1
$ ruby -rstringio \
       -e "s = StringIO.new; s.ungetbyte(255); p [s.string, s.string.encoding, s.string.bytes, s.getc]"
["\xFF", #<Encoding:UTF-8>, [255], "\xFF"]

$ rbenv shell truffleruby-21.1.0
$ ruby -e "s = StringIO.new; s.ungetbyte(255); p [s.string, s.string.encoding, s.string.bytes, s.getc]"
["ÿ", #<Encoding:UTF-8>, [195, 191], "ÿ"]

This results in several tests failing in my project.

Nakilon · 2021-09-02T09:45:52Z

And this one:

$ ruby -rstringio -e "Encoding::default_external = 'ASCII-8BIT'; s = StringIO.new; 1.times{ s.ungetbyte 255 }; puts :OK"
OK
$ ruby -rstringio -e "Encoding::default_external = 'ASCII-8BIT'; s = StringIO.new; 2.times{ s.ungetbyte 255 }; puts :OK"
.../truffleruby-21.1.0/lib/truffle/stringio.rb:615:in `ungetbyte': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
	from -e:1:in `block in <main>'
	from <internal:core> core/integer.rb:148:in `times'
	from -e:1:in `<main>'

aardvark179 · 2021-09-03T16:26:57Z

I see the problem. StringIO#ungetbyte is treating the byte as a character rather than as a raw byte. Since we're commonly working in UTF-8, this single byte is converted into a multibyte UTF-8 sequence and prepended to the string.

To illustrate this, consider the string "\u01A9". It is encoded as the byte sequence 0xC6 0xA9. When a byte is read we get 0xC6, but if we try to unget that byte we prepend 0xC3 0x86 to the string, because that's the UTF-8 encoding of \u00C6.
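The difference between the two interpretations can be reproduced with plain Ruby strings, without StringIO at all. This is only a minimal sketch of the encoding arithmetic described above; the variable names are illustrative and nothing here is taken from the TruffleRuby source.

```ruby
# The example string from the comment: U+01A9 encoded as UTF-8.
str = "\u01A9"
first_byte = str.bytes.first                       # 0xC6 (198)

# Interpreting the integer as a code point yields the character U+00C6,
# which takes two bytes in UTF-8 (the behaviour reported in this issue).
as_character = first_byte.chr(Encoding::UTF_8)

# Interpreting the integer as one raw byte keeps it a single byte.
as_raw_byte = [first_byte].pack("C")

p str.bytes           # [198, 169]
p as_character.bytes  # [195, 134]
p as_raw_byte.bytes   # [198]
```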

This particular piece of our library looks like it can be simplified considerably, as ungetbyte will only accept a single number and will mask it to be a single byte.
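As a rough illustration of that simplification, the sketch below masks the argument to one byte and prepends it at the byte level. It operates on a plain String buffer; `unget_byte_sketch` is a hypothetical helper written for this example and is not the code that was committed to TruffleRuby.

```ruby
# Hypothetical helper (not TruffleRuby's StringIO implementation): pushes one
# byte onto the front of a buffer without any character-level conversion.
def unget_byte_sketch(buffer, value)
  original_encoding = buffer.encoding
  raw = [value & 0xFF].pack("C")            # mask to a single byte, binary string
  buffer.replace(raw + buffer.b)            # concatenate as raw bytes
  buffer.force_encoding(original_encoding)  # restore the buffer's encoding label
end

buffer = String.new("", encoding: Encoding::UTF_8)
2.times { unget_byte_sketch(buffer, 255) }
p buffer.bytes  # [255, 255]
```

Doing the concatenation on binary copies sidesteps the Encoding::CompatibilityError shown in the second report, because no UTF-8 string is ever combined with an ASCII-8BIT one.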

eregon · 2021-10-13T11:36:39Z

This was fixed by @aardvark179 in a52bd0f

aardvark179 self-assigned this Sep 2, 2021

bjfish added the bug label Sep 2, 2021

eregon added this to the 21.3.0 milestone Sep 6, 2021

gogainda mentioned this issue Sep 28, 2021

sprintf error #2451

Closed

eregon closed this as completed Oct 13, 2021
