StringIO#write does not transcode strings like CRuby and JRuby #2839

Closed
flavorjones opened this issue Jan 22, 2023 · 4 comments

@flavorjones
Contributor

Observed

When an IO object's encoding is set to Encoding::UTF_16 and #write is then called with a UTF-8-encoded string, TruffleRuby writes the string's UTF-8 bytes as-is, without transcoding.

Expected

In CRuby and JRuby the string is transcoded to UTF-16.

Repro

This test passes on CRuby and JRuby, but fails on TruffleRuby head (v23.0.0-dev-dd8609b3):

#! /usr/bin/env ruby

require "minitest/spec"
require "minitest/autorun"

require "stringio"

puts RUBY_DESCRIPTION

describe "encoding" do
  let(:utf8_str) { "hello" }

  # this test passes on all platforms
  it "UTF-8 string is transcoded correctly by String#encode" do
    expected = [
      254, 255, # BOM
      0, 104, 0, 101, 0, 108, 0, 108, 0, 111, # double-width "hello"
    ]

    assert_equal(expected, utf8_str.encode(Encoding::UTF_16).bytes)
  end

  # this test fails on TruffleRuby
  describe "given an IO with UTF-16 encoding" do
    let(:io) { StringIO.new.set_encoding(Encoding::UTF_16) }

    it "#write accepts a UTF-8-encoded string and transcodes it" do
      io.write(utf8_str)
      result = io.string
      expected = [
        254, 255, # BOM
        0, 104, 0, 101, 0, 108, 0, 108, 0, 111, # double-width "hello"
      ]

      assert_equal(Encoding::UTF_16, io.external_encoding)
      assert_equal(expected, result.bytes)
    end
  end
end

The failure is:

  1) Failure:
encoding::given an IO with UTF-16 encoding#test_0001_#write accepts a UTF-8-encoded string and transcodes it [./repro-truffle-encoding.rb:36]:
--- expected
+++ actual
@@ -1 +1 @@
-[254, 255, 0, 104, 0, 101, 0, 108, 0, 108, 0, 111]
+[104, 101, 108, 108, 111]

@andrykonchin
Member

Thank you for reporting the issue. I can reproduce it.

Looks like it's an issue with StringIO only. I cannot reproduce it for IO/File.
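
For comparison, here is a minimal sketch of the IO/File behaviour mentioned above: a file opened for writing with an external encoding transcodes the strings written to it. The helper name `file_write_bytes` is hypothetical, and the sketch uses UTF-16BE instead of the UTF-16 from the repro so that no BOM is involved.

```ruby
require "tempfile"

# Hypothetical helper: writes str to a temporary file opened with the given
# external encoding, then returns the raw bytes that ended up on disk.
def file_write_bytes(str, encoding)
  Tempfile.create("transcode") do |tmp|
    File.open(tmp.path, "w", external_encoding: encoding) do |out|
      out.write(str) # a UTF-8 string is transcoded to the external encoding
    end
    File.binread(tmp.path).bytes
  end
end

p file_write_bytes("hello", Encoding::UTF_16BE)
```

If the write transcodes, each ASCII character comes back as two bytes (a zero byte followed by the character code), which is the pattern the StringIO test above expects after the BOM.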

@eregon
Member

eregon commented Jan 23, 2023

StringIO has a weirdly defined notion of encoding/transcoding; e.g., it accepts an encoding in the constructor but seemingly ignores it (see #2793).
Thanks for the report. It looks like there should still be some transcoding.
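
Until the write path transcodes, one possible workaround is to transcode explicitly before writing. This is only a sketch, not the fix that was eventually applied: `write_transcoded` is a hypothetical helper, and it assumes the StringIO's external encoding has been set with #set_encoding as in the repro.

```ruby
require "stringio"

# Hypothetical helper: transcode str to the IO's external encoding up front,
# so the result does not depend on whether #write transcodes on its own.
def write_transcoded(io, str)
  target = io.external_encoding
  str = str.encode(target) if target && str.encoding != target
  io.write(str)
end

io = StringIO.new.set_encoding(Encoding::UTF_16BE)
write_transcoded(io, "hello")
p io.string.bytes
```

Because the string already carries the target encoding when it reaches #write, implementations that do transcode have nothing left to convert, so the helper should behave the same either way.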

@eregon eregon changed the title IO#write does not transcode strings like CRuby and JRuby StringIO#write does not transcode strings like CRuby and JRuby Jan 23, 2023
@eregon
Member

eregon commented Mar 13, 2023

As a note, Encoding::UTF_16 is a dummy encoding, so it is probably not something you would want to use in practice; use UTF_16LE or UTF_16BE instead.
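
To illustrate the distinction, the short sketch below checks Encoding#dummy? for the three encodings and shows the bytes produced by the two explicit-endianness variants. The helper name `encoded_bytes` is hypothetical.

```ruby
# Encoding#dummy? reports whether Ruby treats an encoding as a placeholder
# that it cannot fully process character by character.
p Encoding::UTF_16.dummy?   # => true
p Encoding::UTF_16BE.dummy? # => false
p Encoding::UTF_16LE.dummy? # => false

# Hypothetical helper: transcode str and return the resulting bytes.
def encoded_bytes(str, encoding)
  str.encode(encoding).bytes
end

# With an explicit byte order there is no BOM, only two bytes per character.
p encoded_bytes("hello", Encoding::UTF_16BE) # => [0, 104, 0, 101, 0, 108, 0, 108, 0, 111]
p encoded_bytes("hello", Encoding::UTF_16LE) # => [104, 0, 101, 0, 108, 0, 108, 0, 111, 0]
```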

@flavorjones
Contributor Author

Closed by #2927
