Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: StringIO#write transcodes strings with a different encoding #2927

Merged

Conversation

flavorjones
Copy link
Contributor

This is a potential fix for #2839. I've based this approach on the transcoding logic in IO#write.

I'm not sure I've placed the new test in the best place; or if this fix is the best option. This is my first PR to TR! Feedback welcome. ❤️

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Mar 10, 2023
Copy link
Member

@eregon eregon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!
I'll address my own comments for convenience.

lib/truffle/stringio.rb Show resolved Hide resolved
spec/ruby/library/stringio/write_spec.rb Outdated Show resolved Hide resolved
@eregon eregon added the in-ci The PR is being tested in CI. Do not push new commits. label Mar 13, 2023
@eregon eregon added the shopify label Mar 13, 2023
@flavorjones flavorjones force-pushed the flavorjones-2839-stringio-transcode branch from 938aecb to a27f06b Compare March 14, 2023 12:22
@flavorjones
Copy link
Contributor Author

Remaining test failures don't seem related to this PR.

@eregon eregon force-pushed the flavorjones-2839-stringio-transcode branch from a27f06b to 2d1822c Compare March 14, 2023 14:46
@eregon
Copy link
Member

eregon commented Mar 16, 2023

I found another edge case, revealed by this failing CRuby test:

  1) Error:
TestGemCommandsSetupCommand#test_show_release_notes:
Encoding::UndefinedConversionError: "\xCF\x80" to US-ASCII in conversion from UTF-8 to US-ASCII
    <internal:core> core/string.rb:373:in `encode!'
    <internal:core> core/string.rb:417:in `encode'
    <internal:core> core/truffle/io_operations.rb:78:in `write_transcoding'
    /b/b/e/graal/sdk/mxbuild/linux-amd64/GRAALVM_5080722BFA_JAVA17/graalvm-5080722bfa-java17-23.0.0-dev/languages/ruby/lib/truffle/stringio.rb:284:in `write'
    <internal:core> core/truffle/io_operations.rb:67:in `block in puts'
    <internal:core> core/truffle/io_operations.rb:34:in `each'
    <internal:core> core/truffle/io_operations.rb:34:in `puts'
    /b/b/e/graal/sdk/mxbuild/linux-amd64/GRAALVM_5080722BFA_JAVA17/graalvm-5080722bfa-java17-23.0.0-dev/languages/ruby/lib/truffle/stringio.rb:96:in `puts'
    /b/b/e/graal/sdk/mxbuild/linux-amd64/GRAALVM_5080722BFA_JAVA17/graalvm-5080722bfa-java17-23.0.0-dev/languages/ruby/lib/mri/rubygems/user_interaction.rb:331:in `say'
    /b/b/e/graal/sdk/mxbuild/linux-amd64/GRAALVM_5080722BFA_JAVA17/graalvm-5080722bfa-java17-23.0.0-dev/languages/ruby/lib/mri/rubygems/user_interaction.rb:152:in `say'
    /b/b/e/graal/sdk/mxbuild/linux-amd64/GRAALVM_5080722BFA_JAVA17/graalvm-5080722bfa-java17-23.0.0-dev/languages/ruby/lib/mri/rubygems/commands/setup_command.rb:572:in `show_release_notes'
    /b/b/e/main/test/mri/tests/rubygems/test_gem_commands_setup_command.rb:394:in `block in test_show_release_notes'
    /b/b/e/graal/sdk/mxbuild/linux-amd64/GRAALVM_5080722BFA_JAVA17/graalvm-5080722bfa-java17-23.0.0-dev/languages/ruby/lib/mri/rubygems/user_interaction.rb:46:in `use_ui'
    /b/b/e/graal/sdk/mxbuild/linux-amd64/GRAALVM_5080722BFA_JAVA17/graalvm-5080722bfa-java17-23.0.0-dev/languages/ruby/lib/mri/rubygems/user_interaction.rb:69:in `use_ui'
    /b/b/e/main/test/mri/tests/rubygems/test_gem_commands_setup_command.rb:393:in `test_show_release_notes'

It tries to write a UTF-8 String with non-ASCII characters into a StringIO which was set_encoding(Encoding::US_ASCII).

I simplified it to:

$ cruby -rstringio -e 'io=StringIO.new.set_encoding(Encoding::US_ASCII); p io.string.encoding; io.write "* π is tasty"; p io.string.encoding; p io.string'
#<Encoding:US-ASCII>
#<Encoding:UTF-8>
"* π is tasty"

So CRuby seems to magically change the StringIO encoding for this case. Waow.

And I wondered if the BINARY case is handled similarly but no, that doesn't change the StringIO encoding:

$ cruby -rstringio -e 'io=StringIO.new.set_encoding(Encoding::US_ASCII); p io.string.encoding; io.write "* π is tasty".b; p io.string.encoding; p io.string'
#<Encoding:US-ASCII>
#<Encoding:US-ASCII>
"* \xCF\x80 is tasty"

So many hacks and inconsistencies (and maybe some bugs) for StringIO encoding :(

@eregon
Copy link
Member

eregon commented Mar 16, 2023

The encoding logic for StringIO#write upstream seems to be https://github.com/ruby/stringio/blob/009896b97378311ecaa88c8fa9c32acd7656648a/ext/stringio/stringio.c#L1498-L1506. So special case both BINARY and US-ASCII for the String encoding, in case conversation fail, move on without converting. Yeah, that's a messy hack, but we can copy it.

@graalvmbot graalvmbot merged commit 9c9f6ba into oracle:master Mar 16, 2023
@flavorjones
Copy link
Contributor Author

Thank you for the extra work needed to complete this work, @eregon.

@flavorjones flavorjones deleted the flavorjones-2839-stringio-transcode branch March 17, 2023 01:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in-ci The PR is being tested in CI. Do not push new commits. OCA Verified All contributors have signed the Oracle Contributor Agreement. shopify
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants