Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buffer encoding issues #18

Closed
HoneyryderChuck opened this issue Dec 15, 2017 · 13 comments · Fixed by #19
Closed

buffer encoding issues #18

HoneyryderChuck opened this issue Dec 15, 2017 · 13 comments · Fixed by #19

Comments

@HoneyryderChuck
Copy link
Contributor

I've been seeing some issues with this gem in jruby, that led me to this snippet here, where I think that some encoding mis-handling might be happening, and wanted to open the discussion before a PR is submitted.

The snippet in question was (in my case) "shadowed" by IO.copy_stream, but the gist of it is:

  • outbuf.to_s.replace("") will return an "UTF-8" string.
  • @bufferis a binary string.
  • outbuf << @buffer will not change the encoding in outbuf (it'll remain "UTF-8").
  • @buffer.length != @buffer.bytesize

I'd like to propose the following changes:

  • all strings (outbuf, @buffer) should be binary (current_io.read(length, @buffer)would anyways convert the buffer to binary if it were another encoding).
  • switch to @buffer.bytesize;
  • switch to use "".b for initialization (because String.new.encoding, String.new("").encoding...).
@ixti
Copy link
Member

ixti commented Dec 15, 2017

PRs are totally welcome :D

@HoneyryderChuck
Copy link
Contributor Author

Perfect, will get to that as soon as I find a slot!

@nightsurge
Copy link

nightsurge commented Jan 8, 2018

@HoneyryderChuck Is your issue similar to this:

Encoding::CompatibilityError: incompatible encodings: ASCII-8BIT and UTF-8 from org/jruby/RubyString.java:2268:in concat'`

I am also having issues doing multi-part file uploads in JRuby and get that error.

@HoneyryderChuck
Copy link
Contributor Author

yes, it was.

@nightsurge
Copy link

@HoneyryderChuck I pulled in your code for the gem to try it locally. It was still giving me the same encoding error until I changed line 38 of composite_io.rb to be the following:

outbuf << @buffer.force_encoding(Encoding::BINARY)

Thoughts?

@HoneyryderChuck
Copy link
Contributor Author

when is the buffer not binary? what is the current_io in your specific case? I'm assuming you're passing a binary string as the second argument.

@nightsurge
Copy link

nightsurge commented Jan 8, 2018

Here's the code I am using which is the "FormData" that gets passed to the HTTP.post request:

document_string = '{"docId": '+document_id.to_s+'}'
      
form = {
   PageCreateData: HTTP::FormData::Part.new(document_string, content_type: 'application/json'),
   Image: HTTP::FormData::File.new(file_source)
}

pry(#<HTTP::FormData::CompositeIO>)> current_io
=> #<HTTP::FormData::Multipart::Param:0x46f03d6d
 @io=
  #<HTTP::FormData::CompositeIO:0x4fa1bdd0
   @buffer="",
   @index=0,
   @ios=
    [#<StringIO:0x65b63277>,
     #<HTTP::FormData::Part:0x7d5fb53a @content_type="application/json", @filename=nil, @io=#<StringIO:0x4b98900b>>,
     #<StringIO:0x4568ef38>],
   @size=110>,
 @name="PageCreateData",
 @part=#<HTTP::FormData::Part:0x7d5fb53a @content_type="application/json", @filename=nil, @io=#<StringIO:0x4b98900b>>>

@HoneyryderChuck
Copy link
Contributor Author

what is the file_source?

@nightsurge
Copy link

A string with the location of the file.

@HoneyryderChuck
Copy link
Contributor Author

ups, forgot what the argument was about.

Somehow the read call is changing the buffer encoding. I'd suggest testing your script locally against CRuby, maybe inspect that line and see whether the chunks read from the file are in the expected encoding, and compare that to the results running in JRuby. I'd be surprised if they were different, as the file is opened in binmode.

@nightsurge
Copy link

Yeah, for whatever reason I have to use .force_encoding(Encoding::BINARY) on both outbuf and @buffer to get it to work... Even on the master branch, that seems to resolve my issues without the other changes in your pull request.

@nightsurge
Copy link

Just tried it on MRI ruby and got the same results as before. Here is a script I made that you can run:
@HoneyryderChuck i'm curious to know your results and jruby version.

require "stringio"

# Provides IO interface
class IOTest
  # IO object
  def initialize(io)
    @buffer = String.new
    puts "@buffer: #{@buffer.encoding} in initialize\n\n"
    @io = if io.is_a?(String)
            StringIO.new(io)
          elsif io.respond_to?(:read)
            io
          else
            raise ArgumentError,
              "#{io.inspect} is neither a String nor an IO object"
          end
  end

  # @param [Integer] length Number of bytes to retrieve
  # @param [String] outbuf String to be replaced with retrieved data
  #
  # @return [String, nil]
  def read(length = nil, outbuf = nil)
    outbuf = outbuf.to_s.clear

    puts "Outbuf: #{outbuf.encoding} setup"
    puts "@buffer: #{@buffer.encoding} setup"
    @io.read(length, @buffer)

    puts "Outbuf: #{outbuf.encoding} after read"
    puts "@buffer: #{@buffer.encoding} after read"
    outbuf << @buffer

    puts "Outbuf: #{outbuf.encoding} after outbuf append"
    puts "@buffer: #{@buffer.encoding} after outbuf append"

    if length
      length -= @buffer.length
      break if length.zero?
    end

    outbuf unless length && outbuf.empty?
  end

  def read_force_outbuf(length = nil, outbuf = nil)
    outbuf = outbuf.to_s.clear
    outbuf.force_encoding(Encoding::BINARY)

    puts "Outbuf: #{outbuf.encoding} setup"
    puts "@buffer: #{@buffer.encoding} setup"
    @io.read(length, @buffer)

    puts "Outbuf: #{outbuf.encoding} after read"
    puts "@buffer: #{@buffer.encoding} after read"
    outbuf << @buffer

    puts "Outbuf: #{outbuf.encoding} after outbuf append"
    puts "@buffer: #{@buffer.encoding} after outbuf append"

    if length
      length -= @buffer.length
      break if length.zero?
    end

    outbuf unless length && outbuf.empty?
  end

  def read_force_both(length = nil, outbuf = nil)
    outbuf = outbuf.to_s.clear
    outbuf.force_encoding(Encoding::BINARY)

    puts "Outbuf: #{outbuf.encoding} setup"
    puts "@buffer: #{@buffer.encoding} setup"
    @io.read(length, @buffer)
    outbuf << @buffer.force_encoding(Encoding::BINARY)

    puts "Outbuf: #{outbuf.encoding} after read"
    puts "@buffer: #{@buffer.encoding} after read"
    outbuf << @buffer

    puts "Outbuf: #{outbuf.encoding} after outbuf append"
    puts "@buffer: #{@buffer.encoding} after outbuf append"

    if length
      length -= @buffer.length
      break if length.zero?
    end

    outbuf unless length && outbuf.empty?
  end
end

the_test = IOTest.new(File.new('/some_image_file.jpg'))
puts "**************** read without force_encoding"
the_test.read
puts "**************** END read without force_encoding\n\n"


the_test_2 = IOTest.new(File.new('/some_image_file.jpg'))
puts "**************** read with outbuf force_encoding"
the_test_2.read_force_outbuf
puts "**************** END read with outbuf force_encoding\n\n"

the_test_3 = IOTest.new(File.new('/some_image_file.jpg'))
puts "**************** read with both outbuf/@buffer force_encoding"
the_test_3.read_force_both
puts "**************** END read with both outbuf/@buffer force_encoding"

@nightsurge
Copy link

If you alter the script to include your other changes (which I think is just in the initialize):

@buffer = "".b

@ixti ixti closed this as completed in #19 Jan 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants