Encoding issue within rake task #2213

slowjack2k · 2015-09-05T12:32:57Z

Hi,

when I execute

rake rubocop

I get the following exception:

"\xF0" from ASCII-8BIT to UTF-8
rubocop-0.34.0/lib/rubocop/result_cache.rb:33:in `write'
....
rubocop-0.34.0/lib/rubocop/rake_task.rb:30:in `block (2 levels) in initialize'
rubocop-0.34.0/lib/rubocop/rake_task.rb:26

regards
dieter

jonas054 · 2015-09-06T06:26:17Z

That's in code that I wrote, so I will try to fix it ASAP. For now you can circumvent the problem by adding this to your .rubocop.yml:

AllCops:
  UseCache: false

jonas054 · 2015-09-06T07:40:55Z

@slowjack2k I'm having trouble reproducing the error on my system. What are your settings for the environment variables LANG and LC_ALL? And can you find the file that results in this crash when RuboCop inspects it? I'd be interested to see its contents.

slowjack2k · 2015-09-06T08:19:07Z

Mac:

LANG=de_DE.UTF-8
local LC_CTYPE=C

The following fixes the issue for me:

lib/rubocop/result_cache.rb:29

f.write(Marshal.dump([offenses, Hash[disabled_line_ranges.sort],
                              comments]).force_encoding(Encoding::UTF_8))

I can't send you the file, sorry; it's an internal product.

Can I inspect the comment further?

#<Parser::Source::Comment XXX.rb:1:1 "# -*- encoding : utf-8 -*-">

offenses and Hash[disabled_line_ranges.sort] seem to be empty.
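Why the `force_encoding(Encoding::UTF_8)` workaround avoids the exception can be sketched in a few lines. This is an illustration under assumptions, not RuboCop code: a 2288-space string (the reproduction found later in this thread) stands in for the real file, and the variable names are made up. `String#force_encoding` only changes the encoding label, so the bytes stay exactly as `Marshal.dump` produced them, and a file whose encoding is also UTF-8 has nothing to transcode.

```ruby
# Stand-in payload (assumption): Marshal output for a 2288-character string.
data = Marshal.dump(' ' * 2288)

# force_encoding relabels a copy of the same bytes as UTF-8; nothing is converted.
relabeled = data.dup.force_encoding(Encoding::UTF_8)

puts data.encoding                  # label produced by Marshal.dump
puts relabeled.encoding             # new label after force_encoding
puts relabeled.bytes == data.bytes  # the underlying bytes are identical
puts relabeled.valid_encoding?      # the bytes are still not valid UTF-8
```

Note that the relabeled string is still binary data; only its label has changed.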

slowjack2k · 2015-09-06T08:56:08Z

I dug a little deeper with pry:

# lib/rubocop/result_cache.rb @ line 33
binding.pry

location = comments.first.instance_variable_get('@location')
source = location.expression.source_buffer.instance_variable_get '@source'
Marshal.dump source
# => "\x04\bI\"\x02\xF0\b# -*- encoding : utf-8 -*-\nclass XXX"
source.encoding
# => #<Encoding:UTF-8>

Marshal.dump(source).encoding
# => #<Encoding:ASCII-8BIT>

Does this help?

jonas054 · 2015-09-06T11:33:29Z

Yes, a little bit. I still haven't been able to write a failing spec example to demonstrate the problem, but here's my understanding of what's going on.

You get an exception because File#write is trying to transcode the ASCII-8BIT ("binary") string returned by Marshal.dump into UTF-8, which is the default external encoding on your system. The F0 byte is what breaks it: bytes above 0x7F in an ASCII-8BIT string have no defined mapping to UTF-8, so the conversion raises.
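The failing conversion can be reproduced without a file. A minimal sketch, assuming a 2288-space string (from the reproduction later in this thread) as stand-in data: `Marshal.dump` returns a binary string, and calling `String#encode` to convert it to UTF-8 raises the same kind of error that `File#write` reported.

```ruby
# Stand-in for the cached data: Marshal.dump returns a binary (ASCII-8BIT)
# string, whatever the encoding of the objects being dumped.
data = Marshal.dump(' ' * 2288)
puts data.encoding

# Explicitly transcoding to UTF-8 fails on the first byte above 0x7F.
error =
  begin
    data.encode(Encoding::UTF_8)
    nil
  rescue Encoding::UndefinedConversionError => e
    e
  end

puts error.class
puts error.message
```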

The thing I don't understand is why I don't get the same problem. In order to make it occur, I have to open the file with File.open(..., 'w:UTF-8'), setting the file's external encoding to UTF-8.

As far as I can see, the best solution would be to specify binary encoding for the file when we open it for writing. It's binary data that we're going to write to it.

Can you try just inserting a b after the w?

--- a/lib/rubocop/result_cache.rb
+++ b/lib/rubocop/result_cache.rb
@@ -26,7 +26,7 @@ module RuboCop
     def save(offenses, disabled_line_ranges, comments)
       FileUtils.mkdir_p(File.dirname(@path))
       preliminary_path = "#{@path}_#{rand(1_000_000_000)}"
-      File.open(preliminary_path, 'w') do |f|
+      File.open(preliminary_path, 'wb') do |f|
         # The Hash[x.sort] call is a trick that converts a Hash with a default
         # block to a Hash without a default block. Thus making it possible to
         # dump.

Does that solve the problem for you?
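The difference between the two modes can be checked in isolation. A sketch, not RuboCop code: the helper `write_with_mode`, the temporary `cache` file and the 2288-space payload are illustrative assumptions. Opening the file with an explicit UTF-8 external encoding (`'w:UTF-8'`, the mode needed above to force the error) makes `File#write` transcode the Marshal data and fail, while `'wb'` writes the bytes unchanged and `Marshal.load` reads them back.

```ruby
require 'tmpdir'

# Stand-in cache payload; its Marshal length prefix contains the byte 0xF0.
payload = Marshal.dump(' ' * 2288)

# Hypothetical helper: write +data+ to +path+ using the given open mode and
# report whether the write succeeded or hit a transcoding error.
def write_with_mode(path, mode, data)
  File.open(path, mode) { |f| f.write(data) }
  :ok
rescue Encoding::UndefinedConversionError
  :conversion_error
end

results = {}
Dir.mktmpdir do |dir|
  path = File.join(dir, 'cache')
  results[:text_utf8] = write_with_mode(path, 'w:UTF-8', payload)
  results[:binary] = write_with_mode(path, 'wb', payload)
  # Reading the binary file back restores the original object.
  results[:round_trip] = Marshal.load(File.binread(path)) == ' ' * 2288
end
p results
```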

slowjack2k · 2015-09-06T13:10:27Z

File.open(preliminary_path, 'wb') does the trick. It works.

I found a way to reproduce it:

str = " "*2288
Marshal.dump(str).inspect

The \xF0 seems to be part of the string length, which Marshal encodes at this position.
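That reading matches how Marshal stores lengths. Marshal writes a length from 1 to 122 as a single byte holding the value plus 5; larger values are written as a byte count followed by that many little-endian bytes. For 2288 = 0x08F0 that is the count `\x02` followed by `\xF0` and `\x08` (`\b` in the pry output above), since 0xF0 + 0x08 * 256 = 240 + 2048 = 2288. In the sketch below the method name `marshal_length_bytes` is hypothetical, and negative values, which Marshal encodes differently, are left out.

```ruby
# Hypothetical helper: encode a non-negative length the way Marshal's
# "long" format does (this sketch ignores negative values).
def marshal_length_bytes(n)
  raise ArgumentError, 'negative values are not handled' if n.negative?
  return [0].pack('C') if n.zero?      # zero is a single 0x00 byte
  return [n + 5].pack('C') if n < 123  # small values: one byte, offset by 5

  bytes = []
  while n.positive?
    bytes << (n & 0xff)                # least significant byte first
    n >>= 8
  end
  ([bytes.size] + bytes).pack('C*')    # byte count, then the bytes
end

length = marshal_length_bytes(2288)
p length

# In a dumped String the length follows "\x04\x08" (format version),
# "I" (instance variables follow, here the encoding) and '"' (String type).
dumped = Marshal.dump(' ' * 2288)
p dumped.byteslice(4, length.bytesize) == length
```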

jonas054 · 2015-09-06T14:06:00Z

I was finally able to write a spec example that fails for 'w' and passes for 'wb'. The problem wasn't how to get the \xF0 into the string. It was how to set the default encoding (internal vs external).

Thanks all the same for your help in chasing down the bug, @slowjack2k!
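The role of the default encodings can be sketched as follows. This is not the spec that was added to RuboCop; the helpers `with_default_encodings` and `dump_with_mode` are hypothetical. With both `Encoding.default_external` and `Encoding.default_internal` set to UTF-8, a file opened with plain `'w'` is given a UTF-8 encoding and `File#write` transcodes the binary data, whereas `'wb'` never transcodes. With `default_internal` unset, which is Ruby's default, plain `'w'` writes the bytes as they are, which may be why the error was hard to reproduce.

```ruby
require 'tmpdir'

# Hypothetical helper: temporarily change the process-wide default encodings.
def with_default_encodings(external, internal)
  old_external = Encoding.default_external
  old_internal = Encoding.default_internal
  Encoding.default_external = external
  Encoding.default_internal = internal
  yield
ensure
  Encoding.default_external = old_external
  Encoding.default_internal = old_internal
end

# Hypothetical helper: write Marshal data with the given mode and report
# success or a transcoding error.
def dump_with_mode(path, mode)
  File.open(path, mode) { |f| f.write(Marshal.dump(' ' * 2288)) }
  :ok
rescue Encoding::UndefinedConversionError
  :conversion_error
end

outcomes = {}
Dir.mktmpdir do |dir|
  path = File.join(dir, 'cache')
  with_default_encodings(Encoding::UTF_8, Encoding::UTF_8) do
    outcomes[:w] = dump_with_mode(path, 'w')
    outcomes[:wb] = dump_with_mode(path, 'wb')
  end
  with_default_encodings(Encoding::UTF_8, nil) do
    outcomes[:w_without_internal] = dump_with_mode(path, 'w')
  end
end
p outcomes
```

Because the helper changes process-wide settings, it restores them in an `ensure` clause.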

[Fix #2213] Write to cache in binary mode

slowjack2k · 2015-09-07T05:42:55Z

Thank you for fixing this bug.

jonas054 self-assigned this Sep 6, 2015

jonas054 closed this as completed in 7122c50 Sep 6, 2015

bbatsov added a commit that referenced this issue Sep 6, 2015

Merge pull request #2221 from jonas054/fix_cache_encoding_bug

f367cb8

[Fix #2213] Write to cache in binary mode

This was referenced Sep 7, 2015

UTF-8 issue with 0.34.0 (?) #2226

Closed

Character encoding exception when writing the cache #2229

Closed

Rubocop failing to run after upgrade to 0.34.0 from 0.33.0 #2230

Closed

cboos mentioned this issue Sep 12, 2015

marshal data too short error when reading cache file on Windows with 0.34.1 #2241

Closed
