Encoding issue within rake task #2213

Closed
slowjack2k opened this issue Sep 5, 2015 · 8 comments

@slowjack2k

Hi,

when I execute

rake rubocop

I get the following exception:

"\xF0" from ASCII-8BIT to UTF-8
rubocop-0.34.0/lib/rubocop/result_cache.rb:33:in `write'
....
rubocop-0.34.0/lib/rubocop/rake_task.rb:30:in `block (2 levels) in initialize'
rubocop-0.34.0/lib/rubocop/rake_task.rb:26

regards
dieter

@jonas054 jonas054 self-assigned this Sep 6, 2015
@jonas054
Collaborator

jonas054 commented Sep 6, 2015

That's in code that I wrote, so I will try to fix it ASAP. For now you can circumvent the problem by adding this to your .rubocop.yml:

AllCops:
  UseCache: false

@jonas054
Collaborator

jonas054 commented Sep 6, 2015

@slowjack2k I'm having trouble reproducing the error on my system. What are your settings for the environment variables LANG and LC_ALL, and can you find the file that causes this crash when RuboCop inspects it? I'd be interested to see its contents.

@slowjack2k
Author

Mac:

LANG=de_DE.UTF-8
local LC_CTYPE=C

The following fixes the issue for me:

lib/rubocop/result_cache.rb:29

f.write(Marshal.dump([offenses, Hash[disabled_line_ranges.sort],
                              comments]).force_encoding(Encoding::UTF_8))

I can't send you the file, sorry; it's an internal product.

Can I inspect the comment further?

#<Parser::Source::Comment XXX.rb:1:1 "# -*- encoding : utf-8 -*-">

offenses and Hash[disabled_line_ranges.sort] seem to be empty.

@slowjack2k
Author

I dug a little deeper with pry:

# lib/rubocop/result_cache.rb @ line 33
binding.pry

location = comments.first.instance_variable_get('@location')
source = location.expression.source_buffer.instance_variable_get '@source'
Marshal.dump source
# => "\x04\bI\"\x02\xF0\b# -*- encoding : utf-8 -*-\nclass XXX"
source.encoding
# => #<Encoding:UTF-8>

Marshal.dump(source).encoding

# => #<Encoding:ASCII-8BIT>

Does this help?

@jonas054
Collaborator

jonas054 commented Sep 6, 2015

Yes, a little bit. I still haven't been able to write a failing spec example to demonstrate the problem, but here's my understanding of what's going on.

You get an exception because File#write is trying to transcode the ASCII-8BIT ("binary") string returned by Marshal.dump into UTF-8, which is the default external encoding on your system. The F0 byte becomes a problem there.

The thing I don't understand is why I don't get the same problem. In order to force this to occur, I have to open the file with File.open(..., 'w:UTF-8'), setting the file's external encoding to UTF-8.

As far as I can see, the best solution should be to specify binary encoding for the file when we open it for writing. It's binary data that we're going to write to it.

Can you try to just insert a b after the w?

--- a/lib/rubocop/result_cache.rb
+++ b/lib/rubocop/result_cache.rb
@@ -26,7 +26,7 @@ module RuboCop
     def save(offenses, disabled_line_ranges, comments)
       FileUtils.mkdir_p(File.dirname(@path))
       preliminary_path = "#{@path}_#{rand(1_000_000_000)}"
-      File.open(preliminary_path, 'w') do |f|
+      File.open(preliminary_path, 'wb') do |f|
         # The Hash[x.sort] call is a trick that converts a Hash with a default
         # block to a Hash without a default block. Thus making it possible to
         # dump.

Does that solve the problem for you?
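
For readers following along, here is a minimal sketch of the difference between the two open modes discussed above. It is not RuboCop's code: the temporary file name `cache_entry` and the `results` hash are made up for illustration, and the 2288-byte payload is simply one input whose Marshal dump is known to contain the byte 0xF0.

```ruby
require 'tmpdir'

# Marshal.dump returns a binary (ASCII-8BIT) string. For a 2288-byte
# string, the length field of the dump contains the byte 0xF0.
payload = Marshal.dump(' ' * 2288)

results = {}
Dir.mktmpdir do |dir|
  path = File.join(dir, 'cache_entry') # hypothetical file name

  # Text mode with an explicit UTF-8 external encoding: Ruby tries to
  # transcode the binary string to UTF-8 when writing.
  begin
    File.open(path, 'w:UTF-8') { |f| f.write(payload) }
    results[:text_mode] = :ok
  rescue Encoding::UndefinedConversionError => e
    results[:text_mode] = e.class
  end

  # Binary mode: the bytes are written unchanged, so the dump can be
  # read back and loaded again.
  File.open(path, 'wb') { |f| f.write(payload) }
  results[:binary_round_trip] = Marshal.load(File.binread(path)) == ' ' * 2288
end

p results
```

Reading the file back with File.binread (or with mode 'rb') keeps the read side free of transcoding as well.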

@slowjack2k
Author

File.open(preliminary_path, 'wb') does the trick. It works.

I found a way to reproduce it:

str = " "*2288
Marshal.dump(str).inspect

It seems to be the string length encoded at this position.
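
The observation about the string length can be checked by looking at the first bytes of the dump. This is a small sketch, assuming a UTF-8 source string as in the report; the variable names are illustrative only.

```ruby
# Assumption: the string is UTF-8, like the inspected source file.
source = (' ' * 2288).force_encoding(Encoding::UTF_8)
dumped = Marshal.dump(source)

# The dump is a binary string, whatever the encoding of the input.
dump_encoding = dumped.encoding

# Header layout: 0x04 0x08 (format version), 'I' and '"' (a string with
# instance variables), 0x02 (two length bytes follow), then the length
# in little-endian order: 0xF0 0x08.
header_bytes = dumped.bytes.first(7)
length = header_bytes[5] | (header_bytes[6] << 8) # 0x08F0

p dump_encoding, header_bytes, length
```

Since 2288 is 0x08F0, the low byte of the length is 0xF0, which is the byte named in the exception message.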

@jonas054
Collaborator

jonas054 commented Sep 6, 2015

I was finally able to write a spec example that fails for 'w' and passes for 'wb'. The problem wasn't how to get the \xF0 into the string. It was how to set the default encoding (internal vs external).

Thanks all the same for your help in chasing down the bug, @slowjack2k!
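
The spec itself is not shown in this thread. As a hedged sketch of how the default encodings could make a plain 'w' handle transcode, the snippet below assumes that both Encoding.default_external and Encoding.default_internal are set to UTF-8; that setup, the file name `cache_entry`, and the `outcome` hash are assumptions for illustration, not taken from the actual spec.

```ruby
require 'tmpdir'

payload = Marshal.dump(' ' * 2288) # binary string containing "\xF0"

saved_external = Encoding.default_external
saved_internal = Encoding.default_internal
outcome = {}
begin
  # Assumption: both process-wide defaults are UTF-8.
  Encoding.default_external = Encoding::UTF_8
  Encoding.default_internal = Encoding::UTF_8

  Dir.mktmpdir do |dir|
    path = File.join(dir, 'cache_entry') # hypothetical file name

    # Try the same write with each mode and record what happens.
    %w[w wb].each do |mode|
      outcome[mode] = begin
        File.open(path, mode) { |f| f.write(payload) }
        :written
      rescue Encoding::UndefinedConversionError
        :conversion_error
      end
    end
  end
ensure
  # Restore the process-wide defaults so nothing else is affected.
  Encoding.default_external = saved_external
  Encoding.default_internal = saved_internal
end

p outcome
```

Under these assumptions the 'w' handle attempts to transcode the dump and fails, while the 'wb' handle writes the bytes as they are.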

bbatsov added a commit that referenced this issue Sep 6, 2015
@slowjack2k
Author

Thank you for fixing this bug.
