Can't process django.po file "invalid byte sequence in UTF-8" #285

Open
anentropic opened this issue Jul 31, 2019 · 1 comment

Comments

@anentropic

Twine version 1.0.6

$ twine consume-all-localization-files twine.txt locale/ --consume-all --consume-comments --format=django
Traceback (most recent call last):
	13: from /Users/anentropic/.rvm/gems/ruby-2.6.3/bin/ruby_executable_hooks:24:in `<main>'
	12: from /Users/anentropic/.rvm/gems/ruby-2.6.3/bin/ruby_executable_hooks:24:in `eval'
	11: from /Users/anentropic/.rvm/gems/ruby-2.6.3/bin/twine:23:in `<main>'
	10: from /Users/anentropic/.rvm/gems/ruby-2.6.3/bin/twine:23:in `load'
	 9: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/bin/twine:4:in `<top (required)>'
	 8: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:33:in `run'
	 7: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:190:in `consume_all_localization_files'
	 6: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:190:in `glob'
	 5: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:193:in `block in consume_all_localization_files'
	 4: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:323:in `read_localization_file'
	 3: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:323:in `open'
	 2: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:325:in `block in read_localization_file'
	 1: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/formatters/django.rb:22:in `read'
/Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/formatters/django.rb:22:in `match': invalid byte sequence in UTF-8 (ArgumentError)

Unfortunately the unhandled exception does not give any information about the location of the bad char within the file.
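To narrow down where the offending bytes are, something along these lines could be run against the .po file outside of twine. This is only a sketch: `find_invalid_utf8` is a hypothetical helper, not part of twine, and it assumes the file is meant to be UTF-8 as its header claims.

```ruby
# Hypothetical helper (not part of twine): list every invalid UTF-8 byte
# sequence in `data`, with its 1-based line number and the byte offset
# within that line.
def find_invalid_utf8(data)
  problems = []
  # Split on raw bytes first so that broken lines cannot raise while splitting.
  data.b.each_line.with_index(1) do |raw, lineno|
    line = raw.dup.force_encoding(Encoding::UTF_8)
    next if line.valid_encoding?

    offset = 0
    # each_char yields invalid bytes as "characters" that fail valid_encoding?.
    line.each_char do |ch|
      unless ch.valid_encoding?
        problems << { line: lineno, byte_offset: offset, bytes: ch.bytes }
      end
      offset += ch.bytesize
    end
  end
  problems
end

# Usage (the path is an assumption, substitute your own):
#   find_invalid_utf8(File.binread("locale/de/LC_MESSAGES/django.po")).each { |p| p p }
```

An empty result would suggest the file itself is valid UTF-8 and that the problem lies in how it is being read.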

We're using these .po files without problems in our Django project, so I'm not sure they really contain any wrongly encoded data.

At the top of the file there's an entry like:

msgid ""
msgstr ""
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

If I use --encoding=ASCII-8BIT:

twine consume-all-localization-files twine.txt garage/locale/ --consume-all --consume-comments --format=django --encoding=ASCII-8BIT

then it logs "Adding new definition <msg id>" for every message in the .po file, but fails when writing the result to twine.txt with this error:

Traceback (most recent call last):
	19: from /Users/anentropic/.rvm/gems/ruby-2.6.3/bin/ruby_executable_hooks:24:in `<main>'
	18: from /Users/anentropic/.rvm/gems/ruby-2.6.3/bin/ruby_executable_hooks:24:in `eval'
	17: from /Users/anentropic/.rvm/gems/ruby-2.6.3/bin/twine:23:in `<main>'
	16: from /Users/anentropic/.rvm/gems/ruby-2.6.3/bin/twine:23:in `load'
	15: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/bin/twine:4:in `<top (required)>'
	14: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:33:in `run'
	13: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:201:in `consume_all_localization_files'
	12: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:55:in `write_twine_data'
	11: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:180:in `write'
	10: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:180:in `open'
	 9: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:181:in `block in write'
	 8: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:181:in `each'
	 7: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:188:in `block (2 levels) in write'
	 6: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:188:in `each'
	 5: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:206:in `block (3 levels) in write'
	 4: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:206:in `each'
	 3: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:207:in `block (4 levels) in write'
	 2: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:224:in `write_value'
	 1: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:224:in `puts'
/Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:224:in `write': "\xC3" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
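For reference, the error above can be reproduced in plain Ruby without twine. 0xC3 is the lead byte of many two-byte UTF-8 sequences (for example "é" is C3 A9), so the message is consistent with UTF-8 text that was tagged as binary on read and then transcoded on write. The sketch below is an illustration under that assumption, not a description of twine's internals.

```ruby
# "é" is the two bytes C3 A9 in UTF-8. String#b returns a copy tagged ASCII-8BIT.
binary = "caf\xC3\xA9".b

message = nil
begin
  # Transcoding binary -> UTF-8 fails on the first byte >= 0x80, because
  # ASCII-8BIT has no defined mapping for it.
  binary.encode(Encoding::UTF_8)
rescue Encoding::UndefinedConversionError => e
  message = e.message
end

# Relabelling the same bytes as UTF-8 (no transcoding) gives a valid string.
relabelled = binary.dup.force_encoding(Encoding::UTF_8)
puts message
puts relabelled.valid_encoding?
```

The difference is that `encode` converts bytes between encodings, while `force_encoding` only changes the label attached to bytes that are already correct.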

If I modify django.rb in twine like:

        while line = io.gets
          if line != nil
            line = line.scrub("BADCHAR")
          end

...then I'm able to get complete output in my twine.txt file with no errors.

Curiously the replacement BADCHAR does not appear anywhere in the output.
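For context, `String#scrub` only touches byte sequences that are invalid in the string's encoding and returns valid strings unchanged, so BADCHAR can only appear where a line really contained invalid bytes. A small sketch of that behaviour (the sample strings are made up, not taken from the .po file):

```ruby
valid   = "msgstr \"caf\u00e9\"\n"   # well-formed UTF-8
invalid = "msgstr \"caf\xC3\"\n"     # lone lead byte C3 with no continuation byte

scrubbed_valid   = valid.scrub("BADCHAR")
scrubbed_invalid = invalid.scrub("BADCHAR")

puts scrubbed_valid == valid   # a valid string comes back unchanged
puts scrubbed_invalid          # the lone C3 byte is replaced by BADCHAR
```

If the marker never shows up in twine.txt, one possibility is that the replaced bytes sat in text that is not carried into the output. That is a guess, not something confirmed here.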

@sebastianludwig
Collaborator

Hi @anentropic, thanks for opening the issue and sorry for not getting back to you sooner. Could you provide a minimal example file that exhibits this problem?

In general, are you sure the file is ASCII-8BIT encoded? It says charset=UTF-8 directly above... Did you encounter any problems without the --encoding parameter?
