Chokes on source files with non-utf-8 encoding #157

nedbat · 2011-11-08T07:58:22Z

Originally reported by Wolfgang Schnerring (Bitbucket: wosc, GitHub: wosc)

If you have Python source files that are, e.g., latin-1 encoded, the reporter will die like this:

    coverage.main()
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/cmdline.py", line 657, in main
    status = CoverageScript().command_line(argv)
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/cmdline.py", line 549, in command_line
    directory=options.directory, **report_args)
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/control.py", line 599, in html_report
    reporter.report(morfs, config=self.config)
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/html.py", line 83, in report
    self.report_files(self.html_file, morfs, config, config.html_dir)
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/report.py", line 86, in report_files
    report_fn(cu, self.coverage._analyze(cu))
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/html.py", line 198, in html_file
    self.write_html(html_path, html)
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/html.py", line 103, in write_html
    write_encoded(fname, html, 'ascii', 'xmlcharrefreplace')
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/backward.py", line 137, in write_encoded
    f.write(text.decode('utf8'))
  File "/usr/local/python2.6/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 14451: invalid continuation byte

The workaround is simple, of course: change the file's encoding and declaration (and you should be using utf-8, if anything, anyway). Still, I wonder whether this could be handled more gracefully, with an error message that says what's going on.

Bitbucket: https://bitbucket.org/ned/coveragepy/issue/157


nedbat · 2011-12-05T15:31:56Z

Issue #159 was marked as a duplicate of this issue.

nedbat · 2011-12-05T15:32:16Z

Issue #162 was marked as a duplicate of this issue.

nedbat · 2011-12-19T08:10:41Z

Original comment by Kirit Sælensminde (Bitbucket: KayEss, GitHub: KayEss)

We've recently seen this error on what appears to be a properly Unicode encoded file :(

It would be really great if we could at least get the file name that was being processed when the error is thrown. I'd be happy to look into how to do that.

What do you think the right approach would be? Change the exception type to one that includes that in the error, or try to annotate the existing exception in some way?

nedbat · 2011-12-19T12:20:30Z

@kirit: the problem isn't bad encodings, it's any encoding other than utf-8. Is that your situation? If you think you have a new scenario, attach a file demonstrating the problem.

The right way to fix the problem is to use the encoding declaration at the top of the file when reading the source.
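To illustrate what "use the encoding declaration when reading the source" can look like, here is a minimal sketch for Python 3. It is not the code from the eventual fix; `read_python_source` is a hypothetical helper name. It relies on the standard library's `tokenize.open()`, which looks for a BOM or a PEP 263 coding declaration in the first two lines and falls back to utf-8 when neither is present.

```python
import tokenize


def read_python_source(path):
    """Return the text of a Python source file, decoded the way Python would.

    Hypothetical helper for illustration only; not part of coverage.py.
    """
    # tokenize.open() detects the encoding from a BOM or a PEP 263
    # "coding:" declaration and opens the file in text mode with it.
    with tokenize.open(path) as source_file:
        return source_file.read()
```

With this approach a latin-1 file that declares `# -*- coding: latin-1 -*-` is decoded as latin-1 rather than being forced through utf-8, so the byte 0xe4 from the traceback above would come back as `ä` instead of raising.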

nedbat · 2011-12-21T07:24:03Z

Original comment by Kirit Sælensminde (Bitbucket: KayEss, GitHub: KayEss)

Hi Ned. The file is UTF-8 and has the encoding declaration at the beginning -- or at least, the file that we think it is. I agree totally that the file needs to be fixed and UTF-8 is the way to go.

What I'm hoping to do for you though is to get the full file pathname that causes the problem into the exception in some way so that it's clear when the error happens which file needs fixing. I.e. the error might read:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 14451: invalid continuation byte in file '/home/kirit/Projects/foo/bar/baz.py'
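One way to get a message like that is to catch the decode error where the file is read and re-raise the same exception type with the path appended to the reason. This is only a sketch under assumptions: `decode_source` is a hypothetical helper, not a coverage.py function, and it is written for Python 3.

```python
def decode_source(raw_bytes, path, encoding="utf-8"):
    """Decode source bytes, naming the file if decoding fails.

    Hypothetical helper for illustration only; not part of coverage.py.
    """
    try:
        return raw_bytes.decode(encoding)
    except UnicodeDecodeError as err:
        # Re-raise the same exception type so existing handlers still match,
        # with the file path appended to the original reason.
        raise UnicodeDecodeError(
            err.encoding,
            err.object,
            err.start,
            err.end,
            "%s in file %r" % (err.reason, path),
        ) from err
```

Keeping the exception type as `UnicodeDecodeError` means callers that already catch it keep working; a dedicated exception class carrying the path would be the alternative if callers need the file name as a separate attribute.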

nedbat · 2012-03-22T01:00:55Z

This is now fixed in changeset f7acbcfe9ca9 (Bitbucket).

Kirit: I never wanted you to have to "fix" your source code. If Python accepts it, coverage.py should accept it. I hope you'll find it works better now.

nedbat · 2012-03-22T01:02:29Z

Issue #169 was marked as a duplicate of this issue.

nedbat closed this as completed Mar 22, 2012

nedbat added the minor, bug, and html labels Jun 23, 2018
