Chokes on source files with non-utf-8 encoding #157

Closed
nedbat opened this issue Nov 8, 2011 · 7 comments
Labels
bug Something isn't working html

Comments

nedbat commented Nov 8, 2011

Originally reported by Wolfgang Schnerring (Bitbucket: wosc, GitHub: wosc)


If you have Python source files in an encoding other than UTF-8 (e.g. latin-1), the reporter dies like this:

    coverage.main()
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/cmdline.py", line 657, in main
    status = CoverageScript().command_line(argv)
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/cmdline.py", line 549, in command_line
    directory=options.directory, **report_args)
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/control.py", line 599, in html_report
    reporter.report(morfs, config=self.config)
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/html.py", line 83, in report
    self.report_files(self.html_file, morfs, config, config.html_dir)
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/report.py", line 86, in report_files
    report_fn(cu, self.coverage._analyze(cu))
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/html.py", line 198, in html_file
    self.write_html(html_path, html)
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/html.py", line 103, in write_html
    write_encoded(fname, html, 'ascii', 'xmlcharrefreplace')
  File "/var/cache/eggs/coverage-3.5.1-py2.6-linux-x86_64.egg/coverage/backward.py", line 137, in write_encoded
    f.write(text.decode('utf8'))
  File "/usr/local/python2.6/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 14451: invalid continuation byte

The workaround is simple, of course: change the file's encoding and its declaration (and if you use any encoding, it should be UTF-8 anyway). Still, I wonder whether this could be handled more gracefully, with an error message that says what is going on.
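
For reference, the decoding failure at the bottom of the traceback can be reproduced in isolation. This is a minimal sketch in Python 3 syntax (the report above is from Python 2.6), and the sample source line is made up for illustration:

```python
# A source line containing "ä", encoded as Latin-1: "ä" becomes the single byte 0xe4.
raw = "name = 'Käse'\n".encode("latin-1")

# Decoding with the encoding the file was actually written in works.
text = raw.decode("latin-1")

# Decoding as UTF-8 fails: 0xe4 announces a multi-byte sequence,
# but the byte that follows ("s") is not a continuation byte.
try:
    raw.decode("utf-8")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    bad_offset = exc.start  # byte offset of the offending 0xe4
```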


nedbat commented Dec 5, 2011

Issue #159 was marked as a duplicate of this issue.

nedbat commented Dec 5, 2011

Issue #162 was marked as a duplicate of this issue.

nedbat commented Dec 19, 2011

Original comment by Kirit Sælensminde (Bitbucket: KayEss, GitHub: KayEss)


We've recently seen this error on what appears to be a properly Unicode encoded file :(

It would be really great if we could at least get the file name that was being processed when the error is thrown. I'd be happy to look into how to do that.

What do you think the right approach would be? Change the exception type to one that includes that in the error, or try to annotate the existing exception in some way?

nedbat commented Dec 19, 2011

@kirit: the problem isn't bad encodings, it's any encoding other than utf-8. Is that your situation? If you think you have a new scenario, attach a file demonstrating the problem.

The right way to fix the problem is to use the encoding declaration at the top of the file when reading the source.
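
A rough sketch of that idea, using the Python 3 standard library's `tokenize.detect_encoding`, which reads the BOM and the PEP 263 coding declaration. The helper name `read_python_source` is hypothetical, and this is not necessarily how coverage.py itself implements the fix:

```python
import tokenize


def read_python_source(path):
    """Return the text of a Python source file, honoring its declared encoding.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        # detect_encoding() inspects the BOM and the first two lines for a
        # "coding:" declaration, and falls back to UTF-8 if there is none.
        encoding, _ = tokenize.detect_encoding(f.readline)
        f.seek(0)
        return f.read().decode(encoding)
```

In Python 3, `tokenize.open(path)` wraps the same detection and returns a text-mode file object, which may be simpler when the bytes are not needed separately.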

nedbat commented Dec 21, 2011

Original comment by Kirit Sælensminde (Bitbucket: KayEss, GitHub: KayEss)


Hi Ned. The file is UTF-8 and has the encoding declaration at the beginning -- or at least, the file that we think it is. I agree totally that the file needs to be fixed and UTF-8 is the way to go.

What I'm hoping to do for you though is to get the full file pathname that causes the problem into the exception in some way so that it's clear when the error happens which file needs fixing. I.e. the error might read:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 14451: invalid continuation byte in file '/home/kirit/Projects/foo/bar/baz.py'
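
One way the annotation could look, sketched in Python 3 syntax. The function name `decode_source` is hypothetical and this is not code from coverage.py; it re-raises the same exception type with the file path appended to the reason, so existing `except UnicodeDecodeError` handlers keep working:

```python
def decode_source(path, encoding="utf-8"):
    """Read and decode a file, naming the file if decoding fails.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        raw = f.read()
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # Rebuild the exception with the same fields, extending only the
        # human-readable reason so the message ends with the file path.
        raise UnicodeDecodeError(
            exc.encoding,
            exc.object,
            exc.start,
            exc.end,
            "%s in file %r" % (exc.reason, path),
        ) from exc
```

Keeping the exception type unchanged is a design choice; a dedicated exception class carrying the path as an attribute would be the alternative raised in the comment above.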

nedbat commented Mar 22, 2012

This is now fixed in changeset f7acbcfe9ca9 (Bitbucket).

Kirit: I never wanted you to have to "fix" your source code. If Python accepts it, coverage.py should accept it. I hope you'll find it works better now.

nedbat closed this as completed Mar 22, 2012

nedbat commented Mar 22, 2012

Issue #169 was marked as a duplicate of this issue.
