
HTMLReporter fails when source file is encoded in UTF-8 with BOM signature #179

Closed
nedbat opened this issue May 8, 2012 · 2 comments
Labels
bug, html

Comments

@nedbat
Owner

nedbat commented May 8, 2012

Originally reported by pablodcar (Bitbucket: pablodcar, GitHub: pablodcar)


Hi, I'm thankful for this wonderful tool. We use it very extensively, and I hope to contribute new APIs and features in the future.

When a source file is encoded in UTF-8 with a BOM signature, `coverage.phystokens.source_encoding` returns the correct encoding, `"utf-8-sig"`. But when the file is rendered into the HTML template and that encoding is used to write the report to disk, a `UnicodeDecodeError` is raised, because the BOM cannot appear in the middle of the final output:

  File "/home/pablo/baco-dyn/lib/python2.6/site-packages/coverage/control.py", line 603, in html_report
    reporter.report(morfs)
  File "/home/pablo/baco-dyn/lib/python2.6/site-packages/coverage/html.py", line 87, in report
    self.report_files(self.html_file, morfs, self.config.html_dir)
  File "/home/pablo/baco-dyn/lib/python2.6/site-packages/coverage/report.py", line 83, in report_files
    report_fn(cu, self.coverage._analyze(cu))
  File "/home/pablo/baco-dyn/lib/python2.6/site-packages/coverage/html.py", line 222, in html_file
    html = html.encode(encoding)
  File "/home/pablo/baco-dyn/lib/python2.6/encodings/utf_8_sig.py", line 15, in encode
    return (codecs.BOM_UTF8 + codecs.utf_8_encode(input, errors)[0], len(input))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 18296: ordinal not in range(128)
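The codec behaviour behind this traceback can be illustrated with a short Python 3 sketch. It is not coverage.py code. The file contents are a made-up example, and the last step only simulates the Python 2 behaviour, where calling `.encode()` on a byte string first decodes it with the default (ASCII) codec.

```python
import codecs

# Made-up file contents: a UTF-8 BOM followed by one line of Python code.
raw = codecs.BOM_UTF8 + "name = 'Pablo'\n".encode("utf-8")

# Plain "utf-8" keeps the BOM as the character U+FEFF; "utf-8-sig" strips it.
kept = raw.decode("utf-8")
stripped = raw.decode("utf-8-sig")
print(repr(kept[0]))      # '\ufeff'
print(repr(stripped[0]))  # 'n'

# Encoding with "utf-8-sig" always prepends a BOM to whatever it is given.
print("<p>hi</p>".encode("utf-8-sig").startswith(codecs.BOM_UTF8))  # True

# Python 2 implicitly decodes a byte string as ASCII before encoding it.
# Decoding the raw bytes as ASCII here reproduces the same kind of error:
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)
```

The byte `0xef` is the first byte of the UTF-8 BOM (`EF BB BF`). This matches the byte named in the traceback, where the BOM sits inside the rendered page rather than at its start.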

I'm attaching a patch that decodes the source file up front and encodes the report using plain UTF-8 whenever `utf-8-sig` is detected. I hope you can review it and consider including this change.

Thanks in advance,

Pablo Carballo
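The approach described in the patch above (decode the source first, then write the report as plain UTF-8 when `utf-8-sig` is detected) might look roughly like the following Python 3 sketch. This is an illustration under assumptions, not the attached patch or coverage.py's actual fix. `render_source_html` is a hypothetical helper, and the standard library's `tokenize.detect_encoding` stands in for `coverage.phystokens.source_encoding`.

```python
import html
import io
import tokenize


def render_source_html(raw: bytes) -> bytes:
    """Hypothetical helper: render Python source bytes as a minimal HTML page."""
    # Detect the source encoding; a UTF-8 BOM without a coding cookie yields "utf-8-sig".
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    # Decode up front so the template works on text; "utf-8-sig" strips the BOM here.
    text = raw.decode(encoding)
    body = "".join(f"<p>{html.escape(line)}</p>\n" for line in text.splitlines())
    page = f"<html><body>\n{body}</body></html>\n"
    # Write the report as plain UTF-8 when the source had a BOM, so no BOM is emitted.
    output_encoding = "utf-8" if encoding == "utf-8-sig" else encoding
    return page.encode(output_encoding)
```

For example, `render_source_html(codecs.BOM_UTF8 + b"x = 1\n")` returns a page with no BOM bytes anywhere in it. A Latin-1 file with a `coding` cookie is still written in its own encoding.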


@nedbat
Owner Author

nedbat commented May 15, 2012

Fixed in changeset 544017505920 (Bitbucket).

I had to adapt your patch and add a little more so that the test suite would pass on Python 2.3 through 3.3. Thanks!

@nedbat
Owner Author

nedbat commented May 15, 2012

Original comment by pablodcar (Bitbucket: pablodcar, GitHub: pablodcar)


Excellent, thanks!

nedbat closed this as completed May 15, 2012
nedbat added the major, bug, and html labels Jun 23, 2018