Script fails with exception when encountering UTF-8 character #9

Closed
GoogleCodeExporter opened this issue Aug 23, 2015 · 12 comments

@GoogleCodeExporter

What steps will reproduce the problem?
1. Run the ./gitinspector.py script against a repo with a UTF-8 character in an author's name


What is the expected output? What do you see instead?

Traceback (most recent call last):
  File "./gitinspector.py", line 136, in <module>
    __run__.output()
  File "./gitinspector.py", line 57, in output
    outputable.output(changes.ChangesOutput(self.hard))
  File "/Users/tajima/Downloads/gitinspector/outputable.py", line 37, in output
    outputable.output_text()
  File "/Users/tajima/Downloads/gitinspector/changes.py", line 240, in output_text
    print(i.ljust(20)[0:20], end=" ")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 8: ordinal not in range(128)

The person's name has a ü
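
The failure can be reproduced outside gitinspector. The following is a minimal sketch, not the project's code: the author name is a hypothetical stand-in, and the explicit `.encode("ascii")` imitates what happens when the padded name from the traceback is written to an ASCII-only output.

```python
# Hypothetical author name containing u-umlaut (U+00FC), written as an escape
# so the snippet itself stays pure ASCII.
author = u"J\xfcrgen"

error = None
try:
    # Same padding/truncation as the failing line in changes.py, followed by
    # the ASCII encode step that an ASCII-only output performs implicitly.
    author.ljust(20)[0:20].encode("ascii")
except UnicodeEncodeError as exc:
    error = exc

# error.start is the index of the first character that could not be encoded.
print(type(error).__name__, error.start)
```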

What version of the product are you using? On what operating system?

Mac OS Snow Lion
 ./gitinspector.py  --version
gitinspector 0.2.2


Please provide any additional information below.

Original issue reported on code.google.com by johntaj...@gmail.com on 12 Jul 2013 at 10:26

@dgruss

dgruss commented May 15, 2021

Doesn't seem to work for me. Locales were already configured properly I think.

>>> import locale
>>> import sys
>>> 
>>> print locale.getpreferredencoding()
UTF-8
>>> print sys.getdefaultencoding()
ascii
>>> print sys.stdout.encoding
UTF-8
>>> print sys.stdin.encoding
UTF-8
>>> 

Still:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 0: ordinal not in range(128)

@adam-waldenberg
Member

adam-waldenberg commented May 15, 2021

@dgruss It's not, because it's trying to encode a non-ASCII character into ASCII, which it can't do. This is not a locale issue, but rather a terminal configuration issue.

Your problem is:

>>> print sys.getdefaultencoding()
ascii

So it's doing exactly what it should. Either change the terminal encoding to whatever the repo uses (UTF-8 in this case), or use the environment variable PYTHONIOENCODING to force it into UTF-8 regardless of what the terminal says.

You can read more about it here:
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONIOENCODING

Redirecting to a file should also do the trick, because that defaults to UTF-8 regardless.
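
To see what PYTHONIOENCODING does in isolation, here is a rough sketch. It assumes Python 3 (the linked documentation is for Python 3, while the report above concerns Python 2.7, where the details can differ), and the helper name `run_child` is made up for illustration. It starts a child interpreter twice, once with an ASCII output encoding and once with UTF-8, and has it print a single ü.

```python
import os
import subprocess
import sys


def run_child(io_encoding):
    """Run a child interpreter that prints u-umlaut under the given PYTHONIOENCODING."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    proc = subprocess.run(
        [sys.executable, "-c", "print(u'\\xfc')"],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Return the exit status and the raw bytes written to stdout.
    return proc.returncode, proc.stdout.strip()


ascii_result = run_child("ascii")   # child exits with an error (UnicodeEncodeError)
utf8_result = run_child("utf-8")    # child succeeds and emits UTF-8 bytes

print(ascii_result[0] != 0, utf8_result)
```

For gitinspector itself, the equivalent would be setting the variable in the shell before launching the script, for example `PYTHONIOENCODING=utf-8 ./gitinspector.py`.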

@dgruss

dgruss commented May 15, 2021 via email

@adam-waldenberg
Member

adam-waldenberg commented May 16, 2021

No. The encoding for the terminal where you run gitinspector will always be the same. It doesn't matter what the source encoding is. Essentially, your problem here is that Python is trying to convert and show a character that is not available in the ascii charset. A UTF-8 destination, on the other hand, will support most characters and the conversion will work.

We can't display any data in the terminal if it's inherently impossible to do so. If the terminal doesn't support a certain character, it just doesn't. Python has "ignore" and "replace" error handlers that you can use when encoding. However, using them would make the output depend on the terminal, so running on different terminals could produce different results, which is not desirable.
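
For reference, this is roughly what those error handlers do. It is only an illustration with a hypothetical name, not something gitinspector does: the same string comes out differently depending on the handler and the target encoding, which is the inconsistency described above.

```python
# Hypothetical author name with a non-ASCII character (u-umlaut, U+00FC).
name = u"M\xfcller"

strict_failed = False
try:
    name.encode("ascii")  # default handler is "strict": raises
except UnicodeEncodeError:
    strict_failed = True

replaced = name.encode("ascii", "replace")  # b'M?ller': character swapped for '?'
ignored = name.encode("ascii", "ignore")    # b'Mller': character silently dropped
utf8 = name.encode("utf-8")                 # b'M\xc3\xbcller': character preserved

print(strict_failed, replaced, ignored, utf8)
```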

@dgruss

dgruss commented May 16, 2021

Ok, then I'd add one more solution to the list here as PYTHONIOENCODING didn't change anything on my server:

Add to /usr/lib/python2.7/sitecustomize.py the code:

import sys
sys.setdefaultencoding('UTF-8')

Works then.
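
A note for anyone copying this: `sys.setdefaultencoding` exists only on Python 2, and only while the `site` module is initializing (it is removed from `sys` afterwards, which is why the call has to live in sitecustomize.py). A slightly more defensive variant, sketched here as an assumption rather than a tested fix, guards the call so the file is harmless on interpreters that lack the function:

```python
import sys

# On Python 2 during site initialization the function is present and the
# process-wide default encoding is changed. On Python 3 (and in ordinary
# Python 2 scripts) the attribute is absent and this block is a no-op.
if hasattr(sys, "setdefaultencoding"):
    sys.setdefaultencoding("UTF-8")

default_encoding = sys.getdefaultencoding()
print(default_encoding)
```

This changes implicit str/unicode conversions for every Python 2 program on the machine, so it is a broader change than setting an environment variable for a single run.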
