UnicodeDecodeError on Danish Linux #476

Closed
giampaolo opened this issue May 23, 2014 · 10 comments

Comments

@giampaolo
Owner

From l...@hupfeldtit.dk on February 12, 2014 14:00:10

A proc.name or proc.cmdline containing non-ASCII characters results in the error below:

    arg_name = os.path.basename(proc.cmdline[1]) if proc.cmdline else None
  File "/usr/lib/python3.3/site-packages/psutil/__init__.py", line 402, in cmdline
    return self._platform_impl.get_process_cmdline()
  File "/usr/lib/python3.3/site-packages/psutil/_pslinux.py", line 463, in wrapper
    return fun(self, *args, **kwargs)
  File "/usr/lib/python3.3/site-packages/psutil/_pslinux.py", line 531, in get_process_cmdline
    return [x for x in f.read().split('\x00') if x]
  File "/usr/lib/python3.3/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 92: ordinal not in range(128)

Decoding with the 'ascii' codec seems wrong here, since name and cmdline may contain non-ASCII characters.
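
The failure can be reproduced without psutil. The following is a minimal sketch, assuming a NUL-separated argument buffer like the one Linux exposes in /proc/<pid>/cmdline; the byte string and the `decode_args` helper are illustrative and not part of psutil:

```python
# Illustrative bytes: NUL-separated arguments, script name encoded as UTF-8.
raw_cmdline = b"/bin/bash\x00./\xc3\xa6\xc3\xb8\xc3\xa5\xc3\x85.sh\x00"


def decode_args(raw, encoding):
    """Decode a NUL-separated argument buffer and drop empty entries."""
    return [arg for arg in raw.decode(encoding).split("\x00") if arg]


try:
    decode_args(raw_cmdline, "ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    # Byte 0xc3 is the first byte of a multi-byte UTF-8 sequence.
    ascii_error = exc

utf8_args = decode_args(raw_cmdline, "utf-8")

# ascii() keeps the output printable even when stdout itself is ASCII-only.
print(ascii(ascii_error))
print(ascii(utf8_args))
```

The same bytes decode cleanly as UTF-8, which suggests the problem lies in the codec chosen for decoding rather than in the data.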

Original issue: http://code.google.com/p/psutil/issues/detail?id=476

@giampaolo
Owner Author

From g.rodola on February 12, 2014 08:50:40

Mmmm, I fear this is going to be a nasty one.
Could you please paste the output of the following commands?

$ python -c "import sys; print(sys.getfilesystemencoding())"
$ echo $LC_ALL
$ echo $LANG

Also, it would be interesting to see how ps represents those commands, so could you paste the relevant part(s) of the ps output as well?
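
The same information can be gathered in one go from Python. This is only a sketch: the `encoding_report` helper is hypothetical, and `locale.getpreferredencoding(False)` and LC_CTYPE are added here as extra data points beyond the commands requested above:

```python
import locale
import os
import sys


def encoding_report():
    """Collect settings that influence how Python decodes text files."""
    return {
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Extra data point: the encoding open() uses when none is given.
        "preferred_encoding": locale.getpreferredencoding(False),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "LANG": os.environ.get("LANG"),
    }


for key, value in encoding_report().items():
    print("{}: {}".format(key, value))
```

Unset environment variables show up as `None`, which makes it easy to tell "unset" apart from "set to an empty string".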

@giampaolo
Owner Author

From l...@hupfeldtit.dk on February 14, 2014 15:24:26

I finally got around to reproducing the error.
It does not require a non-English Linux setup, although it is unlikely to occur with an English login.
I did not have an LC_ALL env var, but I did have LC_CTYPE and LANG set.

To reproduce:
unset LC_CTYPE and LANG (they both need to be unset)
run the attached æøåÅ.sh
run the attached psutil_test.py (while the above is running)

@giampaolo
Owner Author

From g.rodola on February 15, 2014 12:13:29

OK, I can reproduce the problem. If the correct encoding is set for the shell, Python 2.x returns a byte string (because the file is opened in binary mode), while 3.x reports the right cmdline (because text mode is the default):

giampaolo@UX32VD:~/svn/psutil$ python2.7 -c "import psutil; print(psutil.Process().cmdline())" æøåÅ.sh
['python2.7', '-c', 'import psutil; print(psutil.Process().cmdline())', '\xc3\xa6\xc3\xb8\xc3\xa5\xc3\x85.sh']

giampaolo@UX32VD:~/svn/psutil$ python3.4 -c "import psutil; print(psutil.Process().cmdline())" æøåÅ.sh
['python3.4', '-c', 'import psutil; print(psutil.Process().cmdline())', 'æøåÅ.sh']

If the correct encoding is not set, we get the same byte string on Python 2.x and a UnicodeDecodeError on Python 3.x.
I'm not sure what's best to do here.
I think we should always open the file in text mode on both Python versions so that we return the right value.
On the other hand, I'm not sure what's best to do in case of encoding errors.
Python provides different options for dealing with them: http://docs.python.org/3.4/library/functions.html#open
We may choose to use errors='ignore' or errors='replace', although I don't like imposing such a decision on the users.
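
To make the trade-off concrete, here is a rough sketch comparing the `errors` handlers of `open()` on a temporary file that stands in for /proc/<pid>/cmdline. The file contents and the `read_args` helper are assumptions for illustration only:

```python
import os
import tempfile

# Illustrative stand-in for the contents of /proc/<pid>/cmdline.
RAW = b"/bin/bash\x00./\xc3\xa6\xc3\xb8\xc3\xa5\xc3\x85.sh\x00"


def read_args(path, errors):
    """Read a NUL-separated file as ASCII text using the given error handler."""
    with open(path, encoding="ascii", errors=errors) as f:
        return [arg for arg in f.read().split("\x00") if arg]


fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(RAW)
    results = {}
    for handler in ("strict", "ignore", "replace"):
        try:
            results[handler] = read_args(path, handler)
        except UnicodeDecodeError as exc:
            # "strict" is the default and raises on the first bad byte.
            results[handler] = exc
finally:
    os.remove(path)

for handler, value in results.items():
    print(handler, ascii(value))
```

With "ignore" the non-ASCII bytes silently disappear, and with "replace" each undecodable byte becomes U+FFFD, so both handlers lose information compared with decoding under the right encoding.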

Note: other than cmdline() the problem also affects process name() and exe() methods.

I'll also have to make sure what happens on systems different than Linux.

@giampaolo
Owner Author

From g.rodola on February 15, 2014 12:22:08

FWIW, "ps" replaces the invalid characters with "?", which mirrors Python's errors="replace" behavior.
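
One nuance worth noting: when decoding, Python's "replace" handler inserts U+FFFD rather than "?"; the "?" only appears when encoding. A small sketch with an illustrative byte string shows how a ps-like rendering could be derived:

```python
# Illustrative bytes: the script name encoded as UTF-8 (two bytes per letter).
raw_name = b"./\xc3\xa6\xc3\xb8\xc3\xa5\xc3\x85.sh"

# Decoding with "replace" yields one U+FFFD per undecodable byte.
decoded = raw_name.decode("ascii", errors="replace")

# Mapping U+FFFD to "?" gives one "?" per byte, like the ps output.
ps_like = decoded.replace("\ufffd", "?")

# Encoding with "replace" is where Python itself emits "?".
encoded = "\xe6".encode("ascii", errors="replace")

print(ascii(decoded))
print(ps_like)
print(encoded)
```

Four two-byte characters turn into eight placeholders, since the replacement happens per byte and not per character.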

@giampaolo
Owner Author

From l...@hupfeldtit.dk on February 16, 2014 04:08:58

I think the problem is that if the user locale is not set up correctly, the file is not opened with UTF-8 encoding, even though the proc filesystem is (always?) UTF-8 encoded on newer Linuxes.

As shown below, 'ps' does not work, but 'cat' does, and if "encoding='UTF-8'" is specified in Python, then Python works as well. I don't think it is correct to depend on the user locale. How would a proc created by a user with a different locale be interpreted?

------
.. 15686]$ unset LC_CTYPE
.. 15686]$ unset LANG
.. 15686]$ ps auxww | grep 15686
xxx      15686  0.0  0.0 113116  1428 pts/7    S+   12:30   0:00 /bin/bash ./????????.sh
.. 15686]$ cat cmdline
/bin/bash./æøåÅ.sh
.. 15686]$ python3
Python 3.3.2 (default, Nov  7 2013, 10:01:05) 
[GCC 4.8.1 20130814 (Red Hat 4.8.1-6)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> with open('cmdline') as ll:
...     print(ll.read())
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib64/python3.3/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 12: ordinal not in range(128)
>>> with open('cmdline', encoding='UTF-8') as ll:
...     print(ll.read())
... 
/bin/bash./æøåÅ.sh
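
A locale-independent way to read such a file is to open it in binary mode and decode with a fixed encoding. The sketch below uses a temporary file as a stand-in for /proc/<pid>/cmdline; the `read_cmdline` helper and the sample bytes are hypothetical. It also touches on the "different locale" question: bytes written under another encoding (Latin-1 in this example) are not necessarily valid UTF-8.

```python
import os
import tempfile


def read_cmdline(path, encoding="utf-8"):
    """Read NUL-separated arguments as bytes, then decode with a fixed encoding."""
    with open(path, "rb") as f:
        raw = f.read()
    return [arg.decode(encoding) for arg in raw.split(b"\x00") if arg]


fd, sample_path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"/bin/bash\x00./\xc3\xa6\xc3\xb8\xc3\xa5\xc3\x85.sh\x00")
    args = read_cmdline(sample_path)
finally:
    os.remove(sample_path)

# The same letter stored as Latin-1 (one byte, 0xe6) is not valid UTF-8.
try:
    b"\xe6.sh".decode("utf-8")
    latin1_decodes_as_utf8 = True
except UnicodeDecodeError:
    latin1_decodes_as_utf8 = False

print(ascii(args))
print(latin1_decodes_as_utf8)
```

So a fixed UTF-8 encoding removes the dependency on the reader's locale, but it cannot by itself guarantee that every process name decodes; an error policy is still needed for that case.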

@giampaolo
Owner Author

From g.rodola on February 16, 2014 05:37:28

Thanks for sharing this info.
It seems sys.getdefaultencoding() always returns 'utf-8' no matter what the current locale is, so that looks like the way to go on Python 3.
Fixed in revision 42c5b20d7f5b.
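
For reference, the difference between the two encodings can be checked as follows. This is a sketch of the idea only and is not taken from the revision mentioned above; the sample bytes are illustrative:

```python
import locale
import sys

# Fixed for the interpreter: 'utf-8' on Python 3, regardless of locale.
fixed_encoding = sys.getdefaultencoding()

# Follows LANG / LC_* and is what open() falls back to in text mode.
locale_encoding = locale.getpreferredencoding(False)

raw = b"./\xc3\xa6\xc3\xb8\xc3\xa5\xc3\x85.sh"
name = raw.decode(fixed_encoding)

print(fixed_encoding, locale_encoding, ascii(name))
```

Passing the fixed encoding explicitly to `open()` or `bytes.decode()` avoids the locale-dependent fallback that triggered the original traceback.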

@giampaolo
Owner Author

From l...@hupfeldtit.dk on February 16, 2014 05:45:49

Thank you for providing psutil. It makes system management with Python so much easier.

@giampaolo
Owner Author

From g.rodola on February 16, 2014 08:41:24

Glad to hear psutil is useful to you. 
Cheers.

@giampaolo
Owner Author

From g.rodola on March 09, 2014 15:26:29

Status: FixedInHG
Labels: Milestone-2.0.0

@giampaolo
Owner Author

From g.rodola on March 10, 2014 04:36:50

Closing out as fixed now that version 2.0.0 is finally out.

Status: Fixed
