Python3 "unorderable types" error with LOCALE and PyICU installed. #22

Closed
tallforasmurf opened this issue Apr 2, 2015 · 5 comments
@tallforasmurf

The test case is quite simple:

import natsort
kf = natsort.natsort_keygen(alg=(natsort.ns.LOCALE | natsort.ns.TYPESAFE))
ll = ['0', 'Á', '2', 'Z']
sorted(ll, key=kf)
Traceback (most recent call last):
    File "<string>", line 1, in <fragment>
    builtins.TypeError: unorderable types: bytes() < str()

The issue seems to be that when the input key is numeric, e.g. '2', the output of kf is ('', 2), where the first element is the empty string. When the input key is a letter, the output is e.g. (b'*\x05 \x01E\x88\x01\x06\x00',), where the first element is bytes.

Frankly, I don't understand how kf can return a tuple in any case, since the docs for sorted() (and for SortedDict, which is where I actually hit this) seem to imply it should return a scalar item. But whatever: if it returned b'' instead of just '' for a number, I think all would be well. I believe this would be line 135 of utils.py?
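The mismatch described above can be reproduced without natsort. The sketch below is only an illustration: the two tuples are hand-written copies of the key shapes reported in this comment, and `can_compare` is a hypothetical helper, not part of natsort. It suggests that a `b''` pad would make these two particular keys comparable; it does not show how natsort itself should be patched.

```python
def can_compare(a, b):
    """Return True if `a < b` can be evaluated without raising TypeError."""
    try:
        a < b
    except TypeError:
        return False
    return True


# Key shapes copied from the report above (hand-written, not produced by natsort).
numeric_key = ('', 2)                            # reported key for '2'
letter_key = (b'*\x05 \x01E\x88\x01\x06\x00',)   # reported key for a letter

# The first elements are str and bytes, which Python 3 refuses to order.
print(can_compare(numeric_key, letter_key))

# With a bytes pad, both first elements are bytes and the comparison succeeds.
padded_key = (b'', 2)
print(can_compare(padded_key, letter_key))
```

Tuples compare element by element, so the error comes from the first position where the two keys differ, which here is the str/bytes pair.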

@SethMMorton
Owner

I don't get this behavior on my Python 3.4.3 installation with your example as-is, but if I change to ll = ['0','Á','2',b'Z'] I am able to replicate.

If I do

kf = natsort.natsort_keygen( alg = (natsort.ns.LOCALE | natsort.ns.TYPESAFE ) )
ll = ['0','Á','2',b'Z']
sorted(map(str, ll),key=kf)

I don't get any errors.
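One caveat with `map(str, ll)`: in Python 3, calling `str()` on a bytes object returns its repr, so `b'Z'` becomes the text `b'Z'` (including the prefix and quotes) rather than `Z`. A minimal sketch of decoding instead, assuming the bytes are UTF-8 encoded; `to_text` is a hypothetical helper, not a natsort function:

```python
def to_text(item, encoding="utf-8"):
    """Decode bytes to str; return str input unchanged."""
    if isinstance(item, bytes):
        return item.decode(encoding)
    return item


ll = ['0', 'Á', '2', b'Z']

# str() on bytes yields the repr, so the last element is not 'Z'.
print(list(map(str, ll)))

# Decoding gives a list of plain str values that can be sorted together.
print([to_text(x) for x in ll])
```

The decoded list can then be passed to `sorted(..., key=kf)` in place of `map(str, ll)`.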

I'm a little bit confused about where b'*\x05 \x01E\x88\x01\x06\x00' is coming from. Are you calling ".encode()" on your list elements before sorting?

Either way, I'll try to come up with a way to handle this gracefully inside natsort.

FYI: natsort returns a tuple for all inputs because it has to split the numbers from the strings, but keep them logically grouped together. An empty string is placed before numbers to avoid unorderable types errors (please see issue #7). Sorting tuples is well defined. A key can return anything as long as it is orderable, which does not seem to be the case if bytes and str are mixed.
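To illustrate the tuple idea, here is a deliberately simplified key function. It is not natsort's actual implementation: `simple_natural_key` is a hypothetical name, it only handles unsigned integers, and it ignores locale entirely. It only shows why an empty string is placed in front of a leading number, so that every key alternates str, int, str, int and same-position elements are always of the same type.

```python
import re

_NUMBER = re.compile(r'(\d+)')


def simple_natural_key(text):
    """Split text into alternating str and int parts, starting with a str."""
    parts = _NUMBER.split(text)
    key = []
    for i, part in enumerate(parts):
        if i % 2:
            # Odd positions hold the digit runs captured by the regex.
            key.append(int(part))
        elif part or (i == 0 and len(parts) > 1):
            # Keep non-empty text, plus the leading '' that pads a number.
            key.append(part)
    return tuple(key)


print(simple_natural_key('2'))
print(simple_natural_key('file10'))
print(sorted(['file10', 'file2', '9'], key=simple_natural_key))
```

Without the leading `''`, the key for `'2'` would start with an int and the key for `'file10'` with a str, and comparing them would raise the same kind of TypeError discussed in this issue.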

@SethMMorton
Owner

As a side note, the issue has nothing to do with the '2' to ('', 2) transformation, or with TYPESAFE, but rather with the unexpected bytes string that has appeared.

@tallforasmurf
Author

Thanks for clarifying the tuple issue. But something differs between our test systems.

[14:51:58 trials] python3
Python 3.4.2 (v3.4.2:ab2c023a9432, Oct  5 2014, 20:42:22) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import natsort
>>> kf = natsort.natsort_keygen( alg = (natsort.ns.LOCALE ) )
>>> kf('2')
('', 2.0)
>>> kf('A')
(b')\x01\x05\x01\xdc\x00',)
>>> sorted( ['0','A'], key=kf )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: bytes() < str()

Comparison of tuples may be well defined but Python3 does not like comparing the bytes element of the 'A' tuple to the string element of the '2' tuple:

>>> ('',9) < (b'',9)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: str() < bytes()

Perhaps the real problem is that kf('A') ==> (bytevalue,). You indicate this is not expected? Under what circumstances might this happen? Could it be an unexpected artifact of running OS X 10.10 with the default locale?

>>> locale.setlocale(locale.LC_ALL)
'C/en_US.UTF-8/C/C/C/C'
>>> locale.setlocale(locale.LC_ALL,'en_US.UTF-8')
'en_US.UTF-8'
>>> kf = natsort.natsort_keygen( alg = (natsort.ns.LOCALE ) )
>>> kf('A')
(b')\x01\x05\x01\xdc\x00',)

n.b. icu is installed per your recommendation.

@SethMMorton
Owner

I think I see the origin of the mismatch problem. I'm at work on a Linux machine without PyICU, and apparently the return values of locale and PyICU are not of the same type. I will take a look at this when I get home tonight on my Mac.
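The standard-library half of this can be checked directly: in Python 3, `locale.strxfrm` returns a str, whereas the keys shown earlier in this thread (with PyICU installed) are bytes. The sketch below only exercises the standard library. `empty_like` is a hypothetical helper showing one way a pad value could be matched to whatever type the collation key has; it is an assumption for illustration, not a description of the fix that natsort shipped.

```python
import locale


def empty_like(sort_key):
    """Return an empty value of the same type as sort_key ('' or b'')."""
    return type(sort_key)()


# The stdlib transform returns str in Python 3.
stdlib_key = locale.strxfrm('A')
print(type(stdlib_key).__name__)

# A bytes key, copied from the interpreter session earlier in this thread.
reported_key = b')\x01\x05\x01\xdc\x00'

# Matching the pad to the key type keeps first elements comparable.
print(repr(empty_like(stdlib_key)), repr(empty_like(reported_key)))
```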

This does need to be fixed. Thanks for finding this bug.

@SethMMorton
Owner

Check out version 3.5.4 on PyPI. It should solve the problem (your specific example is now in the unit tests).

@SethMMorton changed the title from 'Python3 "unorderable types" error with TYPESAFE' to 'Python3 "unorderable types" error with LOCALE and PyICU installed.' on May 8, 2016