How to get accented chars to group with their unaccented versions? #21
Comments
Note if I use |
Do you have PyICU installed? I have found that python's built-in |
I saw the note about PyICU in the docs, where it is specifically recommended for OS X. Before I install that rather large package: (a) what sequence would you expect the above code to print if everything is working as you expect (e.g. on your own test system)? And (b) would you expect changing the locale from en_US to fr_FR or de_DE to make a difference? |
a. I would expect the sequence that Qt printed to be the correct sequence. I can confirm that using Mac OS X's locale library (Python uses the system's C locale library), I get the (incorrect) results that you see. Below is the test file I used.
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals
import locale
from natsort import natsort_keygen, ns
words = ['apple', 'åpple', 'Apple', 'Äpple', 'Epple', 'Èpple', 'épple', 'epple']
locale.setlocale(locale.LC_ALL, str('de_DE.UTF-8'))
key_func_L = natsort_keygen(alg=ns.LOCALE)
print(' '.join(sorted(words, key=key_func_L)))
When I disabled
When I turn on
This is identical to what Qt is reporting. |
Unfortunately, this is not something I can fix... it is a bug in the BSD locale implementation. There is a recent Python bug report on this... check it out: http://bugs.python.org/issue23195 (also check this out: http://stackoverflow.com/questions/3412933/python-not-sorting-unicode-properly-strcoll-doesnt-help). I'll definitely keep an eye on the bug report, but notice one of the solutions suggested is to install PyICU. Incidentally, it seems like the only affected locales are en_US, fr_FR and de_DE, which are the three you tried. I'll make sure to update the docs in the next release to indicate that PyICU should only be needed on Mac OS X and BSD. |
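For readers who would rather not install PyICU, a rough standard-library fallback is to strip accents before comparing. This is only a sketch and not how natsort or ICU collate: the helper names `strip_accents` and `accent_insensitive_key` are hypothetical, and the tie-breaking order within a group is plain code-point order rather than a locale rule.

```python
import unicodedata

def strip_accents(text):
    # Decompose each character (e.g. 'é' -> 'e' + combining acute accent),
    # then drop the combining marks so only the base letters remain.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def accent_insensitive_key(word):
    # Primary key: the accent-free, case-folded form, which groups the words.
    # Secondary key: the original word, which makes the order deterministic.
    return (strip_accents(word).casefold(), word)

words = ['apple', 'åpple', 'Apple', 'Äpple', 'Epple', 'Èpple', 'épple', 'epple']
grouped = sorted(words, key=accent_insensitive_key)

# All four A-forms come before all four E-forms.
print(' '.join(grouped))
```

This keeps words starting with é next to words starting with e, but it will not reproduce language-specific rules (for example, alphabets in which å is a separate letter sorted after z), so a real collator is still the better choice when it is available.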
BTW, if you use HomeBrew (and I recommend it!), you can easily install ICU and PyICU with the following commands:
HomeBrew does not link icu4c to the system to avoid conflicts, so you need to tell python where to find it when installing PyICU. |
Yes, good. I had to add exports; pip didn't pick up the flags otherwise. Putting this in for reference for anybody else:
After which natsort did behave as you say. |
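Once PyICU is installed, it can be checked independently of natsort. The sketch below assumes the `icu` module that PyICU provides and uses its `Collator` directly; the function name `icu_sorted` is hypothetical, and the import is guarded so the script still loads on a machine without PyICU.

```python
try:
    import icu  # provided by the PyICU package
except ImportError:
    icu = None

def icu_sorted(words, locale_name="de_DE"):
    """Sort words with an ICU collator for the given locale."""
    if icu is None:
        raise RuntimeError("PyICU is not installed")
    collator = icu.Collator.createInstance(icu.Locale(locale_name))
    # getSortKey returns a byte string whose ordering follows the collation rules.
    return sorted(words, key=collator.getSortKey)

words = ['apple', 'åpple', 'Apple', 'Äpple', 'Epple', 'Èpple', 'épple', 'epple']

if icu is not None:
    print(' '.join(icu_sorted(words)))
else:
    print('PyICU not available; skipping the ICU check')
```

If the printed order groups all forms of A before all forms of E, ICU collation is working and natsort's `ns.LOCALE` should be able to use it.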
I am writing a PyQt app and am unhappy with the performance of their table sorting. However it does do "locale-aware" sorting in what I believe to be the correct way. Given this word list:
and not ignoring case, Qt sorts in the order: apple, Apple, åpple, Äpple, epple, Epple, épple, Èpple
That is, all forms of A are grouped, then all forms of E. When I do the same sort in native Python using natsort:
The resulting order is 'Apple', 'Epple', 'apple', 'epple', 'Äpple', 'Èpple', 'åpple', 'épple'
That is, all accented forms sort higher than un-accented forms. I am not so concerned that in the one, lowercase is first and the other, uppercase is first. I am concerned that in a long table, words starting with é may be hundreds of rows removed from words starting with e.
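The order reported above is exactly what a plain code-point sort produces, which suggests the locale setting had no effect on the comparison. A minimal check, using only built-ins and the word list from this thread:

```python
words = ['apple', 'åpple', 'Apple', 'Äpple', 'Epple', 'Èpple', 'épple', 'epple']

# sorted() with no key compares strings by Unicode code point.
by_code_point = sorted(words)

for word in by_code_point:
    # Show the code point of the first character next to each word.
    print(word, hex(ord(word[0])))
```

The uppercase ASCII letters (0x41, 0x45) come first, then lowercase ASCII (0x61, 0x65), then the accented letters, which all sit above 0x7f. That is why é ends up far away from e.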
I am working in Python 3.4, PyQt 5.4, Mac OS X 10.10. Changing the locale to fr_FR or de_DE didn't make any difference.