Wrong os_sorted sorting with special character in filename #145

Closed
toroConverter opened this issue Jan 29, 2022 · 2 comments · Fixed by #146

@toroConverter

Describe the bug
os_sorted does not always sort files correctly when their names contain special characters.

Expected behavior
Files are sorted in the same order as in Windows Explorer.

Environment (please complete the following information):

  • Python Version: 3.9
  • Natsort Version: 8.0.2
  • OS: Windows
  • If the bug involves LOCALE or humansorted:
    • Is PyICU installed? No
    • Do you have a locale set? If so, to what? Italian

To Reproduce

from natsort import os_sorted

file_list = ['Try.Me.Bug - 09 - One.Two.Three.[text].mkv',
             'Try.Me.Bug - 07 - One.Two.5.[text].mkv',
             'Try.Me.Bug - 08 - One.Two.Three[text].mkv']

file_list2 = ['TryMe - 02 - One Two Three [text].mkv',
              'TryMe_-_03_-_One_Two_Three_[text].mkv',
              'TryMe_-_01_-_One_Two_Three_[text].mkv']

for file in os_sorted(file_list):
    print(file)

for file in os_sorted(file_list2):
    print(file)

Expected sorting:

file_list
Try.Me.Bug - 07 - One.Two.5.[text].mkv
Try.Me.Bug - 08 - One.Two.Three[text].mkv
Try.Me.Bug - 09 - One.Two.Three.[text].mkv

file_list2
TryMe_-_01_-_One_Two_Three_[text].mkv
TryMe - 02 - One Two Three [text].mkv
TryMe_-_03_-_One_Two_Three_[text].mkv

Actual sorting:

file_list
Try.Me.Bug - 08 - One.Two.Three[text].mkv
Try.Me.Bug - 09 - One.Two.Three.[text].mkv
Try.Me.Bug - 07 - One.Two.5.[text].mkv

file_list2
TryMe - 02 - One Two Three [text].mkv
TryMe_-_01_-_One_Two_Three_[text].mkv
TryMe_-_03_-_One_Two_Three_[text].mkv


At the moment, the only workaround is to call os_sorted with key=lambda x: re.sub(r'[^a-zA-Z0-9]+', ' ', x), which replaces every run of non-alphanumeric characters with a single space before the names are compared.
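
For readers who want to try the workaround, here is a minimal sketch of it applied to file_list2. The helper name `normalize` is hypothetical, and the fallback to the built-in `sorted` is an assumption added only so the sketch runs when natsort is not installed; it is not part of the original report.

```python
import re

try:
    from natsort import os_sorted  # third-party: pip install natsort
except ImportError:
    # Stand-in so the sketch still runs without natsort. It is adequate
    # here only because the episode numbers are zero-padded.
    os_sorted = sorted


def normalize(name):
    """Collapse every run of non-alphanumeric characters into one space."""
    return re.sub(r"[^a-zA-Z0-9]+", " ", name)


file_list2 = [
    "TryMe - 02 - One Two Three [text].mkv",
    "TryMe_-_03_-_One_Two_Three_[text].mkv",
    "TryMe_-_01_-_One_Two_Three_[text].mkv",
]

# The key only changes how names are compared; the original,
# unmodified names are what comes back in the result.
result = os_sorted(file_list2, key=normalize)
for name in result:
    print(name)
```

Note that this key also discards the difference between names that vary only in punctuation, so such names are treated as equal when sorting.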

@SethMMorton
Owner

I agree that the Try.Me.Bug - 07 - One.Two.5.[text].mkv case was a bug (or at least undesired behavior), and I have proposed a fix in #146.

However, the TryMe_-_01_-_One_Two_Three_[text].mkv case is not a bug, and you would need to use something like key=lambda x: x.replace("_", " ") to sort it properly. This is because the code has no way to tell that you want to treat "_" and " " the same unless you tell it to.
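
To illustrate the point about "_" and " ", here is a small sketch that uses the built-in `sorted` as a stand-in for os_sorted, so it runs without natsort. The helper name `underscores_to_spaces` is hypothetical. The same key function could be passed as `os_sorted(file_list2, key=underscores_to_spaces)`, following the suggestion above.

```python
def underscores_to_spaces(name):
    """Make "_" and " " compare as the same character."""
    return name.replace("_", " ")


file_list2 = [
    "TryMe - 02 - One Two Three [text].mkv",
    "TryMe_-_03_-_One_Two_Three_[text].mkv",
    "TryMe_-_01_-_One_Two_Three_[text].mkv",
]

# Without a key, " " (code point 32) and "_" (code point 95) are
# different characters, so the name written with spaces is ordered
# before both names written with underscores.
without_key = sorted(file_list2)

# With the key, all three names share the prefix "TryMe - 0", so the
# episode number is the first point at which they differ.
with_key = sorted(file_list2, key=underscores_to_spaces)

for name in with_key:
    print(name)
```

The key is an explicit statement of intent: the sorter treats the two characters as equivalent only because the caller asked it to.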

@toroConverter
Author

@SethMMorton OK, thanks a lot for your feedback and your work!
