Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use different collation for Asset path field #1885

Merged
merged 1 commit into from
Mar 6, 2024
Merged

Conversation

jjnesbitt
Copy link
Member

@jjnesbitt jjnesbitt commented Mar 5, 2024

Closes #1839

This fixes an issue where assets were returned seemingly out of order when ordered by path. The underlying reason for this is that the default collation used by the database (in my local case and production en_us.utf-8) has specific rules when it comes to handling punctuation or special characters. Namely, it seems that they are largely ignored on the first path of sorting, later used to resolve ties if necessary. Aside from that, it seems there are other rules, of which I don't fully understand, but that seem to be in conflict with our requirements for ordering this field.

What this means is that if there are two paths like the following:

  • a/z
  • aa/z

Since the default collation will initially ignore the slashes, the comparison it sees is between az and aaz, so it will sort these in the order aa/z, a/z. This is obviously in stark contrast to what we want, since these paths are supposed to represent file-system paths, and in that case, you would obviously sort them the other way around.


To resolve this, the collation for the Asset path field is set to C.utf8. The postgres docs explain this a bit, but the following quote is fairly concise:

The C and POSIX collations both specify “traditional C” behavior, in which only the ASCII letters “A” through “Z” are treated as letters, and sorting is done strictly by character code byte values.

As far as I can tell, C.utf8 is just the C collation with Unicode encoding.

@jjnesbitt
Copy link
Member Author

jjnesbitt commented Mar 5, 2024

It seems there's some difference between the postgres in CI and locally that I was unaware of. Looking into that now...

This has been fixed.

@jjnesbitt
Copy link
Member Author

@mvandenburgh This is good to go now.

Copy link
Member

@mvandenburgh mvandenburgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add to your explanation, I found this post to be informative about this as well - https://dba.stackexchange.com/a/240950

@jjnesbitt jjnesbitt added patch Increment the patch version when merged release Create a release when this pr is merged labels Mar 6, 2024
@jjnesbitt jjnesbitt merged commit ebce2d2 into master Mar 6, 2024
11 checks passed
@jjnesbitt jjnesbitt deleted the 1839-path-ordering branch March 6, 2024 22:42
@dandibot
Copy link
Member

dandibot commented Mar 6, 2024

🚀 PR was released in v0.3.78 🚀

@dandibot dandibot added the released This issue/pull request has been released. label Mar 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
patch Increment the patch version when merged release Create a release when this pr is merged released This issue/pull request has been released.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

…/assets/?order=path is horribly broken
3 participants