Use different collation for Asset `path` field #1885

jjnesbitt · 2024-03-05T20:07:22Z

This fixes an issue where assets were returned seemingly out of order when ordered by path. The underlying reason is that the default collation used by the database (en_us.utf-8, both locally for me and in production) has specific rules for handling punctuation and special characters. Namely, it seems they are largely ignored on the first pass of sorting and only used later to resolve ties if necessary. There also seem to be other rules, which I don't fully understand, that conflict with our requirements for ordering this field.

What this means is that if there are two paths like the following:

a/z
aa/z

Since the default collation will initially ignore the slashes, the comparison it sees is between az and aaz, so it sorts these in the order aa/z, a/z. This is the opposite of what we want: these paths are supposed to represent file-system paths, and in that case you would sort them the other way around (a/z, then aa/z).
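To make the difference concrete, here is a small Python sketch of the two orderings. The `punctuation_ignoring_key` function is only a rough, hypothetical stand-in for what a locale-aware collation appears to do (ignore punctuation first, then break ties); it is not the real collation algorithm. `byte_order_key` approximates sorting strictly by byte value.

```python
import string


def punctuation_ignoring_key(path: str) -> tuple[str, str]:
    # Hypothetical approximation of a locale-aware collation: compare with
    # punctuation removed first, then use the full string to break ties.
    stripped = "".join(ch for ch in path if ch not in string.punctuation)
    return (stripped, path)


def byte_order_key(path: str) -> bytes:
    # Compare the raw UTF-8 bytes, so "/" (0x2F) sorts before "a" (0x61).
    return path.encode("utf-8")


paths = ["aa/z", "a/z"]

locale_like = sorted(paths, key=punctuation_ignoring_key)
c_like = sorted(paths, key=byte_order_key)

print(locale_like)  # compares "aaz" with "az"
print(c_like)       # compares byte by byte, slash included
```

With the byte-order key, the slash takes part in the comparison, which gives the file-system-style ordering described above.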

To resolve this, the collation for the Asset path field is set to C.utf8. The postgres docs explain this a bit, but the following quote is fairly concise:

> The C and POSIX collations both specify “traditional C” behavior, in which only the ASCII letters “A” through “Z” are treated as letters, and sorting is done strictly by character code byte values.

As far as I can tell, C.utf8 is just the C collation with Unicode encoding.
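One consequence of sorting "strictly by character code byte values" is that the order is case-sensitive and punctuation takes part in every comparison. The sketch below uses Python's byte comparison on UTF-8 encoded strings as an approximation of that behavior; the file names are made up for illustration.

```python
# Hypothetical names, chosen to show how byte values drive the order.
names = ["b.txt", "B.txt", "a/1", "a-1", "é.txt", "a.txt"]

# Sorting on the encoded bytes mimics a byte-value comparison:
# uppercase ASCII < lowercase ASCII < multi-byte (non-ASCII) characters,
# and "-" (0x2D) < "." (0x2E) < "/" (0x2F).
ordered = sorted(names, key=lambda s: s.encode("utf-8"))

print(ordered)
```

This is stricter than a locale-aware ordering, but it is deterministic and matches how path separators are expected to sort.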

jjnesbitt · 2024-03-05T23:55:10Z

~~It seems there's some difference between the postgres in CI and locally that I was unaware of. Looking into that now...~~

This has been fixed.

jjnesbitt · 2024-03-06T17:45:25Z

@mvandenburgh This is good to go now.

mvandenburgh

To add to your explanation, I found this post to be informative about this as well - https://dba.stackexchange.com/a/240950

dandibot · 2024-03-06T22:43:26Z

🚀 PR was released in v0.3.78 🚀

jjnesbitt requested a review from mvandenburgh March 5, 2024 20:07

Change collation on asset path field

187e13c

jjnesbitt force-pushed the 1839-path-ordering branch from c4d0ef5 to 187e13c Compare March 6, 2024 17:07

mvandenburgh approved these changes Mar 6, 2024

jjnesbitt added the patch (Increment the patch version when merged) and release (Create a release when this pr is merged) labels Mar 6, 2024

jjnesbitt merged commit ebce2d2 into master Mar 6, 2024
11 checks passed

jjnesbitt deleted the 1839-path-ordering branch March 6, 2024 22:42

dandibot added the released (This issue/pull request has been released.) label Mar 6, 2024

jjnesbitt mentioned this pull request Mar 6, 2024

Change asset path collation to "C" #1888

Merged
