Use different collation for Asset `path` field
#1885
Merged
Closes #1839
This fixes an issue where assets were returned seemingly out of order when ordered by `path`. The underlying reason is that the default collation used by the database (`en_us.utf-8`, both in my local setup and in production) has specific rules for handling punctuation and special characters. Namely, it seems that they are largely ignored on the first pass of sorting and only used later to resolve ties if necessary. Aside from that, there seem to be other rules, which I don't fully understand, that conflict with our requirements for ordering this field.

What this means is that if there are two paths like the following:
a/z
aa/z
Since the default collation will initially ignore the slashes, the comparison it sees is between `az` and `aaz`, so it will sort these in the order `aa/z`, `a/z`. This is in stark contrast to what we want: these paths are supposed to represent file-system paths, and in that case you would sort them the other way around.

To resolve this, the collation for the Asset `path` field is set to `C.utf8`. The Postgres docs explain this a bit, but the following quote is fairly concise:

As far as I can tell, `C.utf8` is just the C collation with Unicode encoding.
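
The ordering difference described above can be reproduced outside the database with a small Python sketch. This is only an illustration, not the code changed in this PR: `c_collation_key` and `punctuation_ignoring_key` are hypothetical helper names, and the second one is a rough stand-in for the locale-aware behaviour (ignore punctuation on the first pass, use the full string to break ties), not the actual `en_us.utf-8` collation rules.

```python
import string

# The two example paths from the description above.
paths = ["aa/z", "a/z"]


def c_collation_key(path):
    # The C collation compares raw bytes; for UTF-8 text this matches
    # code point order, so "/" (0x2F) sorts before "a" (0x61).
    return path.encode("utf-8")


def punctuation_ignoring_key(path):
    # ASSUMPTION: simplified model of the locale-aware collation.
    # First pass ignores punctuation; the original string breaks ties.
    stripped = "".join(ch for ch in path if ch not in string.punctuation)
    return (stripped, path)


c_order = sorted(paths, key=c_collation_key)
locale_like_order = sorted(paths, key=punctuation_ignoring_key)

print(c_order)            # ['a/z', 'aa/z']
print(locale_like_order)  # ['aa/z', 'a/z']
```

With byte-wise comparison, `a/z` sorts before `aa/z`, which is the file-system-like order we want for this field.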