-
Notifications
You must be signed in to change notification settings - Fork 922
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Require user names to be unique after unicode normalisation #4405
Require user names to be unique after unicode normalisation #4405
Conversation
As with the previous checks on case sensitivity this only affects new users, and changes to names of existing users.
dad2502
to
c12f895
Compare
This looks good to me, and I like using postgresql to do the normalisation. And I like the roman numeral stuff in the tests too, nice find! However, What do we all think about that? Personally, I'm sure you can guess how I noticed the issue, but I don't know how many other developers will be affected or whether I'm the only one who needs to knuckle down and do some updates. |
So basically you have to do it in the database in order to efficiently look for duplicates - the alternative to the function would be to add an extra column with the normalized name that was indexed and then you could compute the normalized name in ruby and use the normal rails uniqueness validator on it. Now we might want up wanting to do that anyway if we want to go further than what this does. I know @grischard keeps threatening to produce a list of homonyms he'd like to block which I'm pretty sure includes things like cyrillic characters that are similar to latin characters which this won't find. Personally I'm happy to go to postgres 13 and ubuntu 22.04 as a minimum - we're getting close to 24.04 now anyway at which point we would normally start to wind down 20.04 support. |
I am strongly in favour of requiring 12. I doubt parts of the toolset work on earlier versions, with osmdbt requiring logical replication. Recent postgres versions are easily available with pgdg on Ubuntu, Debian, and RHEL-based systems. I was researching another way to do it which right now is equivalent in functionality, but could be much better under PostgreSQL 16. PostgreSQL 12 added non-deterministic collations with an index created on that collation, then you get Something this would create a suitable collation CREATE COLLATION usernames (
provider = icu,
deterministic = false,
locale = 'und-u-ka-shifted-kk-ks-level1'
); This would only look at the base character, case insensitive. e.g. Where this approach shines is under PostgreSQL 16, where you can add tailoring rules which set equality differently. CREATE COLLATION coll1 (
provider = icu,
deterministic = false,
locale = 'und',
rules = '& a = b');
SELECT 'a' = 'b' COLLATE coll1;
?column?
──────────
t
(1 row) I'm still trying to figure out how to start with a locale other than a base locale when adding rules, as well as how to handle all the quoting needed when every character you care about is a homograph to another. |
Ah there's a bit of confusion here. We already require 12, my PR yesterday was just to update the documentation. The discussion now is whether we need to require 13, which is a different matter, since it's not what ships with the oldest currently support Ubuntu LTS. |
Ah. My arguments also are equally valid for 13 - I don't believe there are any common OS versions that don't provide 13 that do provide 12. |
Thanks @tomhughes for the PR, I've merged it now. Sorry for the delay while I upgraded my dev environment in order to test it. |
As with the previous checks on case sensitivity this only affects new users, and changes to names of existing users.
Once this is merged and deployed we can drop the old lowercase index in a separate change.