Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require user names to be unique after unicode normalisation #4405

Merged
merged 1 commit into from
Jan 17, 2024

Conversation

tomhughes
Copy link
Member

As with the previous checks on case sensitivity this only affects new users, and changes to names of existing users.

Once this is merged and deployed we can drop the old lowercase index in a separate change.

As with the previous checks on case sensitivity this only affects
new users, and changes to names of existing users.
@tomhughes tomhughes force-pushed the normalize-display-name branch from dad2502 to c12f895 Compare December 13, 2023 22:29
@gravitystorm
Copy link
Collaborator

This looks good to me, and I like using postgresql to do the normalisation. And I like the roman numeral stuff in the tests too, nice find!

However, normalize ( text [, form ] ) → text isn't available in postgresql 12, which is our current minimum version as that's what shipped in Ubuntu 20.04. So if we want to merge this, then we need to update our minimums to postgresql 13, and by implication, Ubuntu 22.04.

What do we all think about that? Personally, I'm sure you can guess how I noticed the issue, but I don't know how many other developers will be affected or whether I'm the only one who needs to knuckle down and do some updates.

@tomhughes
Copy link
Member Author

So basically you have to do it in the database in order to efficiently look for duplicates - the alternative to the function would be to add an extra column with the normalized name that was indexed and then you could compute the normalized name in ruby and use the normal rails uniqueness validator on it.

Now we might want up wanting to do that anyway if we want to go further than what this does. I know @grischard keeps threatening to produce a list of homonyms he'd like to block which I'm pretty sure includes things like cyrillic characters that are similar to latin characters which this won't find.

Personally I'm happy to go to postgres 13 and ubuntu 22.04 as a minimum - we're getting close to 24.04 now anyway at which point we would normally start to wind down 20.04 support.

@pnorman
Copy link
Contributor

pnorman commented Dec 21, 2023

I am strongly in favour of requiring 12. I doubt parts of the toolset work on earlier versions, with osmdbt requiring logical replication. Recent postgres versions are easily available with pgdg on Ubuntu, Debian, and RHEL-based systems.

I was researching another way to do it which right now is equivalent in functionality, but could be much better under PostgreSQL 16.

PostgreSQL 12 added non-deterministic collations with an index created on that collation, then you get SELECT 'n' = 'ñ COLLATE usernames; returning true and using an index.

Something this would create a suitable collation

CREATE COLLATION usernames (
provider = icu,
deterministic = false, 
locale = 'und-u-ka-shifted-kk-ks-level1'
);

This would only look at the base character, case insensitive. e.g. 'N' = 'ñ'.

Where this approach shines is under PostgreSQL 16, where you can add tailoring rules which set equality differently.

CREATE COLLATION coll1 (
provider = icu,
deterministic = false,
locale = 'und',
rules = '& a = b');
SELECT 'a' = 'b' COLLATE coll1;
 ?column?
──────────
 t
(1 row)

I'm still trying to figure out how to start with a locale other than a base locale when adding rules, as well as how to handle all the quoting needed when every character you care about is a homograph to another.

@gravitystorm
Copy link
Collaborator

I am strongly in favour of requiring 12.

Ah there's a bit of confusion here. We already require 12, my PR yesterday was just to update the documentation.

The discussion now is whether we need to require 13, which is a different matter, since it's not what ships with the oldest currently support Ubuntu LTS.

@pnorman
Copy link
Contributor

pnorman commented Dec 21, 2023

Ah there's a bit of confusion here. We already require 12, my PR yesterday was just to update the documentation.

The discussion now is whether we need to require 13, which is a different matter, since it's not what ships with the oldest currently support Ubuntu LTS.

Ah. My arguments also are equally valid for 13 - I don't believe there are any common OS versions that don't provide 13 that do provide 12.

@gravitystorm gravitystorm merged commit d5efa4c into openstreetmap:master Jan 17, 2024
20 checks passed
@gravitystorm
Copy link
Collaborator

Thanks @tomhughes for the PR, I've merged it now. Sorry for the delay while I upgraded my dev environment in order to test it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants