Require user names to be unique after unicode normalisation #4405

tomhughes · 2023-12-13T21:05:21Z

As with the previous checks on case sensitivity this only affects new users, and changes to names of existing users.

Once this is merged and deployed we can drop the old lowercase index in a separate change.

gravitystorm · 2023-12-20T15:38:08Z

This looks good to me, and I like using postgresql to do the normalisation. And I like the roman numeral stuff in the tests too, nice find!

However, normalize ( text [, form ] ) → text isn't available in postgresql 12, which is our current minimum version as that's what shipped in Ubuntu 20.04. So if we want to merge this, then we need to update our minimums to postgresql 13, and by implication, Ubuntu 22.04.

What do we all think about that? Personally, I'm sure you can guess how I noticed the issue, but I don't know how many other developers will be affected or whether I'm the only one who needs to knuckle down and do some updates.

tomhughes · 2023-12-20T15:49:14Z

So basically you have to do it in the database in order to efficiently look for duplicates - the alternative to the function would be to add an extra column with the normalized name that was indexed and then you could compute the normalized name in ruby and use the normal rails uniqueness validator on it.
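As a rough illustration of the Ruby-side alternative described above (not what this PR does), the normalized name could be computed with `String#unicode_normalize` and checked for uniqueness. The helper `normalized_display_name`, the `NameRegistry` class, and the choice of NFKC plus lowercasing are all assumptions made for this sketch; in a real app the set would be an indexed column.

```ruby
require "set"

# Hypothetical helper: the key a separate indexed column would store.
# NFKC is an assumption here; it folds compatibility forms such as the
# roman numeral U+2163 into plain "IV".
def normalized_display_name(name)
  name.unicode_normalize(:nfkc).downcase
end

# In-memory stand-in for a unique index on the normalized column.
class NameRegistry
  def initialize
    @taken = Set.new
  end

  # Returns true if the name was free, false if an equivalent name exists.
  def claim(name)
    # Set#add? returns nil when the key is already present.
    !@taken.add?(normalized_display_name(name)).nil?
  end
end

registry = NameRegistry.new
registry.claim("Henry IV")      # free, so it is accepted
registry.claim("Henry \u2163")  # same key after NFKC, so it is rejected
```

The trade-off is the one mentioned above: the extra column has to be kept in sync with the display name, whereas a functional index in the database needs no additional stored data.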

Now we might end up wanting to do that anyway if we want to go further than what this does. I know @grischard keeps threatening to produce a list of homonyms he'd like to block, which I'm pretty sure includes things like Cyrillic characters that are similar to Latin characters, which this won't find.
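To make the limitation concrete, here is a small sketch showing that Unicode normalisation leaves a Cyrillic look-alike distinct from its Latin twin, and how a hand-maintained mapping might go further. The `CONFUSABLES` table and both helper names are hypothetical and purely illustrative; a real list would be much longer.

```ruby
# Key after normalisation only (NFKC and lowercasing are assumptions).
def nfkc_key(name)
  name.unicode_normalize(:nfkc).downcase
end

# Hypothetical, deliberately tiny look-alike table: Cyrillic => Latin.
CONFUSABLES = {
  "\u0430" => "a", # CYRILLIC SMALL LETTER A
  "\u0435" => "e", # CYRILLIC SMALL LETTER IE
  "\u043E" => "o"  # CYRILLIC SMALL LETTER O
}.freeze

# Key after normalisation plus look-alike folding.
def lookalike_key(name)
  nfkc_key(name).gsub(Regexp.union(CONFUSABLES.keys), CONFUSABLES)
end

latin = "paypal"
mixed = "p\u0430yp\u0430l" # renders like "paypal" but uses Cyrillic U+0430

nfkc_key(latin) == nfkc_key(mixed)           # false: normalisation alone misses it
lookalike_key(latin) == lookalike_key(mixed) # true: the extra mapping catches it
```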

Personally I'm happy to go to postgres 13 and ubuntu 22.04 as a minimum - we're getting close to 24.04 now anyway at which point we would normally start to wind down 20.04 support.

pnorman · 2023-12-21T10:02:34Z

I am strongly in favour of requiring 12. I doubt parts of the toolset work on earlier versions, with osmdbt requiring logical replication. Recent postgres versions are easily available with pgdg on Ubuntu, Debian, and RHEL-based systems.

I was researching another way to do it which right now is equivalent in functionality, but could be much better under PostgreSQL 16.

PostgreSQL 12 added non-deterministic collations. With an index created on that collation, you get SELECT 'n' = 'ñ' COLLATE usernames; returning true and using an index.

Something like this would create a suitable collation:

CREATE COLLATION usernames (
provider = icu,
deterministic = false, 
locale = 'und-u-ka-shifted-kk-ks-level1'
);

This would only look at the base character, case insensitive. e.g. 'N' = 'ñ'.
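For intuition only, the base-character comparison can be approximated outside the database by decomposing, dropping combining marks, and lowercasing. This is a crude stand-in and not ICU collation: it will not match the collation's behaviour in general, and `base_letters` is a hypothetical helper name.

```ruby
# Rough approximation of "compare base characters, ignoring case".
def base_letters(text)
  text.unicode_normalize(:nfd) # split "ñ" into "n" + U+0303 COMBINING TILDE
      .gsub(/\p{Mn}/, "")      # drop nonspacing combining marks
      .downcase
end

base_letters("N") == base_letters("ñ") # true: same base letter
base_letters("n") == base_letters("m") # false: different base letters
```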

Where this approach shines is under PostgreSQL 16, where you can add tailoring rules which set equality differently.

CREATE COLLATION coll1 (
provider = icu,
deterministic = false,
locale = 'und',
rules = '& a = b');
SELECT 'a' = 'b' COLLATE coll1;
 ?column?
──────────
 t
(1 row)

I'm still trying to figure out how to start with a locale other than a base locale when adding rules, as well as how to handle all the quoting needed when every character you care about is a homograph to another.

gravitystorm · 2023-12-21T10:58:08Z

> I am strongly in favour of requiring 12.

Ah there's a bit of confusion here. We already require 12, my PR yesterday was just to update the documentation.

The discussion now is whether we need to require 13, which is a different matter, since it's not what ships with the oldest currently supported Ubuntu LTS.

pnorman · 2023-12-21T20:07:03Z

> Ah there's a bit of confusion here. We already require 12, my PR yesterday was just to update the documentation.
>
> The discussion now is whether we need to require 13, which is a different matter, since it's not what ships with the oldest currently supported Ubuntu LTS.

Ah. My arguments also are equally valid for 13 - I don't believe there are any common OS versions that don't provide 13 that do provide 12.

gravitystorm · 2024-01-17T16:25:09Z

Thanks @tomhughes for the PR, I've merged it now. Sorry for the delay while I upgraded my dev environment in order to test it.

Require user names to be unique after unicode normalisation

c12f895

tomhughes force-pushed the normalize-display-name branch from dad2502 to c12f895 Compare December 13, 2023 22:29

pnorman mentioned this pull request Jan 5, 2024

problem with "James Bay" in water_name (water label) openmaptiles/openmaptiles#1595

Closed

gravitystorm merged commit d5efa4c into openstreetmap:master Jan 17, 2024
20 checks passed

gravitystorm added a commit to gravitystorm/openstreetmap-website that referenced this pull request Jan 17, 2024

Update minimum PostgreSQL version to 13 in documentatation

4303829

Refs openstreetmap#4405

gravitystorm mentioned this pull request Jan 17, 2024

Update minimum PostgreSQL version to 13 in documentatation #4484

Merged

tomhughes deleted the normalize-display-name branch January 17, 2024 19:01

Woazboat mentioned this pull request May 3, 2024

Deprecate support for Ubuntu 20.04 zerebubuth/openstreetmap-cgimap#404

Closed
