Tidy locationSet during build #7226
We should use gb consistently: firstly because it's the correct/actual location, secondly because it improves compatibility with external tools, and finally because it makes searches on nsi.guide more consistent too.
I agree it's probably ok to just have the build script replace 'uk' -> 'gb', but I kind of don't like the idea of adding a config file for this.
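A minimal sketch of what such a build-time replacement could look like, without a config file. The function name `tidyLocationSet` and the `COUNTRY_ALIASES` table are hypothetical and not taken from the repository; the only assumption about the data is that a locationSet is an object with optional `include` and `exclude` arrays whose entries are strings (codes, QIDs, filenames) or coordinate arrays.

```javascript
// Hypothetical alias table: lowercase code -> preferred ISO 3166-1 alpha-2 code.
const COUNTRY_ALIASES = new Map([['uk', 'gb']]);

// Return a tidied copy of a locationSet. String entries that match an alias are
// replaced; everything else (QIDs, .geojson names, point arrays) passes through.
// Duplicates created by the replacement (e.g. ['uk', 'gb']) are dropped.
function tidyLocationSet(locationSet) {
  const tidy = (locations) => {
    const seen = new Set();
    const result = [];
    for (const loc of locations) {
      let value = loc;
      if (typeof loc === 'string') {
        value = COUNTRY_ALIASES.get(loc.toLowerCase()) ?? loc;
      }
      const key = JSON.stringify(value);
      if (!seen.has(key)) {
        seen.add(key);
        result.push(value);
      }
    }
    return result;
  };

  const out = { ...locationSet };
  if (Array.isArray(locationSet.include)) out.include = tidy(locationSet.include);
  if (Array.isArray(locationSet.exclude)) out.exclude = tidy(locationSet.exclude);
  return out;
}

console.log(tidyLocationSet({ include: ['uk', 'gb', 'ie'] }));
```

Keeping the mapping as a small constant next to the function is one way to avoid a separate config file while still leaving room for further aliases.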
Okay, I was just going off the others. There are also theoretically some other mappings that could happen, although I suspect this is by far the most common...
Yeah I think if it's just this one thing, I'd rather have an Also, from last year I got the sense that people might also prefer to use the more specific QIDs for the UK like on this issue:
IMHO that would be a step backwards, especially if it's only done for the UK. E.g. https://github.com/westnordost/osmfeatures currently consumes this data (via the iD presets too), but it doesn't handle the UK nations currently, so anything tagged specifically in England, for example, doesn't currently show up. It strikes me that heading towards more niche (rather than internationally standardised via ISO) designations isn't going to help matters, whereas pretty much any language can find out which ISO country code you're in without too much trouble.
I'm ashamed that I've actually never been able to figure out the nuances that exist between gb and uk. At first I thought this PR meant to replace
No worries, I hadn't really looked into it much myself in the past despite living here! This diagram shows it quite nicely: So pedantically, gb<uk, however given only one of them is defined in ISO 3166-1: And the other is just reserved: The confusion is the inconsistency between our ISO 3166-1 alpha-2 code and country code TLD, which I think is fairly unique.
Yeah I figured that while changing them once would resolve the short term issue, inevitably more would sneak in, hence the scripting...
I assume this is people using a language code rather than a country code? I think your list is 50-50 currently, from what I can see, these are fine as there's no current reservation: But this is more problematic: And this is just blocked: So is @bhousel happy for my config file to be reinstated if it's now more complicated... We could also offer reverse mapping for some misused country codes back to language codes: Apart from this one, where both exist: It might make sense to push some of this into config too, given it's the same pairings: name-suggestion-index/scripts/build_index.js Lines 482 to 515 in 03b3934
It doesn't look like we currently do any validation that language codes are sane, which ought to be fairly easy, or country codes, which might be harder given some of the other stuff supported via location-conflation.
Oh, I haven't noticed this, so I think we can only do this check when those langcode also appear in tags: e.g:
Because we found a
I think the build script does validate all the locationSets, so all the codes appearing in there should be a valid country code (or something recognized as one). As far as I remember, we don't really validate the language codes, and I'm not sure I want to get too deep into doing that, since this topic seems to be full of special cases and exceptions: https://wiki.openstreetmap.org/wiki/Multilingual_names @LaoshuBaby said:
I think if someone tries to put a
Emmm… So I guess our coding philosophy is "let it crash" to expose this error rather than try to fix it, even if we can infer what is correct? OK, I won't request this checker and auto-fix in the build script again.
Thanks @peternewman this seems ok 👍
Yes - I think having the build fail and having a human take a look at why is a good thing. (This is different from what @peternewman wants, where there are several valid codes and he wants to just standardize on one of them.)
I'll admit I hadn't tested this very well, aside from that it compiled, but I'm assuming your run in e9efc61 should have included it @bhousel so it looks like maybe it doesn't actually work currently? 😢 Can anyone spot anything obvious I've missed?
Confirmed it does (although incidentally the "chalk" doesn't work in GitHub Actions it seems, now fixed in #7283): We also don't currently have GitHub annotations to flag these errors up to people, I could have a go at adding them if you'd like?
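For reference, a rough sketch of how a build script could emit such annotations, using the GitHub Actions workflow command syntax (`::error file=...::message`). The helper names `formatAnnotation` and `annotate` and the file path in the example are hypothetical, not part of the repository.

```javascript
// Escape message text for a workflow command: %, CR and LF must be percent-encoded.
function escapeData(text) {
  return String(text).replace(/%/g, '%25').replace(/\r/g, '%0D').replace(/\n/g, '%0A');
}

// Property values additionally need ':' and ',' encoded.
function escapeProperty(text) {
  return escapeData(text).replace(/:/g, '%3A').replace(/,/g, '%2C');
}

// Build a workflow command string such as `::error file=a.json,line=3::message`.
// `level` is expected to be 'error', 'warning' or 'notice'.
function formatAnnotation(level, message, props = {}) {
  const propText = Object.entries(props)
    .map(([key, value]) => `${key}=${escapeProperty(value)}`)
    .join(',');
  return `::${level}${propText ? ' ' + propText : ''}::${escapeData(message)}`;
}

// Print the annotation only when running under GitHub Actions,
// so local builds keep their normal (e.g. coloured) output.
function annotate(level, message, props) {
  if (process.env.GITHUB_ACTIONS === 'true') {
    console.log(formatAnnotation(level, message, props));
  }
}

// Hypothetical usage from a validation step:
annotate('error', 'Unknown location "xx"', { file: 'data/brands/shop/clothes.json' });
```

The annotation is printed alongside the existing failure, so the build would still fail and a human would still review the cause.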
I appreciate it can get complicated, but presumably we could e.g. validate that any two character language code was an allocated one, even if we didn't check all the sub-codes?
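A small sketch of that kind of check, assuming language codes appear as suffixes on tag keys such as `name:en` or `brand:zh-Hans`. The function name, the list of key prefixes, and the code set are all illustrative assumptions: `SAMPLE_LANGUAGE_CODES` is only a handful of ISO 639-1 codes, and a real check would load the full allocated list. Sub-codes after the hyphen are deliberately not validated.

```javascript
// Illustrative subset only; NOT the full ISO 639-1 list.
const SAMPLE_LANGUAGE_CODES = new Set(['de', 'en', 'fr', 'ja', 'uk', 'zh']);

// Matches keys like `name:en` or `brand:zh-Hans`, capturing the two-letter code.
// Keys such as `brand:wikidata` do not match because the suffix is longer than two letters.
const LANGUAGE_KEY = /^(?:name|brand|operator|network):([a-z]{2})(?:-[A-Za-z]+)?$/;

// Return the tag keys whose two-letter language code is not in `allowed`.
function findUnknownLanguageCodes(tags, allowed = SAMPLE_LANGUAGE_CODES) {
  const unknown = [];
  for (const key of Object.keys(tags)) {
    const match = LANGUAGE_KEY.exec(key);
    if (match && !allowed.has(match[1])) {
      unknown.push({ key, code: match[1] });
    }
  }
  return unknown;
}

const problems = findUnknownLanguageCodes({
  'name:en': 'Example',
  'name:jp': 'Example',        // 'jp' is a country code; the language code is 'ja'
  'name:zh-Hans': 'Example',
  'brand:wikidata': 'Q1'
});
console.log(problems);
```

In line with the preference stated earlier in the thread, a check like this would report the offending keys and let the build fail rather than rewrite them automatically.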
- Add Westfield
- Update countries for TK Maxx (not a typo)