Improve name:latin logic #147
CI run: https://github.com/onthegomap/planetiler/actions/runs/2039157136 (base 34f2be7, this branch 0c8d9e9)
Looks good to me at a glance. The notes below are only for completeness, but I’m glad my penchant for proper typography won’t cause OpenStreetMap Americana’s labels to be translated into QIDs going forward. 😅
// Name tags that should be eligible for finding a latin name.
// See https://wiki.openstreetmap.org/wiki/Multilingual_names
private static final Predicate<String> VALID_NAME_TAGS =
  Pattern.compile("^name:[a-z]{2,3}(-[a-z]{4})?([-_][a-z]{2,})?(-[a-z]{2})?$", Pattern.CASE_INSENSITIVE)
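To make the reviewed pattern concrete, here is a small standalone check of which keys it accepts. This is a sketch: the class name NameTagCheck is hypothetical, and turning the regex into a Predicate<String> with Pattern.asMatchPredicate() (Java 11+) is an assumption, since the quoted excerpt stops before that part of the statement.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch: exercises the name-tag regex quoted in the review
// to show which OSM keys it treats as language-specific names.
class NameTagCheck {
  // Same regex as the reviewed line; asMatchPredicate() (assumed here) requires
  // the whole key to match, which the ^...$ anchors also enforce.
  static final Predicate<String> VALID_NAME_TAGS =
    Pattern.compile("^name:[a-z]{2,3}(-[a-z]{4})?([-_][a-z]{2,})?(-[a-z]{2})?$", Pattern.CASE_INSENSITIVE)
      .asMatchPredicate();

  public static void main(String[] args) {
    List<String> keys = List.of(
      "name:en",                   // language only
      "name:sr-Latn",              // language + script
      "name:zh-Hant-TW",           // language + script + region
      "name:prefix",               // not a language code
      "name:etymology:wikidata");  // nested key, not a name
    for (String key : keys) {
      System.out.println(key + " -> " + VALID_NAME_TAGS.test(key));
    }
  }
}
```

The first three keys match and the last two do not, which is the behavior the PR relies on to keep name:prefix and name:etymology:wikidata out of the name:latin lookup.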
Technically, BCP 47 allows a UN M49 region code in place of an ISO 3166-1 country code, but the only extant occurrences are mere coincidences, all of them instances of mistagging.
The abc-x-extension syntax is also permitted for unregistered extensions, but name:fr-x-gallo is the only key with non-negligible use (for Gallo).
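One way to cover both cases is to allow an optional x- before the variant subtag and either two letters or three digits for the region subtag. The sketch below compares the reviewed pattern with such an extended pattern; the extended regex and the class name NameTagExtensions are assumptions for illustration, not necessarily what was merged, and es-419 is only an example of BCP 47 syntax, not a claim about OSM usage.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch: reviewed pattern vs. an extended variant that also accepts
// UN M49 numeric regions and "x-" private-use extensions.
class NameTagExtensions {
  static final Predicate<String> BEFORE =
    Pattern.compile("^name:[a-z]{2,3}(-[a-z]{4})?([-_][a-z]{2,})?(-[a-z]{2})?$", Pattern.CASE_INSENSITIVE)
      .asMatchPredicate();

  // (x-)? admits private-use subtags such as fr-x-gallo;
  // ([a-z]{2}|[0-9]{3}) admits an ISO 3166-1 alpha-2 code or a 3-digit UN M49 code.
  static final Predicate<String> AFTER =
    Pattern.compile("^name:[a-z]{2,3}(-[a-z]{4})?([-_](x-)?[a-z]{2,})?(-([a-z]{2}|[0-9]{3}))?$",
        Pattern.CASE_INSENSITIVE)
      .asMatchPredicate();

  public static void main(String[] args) {
    for (String key : List.of("name:es-419", "name:fr-x-gallo", "name:en", "name:prefix")) {
      System.out.printf("%-16s before=%b after=%b%n", key, BEFORE.test(key), AFTER.test(key));
    }
  }
}
```

The two review cases flip from rejected to accepted, while name:prefix is still rejected by both patterns.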
Thanks! I added those cases and they pass now. The transliteration improvements will be a larger project; I'll tackle that in a separate change later.
* name:latin improvements
* improve Latin letter regex
* allow region codes and x-extensions on localized names
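A Latin-letter check of the kind the commit describes can be written with Java's Unicode script classes. In this hypothetical sketch (LatinCheck and isLatin are illustrative names, not Planetiler's API), a string counts as Latin when it contains no letter from another script, so digits and punctuation, including typographic apostrophes, do not disqualify a name.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a string is "Latin" when none of its letters belong to a
// non-Latin script. Digits, spaces and punctuation (including the curly
// apostrophe U+2019) are ignored because they are not letters.
class LatinCheck {
  // Any Unicode letter (\p{L}) that is not in the Latin script (\p{IsLatin}).
  private static final Pattern NON_LATIN_LETTER = Pattern.compile("[\\p{L}&&[^\\p{IsLatin}]]");

  static boolean isLatin(String s) {
    // Null or blank strings are treated as "no usable name".
    return s != null && !s.isBlank() && !NON_LATIN_LETTER.matcher(s).find();
  }

  public static void main(String[] args) {
    // true: the curly apostrophe is punctuation and e-acute is a Latin letter
    System.out.println(isLatin("Rock\u2019n\u2019Roll Caf\u00e9"));
    // false: Cyrillic letters
    System.out.println(isLatin("\u041c\u043e\u0441\u043a\u0432\u0430"));
    // false: contains Han characters
    System.out.println(isLatin("Tokyo \u6771\u4eac"));
  }
}
```

Checking only letters, rather than requiring every character to be ASCII or Latin, is what keeps names with proper typography from being mistaken for non-Latin names.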
Simplify and improve the Latin-character regular expressions, and limit the name:latin lookup to language-specific name:* tags only, excluding tags like name:prefix or name:etymology:wikidata. This does not attempt to improve transliteration; that will be a future change.

Fixes #146 and partially improves #86.
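The restricted lookup can be sketched as follows. Assumptions: tags arrive as a Map<String, String>, the name:* key pattern is the reviewed one extended with numeric region codes and x- extensions, and findLatinName, the sorted fallback order, and returning null instead of transliterating are illustrative choices rather than Planetiler's actual behavior.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch: use "name" if it is already Latin, otherwise fall back to
// the first language-specific name:* tag whose value is Latin.
class LatinNameLookup {
  // Assumed key pattern: language[-Script][-variant or -x-ext][-region (2 letters or 3 digits)].
  static final Predicate<String> VALID_NAME_TAGS =
    Pattern.compile("^name:[a-z]{2,3}(-[a-z]{4})?([-_](x-)?[a-z]{2,})?(-([a-z]{2}|[0-9]{3}))?$",
        Pattern.CASE_INSENSITIVE).asMatchPredicate();
  // Any letter that is not in the Latin script.
  static final Pattern NON_LATIN_LETTER = Pattern.compile("[\\p{L}&&[^\\p{IsLatin}]]");

  static boolean isLatin(String s) {
    return s != null && !s.isBlank() && !NON_LATIN_LETTER.matcher(s).find();
  }

  /** Returns a Latin-script name, or null if none exists (no transliteration here). */
  static String findLatinName(Map<String, String> tags) {
    String name = tags.get("name");
    if (isLatin(name)) {
      return name;
    }
    // TreeMap iterates keys in sorted order, so the fallback choice is deterministic.
    for (Map.Entry<String, String> e : new TreeMap<>(tags).entrySet()) {
      if (VALID_NAME_TAGS.test(e.getKey()) && isLatin(e.getValue())) {
        return e.getValue();
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Map<String, String> tags = Map.of(
      "name", "\u6771\u4eac",              // Tokyo in kanji: not Latin
      "name:etymology:wikidata", "Q123",   // placeholder QID: key is not a language tag
      "name:en", "Tokyo");
    System.out.println(findLatinName(tags)); // Tokyo
  }
}
```

Because name:etymology:wikidata and name:prefix fail the key pattern, a Latin-looking value such as a QID is never chosen as the fallback label.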