text-transform is not locale-aware #21
Comments
Or perhaps TileJSON (and any runtime equivalent) should be extended to map layers to language codes, which would avoid the unnecessary tile size increase from putting a language code on every individual feature. TileJSON doesn’t currently distinguish between raster and vector tiles, but this feature would be specific to vector tiles.
We'd actually have to tag every value, since a feature can contain names for a feature in multiple languages.
I was thinking that a source-wide mapping from source properties to languages would be more economical, since for instance Mapbox Streets’ Once we implement token fallbacks, the next version of Mapbox Streets would be able to remove those backfilled names, so that we’re left with a purely German Edit: s/layer/property/
@1ec5 Are you using "layer" to mean "property"? Individual features can have a
Oof, that’s what I meant. Sorry – fixed. There are two proposals so far in this ticket:
It’s entirely possible that both proposals are rubegoldbergian and there are simpler ways to accomplish locale-aware text transforms. Any ideas?
ECMA-402 defines extensions to This does not seem to be implemented by any major browser yet (tested
migrated to mapbox/mapbox-gl-js#3999
The `text-transform` property is documented in the style specification as being “similar to the CSS `text-transform` property”. One key difference is that most modern browser engines transform a text node based on the node’s declared or inherited locale, via the `lang` HTML attribute or `xml:lang` XML attribute, taking into account any language-specific case rules. By contrast, Mapbox GL implementations perform a locale-neutral transformation (for example the “C locale” on POSIX platforms).

A locale-neutral transformation works well for many alphabets, such as English and Spanish, and as expected it has no effect on ideographic writing systems such as the CJK scripts. However, many Latin alphabets have special cases that the C locale doesn’t respect. For example, the Turkish city Kırşehir, whose name includes both dotted and dotless I’s, should be labeled “KIRŞEHİR” but instead is labeled “KIRŞEHIR” (omitting a tittle).

German street names should be labeled, e.g., “GROSSER STERN” instead of “GROßER STERN”.
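
To make the Turkish case concrete, here is a small sketch of what a locale-neutral `String.prototype.toUpperCase()` call returns for the two names above. The variable names are only for illustration. Note that a JavaScript engine already expands “ß” to “SS” without any locale, so the “GROßER STERN” label presumably comes from an implementation that maps characters one-to-one; that is an inference, not something stated in this issue.

```javascript
// Inputs written with escapes so the exact code points are unambiguous.
const turkishName = "K\u0131r\u015Fehir"; // "Kırşehir" (dotless ı, then dotted i)
const germanName = "Gro\u00DFer Stern";   // "Großer Stern"

// Locale-neutral uppercasing: no language information is consulted.
const turkishUpper = turkishName.toUpperCase();
const germanUpper = germanName.toUpperCase();

// Both ı and i collapse to a plain "I", so the dotted capital İ (U+0130)
// that Turkish expects never appears in the result.
console.log(turkishUpper);                    // KIRŞEHIR
console.log(turkishUpper.includes("\u0130")); // false

// ß has no single-character uppercase in the default mapping,
// so JavaScript expands it to "SS".
console.log(germanUpper);                     // GROSSER STERN
```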
It isn’t sufficient to transform text to the user’s current locale. The examples above come from applying `"text-field": "{name}"` in the Bright style. The `name` field in the Mapbox Streets source is written in each feature’s native language, but it provides no way to distinguish between different languages. One could imagine a future version of the source providing a best guess of the name’s language, expressed as a BCP 47 / ISO 639 tag, based on the containing country and some character range–based heuristics. The style specification, then, could be extended with a `text-language` property that would be set to `{language}` for any layer that sets `text-field` to `{name}`.

Adding a `text-language` property isn’t semantically ideal, since it’s really the data that has an intrinsic language, not the style. But it seems like overkill to extend the vector tile specification with a new type that pairs a string with a language identifier.

The native platforms supported by Mapbox GL have standard APIs for uppercasing or lowercasing a string based on a locale. For example, the Mapbox iOS/macOS SDK implementation of `"text-transform": "uppercase"` calls `-[NSString uppercaseString]`, but it should call `-[NSString uppercaseStringWithLocale:]` instead.

On the other hand, Mapbox GL JS calls `String.prototype.toUpperCase()`, and there is currently no standard API for locale-aware conversions beyond the user’s current locale. From the discussion at mapbox/mapbox-gl-js#149 (comment), it sounds like it’d be impractical to include a JavaScript library for pan-language support. However, maybe there’s room to support a handful of high-priority languages like German and Turkish.

The specification should make it clear that locale awareness is provided on a best-effort basis, just like in CSS. For example, the Mapbox iOS and macOS SDKs won’t necessarily uppercase the English “E MacDonald St” as “E MacDONALD ST”, the Mapbox Android SDK may fall back to the C locale for Klingon, and Mapbox GL JS wouldn’t be required to do anything differently than it already does.
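
As a rough illustration of the “handful of high-priority languages” idea, the sketch below special-cases the dotted and dotless I used by Turkish and Azerbaijani and otherwise falls back to the locale-neutral conversion. The helper names (`primarySubtag`, `uppercaseForLanguage`, `lowercaseForLanguage`) and the idea of passing in a per-feature language tag are assumptions made for this example; none of them exist in Mapbox GL JS or the style specification. It only handles precomposed characters, so it is a starting point rather than a complete implementation.

```javascript
// Hypothetical helpers, not part of Mapbox GL JS or the style specification.
// Languages whose i/İ and ı/I pairs differ from the locale-neutral mapping.
const DOTTED_I_LANGUAGES = new Set(["tr", "az"]);

// Reduce a BCP 47 tag such as "tr-TR" to its primary subtag ("tr").
// A missing or empty tag yields "", which matches no special case.
function primarySubtag(languageTag) {
  return (languageTag || "").toLowerCase().split("-")[0];
}

function uppercaseForLanguage(text, languageTag) {
  if (DOTTED_I_LANGUAGES.has(primarySubtag(languageTag))) {
    // Dotted i must become İ (U+0130); dotless ı already uppercases to I.
    text = text.replace(/i/g, "\u0130");
  }
  // Best effort: every other language gets the locale-neutral result.
  return text.toUpperCase();
}

function lowercaseForLanguage(text, languageTag) {
  if (DOTTED_I_LANGUAGES.has(primarySubtag(languageTag))) {
    // İ must become plain i, and I must become dotless ı (U+0131).
    text = text.replace(/\u0130/g, "i").replace(/I/g, "\u0131");
  }
  return text.toLowerCase();
}

console.log(uppercaseForLanguage("K\u0131r\u015Fehir", "tr")); // KIRŞEHİR
console.log(uppercaseForLanguage("K\u0131r\u015Fehir", "en")); // KIRŞEHIR
```

Keeping the fallback path identical to today’s behavior matches the best-effort framing: a feature with an unknown or unsupported language tag is transformed exactly as it is now.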
Beyond text transformations, Mapbox GL could in the future use the `text-language` property to choose the correct national language variant for each Unihan character in CJK text, just as native text rendering engines and Web browser rendering engines do.

/cc @mapbox/gl @mapbox/cartography-cats