This repository has been archived by the owner on Dec 8, 2017. It is now read-only.

text-transform is not locale-aware #21

Closed
1ec5 opened this issue Jul 23, 2016 · 8 comments

@1ec5
Contributor

1ec5 commented Jul 23, 2016

The text-transform property is documented in the style specification as being “similar to the CSS text-transform property”. One key difference is that most modern browser engines transform a text node based on the node’s declared or inherited locale, via the lang HTML attribute or xml:lang XML attribute, taking into account any language-specific case rules. By contrast, Mapbox GL implementations perform a locale-neutral transformation (for example, using the “C” locale on POSIX platforms).

A locale-neutral transformation works well for many languages, such as English and Spanish, and as expected it has no effect on ideographic writing systems such as the CJK scripts. However, many languages written in the Latin alphabet have special cases that the C locale doesn’t respect. For example, the Turkish city Kırşehir, whose name includes both dotted and dotless I’s, should be labeled “KIRŞEHİR” but instead is labeled “KIRŞEHIR” (omitting a tittle):

[Screenshot: Kırşehir]
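The Turkish case can be reproduced in any JavaScript console. This is a minimal sketch that relies only on the standard String.prototype.toUpperCase(); the variable names are illustrative.

```javascript
// Locale-neutral uppercasing: dotless "ı" (U+0131) and dotted "i" both
// map to a plain "I", so the distinction between the two letters is lost.
const name = "Kırşehir";
const neutral = name.toUpperCase();

// What a Turkish reader expects: dotted "i" becomes "İ" (U+0130).
const expected = "KIRŞEHİR";

console.log(neutral);              // KIRŞEHIR
console.log(neutral === expected); // false
```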

German street names should be labeled, e.g., “GROSSER STERN” instead of “GROßER STERN”:

[Screenshot: Großer Stern]

It isn’t sufficient to transform text to the user’s current locale. The examples above come from applying "text-field": "{name}" in the Bright style. The name field in the Mapbox Streets source is written in each feature’s native language, but it provides no way to distinguish between different languages. One could imagine a future version of the source providing a best guess of the name’s language, expressed as a BCP 47 / ISO 639 tag, based on the containing country and some character range–based heuristics. The style specification, then, could be extended with a text-language property that would be set to {language} for any layer that sets text-field to {name}.
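To make the proposal concrete, here is a rough sketch of how such a layer might resolve its tokens. Everything in it is an assumption for illustration: text-language is only the property proposed here and is not part of the style specification, the language feature property does not exist in Mapbox Streets, and resolveTokens is a hypothetical helper rather than anything from Mapbox GL.

```javascript
// Hypothetical layer: "text-language" is the property proposed above and is
// not part of the style specification.
const layer = {
  id: "place-label",
  type: "symbol",
  layout: {
    "text-field": "{name}",
    "text-language": "{language}",
    "text-transform": "uppercase",
  },
};

// Hypothetical feature: "language" stands for the best-guess language tag
// that a future version of the source would have to provide.
const feature = { properties: { name: "Kırşehir", language: "tr" } };

// Replace each {token} with the matching feature property, or with an empty
// string when the feature has no such property.
function resolveTokens(template, properties) {
  return template.replace(/\{([^{}]+)\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(properties, key)
      ? String(properties[key])
      : ""
  );
}

const text = resolveTokens(layer.layout["text-field"], feature.properties);
const language = resolveTokens(layer.layout["text-language"], feature.properties);

console.log(text, language); // Kırşehir tr
```

A renderer would then hand both values to whatever locale-aware uppercasing the platform offers.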

Adding a text-language property isn’t semantically ideal, since it’s really the data that has an intrinsic language, not the style. But it seems like overkill to extend the vector tile specification with a new type that pairs a string with a language identifier.

The native platforms supported by Mapbox GL have standard APIs for uppercasing or lowercasing a string based on a locale. For example, the Mapbox iOS/macOS SDK implementation of "text-transform": "uppercase" calls -[NSString uppercaseString], but it should call -[NSString uppercaseStringWithLocale:] instead.

On the other hand, Mapbox GL JS calls String.prototype.toUpperCase(), and there is currently no standard API for locale-aware conversions beyond the user’s current locale. From the discussion at mapbox/mapbox-gl-js#149 (comment), it sounds like it’d be impractical to include a JavaScript library for pan-language support. However, maybe there’s room to support a handful of high-priority languages like German and Turkish.
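As a rough idea of what supporting a handful of languages could look like without a full library, the sketch below special-cases Turkish (and Azerbaijani, which shares the dotted/dotless I rule) before falling through to toUpperCase(). The function upperCaseForLanguage is hypothetical and not part of Mapbox GL JS, and it ignores rarer cases such as decomposed characters and Lithuanian.

```javascript
// Hypothetical helper, not part of Mapbox GL JS: uppercases `text` using a
// small hand-written set of language-specific rules.
function upperCaseForLanguage(text, language) {
  // Only the primary language subtag matters here ("tr-TR" -> "tr").
  const primary = (language || "").toLowerCase().split("-")[0];
  let prepared = text;

  if (primary === "tr" || primary === "az") {
    // Turkish and Azerbaijani: dotted "i" uppercases to "İ" (U+0130).
    // Dotless "ı" (U+0131) already uppercases to "I" by default.
    prepared = prepared.replace(/i/g, "\u0130");
  }

  // "ß" conventionally uppercases to "SS". This matches Unicode's default
  // full case mapping; it is spelled out so the result does not depend on
  // the engine.
  prepared = prepared.replace(/\u00DF/g, "SS");

  return prepared.toUpperCase();
}

console.log(upperCaseForLanguage("Kırşehir", "tr"));     // KIRŞEHİR
console.log(upperCaseForLanguage("Großer Stern", "de")); // GROSSER STERN
```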

The specification should make it clear that locale awareness is provided on a best-effort basis, just like in CSS. For example, the Mapbox iOS and macOS SDKs won’t necessarily uppercase the English “E MacDonald St” as “E MacDONALD ST”, the Mapbox Android SDK may fall back to the C locale for Klingon, and Mapbox GL JS wouldn’t be required to do anything differently than it already does.

Beyond text transformations, Mapbox GL could in the future use the text-language property to choose the correct national language variant for each Unihan character in CJK text, just as native text rendering engines and Web browser rendering engines do.

/cc @mapbox/gl @mapbox/cartography-cats

@1ec5
Contributor Author

1ec5 commented Jul 23, 2016

The current situation is also suboptimal because each platform’s default case transformations already differ. In contrast to the screenshot of GL JS above, the same German street does say “GROSSER STERN” on macOS and iOS, still using the language-neutral locale:

[Screenshot: Großer Stern on macOS]

@1ec5
Contributor Author

1ec5 commented Nov 17, 2016

Adding a text-language property isn’t semantically ideal, since it’s really the data that has an intrinsic language, not the style. But it seems like overkill to extend the vector tile specification with a new type that pairs a string with a language identifier.

Or perhaps TileJSON (and any runtime equivalent) should be extended to map layers to language codes, which would avoid the unnecessary tile size increase from putting a language code on every individual feature. TileJSON doesn’t currently distinguish between raster and vector tiles, but this feature would be specific to vector tiles.

@kkaefer
Member

kkaefer commented Nov 17, 2016

putting a language code on every individual feature

We'd actually have to tag every value, since a feature can contain names for a feature in multiple languages.

@1ec5
Contributor Author

1ec5 commented Nov 17, 2016

We'd actually have to tag every value, since a feature can contain names for a feature in multiple languages.

I was thinking that a source-wide mapping from source properties to languages would be more economical, since for instance Mapbox Streets’ name_de property can be presumed to be in German for all features and name_en in English for all features. There are two problems: one is that name is a generic tag that contains the local name in whatever language is appropriate for a given feature. The other is that, due to a lack of a token fallback syntax in the style specification (mapbox/mapbox-gl-style-spec#104), Mapbox Streets must duplicate all names in all name properties, even if that means backfilling a non-German name into the name_de property.

Once we implement token fallbacks, the next version of Mapbox Streets would be able to remove those backfilled names, so that we’re left with a purely German name_de and a purely English name_en. The reclaimed space could be repurposed for a wider variety of name properties, including for instance name_chr for the occasional Cherokee translation. Then I think we’d be more justified in mapping name to the mul language code and providing language codes on a per-property basis rather than a per-feature basis. However, it’s still true that per-feature language information would be needed for the best text-transform results.

Edit: s/layer/property/

@jfirebaugh
Contributor

@1ec5 Are you using "layer" to mean "property"? Individual features can have a name_de property, but there is no "name_de layer" in Mapbox Streets. And in that light, I don't understand what you're proposing.

@1ec5
Contributor Author

1ec5 commented Nov 17, 2016

Are you using "layer" to mean "property"? Individual features can have a name_de property, but there is no "name_de layer" in Mapbox Streets. And in that light, I don't understand what you're proposing.

Oof, that’s what I meant. Sorry – fixed. There are two proposals so far in this ticket:

  1. A text-language layout property that accepts tokens just as text-field would; text-language would affect at least text transforms but potentially also font fallbacks in the future. If the designer sets text-field to {name_tr}, then they can set text-language to tr. But if they want to set text-field to {name}, they’d need Mapbox Streets to provide a name_language property on each individual feature.
  2. Alternatively, a mapping – somewhere, maybe in the style JSON, maybe in TileJSON – from vector tile properties to language codes that Mapbox GL would consult any time it tries to transform text that originates in one of these vector tile properties. So if text-field is {name_de} — {name_en} and text-transform is uppercase, then Mapbox GL would know to uppercase the name_de value with the German locale before inserting it into the overall string. Indicating the language of the name field per-feature would be out of scope.
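The second proposal could be sketched roughly as follows. The names propertyLanguages and transformTokens are hypothetical, as is the English name “Great Star”. The sketch uses the locale argument of toLocaleUpperCase() from ECMA-402, which an engine without that support may ignore; literal text between tokens is left untouched for simplicity.

```javascript
// Hypothetical mapping from vector tile properties to language codes. It
// could live in the style JSON or in TileJSON; neither supports it today.
const propertyLanguages = { name_de: "de", name_en: "en" };

const hasOwn = (object, key) =>
  Object.prototype.hasOwnProperty.call(object, key);

// Uppercase each token's value with the language mapped to its property,
// then insert it into the overall string.
function transformTokens(template, properties, languages) {
  return template.replace(/\{([^{}]+)\}/g, (match, key) => {
    const value = hasOwn(properties, key) ? String(properties[key]) : "";
    // Properties without a mapped language get the locale-neutral transform.
    return hasOwn(languages, key)
      ? value.toLocaleUpperCase(languages[key])
      : value.toUpperCase();
  });
}

const label = transformTokens(
  "{name_de} — {name_en}",
  { name_de: "Großer Stern", name_en: "Great Star" },
  propertyLanguages
);

console.log(label); // GROSSER STERN — GREAT STAR
```

As the proposal itself notes, this does nothing for the generic name property, whose language varies per feature.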

It’s entirely possible that both proposals are Rube Goldbergian and there are simpler ways to accomplish locale-aware text transforms. Any ideas?

@jfirebaugh
Contributor

On the other hand, Mapbox GL JS calls String.prototype.toUpperCase(), and there is currently no standard API for locale-aware conversions beyond the user’s current locale.

ECMA-402 defines extensions to String.prototype.toLocale{Upper,Lower}Case accepting a set of locales as a parameter.

This does not seem to be implemented by any major browser yet (tested "Kırşehir".toLocaleUpperCase("tr") on Chrome, Firefox, and Safari, and all returned "KIRŞEHIR"), but it's a potential future solution to a portion of this issue.
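Because support varies by engine, an implementation would probably want to feature-detect before relying on it. A minimal sketch, with hypothetical function names, that falls back to the locale-neutral toUpperCase() when the locale argument is not honored:

```javascript
// Returns true if toLocaleUpperCase() honors an explicit Turkish locale,
// which is a reasonable proxy for ECMA-402 support of the locales argument.
function supportsLocaleAwareCase() {
  try {
    return "i".toLocaleUpperCase("tr") === "\u0130";
  } catch (e) {
    return false;
  }
}

// Hypothetical wrapper: use the locale-aware transform when it is available
// and a language is known; otherwise keep the locale-neutral behavior.
// Note that a malformed language tag would make toLocaleUpperCase() throw.
function upperCase(text, language) {
  return language && supportsLocaleAwareCase()
    ? text.toLocaleUpperCase(language)
    : text.toUpperCase();
}

console.log(supportsLocaleAwareCase());
console.log(upperCase("Kırşehir", "tr"));
```

The output of both calls depends on the engine, which is the point of the check.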

@lucaswoj

migrated to mapbox/mapbox-gl-js#3999
