Optimizing data for Display Names #3260

sffc · 2023-04-04T23:19:51Z

The DisplayNames component comes with a large amount of data. It is the largest locale-specific data in ICU and will also likely be the largest in ICU4X.

There are a few things that make DisplayNames interesting:

1. The majority of the display names are probably not useful to carry for most clients. For example, users speaking Japanese are more likely to need the translation for the Katakana script than the translation for the Cherokee script. We should explore something like japanext and likelysubtagsext, where we have a core set and an extended set.
2. Regional variants often override only a small number of strings. For example, en-GB and en-US might be equivalent for all region names except for one or two. This doesn't play nicely with the deduplication mechanism we've thus far relied on.
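For point 1, a core/extended split could look roughly like the following minimal sketch. `ScriptNames`, its fields, and the display strings are hypothetical and illustrative only, not ICU4X API or CLDR data; the idea is simply that lookups consult a small core table first and only consult the extended table if the client chose to load it.

```rust
use std::collections::HashMap;

/// Hypothetical split of one locale's script display names into a small
/// "core" table that every client ships and an optional "extended" table.
/// Names and strings are illustrative, not ICU4X API or CLDR data.
pub struct ScriptNames {
    core: HashMap<&'static str, &'static str>,
    extended: Option<HashMap<&'static str, &'static str>>,
}

impl ScriptNames {
    pub fn new(
        core: &[(&'static str, &'static str)],
        extended: Option<&[(&'static str, &'static str)]>,
    ) -> Self {
        Self {
            core: core.iter().copied().collect(),
            // Clients that don't need rare scripts pass `None` and never load this table.
            extended: extended.map(|e| e.iter().copied().collect()),
        }
    }

    /// Looks up a script code in the core set first, then in the
    /// extended set if it was loaded.
    pub fn get(&self, script: &str) -> Option<&'static str> {
        self.core
            .get(script)
            .or_else(|| self.extended.as_ref()?.get(script))
            .copied()
    }
}
```

With only the core set loaded, a Japanese client resolves `Kana` but gets `None` for `Cher`; loading the extended set makes `Cher` available as well.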

CC @snktd @robertbastian @markusicu

robertbastian · 2023-04-05T08:55:03Z

I think 2 is a big issue, and I think it also happens for other data. Instead of loading a single data struct in the formatter constructor, we could load all structs for the whole fallback chain. This could use naive fallback (i.e. chopping off subtags), so no additional data would be needed. We could then remove redundant entries from en-GB and en-001 if they are in en (though with naive fallback we'd still have duplication between en-GB and en-001).
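The proposal above can be sketched as follows, assuming naive fallback (drop the last subtag until none remain) and display names stored as simple string maps. `naive_chain`, `strip_redundant`, `lookup`, and `names` are hypothetical helpers, and the region strings are placeholders, not CLDR data.

```rust
use std::collections::BTreeMap;

/// Region code -> display name for one locale (hypothetical representation).
pub type Names = BTreeMap<String, String>;

/// Builds a `Names` map from string pairs (test/demo helper).
pub fn names(pairs: &[(&str, &str)]) -> Names {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

/// Naive fallback: "en-GB" -> ["en-GB", "en"]. Real CLDR fallback
/// (en-GB -> en-001 -> en) needs parent-locale data, which this ignores.
pub fn naive_chain(locale: &str) -> Vec<String> {
    let mut chain = vec![locale.to_string()];
    let mut current = locale;
    while let Some(idx) = current.rfind('-') {
        current = &current[..idx];
        chain.push(current.to_string());
    }
    chain
}

/// At data build time, drops entries from `child` that are identical in
/// `parent`, so the child only stores its overrides.
pub fn strip_redundant(child: &mut Names, parent: &Names) {
    child.retain(|k, v| parent.get(k) != Some(&*v));
}

/// At runtime, searches every loaded struct in the chain, most specific first.
pub fn lookup<'a>(chain: &[&'a Names], key: &str) -> Option<&'a str> {
    chain
        .iter()
        .copied()
        .find_map(|n| n.get(key).map(String::as_str))
}
```

Because `naive_chain("en-001")` also goes straight to `en`, any override shared by en-GB and en-001 would still be stored in both, which is the remaining duplication noted in the comment.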

sffc · 2023-05-11T18:33:46Z

Discuss with:

sffc · 2023-07-05T08:37:27Z

Discussed on 2023-07-04. We will use the auxiliary key model, similar to currency formatter (#1441), which resolves the issues in the OP.
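As a rough illustration of why per-item keys can address both points, here is a minimal sketch. It assumes, as a simplification, that the auxiliary key selects a single display name (for example a region code), so each name is requested and resolved through locale fallback on its own. `AuxStore` and its methods are hypothetical and not the actual ICU4X provider API, and the naive subtag-chopping fallback stands in for real locale fallback.

```rust
use std::collections::HashMap;

/// Hypothetical store keyed by (locale, auxiliary key). Each entry is one
/// display name, so a regional locale stores only the names it overrides,
/// and a client loads only the keys it actually requests.
pub struct AuxStore {
    entries: HashMap<(String, String), String>,
}

impl AuxStore {
    pub fn new(entries: &[(&str, &str, &str)]) -> Self {
        Self {
            entries: entries
                .iter()
                .map(|&(loc, aux, name)| ((loc.to_string(), aux.to_string()), name.to_string()))
                .collect(),
        }
    }

    /// Resolves one (locale, aux key) request, falling back per key by
    /// dropping subtags (naive fallback, for illustration only).
    pub fn load(&self, locale: &str, aux: &str) -> Option<&str> {
        let mut current = locale;
        loop {
            if let Some(name) = self.entries.get(&(current.to_string(), aux.to_string())) {
                return Some(name.as_str());
            }
            // No more subtags to drop: the name is not available.
            current = &current[..current.rfind('-')?];
        }
    }
}
```

In this model en-GB would carry a single entry for the one region it overrides; every other request for en-GB falls back to en key by key, and rarely used names are never loaded unless requested.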

sffc added the labels A-design (Area: Architecture or design), discuss (Discuss at a future ICU4X-SC meeting), A-data (Area: Data coverage or quality), and C-dnames (Component: Language/Region/... Display Names) on Apr 4, 2023

sffc added the label discuss-triaged (The stakeholders for this issue have been identified and it can be discussed out-of-band) on May 25, 2023

sffc removed the labels discuss (Discuss at a future ICU4X-SC meeting) and discuss-triaged (The stakeholders for this issue have been identified and it can be discussed out-of-band) on Jul 5, 2023

sffc added this to the 1.4 Blocking ⟨P1⟩ milestone on Jul 5, 2023

sffc added the labels T-core (Type: Required functionality) and S-medium (Size: Less than a week (larger bug fix or enhancement)) on Jul 5, 2023

sffc mentioned this issue on Jul 5, 2023 in: Implement and re-implement auxiliary keys #3632 (Closed)

sffc mentioned this issue on Aug 22, 2023 in: Finalize the DisplayNames component #3913 (Open, 5 tasks)

sffc modified the milestones: 1.4 Blocking ⟨P1⟩, 1.5 Blocking ⟨P1⟩ Nov 14, 2023

sffc modified the milestones: 1.5 Blocking ⟨P1⟩, 1.x Priority ⟨P2⟩ Feb 29, 2024

sffc mentioned this issue on Oct 8, 2024 in: Fix generic location format for single-tz countries #5657 (Merged)
