Ensure that the provider performs correct alias mapping for Traditional Chinese locales #1964

Open
hsivonen opened this issue May 30, 2022 · 7 comments
Assignees
Labels
C-data-infra Component: provider, datagen, fallback, adapters S-epic Size: Major project (create smaller child issues) U-ecma402 User: ECMA-402 compatibility

Comments

@hsivonen
Member

Ensure that if a specific (and existing) collation hasn't been specified with -u-co-, the following map to zh-u-co-stroke:

  • zh-Hant regardless of region.
  • zh without Hans but with any of HK, MO, TW.
  • yue without either Hans or CN.
@hsivonen hsivonen added the C-data-infra Component: provider, datagen, fallback, adapters label May 30, 2022
@hsivonen hsivonen added this to the ICU4X 1.0 (Features) milestone May 30, 2022
@hsivonen
Member Author

CC @sffc

@hsivonen
Member Author

For clarity: CLDR maps yue-CN and yue-Hans to zh-Hans, i.e. zh-u-co-pinyin.
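The requested mapping can be read as a small decision function. The following is only a sketch of that reading, not ICU4X code: `DefaultCollation` and `default_collation` are hypothetical names, subtags are passed in already parsed, and an explicit `-u-co-` value is assumed to be valid whenever it is present.

```rust
/// Default collation implied by the rules in this issue (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum DefaultCollation {
    Stroke, // zh-u-co-stroke
    Pinyin, // zh-u-co-pinyin
}

/// Hypothetical helper: returns the default collation for a locale given as
/// separate subtags, or `None` when the rules in this issue do not apply.
fn default_collation(
    language: &str,
    script: Option<&str>,
    region: Option<&str>,
    explicit_collation: Option<&str>,
) -> Option<DefaultCollation> {
    // Assumption: any explicit -u-co- value exists and therefore wins.
    if explicit_collation.is_some() {
        return None;
    }
    let traditional_region = matches!(region, Some("HK" | "MO" | "TW"));
    match (language, script) {
        // zh-Hant regardless of region.
        ("zh", Some("Hant")) => Some(DefaultCollation::Stroke),
        // zh without Hans but with any of HK, MO, TW.
        ("zh", s) if s != Some("Hans") && traditional_region => {
            Some(DefaultCollation::Stroke)
        }
        // yue-Hans and yue-CN map to zh-Hans, i.e. pinyin. A literal reading
        // also sends yue-Hant-CN here; that case is an assumption.
        ("yue", Some("Hans")) => Some(DefaultCollation::Pinyin),
        ("yue", _) if region == Some("CN") => Some(DefaultCollation::Pinyin),
        // yue without either Hans or CN.
        ("yue", _) => Some(DefaultCollation::Stroke),
        _ => None,
    }
}
```

For example, `default_collation("zh", None, Some("TW"), None)` yields `Some(DefaultCollation::Stroke)`, while passing `Some("pinyin")` as the explicit collation yields `None` so the caller keeps the requested collation.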

@sffc sffc self-assigned this Jun 7, 2022
@sffc
Member

sffc commented Jun 16, 2022

> Ensure that if a specific (and existing) collation hasn't been specified with -u-co-, the following map to zh-u-co-stroke:
>
>   • zh-Hant regardless of region.

This will be possible so long as zh-Hant contains the correct data. I'll add a test for this.

>   • zh without Hans but with any of HK, MO, TW.

This should be automatic given that these fallbacks are included in parent locales / likely subtags; all of these locales will fall back via zh-Hant.

>   • yue without either Hans or CN.

Looks like the mappings in likely subtags are correct:

      "yue": "yue-Hant-HK",
      "yue-CN": "yue-Hans-CN",
      "yue-Hans": "yue-Hans-CN",

I'll add a test for it.
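A test along these lines could check that the script chosen by likely subtags is the one the collation rules expect. This is a minimal sketch over just the three entries quoted above; `LIKELY_SUBTAGS`, `maximize`, and `script_of` are hypothetical stand-ins, not the ICU4X API.

```rust
/// The three likely-subtags entries quoted above, as (input, maximized) pairs.
const LIKELY_SUBTAGS: [(&str, &str); 3] = [
    ("yue", "yue-Hant-HK"),
    ("yue-CN", "yue-Hans-CN"),
    ("yue-Hans", "yue-Hans-CN"),
];

/// Hypothetical lookup: returns the maximized form if the table has an entry.
fn maximize(locale: &str) -> Option<&'static str> {
    LIKELY_SUBTAGS
        .iter()
        .find(|(from, _)| *from == locale)
        .map(|(_, to)| *to)
}

/// Extracts the script subtag from a maximized `language-Script-REGION` tag.
fn script_of(maximized: &str) -> Option<&str> {
    maximized.split('-').nth(1)
}
```

With these helpers, `maximize("yue").and_then(script_of)` gives the script that decides between the stroke and pinyin defaults.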

@sffc
Member

sffc commented Aug 31, 2022

There is a list of collation-specific aliases/parents in the LDML-to-ICU converter:

https://github.com/unicode-org/icu/blob/0266970e977b9e2488dfbf788cc280be3a0338ca/tools/cldr/cldr-to-icu/build-icu-data.xml#L263

Obviously, that list isn't making it into ICU4X.

I chatted with @markusicu about this today. He says that it may make sense to introduce a "processing" mode to the locale fallback engine. This mode can be used for both collator and break iterator.

I need to verify whether the set of ICU-specific overrides should apply uniformly to both collator data and segmenter data.
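To make the "processing mode" idea concrete, here is a rough sketch of a fallback step that consults a collation-specific parent table before ordinary truncation. Everything in it is an assumption for illustration: `FallbackMode`, `COLLATION_PARENTS`, `parent`, and `fallback_chain` are hypothetical names, the two table entries are derived from the mapping requested in this issue rather than copied from the ICU list, and plain subtag truncation stands in for real CLDR parent-locale logic.

```rust
/// Hypothetical fallback modes; `Collation` enables the extra parent table.
#[derive(Clone, Copy, PartialEq, Eq)]
enum FallbackMode {
    Default,
    Collation,
}

/// Illustrative collation-only parents (assumed, not taken from ICU data).
const COLLATION_PARENTS: [(&str, &str); 2] =
    [("yue", "zh-Hant"), ("yue-Hans", "zh-Hans")];

/// One fallback step: a collation override if the mode allows it, otherwise
/// drop the last subtag. `None` means the next step is the root locale.
fn parent(locale: &str, mode: FallbackMode) -> Option<String> {
    if mode == FallbackMode::Collation {
        if let Some((_, target)) =
            COLLATION_PARENTS.iter().find(|(from, _)| *from == locale)
        {
            return Some(target.to_string());
        }
    }
    // Simplification: real fallback also uses parent locales and likely subtags.
    locale.rsplit_once('-').map(|(head, _)| head.to_string())
}

/// Collects the locales visited before reaching root.
fn fallback_chain(locale: &str, mode: FallbackMode) -> Vec<String> {
    let mut chain = vec![locale.to_string()];
    loop {
        let next = parent(chain.last().unwrap(), mode);
        match next {
            Some(p) => chain.push(p),
            None => break,
        }
    }
    chain
}
```

In this sketch, `fallback_chain("yue-HK", FallbackMode::Collation)` visits `yue-HK`, `yue`, `zh-Hant`, `zh`, whereas the default mode stops after `yue`, which is the gap the collation-specific list is meant to close.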

@sffc sffc added the S-epic Size: Major project (create smaller child issues) label Sep 3, 2022
@sffc
Member

sffc commented Sep 26, 2022

I still need to implement the actual zigzag fallback, but this can be done in the Collation fallback mode.

@sffc sffc modified the milestones: ICU4X 1.0 (Final), ICU4X 1.1 Sep 26, 2022
@sffc
Member

sffc commented Dec 20, 2022

Upstream issue involving the ICU-specific fallback aliases: https://unicode-org.atlassian.net/browse/CLDR-16253

@sffc
Member

sffc commented Apr 22, 2024

See some more recent discussion in #3867

@sffc sffc mentioned this issue Apr 22, 2024

2 participants