[wasm][globalization][icu] Tracking issue for HybridGlobalization: Web API + ICU #79989
Closing, the planned work for |
The task is to remove as much data from the ICU files as possible and to replace the ICU4C functions that use this data with platform-native functions, in the case of WASM with Web API. Because we cannot get rid of the ICU data file completely (some functionality is not easily replaceable), we will keep loading `icudt.dat` in a reduced form. This mode will be called `HybridGlobalization` and will be switched off by default. Users can switch it on by setting the MSBuild property `<HybridGlobalization>` to `true`.

PoC branch is here: main...ilonatommy:runtime:icu-platform-native.
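To make "platform native functions" concrete for the collation case, here is a minimal TypeScript sketch of how a culture-aware comparison could be delegated to the Web API's `Intl.Collator` instead of ICU4C. The helpers `toSensitivity` and `compareStrings`, and the mapping from flags to `sensitivity`, are assumptions made for illustration and are not taken from the PoC branch. Only `Intl.Collator` itself and the `CompareOptions` flag values are existing APIs.

```typescript
// Values copied from .NET's CompareOptions enum (IgnoreCase = 1, IgnoreNonSpace = 2).
const IgnoreCase = 1;
const IgnoreNonSpace = 2;

// Assumed mapping from CompareOptions flags to Intl.Collator's `sensitivity` option.
function toSensitivity(options: number): "base" | "accent" | "case" | "variant" {
    const ignoreCase = (options & IgnoreCase) !== 0;
    const ignoreNonSpace = (options & IgnoreNonSpace) !== 0;
    if (ignoreCase && ignoreNonSpace) return "base"; // a == A and a == á
    if (ignoreCase) return "accent";                 // a == A, but a != á
    if (ignoreNonSpace) return "case";               // a == á, but a != A
    return "variant";                                // case and accents both matter
}

// Hypothetical JS-side replacement for a native compare call.
function compareStrings(culture: string, a: string, b: string, options: number): number {
    const collator = new Intl.Collator(culture, { sensitivity: toSensitivity(options) });
    const result = collator.compare(a, b);
    // Intl.Collator only promises the sign of the result, so clamp it to -1 / 0 / 1.
    return result < 0 ? -1 : result > 0 ? 1 : 0;
}
```

Options that have no `Intl.Collator` counterpart, such as `IgnoreKanaType` and `IgnoreWidth`, are deliberately left out of this sketch.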
- `collations` for WASM
  - `HybridGlobalization` icu#300 [ILONA]
  - `GlobalizationNative_ChangeCase` + optimize memory usage: do not create a new string for returning the value, but pass the address of a buffer reserved on the C# side that will hold the result. [ILONA]
    Public API: `TextInfo.ToLower`, `TextInfo.ToUpper`, `TextInfo.ToTitleCase`.
  - `GlobalizationNative_IndexOf` and `GlobalizationNative_LastIndexOf`: will not work for letters that consist of more than one grapheme, issue: Add locale sensitive substring matching functions to Intl.Collator tc39/ecma402#506 [ILONA]
    Public API: `CompareInfo.IndexOf`, `String.IndexOf`, `MemoryExtensions.IndexOf`, `CompareInfo.LastIndexOf`, `String.LastIndexOf`, `MemoryExtensions.LastIndexOf`.
  - `GlobalizationNative_StartsWith` and `GlobalizationNative_EndsWith` [ILONA]
    Public API: `CompareInfo.IsSuffix`, `String.EndsWith`, `MemoryExtensions.EndsWith`, `CompareInfo.IsPrefix`, `String.StartsWith`, `MemoryExtensions.StartsWith`.
  - `GlobalizationNative_CompareString` (without `Ordinal` and `OrdinalIgnoreCase`, `IgnoreKanaType`, `IgnoreWidth`) [ILONA]
    Public API: `CompareInfo.Compare`, `String.Compare`
  - Implement `IgnoreKanaType` and `IgnoreWidth` based on `pal_collation.c` code [ILONA]
  - `Ordinal` and `OrdinalIgnoreCase`. [ILONA]
  - `GlobalizationNative_GetSortKey`. If much, throw PNSE (`PlatformNotSupportedException`) on `GlobalizationNative_GetSortVersion`.
  - `HybridGlobalization` from Blazor. Changes in dotnet/sdk might be needed.
- `normalization` for WASM: Removed from planned Hybrid features. Savings from normalization removal on WASM are ~60kB. The removal breaks public APIs: `string.Normalize`, `string.IsNormalized`, `IdnMapping.GetAscii`, `IdnMapping.GetUnicode`. `Normalize`/`IsNormalized` were successfully replaced in [browser][non-icu] `HybridGlobalization` normalization. #85510. For the `GetAscii`/`GetUnicode` replacement, the Invariant implementation enhanced by a normalization step was used, see branch https://github.com/ilonatommy/runtime/tree/idn-mapping. The mapping still lacked detection of disallowed/ignored/mapped characters and would need access to the MappingTables of the current Unicode version, e.g. to detect incorrect inputs and throw. One Unicode version's mapping table in plain text weighs ~900kB. Even if we compressed it, we would still need to maintain it with every Unicode version. Given the development time a correct implementation would take and the small chance of a real size reduction once the mapping tables have to be kept, removing normalization data from ICU is not worthwhile.
  - Update `icudt_wasm.dat` and corresponding sharded datafiles.
  - Implement Punycode, might be using this algorithm, using InvariantGlobalization algorithm + normalization function.
  - Use normalization from the PoC branch.
  - Update documentation.
- `coll_ucadata`, `locales_tree` etc.
- `no exception thrown` for `CultureInfoAll.LcidTest`, `CultureInfoAll.GetCultureTest`, `CultureInfoConstructor.Ctor_String` (now we support a wider range of locales, so we should not expect some of them to throw as they did with standard ICU)
- Consider fixing some `IgnoreSymbols` by adding static data on JS side.
- Consider shifting katakana/hiragana and high/low symbols, based on `pal_collation.c` code
- `HybridGlobalization` compare #84249 (comment) to catch them already during the build time.

Tracking issues:
#101912
#102305
#102373
#95921
#95795
#95623
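For the `IgnoreKanaType` / `IgnoreWidth` items in the list above, one possible shape of the "shifting" idea is sketched below in TypeScript. It relies only on the fixed Unicode offsets between katakana and hiragana (0x60) and between full-width and basic ASCII characters (0xFEE0). The helper names are hypothetical, and the sketch is not a port of `pal_collation.c`; it covers just these two contiguous ranges and ignores, for example, half-width katakana.

```typescript
// Shift every UTF-16 code unit inside [first, last] down by `offset`; leave the rest untouched.
// All ranges used here are in the BMP, so surrogate pairs pass through unchanged.
function shiftRange(text: string, first: number, last: number, offset: number): string {
    let result = "";
    for (let i = 0; i < text.length; i++) {
        const code = text.charCodeAt(i);
        result += String.fromCharCode(code >= first && code <= last ? code - offset : code);
    }
    return result;
}

// Katakana U+30A1..U+30F6 maps onto hiragana U+3041..U+3096.
function foldKanaType(text: string): string {
    return shiftRange(text, 0x30a1, 0x30f6, 0x60);
}

// Full-width forms U+FF01..U+FF5E map onto ASCII U+0021..U+007E.
function foldWidth(text: string): string {
    return shiftRange(text, 0xff01, 0xff5e, 0xfee0);
}

// Hypothetical use: fold both inputs first, then hand them to the platform collator.
function compareIgnoringKanaAndWidth(culture: string, a: string, b: string): number {
    const collator = new Intl.Collator(culture);
    return collator.compare(foldWidth(foldKanaType(a)), foldWidth(foldKanaType(b)));
}
```

Folding both operands before calling `Intl.Collator` keeps the comparison itself in the Web API, so no extra collation data is needed on the JS side for these two options.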
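The `GlobalizationNative_ChangeCase` item asks for the result to be written into a buffer owned by the caller rather than returned as a new string. A rough TypeScript sketch of that idea on the JS side follows. `changeCaseInto` and `decode` are hypothetical names, the `Uint16Array` merely stands in for memory reserved on the C# side, and keeping the original character when a mapping changes length is an assumption of this sketch.

```typescript
// Hypothetical helper: writes the case-mapped UTF-16 code units of `source` into `destination`.
function changeCaseInto(culture: string, source: string, destination: Uint16Array, toUpper: boolean): void {
    if (destination.length < source.length) {
        throw new RangeError("destination buffer is too small");
    }
    for (let i = 0; i < source.length; i++) {
        const original = source.charAt(i);
        const mapped = toUpper
            ? original.toLocaleUpperCase(culture)
            : original.toLocaleLowerCase(culture);
        // The buffer has a fixed size, so a mapping that changes length (e.g. "ß" -> "SS")
        // cannot be stored; this sketch keeps the original code unit in that case.
        destination[i] = mapped.length === 1 ? mapped.charCodeAt(0) : original.charCodeAt(0);
    }
}

// Test-only helper that reads the buffer back into a string.
function decode(buffer: Uint16Array): string {
    let text = "";
    for (let i = 0; i < buffer.length; i++) {
        text += String.fromCharCode(buffer[i]);
    }
    return text;
}

// Usage: the caller allocates the buffer once and the helper fills it in place.
const turkishBuffer = new Uint16Array("istanbul".length);
changeCaseInto("tr", "istanbul", turkishBuffer, true);
console.log(decode(turkishBuffer));
```

Because the loop works on single UTF-16 code units, characters outside the BMP are copied without case mapping; a fuller version would need to handle surrogate pairs.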