-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ICU in embedded v8 #1968
Comments
I've been working on this for the past week and here is what I got so far. Compiling Deno with chromium's ICU:
What happens here is that when you want any IANA date's you get the following error:
So I did a little bit of debugging inside v8 and ICU and I found out that there is a C++ Error coresponding to What this means is that there are some kind of data for the ICU which it uses to load different time zones and etc. However those are not found here. (There is a data folder inside chromium's ICU (/deno/third_party/icu/data/) However it looks like that wasn't enough. Using README.chromium:There is a readme file inside chromium's icu called
So I did all that. Still nothing got fixed, same error. Using Official ICU's Docs:There are some docs inside the official ICU website about ICU data: It says that there are 3 ways of loading data to ICU
So I realized that there are two .dat files that have changed and git is showing them as unstashed files inside
I did a little bit of debugging and I found out that the data was loaded inside ICU, But it was not complete and part of it is still missing (about currencies of the countries I believe) |
I'm not super keen to add ICU due to its size. But I suppose people need it. :| @ATheCoder sounds like you're off to a good start. Maybe put up your branch? |
@ry i totally understand but if we don't add it, maybe a creation of deno_internationalization with polyfills would be a good compromise? Another alternative would be to have one build with icu and one without. But it means doubling the ci and so on... I'm not really thinking it's a good alternative. |
@ry But I didn't manage to get it to work. |
AFAIK, even Node.js does not include full ICU sometimes on certain platform. It'd would be better off adding new flag that allows users to use whichever ICU they want instead of embedding it because of the size concern. An awesome package like full-icu is used in such case. |
Another thing: It’s pretty crazy that node doesn’t have an API to set the locale at runtime. It’s pretty important to be able to do that, e.g. when offering a language switcher in your application or running the tests in a predictable environment. |
node have recently made full-icu the default: nodejs/node#19214 |
Can't agree more with @hayd |
deno 1.0.0-rc3 | v8 8.4.300 | typescript 3.8.3 It looks like this issue will persist into the Deno 1.0.0 release. I ran this code on 1.0.0-rc3 and the same error remains. I like Ryan's eye for performance and efficient resource utilization. On one hand, it seems reasonable to me, that people who don't need Are there any workarounds (other than simply writing code that doesn't use |
I could managed to make ICU to work. Basically I follow the instruction which @ATheCoder provided earlier in this ticket. One change I made was data file to be embedded in the ICU library. Compiling Deno with chromium's ICU:
diff --git a/core/Cargo.toml b/core/Cargo.toml
index ebd10838..409fb9ef 100644
--- a/core/Cargo.toml
+++ b/core/Cargo.toml
@@ -19,7 +19,7 @@ futures = { version = "0.3.4", features = ["thread-pool", "compat"] }
lazy_static = "1.4.0"
libc = "0.2.69"
log = "0.4.8"
-rusty_v8 = "0.4.2"
+rusty_v8 = { version="0.4.2", path="../../rusty_v8" }
serde_json = "1.0.52"
url = "2.1.1"
diff --git a/.gn b/.gn
index b21600a..5c74f75 100644
--- a/.gn
+++ b/.gn
@@ -28,7 +28,7 @@ default_args = {
# https://cs.chromium.org/chromium/src/docs/ccache_mac.md
clang_use_chrome_plugins = false
- v8_enable_i18n_support = false
+ v8_enable_i18n_support = true
v8_monolithic = false
v8_use_external_startup_data = false
v8_use_snapshot = true
$ cd rusty_v8/third_party
$ git clone https://chromium.googlesource.com/chromium/deps/icu.git
$ grep icu rusty_v8/v8/DEPS
'v8/third_party/icu':
Var('chromium_url') + '/chromium/deps/icu.git' + '@' + 'f2223961702f00a8833874b0560d615a2cc42738',
$ cd rusty_v8/third_party/icu
$ git checkout f2223961702f00a8833874b0560d615a2cc42738
diff --git a/config.gni b/config.gni
index c64f8348..d42678be 100644
--- a/config.gni
+++ b/config.gni
@@ -5,7 +5,7 @@
declare_args() {
# Tells icu to load an external data file rather than rely on the icudata
# being linked directly into the binary.
- icu_use_data_file = true
+ icu_use_data_file = false
# If true, compile icu into a standalone static library. Currently this is
# only useful on Chrome OS.
icu_disable_thin_archive = false
$ cd deno
$ V8_FROM_SOURCE=1 cargo build -vv On Linux, there would be no issue for compiling. On Mac we might need to apply one change to build config file to refer ICU header file location. I'm not expert of the V8 build system, so there would be much better solution for it. Anyway I could build Deno with following change on Mac.
diff --git a/config/mac/BUILD.gn b/config/mac/BUILD.gn
index de8233bba..b6a3778dc 100644
--- a/config/mac/BUILD.gn
+++ b/config/mac/BUILD.gn
@@ -47,6 +47,7 @@ config("compiler") {
if (export_libcxxabi_from_executables) {
ldflags += [ "-Wl,-undefined,dynamic_lookup" ]
}
+ cflags += [ "-I../../../../rusty_v8/third_party/icu/source/common" ]
}
# This is included by reference in the //build/config/compiler:runtime_library
It seems we might need to remove leading '_' from ICU bult in data symbol. diff --git a/scripts/make_data_assembly.py b/scripts/make_data_assembly.py
index 94b07c81..eb2bf72c 100755
--- a/scripts/make_data_assembly.py
+++ b/scripts/make_data_assembly.py
@@ -60,10 +60,10 @@ if options.mac:
"\t.align 4\n"
"_icudt%s_dat:\n" %tuple([version_number] * 3))
elif options.win:
- output.write(".globl _icudt%s_dat\n"
+ output.write(".globl icudt%s_dat\n"
"\t.section .rdata\n"
"\t.balign 16\n"
- "_icudt%s_dat:\n" % tuple([version_number] * 2))
+ "icudt%s_dat:\n" % tuple([version_number] * 2))
else:
output.write(".globl icudt%s_dat\n"
"\t.section .note.GNU-stack,\"\",%%progbits\n" Run Deno to try ICU functions.Once build success, we can try ICU functionality with the new Deno binary. $ deno/target/debug/deno
Deno 1.0.2
exit using ctrl+d or close()
> new Date().toLocaleString("en-US", {timeZone: "America/New_York"});
5/24/2020, 5:27:10 AM
> const formatjp = new Intl.NumberFormat('ja-JP', { style: 'currency',currency: 'JPY'}).format;
undefined
> formatjp('1280000');
¥1,280,000 |
@kishiguro what's the size of resulting binary? |
with debug build (no strip of symbol) on Linux, it looks like this. let me compare it with non ICU version.
|
Could you |
Here is summary of binary size. Parenthesis value is non-strip binary size.
I guess ICU code+data size would be around 13-14M. This result matches to nodejs's full ICU support thread result nodejs/node#19214 (on Mac 35M->49M roughly 14M increase). |
Trying to build on Windows. Non ICU version can be built without any issue. But ICU build can't be completed with following error. $ cd deno
$ cargo build -vv
...
[591/1425] LINK gen-regexp-special-case.exe gen-regexp-special-case.exe.pdb
FAILED: gen-regexp-special-case.exe gen-regexp-special-case.exe.pdb
ninja -t msvc -e environment.x64 -- ..\clang\bin\lld-link.exe /nologo "-libpath:..\..\..\..\..\..\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.25.28610\ATLMFC\lib\x64" "-libpath:..\..\..\..\..\..\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.25.28610\lib\x64" "-libpath:..\..\..\..\..\..\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\ucrt\x64" "-libpath:..\..\..\..\..\..\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\um\x64" /OUT:./gen-regexp-special-case.exe /PDB:./gen-regexp-special-case.exe.pdb @./gen-regexp-special-case.exe.rsp
lld-link: error: undefined symbol: icudt67_dat
>>> referenced by .\..\..\..\..\rusty_v8\third_party\icu\source\common\udata.cpp:698
>>> obj/third_party/icu\icuuc/udata.obj:(struct UDataMemory * __cdecl openCommonData(char const *, int, enum UErrorCode *))
>>> referenced by .\..\..\..\..\rusty_v8\third_party\icu\source\common\udata.cpp:722
>>> obj/third_party/icu\icuuc/udata.obj:(struct UDataMemory * __cdecl openCommonData(char const *, int, enum UErrorCode *))
[592/1425] ACTION //v8:generate_bytecode_builtins_list(//build/toolchain/win:win_clang_x64)
[593/1425] CXX obj/v8/src/inspector/inspector/Runtime.obj
ninja: build stopped: subcommand failed.
--- stderr
thread 'main' panicked at '
command did not execute successfully, got: exit code: 1
build script failed, must exit now', C:\Users\kunih\.cargo\registry\src\github.com-1ecc6299db9ec823\cargo_gn-0.0.15\src\lib.rs:203:3
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace |
icudt67_dat must be defined as ICU data source. On Mac it is defined in icudtl_dat.o
Looks like this is built from
Need to figure out how this is built on Windows...
|
Finally, I managed to make Windows version work as well. I have updated binary image size above table (13M increase on Windows).
I made change to remove leading '_' from the symbol.
diff --git a/scripts/make_data_assembly.py b/scripts/make_data_assembly.py
index 94b07c81..eb2bf72c 100755
--- a/scripts/make_data_assembly.py
+++ b/scripts/make_data_assembly.py
@@ -60,10 +60,10 @@ if options.mac:
"\t.align 4\n"
"_icudt%s_dat:\n" %tuple([version_number] * 3))
elif options.win:
- output.write(".globl _icudt%s_dat\n"
+ output.write(".globl icudt%s_dat\n"
"\t.section .rdata\n"
"\t.balign 16\n"
- "_icudt%s_dat:\n" % tuple([version_number] * 2))
+ "icudt%s_dat:\n" % tuple([version_number] * 2))
else:
output.write(".globl icudt%s_dat\n"
"\t.section .note.GNU-stack,\"\",%%progbits\n" I'm not really sure why link success with this change (I know in C/C++ world symbol always has leading '_'). Anyway at least it build and works fine.
Now question is are we going forward to have ICU support by default? IMHO, 14M increase for full ICU support will worth it. |
@kishiguro have you checked the sizes of release binaries? They might end up different that debug ones. |
@bartlomieju sure, let me check it. There was significant change of the binary size.
with the release build, on all platform increased binary size is around 12M. |
On the one hand it seems convenient to have statements like
behaving in the expected way. On the other hand I wonder whether it would be a worthwhile alternative to provide the corresponding features via a third party module - saving binary size of around 12M ... if I got things correctly. I created a third party module which serves my private little fun requirements: https://deno.land/x/time in this context as an alternative. |
An external library wouldn't work on regexes. I surely hope people aren't expected to recompile all their regexes to be Deno-compatible... None of the Unicode category matches work in Deno, and an external import won't fix regex literals. |
ICU has gotten a lot more features since FF4. |
I reported this bug on Jan 31, 2019 (see #1636). That's approaching "2 years ago". Nodejs supports this standard and so does every major web browser. Does Deno not intend to support internationalization? I'm just curious: what's the main reason for delay on this? Is it due to "apprehension over whether or not to support it", or (more so) due to the "difficultly of supporting it"? |
It is two fold. Priority and finding the right balance of the functional needs with the "cost" of adding ICU. If we add all of the ICU that will dramatically increase the size of the executable, for something that not everyone needs or uses. If we put just |
I thought @kishiguro had already gotten full ICU support working on Linux, Mac, and Windows from reading these comments above: Here's what I think would be really cool: What if you ported Internationalization to web assembly and wrapped that with an ES6 module? Then, if someone needs Internationalization, they are one ES6 include away from having it. This is probably way easier said than done, but if accomplished it would address the performance priorities that are causing reluctance and apprehension on this issue. |
Yes, exactly... full ICU support is a 25% increase in executable size. 🤯 That is huge. Which leads to the "how do we only do something like
It doesn't work that way. |
The core team discussed this today and agreed it is something we need to get into Deno. The "right" way to add it is to add ICU support to Once it is added to |
If we care about the detractors we could use |
no idea if that's a potential path forward, but how about two deno builds, basically one with ICU and one without for apps that don't need it and want to save the bloat? |
We are working on a solution for Deno 1.7.0. |
Hi. I'm one of the authors of ICU4X which is a Rust implementation of ICU. We're quite young (with just 0.1 released), but we're working on 0.2 now which we hope to provide some basic ECMA-402 compatible functionality (DateTimeFormat, PlrualRules, Locale etc.) We also have ecma_402 traits and some ICU4C wrappers in Rust that are used by the Fucshia team for now. Would you consider cooperating with us instead of writing your own bindings? Link: https://github.com/unicode-org/icu4x and https://crates.io/crates/rust_icu |
ICU support has landed in #9466 and will be released in |
As seen in #1952 / #1636 ICU needs to be added in Deno build.
Switching this flag to true maybe: https://github.com/denoland/deno/blob/master/.gn#L50 ?
ref: https://v8.dev/docs/i18n
The text was updated successfully, but these errors were encountered: