Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

Intl: Build "English-only" binaries for the distribution. #7676

Closed
srl295 opened this issue May 23, 2014 · 46 comments
Closed

Intl: Build "English-only" binaries for the distribution. #7676

srl295 opened this issue May 23, 2014 · 46 comments
Assignees
Milestone

Comments

@srl295
Copy link
Member

srl295 commented May 23, 2014

A new issue to continue the discussion in #6371 and #4689 - especially to implement #6371 (comment)

  • provide a way to build against any ICU, not just Chromium's
  • default build has very little data (just English) for a small file size
  • default disk location at build time for additional data --with-default-icu-path=...
  • runtime option --icu-data-dir=path.. or NODE_ICU_DATA env var can override the default
  • ICU won't be in github, but can be enabled by default in builds.

/cc @tjfontaine @trevnorris

@srl295
Copy link
Member Author

srl295 commented May 23, 2014

opened a couple of tickets in v8 that may be needed here.

@trevnorris
Copy link

Does static bool InitializeICU(const char* icu_data_file = NULL); not do enough? The API doesn't say it can't be run multiple times for each icu_data_file.

@srl295
Copy link
Member Author

srl295 commented May 27, 2014

@trevnorris that one has some odd behavior (maybe fixable) , ignored under some build configs, and also reads the entire file into memory (fread) before applying it. My version just configures ICU to do the right thing.

Could maybe improve the API upstream (v8) and just have one function.

I actually think that InitializeICU's model may be what we will need to go with eventually, though. u_setDataDirectory() doesn't have the effect I expected it to have - in particular, it cannot override inbuilt data.

@srl295
Copy link
Member Author

srl295 commented Jul 30, 2014

@tjfontaine as #7719 is nearing completion and is marked for v0.12 .. what else will need to happen so that we can get "icu .. enabled by default in builds, but not necessarily included in github"? Who builds the binaries for http://nodejs.org and is there anything that should be done to make things easier for building those?

@trevnorris
Copy link

Three things I'd like fixed before --with-intl is part of the default build:

  1. ./configure shouldn't show the path to every source file that's being included for build.
  2. The warning: overriding commands for target messages should be fixed (if this is a problem with gyp then either upstream the fix and update gyp, or float a patch).
  3. Determine how the icu dep is included. The icu4c-53_1-src.tgz decompresses to 108 MB, but we only need ~40MB of that to do the build. So either we strip it down and include only the necessary or we download it at compile time. I don't know what we want to do, but the decision will have to be made.

@srl295
Copy link
Member Author

srl295 commented Oct 1, 2014

@trevnorris

  • #1 - will do.
  • #2 Who is the best contact or contacts to figure this out? EDIT done in build: i18n: fix issue with icu toolset dependencies #8681
  • #3 There are pros and cons here. For example, the build doesn't need ICU's unit tests, but I hesitate to separate them because they give insight into build correctness. I would like to keep investigating reducing the on disk footprint as well.

@trevnorris
Copy link

@bnoordhuis You have any ideas about the possible bug in gyp wrt the warning: overriding commands for target message?

@bnoordhuis
Copy link
Member

I looked into it and I know what causes it now but I'm not sure how to fix it. You get that warning when using multiple toolsets for a target of type 'executable' (because both binaries end up in the PRODUCT_DIR directory) or when multiple actions have the same filename as their output (for pretty much the same reason.)

Neither is something that @srl295's patch seems to do outright but perhaps it's the result of transitive dependencies. It may be a bug in our .gyp and .gypi files after all.

srl295 added a commit to srl295/node-v0.x-archive that referenced this issue Oct 4, 2014
fix number one of
nodejs#7676 (comment)
move `icu_src_*` variables to separate `.gypi`
srl295 added a commit to srl295/node-v0.x-archive that referenced this issue Oct 4, 2014
fix number one of
nodejs#7676 (comment)
move `icu_src_*` variables to separate `.gypi`
trevnorris pushed a commit that referenced this issue Oct 7, 2014
Fixes: #7676 (comment)
Reviewed-by: Trevor Norris <trev.norris@gmail.com>
@srl295
Copy link
Member Author

srl295 commented Oct 23, 2014

@trevnorris re #3 in #7676 (comment) - where do we go from here? I would recommend just redownloading it at this point- source is actually sourceforge mirrored now, could cache locally for buildbots - what do you think?

@srl295
Copy link
Member Author

srl295 commented Oct 23, 2014

@bnoordhuis @trevnorris and I'm still not sure what to do about #2 in the same.

@srl295
Copy link
Member Author

srl295 commented Oct 23, 2014

@bnoordhuis
Copy link
Member

Apropos the warnings from the Makefile, I'm not sure if that is a bug in GYP or in our .gyp files but it's because some targets and actions have both host and target toolsets. Maybe try asking on the mailing list?

@srl295
Copy link
Member Author

srl295 commented Oct 23, 2014

@bnoordhuis sure.. I will try that. Last I checked there weren't docs on toolset

@trevnorris
Copy link

@srl295 re #2. I recently ran into this. It occurred when a build and one of its dependencies had the same target_name. In the generated out/tools/icu/icui18n.{host,target}.mk you can see the following:

all_deps += $(obj).target/tools/icu/libicui18n.a
# Add target alias
.PHONY: icui18n
icui18n: $(obj).host/tools/icu/libicui18n.a

# Add target alias to "all" target.
.PHONY: all
all: icui18n

# Add target alias
.PHONY: icui18n
icui18n: $(builddir)/libicui18n.a

That target exists in both tools/icu/icu-{generic,system}.gyp.

@srl295
Copy link
Member Author

srl295 commented Oct 23, 2014

@trevnorris yes, but why bother having toolsets be a list? Both the host and the target need this library. Maybe the generator should be generating icui18n-host: .. and icui18n-target: ..? Anyways, I'll ask over on gyp-developer

@trevnorris
Copy link

@srl295 that's actually exactly what I had to do. Appended something the target_name and it all magically worked.

@trevnorris
Copy link

Though in this case it does look like V8 requires that specific name. From deps/v8/tools/gyp/v8.gyp:

        ['v8_enable_i18n_support==1', {
          'dependencies': [
            '<(icu_gyp_path):icui18n',
            '<(icu_gyp_path):icuuc',
          ]

I'll see if I can fix it.

@trevnorris
Copy link

Here's a partial "fix"

diff --git a/tools/icu/icu-generic.gyp b/tools/icu/icu-generic.gyp
index bc0a660..8848096 100644
--- a/tools/icu/icu-generic.gyp
+++ b/tools/icu/icu-generic.gyp
@@ -90,7 +90,7 @@
     {
       'target_name': 'icui18n',
       'type': '<(library)',
-      'toolsets': [ 'host', 'target' ],
+      'toolsets': [ 'target' ],
       'sources': [
         '<@(icu_src_i18n)'
       ],
@@ -108,6 +108,27 @@
       },
       'export_dependent_settings': [ 'icuucx' ],
     },
+    {
+      'target_name': 'icui18n-host',
+      'type': '<(library)',
+      'toolsets': [ 'host' ],
+      'sources': [
+        '<@(icu_src_i18n)'
+      ],
+      'include_dirs': [
+        '../../deps/icu/source/i18n',
+      ],
+      'defines': [
+        'U_I18N_IMPLEMENTATION=1',
+      ],
+      'dependencies': [ 'icuucx-host', 'icu_implementation', 'icu_uconfig' ],
+      'direct_dependent_settings': {
+        'include_dirs': [
+          '../../deps/icu/source/i18n',
+        ],
+      },
+      'export_dependent_settings': [ 'icuucx-host' ],
+    },
     # this library is only built for derb..
     {
       'target_name': 'icuio',
@@ -122,13 +143,13 @@
       'defines': [
         'U_IO_IMPLEMENTATION=1',
       ],
-      'dependencies': [ 'icuucx', 'icui18n', 'icu_implementation', 'icu_uconfig' ],
+      'dependencies': [ 'icuucx-host', 'icui18n-host', 'icu_implementation', 'icu_uconfig' ],
       'direct_dependent_settings': {
         'include_dirs': [
           '../../deps/icu/source/io',
         ],
       },
-      'export_dependent_settings': [ 'icuucx', 'icui18n' ],
+      'export_dependent_settings': [ 'icuucx-host', 'icui18n-host' ],
     },
     # This exports actual ICU data
     {
@@ -288,7 +309,19 @@
     {
       'target_name': 'icustubdata',
       'type': '<(library)',
-      'toolsets': [ 'host', 'target' ],
+      'toolsets': [ 'target' ],
+      'dependencies': [ 'icu_implementation' ],
+      'sources': [
+        '<@(icu_src_stubdata)'
+      ],
+      'include_dirs': [
+        '../../deps/icu/source/common',
+      ],
+    },
+    {
+      'target_name': 'icustubdata-host',
+      'type': '<(library)',
+      'toolsets': [ 'host' ],
       'dependencies': [ 'icu_implementation' ],
       'sources': [
         '<@(icu_src_stubdata)'
@@ -313,7 +346,35 @@
       'target_name': 'icuucx',
       'type': '<(library)',
       'dependencies': [ 'icu_implementation', 'icu_uconfig' ],
-      'toolsets': [ 'host', 'target' ],
+      'toolsets': [ 'target' ],
+      'sources': [
+        '<@(icu_src_common)'
+      ],
+      'include_dirs': [
+        '../../deps/icu/source/common',
+      ],
+      'defines': [
+        'U_COMMON_IMPLEMENTATION=1',
+      ],
+      'export_dependent_settings': [ 'icu_uconfig' ],
+      'direct_dependent_settings': {
+        'include_dirs': [
+          '../../deps/icu/source/common',
+        ],
+        'conditions': [
+          [ 'OS=="win"', {
+            'link_settings': {
+              'libraries': [ '-lAdvAPI32.Lib', '-lUser32.lib' ],
+            },
+          }],
+        ],
+      },
+    },
+    {
+      'target_name': 'icuucx-host',
+      'type': '<(library)',
+      'dependencies': [ 'icu_implementation', 'icu_uconfig' ],
+      'toolsets': [ 'host' ],
       'sources': [
         '<@(icu_src_common)'
       ],
@@ -342,7 +403,7 @@
       'target_name': 'icutools',
       'type': '<(library)',
       'toolsets': [ 'host' ],
-      'dependencies': [ 'icuucx', 'icui18n', 'icustubdata' ],
+      'dependencies': [ 'icuucx-host', 'icui18n-host', 'icustubdata-host' ],
       'sources': [
         '<@(icu_src_tools)'
       ],
@@ -358,7 +419,7 @@
           '../../deps/icu/source/tools/toolutil',
         ],
       },
-      'export_dependent_settings': [ 'icuucx', 'icui18n', 'icustubdata' ],
+      'export_dependent_settings': [ 'icuucx-host', 'icui18n-host', 'icustubdata-host' ],
     },
     # This tool is needed to rebuild .res files from .txt,
     # or to build index (res_index.txt) files for small-icu
     # or to build index (res_index.txt) files for small-icu
@@ -366,7 +427,7 @@
       'target_name': 'genrb',
       'type': 'executable',
       'toolsets': [ 'host' ],
-      'dependencies': [ 'icutools', 'icuucx', 'icui18n' ],
+      'dependencies': [ 'icutools', 'icuucx-host', 'icui18n-host' ],
       'sources': [
         '<@(icu_src_genrb)'
       ],
@@ -382,7 +443,7 @@
       'target_name': 'iculslocs',
       'toolsets': [ 'host' ],
       'type': 'executable',
-      'dependencies': [ 'icutools', 'icuucx', 'icui18n', 'icuio' ],
+      'dependencies': [ 'icutools', 'icuucx-host', 'icui18n-host', 'icuio' ],
       'sources': [
         'iculslocs.cc',
         'no-op.cc',
@@ -394,7 +455,7 @@
       'target_name': 'icupkg',
       'toolsets': [ 'host' ],
       'type': 'executable',
-      'dependencies': [ 'icutools', 'icuucx', 'icui18n' ],
+      'dependencies': [ 'icutools', 'icuucx-host', 'icui18n-host' ],
       'sources': [
         '<@(icu_src_icupkg)',
         'no-op.cc',
@@ -405,7 +466,7 @@
       'target_name': 'genccode',
       'toolsets': [ 'host' ],
       'type': 'executable',
-      'dependencies': [ 'icutools', 'icuucx', 'icui18n' ],
+      'dependencies': [ 'icutools', 'icuucx-host', 'icui18n-host' ],
       'sources': [
         '<@(icu_src_genccode)',
         'no-op.cc',

Maybe there's a better way to do this.

@srl295
Copy link
Member Author

srl295 commented Oct 24, 2014

@trevnorris Argh! That's exactly what I was trying to avoid- duplicating everything- seems very fragile. I posted this - https://groups.google.com/forum/#!topic/gyp-developer/NrFzsJd2l78

@trevnorris
Copy link

@srl295 Thanks for posting that. Though I'll be surprised if anyone responds. Honestly I'm not super familiar with gyp, but maybe there's a way to use a variable. I forget how gyp uses those, but something like:

'dependencies': [ 'icutools-<@(toolset_type)', 'icuucx-<@(toolset_type)', ... ]

No idea how that would actually work, but might be worth a try.

@srl295
Copy link
Member Author

srl295 commented Oct 31, 2014

@trevnorris in between Unicode committe and Unicode conference. . but, I can try to do something like the above if there's a short window to v0.12

@trevnorris
Copy link

@srl295 The above should be the quickest solution. Why don't we have that one on hand just in case we can't figure out something better by the time v0.12 is released.

srl295 added a commit to srl295/node-v0.x-archive that referenced this issue Nov 27, 2014
This is to implement
nodejs#7676 (comment)

* make `--with-intl=none` the default
 * Download, verify (md5), unpack ICU's zip if not there
 * update docs

There's a "list" of URLs being used, but right now only the
first is picked up. The logic works something like this:

* if there is no directory `deps/icu`,
 * if no zip file (currently `icu4c-54_1-src.zip`),
  * download zip file (icu-project.org -> sf.net)
 * verify the MD5 sum of the zipfile
  * if bad, print error and exit
 * unpack the zipfile into `deps/icu`
* if `deps/icu` now exists, use it, else fail with help text

Also:
* refactor some code into tools/configure.d/nodedownload.py
* add `intl-none` option for `vcbuild.bat`

To rebuild `deps/icu-small` - (not currently checked in)
```
bash tools/icu/prepare-icu-source.sh
```

Also:
Reduce space by about 1MB with ICU 54 (over without this patch).
Also trims a few other source files, but only conditional on the exact ICU
version used. This is to future-proof - a file that is unneeded now may
be needed in future ICUs.
@srl295 srl295 changed the title enable i18n by default, flexible options Automatically download ICU. Keep --with-intl=none as the default. Nov 27, 2014
@srl295 srl295 changed the title Automatically download ICU. Keep --with-intl=none as the default. Intl: Build "English-only" binaries for the distribution. Nov 27, 2014
srl295 added a commit to srl295/node-v0.x-archive that referenced this issue Dec 9, 2014
This is to implement
nodejs#7676 (comment)

* make `--with-intl=none` the default
 * Download, verify (md5), unpack ICU's zip if not there
 * update docs
* add a test

There's a "list" of URLs being used, but right now only the
first is picked up. The logic works something like this:

* if there is no directory `deps/icu`,
 * if no zip file (currently `icu4c-54_1-src.zip`),
  * download zip file (icu-project.org -> sf.net)
 * verify the MD5 sum of the zipfile
  * if bad, print error and exit
 * unpack the zipfile into `deps/icu`
* if `deps/icu` now exists, use it, else fail with help text

Also:
* refactor some code into tools/configure.d/nodedownload.py
* add `intl-none` option for `vcbuild.bat`

To rebuild `deps/icu-small` - (not currently checked in)
```
bash tools/icu/prepare-icu-source.sh
```

Also:
Reduce space by about 1MB with ICU 54 (over without this patch).
Also trims a few other source files, but only conditional on the exact ICU
version used. This is to future-proof - a file that is unneeded now may
be needed in future ICUs.
srl295 added a commit to srl295/node-v0.x-archive that referenced this issue Dec 11, 2014
This is to implement
nodejs#7676 (comment)

* make `--with-intl=none` the default
 * Download, verify (md5), unpack ICU's zip if not there
 * update docs
* add a test

There's a "list" of URLs being used, but right now only the
first is picked up. The logic works something like this:

* if there is no directory `deps/icu`,
 * if no zip file (currently `icu4c-54_1-src.zip`),
  * download zip file (icu-project.org -> sf.net)
 * verify the MD5 sum of the zipfile
  * if bad, print error and exit
 * unpack the zipfile into `deps/icu`
* if `deps/icu` now exists, use it, else fail with help text

Also:
* refactor some code into tools/configure.d/nodedownload.py
* add `intl-none` option for `vcbuild.bat`

To rebuild `deps/icu-small` - (not currently checked in)
```
bash tools/icu/prepare-icu-source.sh
```

Reduce space by about 1MB with ICU 54 (over without this patch).
Also trims a few other source files, but only conditional on the exact ICU
version used. This is to future-proof - a file that is unneeded now may
be needed in future ICUs.
srl295 added a commit to srl295/node-v0.x-archive that referenced this issue Dec 13, 2014
This is to implement
nodejs#7676 (comment)

* make `--with-intl=none` the default
 * Download, verify (md5), unpack ICU's zip if not there
 * update docs
* add a test

There's a "list" of URLs being used, but right now only the
first is picked up. The logic works something like this:

* if there is no directory `deps/icu`,
 * if no zip file (currently `icu4c-54_1-src.zip`),
  * download zip file (icu-project.org -> sf.net)
 * verify the MD5 sum of the zipfile
  * if bad, print error and exit
 * unpack the zipfile into `deps/icu`
* if `deps/icu` now exists, use it, else fail with help text

Also:
* refactor some code into tools/configure.d/nodedownload.py
* add `intl-none` option for `vcbuild.bat`

To rebuild `deps/icu-small` - (not currently checked in)
```
bash tools/icu/prepare-icu-source.sh
```

Reduce space by about 1MB with ICU 54 (over without this patch).
Also trims a few other source files, but only conditional on the exact ICU
version used. This is to future-proof - a file that is unneeded now may
be needed in future ICUs.
trevnorris pushed a commit that referenced this issue Jan 3, 2015
Make "--with-intl=none" the default and add "intl-none" option to
vcbuild.bat.

If icu data is missing print a warning unless either --download=all or
--download=icu is set. If set then automatically download, verify (MD5)
and unpack the ICU data if not already available.

There's a "list" of URLs being used, but right now only the first is
picked up. The logic works something like this:

* If there is no directory deps/icu,
  * If no zip file (currently icu4c-54_1-src.zip),
    * Download zip file (icu-project.org -> sf.net)
  * Verify the MD5 sum of the zipfile
    * If bad, print error and exit
  * Unpack the zipfile into deps/icu
* If deps/icu now exists, use it, else fail with help text

Add the configuration option "--with-icu-source=..."

Usage:
  * --with-icu-source=/path/to/my/other/icu
  * --with-icu-source=/path/to/icu54.zip
  * --with-icu-source=/path/to/icu54.tgz
  * --with-icu-source=http://example.com/icu54.tar.bz2

Add the configuration option "--with-icu-locals=...".  Allows choosing
which locales are used in the "small-icu" case.

Example:
    configure --with-intl=small-icu --with-icu-locales=tlh,grc,nl

(Also note that as of this writing, neither Klingon nor Ancient Greek
are in upstream CLDR data. Serving suggestion only.)

Don't use hard coded ../../out paths on windows. This was suggested by
@misterdjules as it causes test failures.  With this fix, "out" is no
longer created on windows and the following can run properly:

    python tools/test.py simple

Reduce space by about 1MB with ICU 54 (over without this patch). Also
trims a few other source files, but only conditional on the exact ICU
version used. This is to future-proof - a file that is unneeded now may
be needed in future ICUs.

Also:
  * Update distclean to remove icu related files
  * Refactor some code into tools/configure.d/nodedownload.py
  * Update docs
  * Add test

PR-URL: #8719
Fixes: #7676 (comment)
[trev.norris@gmail.com small change to test's whitespace and logic]
Signed-off-by: Trevor Norris <trev.norris@gmail.com>
@misterdjules
Copy link

I find that having this "general discussion" issue makes it difficult to track the current state of this work.

If I understand correctly, here's what we want regarding the distribution of binaries that include ICU support:

  • First, release binaries with "small" ICU support, the target for that is the 0.11.15 milestone. This is already supported, only Jenkins jobs must be updated to use --with-intl=small-icu and vcbuild.bat small-icu.
  • When i18n: Allow locating ICU data relative to 'exec_path'. #8983 is implemented, include "full" ICU data in binary distributions (Windows and OS X installers, as well as binary tarball). This can be done in a later milestone, like 0.12.1, because it doesn't change node's API.

If we agree on that, I suggest closing this issue and creating two separate issues that correspond to the two points mentioned above.

@tjfontaine @srl295 @trevnorris What do you think?

@rvagg rvagg mentioned this issue Jan 6, 2015
@srl295
Copy link
Member Author

srl295 commented Jan 6, 2015

First, release binaries with "small" ICU support, the target for that is
the 0.11.15 milestone. This is already supported, only Jenkins jobs must be
updated to use --with-intl=small-icu and vcbuild.bat small-icu.

But, that's the only and entire point of #7676 - if that's true,, it should
be considered fixed.

@misterdjules
Copy link

@srl295 Ok, I was just confused by some comments that mention several sub-issues/sub-tasks that we may want to address separately. This issue also contains comments that were not updated and I think that just adds to the confusion.

Like I said I would rather have each issue/PR address only "atomic" issues so that we can track them more easily.

I just tested the small-icu build on all supported platforms here: http://jenkins.nodejs.org/job/nodejs-julien/235/ and here: http://jenkins.nodejs.org/job/nodejs-julien-windows/77/, and there is no regression. The failing test on Windows has nothing to do with ICU-related changes. It was introduced by b636ba8 and is already tracked by #8986.

However, in my previous comment I failed to mention that some more work needs to be done to have small-icu support enabled for some binary packages (OSX pkg and binary tarballs). Basically, the base Makefile needs to be updated to pass with-intl=small-icu to the configure script. I will take care of that.

So as soon as the changes needed to make binary packages include small-icu land, I suggest closing this issue and we'll start creating several more specific issues. @srl295 How does that sound?

@misterdjules misterdjules added this to the 0.11.15 milestone Jan 7, 2015
@misterdjules misterdjules self-assigned this Jan 7, 2015
@srl295
Copy link
Member Author

srl295 commented Jan 7, 2015

Sounds good to me... more later.

misterdjules pushed a commit to misterdjules/node that referenced this issue Jan 8, 2015
Invokes the configure script used to build binary packages
(OSX pkg, binary tarballs, pkgsrc, MSI) with --download=all
--with-intl=small-icu.

Also makes PACKAGEMAKER customizable, because PackageMaker is not
necessarily installed in /Developer on OSX anymore.

Tested all binary packages on Windows, OSX, Linux and SmartOS.

Fixes nodejs#7676.
@misterdjules
Copy link

@srl295 I just submitted #8994 that enables small-icu support for all binary packages.

misterdjules pushed a commit to misterdjules/node that referenced this issue Jan 15, 2015
Invokes the configure script used to build binary packages
(OSX pkg, binary tarballs, pkgsrc, MSI) with --download=all
--with-intl=small-icu.

Also makes PACKAGEMAKER customizable, because PackageMaker is not
necessarily installed in /Developer on OSX anymore.

Tested all binary packages on Windows, OSX, Linux and SmartOS.

Fixes nodejs#7676.

Reviewed-by: Steven R. Loomis <srl@icu-project.org>
Reviewed-by: Trevor Norris <trev.norris@gmail.com>
@misterdjules
Copy link

#8994 Landed in 67f87a7, so closing this issue. @srl295 Please ping me on #libuv when you have some time to discuss the next steps.

Thanks all!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants