
Merge master into units-staging #29

Merged: 61 commits merged into units-staging from master on Mar 13, 2020


sffc (Owner) commented Mar 13, 2020

Checklist

jefgen and others added 30 commits January 20, 2020 14:58
Somehow these tests now fail on trunk.
Per https://mm.icann.org/pipermail/tz-announce/2019-July/000056.html
Brazil has canceled DST and will stay on standard time indefinitely.

Cherry-picked from: 11ad8d6
…ng statically linked ICU data from executables.
Compiled regular expression patterns make use of several shared common
UnicodeSets. This change simplifies the creation and use of these
static UnicodeSets.

- Pointer fields to the static sets are removed from the compiled patterns,
  and the static variables are accessed directly. The deleted pointers
  were a hold-over from earlier code that did not use shared statics.

- The UnicodeSet pattern literals are changed from hex constants to
  u"string literals".

- The size of fRuleSets (from regexst.h) is changed from a hard-coded 10
  to the number of UnicodeSets actually required. Doing this required
  a change to regexcst.pl to export the required size. Changing and
  rerunning this perl code resulted in massive but benign changes to
  the generated file regexcst.h, the result of perl having changed its
  order of enumeration of hashes since the file was last regenerated.

- UnicodeSets are frozen when possible, which should result in faster matching.
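A minimal C++ sketch of two of the points above, using hypothetical names (kDigitPatternHex, RuleSetIndex, kRuleSetCount) rather than ICU's actual identifiers and an illustrative pattern. It shows that a set pattern written as hex UTF-16 code units is the same string as the equivalent u"..." literal, and that a table sized by a count exported from a generated header always matches the number of sets in use. It does not model UnicodeSet itself or freezing.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Before: the pattern "[\p{Nd}]" spelled out as hex UTF-16 code units.
static const char16_t kDigitPatternHex[] = {
    0x5b, 0x5c, 0x70, 0x7b, 0x4e, 0x64, 0x7d, 0x5d, 0};

// After: the same pattern as a u"string literal", which is easier to review.
static const char16_t kDigitPatternLiteral[] = u"[\\p{Nd}]";

// Hypothetical stand-in for what a generated header could export:
// one index per shared set, with the last enumerator giving the count.
enum RuleSetIndex : std::size_t {
    kRuleSetDigitChar,
    kRuleSetRuleChar,
    kRuleSetWhiteSpace,
    kRuleSetLimit  // one past the last set
};
constexpr std::size_t kRuleSetCount = kRuleSetLimit;

// The shared table is sized by the exported count instead of a hard-coded 10,
// so it can neither hold unused slots nor be too small.
using SharedSetPatterns = std::array<std::u16string, kRuleSetCount>;
```

With this arrangement, adding a set to the generator changes kRuleSetCount, and the table size follows automatically.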
In regular expressions, word boundaries found with \b were incorrect in Unicode
mode (where an ICU word break iterator is used to find the boundaries) when the
text being matched was UTF-8 encoded.

The bug stemmed from a misunderstanding of how string indexes work with UText
and break iterators, leading to the inclusion of code to convert from UTF-8 to
UTF-16 indexing, when what was wanted was the original UTF-8 index everywhere.
Removing the indexing conversion fixes the problem.
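To illustrate the indexing mismatch, here is a standalone sketch (not ICU code; the function names are hypothetical). It assumes, as the message implies, that boundaries reported on UTF-8 text are native UTF-8 byte offsets. Converting such an offset to a UTF-16 index and then using it against the UTF-8 text lands in the wrong place as soon as a multi-byte character precedes it.

```cpp
#include <cstddef>
#include <string>

// Number of UTF-16 code units needed for the UTF-8 prefix s[0, byteIndex).
// Assumes well-formed UTF-8 and that byteIndex falls on a code point start.
std::size_t utf16IndexForUtf8Index(const std::string& s, std::size_t byteIndex) {
    std::size_t units = 0;
    for (std::size_t i = 0; i < byteIndex;) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        units += (len == 4) ? 2 : 1;  // supplementary code points need a surrogate pair
        i += len;
    }
    return units;
}

// True if byteIndex is a code point boundary in UTF-8 text, i.e. it does not
// point at a continuation byte (10xxxxxx). The end of the string counts.
bool isUtf8CodePointStart(const std::string& s, std::size_t byteIndex) {
    if (byteIndex >= s.size()) return byteIndex == s.size();
    return (static_cast<unsigned char>(s[byteIndex]) & 0xC0) != 0x80;
}
```

For "año" (bytes 61 C3 B1 6F), the boundary before "o" is byte 3 in UTF-8 but code unit 2 in UTF-16; applying 2 to the UTF-8 text points into the middle of "ñ", which is why keeping the original UTF-8 index is the correct fix.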
Adds `ICU_TIMEZONE_FILES_DIR_PREFIX_ENV_VAR`, similar to
`ICU_DATA_DIR_PREFIX_ENV_VAR`, that specifies an environment variable
to retrieve and prepend to the ICU time zone data file path.
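A sketch of the prefixing behavior, assuming (as with ICU_DATA_DIR_PREFIX_ENV_VAR) that the macro is defined at build time to the name of the environment variable, and that the variable's value is concatenated directly in front of the configured path. EXAMPLE_TZ_PREFIX_ENV_VAR, MY_APP_ROOT, and both functions are hypothetical illustrations, not ICU's code.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical build-time setting naming the environment variable to read.
#define EXAMPLE_TZ_PREFIX_ENV_VAR "MY_APP_ROOT"

// Prepend prefix to path; a null or empty prefix leaves path unchanged.
std::string prependPrefix(const char* prefix, const std::string& path) {
    if (prefix == nullptr || *prefix == '\0') return path;
    return std::string(prefix) + path;
}

// Resolve the time zone files directory by reading the configured
// environment variable and prepending its value (if set) to the path.
std::string resolveTimeZoneFilesDir(const std::string& configuredDir) {
    return prependPrefix(std::getenv(EXAMPLE_TZ_PREFIX_ENV_VAR), configuredDir);
}
```

Under these assumptions, with MY_APP_ROOT=/opt/app and a configured directory of /share/tz, the files would be looked up in /opt/app/share/tz.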
This change builds on Vincent Torri's changes.

This installs the ICU DLL files in $prefix/bin instead of $prefix/lib.

Note: To disable this change in behavior, edit the "mh-mingw*" file(s). If you
set the variable MINGW_MOVEDLLSTOBINDIR to NO instead of YES, the build will
retain the previous behavior of installing the DLLs into the lib folder.
Also recursively list out the contents of the install directory,
and run the icuinfo.exe program.
Change the implementation of grapheme cluster matching in regex to use an ICU
break iterator instead of a little one-off state machine.

The old implementation had fallen behind the Unicode UAX-29 specification for
grapheme clusters, and could not be easily updated.

The implementation follows the same general pattern that is used for finding
word boundaries with an ICU break iterator. In reviewing that code, a few
improvements to the handling of ICU error codes were also made.

Also note that this change adds a new dependency on break iteration. Regex
patterns that previously worked with ICU builds configured with no break
iteration will now fail, but only if they include \X for matching grapheme
cluster boundaries.
- Compared against ICU4C 65.1
- No substantive change; just dropped 'preview'
Since the move of the DLLs to bin/, the library names in the .pc files are
wrong. With ICU 65.1, icu-uc.pc contains

Libs: -L${libdir} -licuuc65 -licudt65

The version number should not appear. Indeed, the linker looks for the
libraries in $prefix/lib in the following order (see [1]):

libxxx.dll.a
xxx.dll.a
libxxx.a
cygxxx.dll
libxxx.dll
xxx.dll

As there is only the import library, with no versioning (which is normal),
there is a link error when using the ICU .pc files.

[1] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/win32.html
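The search order above can be modeled in a few lines to show why -licuuc65 fails. This is a simplified sketch: it checks a single directory, and it assumes the only ICU common library file installed there is an unversioned import library named libicuuc.dll.a. The function names are hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

// File names tried for "-l<name>", in the order listed above.
std::vector<std::string> ldCandidates(const std::string& name) {
    return {"lib" + name + ".dll.a", name + ".dll.a", "lib" + name + ".a",
            "cyg" + name + ".dll",   "lib" + name + ".dll", name + ".dll"};
}

// Returns the first candidate present in dir, or an empty string if none is.
std::string resolveLib(const std::string& name, const std::set<std::string>& dir) {
    for (const auto& candidate : ldCandidates(name)) {
        if (dir.count(candidate)) return candidate;
    }
    return "";
}
```

With $prefix/lib containing only libicuuc.dll.a, "-licuuc" resolves to that import library, while "-licuuc65" matches none of the six candidates, hence the link error.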
sffc and others added 28 commits February 21, 2020 18:21
Conflicts:
	icu4j/main/shared/data/icudata.jar
The issue shows up under valgrind or as an AddressSanitizer failure.
This makes fixes in order to run the icu4c tests (intltest, cintltst,
iotest, and icuinfo) cleanly under valgrind with --leak-check=full.
…rCycle

If you call the API getDefaultHourCycle on an empty DateTimePatternGenerator
instance (i.e., one with no locale), it calls UPRV_UNREACHABLE, which calls
abort(). We should return an error code instead of aborting.
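A sketch of the "error code instead of abort" pattern with hypothetical types: Status stands in for UErrorCode and PatternGenerator for DateTimePatternGenerator. The specific error value and the placeholder return value are assumptions, not what ICU actually returns. Following ICU's usual convention, the function also does nothing if the incoming status already indicates failure.

```cpp
#include <optional>

// Hypothetical, simplified status codes (stand-in for UErrorCode).
enum class Status { kOk, kInvalidState };

enum class HourCycle { kH11, kH12, kH23, kH24 };

// Hypothetical generator: the hour cycle is only known once locale data is loaded.
struct PatternGenerator {
    std::optional<HourCycle> defaultHourCycle;  // empty for an "empty" instance

    // Report an error through the out-parameter instead of aborting.
    HourCycle getDefaultHourCycle(Status& status) const {
        if (status != Status::kOk) return HourCycle::kH23;  // respect a prior failure
        if (!defaultHourCycle) {
            status = Status::kInvalidState;  // no locale data: signal the caller
            return HourCycle::kH23;          // placeholder value, ignored on error
        }
        return *defaultHourCycle;
    }
};
```

The caller checks the status after the call, just as with other status-taking ICU APIs, so an empty instance becomes a recoverable error rather than a crash.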
…ve defaultLocaleIndex field; constructor check if locales are equivalent to default, not just equal; simplify locale sorting; minor builder & test deflaking
…ce older accept-language-string parsing by LocalePriorityList
This adds a separate CI pipeline for running valgrind on ICU4C.

The Azure Pipeline images don't have valgrind installed by default though,
so we need to install valgrind first.

We also add `--error-exitcode=1` to the valgrind options, so that any
errors found by valgrind will fail the CI build.
…triggers.

It seems that having "pr:none" completely disables running on PRs, even
when explicitly triggered by a comment.
- add new key 4569BBC09DA846FC91CBD21CE1BBA44593CF2AE0
To print it,
$ CXXFLAGS="-DRBBI_DEBUG" ./runConfigureICU --enable-debug --disable-release  Linux/gcc --disable-layoutex
$ make clean
$ U_RBBIDEBUG="size" make
sffc merged commit 2eac56e into sffc:units-staging Mar 13, 2020
hugovdm mentioned this pull request Apr 7, 2020