toLocaleUpperCase seems broken for Greek on ICU 58 #9445
I would also expect
I just noticed that https://codereview.chromium.org/1812673005 mentions a
Yes, that flag needs to be turned on. In addition, with ICU 58, I can just use ICU's case conversion API for Greek uppercasing instead of relying on the ICU transliteration API.
* toLocaleUpperCase() and toLocaleLowerCase() do not function properly without this flag.
* Basic test case. The test case would fail if `--no_icu_case_mapping` was set.

Fixes: nodejs#9445
PR-URL: nodejs#9454
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
For the record, it is behind the flag because there is a performance issue for characters below U+0080 or U+0100.
FYI, a CL landed last night in V8's ToT (tip of tree) to enable icu_case_mapping by default. Performance for [U+0100, U+FFFF] is 60–80% of the non-ICU approach, but correctness trumped performance.
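To make the performance trade-off concrete, here is a minimal sketch (not V8's actual code) of the kind of fast path a non-ICU implementation can take: an ASCII-only string is uppercased with a simple loop, and anything else is handed to a full case mapper. `upperWithFastPath` and `SlowUpper` are hypothetical names; the full mapper is passed in as a stand-in for an ICU-backed routine, and ASCII is used instead of Latin-1 for simplicity. This shortcut is only valid for locale-independent `toUpperCase()`: under a Turkish or Azeri locale, even ASCII `i` uppercases to `İ` (U+0130).

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for a full (e.g. ICU-backed) case mapper.
using SlowUpper = std::function<std::u16string(const std::u16string&)>;

// Locale-independent uppercasing with an ASCII fast path.
std::u16string upperWithFastPath(const std::u16string& s, const SlowUpper& slow) {
  std::u16string out;
  out.reserve(s.size());
  for (char16_t c : s) {
    if (c >= 0x80) {
      // Any non-ASCII code unit: delegate the whole string to the full
      // mapper, since e.g. U+00DF (ß) uppercases to "SS" and changes length.
      return slow(s);
    }
    if (c >= u'a' && c <= u'z') {
      c = static_cast<char16_t>(c - (u'a' - u'A'));
    }
    out.push_back(c);
  }
  return out;
}
```

The design choice is that the cheap loop never has to consult case-mapping data; the cost of the ICU path is only paid when the input actually leaves the ASCII range.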
@jungshik @srl295 I'm a little confused here... the Turkish However... all of the Greek special case mappings appear to be defined as "Language-Insensitive" or "Unconditional" mappings! So why then is Am I missing something? 🧐
@markusicu can you comment on this last? ICU4C tests seem to also use

```cpp
// ICU4C test case: uppercase "Πατάτα" with u_strToUpper() under a given locale.
#include <unicode/ustring.h>    // u_strToUpper()
#include <unicode/ustdio.h>     // u_printf_u()
#include <unicode/errorcode.h>  // icu::ErrorCode

int main(int argc, const char **argv) {
  const char16_t in[] = u"Πατάτα";
  const char *loc = "el";  // or "" for the root locale
  char16_t str[256];
  icu::ErrorCode status;
  // srcLength -1: `in` is NUL-terminated; status converts to UErrorCode*.
  u_strToUpper(str, 256, in, -1, loc, status);
  u_printf_u(u"Status: %s, loc %s\n", status.errorName(), loc);
  if (status.isFailure()) {
    return 1;
  }
  u_printf_u(u"=> %S\n", str);
  return 0;
}
```

Output with `loc=""`:

```
Status: U_ZERO_ERROR, loc (root locale)
=> ΠΑΤΆΤΑ
```

Output with `loc="el"`:

```
Status: U_ZERO_ERROR, loc el
=> ΠΑΤΑΤΑ
```
The UTC has given up defining language-specific case mappings in the Unicode Character Database and in the Unicode core spec. It defers to CLDR+ICU for new work in this area. In particular, Greek uppercasing is quite complicated, and cannot be expressed with the limited machinery in UCD SpecialCasing.txt. It looks like Steven created this issue here because in ICU 58 I implemented proper Greek uppercasing. Look for "Greek" in https://icu.unicode.org/download/58
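The core difference can be illustrated with a deliberately simplified sketch. `greekUpperSimplified` and `greekUpperNoTonos` are hypothetical helpers, not ICU's algorithm: they uppercase precomposed monotonic Greek letters, drop the tonos (so ά becomes Α rather than Ά), and keep the dialytika. ICU 58's real implementation also handles combining marks and context-dependent cases that this per-code-unit table ignores, and characters outside the Greek block are left unchanged here for brevity.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified Greek uppercasing of one UTF-16 code unit.
// NOT ICU's algorithm: only precomposed monotonic Greek is covered.
char16_t greekUpperNoTonos(char16_t c) {
  switch (c) {
    // Lowercase with tonos -> capital without accent.
    case u'\u03AC': return u'\u0391';  // ά -> Α
    case u'\u03AD': return u'\u0395';  // έ -> Ε
    case u'\u03AE': return u'\u0397';  // ή -> Η
    case u'\u03AF': return u'\u0399';  // ί -> Ι
    case u'\u03CC': return u'\u039F';  // ό -> Ο
    case u'\u03CD': return u'\u03A5';  // ύ -> Υ
    case u'\u03CE': return u'\u03A9';  // ώ -> Ω
    // Capital with tonos -> capital without accent.
    case u'\u0386': return u'\u0391';  // Ά -> Α
    case u'\u0388': return u'\u0395';  // Έ -> Ε
    case u'\u0389': return u'\u0397';  // Ή -> Η
    case u'\u038A': return u'\u0399';  // Ί -> Ι
    case u'\u038C': return u'\u039F';  // Ό -> Ο
    case u'\u038E': return u'\u03A5';  // Ύ -> Υ
    case u'\u038F': return u'\u03A9';  // Ώ -> Ω
    // Dialytika (with or without tonos): keep dialytika, drop tonos.
    case u'\u0390': return u'\u03AA';  // ΐ -> Ϊ
    case u'\u03B0': return u'\u03AB';  // ΰ -> Ϋ
    case u'\u03CA': return u'\u03AA';  // ϊ -> Ϊ
    case u'\u03CB': return u'\u03AB';  // ϋ -> Ϋ
    case u'\u03C2': return u'\u03A3';  // final ς -> Σ
    default: break;
  }
  // Plain lowercase α..ω (U+03B1..U+03C9) map to Α..Ω by a fixed offset.
  if (c >= u'\u03B1' && c <= u'\u03C9') {
    return static_cast<char16_t>(c - 0x20);
  }
  return c;  // Everything else is left unchanged in this sketch.
}

std::u16string greekUpperSimplified(const std::u16string& s) {
  std::u16string out(s);
  for (char16_t& c : out) c = greekUpperNoTonos(c);
  return out;
}
```

For example, `greekUpperSimplified(u"Πατάτα")` returns `u"ΠΑΤΑΤΑ"`, matching the `el` output of the ICU4C test case, whereas root-locale full case mapping yields `ΠΑΤΆΤΑ`.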
@markusicu gotcha, makes sense. I guess that still raises the question, though: why would locale "en" (for example) not follow the same rules for Greek uppercasing as locale "el"? Shouldn't the Greek language have the final say on what proper Greek uppercasing is? It's not like the
@markusicu cool, all makes sense. Thanks for the explanations; this has been enlightening!
@markusicu yes, thanks for replying here. Will leave this issue closed.
ICU 58 (HEAD as of this writing)

The result should be ΠΑΤΑΤΑ (no accent). Investigate…
May be related to the hard-coded lists mentioned in http://bugs.icu-project.org/trac/ticket/12647
// cc @nodejs/intl @jshin