IntlDateFormatter with 'C' locale still broken on PHP 8.2 #12943

Closed
Girgias opened this issue Dec 12, 2023 · 8 comments

@Girgias
Member

Girgias commented Dec 12, 2023

<?php

$a = new IntlDateFormatter('C', IntlDateFormatter::FULL, IntlDateFormatter::FULL);
var_dump($a->getLocale(ULOC_VALID_LOCALE));
?>

Result:

Fatal error: Uncaught Error: Found unconstructed IntlDateFormatter in /in/qCifp:6
Stack trace:
#0 /in/qCifp(6): IntlDateFormatter->getLocale(1)
#1 {main}
  thrown in /in/qCifp on line 6

Process exited with code 255.

I would expect a notice or warning, but not a fatal error. Could you please explain the intended usage?

Originally posted by @hormus in #12568 (comment)

@Girgias
Member Author

Girgias commented Dec 12, 2023

It seems #12568 didn't completely resolve the issue.

@hormus

hormus commented Dec 12, 2023

@Girgias thank you for opening a new issue.
https://unicode-org.github.io/icu/userguide/locale/#canonicalization

From the ICU documentation:

Level 1 canonicalization. This operation performs minor, isolated changes, such as changing “en-us” to “en_US”. Level 1 canonicalization is not designed to handle “foreign” locale IDs (POSIX, .NET) but rather IDs that are in ICU format, but which do not have normalized case and delimiters. Level 1 canonicalization is accomplished by the ICU functions uloc_getName, Locale::createFromName, and Locale::Locale. The latter two APIs exist in both C++ and Java.

PHP uses Locale::createFromName: https://github.com/php/php-src/pull/12568/files/040927ed7946fb15cb2e3934db8a137f8794f32b#diff-a08e91205599dcf2fe2e21791185d07a67fa543d3b1cb51825eee0ae88fc24c6R113

Level 2 canonicalization. This operation may make major changes to the ID, possibly replacing entire elements of the ID. An example is changing “fr-fr@EURO” to “fr_FR@currency=EUR”. Level 2 canonicalization is designed to translate POSIX and .NET IDs, as well as nonstandard ICU locale IDs. Level 2 is a superset of level 1; every operation performed by level 1 is also performed by level 2. Level 2 canonicalization is performed by uloc_canonicalize and Locale::createCanonical. The latter API exists in both C++ and Java.

³ POSIX charset specifiers are deleted, e.g. “en_US.utf8” => “en_US”.
glibc 2.29 fixed a bug: en_US.UTF-8 started using the 12-hour format, as was intended. If you want the 24-hour format, use LC_TIME=C.UTF-8 or LC_ALL=C.UTF-8.
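A minimal PHP sketch of that codeset stripping, done by hand before calling intl. `posixToIcuLocale()` is a hypothetical helper, not part of ext/intl, and mapping `C`/`POSIX` to an `en_US` fallback is an assumption of this sketch rather than ICU behaviour:

```php
<?php
// Hypothetical helper (not part of ext/intl): strip the POSIX-only parts
// of a locale ID before handing it to intl.
function posixToIcuLocale(string $posix, string $fallback = 'en_US'): string
{
    // POSIX IDs have the form language[_territory][.codeset][@modifier].
    // Drop the @modifier entirely (this sketch does not translate it).
    $id = explode('@', $posix, 2)[0];
    // Drop the .codeset, e.g. "en_US.utf8" -> "en_US".
    $id = explode('.', $id, 2)[0];
    // "C" and "POSIX" name the POSIX locale; substitute an explicit fallback
    // (en_US is an assumption, chosen by the caller via $fallback).
    if ($id === '' || $id === 'C' || $id === 'POSIX') {
        return $fallback;
    }
    return $id;
}

var_dump(posixToIcuLocale('en_US.utf8')); // string(5) "en_US"
var_dump(posixToIcuLocale('C.UTF-8'));    // string(5) "en_US"
```

Unlike ICU level 2 canonicalization, this sketch simply drops `@modifier` suffixes instead of translating them (for example `@EURO` to `@currency=EUR`).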

@devnexen
Member

devnexen commented Dec 12, 2023

So the aforementioned fix was not intended to make intl accept locales such as C; those are still considered invalid. We could decide to add more abstraction so that they are accepted.
In the meantime, @hormus, would wrapping your code in try/catch, like here (example 3), work for you?
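A minimal sketch of the try/catch approach, assuming a hypothetical `dateFormatterOrFallback()` helper and `en_US` as a known-good fallback. It catches both `IntlException` and `Error`, because depending on the PHP version an invalid locale shows up either as an exception from the constructor or as the "Found unconstructed IntlDateFormatter" error on the first method call:

```php
<?php
// Hypothetical helper: build an IntlDateFormatter for an untrusted locale,
// falling back to a known-good locale instead of ending the script.
function dateFormatterOrFallback(string $locale, string $fallback = 'en_US'): IntlDateFormatter
{
    try {
        $fmt = new IntlDateFormatter($locale, IntlDateFormatter::FULL, IntlDateFormatter::NONE);
        // Depending on the PHP version, a bad locale either makes the
        // constructor throw IntlException or leaves an unconstructed object
        // whose first method call throws Error. Probe it once here.
        $fmt->getLocale(ULOC_VALID_LOCALE);
        return $fmt;
    } catch (IntlException | Error $e) {
        // Assumes $fallback itself is valid; if not, this throws.
        return new IntlDateFormatter($fallback, IntlDateFormatter::FULL, IntlDateFormatter::NONE);
    }
}
```

Usage, with `$userLocale` standing in for whatever the application receives: `echo dateFormatterOrFallback($userLocale)->format(time());`. Catching `Error` is deliberately broad here and would also hide unrelated programming errors inside the `try` block, so keep that block small.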

devnexen added a commit to devnexen/php-src that referenced this issue Dec 15, 2023
devnexen added a commit to devnexen/php-src that referenced this issue Dec 20, 2023
@hormus

hormus commented Dec 27, 2023

ICU canonicalization, level 1 or level 2:
Example 1. "C"
Example 2. "C.UTF-8" // POSIX syntax

<?php

error_reporting(PHP_INT_MAX);
ini_set("intl.error_level", E_WARNING);
//Locale::setDefault('it_IT');
$LocaleDefault1 = Locale::getDefault();
$LocaleDefault2 = 'C';

$copy = $LocaleDefault1 . '.';
if (strpos($copy, 'C.') === 0) { // matches "C" and "C.<charset>", but not e.g. "es_EC"
    $LocaleDefault1 = strtr(Locale::canonicalize($LocaleDefault1), array($LocaleDefault1 => 'en_US'));
    Locale::setDefault($LocaleDefault1);
    if (!$LocaleDefault2) {
        $LocaleDefault2 = $LocaleDefault1;
    }
}
$copy = $LocaleDefault2 . '.';
if (strpos($copy, 'C.') === 0) {
    $LocaleDefault2 = strtr(Locale::canonicalize($LocaleDefault2), array($LocaleDefault2 => 'en_US'));
} elseif (!$LocaleDefault2) { // Use the default locale for null or empty input
    $LocaleDefault2 = $LocaleDefault1;
}
$LocaleDefault3 = null;
try {
    $dateFormatter = new IntlDateFormatter($LocaleDefault2, IntlDateFormatter::FULL, IntlDateFormatter::NONE);
    $LocaleDefault3 = $dateFormatter->getLocale(ULOC_VALID_LOCALE);
} catch (IntlException | Error $e) {
    // IntlException: the constructor rejected the locale.
    // Error: "Found unconstructed IntlDateFormatter" on the first method call.
}
$b = intl_get_error_code();
var_dump($b, $LocaleDefault3, $LocaleDefault1);

?>

This is my PHP fallback; $LocaleDefault2 is the locale code provided by the user (input or output).

Locale::canonicalize('C') doesn't convert to en_US_POSIX for this user (#12561 (comment)), so my fallback explicitly converts it to en_US.

Locale::canonicalize is implemented by the get_icu_value_src_php function: https://github.com/php/php-src/blob/master/ext/intl/locale/locale_methods.c#L798-L805

@devnexen
Member

devnexen commented Mar 4, 2024

PHP 8.2.16/8.3.2 works for me with your code snippet.

@hormus

hormus commented Mar 4, 2024

@abc1000200 don't pass the string 'C' directly; use a wrapper instead ('C' is a POSIX-only locale name, not an ICU one).
#12943 (comment)

@Edvarian

Hello, I ran into a similar issue a few days ago, not when using 'C' as a locale, but when using the content of the 'Accept-Language' HTTP header.

I ran some tests using just this small snippet:

<?php
$locale = $_SERVER['HTTP_ACCEPT_LANGUAGE'];
echo "$locale<br/>";
$dateFormat = new \IntlDateFormatter($locale, \IntlDateFormatter::LONG, \IntlDateFormatter::NONE);

echo $dateFormat->format(\IntlCalendar::fromDateTime('06/22/2024'));
?>

My default language is French, but depending on the browser configuration, it may or may not work.

Everything works fine with any PHP version if the header is "fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7" (my default Chrome configuration).
But my Firefox browser has a different default order for locales: "fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3"
This worked without problems up to PHP 8.1.24/8.2.11, but with later releases I get an exception:
IntlException: datefmt_create: invalid locale: U_ILLEGAL_ARGUMENT_ERROR
PHP 8.3 throws a different one:
Fatal error: Found unconstructed IntlDateFormatter

So I've been tweaking my browser settings: changing the language order in Chrome to replicate my Firefox settings causes exactly the same crash. Same problem with MS Edge.
After several tries, it seems that if the first part of this string is a language identifier NOT followed by a hyphen, an exception is thrown ("en-,en-US;q=0.8,fr;q=0.5,fr-FR;q=0.3" works fine, but doesn't look valid to me).
Passing a single language ID (2 or 3 characters, like 'en' or 'eng') works, though.

So what is considered a valid/known locale? Should this header not be used anymore?

@devnexen
Copy link
Member

devnexen commented Jun 23, 2024

So what is considered a valid/known locale? Should this header not be used anymore?

I wouldn't say that, but you need to extract the locale you're interested in; e.g. fr-FR is valid, but fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7 isn't.
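A minimal sketch of that extraction step, assuming a hypothetical `acceptLanguageTags()` helper: it splits the header on commas, reads each optional `q=` weight (defaulting to 1), drops entries with `q=0` and the `*` wildcard, and returns the bare tags ordered best first, so only one tag at a time is handed to intl:

```php
<?php
// Hypothetical helper: list the language tags of an Accept-Language header,
// highest weight first, so a single tag can be passed to IntlDateFormatter.
function acceptLanguageTags(string $header): array
{
    $weighted = [];
    foreach (explode(',', $header) as $index => $part) {
        // Each entry is "tag" or "tag;q=weight"; a missing q means 1.
        $fields = array_map('trim', explode(';', $part));
        $tag = $fields[0];
        if ($tag === '' || $tag === '*') {
            continue; // skip empty entries and the wildcard
        }
        $q = 1.0;
        foreach (array_slice($fields, 1) as $param) {
            if (strncasecmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        if ($q > 0) { // q=0 means "not acceptable"
            $weighted[] = [$tag, $q, $index];
        }
    }
    // Sort by weight descending; equal weights keep their header order.
    usort($weighted, fn(array $a, array $b) => [$b[1], $a[2]] <=> [$a[1], $b[2]]);
    return array_column($weighted, 0);
}
```

For example, `$locale = acceptLanguageTags($_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '')[0] ?? 'en_US';` yields `fr` for the Firefox header quoted above, which can then be passed to `IntlDateFormatter`. ext/intl also provides `Locale::acceptFromHttp()`, which picks a single locale from the header and may be simpler if you don't need the full ordered list.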

5 participants
@devnexen @Girgias @hormus @Edvarian and others