[PHP 8.4][Intl] Add grapheme_str_split #483

Open: wants to merge 1 commit into base: 1.x

Ayesh
Contributor

@Ayesh Ayesh commented Jun 5, 2024

Adds a polyfill for the grapheme_str_split function added in PHP 8.4.

Requires PHP 7.3 because the polyfill is based on the \X regex escape, which only works properly with PCRE2, and PCRE2 only comes with PHP 7.3+.

Further, there are some cases where the polyfill cannot correctly split complex characters (such as two consecutive country flag emojis). This is now fixed upstream in PCRE2Project/pcre2#410. However, that fix will likely only make it to PHP 8.4.
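
A minimal sketch of the \X-based approach described above, for illustration only; it is not the code from this PR. The function name `grapheme_str_split_sketch` is hypothetical, the `$length` parameter is assumed to group clusters into chunks, and error handling is simplified to returning `false`. How complex clusters are split depends on the PCRE version PHP is linked against.

```php
<?php

/**
 * Hypothetical sketch: split a UTF-8 string into grapheme clusters with
 * PCRE's \X escape, then group the clusters into chunks of $length.
 *
 * @return string[]|false
 */
function grapheme_str_split_sketch(string $string, int $length = 1)
{
    if ($length < 1) {
        // Invalid chunk length; this sketch simply reports failure.
        return false;
    }

    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    // preg_match_all() returns false when the subject is not valid UTF-8.
    if (false === preg_match_all('/\X/u', $string, $matches)) {
        return false;
    }

    if (1 === $length) {
        return $matches[0];
    }

    // Join every $length consecutive clusters into a single chunk.
    return array_map(
        function (array $chunk): string {
            return implode('', $chunk);
        },
        array_chunk($matches[0], $length)
    );
}

// "e" followed by U+0301 (combining acute accent) stays one cluster.
var_dump(grapheme_str_split_sketch("e\u{301}a"));
```

Collecting all matches first and chunking afterwards keeps the regex trivial; a pattern such as `\X{1,n}` might do the same in one pass, but that variant is not sketched here.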

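References:

 - RFC: Grapheme cluster for str_split function: grapheme_str_split (https://wiki.php.net/rfc/grapheme_str_split)
 - PHP.Watch: PHP 8.4: New grapheme_str_split function (https://php.watch/versions/8.4/grapheme_str_split)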

@Ayesh
Contributor Author

Ayesh commented Jun 5, 2024

(working on the Intl changes, I'll mark the PR ready then)

@derrabus
Member

derrabus commented Jun 5, 2024

Thank you. I think we should add this function to the intl-grapheme polyfill as well.

@Ayesh Ayesh force-pushed the grapheme_str_split branch 2 times, most recently from 75b1867 to 123cf13 Compare June 8, 2024 10:18
@Ayesh Ayesh marked this pull request as ready for review June 8, 2024 10:21
@Ayesh Ayesh force-pushed the grapheme_str_split branch from 123cf13 to 3e8ced0 Compare June 8, 2024 10:21
@Ayesh
Contributor Author

Ayesh commented Jun 8, 2024

Thank you @derrabus - I added the polyfill and tests for grapheme_str_split to the Intl polyfill too.

README.md (outdated, resolved)
src/Intl/Grapheme/Grapheme.php (outdated, resolved; 3 threads)
src/Intl/Grapheme/bootstrap.php (outdated, resolved)
src/Intl/Grapheme/bootstrap73.php (outdated, resolved)
@nicolas-grekas
Member

Friendly ping @Ayesh :)

@Ayesh Ayesh force-pushed the grapheme_str_split branch 7 times, most recently from 0d90a13 to 7a429f5 Compare September 9, 2024 14:28
@Ayesh
Contributor Author

Ayesh commented Sep 9, 2024

Thank you @nicolas-grekas - those were really helpful comments; I addressed them and force-pushed. The \X regex polyfill for PCRE1 is very cool and worked beautifully, 10,000 IQ regex 🤯 :)

Member

@nicolas-grekas nicolas-grekas left a comment


Just one question + minor CS things and GTM

tests/Intl/Grapheme/GraphemeTest.php (outdated, resolved; 3 threads)
tests/Php84/Php84Test.php (outdated, resolved; 2 threads)
@Ayesh Ayesh force-pushed the grapheme_str_split branch from 7a429f5 to bf4a1a7 Compare September 9, 2024 14:42
@Ayesh
Contributor Author

Ayesh commented Sep 9, 2024

One last push, thank you for being patient with this 💜

@nicolas-grekas
Member

So we have a test failure on PHP 7.2 :)
Maybe we should remove the corresponding test case? The fallback regexp doesn't account for ZWJ emojis IIRC
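
To illustrate why ZWJ emojis are a hard case, here is a small sketch (the helper name `code_points_sketch` is hypothetical and not part of the polyfill). It shows that a single visible "family" emoji is really several code points glued together by U+200D (zero width joiner), so a splitter that does not know the joiner rule would presumably cut the sequence apart.

```php
<?php

// Hypothetical helper: split a UTF-8 string into individual code points.
function code_points_sketch(string $string): array
{
    $parts = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on invalid UTF-8; normalise to an empty list.
    return false === $parts ? [] : $parts;
}

// MAN + ZWJ + WOMAN + ZWJ + GIRL: usually rendered as one emoji.
$family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}";

$parts = code_points_sketch($family);

echo count($parts), "\n"; // 5 code points behind one user-perceived character
```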

@Ayesh Ayesh force-pushed the grapheme_str_split branch 2 times, most recently from 84bcd48 to cf9281b Compare September 9, 2024 15:23
src/Intl/Grapheme/README.md (outdated, resolved)
@Ayesh
Contributor Author

Ayesh commented Sep 9, 2024

Perfect, fixed.
So far, the tests exclude two cases: the ZW joiner case on PHP 7.2, and the case affected by a known \X bug in PCRE2 < 10.44, regardless of the PHP version.
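
A rough sketch of how such a version gate could look, assuming the 10.44 threshold mentioned above. Both function names are hypothetical and this is not the check used in the PR's tests. It relies only on the built-in `PCRE_VERSION` constant and `version_compare()`.

```php
<?php

// PCRE_VERSION looks like "10.42 2022-12-11"; keep only the version number.
function pcre_version_number_sketch(string $pcreVersion = PCRE_VERSION): string
{
    return explode(' ', $pcreVersion, 2)[0];
}

// Assumption: \X behaves correctly for the excluded cases from PCRE2 10.44 on.
// PCRE1 reports 8.x versions, so it also falls below the threshold.
function pcre_has_fixed_x_sketch(string $pcreVersion = PCRE_VERSION): bool
{
    return version_compare(pcre_version_number_sketch($pcreVersion), '10.44', '>=');
}

echo PCRE_VERSION, ' => ', pcre_has_fixed_x_sketch() ? 'run' : 'skip', "\n";
```

In a PHPUnit test, a check like this could guard the affected data sets, for example by calling `markTestSkipped()` when it returns false.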

Add a polyfill for the `grapheme_str_split` function added in PHP 8.4.

Requires PHP 7.3 because the polyfill is based on the `\X` regex escape,
which only works properly with PCRE2, and PCRE2
[only comes with PHP 7.3+](https://php.watch/versions/7.3/pcre2).

Further, there are some cases where the polyfill cannot correctly split
complex characters (such as two consecutive country flag emojis). This is
now fixed in [PCRE2Project/pcre2#410](PCRE2Project/pcre2#410).
However, that fix will likely only make it to PHP 8.4.

References:
 - [RFC: Grapheme cluster for `str_split` function: `grapheme_str_split`](https://wiki.php.net/rfc/grapheme_str_split)
 - [PHP.Watch: PHP 8.4: New `grapheme_str_split` function](https://php.watch/versions/8.4/grapheme_str_split)
@Ayesh Ayesh force-pushed the grapheme_str_split branch from cf9281b to 4a7aa31 Compare September 9, 2024 15:26
4 participants