Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply case expansion to unicode property escapes within i modifier #97

Merged
merged 6 commits into from
Oct 7, 2024

Conversation

JLHwung
Copy link
Collaborator

@JLHwung JLHwung commented Oct 3, 2024

Fixes #96.

In this PR we also pack caseFold parameters into a bit array avoid per-character queries to these parameters.

'pattern': '[\\p{Lowercase_Letter}&&\\p{ASCII}](?-i:a)',
'flags': 'iv',
'options': { unicodeSetsFlag: 'transform', unicodePropertyEscapes: 'transform', modifiers: 'transform' },
'expected': '[A-Za-z\\u017F\\u212A](?:a)',
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per tc39/proposal-regexp-v-flag#30, \u017F and \u212A should not have been included here as they are not in the simple case closure of the given set [\\p{Lowercase_Letter}&&\\p{ASCII}], i.e. (\u212A) does not have simple case folding although k's simple case folding includes \u212A.

I think this should be addressed in a new PR, we can introduce a new SCC case mapping and hopefully this approach may fix #66 as well.

'options': { unicodePropertyEscapes: 'transform', modifiers: 'transform' },
'matches': ['ck', 'Ck', 'δk', 'Δk', '\u{118A8}k', '\u{118C8}k'],
'nonMatches': ['cK', 'CK', 'δK', 'ΔK', '\u{118A8}K', '\u{118C8}K', 'c\u212A', 'C\u212A'],
'expectedFlags': 'u'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did not test the transform result here because it is very likely that \p{Lowercase_Letter} will change per every Unicode versions. Here we use the matches and nonMatches tests to ensure the transform result is good.

@@ -103,15 +105,15 @@ const modifiersFixtures = [
'expected': '(?:[Aa][^])',
'expectedFlags': ''
},
{
!IS_NODE_6 && {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two tests are failing on node.js 6. However, starting from Node.js 8, the test is passing as expected. I assume this is a node.js 6 bug and it has been fixed in later node versions. So we can skip this test for node.js 6.

Minimum reproduction:

// returns false on Node.js 6, true on Node.js 8 and above
/(?:[\u{10000}])k/u.test("\u{10000}k")

caseFoldUnicode = configNeedCaseFoldUnicode();
const singleChars = data.singleChars;
if (caseFoldBMP || caseFoldUnicode) {
for (const codepoint of singleChars.toArray()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mathiasbynens It'd be great if regenerate objects were iterable, to avoid having to go through the array :)

@JLHwung JLHwung merged commit b38de88 into mathiasbynens:main Oct 7, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants