-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apply case expansion to unicode property escapes within i
modifier
#97
Conversation
The commits also avoids querying caseFold flags when we are handling every single characters within a character sets.
'pattern': '[\\p{Lowercase_Letter}&&\\p{ASCII}](?-i:a)', | ||
'flags': 'iv', | ||
'options': { unicodeSetsFlag: 'transform', unicodePropertyEscapes: 'transform', modifiers: 'transform' }, | ||
'expected': '[A-Za-z\\u017F\\u212A](?:a)', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per tc39/proposal-regexp-v-flag#30, \u017F
and \u212A
should not have been included here as they are not in the simple case closure of the given set [\\p{Lowercase_Letter}&&\\p{ASCII}]
, i.e. (\u212A)
does not have simple case folding although k
's simple case folding includes \u212A
.
I think this should be addressed in a new PR, we can introduce a new SCC case mapping and hopefully this approach may fix #66 as well.
'options': { unicodePropertyEscapes: 'transform', modifiers: 'transform' }, | ||
'matches': ['ck', 'Ck', 'δk', 'Δk', '\u{118A8}k', '\u{118C8}k'], | ||
'nonMatches': ['cK', 'CK', 'δK', 'ΔK', '\u{118A8}K', '\u{118C8}K', 'c\u212A', 'C\u212A'], | ||
'expectedFlags': 'u' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did not test the transform result here because it is very likely that \p{Lowercase_Letter}
will change per every Unicode versions. Here we use the matches
and nonMatches
tests to ensure the transform result is good.
@@ -103,15 +105,15 @@ const modifiersFixtures = [ | |||
'expected': '(?:[Aa][^])', | |||
'expectedFlags': '' | |||
}, | |||
{ | |||
!IS_NODE_6 && { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two tests are failing on node.js 6. However, starting from Node.js 8, the test is passing as expected. I assume this is a node.js 6 bug and it has been fixed in later node versions. So we can skip this test for node.js 6.
Minimum reproduction:
// returns false on Node.js 6, true on Node.js 8 and above
/(?:[\u{10000}])k/u.test("\u{10000}k")
rewrite-pattern.js
Outdated
caseFoldUnicode = configNeedCaseFoldUnicode(); | ||
const singleChars = data.singleChars; | ||
if (caseFoldBMP || caseFoldUnicode) { | ||
for (const codepoint of singleChars.toArray()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mathiasbynens It'd be great if regenerate objects were iterable, to avoid having to go through the array :)
Fixes #96.
In this PR we also pack
caseFold
parameters into a bit array avoid per-character queries to these parameters.