Fix case-insensitive set operations (#104)
* fix: expand case foldings before intersection/subtraction

* fix: maintain config.modifiersData when we don't transform modifiers

* fix: pass through caseFoldFlags to computeClassStrings

* add more test cases

* fix: update the anchor/dot when modifiers are transformed

* add more test cases

* refactor: rename caseFold to caseEquivalents

In the spec, caseFold refers to mapping an uppercase letter to its lowercase form. Here we are actually adding case equivalents to a given set of characters, i.e. all characters that map to the same character via scf(). To avoid confusion, rename caseFold to caseEquivalents.

* build: emit one way mappings to iu-foldings

* polish: apply scf() to the class set operand

* test: add more test cases

* perf: apply scf only in intersection/subtraction

* fix: apply SCF on unicode escape and wW

* fix: generate \D and \S from UNICODE_IV_SET

* fix: call scf on character class range and pass through shouldApplySCF to nested class

* test: remove matches tests for node 6 compat

The matches are already tested in unicode-set.js

* Update data/character-class-escape-sets.js

* Update scripts/case-mappings.js

* Update scripts/character-class-escape-sets.js

---------

Co-authored-by: Mathias Bynens <mathias@qiwi.be>
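
The distinction drawn in the rename note (folding a character versus expanding a set with its case equivalents) can be sketched as follows. This is a minimal illustration, not the code from this commit: `TOY_SCF`, `scf` and `caseEquivalents` are hypothetical names, and the folding table is a tiny hand-picked subset rather than the generated iu-foldings data.

```javascript
// Toy simple-case-folding table (assumption: a hand-picked subset for
// illustration only; the real mappings are generated by the build scripts).
const TOY_SCF = new Map([
  [0x4B, 0x6B],   // LATIN CAPITAL LETTER K -> k
  [0x53, 0x73],   // LATIN CAPITAL LETTER S -> s
  [0x17F, 0x73],  // LATIN SMALL LETTER LONG S -> s
  [0x212A, 0x6B], // KELVIN SIGN -> k
]);

// scf(): map one code point to its folded form, or return it unchanged.
function scf(codePoint) {
  return TOY_SCF.has(codePoint) ? TOY_SCF.get(codePoint) : codePoint;
}

// caseEquivalents(): expand a set so that it also contains every code point
// (known to the toy table) that folds to the same value as one of its members.
function caseEquivalents(codePoints) {
  const folded = new Set([...codePoints].map(scf));
  const result = new Set(codePoints);
  for (const f of folded) result.add(f);
  for (const [from, to] of TOY_SCF) {
    if (folded.has(to)) result.add(from);
  }
  return result;
}

// Folding maps "S" to a single character; expanding yields a whole set.
const expanded = [...caseEquivalents(new Set([0x53]))].sort((a, b) => a - b);
console.log(expanded.map((cp) => cp.toString(16))); // [ '53', '73', '17f' ]
```

In this sketch, `scf` is many-to-one while `caseEquivalents` is one-to-many, which is why the two names are worth keeping apart.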
JLHwung and mathiasbynens authored Nov 21, 2024
1 parent c9db4c2 commit 924446a
Showing 8 changed files with 1,884 additions and 106 deletions.
22 changes: 22 additions & 0 deletions data/character-class-escape-sets.js
@@ -2,6 +2,7 @@
'use strict';

const regenerate = require('regenerate');
const UNICODE_IV_SET = require('./all-characters.js').UNICODE_IV_SET;

exports.REGULAR = new Map([
['d', regenerate()
@@ -103,3 +104,24 @@ exports.UNICODE_IGNORE_CASE = new Map([
.addRange(0x180, 0x2129)
.addRange(0x212B, 0x10FFFF)]
]);

exports.UNICODESET_IGNORE_CASE = new Map([
['d', regenerate()
.addRange(0x30, 0x39)],
['D', UNICODE_IV_SET.clone().remove(regenerate()
.addRange(0x30, 0x39))],
['s', regenerate(0x20, 0xA0, 0x1680, 0x202F, 0x205F, 0x3000, 0xFEFF)
.addRange(0x9, 0xD)
.addRange(0x2000, 0x200A)
.addRange(0x2028, 0x2029)],
['S', UNICODE_IV_SET.clone().remove(regenerate(0x20, 0xA0, 0x1680, 0x202F, 0x205F, 0x3000, 0xFEFF)
.addRange(0x9, 0xD)
.addRange(0x2000, 0x200A)
.addRange(0x2028, 0x2029))],
['w', regenerate(0x5F)
.addRange(0x30, 0x39)
.addRange(0x61, 0x7A)],
['W', UNICODE_IV_SET.clone().remove(regenerate(0x5F)
.addRange(0x30, 0x39)
.addRange(0x61, 0x7A))]
]);
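
The negated entries above are built as a universe minus the positive set. The sketch below shows that pattern with plain `Set` objects instead of `regenerate`, so it runs without dependencies. It rests on an assumption: that `UNICODE_IV_SET` contains only code points that are already case-folded, which the diff suggests (the `w` entry has no `A`-`Z` range) but does not show. `TOY_UNIVERSE` and `fromRanges` are hypothetical stand-ins restricted to ASCII.

```javascript
// Toy universe: ASCII code points left unchanged by lowercasing.
// Assumption: a stand-in for UNICODE_IV_SET, whose contents are not in this diff.
const TOY_UNIVERSE = new Set();
for (let cp = 0x00; cp <= 0x7F; cp++) {
  if (cp < 0x41 || cp > 0x5A) TOY_UNIVERSE.add(cp); // skip A-Z
}

// Build a set of code points from inclusive [start, end] ranges.
function fromRanges(...ranges) {
  const set = new Set();
  for (const [start, end] of ranges) {
    for (let cp = start; cp <= end; cp++) set.add(cp);
  }
  return set;
}

// Same shape as the 'w' / 'W' entries: the negated class is the universe
// with the positive class removed.
const w = fromRanges([0x5F, 0x5F], [0x30, 0x39], [0x61, 0x7A]);
const W = new Set([...TOY_UNIVERSE].filter((cp) => !w.has(cp)));

// An uppercase letter belongs to neither set, so an operand would need to be
// folded to 0x61 before it is tested against w or W.
console.log(w.has(0x41), W.has(0x41)); // false false
console.log(w.has(0x61), W.has(0x61)); // true false
```

Under that assumption, deriving `\D`, `\S` and `\W` from the folded universe keeps each pair complementary within the universe, which is what intersection and subtraction need once their operands have been folded.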