Matching lone surrogates *only* #28
This doesn’t fully fix the issue. Doing so is hard, since JS regular expressions don’t support lookbehind.

```js
var regenerate = require('regenerate');

var set = regenerate().addRange(0xD800, 0xDBFF).addRange(0xDC00, 0xDFFF);
var regex = RegExp('^a(?:' + set.toString() + ')b$');
// currently, this results in `/^a(?:[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF])b$/`
console.log(regex.test('a\uD834b')); // expected: true; actual: true
// The (?:[^\uD800-\uDBFF]|^) prefix has to consume the code unit before the low
// surrogate, but 'a' was already consumed by the outer pattern, so this fails:
console.log(regex.test('a\uDC00b')); // expected: true; actual: false
```
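For comparison, engines that implement the lookbehind assertions that ES2018 later added can express "a low surrogate not preceded by a high surrogate" directly, without consuming the preceding code unit. The following is a minimal sketch, assuming a runtime with lookbehind support (e.g. a recent Node.js). The pattern is written by hand, not generated by regenerate, and the names `loneSurrogate`, `lookbehindRegex` and `unanchored` are illustrative only.

```javascript
// Hand-written pattern (NOT regenerate output): a lone high surrogate is one
// not followed by a low surrogate; a lone low surrogate is one not preceded by
// a high surrogate. The negative lookbehind (?<!...) requires ES2018 or later.
const loneSurrogate =
  '[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])' +
  '|(?<![\\uD800-\\uDBFF])[\\uDC00-\\uDFFF]';

const lookbehindRegex = new RegExp('^a(?:' + loneSurrogate + ')b$');

console.log(lookbehindRegex.test('a\uD834b'));       // true  (lone high surrogate)
console.log(lookbehindRegex.test('a\uDC00b'));       // true  (lone low surrogate)
console.log(lookbehindRegex.test('a\uD834\uDF06b')); // false (surrogate pair)

// Unanchored, the lookbehind also refuses to match the trailing half of a pair,
// because it only inspects the preceding code unit instead of consuming it.
const unanchored = new RegExp(loneSurrogate);
console.log(unanchored.test('\uD834\uDF06')); // false
console.log(unanchored.test('x\uDC00'));      // true
```

Because the lookbehind is zero-width, it can look at the `a` even though the outer pattern has already consumed it, which is exactly what the `(?:[^\uD800-\uDBFF]|^)` emulation cannot do.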
Given the above code, consider these three tests:

```js
console.log(regex.test('a\uD834b')); // test 1, expected: true
console.log(regex.test('a\uDC00b')); // test 2, expected: true
console.log(regex.test('a\uD834\uDF06b')); // test 3, expected: false
```

There are two options:

a. Either we pass tests 1 and 3 but fail test 2 (i.e. lone low surrogates aren’t matched accurately). This is what the current implementation in v1.2.1 does.

Which is the lesser evil: a or b?
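The trade-off in option a can be checked directly. The sketch below hard-codes the pattern quoted earlier in this thread as the current output (so it runs without regenerate installed) and runs the three tests against it; the `tests` table, its labels and the `results` array are illustrative, not part of regenerate.

```javascript
// Pattern quoted in the thread as the current regenerate output (v1.2.1).
// The (?:[^\uD800-\uDBFF]|^) prefix emulates a lookbehind by consuming the
// preceding code unit, so it cannot "see" the 'a' that ^a already matched.
const currentRegex =
  /^a(?:[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF])b$/;

const tests = [
  { name: 'test 1: lone high surrogate', input: 'a\uD834b', expected: true },
  { name: 'test 2: lone low surrogate', input: 'a\uDC00b', expected: true },
  { name: 'test 3: surrogate pair', input: 'a\uD834\uDF06b', expected: false },
];

// Run each test and record whether the actual result matches the expectation.
const results = tests.map(({ name, input, expected }) => {
  const actual = currentRegex.test(input);
  return { name, expected, actual, pass: actual === expected };
});

for (const r of results) {
  console.log(`${r.pass ? 'PASS' : 'FAIL'} ${r.name} (expected ${r.expected}, got ${r.actual})`);
}
// Option a: tests 1 and 3 pass, test 2 fails.
```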
As Marja said:
Let’s go with
See mathiasbynens/regexpu#16 and https://gist.github.com/mathiasbynens/bbe7f870208abcfec860.
Instead, it would make more sense to match lone surrogates only in such cases.