Add Unicode property escape support to RegExp #1295

jonathanj · 2024-02-03T13:07:06Z

Summary

Extends RegExp to handle \p and \P Unicode property escapes, standalone and within character classes, for Unicode mode. The implementation extends the existing idea of tables of codepoint ranges to cover all of the properties explicitly required by ES262.

The genUnicodeTable.py script was updated to generate code related to the binary and non-binary Unicode properties explicitly mentioned in ES262 (ref), based on the Unicode 15.1.0 data files.

Closes #1027

Test Plan

Added regexp_unicode_properties.js to the Hermes test suite
- Binary properties and the General_Category properties
- Non-binary properties (Script=Latin, Script_Extensions=Thai)
- Inverted character class escapes \p{…} and \P{…}
- All of the above atom escapes in and out of character classes
- Exercise the parser for the above forms, and incomplete forms
test262 suite via hermes/utils/testsuite/run_testsuite.py
- I had to use --test-skiplist, otherwise around 417 tests (in test262/test/built-ins/RegExp/property-escapes) are skipped with the reason "Skipping test with 'const'". This was confusing to me since most of these do pass, but I couldn't find an explanation for this skip.
- There are some failures here, at least one of which is caused by test262 not being updated for Unicode 15.1.0, which I see as a test suite issue rather than an implementation issue.

jonathanj · 2024-02-03T13:08:35Z

(I'm including this outside of the PR description, in case that's used for automation purposes.)

Looking for Feedback

- I have modified genUnicodeTable.py to fetch from HTTP URLs instead of FTP, because it is significantly faster, and that starts to matter now that there are about a dozen fetches. Is there a reason these were specifically FTP before?
- I'm sure there is low-hanging performance fruit (I'm not a C++ dev). I mostly extended what already existed to achieve a working result, but I'm very keen to hear any ideas.
- UnicodeData.inc is now several thousand lines longer than it was previously, because of all the category tables, but still far smaller than including ICU. I'm wondering if this comes as a shock, or is mostly in line with what was expected to support Unicode properties?
- The BuildingAndRunning.md docs suggest using hermes/utils/format.sh to format all code, but that fails for me because it expects clang-format version 12, which isn't in macOS Homebrew any more. I configured VS Code to use clang-format 17 (from Homebrew), but I'm not sure if the version difference will pose an issue?
- The contribution guidelines suggest "squashing your commits", which I took to mean "squash your PR to a single commit", but I realise now that maybe it means "squash commits for related concepts". I'm happy to rebase and break this into a few commits for each piece of work, if it'll make things easier.

tmikov · 2024-02-05T23:00:20Z

Hi! Thank you for this contribution, it must have been a lot of work.

We would like to land this in Hermes, but since this is a lot of code, I hope you can bear through our review process, which can be very detail oriented.

At first glance I see some things that likely need to be tweaked, like the static constructors of std::unordered_map and std::string. Another concern is that at first glance it looks like this change adds 300KB of binary size, which is quite significant - a "lean" version of Hermes compiled for Arm32 is about 800KB, so this would be close to a 40% increase. (BTW, I suspect part of the increase is due to the static constructors, so it will likely go down.)

In any case, this is required ECMAScript functionality, so we are very willing to work with you to land this, and I hope you are willing to work with us.

> The contribution guidelines suggest "squashing your commits", which I took to mean "squash your PR to a single commit", but I realise now that maybe it means "squash commits for related concepts". I'm happy to rebase and break this into a few commits for each piece of work, if it'll make things easier.

As a first step, I think it would be definitely very helpful if you break this into commits, to make it easier to review and iterate on each.

The contribution guideline to squash everything is generally well intentioned, because it assumes most contributions will be smaller and because Github does not provide the best UI for reviewing a stack of commits. But a sizable contribution like this really needs independent review of its parts.

jonathanj · 2024-02-06T07:05:14Z

> We would like to land this in Hermes, but since this is a lot of code, I hope you can bear through our review process, which can be very detail oriented.

> In any case, this is required ECMAScript functionality, so we are very willing to work with you to land this, and I hope you are willing to work with us.

I'm more than happy to go through your review process, and look forward to it!

> At first glance I see some things that likely need to be tweaked, like the static constructors of std::unordered_map and std::string. Another concern is that at first glance it looks like this change adds 300KB of binary size

This is exactly the kind of feedback I was hoping for, thank you. I knew writing this that this wouldn't necessarily be the best approach, but it was fairly easy to understand and iterate on. I'd be glad to see this evolve into a better solution.

> As a first step, I think it would be definitely very helpful if you break this into commits, to make it easier to review and iterate on each.

Of course, I'll get to that as soon as I can.

jonathanj · 2024-02-06T19:54:59Z

@tmikov I've restructured the original giant commit into several more digestible ones, I hope it helps make the review easier.

neildhar · 2024-02-13T23:22:02Z

Thanks for splitting this up, I think the first thing to do here is to try and reduce the size of the additional data so that we get a more realistic picture of the final size. The idea is to eliminate the need to emit static constructors to populate the data structures you're using, particularly unordered_map, vector, and string. Ideally, all of the data should be represented efficiently as constexpr data structures, optionally with some small runtime code to process them if necessary.

To pack all of the data into constexpr data structures, we can make the following transformations:

- Replace any std::vector with a separate static constexpr array, since we know their size and contents statically. Then, wherever they are needed, we can use an llvh::ArrayRef to refer to them. (We may need to pull in that llvh header if it isn't already available.)
- Replace any std::string with a std::string_view. Since these strings are all pointing to constant data, a constexpr string_view referencing the constant string will be much more efficient than dynamically constructing the string.
- Convert any unordered_map into a sorted array of key,value pairs. This allows the map to be encoded efficiently, and we can perform binary search to find an element.

I've demonstrated these transformations on a small snippet in https://godbolt.org/z/Mvr7qKxnT, showing how the current data structures can be changed into this format. Notice that the generated output is dramatically smaller, while encoding the same information.
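As a rough illustration of the three transformations, here is a minimal sketch. The entry type, table contents, and the names `kScriptNames` and `findEntry` are made up for this example and are not taken from the PR or the linked snippet; the real code would use llvh::ArrayRef rather than a raw array reference.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical entry type: maps a property alias to its canonical name.
struct NameMapEntry {
  std::string_view name;
  std::string_view canonical;
};

// Replaces an unordered_map<std::string, std::string>. The array is sorted by
// `name` and is constexpr, so it is emitted as constant data and needs no
// static constructor to populate it.
static constexpr NameMapEntry kScriptNames[] = {
    {"Latin", "Latin"},
    {"Latn", "Latin"},
    {"Thai", "Thai"},
};

// Binary search over a table sorted by `name`; returns nullptr if not found.
template <std::size_t N>
const NameMapEntry *findEntry(
    const NameMapEntry (&table)[N],
    std::string_view name) {
  const NameMapEntry *end = table + N;
  const NameMapEntry *it = std::lower_bound(
      table, end, name, [](const NameMapEntry &e, std::string_view n) {
        return e.name < n;
      });
  return (it != end && it->name == name) ? it : nullptr;
}
```

The lookup cost goes from expected O(1) to O(log n), which is usually an acceptable trade for tables of this size.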

As you make these changes, you can measure the effect they have on the final size. You can configure and build Hermes as follows:

cmake -S hermes -B build_opt -G Ninja -DCMAKE_BUILD_TYPE=MinSizeRel -DHERMES_ENABLE_DEBUGGER=OFF -DHERMES_IS_MOBILE_BUILD=ON
cd build_opt
ninja libhermes

To give you a rough sense of the numbers, the generated binary (on my M1 Mac) goes from 2.4MB before this PR to 2.6MB after it. Hopefully with these changes, the size gap will shrink considerably.

jonathanj · 2024-02-15T18:54:50Z

Thanks for the detailed feedback @neildhar, the godbolt output for those two snippets was really interesting! I've made the changes to genUnicodeTable.py and the utility functions to use the proposed data structures.

I ran the optimized build you suggested for main, this branch after the changes, and this branch before the changes.

-rwxr-xr-x@ 1 jonathan  staff  1728920 Feb 15 18:30 hermesc    # main
-rwxr-xr-x@ 1 jonathan  staff  1957320 Feb 15 19:48 hermesc    # this branch (after changes)
-rwxr-xr-x@ 1 jonathan  staff  1978584 Feb 15 17:38 hermesc    # this branch (before changes)

The difference between main and this branch after the changes is 228400 bytes, which is about the difference you mentioned in your own builds. The difference between this branch before and after the changes is 21264 bytes, which is smaller than I expected (although I guess ~20KB of assembly just to initialize static data is a lot). Is this about the size difference you expected?

I ran some rough calculations for the size of the generated Unicode data:

# Rough number of 32-bit ints
$ cat hermes/lib/Platform/Unicode/UnicodeData.inc | grep -o '0x' | wc -l
   36742
# Bytes of data
$ echo '36742 * 8' | bc 
293936

Seems like the ~200KB increase in binary size roughly matches up with the amount of raw data that has been added.

As far as I can tell the code now matches your suggested structures, but I'd love to find out that I overlooked something obvious. What do you think the next steps are, trying to improve how the data is stored?

neildhar · 2024-02-15T21:47:08Z

Thanks for making these changes so quickly, and for being so receptive to iterating on this! I agree that the improvement is somewhat smaller than I would have expected. The next step is probably to try and cut down dynamic relocations; it looks like they add tens of kilobytes to the size, and will also slow down startup.

This will be a slightly more involved transformation, since we're trying to have as few pointers as possible in the constant data. Right now, every string and array pointer in the data needs a dynamic relocation to set the actual pointer at runtime, based on where the library is loaded into memory.

The basic idea is to turn the array and string pointers into offsets instead of pointers. We need to merge all of the arrays of a given type into a single giant pool (same for strings). Then in each place where we currently have a pointer (like each string_view and ArrayRef), we would instead keep an offset into that large array, and a length. Depending on how large the arrays and elements are, we can use either 16 bit or 32 bit values.

This adds some marginal cost when accessing the data, but it represents the data much more efficiently (since the offsets will be smaller than pointers), and avoids the startup and size overhead of the dynamic relocations.

I repurposed the same example to demonstrate how this would look: https://godbolt.org/z/novqf4PdK
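A minimal sketch of the pooled layout described above, for readers who do not follow the link. All table contents and helper names here are invented for illustration (the field layout of the PR's actual entry types may differ), and `RangeSpan` stands in for llvh::ArrayRef.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Inclusive code point range (hypothetical layout).
struct CodePointRange {
  uint32_t first;
  uint32_t last;
};

// One pool holding every range table back to back.
static constexpr CodePointRange kRangePool[] = {
    {0x30, 0x39}, // index 0: '0'-'9'
    {0x41, 0x5A}, // index 1: 'A'-'Z'
    {0x61, 0x7A}, // index 2: 'a'-'z'
};

// One pool holding every name, concatenated without separators.
static constexpr char kStringPool[] = "DigitLetter";

// Entries store only small integers, so the table contains no pointers and
// needs no dynamic relocations when the library is loaded.
struct PooledEntry {
  uint16_t nameOffset;
  uint16_t nameSize;
  uint16_t rangeOffset;
  uint16_t rangeSize;
};

static constexpr PooledEntry kEntries[] = {
    {0, 5, 0, 1}, // "Digit"  -> kRangePool[0..1)
    {5, 6, 1, 2}, // "Letter" -> kRangePool[1..3)
};

// Stand-in for llvh::ArrayRef<CodePointRange>.
struct RangeSpan {
  const CodePointRange *data;
  std::size_t size;
};

// Views are rebuilt from offsets at lookup time.
inline std::string_view entryName(const PooledEntry &e) {
  return std::string_view(kStringPool + e.nameOffset, e.nameSize);
}

inline RangeSpan entryRanges(const PooledEntry &e) {
  return RangeSpan{kRangePool + e.rangeOffset, e.rangeSize};
}
```

Each entry here occupies 8 bytes, compared with 32 bytes for a pointer-plus-length pair of views on a typical 64-bit target.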

Note that the size win here will be less obvious in godbolt, but we should see an improvement in the final binary. I expect this to be on the same order as the code size reduction you observed from the previous change.

In the case of strings, it will also be worth doing some basic deduplication, since many of the strings are repeated.
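The deduplication step happens at generation time (in the PR, inside the Python script). The sketch below shows the idea in C++ only to stay consistent with the other examples; `StringPoolBuilder` is a hypothetical name, and using std::string and std::unordered_map is fine here because this code would run in the generator, not in the Hermes binary.

```cpp
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>

// Builds a single concatenated string pool, storing each distinct string once.
class StringPoolBuilder {
 public:
  // Returns {offset, size} of `s` within the pool, appending it only if it
  // has not been seen before.
  std::pair<uint16_t, uint16_t> intern(std::string_view s) {
    auto it = seen_.find(std::string(s));
    if (it == seen_.end()) {
      auto offset = static_cast<uint16_t>(pool_.size());
      it = seen_.emplace(std::string(s), offset).first;
      pool_.append(s);
    }
    return {it->second, static_cast<uint16_t>(s.size())};
  }

  const std::string &pool() const {
    return pool_;
  }

 private:
  std::string pool_;
  std::unordered_map<std::string, uint16_t> seen_;
};
```

A real generator would also verify that offsets and sizes fit the chosen integer width before casting.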

This should get us fairly close to the bare minimum needed to encode this data, which will guide how to include this in the build, and then we can proceed with reviewing the actual logic.

jonathanj · 2024-02-24T00:09:41Z

Thank you again for the excellent feedback! Near the beginning of this change, having a single large pool did occur to me but I didn't know about dynamic relocations, so I decided to avoid the complexity and would probably not have had the 3-stage lookup anyway.

Sorry about the longer turnaround on this iteration, between a busy day job and wrapping my head around how to restructure the Python logic, it just took me longer to get to. I ran the optimized build again and this is the new size:

-rwxr-xr-x@ 1 jonathan  staff  1880152 Feb 24 01:26 hermesc

Looks like a 75KB improvement, I ended up going with 16-bit values since none of the offset+size values were larger than 65535:

string_offset_bits: 12
string_size_bits: 5
range_pool_offset_bits: 15
range_pool_size_bits: 10
range_array_pool_offset_bits: 9
range_array_pool_size_bits: 3

Interesting (but maybe not useful) things:

- clang-format is no longer reformatting the ranges into columns of 3, so the file is now ±20K LOC. If I avoid inserting comments into the range pools, it goes back to creating columns of 3 again and the file is down to ±6K LOC.
- The original UNICODE_LETTERS, etc. tables (used in the lexer and other places) do duplicate around 500 ranges; they seem to contribute ~200 bytes to the binary size.

jonathanj · 2024-03-05T15:37:15Z

@neildhar Are there any other changes you'd like me to make, prior to moving on to reviewing the logic?

neildhar · 2024-03-08T23:44:37Z

@jonathanj Sorry for the delay, nope, I'll take a look at the actual logic next.

Given the size regression, we'll also want to retain the option to disable this. There are two steps to doing this (which you can do in parallel while I review the rest):

- Add a new CMake option HERMES_ENABLE_REGEXP_UNICODE_CHARACTER_CLASS that we can use to turn the feature off.
- Dump whether this is available under Features: when you run hermes -version. Check for this in the test262 runner and conditionally exclude this feature when it is not available.

jonathanj · 2024-03-13T13:37:23Z

Thanks @neildhar!

> Add a new CMake option HERMES_ENABLE_REGEXP_UNICODE_CHARACTER_CLASS that we can use to turn the feature off.

I ended up naming the option HERMES_ENABLE_UNICODE_REGEXP_PROPERTY_ESCAPES, I thought that aligned better with ES262's language and covers both atom and class escapes. Happy to change it if you feel I made a mistake. I think I defined the option with a default of ON, I'm not sure if that was the right default.

Do I need to specify this option in the gradle, podspec, Circle CI config, and other build files, like is done for HERMES_ENABLE_INTL or HERMES_ENABLE_DEBUGGER?

> Dump whether this is available under Features: when you run hermes -version. Check for this in the test262 runner and conditionally exclude this feature when it is not available.

Done. There are still quite a few regexp-unicode-property-escapes test262 tests skipped because of the const issue mentioned in the PR description.
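For context, a build option like this typically reaches the C++ code as a preprocessor definition. The sketch below only illustrates that pattern: it assumes the CMake option is forwarded as a macro of the same name, and the `enabledFeatures` function and the feature string are hypothetical, not the actual Hermes code.

```cpp
#include <string>
#include <vector>

// Assumption: the build system defines this macro to 0 or 1. Default to on
// when it is not provided, mirroring the option's default.
#ifndef HERMES_ENABLE_UNICODE_REGEXP_PROPERTY_ESCAPES
#define HERMES_ENABLE_UNICODE_REGEXP_PROPERTY_ESCAPES 1
#endif

// Hypothetical helper returning the list printed under "Features:".
std::vector<std::string> enabledFeatures() {
  std::vector<std::string> features;
#if HERMES_ENABLE_UNICODE_REGEXP_PROPERTY_ESCAPES
  // Placeholder label; the real string is whatever the test runner checks for.
  features.push_back("Unicode RegExp Property Escapes");
#endif
  return features;
}
```

A test runner can then parse the version output and skip the related test262 feature when the entry is missing.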

tmikov · 2024-03-14T02:53:50Z

@jonathanj I think we will keep this option always enabled for OSS releases (we are much more sensitive to binary size internally). If we set it to on by default, there is no need to worry about gradle, podspec, etc. About CircleCI: I think we should also keep it on (though I am curious what @neildhar thinks).

neildhar · 2024-03-14T17:12:52Z

I agree, it seems simplest to turn this on by default, and not override it in any of our build configs. We can specify different defaults when building internally.

neildhar

My apologies for the delay in getting to this.

I've reviewed the C++ changes, which mostly look good. It's awesome that you were able to familiarise yourself with our RegExp implementation and the nuances of this feature!

I have one correctness concern, and some organisational suggestions that should hopefully simplify things.

neildhar · 2024-03-28T01:14:08Z

lib/Platform/Unicode/CharacterProperties.cpp

+  }
+
+  // Canonicalize the property name.
+  auto canonicalNameEntry = findNameMapEntry(nameMapStart, nameMapEnd, key);


I'm finding it a little hard to follow the naming here between name, value, and key. Are there more precise terms we can use? Otherwise, a comment describing what each thing refers to would be helpful.

I can't say I blame you, I think it's confusing too and would really appreciate finding some better names. Maybe I can explain the terms, and you can decide if there are better names or if perhaps a comment is more useful.

The RegExp Unicode property syntax accepts:

- Binary properties, where the presence of the property implies a value of true, e.g. ASCII_Hex_Digit, Emoji, etc.
  - Additionally, all of General_Category is also allowed to be specified as a binary property, e.g. Lowercase_Letter, N, Symbol, etc.
- Properties with a value, where the value is not a binary choice, e.g. Script=Latin, Script_Extensions=Katakana, General_Category=Symbol, etc.

The "property name" and "property value" terminology comes from ES262.

The "key" terminology is my own, and is imprecise because it is used to resolve a name to its canonical form, by looking it up in canonicalPropertyNameMap_<CATEGORY>, where CATEGORY is determined as follows:

- If there is no propertyValue (in the PropertyName=PropertyValue sense) given, then CATEGORY is assumed to be BinaryProperty.
  - If that assumption is incorrect, and it's not in the binary property category, then try GeneralCategory.
- If there is a propertyValue, then CATEGORY is determined by propertyName (e.g. canonicalPropertyNameMap_Script).

In both cases key needs to be resolved according to the canonical name of the value, but in the binary property case the "value" can be thought of as something like General_Category=VALUE. I chose key in the sense of value = dict[key] but maybe a term like needle better conveys the "we need to find this" idea.

Makes sense. Perhaps the best thing to do is to compute canonicalNameEntry in each of the cases, so the logic is easier to follow. It would also avoid duplicating the lookup for binary properties. So for example:

NameMapEntry *canonicalNameEntry;
llvh::ArrayRef<RangeMapEntry> rangeMap;
if (propertyValue.empty()) {
  // There was no property value, this is either a binary property or a value
  // from General_Category, as per `LoneUnicodePropertyNameOrValue`.
  if (canonicalNameEntry = findNameMapEntry(canonicalPropertyNameMap_BinaryProperty, propertyName))
    rangeMap = unicodePropertyRangeMap_BinaryProperty;
  else if (canonicalNameEntry = findNameMapEntry(canonicalPropertyNameMap_GeneralCategory, propertyName))
    rangeMap = unicodePropertyRangeMap_GeneralCategory;
}
...

That's a great suggestion, thanks! It definitely helps localise the initial setup, before moving onto the range lookups.

I did have to do findMapEntry(llvh::ArrayRef(…), …); to appease the compiler, let me know if you have any thoughts on that.
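To make the resolution order discussed in this thread concrete, here is a self-contained sketch. The tables are tiny stand-ins, and `Category`, `Resolved`, and `resolve` are hypothetical names; the PR's real function works on generated tables and returns range data, and Script_Extensions is left out for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

struct NameMapEntry {
  std::string_view name;
  std::string_view canonical;
};

enum class Category { None, BinaryProperty, GeneralCategory, Script };

// Stand-in tables, each sorted by `name`.
static constexpr NameMapEntry kBinaryProperty[] = {
    {"AHex", "ASCII_Hex_Digit"},
    {"ASCII_Hex_Digit", "ASCII_Hex_Digit"},
    {"Emoji", "Emoji"},
};
static constexpr NameMapEntry kGeneralCategory[] = {
    {"L", "Letter"},
    {"Letter", "Letter"},
    {"Lu", "Uppercase_Letter"},
    {"Uppercase_Letter", "Uppercase_Letter"},
};
static constexpr NameMapEntry kScript[] = {
    {"Latin", "Latin"},
    {"Latn", "Latin"},
    {"Thai", "Thai"},
};

template <std::size_t N>
const NameMapEntry *findMapEntry(
    const NameMapEntry (&table)[N],
    std::string_view name) {
  const NameMapEntry *end = table + N;
  const NameMapEntry *it = std::lower_bound(
      table, end, name, [](const NameMapEntry &e, std::string_view n) {
        return e.name < n;
      });
  return (it != end && it->name == name) ? it : nullptr;
}

struct Resolved {
  Category category;
  std::string_view canonical;
};

Resolved resolve(std::string_view propertyName, std::string_view propertyValue) {
  if (propertyValue.empty()) {
    // Lone name: try binary properties first, then General_Category values.
    if (const auto *e = findMapEntry(kBinaryProperty, propertyName))
      return {Category::BinaryProperty, e->canonical};
    if (const auto *e = findMapEntry(kGeneralCategory, propertyName))
      return {Category::GeneralCategory, e->canonical};
    return {Category::None, {}};
  }
  // Name=Value: the property name selects which table the value is looked up in.
  if (propertyName == "General_Category" || propertyName == "gc") {
    if (const auto *e = findMapEntry(kGeneralCategory, propertyValue))
      return {Category::GeneralCategory, e->canonical};
  } else if (propertyName == "Script" || propertyName == "sc") {
    if (const auto *e = findMapEntry(kScript, propertyValue))
      return {Category::Script, e->canonical};
  }
  return {Category::None, {}};
}
```

With this shape, a lone `Script` (no value) resolves to `Category::None`, which matches the expectation that a non-binary property without a value is rejected.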

neildhar · 2024-03-29T19:07:09Z

I'm working through the generation script now, and have some suggestions that would make it much easier to review:

- Could we update the Unicode version in a separate commit from all of the changes to the script? This will be a useful sanity check that the changes to the script do not change the data emitted for existing fields like UNICODE_LETTERS.
- It would be very helpful to have high level comments in the script about the transformations that some of these functions do. Specifically, the format of the incoming data, and the structure that it produces.
- The new output should use the same 3-column style for the data as the old output does.

Download a significant portion of the Unicode database in order to generate codepoint tables, and property name/value canonicalization, for the Unicode 15.1.0 binary and non-binary property names/aliases supported by ES262. Certain properties that are not explicitly defined in the Unicode database are derived here, such as "Assigned", and "Any".

Resolves the property name/value to a Unicode range table, then directly adds all entries from that table to a CodePointSet. It also handles the inverted case (i.e. `\P`) where all ranges _not_ in a table are added to the CodePointSet.
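The inverted case amounts to taking the complement of a sorted range table over the full code point space. A minimal sketch, assuming inclusive, sorted, non-overlapping ranges; `CodePointRange` and `invertRanges` are hypothetical names, and the real code adds ranges to a CodePointSet rather than returning a vector.

```cpp
#include <cstdint>
#include <vector>

// Inclusive range of code points.
struct CodePointRange {
  uint32_t first;
  uint32_t last;
};

constexpr uint32_t kMaxCodePoint = 0x10FFFF;

// Returns every range NOT covered by `ranges`.
// Precondition: `ranges` is sorted and non-overlapping.
std::vector<CodePointRange> invertRanges(
    const std::vector<CodePointRange> &ranges) {
  std::vector<CodePointRange> out;
  uint32_t next = 0; // first code point not yet accounted for
  for (const CodePointRange &r : ranges) {
    if (r.first > next)
      out.push_back({next, r.first - 1}); // gap before this range
    next = r.last + 1;
  }
  if (next <= kMaxCodePoint)
    out.push_back({next, kMaxCodePoint}); // tail after the last range
  return out;
}
```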

In Unicode mode, RegExp is now able to parse `\p{...}` and `\P{...}` atom escapes, both in and out of character classes. Outside of Unicode mode, `\p` and `\P` are unnecessary escapings of literal `p` and `P` respectively. A new error type `InvalidPropertyName` was added, which is returned for all property-name related parsing/resolving issues. This seems to be consistent with other JS engines.
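A simplified sketch of splitting the text after `\p` or `\P` into a name and an optional value, rejecting incomplete forms. This is not the Hermes parser; `PropertyEscape` and `parsePropertyEscape` are hypothetical names, and the character checks follow my reading of the ES262 grammar (letters and `_` in names, plus digits in values and lone names).

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string_view>

struct PropertyEscape {
  std::string_view name;
  std::string_view value; // empty for the lone `\p{Name}` form
};

static bool isNameChar(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

static bool isValueChar(char c) {
  return isNameChar(c) || (c >= '0' && c <= '9');
}

// Parses the text following `\p` or `\P`, e.g. "{Script=Latin}".
// Returns std::nullopt for malformed or incomplete input.
std::optional<PropertyEscape> parsePropertyEscape(std::string_view s) {
  if (s.empty() || s.front() != '{')
    return std::nullopt;
  std::size_t close = s.find('}');
  if (close == std::string_view::npos)
    return std::nullopt;
  std::string_view body = s.substr(1, close - 1);
  std::size_t eq = body.find('=');
  if (eq == std::string_view::npos) {
    // Lone name or value: may contain digits.
    if (body.empty() || !std::all_of(body.begin(), body.end(), isValueChar))
      return std::nullopt;
    return PropertyEscape{body, {}};
  }
  std::string_view name = body.substr(0, eq);
  std::string_view value = body.substr(eq + 1);
  if (name.empty() || value.empty() ||
      !std::all_of(name.begin(), name.end(), isNameChar) ||
      !std::all_of(value.begin(), value.end(), isValueChar))
    return std::nullopt;
  return PropertyEscape{name, value};
}
```

A caller would map a `std::nullopt` result to the parser's error for invalid property names.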

Now, the class atom contains the ArrayRef for the codepoint ranges, and the range is validated, and that range is then used separately to add the codepoints to the bracket. This simplifies a lot of the validation logic, and requires fewer conditional returns as a result.

neildhar · 2024-04-05T23:19:53Z

lib/Platform/Unicode/CharacterProperties.cpp

+/// Find a matching entry (such as \p NameMapEntry or \p RangeMapEntry) by
+/// matching a string \p name against the entry's \p name field.
+template <class T>
+const T *findNameMapEntry(


Let's rename this to match the description, and mark it as static.

Suggested change

const T *findNameMapEntry(

static const T *findMapEntry(

Good point, I've done that with the commit that fixes up the logic flow in unicodePropertyRanges.

jonathanj · 2024-04-23T22:00:59Z

@neildhar Just checking in if there's anything I still need to address, or can help with for this review.

neildhar · 2024-04-23T22:03:48Z

@jonathanj Thanks for the ping, there isn't anything outstanding. Let me import it so we can more easily do a quick review of the Python changes.

facebook-github-bot · 2024-04-23T22:04:03Z

@neildhar has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

neildhar

Everything looks good, apart from the minor comments, the only real ask is to add a file-level doc-comment in genUnicodeTable.py explaining what the steps are in producing the final output. Once that's done, this should be good to merge.

Thank you for your patience with this process!

neildhar · 2024-04-08T22:38:36Z

utils/genUnicodeTable.py

+  unsigned offset:16;
+  unsigned size:16;


We shouldn't need a bitfield here and below, we can just use uint16_t

Good point. I've updated all of these to uint16_t.
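For illustration, the two forms side by side. The struct names are hypothetical, and the size check reflects the usual layout on mainstream compilers.

```cpp
#include <cstdint>

// Bitfield form from the original diff: two 16-bit fields in an `unsigned`.
// Layout and size of bitfields are implementation-defined.
struct PoolRefBitfield {
  unsigned offset : 16;
  unsigned size : 16;
};

// Fixed-width form: same information, with a layout that is easier to reason
// about and plain loads/stores instead of shift-and-mask access.
struct PoolRef {
  uint16_t offset;
  uint16_t size;
};

static_assert(sizeof(PoolRef) == 4, "two 16-bit fields, no padding");
```

The widths reported earlier in the thread (at most 15 bits for an offset and 10 bits for a size) fit in uint16_t with room to spare.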

neildhar · 2024-05-01T18:45:00Z

utils/genUnicodeTable.py

+        """
+        binary_property_aliases = {
+            canonical_name: []
+            for canonical_name in "ASCII ASCII_Hex_Digit Alphabetic Bidi_Control Bidi_Mirrored "


I think it would be preferable to just make this a list to begin with instead of creating and splitting a string. It can also be assigned to a separate variable beforehand to make the loop easier to read.

No problem, I've gone ahead and made that change.

facebook-github-bot · 2024-05-04T17:02:08Z

@jonathanj has updated the pull request. You must reimport the pull request before landing.

jonathanj · 2024-05-04T17:07:57Z

> Everything looks good, apart from the minor comments, the only real ask is to add a file-level doc-comment in genUnicodeTable.py explaining what the steps are in producing the final output. Once that's done, this should be good to merge.

I added instructions to the end of the existing file-level comment, that explain invoking the script from the project working root, use with clang-format, and the redirect path. I think I sort of just figured this out, so I actually have no idea if this was the intended process! Let me know if I need to adjust it.

> Thank you for your patience with this process!

Thank you for your patience with my submission, I'm very appreciative of the insightful comments and explanations along the way, I learned some interesting new things! It feels good to contribute to an exciting project like Hermes, with supportive maintainers.

facebook-github-bot · 2024-05-09T21:12:22Z

@neildhar has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

jonathanj · 2024-05-28T11:23:20Z

Hello @neildhar 👋

Just checking in if there's anything else I should do, perhaps rebase on main? The test-e2e check has been failing, as a result of a build issue, which seems to be unrelated to my changes.

neildhar · 2024-05-28T21:54:02Z

Hey @jonathanj, nothing at the moment, I've rebased it internally and it should get merged soon.

facebook-github-bot · 2024-06-07T03:06:59Z

@neildhar merged this pull request in dcf8e7b.

Summary:
Original Author: jonathan@yoco.co.za
Original Git: dcf8e7b
Original Reviewed By: avp
Original Revision: D56493540

Extends RegExp to handle `\p` and `\P` Unicode property escapes, standalone and within character classes, for Unicode mode. The implementation extends the existing idea of tables of codepoint ranges to cover all of the properties explicitly required by ES262.

The `genUnicodeTable.py` script was updated to generate code related to the binary and non-binary Unicode properties explicitly mentioned in ES262 ([ref](https://tc39.es/ecma262/multipage/text-processing.html#sec-runtime-semantics-unicodematchproperty-p)); based on the Unicode 15.1.0 data files.

Closes #1027

Pull Request resolved: #1295

Pulled By: neildhar

Reviewed By: neildhar

Differential Revision: D60361619

fbshipit-source-id: 70d7293114576d4b05f1dbc2d9b22e553bec089f

facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Feb 3, 2024

jonathanj force-pushed the 1027-unicode-property-escapes branch 2 times, most recently from 040070d to 187aa18 Compare February 6, 2024 18:27

jonathanj force-pushed the 1027-unicode-property-escapes branch 3 times, most recently from a7830ed to 62705ac Compare February 15, 2024 18:35

neildhar reviewed Mar 28, 2024

Jonathan Jacobs added 9 commits March 30, 2024 23:18

Generate Unicode data tables.

fe08a42

Bump Unicode Database version to 15.1.0

349bb05

Generate Unicode data tables.

fc556df

Replace Unicode digit and connector punctuation usage

8eceeba

This uses the newly named Unicode range tables for the same functionality, tables such as `UNICODE_LETTER` are left as-is, since they are more efficient as they are for their purpose.

Tests for Unicode property names in RegExp

3bd76fd

Update RegExp docs to include Unicode property name support

9b207c8

neildhar reviewed Apr 1, 2024

Jonathan Jacobs added 4 commits April 3, 2024 17:59

Generate contiguous script and script extension ranges

10c1264

This means a script extension range can be specified as a single `ArrayRef` that includes the range for the original script.

Generate Unicode data tables

55a36b8

Add more test cases for script extensions

339f5c9

jonathanj force-pushed the 1027-unicode-property-escapes branch from d2e98a1 to 339f5c9 Compare April 3, 2024 20:25

neildhar reviewed Apr 5, 2024

Make the logical flow of unicodePropertyRanges more linear

8ce3a2f

This helps localise the logic/assignment before proceeding to the general range lookup part.

opiation mentioned this pull request Apr 25, 2024

Make @bonfhir/core compatible with react-native / hermes bonfhir/bonfhir#139

Open

neildhar reviewed May 1, 2024

Jonathan Jacobs added 3 commits May 4, 2024 18:48

Replace unnecessary bitfields with standard types

64a18e7

Expand split string of binary property names to a literal list

b372992

Add instructions for the use of genUnicodeTable.py

3b223ab

facebook-github-bot closed this in dcf8e7b Jun 7, 2024

facebook-github-bot added the Merged label Jun 7, 2024

neildhar mentioned this pull request Jul 20, 2024

SyntaxError: Invalid RegExp: Invalid escape, js engine: hermes #1460

Closed

Movsar-Khalakhoev mentioned this pull request Aug 17, 2024

Improve regexes fabian-hiller/valibot#666

Merged

UziTech mentioned this pull request Nov 11, 2024

Marked@5.1.0 crashes in Hermes (React Native) environment markedjs/marked#2843

Closed

UziTech mentioned this pull request Nov 21, 2024

fix: update punctuation regex syntax for compatibility markedjs/marked#3540

Merged
