Proposal for adding a RegExp.escape method to the ECMAScript standard.
This is a stage 0 (strawman) proposal that is awaiting implementation and more input. Please see the issues for how to get involved.
It is often the case that we want to build a regular expression out of a string without treating special characters in the string as regular expression tokens. For example, if we want to replace all occurrences of the string let text = "Hello." which we got from the user, we might be tempted to write ourLongText.replace(new RegExp(text, "g")). However, this would match . against any character rather than only against a literal dot.
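A short sketch of the pitfall, using only the existing RegExp constructor (the sample strings are illustrative): the unescaped "." matches any character, and some inputs, such as a lone "(", are not even valid patterns.

```javascript
// Building a pattern directly from user-supplied text (the pitfall above).
const text = "Hello.";
const naive = new RegExp(text, "g");

// "." is a regexp token, so it also matches "!" and the replacement
// hits text the user never meant to target.
console.log("Hello! Hello.".replace(naive, "Bye")); // Bye Bye

// An unbalanced "(" makes the RegExp constructor throw a SyntaxError.
let threw = false;
try {
  new RegExp("(", "g");
} catch (e) {
  threw = e instanceof SyntaxError;
}
console.log(threw); // true
```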
This is commonly desired functionality, as can be seen from this years-old es-discuss thread. Standardizing it would be very useful to developers and would spare them from writing subpar implementations that miss edge cases.
We propose the addition of a RegExp.escape function, so that strings can be escaped in order to be used inside regular expressions:
var str = prompt("Please enter a string");
str = RegExp.escape(str);
alert(ourLongText.replace(new RegExp(str, "g"), "replacement")); // special regexp characters in the input are matched literally; "replacement" is a placeholder
RegExp.escape("The Quick Brown Fox"); // "The Quick Brown Fox"
RegExp.escape("Buy it. use it. break it. fix it."); // "Buy it\. use it\. break it\. fix it\."
RegExp.escape("(*.*)"); // "\(\*\.\*\)"
RegExp.escape("。^・ェ・^。"); // "。\^・ェ・\^。"
RegExp.escape("😊 *_* +_+ ... 👍"); // "😊 \*_\* \+_\+ \.\.\. 👍"
RegExp.escape("\\d \\D (?:)"); // "\\d \\D \(\?:\)"
The list of escaped characters should be kept in sync with what the regular expression grammar considers to be syntax characters that need escaping. For this reason, instead of hard-coding the list of escaped characters, we escape the characters that the engine recognizes as SyntaxCharacters. For example, if regexp comments are ever added to the specification (presumably under a flag), this ensures that the characters they introduce are properly escaped.
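A minimal sketch of the escaping rule described above, assuming the escaped set is exactly the current SyntaxCharacter production (^ $ \ . * + ? ( ) [ ] { } |). escapeRegExp is a hypothetical userland helper written for illustration; it is not the proposed built-in, and it hard-codes the list, which is precisely what the proposal wants the built-in to avoid.

```javascript
// Hypothetical helper (not the proposed RegExp.escape): escapes only the
// characters that the ECMAScript grammar lists as SyntaxCharacter.
const SYNTAX_CHARACTERS = new Set("^$\\.*+?()[]{}|");

function escapeRegExp(str) {
  let out = "";
  // for...of walks the string by code point, so emoji and other astral
  // characters are copied through unchanged.
  for (const ch of String(str)) {
    out += SYNTAX_CHARACTERS.has(ch) ? "\\" + ch : ch;
  }
  return out;
}

console.log(escapeRegExp("The Quick Brown Fox"));               // The Quick Brown Fox
console.log(escapeRegExp("Buy it. use it. break it. fix it.")); // Buy it\. use it\. break it\. fix it\.
console.log(escapeRegExp("(*.*)"));                             // \(\*\.\*\)
console.log(escapeRegExp("😊 *_* +_+ ... 👍"));                  // 😊 \*_\* \+_\+ \.\.\. 👍

// The escaped string matches the original text literally.
const input = "(*.*)";
console.log(new RegExp(escapeRegExp(input)).test(input)); // true
```

Because the output only ever adds a backslash in front of a SyntaxCharacter, the result is a valid pattern whose only match is the input text itself.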
Other languages provide similar functions:
- Perl: quotemeta(str)
- PHP: preg_quote(str)
- Python: re.escape(str)
- Ruby: Regexp.escape(str)
- Java: Pattern.quote(str)
- .NET: Regex.Escape(str)
Note that these functions differ in what they do (e.g. Perl's quotemeta behaves differently from .NET's Regex.Escape), but they all share the same goal.
We've had a meeting about this subject, whose notes include a more detailed writeup of what other languages do, and the pros and cons thereof.
Why not escape every character?
Other languages that did this regretted the choice because of its impact on readability and string size. More information on why other languages moved away from this approach can be found in the data folder under /other_languages.
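To make the readability cost concrete, here is a rough comparison of two hypothetical strategies: escaping only SyntaxCharacters versus escaping every character that is not an ASCII letter, digit or underscore. (Literally escaping every character is not an option in JavaScript, since escapes such as \d or \1 already have meanings.) Both helper names are made up for this sketch.

```javascript
// Two hypothetical escaping strategies, for comparison only.
const SYNTAX = new Set("^$\\.*+?()[]{}|");

// Strategy 1: escape only SyntaxCharacters.
function escapeSyntaxOnly(str) {
  return Array.from(str, (ch) => (SYNTAX.has(ch) ? "\\" + ch : ch)).join("");
}

// Strategy 2: escape everything except ASCII letters, digits and "_".
function escapeAggressively(str) {
  return Array.from(str, (ch) => (/\w/.test(ch) ? ch : "\\" + ch)).join("");
}

const sample = "Buy it. use it. break it. fix it.";
console.log(escapeSyntaxOnly(sample));   // Buy it\. use it\. break it\. fix it\.
console.log(escapeAggressively(sample)); // Buy\ it\.\ use\ it\.\ break\ it\.\ fix\ it\.
console.log(escapeSyntaxOnly(sample).length, escapeAggressively(sample).length);
```

Both results match the sample literally (outside the u flag), but the second is longer and noticeably harder to read.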
Why is each escaped character escaped?
See the EscapedChars.md file for a detailed per-character description.
What about the / character?
Empirical data was collected (see the /data folder) from about a hundred thousand code bases (the most popular sites, the most popular packages, the most depended-upon packages, and Q&A sites). It showed that the use case for escaping / (building the source of a regular expression literal that is then passed to eval) is not common enough to justify adding it.
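The point about eval can be seen directly: in a pattern string passed to the RegExp constructor, / is an ordinary character, and it only causes trouble when pattern text is spliced between the slashes of a regex literal that is then evaluated. The path value below is just a sample.

```javascript
// "/" needs no escaping in a pattern passed to the RegExp constructor.
const path = "a/b";
console.log(new RegExp(path).test("a/b")); // true

// It only matters when pattern text is placed inside a regex *literal* and
// evaluated: the "/" in the middle ends the literal early.
let literalFailed = false;
try {
  eval("/" + path + "/");
} catch (e) {
  literalFailed = e instanceof SyntaxError;
}
console.log(literalFailed); // true
```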
How is Unicode handled?
This proposal deals with code points rather than code units, so Unicode handling, including any further extensions, is already taken care of.
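A sketch of what working on code points means in JavaScript, using a hypothetical SyntaxCharacter-only helper (escapeByCodePoint, defined here for illustration). for...of iterates by code point, so a surrogate pair is never split. Since every SyntaxCharacter is ASCII, this does not change which characters are escaped; it does mean the output stays valid under the u flag, where escaping a non-syntax character such as : is a SyntaxError.

```javascript
// Hypothetical helper: escapes SyntaxCharacters, iterating by code point.
const SYNTAX = new Set("^$\\.*+?()[]{}|");
function escapeByCodePoint(str) {
  let out = "";
  for (const ch of str) out += SYNTAX.has(ch) ? "\\" + ch : ch;
  return out;
}

const face = "😊 *_*";
console.log(face.length);      // 6 (UTF-16 code units; the emoji counts as 2)
console.log([...face].length); // 5 (code points)

// A backslash before a SyntaxCharacter is allowed under the "u" flag.
const re = new RegExp("^" + escapeByCodePoint(face) + "$", "u");
console.log(re.test(face)); // true

// Escaping a non-syntax character, by contrast, is rejected in "u" mode.
let rejected = false;
try {
  new RegExp("\\:", "u");
} catch (e) {
  rejected = e instanceof SyntaxError;
}
console.log(rejected); // true
```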
What about RegExp.unescape?
While some other languages provide an unescape method, we choose to defer discussing it until later, mainly because we have found no evidence of people asking for it (whereas RegExp.escape is commonly asked for).
How does this relate to EscapeRegExpPattern?
EscapeRegExpPattern (as the name implies) takes a pattern and escapes it so that it can be represented as a string. RegExp.escape, by contrast, takes a string and escapes it so that it can be represented literally as a pattern. The two do not need to share an escaped set, and neither can be used in place of the other. We are discussing renaming EscapeRegExpPattern in the spec in the future to avoid confusing readers.
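The difference can be observed through RegExp.prototype.source, whose value the spec computes with EscapeRegExpPattern. The escapeRegExp arrow function below is a hypothetical SyntaxCharacter-only helper used for contrast, not the proposed built-in.

```javascript
// EscapeRegExpPattern (behind .source) turns a pattern into text that can
// sit between the slashes of a regex literal; the pattern's meaning is kept.
console.log(new RegExp("").source);    // (?:)
console.log(new RegExp("a/b").source); // a\/b

const re = new RegExp("a.b");
console.log(re.source);      // a.b  ("." is still a wildcard)
console.log(re.test("axb")); // true

// Hypothetical SyntaxCharacter-only escaper: turns text into a pattern
// that matches that text literally, so "." no longer matches "x".
const escapeRegExp = (str) => str.replace(/[\^$\\.*+?()[\]{}|]/g, "\\$&");
console.log(escapeRegExp("a.b"));                          // a\.b
console.log(new RegExp(escapeRegExp("a.b")).test("axb"));  // false
```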
Why don't you do X?
If you believe there is a concern that has not been addressed yet, please open an issue.