A builder is the object obtained while building a regular expression. Builders expose properties and methods to extend the regex further, but they are effectively immutable: every call that extends a builder returns a new builder instance.
All the following properties are read-only.
Type | Name | Description |
---|---|---|
`RegExp` | `regex` | The regular expression defined by the builder. It is compiled the first time the property is requested, then cached |
`string` | `source` | The source of the underlying regular expression. Used to compile it |
`string` | `flags` | A string comprising the regex's flags. It may include one or more of the letters `"g"`, `"m"`, `"i"`, `"u"` or `"y"` |
`boolean` | `global` | The regex's `global` flag |
`boolean` | `ignoreCase` | The regex's `ignoreCase` flag |
`boolean` | `multiline` | The regex's `multiline` flag |
`boolean` | `unicode` | The regex's `unicode` flag |
`boolean` | `sticky` | The regex's `sticky` flag |
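
The "compiled on first request, then cached" behaviour of `regex` can be pictured with a small stand-in. This is only a sketch: `LazyRegexHolder` is a hypothetical class written for illustration, not the library's implementation, and it does not enforce the read-only constraint.

```javascript
// Hypothetical stand-in for a builder (NOT the library's code):
// it keeps a source and a flags string, and compiles the RegExp lazily.
class LazyRegexHolder {
  constructor(source, flags) {
    this.source = source;
    this.flags = flags;
    this._compiled = null; // cache slot, filled on first access
  }

  // Compile on first access, then keep returning the same instance.
  get regex() {
    if (this._compiled === null) {
      this._compiled = new RegExp(this.source, this.flags);
    }
    return this._compiled;
  }

  // Boolean views derived from the flags string.
  get global() { return this.flags.includes("g"); }
  get ignoreCase() { return this.flags.includes("i"); }
  get multiline() { return this.flags.includes("m"); }
  get unicode() { return this.flags.includes("u"); }
  get sticky() { return this.flags.includes("y"); }
}

const holder = new LazyRegexHolder("\\d+", "gi");
const first = holder.regex;  // compiled here
const second = holder.regex; // served from the cache
```

Both accesses return the very same `RegExp` instance, which is what makes the caching observable.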
Returns | Name | Description |
---|---|---|
`RegExp` | `toRegExp()` | Returns the `regex` property |
`RegExp` | `valueOf()` | Same as `toRegExp()` |
`string` | `toString()` | Returns a string representation |
`boolean` | `test(string)` | Uses the underlying regex to test a string. Short for `.regex.test(...)` |
`array` | `exec(string)` | Executes the underlying regex on a string. Short for `.regex.exec(...)` |
`string` | `replace(string, string/function)` | Uses the underlying regex to perform a regex-based replacement. Short for `string.replace(regex, ...)` |
`array` | `split(string)` | Uses the underlying regex to perform a regex-based string split. Short for `string.split(regex)` |
`number` | `search(string)` | Uses the underlying regex to perform a string search. Short for `string.search(regex)` |
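
For reference, these are the native calls the shortcuts above stand for. The sketch uses a plain `RegExp` in place of a builder's `regex` property; the sample pattern and text are made up for illustration.

```javascript
// A plain RegExp standing in for builder.regex (sample data is illustrative).
const regex = /(\d+)-(\d+)/;
const text = "range 10-20 end";

const tested = regex.test(text);               // what test(text) is short for
const match = regex.exec(text);                // what exec(text) is short for
const replaced = text.replace(regex, "$2-$1"); // what replace(text, "$2-$1") is short for
const parts = "a1b22c".split(/\d+/);           // what split(...) is short for
const index = text.search(regex);              // what search(text) is short for
```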
Regex building begins from the `RE` object returned by the module. You obtain a builder every time you use "words" like `digit`, `then` and such. Some of these words act like functions (like `atLeast` and `codePoint`), some like properties (like `digit` and `theEnd`), and some work as both.
In this last case, if the word is not used as a function, additional words are expected to obtain a builder:

    var foo = RE.matching.digit.then.alphaNumeric;
Many words that can (or must) be used as functions accept a variable number of arguments, each of which can be a string, a regular expression or a builder; all of them are appended to the source. Strings are backslash-escaped, while for regular expressions and builders the `source` property is appended unescaped:

    var amount = RE.oneOrMore.digit.then(".").then.digit.then.digit,
        currency = /[$€£]/;
    var builder = RE.matching.theStart
        .then("Total: ", amount, currency)
        .then.theEnd;
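
The difference between escaped strings and unescaped sources can be sketched with plain regular expressions. `escapeLiteral` and `toSource` are hypothetical helpers written for this example, not part of the library, and `amount` is written by hand as the pattern the builder above is expected to describe.

```javascript
// Hypothetical helper: backslash-escape regex metacharacters in a literal string.
function escapeLiteral(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: regexes contribute their source as-is, strings are escaped.
function toSource(part) {
  return part instanceof RegExp ? part.source : escapeLiteral(String(part));
}

const amount = /\d+\.\d\d/; // hand-written equivalent of the `amount` builder
const currency = /[$€£]/;

// "^" and "$" play the role of theStart and theEnd.
const source = "^" + ["Total: ", amount, currency].map(toSource).join("") + "$";
const total = new RegExp(source);
```

Here `total.test("Total: 12.50€")` succeeds, while a string such as `"1.5"` passed as an argument would be appended as `1\.5`, so its dot only matches a literal dot.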
Other words that work as functions only usually accept other types of arguments.
The flags of a builder (and of its underlying regular expression) can be set using words starting from the `RE` object. After one of these words, another flag word or `matching` must follow, with the exception of `withFlags`, which must be followed by `matching` only.
- `globally`: set the `global` flag on.
- `anyCase`: set the `ignoreCase` flag on.
- `fullText`: set the `multiline` flag on.
- `stickily`: set the `sticky` flag on.
- `withUnicode`: set the `unicode` flag on.
- `withFlags(flags)`: set multiple flags. `flags` is expected to be a string containing letters in the set `"g"`, `"m"`, `"i"` and `"y"`.
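
The flag words map onto the standard JavaScript flag letters. A minimal sketch of that mapping, assuming a hypothetical lookup table `FLAG_LETTERS` and helper `flagsFor` (neither is part of the library):

```javascript
// Hypothetical lookup table mirroring the list above.
const FLAG_LETTERS = {
  globally: "g",
  anyCase: "i",
  fullText: "m",
  stickily: "y",
  withUnicode: "u"
};

// Hypothetical helper: turn a sequence of flag words into a flags string.
function flagsFor(words) {
  const letters = words.map(function (word) {
    if (!Object.prototype.hasOwnProperty.call(FLAG_LETTERS, word)) {
      throw new Error("Unknown flag word: " + word);
    }
    return FLAG_LETTERS[word];
  });
  // Drop duplicates so a repeated word does not produce an invalid flags string.
  return Array.from(new Set(letters)).join("");
}

const flags = flagsFor(["globally", "anyCase"]);
const regex = new RegExp("total", flags);
```

The resulting `regex` reports `global` and `ignoreCase` as `true`, just like the builder properties of the same name.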
Conjunctions append additional blocks to the current source. They can follow any open or set block.
- `then`: appends a block to the current source.
- `or`: adds an alternative block (prefixed by the pipe character `|` in regular expressions).
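
Keep in mind that in regular expressions the pipe has the lowest precedence, so an alternative spans the whole sequence around it unless a group limits it. The sketch below shows this with plain regex literals; the patterns are illustrative only.

```javascript
// Without a group the alternatives are "^gra" and "ey$".
const ungrouped = /^gra|ey$/;

// With a non-capturing group only "a" and "e" alternate.
const grouped = /^gr(?:a|e)y$/;

const results = {
  ungroupedGrass: ungrouped.test("grass"), // starts with "gra"
  groupedGrass: grouped.test("grass"),
  groupedGray: grouped.test("gray"),
  groupedGrey: grouped.test("grey")
};
```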
These words can be used both in "open" sequences and inside character sets. They can be used after conjunction words, a quantifier, the `matching` word, the `RE` object itself, or the `and` word joining blocks in character sets.
- `digit` / `not.digit`: a digit character (`\d`) or its negation (`\D`).
- `alphaNumeric` / `not.alphaNumeric`: an alphanumeric character or the underscore (`\w`), or its negation (`\W`).
- `whiteSpace` / `not.whiteSpace`: a whitespace character (`\s`) or its negation (`\S`).
- `cReturn`: `\r`
- `newLine`: `\n`
- `tab`: `\t`
- `vTab`: `\v`
- `formFeed`: `\f`
- `null`: `\0`
- `slash`: `\/`
- `backslash`: `\\`
- `ascii(code)`: an ASCII escape sequence (`\xhh`). `code` must be an integer between 0 and 255. It is then converted to two hexadecimal digits in the sequence.
- `codePoint(code, ...)`: a Unicode escape sequence (`\uhhhh`, or `\u{hhhhh}` with the `unicode` flag set and a code outside the Basic Multilingual Plane). `code` must be an integer between 0 and 1114111 (`0x10ffff`), or a `RangeError` will be thrown; it can also be a string, whose code points are converted to the corresponding Unicode escape sequences. Keep in mind that code points from astral planes, when the `unicode` flag is not set, are encoded as the corresponding surrogate pairs (e.g. `"🍰"` becomes `"\ud83c\udf70"`): it is your responsibility to wrap the pairs in a group if needed or, when that is not possible (for example, in a character range), to use an adequate regex structure.
- `control(letter)`: a control sequence (`\cx`). `letter` must be a string of a single letter. It is then converted to uppercase in the sequence.
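
The escape sequences described for `ascii` and `codePoint` can be reproduced by hand. The following is a sketch under stated assumptions: `asciiEscape` and `codePointEscape` are hypothetical helpers written for this example, and throwing a `RangeError` from `asciiEscape` is an assumption (the text above only states it for `codePoint`). The last two lines show why a surrogate pair needs a group when a quantifier follows.

```javascript
// Hypothetical helper: \xhh escape for a code between 0 and 255.
function asciiEscape(code) {
  if (!Number.isInteger(code) || code < 0 || code > 255) {
    throw new RangeError("Code out of range: " + code); // assumption for this sketch
  }
  return "\\x" + code.toString(16).padStart(2, "0");
}

// Hypothetical helper: \uhhhh, \u{hhhhh} or a surrogate pair, depending on the flag.
function codePointEscape(code, unicodeFlag) {
  if (!Number.isInteger(code) || code < 0 || code > 0x10ffff) {
    throw new RangeError("Invalid code point: " + code);
  }
  const hex4 = function (n) { return n.toString(16).padStart(4, "0"); };
  if (code <= 0xffff) return "\\u" + hex4(code);
  if (unicodeFlag) return "\\u{" + code.toString(16) + "}";
  // Astral code point without the unicode flag: split into surrogates.
  const offset = code - 0x10000;
  const high = 0xd800 + (offset >> 10);
  const low = 0xdc00 + (offset & 0x3ff);
  return "\\u" + hex4(high) + "\\u" + hex4(low);
}

const cake = "🍰".codePointAt(0);           // 0x1f370
const pair = codePointEscape(cake, false);  // surrogate pair form
const braced = codePointEscape(cake, true); // \u{...} form

// Without a group, "+" applies to the low surrogate only.
const ungrouped = new RegExp("^" + pair + "+$");
const grouped = new RegExp("^(?:" + pair + ")+$");
```

With these definitions `ungrouped` fails on `"🍰🍰"` while `grouped` matches it, which is the situation the note about wrapping pairs in a group refers to.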
These words can be used in open block sequences only (that is, not inside character sets). They can be used after conjunction words, a quantifier, the `matching` word, or the `RE` object itself.
- `anyChar`: the universal character (`.`).
- `theStart` / `theEnd`: the string-start and string-end boundaries (`^` and `$`, respectively).
- `wordBoundary` / `not.wordBoundary`: a word boundary (`\b`) or its negation (`\B`).
- `oneOf` / `not.oneOf`: appends a character set (`[...]` or `[^...]`, respectively). See the paragraph about character sets.
- `group(...)`: non-capturing group, `(?:...)`. Used as a function only. Arguments can be strings, regexes or builders.
- `capture(...)`: capturing group, `(...)`. Used as a function only. Arguments can be strings, regexes or builders.
- `reference(number)`: group backreference (`\number`). `number` should be a positive integer.
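
As a reminder of how the three group-related structures interact, here is a sketch with a plain regex literal (the pattern and sample strings are made up): the non-capturing group does not get a number, so the first capturing group is the one `\1` refers to.

```javascript
// (?:re-)? groups without capturing; (\w+) is capture 1; \1 must repeat it.
const doubled = /^(?:re-)?(\w+) \1$/;

const match = doubled.exec("re-build build");
const repeated = doubled.test("build build");
const different = doubled.test("build built");
```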
Character sets are introduced by the `oneOf` word, and may include one or more blocks separated by the `and` word (e.g. `RE.oneOf.digit.and("abcdef")`).
These words can be used in character sets only:
- `range(start, end)`: adds a character interval to the character set (`[...start-end...]`). `start` and `end` are expected to be single-character strings defining the boundaries of the range; they can also be builders that define a single character, or a character class usable in character ranges (which include: `ascii`, `unicode`, `control`, `newLine`, `cReturn`, `tab`, `vTab`, `formFeed`, `null`).
- `backspace`: the backspace character, `\b` (U+0008). Not to be confused with the word boundary, which can be used as an "open" block only.
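
The two meanings of `\b` are easy to check with plain regex literals. The sketch below also includes a hand-written character set combining a class and a range; that it corresponds to what a `oneOf` chain produces is an assumption made for illustration.

```javascript
const backspaceChar = "\u0008";

const inSet = /[\b]/;        // inside a character set: the backspace character
const boundary = /\bcat\b/;  // outside a set: a word boundary

// Hand-written set mixing a class (\d) and a range (a-f).
const hexDigits = /^[\da-f]+$/;

const checks = {
  setMatchesBackspace: inSet.test(backspaceChar),
  setMatchesLetterB: inSet.test("b"),
  boundaryInSentence: boundary.test("a cat sat"),
  boundaryInsideWord: boundary.test("concatenate"),
  hexAccepted: hexDigits.test("c0ffee"),
  hexRejected: hexDigits.test("xyz")
};
```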
Quantifiers can follow conjunction words, the `matching` word, or the `RE` object itself, and can precede any "open" block, with the exception of `wordBoundary`, `not.wordBoundary`, `theStart` and `theEnd`.
They can be prefixed by `lazily` to define a lazy quantifier instead of a greedy one.
Quantifiers can be used as functions, and accept strings, regexes or builders as arguments. The argument is wrapped in a non-capturing group when necessary:

    var foo = RE.oneOrMore("a"); // /a+/
    var bar = RE.oneOrMore("abc"); // /(?:abc)+/
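
The group is needed because a quantifier binds to the single token before it. A rough sketch of the idea, using a hypothetical `quantify` helper that is deliberately simplistic (it treats only one character or one backslash escape as a single token, and wraps everything else):

```javascript
// Hypothetical helper: wrap the source in (?:...) unless it is a single token.
function quantify(source, quantifier) {
  const isSingleToken = /^(?:[^\\]|\\.)$/.test(source);
  return (isSingleToken ? source : "(?:" + source + ")") + quantifier;
}

const single = quantify("a", "+");    // no wrap needed
const escaped = quantify("\\d", "*"); // an escape counts as one token
const wrapped = quantify("abc", "+"); // wrapped in a non-capturing group

// Without the group, "+" would apply to "c" only.
const withoutGroup = /^abc+$/;
const withGroup = new RegExp("^" + wrapped + "$");
```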
- `anyAmountOf`: `*`
- `oneOrMore`: `+`
- `noneOrOne`: `?`
- `atLeast(n)`: `n` must be a non-negative integer. If `n` is 0, `*` is produced; if `n` is 1, `+` is produced; otherwise, the quantifier is `{n,}`.
- `atMost(n)`: `n` must be a non-negative integer. If `n` is 1, `?` is produced; otherwise, the quantifier is `{,n}`.
- `exactly(n)`: `n` must be a non-negative integer. If `n` is 1, no quantifier is produced; otherwise, the quantifier is `{n}`.
- `between(n, m)`: `n` and `m` must be non-negative integers. If the values allow it, the produced quantifier is one of the above; otherwise, the quantifier is `{n,m}`.
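
The simplification rules stated for `atLeast` and `exactly` can be written down as a short sketch. The two functions below are hypothetical re-statements of those rules, not the library's code, and the error type thrown on invalid input is an assumption. The final lines illustrate the greedy versus lazy difference that `lazily` selects.

```javascript
// Hypothetical validation shared by the sketches below (error type is an assumption).
function checkCount(n) {
  if (!Number.isInteger(n) || n < 0) {
    throw new RangeError("Expected a non-negative integer, got " + n);
  }
}

// atLeast(n): "*" for 0, "+" for 1, "{n,}" otherwise.
function atLeastQuantifier(n) {
  checkCount(n);
  if (n === 0) return "*";
  if (n === 1) return "+";
  return "{" + n + ",}";
}

// exactly(n): nothing for 1, "{n}" otherwise.
function exactlyQuantifier(n) {
  checkCount(n);
  return n === 1 ? "" : "{" + n + "}";
}

const threeOrMore = new RegExp("^a" + atLeastQuantifier(3) + "$");

// Greedy takes as much as possible, lazy (quantifier followed by "?") as little.
const greedy = "<a><b>".match(/<.+>/)[0];
const lazy = "<a><b>".match(/<.+?>/)[0];
```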
- `followedBy(...)` / `not.followedBy(...)`: appends a look-ahead (`(?=...)` or `(?!...)`, respectively). Used as a function only. Arguments can be strings, regexes or builders. Can follow any open block, the `matching` word, the `RE` object itself, or the `or` conjunction.
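
A look-ahead checks what comes next without consuming it. The sketch below shows both forms with plain regex literals; the patterns and sample strings are made up for illustration.

```javascript
// Positive look-ahead: digits that are followed by " dollars".
const followed = /\d+(?= dollars)/;

// Negative look-ahead: a whole number that is NOT followed by " dollars".
const notFollowed = /\b\d+\b(?! dollars)/;

const positive = followed.exec("pay 15 dollars");
const negative = notFollowed.exec("15 dollars, 20 euros");
```

In both cases the matched text contains the digits only: the look-ahead content is never part of the match.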