Skip to content

Latest commit

 

History

History
219 lines (130 loc) · 8.89 KB

reference.md

File metadata and controls

219 lines (130 loc) · 8.89 KB

RE-Build reference

RegExp builders

The object obtained from building a regular expressions builders. Builders are augmented with members and methods to build the regex further, but they're basically immutable objects as every call to extend the builder returns a new builder instance.

Properties

All the following properties are read-only.

Type Name Description
string regex The regular expression defined by the builder. It's compiled the first time the property is requested, then cached
string source The source of the underlying regular expression. Used to compile it
string flags A string comprising the regex' flags. It may include one or more of the letters "g", "m", "i", "u" or "y"
boolean global The regex' global flag
boolean ignoreCase The regex' ignoreCase flag
boolean multiline The regex' multiline flag
boolean unicode The regex' unicode flag
boolean sticky The regex' sticky flag

Methods

Returns Name Description
RegExp toRegExp() Basically, returns the regex property
RegExp valueOf() See above
string toString() Returns a string representation
boolean test(string) Uses the underlying regex to test a string. Short for .regex.test(...)
array exec(string) Executes the underlying regex on a string. Short for .regex.exec(...)
string replace(string, string/function) Uses the underlying regex to perform a regex-based replacement. Short for string.replace(regex, ...)
array split(string) Uses the underlying regex to perform a regex-based string split. Short for string.split(regex)
number search(string) Uses the underlying regex to perform a string search. Short for string.search(regex)

Building a regex

Regex building begins from the he RE object returned by the module. You can obtain a builder every time you use "words" like digit, then and such. Some of these words act like functions (like atLeast and codePoint), some like properties (like digit and theEnd), some work as both.

In this last case, if the word is not used as a function, additional words are expected to obtain a builder:

var foo = RE.matching.digit.then.alphaNumeric;

Many words that can (or must) be used as functions accept a variable number of arguments, that can be either strings, or regular expressions, or builders, which are all appended to the source. Strings are backslash-escaped, while in the other cases the source property is then added unescaped:

var amount = RE.oneOrMore.digit.then(".").then.digit.then.digit,
    currency = /[$€£]/;

var builder = RE.matching.theStart
                .then("Total: ", amount, currency)
                .then.theEnd;

Other words that work as functions only usually accept other types of arguments.

Flags

The flags of a builder (and its underlying regular expression) can be set using words starting from the RE object. After one of these words, another flag word or matching must follow, with the exception of withFlags that must be followed by matching only.

  • globally

    Set the global flag on.

  • anyCase

    Set the ignoreCase flag on.

  • fullText

    Set the multiline flag on.

  • stickily

    Set the sticky flag on.

  • withUnicode

    Set the unicode flag on.

  • withFlags(flags)

    Set multiple flags. flags is expected to be a string containing letters in the set "g", "m", "i" and "y".

Conjunctions

Conjunctions append additional blocks to the current source. They can follow any open or set block.

  • then

    Appends a block to the current source.

  • or

    Adds an alternative block (prefixed by the pipe | character in regular expressions).

Open and set blocks

These words can be used in both "open" sequences or inside character sets. They can be used after conjunction words, or a quantifier, or the matching word, or the RE object itself, or the and word joining blocks in character sets.

  • digit / not.digit

    A digit character (\d) or its negation (\D).

  • alphaNumeric / not.alphaNumeric

    An alphanumeric character plus the undescore (\w) or its negation (\W).

  • whiteSpace / not.whisteSpace

    A whitespace (\s) or its negation (\W).

  • cReturn \r

  • newLine \n

  • tab \t

  • vTab \v

  • formFeed \f

  • null \0

  • slash \/

  • backslash \\

  • ascii(code)

    An ASCII escape sequence (\xhh). code must be an integer between 0 and 255. It it then converted as two hexadecimal digits in the sequence.

  • codePoint(code, ...)

    An Unicode escape sequence (\uhhhh, or \u{hhhhh} with the unicode flag set and with a code not from the Basic Multilingual Plane). code must be an integer between 0 and 1114111 (0x10ffff) or a RangeError will be thrown; or it can be a string, whose code points will be converted in the corresponding Unicode escape sequence. Keep in mind that code points from astral planes, when the unicode flag is not set, are encoded in the corresponding surrogate code point pairs (e.g.: "🍰" will become "\ud83c\udf70"): it is your duty to wrap the pairs in a group if needed or, when it's not possible (for example, in a character range) using an adequate regex structure.

  • control(letter)

    A control sequence (\cx). letter must be a string of a single letter. It is then converted to uppercase in the sequence.

Open-only blocks

These words can be used in open block sequences only (which means, not inside character sets). They can be used after conjunction words, or a quantifier, or the matching word, or the RE object itself.

  • anyChar

    The universal character (.).

  • theStart / theEnd

    The string-start and string-end boundaries (^ and $, respectively).

  • wordBoundary / not.wordBoundary

    A word boundary (\b) or its negation (\B).

  • oneOf / not.oneOf

    Appends a character set ([...] or [^...], respectively). See the paragraph about character sets.

  • group(...)

    Non-capturing group - (?:...). Used as functions only. Arguments can be strings, regexes or builders.

  • capture(...)

    Capturing group - (...). Used as functions only. Arguments can be strings, regexes or builders.

  • reference(number)

    Group backreference (\number). number should be a positive integer.

Character sets

Character sets are introduced by the oneOf word, and may include one or more blocks separated by the and word (e.g.: RE.oneOf.digit.and("abcdef")).

These words can be used in character sets only:

  • range(start, end)

    Adds a character interval into the character set ([...start-end...]). start and end are supposed to be strings of single characters defining the boundaries of the character range; or they can be builders that define one single character, or character class usable in character ranges (which include: ascii, unicode, control, newLine, cReturn, tab, vTab, formFeed, null).

  • backspace

    The backspace character, \b (U+0008). Not to be confused with the word boundary, which can be used as an "open" block only.

Quantifiers

Quantifiers can follow conjunction words, or the matching word, or the RE object itself, and can precede any "open" block, with the exception of wordBoundary, not.wordBoundary, theStart and theEnd.

They can be prefixed by lazily to define a lazy quantifier, instead of a greedy one.

Quantifiers can be used as functions, and accept strings, regexes or builders as arguments. A convenient group wrap will be used if necessary:

var foo = RE.oneOrMore("a");   // /a+/
var bar = RE.oneOrMore("abc"); // /(?:abc)+/
  • anyAmountOf *

  • oneOrMore +

  • noneOrOne ?

  • atLeast(n)

    n must be a non-negative integer. If n is 0, a * is produced; if n is 1, then + is produced; else, the quantifier is {n,}.

  • atMost(n)

    n must be a non-negative integer. If n is 1, then ? is produced; else, the quantifier is {,n}.

  • exactly(n)

    n must be a non-negative integer. If n is 1, then no quantifier is defined; else, the quantifier is {n}.

  • between(n, m)

    n and m must be non-negative integers. If the the values are adequate, the produced quantifier can be one of the above; otherwise, the quantifier is {n,m}.

Look-aheads

  • followedBy(...) / not.followedBy(...)

    Appends a look-ahead ((?=...) or (?!...), respectively). Used as functions only. Arguments can be strings, regexes or builders.

    Can follow any open block, or the matching word, or the RE object itself, or the or conjunction.