regex cheat sheet

holzkohlengrill edited this page Dec 15, 2023 · 5 revisions

Regex Cheat Sheet

  • Escape special characters with a preceding \ (e.g. \. matches a literal dot)
  • Greediness
    • Quantifiers are greedy by default: they match as many characters as possible while still satisfying the pattern
    • Appending ? to a quantifier makes it non-greedy: it then matches as few characters as possible
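
The difference between greedy and non-greedy matching can be sketched as follows. This is a minimal illustration using Python's re module; the sample string is made up for the example.

```python
import re

text = "<b>bold</b> and <i>italic</i>"  # hypothetical sample input

# Greedy: .* grabs as much as possible, so the match runs to the last '>'
greedy = re.search(r"<.*>", text).group()

# Non-greedy: .*? stops at the first '>' that satisfies the pattern
lazy = re.search(r"<.*?>", text).group()

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
```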

Basic Regex

Pattern Description
. Any character
^ Beginning of line
$ End of line
[a-c8] Characters a, b, c OR 8
[^chars] Any character except c, h, a, r, s
( ) Capture group
( a ( b ) c ) Nested capture group >> \1 = abc; \2 = b
( a )? Optional capture group; Abc(a)? matches Abc and Abca
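
A few of the patterns above, tried out in Python as a rough sketch. The input strings are invented for illustration only.

```python
import re

# Nested capture groups: group 1 is the outer group, group 2 the inner one
match = re.search(r"(a(b)c)", "xxabcxx")
outer, inner = match.group(1), match.group(2)
print(outer, inner)  # abc b

# Character class and negated character class
in_class = re.findall(r"[a-c8]", "abcd789")
not_in_class = re.findall(r"[^chars]", "charts")
print(in_class)      # ['a', 'b', 'c', '8']
print(not_in_class)  # ['t']

# Optional capture group: the trailing 'a' may or may not be present
optional = [bool(re.fullmatch(r"Abc(a)?", s)) for s in ("Abc", "Abca", "Abcb")]
print(optional)      # [True, True, False]
```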

Quantifier

Pattern Description
* 0 or more
+ 1 or more
{N} N occurrences
{N,M} N to M occurrences (no space after the comma). If omitted: N = 0; M = inf.
{N,M}? N to M occurrences, as few as possible
? 0 OR 1
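
The quantifiers can be compared side by side on one input. This is a small sketch in Python; the variable names are only for illustration. The last line shows that Python treats a brace expression containing a space as literal text rather than as a quantifier.

```python
import re

digits = "1234567"

exactly_3 = re.match(r"\d{3}", digits).group()     # exactly 3 digits
up_to_4 = re.match(r"\d{2,4}", digits).group()     # greedy: takes 4
lazy_2 = re.match(r"\d{2,4}?", digits).group()     # non-greedy: takes 2
at_least_2 = re.match(r"\d{2,}", digits).group()   # M omitted: no upper bound
at_most_3 = re.match(r"\d{,3}", digits).group()    # N omitted: lower bound 0

print(exactly_3, up_to_4, lazy_2, at_least_2, at_most_3)
# 123 1234 12 1234567 123

# With a space inside the braces, Python's re matches the braces literally
with_space = re.match(r"\d{2, 4}", digits)
print(with_space)  # None
```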

Extended Regex

Pattern Description
\w [a-zA-Z0-9_] (word character: alphanumeric or underscore)
\W [^a-zA-Z0-9_] (non-word character)
\d [0-9] (digit)
\D [^0-9] (non-digit)
\b Empty string at a word boundary (between \w and \W)
\B Empty string not at a word boundary
\s [ \t\n\r\f\v] (whitespace, including the space character)
\S [^ \t\n\r\f\v] (non-whitespace)
\A Beginning of string
\Z End of string
\g<id> Reference to a previously defined group (in Python: used in replacement strings, e.g. with re.sub)
R|S Regex R OR S
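
The following sketch shows some of these patterns in Python. The sample strings are assumptions chosen to make the differences visible.

```python
import re

text = "cat concat cat9 Cat"  # hypothetical sample input

# \b: only the standalone word 'cat' is surrounded by word boundaries
whole_words = re.findall(r"\bcat\b", text)
print(whole_words)  # ['cat']

# R|S: a run of digits OR a capitalized word
either = re.findall(r"\d+|[A-Z]\w*", text)
print(either)       # ['9', 'Cat']

# \s: split on any run of whitespace
parts = re.split(r"\s+", "a \t b\nc")
print(parts)        # ['a', 'b', 'c']

# ^ matches at every line start in multiline mode, \A only at the string start
multi = "first\nsecond"
line_starts = re.findall(r"^\w+", multi, re.M)
string_start = re.findall(r"\A\w+", multi, re.M)
print(line_starts)   # ['first', 'second']
print(string_start)  # ['first']
```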

PyRegex Extensions

Pattern Description
(?:...) Non-capturing group (match but do not capture)
(?<name>A) Define named group; A = regex, name = group name
(?P<name>A) Same as before; Python's re module requires this form (the first does not work everywhere)
(?P=name) Backreference to a named group; matches the text that group matched earlier
(?#...) Comment (use for documentation)
(?=...) Lookahead; matches without consuming
(?!...) Negative lookahead
(?<=...) Lookbehind; matches without consuming
(?<!...) Negative lookbehind
(?(A)B|C) Match 'B' if group A matched, else 'C'
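
Lookarounds, named backreferences and conditionals are easier to grasp with concrete input. The sketch below uses Python's re module; the price list and tag strings are invented for the example.

```python
import re

prices = "apple $30, pear 45 EUR, plum $7"  # hypothetical sample input

# Lookbehind: digits preceded by '$' (the '$' is not part of the match)
usd = re.findall(r"(?<=\$)\d+", prices)
print(usd)      # ['30', '7']

# Lookahead: digits followed by ' EUR' (the suffix is not consumed)
eur = re.findall(r"\d+(?= EUR)", prices)
print(eur)      # ['45']

# Negative lookahead: whole numbers NOT followed by ' EUR'
not_eur = re.findall(r"\b\d+\b(?! EUR)", prices)
print(not_eur)  # ['30', '7']

# Named group plus (?P=name): the closing quote must equal the opening one
quoted = [m.group() for m in
          re.finditer(r"""(?P<quote>['"]).*?(?P=quote)""", "say 'hi' to \"them\"")]
print(quoted)   # ["'hi'", '"them"']

# Conditional: require '>' only if the optional group 1 matched '<'
cond = re.compile(r"(<)?\w+(?(1)>)")
results = [bool(cond.fullmatch(s)) for s in ("<tag>", "tag", "<tag", "tag>")]
print(results)  # [True, True, False, False]
```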

Search & Replace / Reference a Group

Pattern Description
\1, \2, ... \n Backreference; Get match of n-th capturing group

Search & replace in some IDEs

Capture groups defined in the search pattern can be referenced in the replacement text. The backreference syntax differs between IDEs:

  • PyCharm: $n instead of \n
  • Notepad++: \n
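
In Python, the same idea works with re.sub. The following is a minimal sketch with a made-up date string; it uses numbered backreferences, the \g<name> form in the replacement, and a backreference inside the pattern itself.

```python
import re

date = "2023-12-15"  # hypothetical sample input

# Numbered backreferences in the replacement string
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3.\2.\1", date)
print(swapped)  # 15.12.2023

# Named groups are referenced with \g<name> in the replacement string
named = re.sub(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    r"\g<day>/\g<month>/\g<year>",
    date,
)
print(named)    # 15/12/2023

# \g<1>0 means "group 1 followed by a literal 0" (\10 would mean group 10)
padded = re.sub(r"(\d)", r"\g<1>0", "a1")
print(padded)   # a10

# Backreference inside the pattern: find an accidentally doubled word
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group()
print(doubled)  # is is
```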

Exemplary Basic Regex Workflow in Python

  1. re.compile()
  2. re.search()
  3. match.groups() or match.group(<group_name>)
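
import re

# "Normal" syntax
pattModuleSummary = re.compile(r"[0-9a-f]{8}")      # Matches 8-character lowercase hex numbers

# Sample input (hypothetical); replace with your own lines
lines = ["origin 0800c000", "no hex here"]

# Find and print matches
for line in lines:
    match = pattModuleSummary.search(line)
    # Check whether this line contains a match
    if match:
        # The pattern has no capture groups, so print the whole match
        print(match.group())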

Exemplary Extended Regex Workflow in Python

Comment + multiline syntax (with re.X, whitespace and # comments inside the pattern are ignored):

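import re

pattModuleSummary = re.compile(r"""
([0-9a-f]{8})           # Origin
\+([0-9a-f]{8})         # Size (after a literal '+')
""", re.X)              # <-- re.X (re.VERBOSE) is required!

# Sample input (hypothetical); replace with your own lines
lines = ["0800c000+00000400", "no match here"]

# Find and print matches
for line in lines:
    match = pattModuleSummary.search(line)
    # Check whether this line contains a match
    if match:
        # Print the captured groups as a tuple: (origin, size)
        print(match.groups())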

re.X (long form: re.VERBOSE) is necessary if you want to use the multiline, commented re.compile syntax.

Exemplary Regex Workflow in Python with Named Capture Groups

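import re

# Raw string so that \+ reaches the regex engine unchanged
pattern1 = re.compile(r'^(?P<addr>[0-9a-f]{8,16})\+(?P<size>[0-9a-f]{8,})$')
line = "0800c000+00000400"      # Sample input (hypothetical)
match = pattern1.search(line)

if match:
    print(match.group('addr'))  # References only the group `addr`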
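
When several named groups are needed at once, match.groupdict() returns all of them as a dictionary. The sketch below wraps this in a helper; parse_line and the sample lines are hypothetical names and data introduced only for illustration, and treating the fields as hexadecimal numbers is an assumption based on the pattern.

```python
import re

pattern = re.compile(r"^(?P<addr>[0-9a-f]{8,16})\+(?P<size>[0-9a-f]{8,})$")

def parse_line(line):
    """Return (addr, size) as integers, or None if the line does not match."""
    match = pattern.search(line)
    if match is None:
        return None
    fields = match.groupdict()  # {'addr': '...', 'size': '...'}
    # Assumption: both fields are hexadecimal numbers
    return int(fields["addr"], 16), int(fields["size"], 16)

parsed = parse_line("0800c000+00000400")  # hypothetical sample line
print(parsed)                             # (134266880, 1024)
print(parse_line("not a match"))          # None
```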