A high-level intermediate representation of regex in Python.
Requires Python >=3.10 and the unicategories
library, which provides easy access to categorised Unicode characters.
This library constructs an intermediate representation of the regex AST created by the built-in re
module. It functions similarly to the Rust regex_syntax
crate, which was the inspiration for this module.
All of the syntax supported by re
is supported by this module.
import regex_hir
hir = regex_hir.hir(r"[abc]")
hir.dumps()
# CharacterClass(
#     [
#         CharacterRange(start=97, end=97)
#         CharacterRange(start=98, end=98)
#         CharacterRange(start=99, end=99)
#     ]
#     negate=False
#     ignore_case=False
# )
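
For comparison, the AST that the built-in re module itself produces for the same pattern can be inspected directly. The following is a minimal sketch that does not use regex_hir at all; it relies on CPython's private parser module (re._parser on Python 3.11+, sre_parse on 3.10), which is an implementation detail and may change between versions. The helper name re_ast is hypothetical and chosen only for illustration.

```python
# Sketch: look at the raw AST that the built-in re module builds.
# NOTE: the parser module is private to CPython and not a stable API.
try:
    from re import _parser as sre_parser  # Python 3.11+
except ImportError:
    import sre_parse as sre_parser  # Python 3.10


def re_ast(pattern):
    """Return the top-level (opcode, argument) nodes that re parses from pattern."""
    return list(sre_parser.parse(pattern).data)


nodes = re_ast(r"[abc]")
op, items = nodes[0]

# For a character class, the argument is a list of (opcode, value) pairs.
print(op.name)
for item_op, value in items:
    print(" ", item_op.name, value, repr(chr(value)))
```

Here the class arrives as a single IN node holding three LITERAL items with code points 97, 98 and 99, which correspond to the three single-character CharacterRange entries in the dump above.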