Skip to content

Compiler targeting a regular expression virtual machine (for educational purposes only)

Notifications You must be signed in to change notification settings

irskep/regex_compiler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Rajax: Regex Bytecode Compiler

This module turns regular expressions into bytecode for execution on the virtual machine invented by Ken Thompson and described by Russ Cox. It also produces a Graphviz representation of the abstract syntax tree (with some nodes removed).

Some of the VM instructions (NCHAR, WILD) are not documented by Cox. NCHAR means "match anything but this" and WILD is the wildcard.

Usage

Usage: Compile a regular expression into NFA instructions

Options:
  -h, --help            show this help message and exit
  -d DOT, --dot=DOT     Write the AST as a Graphviz dot file
  -f FORMAT, --format=FORMAT
                        Output format, either "pretty" or "json"
  -j, --json            Alias for --format=json
  -p PDF, --pdf=PDF     Write the AST as a PDF file. Implies
                        --dot=FILENAME.dot.
  -v, --verbose         Print debugging information

Install

pip install -e git+https://github.com/cwru-compilers/regex_compiler.git#egg=rajax

Example

python -m rajax ab[\D8]+

Tokens:
'ORD_CHAR' 'a'
'ORD_CHAR' 'b'
'LBRACK' '['
'ES_SPECIAL' 'D'
'ORD_CHAR' '8'
'RBRACK' ']'
'PLUS' '+'
-------
Graphviz written to AST.dot
PDF written to AST.pdf
-------
Instructions for VM:
 0: char    a ZERO
 1: char    b ZERO
 2: split   3   5
 3: split   4   7
 4: jmp     9
 5: char    : INF
 6: jmp    10
 7: char  ZERO   /
 8: jmp    10
 9: char    8 ZERO
10: split   2  11
11: match

AST.dot will look like this:

digraph AST {
    node [shape=box];
    {
        rank=same; 
        1 [label="re_expr_concat: ", style=filled, fillcolor="#eeeeee"];
    }
    {
        rank=same; 
        2 [label="re_expr_concat: ", style=filled, fillcolor="#eeeeee"];
        5 [label="simple_re_dup: +", style=filled, fillcolor="#ccffcc"];
    }
    {
        rank=same; 
        3 [label="nondup_re_char: a", style=filled, fillcolor="#ffffcc"];
        4 [label="nondup_re_char: b", style=filled, fillcolor="#ffffcc"];
        6 [label="brack_expr_matching_list: ", style=filled, fillcolor="#ffffff"];
    }
    {
        rank=same; 
        7 [label="follow_list_multi: ", style=filled, fillcolor="#cccccc"];
    }
    {
        rank=same; 
        8 [label="char_class: D", style=filled, fillcolor="#ffcccc"];
        9 [label="end_range_char: 8", style=filled, fillcolor="#ccccff"];
    }
    1 -> 2;
    1 -> 5;
    2 -> 3;
    2 -> 4;
    5 -> 6;
    6 -> 7;
    7 -> 8;
    7 -> 9;
}

About

Compiler targeting a regular expression virtual machine (for educational purposes only)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages