Skip to content

Latest commit

 

History

History
225 lines (197 loc) · 6.2 KB

README.md

File metadata and controls

225 lines (197 loc) · 6.2 KB

Consecutive integer match patterns

This came up in a macro that wanted to take a comma-separated sequence of expressions like themacro!('A', 'B', f()) and emit a match expression indexed by position in the sequence:

match VALUE {
    0 => 'A',
    1 => 'B',
    2 => f(),
    _ => unimplemented!(),
}

As a macro_rules macro, a core limitation was that we can't make identifiers dynamically, so the generated code would be limited to using some fixed number of identifiers regardless of how many expressions are in the macro input.

In the actual use case, this match was just one part of a more complicated macro; we wouldn't necessarily want a macro for doing literally what is described here by itself.


Rejected solutions

Procedural macro. The whole thing could have been made a procedural macro instead. A procedural macro would be able to emit exactly a match expression as shown above. However the stable Rust compiler does not yet support calling procedural macros in expression position, so the procedural macro would have needed to be restricted to nightly only. Also it would mean pulling in some extra dependencies for parsing.

Change input syntax. The input syntax for the macro could have been changed to require the caller to pass their own counter in the input: something like themacro!((0, 'A'), (1, 'B'), (2, f())). This makes things easy for the macro implementation but at the expense of the caller, which was the wrong tradeoff. Here is what that would look like implemented:

// Force caller to provide their own counter.
macro_rules! themacro {
    ($(($i:pat, $e:expr)),*) => {
        match VALUE {
            $($i => $e,)*
            _ => unimplemented!(),
        }
    };
}

Good solutions

If-else chain. We can make the macro expand to a chain of if-else comparisons structured like this, with a counter in a local variable:

{
    let _value = VALUE;
    let mut _i = 0;
    if {
        let eq = _value == _i;
        _i += 1;
        eq
    } {
        $e
    } else if {
        let eq = _value == _i;
        _i += 1;
        eq
    } {
        $e
    } else if {
        let eq = _value == _i;
        _i += 1;
        eq
    } {
        $e
    } else {
        unimplemented!()
    }
}

The conditions of the if are equivalent to _value == _i++ except that unary increment does not exist in Rust.

The leading underscore in the local variables _value and _i is meaningful in that it suppresses some of the compiler's lints on unused variables, unused assignment, and unused mut. If the caller's sequence of expressions is empty, then _value and _i are never read and _i is never mutated. If the caller's sequence of expressions is nonempty, the value written to _i by the last _i += 1 is never read. We could alternatively use #[allow(unused_variables, unused_mut, unused_assignments)] but placing these attributes in a way that they apply correctly to the macro-generated local variables but not to the caller's $e expressions makes things more complicated.

Notice that the way the if-else chain is structured there is a clear chunk of repeating tokens -- each if through the following else. That repeating structure makes it very easy for this to be generated from a macro_rules macro in one step of expansion.

macro_rules! themacro {
    ($($e:expr),*) => {{
        let value = VALUE;
        let mut i = 0;
        $(
            if {
                let eq = value == i;
                i += 1;
                eq
            } {
                $e
            } else
        )* {
            unimplemented!()
        }
    }};
}

Const counter. In some situations we may really want to stick with a match expression rather than an if-else chain, for example if the value being matched is just part of a larger data structure and we need to bind other parts of the data structure by-move in the same match.

We can't expand to a match in which the patterns are integer literals 0, 1, 2 etc as shown in the introduction, at least not while supporting an arbitrary number of input expressions, because macro_rules can only copy and paste tokens around, never come up with new tokens. If the caller passes 9999 input expressions, there wouldn't be any way for a macro_rules macro to conjure up a 9998 integer literal token to place in the output.

We also can't expand to arithmetic patterns because this is not legal Rust syntax.

match VALUE {
    0 => $e,
    0 + 1 => $e,
    0 + 1 + 1 => $e,
    ...
}

Instead we will make generated code that looks like this:

{
    mod m {
        pub const X: usize = 0;
        pub mod m {
            pub const X: usize = super::X + 1;
            pub mod m {
                pub const X: usize = super::X + 1;
            }
        }
    }
    match VALUE {
        m::X => $e,
        m::m::X => $e,
        m::m::m::X => $e,
        _ => unimplemented!(),
    }
}

The nested modules here provide a way to avoid needing unique names for each const, which macro_rules wouldn't be able to create.

Figuring out the right generated code is the hard part. The macro implementation ends up being an unremarkable tt-muncher macro that produces one layer of the nesting at a time.

macro_rules! themacro {
    ($($v:expr),*) => {
        $crate::themacro_helper! {
            path: (m::X)
            def: ()
            arms: ()
            $($v),*
        }
    };
}

macro_rules! themacro_helper {
    (
        path: ($($path:tt)*)
        def: ($($def:tt)*)
        arms: ($(($i:pat, $v:expr))*)
    ) => {{
        #[allow(dead_code)]
        mod m {
            pub const X: usize = 0;
            $($def)*
        }
        match VALUE {
            $(
                $i => $v,
            )*
            _ => unimplemented!(),
        }
    }};
    (
        path: ($($path:tt)*)
        def: ($($def:tt)*)
        arms: ($(($i:pat, $v:expr))*)
        $next:expr $(, $rest:expr)*
    ) => {
        $crate::themacro_helper! {
            path: (m::$($path)*)
            def: (pub mod m { pub const X: usize = super::X + 1; $($def)* })
            arms: ($(($i, $v))* ($($path)*, $next))
            $($rest),*
        }
    };
}