Fast pattern matching on port graphs.
This crate is designed to find embeddings of port graphs fast, in the case of many patterns. Patterns are preprocessed to make matching faster and compiled into a graph trie data structure similar to the finite automata used for string matching and regular expressions.
The crate exports several Matcher
instances that can be used for pattern matching.
The principal object of interest in this crate is the ManyMatcher
object.
The other Matcher
s serve as baselines for benchmarking purposes.
SinglePatternMatcher
matches a single pattern
in time O(nm)
(size of the input times size of the pattern).
The NaiveManyMatcher
uses k
instances of
the SinglePatternMatcher
to find matches
of any of k
patterns in time O(kmn)
.
For all benchmarks, inputs and patterns are random graphs with
quantum circuit-like structure.
Inputs have up to 2000 nodes (gates) and 400 qubits,
patterns have up to 30 nodes and between 2 and 5 qubits.
For pattern matching on weighted graphs,
the weights for each node n
are chosen at random
in the set {(d, 0), (d, 1), (d, 2)}
, where d
is the degree (arity) of
the node n
.
Note: weighted pattern matching is much easier than unweighted, especially as weights fix the arity of the nodes. In a sense, unweighted pattern matching corresponds to the worst case. Optimisations to the automaton could be made to significantly speed up unweighted matching, but this has not been the focus.
Pattern matching times for 0 ... 1000 patterns, NaiveManyMatcher
vs automaton-based ManyMatcher
.
The plot measures matching time (ms) as a function of the number of patterns.
Matching time (ms) as a function of the number of patterns for ManyMatcher
only, up to 10k patterns.
The patterns are categorised by the number of qubits.
On top of the running time plotted above, there is also a one-time cost to construct the automaton from the set of patterns. This is plotted here, again as a function of the number of patterns.
TODO
serde
: Enable serialization and deserialization via serde.datagen
: Necessary for thedata_generation
binary, for benchmarking. Currently not useful to the end user of the crate.
Distributed under the MIT License. See LICENSE for more information.