These programs demonstrate linear-time subtree search:
If you want to render the DOT code as a graph drawing, you can do so by running
the Graphviz dot
command, or simply by pasting the
DOT code into GraphvizOnline by
dreampuf.
This repository is licensed under 0BSD.
See LICENSE
.
The algorithm does postorder traversal of the “corpus” tree to
search in, memoizing every unique subtree in a hash table. The values stored in
the table are sequential subtree IDs. Each key is an aggregate of the subroot
element (the node “key”), the ID representing the left subtree, and
the ID representing its right subtree. When a child pointer is nil
/null
, the
ID is
0. Nonempty subtrees are given strictly positive IDs, starting with 1.
Then it does the same thing for the “pattern” tree to be searched, except no new mappings in the hash table are created: if any subtree of the pattern tree, including the entire pattern tree, is not already memoized, then then the corpus tree does not contain (any copy of) the pattern tree. Otherwise it does.
Assuming good hash distribution, this algorithm runs in O(M + N) time with high probability†, when searching a tree of M nodes for a subtree that is a copy of a tree with N nodes.
† Because finding or inserting a key in a hash table takes amortized O(1) time with high probability.
The way I’ve drawn binary trees using GraphViz is inspired by “Visualizing binary trees with Graphviz” by Eli Bandersky.
See the comments above the dot
method in subtree-search.cr
and the
NodeExtensions.ToDot
method in subtree-search.linq
for information on how
the approach I’ve used is similar and different.
An earlier version of this repository exists as a gist. This repository supersedes that gist. Currently the main difference, besides the presence of this README file, is that the C#/LINQPad version in the Gist was never updated to include DOT code generation. (The Crystal version does generate DOT code.)
SubtreeSearch was written in 2020. This README was added in 2021.