Skip to content

Latest commit

 

History

History
79 lines (60 loc) · 3.48 KB

README.md

File metadata and controls

79 lines (60 loc) · 3.48 KB

SubtreeSearch - linear-time subtree search

These programs demonstrate linear-time subtree search:

  • in Crystal – emitting DOT code, and
  • in C# – emitting a LINQPad dump, including DOT code.

If you want to render the DOT code as a graph drawing, you can do so by running the Graphviz dot command, or simply by pasting the DOT code into GraphvizOnline by dreampuf.

License

This repository is licensed under 0BSD. See LICENSE.

How it works

The algorithm does postorder traversal of the “corpus” tree to search in, memoizing every unique subtree in a hash table. The values stored in the table are sequential subtree IDs. Each key is an aggregate of the subroot element (the node “key”), the ID representing the left subtree, and the ID representing its right subtree. When a child pointer is nil/null, the ID is 0. Nonempty subtrees are given strictly positive IDs, starting with 1.

Then it does the same thing for the “pattern” tree to be searched, except no new mappings in the hash table are created: if any subtree of the pattern tree, including the entire pattern tree, is not already memoized, then then the corpus tree does not contain (any copy of) the pattern tree. Otherwise it does.

Assuming good hash distribution, this algorithm runs in O(M + N) time with high probability, when searching a tree of M nodes for a subtree that is a copy of a tree with N nodes.

Because finding or inserting a key in a hash table takes amortized O(1) time with high probability.

Acknowledgements

The way I’ve drawn binary trees using GraphViz is inspired by “Visualizing binary trees with Graphviz” by Eli Bandersky.

See the comments above the dot method in subtree-search.cr and the NodeExtensions.ToDot method in subtree-search.linq for information on how the approach I’ve used is similar and different.

History

An earlier version of this repository exists as a gist. This repository supersedes that gist. Currently the main difference, besides the presence of this README file, is that the C#/LINQPad version in the Gist was never updated to include DOT code generation. (The Crystal version does generate DOT code.)

SubtreeSearch was written in 2020. This README was added in 2021.