Skip to content

A disk-resident Clustered B+ tree implementation optimized for secondary memory storage for fixed-length records. This solution supports efficient insertion and range search operations, making it well-suited for managing datasets.

Notifications You must be signed in to change notification settings

ByJuanDiego/disk-b-plus-tree

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

38 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

B+ Tree Cpp

Description

Implementation of a dynamic multilevel tree-based index.

Compatibility

  • C++ 20 or higher

About B+ Trees

In a Clustered B+ Tree, records are stored only at the leaf nodes of the tree. The leaf nodes are also linked to provide ordered access to the records when performing a search operation. These nodes can be interpreted as the deepest (base) level of the index.

Internal nodes of the B+ Tree correspond to the superior levels of the multilevel index. These nodes helps to determine the descend criteria when performing the index operations.

The structure of the internal nodes of a B+ tree with index page capacity $M$ is as follows:

  • Each internal node is of the form $\left[P_1, K_1, P_2, K_2, ..., P_{q-1}, K_{q-1}, P_q \right]$, where $q \leq M$. Each $K_i$ refers to a search field (for $1 \leq i \lt q$) and each $P_i$ refers to a tree pointer (for $1 \leq i \leq q$), the pointer may point either to a internal node or to a leaf node. Note that an internal node with $q$ pointers has $(q - 1)$ search field values.

  • Within each internal node we have that $K_{1} < K_{2} < ... < K_{q-1}$.

  • For all search field values $X$ in the subtree pointed by $P_{i}$, we have that:

    • $K_{i-1} \lt X \lt K_{i}$ for $1 \lt i \lt q$
    • $X \lt K_{i}$ for $i = 1$
    • $K_{i-1} \lt X$ for $i = q$
  • Each internal node has at least $\lceil \frac{M}{2} \rceil$ pointers and at most $M$ pointers. The root node has at least two pointers if it is an internal node.

The structure of the leaf nodes of a B+ tree with data page capacity $N$ is as follows:

  • Each leaf node is of the form $(P_{prev}, \left[R_1, R_2, ..., R_{q-1}, R_{q} \right], P_{next})$, where $q \leq N$. Each $R_i$ refers to a record (for $1 \leq i \leq N$). $P_{prev}$ points to the previous leaf node of the B+ Tree, same idea with $P_{next}$.

  • Within each leaf node we have that $K(R_{1}) < K(R_{2}) < ... < K(R_{q})$.

  • Each leaf node has at least $(\lceil \frac{N}{2} \rceil - 1)$ records and at most $N$ records. The root node may have less than $(\lceil \frac{N}{2} \rceil - 1)$ records if it is the only node in the tree and only the root node has this exception.

  • All leaf nodes are at the same level.

Note that the order (capacity) of internal and leaf nodes are different because they differ in information as in structure.

References

  • [1] Elmasri, R. & Navathe, S. (2010). Indexing Structures for Files. In Fundamentals of Database Systems (pp. 652-653). Addison–Wesley.

About

A disk-resident Clustered B+ tree implementation optimized for secondary memory storage for fixed-length records. This solution supports efficient insertion and range search operations, making it well-suited for managing datasets.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published