- Balanced Search Trees
What's balanced search Tree ? I recommend thinking about it as a dynamic version of a sorted array.
BST likes sorted array + fast(logarithmic) insertions and deletions.
RANK: how many keys stored in the data structure are less than or equal a given value.
Operation | sorted array | BST | heap | Hash |
---|---|---|---|---|
search | θ(logn) | O(logn) | ok | |
select (iᵗʰ ) |
O(1) | O(logn),up from O(1) | ||
min/max | O(1) | O(logn),up from O(1) | only support MIN or MAX at 1 time | |
pred/succ | O(1) | O(logn),up from O(1) | ||
RANK | O(logn) | O(logn) | ||
output in sorted order | O(n) | O(n) | ||
insertion | unacceptable | O(logn) | O(logn) | Really good |
deletion | unacceptable | O(logn) | O(logn) | Really good |
注意 这里BST 的 O(lgn)算法都是 balanced search tree.
普通的binary search tree, 这些操作都是 O(n) , 实际中 复杂度依赖于 h: heigt of the tree, and h≈lgn if keys inserted in random order !!!
- exactly one node per key
- most basic version
each node has:
- left child pointer
- right child pointer
- parent pointer (root has null )
- any arbitrary node of tree with key x
- all keys in left subtree < x . (<= if duplicate key allowed)
- all keys in right subtree > x . (>= if duplicate key allowed)
- many possible search trees for a set of keys
- height could be anywhere from ≈ logn to ≈ n
- height=logn , best case perfectly balanced
- height=n, worst case a chain
for simplicity the first think about the case where there are no duplicated keys.
- start at the root
- traverse left/right child pointer as need
- return node with key k or NULL
- search for k (unsuccessfully because assuming no dup)
- rewire final NULL ptr to point to new node with key k
Q: What's the worst-case running time of Search(or Insert) operation in a binary search tree containing n keys ?
A: θ(height)
To compute the minimum key of A tree:
minimum key is always at the left most offspring , so just follow the left child pointers.
- start at root
- follow left child pointer
unitl you can't go anymore
(return last key found)
maximum key is always at the right most offspring, so just follow the right child pointers.
Predecessor: given key in the tree, find the next smallest element. eg. show as the pic above, the pred of 3 is 2.
To compute the predecessor of key k:
- easy case: if k's left subtree non-empty
- return max key in left subtree
- otherwise: follow parent pointers unitl get to a key less than k
优先左子孙中查找最大的,没有则去父辈中查找第一个满足条件的。
Q: What's the worst-case running time of Max operation in a binary search tree containing n keys ?
A: θ(height)
To print out keys in increasing order
- let r = root of BST, with subtress T_l and T_r
- recurse on T_l #print keys in order
- print key of root
- recurse on T_r #print keys in order
处理方法近似分治法的思想,每个 recursive call 实际只是 print key of the root, it n(1) , and every key will be printed only once. so the Running time is O(n).
In most data structions, deletion is the most difficult operation , and in search trees there is no exception.
To delete a key from a BST
- search for k # find where it is
- EASY CASE ( k's node has no children )
- just delete the k's node from tree
- MEDIUM CASE ( k's node has 1 child )
- delete the node that you want to delete, that creates a hole in the tree, the unique child will take that hole.
- DIFFICULT CASE ( k's node has 2 children )
- compute k's predecessor l
- SWAP k and l , NOTE definitly new position k has no right child , now k's node has 1 or 0 child , we know how to delete it already.
Runnint time: θ(height)
IDEA: store a little bit of extra info about each tree node about the tree itself .
Example Augmentation:
size(x) = number of tree nodes in subtrees rooted at x.
Note: if x has children y and z , then size(y) + size(z) + 1 equals size(x).
How to select iᵗʰ ordered statistic from augmented searth tree (with subtree sizes)
- start at root x, with children y and z
- let size_leftChild = size(y) # size_leftChild=0 if x has no left child
- if size_leftChild=i-1 return key of x #BEST CASE
- if size_leftChild> i-1 , recursively compute iᵗʰ ordered statistic of search tree rooted at y
- if size_leftChild < i-1 recursively compute (i-1 -size_leftChild)ᵗʰ ordered statistic of search tree rooted at z
Runnint time: θ(height)
红黑树(Red Black Tree) 是一种自平衡二叉查找树,是在计算机科学中用到的一种数据结构,典型的用途是实现关联数组。
红黑树和AVL树类似,都是在进行插入和删除操作时通过特定操作保持二叉查找树的平衡,从而获得较高的查找性能。
The running time of all of operations of search trees depends on the height of tree. It's so important to have a small height.
IDEA: ensure the height always O(logn) [best possible] => Search / Insert / Delete / Min / Max / Pred / Succ will then run in O(logn) time [n=#keys in tree]
Example : red-black trees
- each node red or black
- root is black
- no 2 reds in a row
- red nodes => only black children
- every path you might take from a root to a null pointer , passes through exactly the same number of black nodes.
- root null path: like as an unsuccessful search
- any unsuccusful search you pass through the same number of black nodes
Claim: a chain of length 3 can not be a RBT.
Proof: There is 3 nodes in the tree. The 1st has to be black, so there is 4 possible ways to color the 2nd and 3rd: BB, BR, RB, RR. But really , beacuse of the third invariant , RR is not allowd. If we color 2nd red and 3rd black, the invariant broken: when search 0, you pass through 1 black node, when search 4, you pass through 2 black nodes. Also the BB and BR will lead to same result. so proved.
*Claim: every red-black tree with n nodes has height <= 2log(n+1).
Proof:
- size n> 2ᵏ-1 , where k = minimum #nodes of root-null path (in a binary tree).
- => k<=log₂(n+1)
- Thus, in a red-black tree with n nodes, therer is a root-null path with at most log₂(n+1) black nodes.
- By 4th invariant: every root-null path has <=log₂(n+1) black nodes.
- By 3rd invariant: every root-null path has <= 2log₂(n+1) total nodes.
- black nodes are a majority of nodes in the tree. So if we know the number of black nodes is small, then because you can't have two reds in a row, the number of total nodes on the path is at most twice as large (BRBRB...).
这些约束强制了红黑树的关键性质: 从根到叶子的最长的可能路径不多于最短的可能路径的两倍长。
结果是这个树大致上是平衡的。
因为操作比如插入、删除和查找某个值的最坏情况时间都要求与树的高度成比例,这个在高度上的理论上限允许红黑树在最坏情况下都是高效的,而不同于普通的二叉查找树。
当我们在对红黑树进行插入和删除等操作时,对树做了修改,那么可能会违背红黑树的性质。
为了保持红黑树的性质,我们可以通过对树进行旋转,即修改树种某些结点的颜色及指针结构,以达到对红黑树进行插入、删除结点等操作时,红黑树依然能保持它特有的性质.
- KEY PRIMITIVE: Rotations
- common to all balanced search tree implementations - red-black, AVL, B+ , etc
- IDEA: locally re-balance subtrees at a node in O(1) time.
- left Rotation: of the parent x , and right child y
currently x is the parent and y is the child. We want to rewire a few pointers so that y is the parent and x is the child. Because x<y , So if x will be a child of y, it's got to be the left child.
- x's parent p
- y will inherit x's parent p
- subtrees A,B,C:
- after y becoming x's child , there is 3 child slot left for A,B,C
- A < x and y, so A should be x's left child (no change)
- C > x and y, so C should be y's right child (no change)
- B used to be y's left child, but now x is y's left child, so the only hope is to slot B into the only space we have is x's right child. Fortunately for us this acctually works.
Rotation Running Time:
search tree property invariant, can implement in O(1) time.
IDEA for insertion/deletion:
proceed as in a normal binary search tree, then recolor and perform rotations until invariants are restored.
Insert x:
- insert x ans usual (make x a leaf) y->x
- try coloring x red
- when we colorize x, it has potential to break the invariants. if we color x red, 3rd invariant may be broken; if we color x black, 4th invariant may be broken.
- 插入黑点会增加某条路径上黑结点的数目,从而导致整棵树黑高度的不平衡
- if x's parent y is black , done.
- else y is red.
- if y is red, it should be the root, y must have a parent, we call it w , y has a black parent w.
- there is 2 different cases:
- case 1: the other child z of w is also red
- solution: recolor y,z black, w red
- 但由于w 的父节点有可能为红色,从而违反红黑树性质。此时必须将 w节点作为新的判定点继续向上传播进行平衡操作。
- case 2: the other child z of w is NULL or black
- solution: recoloring, rotations. 红黑树插入