\chapter{Search Trees}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "book"
%%% End:
%\setcounter{chapter}{1}
%\newcommand{\hatcolon}{\hatChar \hskip-0.4em :}
%\newcommand{\hatminus}{\hatChar \hskip-0.4em -}
%\newcommand{\hatminus}{\hbox{\hatChar \texttt{-}}}
\section{How to Find a Folder}
Suppose you have a bunch of file folders (physical ones, those
manilla-colored things, not folders on the computer system). Each
folder contains some documents and is identified by a number on its
filing tab. Every now and then someone gives you an identifying
number and asks you to retrieve the corresponding folder. How much
work is that going to be?

It depends on the number of file folders and the care you take in
organizing them. If you have only a few folders, it doesn't matter
much. You can just scatter them around on the desktop, and when
you're asked to get one, look around until you find it. It might be
in the first folder you pick up, or the second, and the worst that
can happen is that you will have to look at all of the folders
before you find the one you want. Since there aren't very many, it
won't take long.

That's fine when there are only a few folders, but what if there
were hundreds? Then, the frustration in finding the right folder
might motivate you to rethink the arrangement. Scattering the
folders around the desktop would be an unattractive way to keep
track of them. You might go out and buy a four-drawer filing cabinet
and arrange the folders in the cabinet in order by their identifying
numbers. When somebody asks you for a particular folder, you quickly
locate the folder by relying on the numbering. Much better, eh?

How much better? Say you have a thousand folders and are using the
old desktop method. Sometimes you would find the one you want on the
first go, sometimes the second, sometimes the thousandth. Because the folders
are scattered on the desktop at random, no folder is more likely to
be the one you're looking for than any other folder, so the average
search time would be the average of the numbers $1, 2, \ldots, 1000$,
which is about 500.

Now, how about the filing-cabinet method? The time it takes you here
depends on how much you take advantage of the numbering, but it
turns out that the typical search time will be much improved over
the desktop method.

You could proceed by looking at
the identifying number and comparing it to the number on the middle
folder in your four-drawer cabinet.
If the number on the folder you're looking for is bigger than that
middle number, then you know to look in the bottom half of the
file drawers. If it's smaller, you look in the top half.
That is, you've narrowed down the search to two of the four file drawers.
Then, you could do the same thing again to narrow it down to one file drawer,
and after that, look at a folder near the middle of the file drawer
to figure out whether the one you're looking for is in the front half
of the drawer or the back half.
At each stage, you eliminate half of the folders.

When you started, the folder you're looking for might have been any
of the 1,000 folders. After the first step, you can limit your search
to 500 folders. After the second step, 250 folders, and so on,
halving the number of candidates at each stage until you finally
narrow it down to one folder.

Of course, you might have found it somewhere along the way,
but the worst that can happen is that you don't find it until you
narrow it down to one folder. That takes ten steps, worst case, so
the average case must be at least that good, which is
fifty times better than the 500-step average required in the desktop method.

Good as that sounds, it gets even better. If you have 10,000 folders,
instead of just 1,000, the first go knocks it down to 5,000 folders,
then 2,500, and so on down to a single folder in fourteen steps, max,
for the filing cabinet method, which is hundreds of times faster
than the 5,000-step average for the desktop method.

If you have a million
folders, the improvement is even more dramatic. With a million
folders, the filing cabinet method has a twenty-five-thousand-fold
advantage over the desktop method. It gets better and better, the
more folders you have. Of course, you will have to buy more filing
cabinets, but regardless of how many filing cabinets you have,
you can halve the search space at every stage.
\section{The Filing Cabinet Advantage}
Let's work out a formula for the filing-cabinet advantage. If you
have $n$ folders, the most steps you have to take in your search is
the number of times you have to successively halve $n$ to get down
to one. That number is the base-two logarithm of $n$. Since the
logarithm is usually not an integer, round it up to the next integer.
The notation $\lceil x \rceil$ means the smallest integer that is
$x$ or bigger.
For example, $\lceil 3.5 \rceil = 4$.
Here are the formulas for the typical number of steps in the two
methods, desktop versus filing cabinet, as functions of the number
of folders.
\begin{center}
\begin{tabular}{ll}
$D(n) = \lceil n/2 \rceil$ & \emph{desktop method} \\
$C(n) = \lceil \log_2(n) \rceil$ & \emph{filing cabinet method} \\
\end{tabular}
\end{center}

The ratio $D(n)/C(n)$ expresses the advantage of the filing-cabinet
method over the desktop method. As the examples indicated, it gets
better and better as $n$, the number of folders, gets larger. For a
billion folders, there is more than a ten-million-fold advantage.
Plus, the desktop would be long buried by that time,
and the folders would be scattered all over the office building.
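
To make the comparison concrete, here is a small Haskell sketch that
computes the ratio $D(n)/C(n)$. It is our own illustration, not code
from this chapter; the function name \texttt{advantage} is ours.
\begin{verbatim}
-- Sketch: the filing-cabinet advantage D(n)/C(n).
advantage :: Double -> Double
advantage n = d / c
  where
    d = fromIntegral (ceiling (n / 2)       :: Integer)   -- D(n)
    c = fromIntegral (ceiling (logBase 2 n) :: Integer)   -- C(n)

-- advantage 1e9 is about 1.7e7: more than ten-million-fold.
\end{verbatim}
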
\section{The New-File Problem}
\label{binary-search}
The filing-cabinet method performs an algorithm known as a
``binary search''.
It's a simple idea, and
probably one that almost anyone would think of, faced with the
problem of organizing a large number of folders for retrieval
purposes. It works amazingly well, as long as the number
of folders doesn't change very often.
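
To make the procedure concrete, here is one way a binary search might
look in Haskell, over a vector of folders sorted by key. This is a
sketch under our own naming (\texttt{findFolder}, \texttt{Folder}),
not code from this chapter; it uses the standard vector library.
\begin{verbatim}
import qualified Data.Vector as V

type Key    = Int
type Folder = String    -- stand-in for a folder's contents

-- Binary search over a vector sorted by key.  Each recursive call
-- halves the candidate range, so it takes at most about log2(n) steps.
findFolder :: V.Vector (Key, Folder) -> Key -> Maybe Folder
findFolder v k = go 0 (V.length v)     -- search the range [lo, hi)
  where
    go lo hi
      | lo >= hi  = Nothing            -- empty range: not found
      | k == x    = Just d             -- found the folder
      | k <  x    = go lo mid          -- look in the lower half
      | otherwise = go (mid + 1) hi    -- look in the upper half
      where
        mid    = (lo + hi) `div` 2
        (x, d) = v V.! mid
\end{verbatim}
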
However, when a new folder needs to be added to the
collection, it can be a problem. The problem usually doesn't come up
on the first few new folders. For those, there will be room to
squeeze them into the drawers where they belong. The problem comes
up when the number of folders keeps growing. Eventually, there is no
room left in the drawer where a new folder belongs, so some of the
folders in that drawer have to be moved down to another drawer to
make room. What's worse, the next drawer may also be full. The
problem can cascade further and further along. With a whole office
building filled with cabinets for billions of folders, putting in
one new folder can call for a lot of heavy lifting.

In fact, with an office building filled with cabinets, it will be
advantageous to keep the folders in as few cabinets as possible. If
they were scattered out, and many cabinets were nearly empty, there
could be a lot of walking involved to find the appropriate cabinet.
In other words, to keep down the walking, folders must be kept in
the smallest number of drawers possible. This means that the drawers
involved are almost always full, and whenever a new folder comes in,
a bunch of folders have to be moved around to accommodate the new
one.

How much moving around? As in the desktop method, it depends on
where the new folder goes. If it goes at the end, no problem. Just
put it at the end of the last non-empty drawer. But, if it goes at
the beginning, all of the folders must be moved down one slot to
accommodate the new one. Bummer!

On the average, just as with finding folders using the desktop
method, about half of the folders must be moved to accommodate a new
folder. Something similar happens if a folder needs to be discarded.
Again, on the average, about half of the folders must be moved to
keep the folders compressed into the smallest number of drawers.
\section{The AVL Solution}
In a computer, the filing-cabinet method is normally implemented
using an array to store the folders. The identifying numbers on the
folders are arranged in order in the array, and to find a particular
identifying number, the software looks first at the middle element
of the array, then at the middle element of the narrowed range
resulting from the first comparison of identifying numbers, and so
on. That is, a binary search is performed in the array.

When a new folder comes in, it must be inserted in the
place required by its identifying number to keep the numbers
in order. So, all of the elements beyond that point
must be moved down to make room. This can be time-consuming if there
are, say, billions of folders in the array.

The same thing happens in deleting a folder from the array. All the
folders beyond the point of deletion must be moved up to eliminate
the gap. In either case, the number of array elements to be moved
is, on the average, about half of the number of folders in the array.
That's not good, and it's not an easy problem to fix.
In fact, it can't be fixed with array-based methods.
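
To see the cost in code, here is a sketch of insertion into a sorted
vector. It is our illustration, not the book's; the name
\texttt{insertSorted} is hypothetical.
\begin{verbatim}
import qualified Data.Vector as V

-- Insert a (key, data) pair into a vector kept sorted by key.
-- Everything from the insertion point onward must be copied to
-- make room -- about half of the elements, on average.
insertSorted :: Ord k => (k, d) -> V.Vector (k, d) -> V.Vector (k, d)
insertSorted entry@(k, _) v =
  let (front, back) = V.span (\(x, _) -> x < k) v
  in V.concat [front, V.singleton entry, back]
\end{verbatim}
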
In the 1960s (ancient history in the computing business),
two Russian mathematicians, Adelson-Velski and Landis,
figured out a way to store folders in a tree structure so that
finding a folder, adding a new folder, and removing a folder all
require only about $\log(n)$ steps, where $n$ is the number
of folders stored in the tree. The structure these mathematicians
invented, together with their insertion and deletion methods, is
known as the AVL tree.\label{AVL-tree}

The simple part of the Adelson-Velski/Landis solution is to store
the folders in a tree that facilitates binary search. Except for
leaf-nodes, each node in the tree stores a folder and has a left
subtree and a right subtree. (Leaf nodes just mark the boundaries of
the tree. They don't store folders, and they don't have subtrees.)
All of the folders in the left subtree have identifying numbers that
are smaller than that of the folder in the root node (that is,
the one you start with), and all of
the folders in the right subtree have identifying numbers that are larger.

To find a folder, look at the root. If the folder stored at
the root has the identifying number you're looking for, deliver that folder.
If not, look at the left subtree if the sought-for identifying number
is smaller than the one in the root, and look in the right subtree
if it is larger.
If a folder with the identifying number you're looking for is in the tree,
it will be found. If not, the search will arrive at a leaf node,
indicating that the required folder is not in the tree.
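
In Haskell notation, the search procedure might be sketched as
follows. The datatype declaration is our assumption (the constructors
\texttt{Nub} and \texttt{Cel} match the notation used later in this
chapter), and \texttt{findKey} is our name for the search.
\begin{verbatim}
data SearchTree d = Nub                 -- leaf node
                  | Cel Integer d (SearchTree d) (SearchTree d)
                        -- key, data, left subtree, right subtree

findKey :: SearchTree d -> Integer -> Maybe d
findKey Nub _ = Nothing              -- reached a leaf: not in the tree
findKey (Cel k d lf rt) x
  | x == k    = Just d               -- folder found at this node
  | x <  k    = findKey lf x         -- smaller keys are on the left
  | otherwise = findKey rt x         -- larger keys are on the right
\end{verbatim}
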
That's the simple part. The hard part has to do with insertion and
deletion. The number of steps in the search depends on where the
folder occurs in the tree, but at worst, the number of steps cannot
exceed the number of levels in the tree. Nodes in a tree have
subtrees, and the roots of subtrees have subtrees, and so on.
Eventually, the desired folder is encountered, or a leaf is
encountered, and the search cannot continue. The maximum number of
steps in this search, for all possible search paths in the tree, is
known as the \emph{height} of the tree.

AVL trees maintain a balance among the heights of subtrees, at all levels, as
nodes are inserted and deleted. By maintaining balance, the AVL tree
preserves a high ratio between number of folders it contains and the
height of the tree.

In fact, the height of an AVL tree is approximately the base-2
logarithm of the number of folders it contains. That means every
search will terminate within $\log_2(n)$ steps, where $n$ is the
number of folders stored in the tree.

If you want to get a feeling for just how ingenious the AVL
solution is, try to find a way to insert and delete folders in a
tree that maintains order (left subtrees have folders with smaller
identifying numbers, right subtrees larger) and balance (left
subtrees have about the same number of nodes as right subtrees, at
all levels). To match the effectiveness of AVL trees, your method
will have to be able to insert or delete a folder in about
$\log(n)$ steps, where $n$ is the number of folders stored
in the tree.

After a few hours' work, you'll begin to appreciate
the contribution that Adelson-Velski and Landis made to the technology for
maintaining large databases.
\section{Search Trees and Occurrence of Keys}
It's a long road from here to the complete AVL solution. As usual,
the road starts with formulas and equations that make the ideas
amenable to mathematical reasoning.
The first step is to define a formal representation of a tree.
To avoid unnecessary details, the definition of the tree
does not specify the kind of data associated with nodes.
They may contain any kind of data.
We assume that each node is identified by a numeric key,
which is the basis for locating the node.
We are going to assume that keys are natural numbers,
but any set with an ordering among its elements could serve
as the domain of the keys.
\label{leaf-node}
Each node in the tree is either a ``leaf node'', represented by ``nil'',
\label{interior-node}
or an ``interior node'', represented by a list of four elements:
key (a number), data (of any kind), left subtree, and right subtree.
\label{root-node}
The ``root node'' is the one that is not preceded by any other node.
In the diagram, it's the node at the top.
In a formula, it's the entire formula.
The key of the root node is the first element of the list that represents the tree,
and the data associated with the root is the second element.
Figure \ref{fig:searchtree-diagram} shows a formula
for a search tree in which the data are strings.
The algebraic formula represents the tree in the diagram.
We will rely extensively on diagrams to
illustrate ideas expressed algebraically in terms of formulas.
You will need to develop an ability to see the connection
and convert freely between diagrams and formulas.
\begin{figure}
\begin{center}
\includegraphics[scale=0.5]{images/searchtree.png}
\end{center}
\caption{Search Tree Diagram and Corresponding Formula}
\label{fig:searchtree-diagram}
\end{figure}
\label{empty-tree}
We use nil to represent the empty tree.
For non-empty trees, we use
four-element lists in which the key and data at
the root are the first two elements and the left
and right subtrees are the last two elements.
\label{subtree}
A ``subtree'' is any part of a tree beginning with a node
other than the root
and continuing all the way from that node to the leaves.
A key occurs in a search tree if it is the key at the root
or if it occurs in either the left or right subtree.
\label{height}
The height of an empty tree is zero, by default.
The height of a non-empty tree is one more than the
larger of the heights of its left and right subtrees.
The size of a search tree is the number of keys in the tree.
Figure \ref{fig:tree-functions} makes these ideas precise
by providing formal definitions.
\begin{figure}
\begin{center}
\begin{lstlisting}
(defun key (s) (first s)) ; key at root
(defun dat (s) (second s)) ; data at root
(defun lft (s) (third s)) ; left subtree
(defun rgt (s) (fourth s)) ; right subtree
(defun emptyp (s) (not (consp s))) ; empty tree?
(defun height (s) ; tree height
(if (emptyp s)
0 ; ht0
(+ 1 (max (height (lft s)) (height (rgt s)))))) ; ht1
(defun size (s) ; number of keys
(if (emptyp s)
0 ; sz0
(+ 1 (size (lft s)) (size (rgt s))))) ; sz1
(defun 4-element-list (xs)
(and (consp xs) (consp (rest xs)) (consp (rest (rest xs)))
(consp (rest (rest (rest xs))))
(equal nil (rest (rest (rest (rest xs)))))))
(defun treep (s) ; search tree?
(or (emptyp s)
(and (4-element-list s) (natp (key s))
(treep (lft s)) (treep (rgt s)))))
(defun subp (r s) ; r subtree of s?
(and (treep r) (treep s) (not (emptyp s))
(or (equal r (lft s)) (subp r (lft s))
(equal r (rgt s)) (subp r (rgt s)))))
(defun keyp (k s) ; key k occurs in s?
(and (not (emptyp s))
(or (= k (key s)) (keyp k (lft s)) (keyp k (rgt s)))))
\end{lstlisting}
\end{center}
\caption{Search Tree Functions and Predicates}
\label{fig:tree-functions}
\end{figure}
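
For comparison, the height and size definitions translate directly
into the Haskell notation, using the \texttt{SearchTree} sketch given
earlier (our rendering, not the book's code).
\begin{verbatim}
height :: SearchTree d -> Integer
height Nub             = 0                                 -- ht0
height (Cel _ _ lf rt) = 1 + max (height lf) (height rt)  -- ht1

size :: SearchTree d -> Integer
size Nub             = 0                       -- sz0
size (Cel _ _ lf rt) = 1 + size lf + size rt   -- sz1
\end{verbatim}
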
The only tree of height zero is the empty tree because
the height of a non-empty tree is one more than the maximum
of two other numbers, which makes it at least one.
Theorem \emph{ht-emp0} is a formal statement of this fact.\newline
Theorem \emph{ht-emp0}: ((height $s$) $= 0$) = (emptyp $s$)

The height of a subtree is strictly less than the height of the tree.\newline
Theorem \emph{subtree height}: (subp $r$ $s$) $\rightarrow$ ((height $r$) $<$ (height $s$))

To prove this, observe that (emptyp $s$) $\vee$ (not (emptyp $s$)) is true
\{$\vee$ \emph{complement}\}.
So, we can prove the theorem by deriving it from
each of these formulas, independently \{$\vee$ \emph{elimination}\}.
\begin{center}
\begin{tabular}{rll}
Case 1: & (emptyp $s$) = true & \\
 & (subp $r$ $s$) $\rightarrow$ \dots & \\
$=$ & (and (treep $r$) (treep $s$) (not (emptyp $s$)) \dots) $\rightarrow$ \dots & \{\emph{subp}\} \\
$=$ & (and true true (not true) \dots) $\rightarrow$ \dots & \{\emph{assumption}\} \\
$=$ & (and true true nil \dots) $\rightarrow$ \dots & \{$\neg$ \emph{truth table}\} \\
$=$ & nil $\rightarrow$ \dots & \{$\wedge$ \emph{truth table}\} \\
$=$ & true & \{$\rightarrow$ \emph{truth table}\} \\
 & & \\
Case 2: & (not (emptyp $s$)) = true & \\
 & (height $s$) & \\
$=$ & 1 + (max (height (lft $s$)) (height (rgt $s$))) & \{\emph{ht1}\} \\
$\geq$ & 1 + (height $r$) & \{\emph{subp}\} \\
$>$ & (height $r$) & \{\emph{algebra}\} \\
\end{tabular}
\end{center}
\section{Ordered Search Trees and Tree Induction}
A search tree is ``ordered'' if the key in each non-leaf node is
greater than all the keys that occur in the left subtree of the node
and is less than all the keys that occur in the right subtree. A
leaf is ordered by default. That is, the predicate
``ordp'' satisfies the following equations.
For any search tree $s$, (ordp $s$) is true if $s$ is ordered
and false otherwise.
\begin{center}
\begin{tabular}{ll}
(ordp nil) = true & \{\emph{ord0}\} \\
(ordp (k d lf rt)) = ($\forall x$,(keyp $x$ $lf$).$x < k$) $\wedge$
($\forall y$,(keyp $y$ $rt$).$y > k$) $\wedge$
(ordp $lf$) $\wedge$ (ordp $rt$) & \{\emph{ord1}\} \\
\end{tabular}
\end{center}
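
Rendered in Haskell, the ordp equations might look like this; the
helper \texttt{allKeys} is ours, introduced only for illustration,
and the \texttt{SearchTree} type is the sketch given earlier.
\begin{verbatim}
-- All keys in the tree, in left-to-right order.
allKeys :: SearchTree d -> [Integer]
allKeys Nub             = []
allKeys (Cel k _ lf rt) = allKeys lf ++ [k] ++ allKeys rt

-- A direct (inefficient) rendering of ord0 and ord1.
ordered :: SearchTree d -> Bool
ordered Nub = True                                         -- ord0
ordered (Cel k _ lf rt) =                                  -- ord1
  all (< k) (allKeys lf) && all (> k) (allKeys rt) &&
  ordered lf && ordered rt
\end{verbatim}
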
From the definition of ordp, it's not a big step to
guess that an ordered search tree cannot contain duplicate keys.
However, saying exactly what that means turns out to be tricky. One
approach is to define a function that extracts from a search tree a
sequence containing all the data elements in the tree that are
associated with a given key, and then to prove that the sequence
contains exactly one element if the key occurs in an ordered tree.
If the key doesn't occur in the tree, the sequence is empty.
\medskip
\noindent\textbf{Sequence of matching keys}
\begin{verbatim}
dataElems :: SearchTree d -> Integer -> [d]
dataElems Nub x = []                                  {dataElems N}
dataElems (Cel k d lf rt) x                           {dataElems C}
  = if k == x
      then (dataElems lf x) ++ [d] ++ (dataElems rt x)
      else (dataElems lf x) ++ (dataElems rt x)
\end{verbatim}
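
For instance, on a small ordered tree, a key that occurs yields a
one-element sequence and a key that does not occur yields the empty
sequence (a worked evaluation, assuming the \texttt{SearchTree}
sketch given earlier):
\begin{verbatim}
dataElems (Cel 5 "five" (Cel 3 "three" Nub Nub) Nub) 3
  = dataElems (Cel 3 "three" Nub Nub) 3 ++ dataElems Nub 3
  = ([] ++ ["three"] ++ []) ++ []
  = ["three"]

dataElems (Cel 5 "five" (Cel 3 "three" Nub Nub) Nub) 7  =  []
\end{verbatim}
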
\begin{theorem}
\label{th:uk_1}
(unique keys, part 1).
$\forall s. (\mathit{ordered}\ (s)\ \logand\ \mathtt{k} \in s)\
\logimp\ (\mathtt{length}\ (\mathtt{dataElems}\ s\ \mathtt{k})\
==\ 1)$
\end{theorem}
\begin{theorem}
\label{th:uk_2}
(unique keys, part 2).
$\forall s. (\mathit{ordered}\ (s)\ \logand\ \mathtt{k} \not\in
s)\ \logimp\ ((\mathtt{dataElems}\ \mathit{s}\ \mathtt{k})\ ==\
[ ])$
\end{theorem}
We will prove\index{$\mathtt{dataElems}$} Theorem \ref{th:uk_2}
first, and we will use a new form of induction that we will call
{\bf tree induction}\index{tree induction}.
%$$\mathbf{Principle}\ \mathbf{of}\ \mathbf{Tree}\
%\mathbf{Induction}.
\medskip
\noindent\textbf{Principle of Tree Induction.}
$$
(\forall\ t.\ (\forall\ s\subset t.\ P(s))\
\logimp\ P(t))\ \logimp\ (\forall t.\ P(t))$$
Note: The domain of discourse of the for-alls is the set of all
search trees.
The principle of induction derives from basic elements of set
theory, and all forms of inductive proof are equivalent when traced
back to this basic framework. In practice, the form of induction
used in a particular proof depends on the domain of discourse.
Verifying the equivalence of various inductive forms to ordinary
mathematical induction on the natural numbers would require a major
digression into details of set theory.

Using the principle of tree induction, we can prove that a predicate
is true for every search tree if we can prove a certain implication.
The implication is this: if the predicate is true for every proper
subtree of a particular, chosen tree, then it must also be true for
the chosen tree. The implication must be proved for an arbitrarily
chosen tree, but once this implication is proved, the principle of
tree induction delivers the conclusion that the predicate is true
for every search tree.

The statement of the principle of tree induction is identical to the
statement of strong induction for the domain of natural numbers,
except that where the relation ``less than'' appears in the
principle of strong induction, the relation ``proper subset''
appears in the principle of tree induction. The two forms of
induction share the implicit requirement that the predicate must be
proved directly for the simplest element.

In the case of strong induction, the simplest element is zero. There
are no natural numbers smaller than zero. Therefore, when the chosen
element is zero, the universe of discourse for the for-all in the
hypothesis of the implication $(\forall\ s < 0.\ P(s))$ is empty. A
for-all over an empty universe of discourse is true by default, so
for the case when the chosen element is zero, the implication to be
proved is $(\True\ \logimp\ P(0))$. The hypothesis in this
implication can be of no help in proving its conclusion.

The same is true for tree induction. When the chosen element is a
$\mathtt{Nub}$, the universe of discourse for the hypothesis of the
implication to be proved is empty. That is, we must prove
$((\forall\ s\subset \mathtt{Nub}.\ P(s))\ \logimp\
P(\mathtt{Nub}))$, which is the same as $(\True\ \logimp\
P(\mathtt{Nub}))$. The hypothesis of this implication (namely,
``true'') cannot help in arriving at its conclusion (namely,
$P(\mathtt{Nub})$). We will use tree induction to prove many
properties of software that operates on search trees, and a few
properties of the search trees themselves. For starters, we use tree
induction to prove Theorem \ref{th:uk_2}.
\begin{proof} of Theorem \ref{th:uk_2}
(unique keys, part 2):
$\forall s. (\mathit{ordered}\ (s)\ \logand\ \mathtt{k} \not\in
s)\ \logimp\ ((\mathtt{dataElems}\ \mathtt{s}\ \mathtt{k})\ ==\
[ ])$\newline
Proof: by tree induction.\newline
Base Case.\newline
$\begin{array}{ll}
(\mathit{ordered}\ (\mathtt{Nub})\ \logand\ \mathtt{k} \not\in
\mathtt{Nub})\ \logimp\ ((\mathtt{dataElems}\ \mathtt{Nub}\
\mathtt{k})\ ==\ [ ]) & \\
\quad =\ \{\mathit{ord}\ N, \in N\} & \\
(\True \ \logand\ \True)\ \logimp\
((\mathtt{dataElems}\ \mathtt{Nub}\ \mathtt{k})\ ==\ [ ]) & \\
\quad =\ \{\logand\ \mathit{idem}\} & \\
\True\ \logimp\ ((\mathtt{dataElems}\ \mathtt{Nub}\ \mathtt{k})
\ ==\ [ ]) & \\
\quad =\ \{\mathtt{dataElems}\ N\} & \\
\True\ \logimp\ ([\ ]\ ==\ [\ ]) & \\
\quad =\ \{==\ \mathit{reflexive}\} & \\
\True & \\
\end{array}$\newline
Inductive Case.\newline
First, we work with just the hypothesis of the implication we're
trying to prove.\newline
$\begin{array}{l}
(\mathit{ordered}\ (\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\ \mathtt{lf}\
\mathtt{rt})\ \logand\
\mathtt{k}\ \not\in\ (\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\
\mathtt{lf}\
\mathtt{rt})) \\
\quad =\ \{\in\ C\} \\
(\mathit{ordered}\ (\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\
\mathtt{lf}\ \mathtt{rt})\ \logand\ \lognot (\mathtt{x}\ =\
\mathtt{k}\ \logor\ \mathtt{k} \in \mathtt{lf}\ \logor\ \mathtt{k}
\in \mathtt{rt} )) \\
\quad = \ \{DM\} \\
(\mathit{ordered}\ (\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\
\mathtt{lf}\ \mathtt{rt})\ \logand\ (\mathtt{x} \not =
\mathtt{k})\ \logand\ (\mathtt{k} \not\in \mathtt{lf})\ \logand\
(\mathtt{k} \not\in \mathtt{rt} )) \\
\end{array}$\newline
We are trying to prove that when the above formula is true, the
formula in the conclusion of the theorem is also true. That is, we
want to prove that the $\mathtt{dataElems}$ function delivers an
empty sequence in this case.\newline
$\begin{array}{l}
\mathtt{dataElems}\ (\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\
\mathtt{lf}\ \mathtt{rt})\ \mathtt{k} \\
\quad = \ \{\mathtt{dataElems}\ C, \mathtt{x} \not = \mathtt{k}\} \\
(\mathtt{dataElems}\ \mathtt{lf}\ \mathtt{k})\ \append \
(\mathtt{dataElems}\ \mathtt{rt}\ \mathtt{k}) \\
\quad = \ \{\mathit{ord}\ C,\ \mathtt{k} \not\in \mathtt{lf},\ \mathtt{k} \not\in \mathtt{rt},
\mathit{induction}\ \mathit{hypothesis},\ \mathit{applied}\
\mathit{twice}\} \\
[ ]\ \append\ [ ] \\
\quad = \ \{\append\ [ ]\} \\
[ ] \\
\end{array}$
The induction step in the proof occurred when we observed that with
respect to the formula $(\mathtt{dataElems}\ \mathtt{lf}\
\mathtt{k})$, the hypotheses of the theorem are true. That is, the
tree $\mathtt{lf}$ is ordered (by the definition of ordered, since
$\mathtt{lf}$ is a subtree of an ordered tree) and the key
$\mathtt{k}$ does not occur in that tree. As $\mathtt{lf}$ is a
proper subtree of the tree we started with, the principle of
induction allows us to assume that the theorem is true for the tree
$\mathtt{lf}$. (Remember, induction doesn't require a direct proof
in the inductive case. It only requires that you prove an
implication whose hypothesis is that the theorem is true for every
proper subtree of the one you started with.) In this case, we apply
the induction hypothesis twice: once for the left subtree and again
for the right subtree.
\end{proof}

Now, what about Theorem \ref{th:uk_1}? Induction also provides the
mechanism for its proof.\newline
%Theorem \ref{th:uk_1} (unique keys, part 1).
%
%$\forall s. (\mathit{ordered}\ (s)\ \logand\ \mathtt{k} \in s)\
%\logimp\ (\mathtt{length}\ (\mathtt{dataElems}\ s\ \mathtt{k})\
%==\ 1)$\newline
\begin{proof} of Theorem \ref{th:uk_1} (unique keys, part 1), by tree
induction.\newline
Base Case.\newline
$\begin{array}{l}
(\mathit{ordered}\ (\mathtt{Nub})\ \logand\ \mathtt{k}\in \mathtt{Nub})\
\logimp\
(\mathtt{length}\ (\mathtt{dataElems}\ \mathtt{Nub}\ \mathtt{k})\ =
=\
1) \\
\quad =\ \{\in N\} \\
(\mathit{ordered}\ (\mathtt{Nub})\ \logand\ \False)\ \logimp\
(\mathtt{length}\ (\mathtt{dataElems}\ \mathtt{Nub}\ \mathtt{k}) \
==\ 1) \\
\quad = \ \{\logand\ null\} \\
\False\ \logimp\ (\mathtt{length}\ (\mathtt{dataElems}\
\mathtt{Nub}\ \mathtt{k})\ ==\ 1) \\
\quad =\ \{\False \ \logimp\ \mathit{any}\ =\ \True\} \\
\True
\end{array}$\newline
Inductive Case.\newline
First, we work with just the hypothesis of the implication we're
trying to prove.\newline
$\begin{array}{l}
(\mathit{ordered}\ (\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\ \mathtt{lf}\
\mathtt{rt})\ \logand\ \mathtt{k} \in
(\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\ \mathtt{lf}\ \mathtt{rt})) \\
\quad =\ \{\in C\} \\
(\mathit{ordered}\ (\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\
\mathtt{lf}\ \mathtt{rt})\ \logand\ (\mathtt{x} = \mathtt{k}\
\logor\ \mathtt{k} \in \mathtt{lf}\ \logor\ \mathtt{k} \in
\mathtt{rt} ))
\\
\quad =\ \{\logand\ \mathit{over}\ \logor\} \\
(\mathit{ordered}\ (\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\ \mathtt{lf}\
\mathtt{rt})\ \logand\
(\mathtt{x} = \mathtt{k}))\ \logor\ \\
(\mathit{ordered}\ (\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\ \mathtt{lf}
\
\mathtt{rt})\ \logand\
\mathtt{k} \in \mathtt{lf})\ \logor\ \\
(\mathit{ordered}\ (\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\ \mathtt{lf}
\
\mathtt{rt})\ \logand
\mathtt{k} \in \mathtt{rt}) \\
\end{array}$\newline
We are trying to prove that when the above formula is true, the
formula in the conclusion of the theorem is also true. That is, we
want to prove that the $\mathtt{dataElems}$ function delivers a
sequence with exactly one element in this case. The implication we
are trying to verify has the following form: $(a\ \logor\ b\
\logor\ c)\ \logimp\ d$, where $d$ is the conclusion of the
theorem (that is, $d\ =\ (\mathtt{length}\ (\mathtt{dataElems}\ s\
\mathtt{k})\ ==\ 1))$, and $a$, $b$, and $c$ are the terms in the
above formula. For example, $a\ =\ (\mathit{ordered}\
(\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\ \mathtt{lf}\ \mathtt{rt})\
\logand\ (\mathtt{x}\ =\ \mathtt{k}))$. Using the Boolean algebra
of propositions, one can verify that
$((a\ \logor\ b\ \logor\ c)\ \logimp\ d)\ =\ ((a\ \logimp\
d)\ \logand\ (b\ \logimp\ d)\ \logand\ (c\ \logimp\ d))$
That is, the formula $((a\ \logor\ b\ \logor\ c)\ \logimp\ d)$
can be verified by proving that each of the terms, $(a\ \logimp\
d)$, $(b\ \logimp\ d)$, and $(c\ \logimp\ d)$, is true.\newline
Proof of $(a\ \logimp\ d)$:
$(\mathit{ordered}\ (\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\
\mathtt{lf}\ \mathtt{rt})\ \logand\ (\mathtt{x}\ =\ \mathtt{k}))\
\logimp\ $
$\quad\quad\quad(\mathtt{length}\ (\mathtt{dataElems}\
(\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\ \mathtt{lf}\ \mathtt{rt})\
\mathtt{k})\ ==\ 1)$
Again, we work with the hypothesis of the implication first. Since
the tree $(\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\ \mathtt{lf}\
\mathtt{rt})$ is ordered, $\mathtt{x} \not\in \mathtt{lf}$ and
$\mathtt{x} \not\in \mathtt{rt}$. (All the keys in the left subtree
must be smaller than $\mathtt{x}$, and all those in the right
subtree must be larger than $\mathtt{x}$. Because $\mathtt{x}\ =
\mathtt{k}$, we conclude that $\mathtt{k} \not\in \mathtt{lf}$ and
$\mathtt{k} \not\in \mathtt{rt}$.) These observations take us to the
conclusion of the theorem by the following logic.\newline
$\begin{array}{l}
\mathtt{length}\ (\mathtt{dataElems}\
(\mathtt{Cel}\ \mathtt{x}\
\mathtt{a}\ \mathtt{lf}\ \mathtt{rt})\ \mathtt{k}) \\
\quad = \ \{\mathtt{dataElems}\ C, \mathtt{x}\ =\ \mathtt{k}\} \\
\mathtt{length}\ ((\mathtt{dataElems}\ \mathtt{lf}\ \mathtt{k})\
\append\
[\mathtt{a}]
\ \append\ (\mathtt{dataElems}\ \mathtt{rt}\ \mathtt{k})) \\
\quad = \ \{\mathit{Thm}\ \ref{th:uk_2}, \mathtt{k}\ \not\in
\mathtt{lf}\}
\\
\mathtt{length}\ ([ ]\ \append [\mathtt{a}]\ \append\
(\mathtt{dataElems}\ \mathtt{rt}\ \mathtt{k})) \\
\quad =\ \{\mathit{Thm}\ \ref{th:uk_2}, \mathtt{k}\ \not\in
\mathtt{rt}\}
\\
\mathtt{length}\ ([ ]\ \append\ [\mathtt{a}]\ \append\ [ ]) \\
\quad =\ \{\append.1, \append []\} \\
\mathtt{length}\ ([\mathtt{a}]) \\
\quad = \ \{\mathit{Thm}\ \mathit{len}\} \\
1 \\
\end{array}$\newline
It turns out that the induction hypothesis was not needed for the
proof of $(a\ \logimp\ d)$. It will be needed for the other two
proofs, however.\newline
Proof of $(b\ \logimp\ d)$:
$\begin{array}{l}
(\mathit{ordered}\ (\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\ \mathtt{lf}\
\mathtt{rt})\ \logand\ \mathtt{k} \in \mathtt{lf})\ \logimp\
\\
\qquad (\mathtt{length}\ (\mathtt{dataElems}\ (\mathtt{Cel}\ \mathtt{x}\
\mathtt{a}\ \mathtt{lf}\ \mathtt{rt})\ \mathtt{k})\ ==\ 1)
\end{array}$
Again, we work with the hypothesis of the implication first.
$\begin{array}{l}
(\mathit{ordered}\ (\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\ \mathtt{lf}\
\mathtt{rt})\
\logand\ \mathtt{k} \in \mathtt{lf}) \\
\quad \logimp\ \{\mathit{definition}\ \mathit{of}\ \mathit{ordered}\} \\
(\mathit{ordered}\ (\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\
\mathtt{lf}\ \mathtt{rt})\
\logand\ \mathtt{k} \in \mathtt{lf}\ \logand\ \mathtt{k} < \mathtt{x}) \\
\quad \logimp\ \{\mathit{def}\
\mathit{ord}, \mathit{since}\
\mathtt{k} < \mathtt{x}\
\logimp\
\mathtt{k} \not\in \mathtt{rt}\}\
\\
(\mathit{ordered}\ (\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\
\mathtt{lf}\ \mathtt{rt})\
\logand\ \mathtt{k} \in \mathtt{lf}\ \logand\ \mathtt{k} < \mathtt{x}\
\logand\ \mathtt{k} \not\in
\mathtt{rt}) \\
%(\mathit{ordered}\ (\mathtt{Cel}\ \mathtt{x}\ \mathtt{a}\
%\mathtt{lf}\ \mathtt{rt})\ \logand\ \mathtt{k} \in \mathtt{lf}\
%\logand\ \mathtt{k} < \mathtt{x}\ \logand\ \mathtt{k} \not\in
%\mathtt{rt}) \\
\quad \logimp\ \{\mathtt{dataElems}\ C, \mathtt{k} < \mathtt{x}\} \\