-
-
Notifications
You must be signed in to change notification settings - Fork 40
/
entry-tutorial.md
1203 lines (940 loc) · 45.4 KB
/
entry-tutorial.md
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
# Entry-level Tutorial on the Model Checker
**Difficulty: Blue trail – Easy**
In this tutorial, we show how to turn an implementation of binary search into a
TLA+ specification. This implementation is known to have an out-of-bounds
error, which once existed in Java, see [Nearly All Binary Searches and
Mergesorts are Broken][] by Joshua Bloch (2006). Our goal is to write a
specification after this implementation, not to write a specification of an
abstract binary search algorithm. You can find such a specification and a proof
in [Proving Safety Properties][] and [Binary search with a TLAPS proof][] by
[Leslie Lamport][] (2019).
This tutorial is written under the assumption that the reader does not have any
knowledge of TLA+ and Apalache. Since we are not diving into protocol and
algorithm specifications too quickly, this is a nice example to start with. We
demonstrate how to use Apalache to find errors that are caused by integer
overflow and the out-of-bounds error, which is caused by this overflow. We
also show that the same overflow error prevents the algorithm from terminating
in the number of steps that is expected from the binary search. Normally it is
expected that the binary search terminates in `log2(n)` steps, where `n` is the
length of the search interval.
Sometimes, we refer to the model checker TLC in this text. TLC is another
model checker for TLA+ and was introduced in the late 90s. If you are new
to TLA+ and want to learn more about TLC, check the [TLC][] project and the
[TLA+ Video Course][] by Leslie Lamport. If you are an experienced TLC user,
you will find this tutorial helpful too, as it demonstrates the strong points
of Apalache.
## Related documents
- [Tutorial on Snowcat][] shows how to write type annotations for Apalache.
- [TLA+ Cheatsheet in HTML][] summarizes the common TLA+ constructs. If you
prefer a printable version in pdf, check the [Summary of TLA+][].
## Setup
We assume that you have Apalache installed. If not, check the manual page on
[Apalache installation][]. The minimal required version is 0.22.0.
## Running example: Binary search
We are not going to explain the idea of binary search in this tutorial. If you
need more context on this, check the Wikipedia page on the [Binary search
algorithm][]. Let's jump straight into the Java code that is given in [Nearly All
Binary Searches and Mergesorts are Broken][]:
```java
1: public static int binarySearch(int[] a, int key) {
2: int low = 0;
3: int high = a.length - 1;
4:
5: while (low <= high) {
6: int mid = (low + high) / 2;
7: int midVal = a[mid];
8:
9: if (midVal < key)
10: low = mid + 1
11: else if (midVal > key)
12: high = mid - 1;
13: else
14: return mid; // key found
15: }
16: return -(low + 1); // key not found.
17: }
```
As was found by Joshua Bloch, the addition in line 6 may throw
an out of bounds exception at line 7, due to an integer overflow. This is because `low`
and `high` are signed integers, with a maximum value of `2^31 - 1`.
However, the sum of two values, each smaller than `2^31-1`, may be greater than `2^31 -1`. If this is the case, `low + high` can wrap into a negative number.
This bug was
[discussed](https://groups.google.com/g/tlaplus/c/msLltIcexF4/m/qnABiKJmDgAJ)
in the TLA+ User Group in 2015. Let's see how TLA+ and Apalache can help us
here. A bit of warning: The final TLA+ specification will happen to be longer
than the 17 lines above. Don't get disappointed too fast. There are several
reasons for that:
1. TLA+ is not tuned towards one particular class of algorithms, e.g.,
sequential algorithms.
2. Related to the previous point, TLA+ and Apalache are not tuned to C or Java
programs. A software model checker such as [CBMC][], [Stainless][], or
[Coral][] would probably accept a shorter program, and it would check it
faster. However, if you have a sledgehammer like TLA+, you don't have to learn
other languages.
3. We explicitly state the expected properties of the algorithm to be checked
by Apalache. In imperative languages, these properties are usually omitted or
written as plain-text comments.
4. We have to introduce a bit of boilerplate, to make Apalache work.
## Step 0: Introducing a template module
*Source files for this step*:
[BinSearch0.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch0.tla).
TLA+ is built around the concept of a state machine. The specified system
starts in a state that is picked from the set of its *initial states*. This
set of states is described with a predicate over states in TLA+. This predicate
is usually called `Init`. Further, the state machine makes a *transition* from
the current state to a successor state. These transitions are described with a
predicate over pairs of states `(current, successor)` in TLA+. This predicate
is usually called `Next`.
We start with the simplest possible specification of a single-state machine.
If we visualize it as a state diagram, it looks like follows:
![Tux, the Linux mascot](./img/single.drawio.svg)
Let's open a new file called `BinSearch0.tla` and type a very minimal module
definition:
```tla
{{#include ../../../test/tla/bin-search/BinSearch0.tla:1:8}}
```
This module does not yet specify any part of the binary search implementation. However, it contains a few important things:
- It imports constants and operators from three standard modules: `Integers`,
`Sequences`, and `Apalache`.
- It declares the predicate `Init`. This predicate describes the initial
states of our state machine. Since we have not declared any variables, it
defines the single possible state.
- It declares the predicate `Next`. This predicate describes the transitions
of our state machine. Again, there are no variables and `Next == TRUE`, so
this transition defines the entire set of states as reachable in a single
step.
Now it is a good time to check that Apalache works. Run the following command:
```sh
$ apalache-mc check BinSearch0.tla
```
The tool output is a bit verbose. Below, you can see the important lines of the
output:
```
...
PASS #13: BoundedChecker
Step 0: picking a transition out of 1 transition(s)
Step 1: picking a transition out of 1 transition(s)
...
Step 10: picking a transition out of 1 transition(s)
The outcome is: NoError
Checker reports no error up to computation length 10
...
```
We can see that Apalache runs without finding an error, as expected.
If you are curious, replace `TRUE` with `FALSE` in either `Init` or `Next`,
run Apalache again and observe what happens.
It is usually a good idea to start with a spec like `BinSearch0.tla`, to ensure
that the tools are working.
## Step 1: Introducing specification parameters
*Source files for this step*:
[BinSearch1.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch1.tla).
*Diffs*: [BinSearch1.tla.patch](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch1.tla.patch).
The Java code of `binarySearch` accepts two parameters: an array of integers
called `a`, and an integer called `key`. Similar to these parameters, we introduce
two specification parameters (called `CONSTANTS` in TLA+):
- the input sequence `INPUT_SEQ`, and
- the element to search for `INPUT_KEY`.
```tla
{{#include ../../../test/tla/bin-search/BinSearch1.tla:1:1}}
{{#include ../../../test/tla/bin-search/BinSearch1.tla:8:18}}
```
Importantly, the constants `INPUT_SEQ` and `INPUT_KEY` are prefixed with type
annotations in the comments:
- `INPUT_SEQ` has the type `Seq(Int)`, that is, it is a sequence of integers (sequences in TLA+ are indexed), and
- `INPUT_KEY` has the type `Int`, that is, it is an integer.
Recall that we wanted to specify signed and unsigned Java integers, which are
32 bit long. *TLA+ is not tuned towards any computer architecture.* Its integers
are mathematical integers: always signed and arbitrarily large (unbounded).
To model fixed bit-width integers, we introduce another constant `INT_WIDTH` of
type `Int`:
```tla
{{#include ../../../test/tla/bin-search/BinSearch1.tla:19:22}}
```
The benefit of defining the bit width as a parameter is that we can try
our specification for various bit widths of integers: 4-bit, 8-bit, 16-bit, 32-bit,
etc. Similar to many programming languages, we introduce several constant
definitions (`a^b` is `a` taken to the power of `b`):
```tla
{{#include ../../../test/tla/bin-search/BinSearch1.tla:24:30}}
```
Note that these definitions do not constrain integers in any way. They are
simply convenient names for the constants that we will need in the
specification.
To make sure that the new specification does not contain syntax errors or
type errors, execute:
```sh
$ apalache-mc check BinSearch1.tla
```
## Step 2: Specifying the base case
*Source files for this step*:
[BinSearch2.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch2.tla).
*Diffs*: [BinSearch2.tla.patch](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch2.tla.patch).
We start with the simplest possible case that occurs in `binarySearch`. Namely,
we consider the case where `low > high`, that is, `binarySearch` never enters
the loop.
**Introduce variables**. To do that, we have to finally introduce some
variables. Obviously, we have to introduce variables `low` and `high`. This is
how we do it:
```tla
{{#include ../../../test/tla/bin-search/BinSearch2.tla:32:38}}
```
The variables `low` and `high` are called *state variables*. They define a state of our state machine. That is, they are never introduced and
never removed. Remember, TLA+ is not tuned towards any particular computer
architecture and thus it does not even have the notion of an execution stack.
You can think of `low` and `high` as being global variables. Yes, global
variables are generally frowned upon in programming languages. However, when dealing with a
specification, they are much easier to reason about than the execution stack.
We will demonstrate how to introduce local definitions later in this tutorial.
We introduce two additional variables, the purpose of which might be less obvious:
```tla
{{#include ../../../test/tla/bin-search/BinSearch2.tla:39:44}}
```
The variable `isTerminated` indicates whether our search has terminated. Why do
we even have to introduce it? Because, some computer systems are not designed
with termination in mind. For instance, such distributed systems as the
Internet and Bitcoin are designed to periodically serve incoming requests
instead of producing a single output for a single input.
If we want to specify the Internet or Bitcoin, do we
understand what it means for them to terminate?
The variable `returnValue` will contain the result of the binary search, when
the search terminates. Recall, there is no execution stack. Hence, we introduce
the variable `returnValue` right away. The downside is that we have to do
book-keeping for this variable.
**Initialize variables.** Having introduced the variables, we have to
initialize them. That is, we want to specify lines 2-3 of the Java code:
```java
1: public static int binarySearch(int[] a, int key) {
2: int low = 0;
3: int high = a.length - 1;
...
17: }
```
To this end, we change the body of the predicate `Init` to the following:
```tla
Init ==
low = 0 /\ high = Len(INPUT_SEQ) - 1 /\ isTerminated = FALSE /\ returnValue = 0
```
You probably have guessed, what the above line means. Maybe you are a bit
puzzled about the mountain-like operator `/\`. It is called *conjunction*,
which is usually written as `&&` or `and` in programming languages. The
important effect of the above expression is that every variable in the
left-hand side of `=` is required to have the value specified in the right-hand
side of `=`[^assignment-order].
As it is hard to fit many expressions in one line, TLA+ offers special syntax
for writing a big conjunction. Here is the standard way of writing `Init`
(indentation is important):
```tla
{{#include ../../../test/tla/bin-search/BinSearch2.tla:46:51}}
```
The above lines do not deserve a lot of explanation. As you have probably guessed,
`Len(INPUT_SEQ)` computes the length of the input sequence.
[^assignment-order]: It is important to know that TLA+ does not impose any
particular order of evaluation for `/\`. However, both Apalache and TLC
evaluate some expressions of the form `x = e` in the initialization predicate
as assignments. Hence, it is often a good idea to think about `/\` as being
evaluated from left to right.
**Update variables.** Having done all the preparatory work, we are now ready to
specify the behavior in lines 5 and 16 of `binarySearch`.
```java
1: public static int binarySearch(int[] a, int key) {
2: int low = 0;
3: int high = a.length - 1;
4:
5: while (low <= high) {
...
15: }
16: return -(low + 1); // key not found.
17: }
```
To this end, we redefine `Next` as follows:
```tla
{{#include ../../../test/tla/bin-search/BinSearch2.tla:53:62}}
```
Most likely, you have no problem reading this definition, except for the part
that includes `isTerminated'`, `returnValue'`, and `UNCHANGED`. Recall that a
transition predicate, like `Next`, specifies the relation between two states of
the state machine; the current state, the variables of which are referenced by
unprimed names, and the successor-state, the variables of which are referenced
by primed names.
The expression `isTerminated' = TRUE` means that only states where
`isTerminated` equals `TRUE` can be successors of the current state. In
general, `isTerminated'` could also depend on the value of `isTerminated`, but
here, it does not. Likewise, `returnValue' = -(low + 1)` means that
`returnValue` has the value `-(low + 1)` in the next state. The expression
`UNCHANGED <<low, high>>` is a convenient shortcut for writing `low' = low /\
high' = high`. Readers unfamiliar with specification languages might question
the purpose of `UNCHANGED`, since in most programming languages variables only
change when they are explicitly changed. However, a transition predicate, like
`Next`, establishes a relation between pairs of states. If we were to omit
`UNCHANGED`, this would mean that we consider states in which `low` and `high`
have _completely arbitrary_ values as valid successors. This is clearly not
how Java code should behave. To encode Java semantics, we must therefore
explicitly state that `low` and `high` do not change in this step.
It is important to understand that an expression like `returnValue' = -(low +
1)` *does not immediately update* the variable on the left-hand side. Hence,
`returnValue` still refers to the value in the state before evaluation of
`Next`, whereas `returnValue'` refers to the value in the state that is
computed after evaluation of `Next`. You can think of the effect of `x' = e`
being delayed until the whole predicate `Next` is evaluated.
## Step 2b: Basic checks for the base case
*Source files for this step*:
[BinSearch2.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch2.tla)
and
[MC2_8.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/MC2_8.tla).
As we discussed, it is a good habit to periodically run the model checker, as you are writing
the specification. Even if it doesn't check much, you would be able to catch
the moment when the model checker slows down. This may give you a useful
hint about changing a few things before you have written too much code.
Let us check `BinSearch2.tla`:
```sh
$ apalache-mc check BinSearch2.tla
```
If it is your first TLA+ specification, you may be surprised by this error:
```
...
PASS #13: BoundedChecker
This error may show up when CONSTANTS are not initialized.
Check the manual: https://apalache.informal.systems/docs/apalache/parameters.html
Input error (see the manual): SubstRule: Variable INPUT_SEQ is not assigned a value
...
```
Apalache complains that we have declared several constants (`INPUT_SEQ`,
`INPUT_KEY`, and `INT_WIDTH`), but we have never defined them.
**Adding a model file.** The standard approach in this case is either to fix
all constants, or to introduce another module that
fixes the parameters and instantiates the general specification. Although
Apalache supports [TLC Configuration Files][], for the purpose of this tutorial,
we will stick to tool-agnostic TLA+ syntax.
To this end, we add a new file `MC2_8.tla` with the following contents:
```tla
{{#include ../../../test/tla/bin-search/MC2_8.tla}}
```
As you can see, we fix the values of all parameters. We are instantiating
the module `BinSearch2` with these fixed parameters. Since instantiation
requires all constants and variables to be defined, we copy the variables
definitions from `BinSearch2.tla`.
Since we are fixing the parameters with concrete values, `MC2_8.tla` looks very
much like a unit test. It's a good start for debugging a few things, but since
our program is entirely sequential, our specification is as good as a unit
test. Later in this tutorial we will show how to leverage Apalache to check
properties for all possible inputs (up to some bound).
Let us check `MC2_8.tla`:
```sh
$ apalache-mc check MC2_8.tla
...
Checker reports no error up to computation length 10
```
This time Apalache has not complained. This is a good time to stop and think
about whether the model checker has told us anything interesting. Kind of. It
told us that it has not found any contradictions. But it did not tell us
anything interesting about our expectations. Because we have not set our
expectations yet!
## Step 3: Specifying an invariant and checking it for the base case
*Source files for this step*:
[BinSearch3.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch3.tla)
and
[MC3_8.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/MC3_8.tla).
*Diffs*:
[BinSearch3.tla.patch](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch3.tla.patch)
and
[MC3_8.tla.patch](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/MC3_8.tla.patch).
What do we expect from binary search? We can check the Java documentation,
e.g., [Arrays.java in OpenJDK][]:
> ...the return value will be >= 0 if and only if the key is found.
This property is actually quite easy to write in TLA+. First, we
introduce the property that we call `ReturnValueIsCorrect`:
```tla
{{#include ../../../test/tla/bin-search/BinSearch3.tla:71:83}}
```
Let us decompose this property into smaller pieces. First, we define the set
`MatchingIndices`:
```tla
{{#include ../../../test/tla/bin-search/BinSearch3.tla:75:77}}
```
With this TLA+ expression we define a *local constant* called `MatchingIndices`
that is equal to the set of indices `i` in `INPUT_SEQ` so that the sequence
elements at these indices are equal to `INPUT_KEY`. If this syntax
is hard to parse for you, here is how we could write a similar definition in a
functional programming language (Scala):
```scala
val MatchingIndices =
INPUT_SEQ.indices.toSet.filter { i => INPUT_SEQ(i) == INPUT_KEY }
```
Since the
sequence indices in TLA+ start with 1, we require that `returnValue + 1`
belongs to `MatchingIndices` when `MatchingIndices` is non-empty. If
`MatchingIndices` is empty, we require `returnValue` to be negative.
We can check that the property `ReturnValueIsCorrect` is an *invariant*, that
is, it holds in every state that is reachable from the states specified by `Init`
via a sequence of transitions specified by `Next`:
```sh
$ apalache-mc check --inv=ReturnValueIsCorrect MC3_8.tla
```
This property is violated in the initial state. To see why, check the file
`counterexample1.tla`.
Actually, we only expect this property to hold after the computation terminates,
that is, when `isTerminated` equals to `TRUE`. Hence, we add the following
invariant:
```tla
{{#include ../../../test/tla/bin-search/BinSearch3.tla:84:86}}
```
**Digression: Boolean connectives.** In the above code, the operator `=>` is
the [Classical implication][]. In general, `A => B` is equivalent to `IF A THEN
B ELSE TRUE`. The implication `A => B` is also equivalent to the TLA+
expression `~A \/ B`, which one can read as "not A holds, or B holds". The
operator `\/` is called *disjunction*. As a reminder, here is the standard
truth table for the Boolean connectives, which are no different from the
Boolean logic in TLA+:
| `A` | `B` | `~A` | `A \/ B` | `A /\ B` | `A => B` |
| ------- | ------- | ------- | -------- | -------- | -------- |
| `FALSE` | `FALSE` | `TRUE` | `FALSE` | `FALSE` | `TRUE` |
| `FALSE` | `TRUE` | `TRUE` | `TRUE` | `FALSE` | `TRUE` |
| `TRUE` | `FALSE` | `FALSE` | `TRUE` | `FALSE` | `FALSE` |
| `TRUE` | `TRUE` | `FALSE` | `TRUE` | `TRUE` | `TRUE` |
**Checking Postcondition.**
Let us check `Postcondition` on `MC3_8.tla`:
```sh
$ apalache-mc check --inv=Postcondition MC3_8.tla
```
This property holds true. However, it's a small win, as `MC3_8.tla` fixes all
parameters. Hence, we have checked the property just for one data point. In
Step 5, we will check `Postcondition` for all sequences admitted by `INT_WIDTH`.
## Step 4: Specifying the loop (with a caveat)
*Source files for this step*:
[BinSearch4.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch4.tla)
and
[MC4_8.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/MC4_8.tla).
*Diffs*:
[BinSearch4.tla.patch](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch4.tla.patch)
and
[MC4_8.tla.patch](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/MC4_8.tla.patch).
We specify the loop of `binarySearch` in TLA+ as follows:
```tla
{{#include ../../../test/tla/bin-search/BinSearch4.tla:55:77}}
```
Let's start with these two definitions:
```tla
{{#include ../../../test/tla/bin-search/BinSearch4.tla:60:61}}
```
As you have probably guessed, we define two (local) values `mid` and `midVal`.
The value `mid` is the average of `low` and `high`. The operator `\div` is
simply integer division, which is usually written as `/` or `//` in programming
languages. The value `midVal` is the value at the location `mid + 1`. Since
the TLA+ sequence `INPUT_SEQ` has indices in the range `1..Len(INPUT_SEQ)`,
whereas we are computing zero-based indices, we are adjusting the index by one,
that is, we write `INPUT_SEQ[mid + 1]` instead of `INPUT_SEQ[mid]`.
*Warning: The definitions of `mid` and `midVal` do not properly reflect the
Java code of `binarySearch`. We will fix them later. It is a good exercise
to stop here and think about the source of this imprecision.*
The following lines look like ASCII graphics, but by now you should know
enough to read them:
```tla
{{#include ../../../test/tla/bin-search/BinSearch4.tla:60:71}}
```
These lines are the indented form of `\/` for three cases:
- when `midVal < INPUT_KEY`, or
- when `midVal > INPUT_KEY`, or
- when `midVal = INPUT_KEY`.
We could write these expressions with `IF-THEN-ELSE` or even with the TLA+
operator `CASE` (see [Summary of TLA+][]). However, we find the disjunctive
form to be the least cluttered, though unusual.
Now we can check the postcondition again:
```sh
$ apalache-mc check --inv=Postcondition MC4_8.tla
```
The check goes through, but did it do much? Recall, that we fixed `INPUT_SEQ`
to be the empty sequence `<< >>` in `MC4_8.tla`. Hence, we never enter the loop
we have just specified.
Actually, Apalache gives us a hint that it never tries some of the
cases:
```
...
PASS #13: BoundedChecker
State 0: Checking 1 state invariants
Step 0: picking a transition out of 1 transition(s)
Step 1: Transition #0 is disabled
Step 1: Transition #1 is disabled
Step 1: Transition #2 is disabled
Step 1: Transition #3 is disabled
State 1: Checking 1 state invariants
Step 1: picking a transition out of 1 transition(s)
Step 2: Transition #0 is disabled
Step 2: Transition #1 is disabled
Step 2: Transition #2 is disabled
Step 2: Transition #4 is disabled
Step 2: picking a transition out of 1 transition(s)
...
```
**Digression: symbolic transitions.** Internally, Apalache decomposes the
predicates `Init` and `Next` into independent pieces like `Init == Init$0 \/
Init$1` and `Next == Next$0 \/ Next$1 \/ Next$2 \/ Next$3`. If you want to see
how it is done, run Apalache with the options `--write-intermediate` and `--run-dir`:
```sh
$ apalache-mc check --inv=Postcondition --write-intermediate=1 --run-dir=out MC4_8.tla
```
Check the file `out/intermediate/XX_OutTransitionFinderPass.tla`, which contains the
preprocessed specification that has `Init` and `Next` decomposed. You can find
a detailed explanation in the section on [Assignments in Apalache][].
## Step 5: Checking Postcondition for plenty of inputs
*Source files for this step*:
[BinSearch5.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch5.tla)
and
[MC5_8.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/MC5_8.tla).
*Diffs*:
[BinSearch5.tla.patch](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch5.tla.patch)
and
[MC5_8.tla.patch](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/MC5_8.tla.patch).
In this step, we are going to check the invariant `Postcondition` for all
possible input sequences and all input keys (for a fixed bit-width).
Create the file `MC5_8.tla` with the following contents:
```tla
{{#include ../../../test/tla/bin-search/MC5_8.tla::34}}
==================
```
Note that we introduced `INPUT_SEQ` and `INPUT_KEY` as parameters again. We
cannot check `MC5_8.tla` just like that. If we try to check `MC5_8.tla`,
Apalache would complain again about a value missing for `INPUT_SEQ`.
To check the invariant for all sequences, we will use two advanced features
of Apalache: [ConstInit predicate][] and [Value generators][].
**ConstInit.** This idiom allows us to initialize `CONSTANTS` with a TLA+
formula. Let us introduce the following operator definition in `MC5_8.tla`:
```tla
ConstInit ==
/\ INPUT_KEY \in Int
\* Seq(Int) is a set of all sequences that have integers as elements
/\ INPUT_SEQ \in Seq(Int)
```
This is straightforward definition. However, it does not work in Apalache:
```sh
$ apalache-mc check --cinit=ConstInit --inv=Postcondition MC5_8.tla
...
MC5_8.tla:39:22-39:29: unsupported expression: Seq(_) produces an infinite set of unbounded sequences.
Checker has found an error
...
```
The issue with our definition of `ConstInit` is that it requires the model
checker to reason about the infinite set of sequences, namely, `Seq(Int)`.
Interestingly, the model checking does not complain about the expression
`INPUT_KEY \in Int`. The reason is that this expression requires the model
checker to reason about one integer, though it ranges over the infinite set of
integers.
**Value generators.** Fortunately, this problem can be easily circumvented by
using Apalache [Value generators][][^generators].
Let us rewrite `ConstInit` in `MC5_8.tla` as follows:
```tla
ConstInit ==
/\ INPUT_KEY = Gen(1)
/\ INPUT_SEQ = Gen(MAX_INT)
```
In this new version, we use the Apalache operator `Gen` to:
- produce an unrestricted integer to be used as a value of `INPUT_KEY` and
- produce a sequence of integers to be used as a value of `INPUT_SEQ`. This
sequence is unrestricted, except its length is bounded with `MAX_INT`,
which is exactly what we need in our case study.
The operator `Gen` introduces a data structure of proper type whose size is
bounded with the argument of `Gen`. For instance, the type of `INPUT_SEQ` is
the sequence of integers, and thus `Gen(MAX_INT)` produces an unrestricted
sequence of up to `MAX_INT` elements. This sequence is bound to the name
`INPUT_SEQ`. For details, see [Value generators][]. This lets Apalache check
all instances of the data structure, without enumerating the instances!
By doing so, we are able to check the specification for all the inputs, when we
fix the bit width. To quickly get feedback from Apalache, we fix `INT_WIDTH` to 8 in the model `MC5_8.tla`.
[^generators]: If you know property-based testing, e.g., [QuickCheck][],
Apalache generators are inspired by this idea. In contrast to property-based
testing, an Apalache generator is not randomly producing values. Rather,
Apalache simply introduces an unconstrained data structure (e.g., a set, a
function, or a sequence) of the proper type. Hence, Apalache is reasoning about
all possible instances of this data structure, instead of reasoning about a
small set of randomly chosen instances.
Let us check `Postcondition` again:
```sh
$ apalache-mc check --cinit=ConstInit --inv=Postcondition MC5_8.tla
...
State 2: state invariant 0 violated. Check the counterexample in:
/[a long path]/counterexample1.tla
...
```
Let us inspect the counterexample:
```tla
---------------------------- MODULE counterexample ----------------------------
EXTENDS MC5_8
(* Constant initialization state *)
ConstInit == INPUT_KEY = -1 /\ INPUT_SEQ = <<0, -1>>
(* Initial state *)
State0 ==
INPUT_KEY = -1
/\ INPUT_SEQ = <<0, -1>>
/\ high = 1
/\ isTerminated = FALSE
/\ low = 0
/\ returnValue = 0
(* Transition 1 to State1 *)
State1 ==
INPUT_KEY = -1
/\ INPUT_SEQ = <<0, -1>>
/\ high = -1
/\ isTerminated = FALSE
/\ low = 0
/\ returnValue = 0
(* Transition 3 to State2 *)
State2 ==
INPUT_KEY = -1
/\ INPUT_SEQ = <<0, -1>>
/\ high = -1
/\ isTerminated = TRUE
/\ low = 0
/\ returnValue = -1
...
```
Is it a real issue? It is, but it is not the issue of
the search, rather our invariant `Postcondition` is imprecise.
## Step 5b: Fixing the postcondition
*Source files for this step*:
[BinSearch5.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch5.tla)
and
[MC5_8.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/MC5_8.tla).
If we check our source of truth, that is, the Java documentation in
[Arrays.java in OpenJDK][], we will see the following sentences:
The range must be sorted (as by the {@link #sort(int[], int, int)} method)
prior to making this call. If it is not sorted, the results are undefined.
If the range contains multiple elements with the specified value, there is
no guarantee which one will be found.
It is quite easy to add this constraint [^no-pre]. This is where TLA+ starts to
shine:
```tla
{{#include ../../../test/tla/bin-search/BinSearch5.tla:80:88}}
...
{{#include ../../../test/tla/bin-search/BinSearch5.tla:109:112}}
```
If we check `PostconditionSorted`, we do not get any error after 10 steps:
```sh
$ apalache-mc check --cinit=ConstInit --inv=PostconditionSorted MC5_8.tla
...
The outcome is: NoError
Checker reports no error up to computation length 10
```
It takes some time to explore all executions of length up to 10 steps, for all
input sequences of length up to `2^7 - 1` arbitrary integers. If we think about
it, the model checker managed to crunch infinitely many numbers in several
hours. Not bad.
*Exercise.* If you are impatient, you can check `PostconditionSorted` for the
configuration that has integer width of 4 bits. It takes only a few seconds to
explore all executions.
[^no-pre]: Instead of checking whether `INPUT_SEQ` is sorted in the
post-condition, we could restrict the constant `INPUT_SEQ` to be sorted in
every execution. That would effectively move this constraint into the
pre-condition of the search. Had we done that, we would not be able to observe
the behavior of the search on the unsorted inputs. An important property is
whether the search is terminating on all inputs.
## Step 6: Checking termination and progress
*Source files for this step*:
[BinSearch6.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch6.tla)
and
[MC6_8.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/MC6_8.tla).
*Diffs*:
[BinSearch6.tla.patch](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch6.tla.patch)
and
[MC6_8.tla.patch](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/MC6_8.tla.patch).
Actually, we do not need 10 steps to check termination for the case `INT_WIDTH
= 8`. If you recall the complexity of the binary search, it needs
`ceil(log2(Len(INPUT_SEQ)))` steps to terminate.
To check this property, we add the number of steps as a variable in
`BinSearch6.tla` and in `MC6_8.tla`:
```tla
VARIABLES
...
\* The number of executed steps.
\* @type: Int;
nSteps
```
Also, we update `Init` and `Next` in `BinSearch6.tla` as follows:
```tla
Init ==
...
/\ nSteps = 0
Next ==
IF ~isTerminated
THEN IF low <= high
THEN \* lines 6-14
/\ nSteps' = nSteps + 1
/\ LET mid == (low + high) \div 2 IN
...
ELSE \* line 16
/\ isTerminated' = TRUE
/\ returnValue' = -(low + 1)
/\ UNCHANGED <<low, high, nSteps>>
ELSE \* isTerminated
UNCHANGED <<low, high, returnValue, isTerminated, nSteps>>
```
Having `nSteps`, we can write the `Termination` property:
```tla
\* We know the exact number of steps to show termination.
Termination ==
(nSteps >= INT_WIDTH) => isTerminated
```
Let us check `Termination`:
```sh
$ apalache-mc check --cinit=ConstInit --inv=Termination MC6_8.tla
...
Checker reports no error up to computation length 10
It took me 0 days 0 hours 0 min 19 sec
```
Even if did not know precisely complexity of the binary search, we could
write a simpler property, which demonstrates progress of the search:
```tla
Progress ==
~isTerminated' => (low' > low \/ high' < high)
```
It takes about 10 seconds to check `Progress` as well.
## Step 7: Fixed-width integers
*Source files for this step*:
[BinSearch7.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch7.tla)
and
[MC7_8.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/MC7_8.tla).
*Diffs*:
[BinSearch7.tla.patch](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch7.tla.patch)
and
[MC7_8.tla.patch](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/MC7_8.tla.patch).
Do you recall that our specification of the loop had a caveat? Let us have a
look at this piece of the specification again:
```tla
{{#include ../../../test/tla/bin-search/BinSearch6.tla:63:78}}
```
You can see that all arithmetic operations are performed over TLA+ integers,
that is, unbounded integers. We have to implement fixed-width integers ourselves.
Fortunately, we do not have to implement the whole set of integer operators,
but only the addition over signed integers, which has a potential to overflow.
To this end, we have to recall how signed integers are represented in modern
computers, see [Two's complement][]. Fortunately, we do not have to worry about
an efficient implementation of integer addition. We simply use addition over
unbounded integers to implement addition over fixed-width integers:
```tla
{{#include ../../../test/tla/bin-search/BinSearch7.tla:54:65}}
```
Having defined `IAdd`, we replace addition over unbounded integers with `IAdd`:
```tla
{{#include ../../../test/tla/bin-search/BinSearch7.tla:76:93}}
...
```
This finally gives us a specification that faithfully represents the Java code
of `binarySearch`. Now we can check all expected properties once again:
```sh
$ apalache-mc check --cinit=ConstInit --inv=PostconditionSorted MC7_8.tla
...
State 2: state invariant 0 violated.
...
Total time: 2.786 sec
```
```sh
$ apalache-mc check --cinit=ConstInit --inv=Progress MC7_8.tla
...
State 1: action invariant 0 violated.
...
Total time: 2.935 sec
```
```sh
$ apalache-mc check --cinit=ConstInit --inv=Termination MC7_8.tla
...
State 8: state invariant 0 violated.
...
Total time: 39.540 sec
```
As we can see, all of our invariants are violated. They all demonstrate the
issue that is caused by the integer overflow!
## Step 8: Checking the boundaries
*Source files for this step*:
[BinSearch8.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch8.tla)
and
[MC8_8.tla](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/MC8_8.tla).
*Diffs*:
[BinSearch8.tla.patch](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/BinSearch8.tla.patch)
and
[MC8_8.tla.patch](https://github.com/informalsystems/apalache/blob/main/test/tla/bin-search/MC8_8.tla.patch).
As we have seen in Step 7, the cause of all errors in `PostconditionSorted`,
`Termination`, and `Progress` is that the addition `low + high` overflows and
thus the expression `INPUT_SEQ[mid + 1]` accesses `INPUT_SEQ` outside of its
domain.
Why did Apalache not complain about access outside of the domain? Its behavior
is actually consistent with [Specifying Systems][] (p. 302):
> A function *f* has a domain DOMAIN *f*, and the value of *f*[*v*] is
> specified only if *v* is an element of DOMAIN *f*.
Hence, Apalache returns some value of a proper type, if `v` is outside of
`DOMAIN f`. As we have seen in Step 7, such a value would usually show up in a
counterexample. In the future, Apalache will probably have an automatic check
for out-of-domain access. For the moment, we can write such a check as a state
invariant. By propagating the conditions from `INPUT_SEQ[mid + 1]` up in `Next`,
we construct the following invariant:
```tla
{{#include ../../../test/tla/bin-search/BinSearch8.tla:144:150}}
```
Apalache finds a violation of this invariant in a few seconds:
```
$ apalache-mc check --cinit=ConstInit --inv=InBounds MC8_8.tla
...
State 1: state invariant 0 violated.
...
Total time: 3.411 sec
```
If we check `counterexample1.tla`, it contains the following values for `low`