Commit 2dcdad2: add sections about sponge state, input and output, merkleization
aszepieniec committed Aug 11, 2023, 1 parent ae3de5b
Showing 1 changed file with 48 additions and 5 deletions: tips/tip-0008/tip-0008.md
The state of the VM is fully determined by a bunch of things divisible into four categories:
- (a) the program,
- (b) the instruction pointer, which lives in a register,
- (b) the top 16 elements of the operational stack, which live in registers,
- (b) the 16 sponge state elements,
- (c) the number of elements that have been read from standard input (the input itself is part of the claim),
- (c) the number of elements that have been written to standard output (the output itself is part of the claim),
- (d) the other elements of the operational stack which are stored in OpStack memory,
- (d) the entire JumpStack memory,
- (d) the entire RAM.

### (a) Program

To convert the state-carrying registers to memory, we assign addresses to them:
| `st13` | 13 |
| `st14` | 14 |
| `st15` | 15 |
| `sponge0` | 16 |
| `sponge1` | 17 |
| `sponge2` | 18 |
| `sponge3` | 19 |
| `sponge4` | 20 |
| `sponge5` | 21 |
| `sponge6` | 22 |
| `sponge7` | 23 |
| `sponge8` | 24 |
| `sponge9` | 25 |
| `sponge10` | 26 |
| `sponge11` | 27 |
| `sponge12` | 28 |
| `sponge13` | 29 |
| `sponge14` | 30 |
| `sponge15` | 31 |
| `ip` | 32 |

Naturally, the value of the memory object at those addresses corresponds to the value of the matching register.
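The address assignment above can be sketched as a plain lookup table. A hypothetical illustration in Python (not Triton VM's actual data structure):

```python
# Hypothetical sketch of the register-to-address assignment above:
# addresses 0..15 hold the operational stack, 16..31 the sponge state,
# and 32 the instruction pointer.
ADDRESS_OF = {f"st{i}": i for i in range(16)}
ADDRESS_OF.update({f"sponge{i}": 16 + i for i in range(16)})
ADDRESS_OF["ip"] = 32

assert ADDRESS_OF["st15"] == 15
assert ADDRESS_OF["sponge0"] == 16
assert ADDRESS_OF["ip"] == 32
```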

To do this, we commit to $K'(X) = \frac{K(X)}{Z_{\{0,\ldots,32\}}(X)}$ and $V'(X) = \frac{V(X) - V_I(X)}{Z_{\{0,\ldots,32\}}(X)}$.

Note that we do not even need to commit to $K(X)$ and $V(X)$ directly; we simulate their evaluation in $\alpha$ via $K(\alpha) = K'(\alpha) \cdot Z_{\{0,\ldots,32\}}(\alpha)$ and $V(\alpha) = V'(\alpha) \cdot Z_{\{0,\ldots,32\}}(\alpha) + V_I(\alpha)$.
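The simulation trick can be checked numerically. The sketch below, assuming a Goldilocks-style prime as a stand-in field and naive polynomial helpers, verifies that $K(\alpha) = K'(\alpha) \cdot Z(\alpha)$ holds whenever $K(X)$ is a multiple of the zerofier over the register addresses:

```python
import random

P = (1 << 64) - (1 << 32) + 1  # Goldilocks-style prime, assumed stand-in field

def poly_eval(coeffs, x):
    # Horner evaluation of a coefficient list (lowest degree first).
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def poly_mul(a, b):
    # Schoolbook polynomial multiplication over F_P.
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out

# Zerofier vanishing on the register addresses 0..32.
zerofier = [1]
for i in range(33):
    zerofier = poly_mul(zerofier, [(-i) % P, 1])

# The prover commits to K'(X); random coefficients stand in for it here.
k_prime = [random.randrange(P) for _ in range(5)]
k = poly_mul(k_prime, zerofier)  # K(X) = K'(X) * Z(X) by construction

# The verifier simulates K(alpha) without ever seeing K(X) itself.
alpha = random.randrange(P)
assert poly_eval(k, alpha) == poly_eval(k_prime, alpha) * poly_eval(zerofier, alpha) % P
```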

### (c) Input and Output

The input evaluation arguments of consecutive segment proofs need not be linked. What matters is that each evaluation argument works relative to the correct subsequence of input symbols. To facilitate this, the segment claim has two fields in addition to those of the whole claim: a start and a stop index for reading input symbols. Read in combination with the claim for the whole computation, these indices make it easy to select the correct substring of field elements. When merging consecutive segment proofs, the stop index of the former must equal the start index of the latter.

An analogous discussion holds for the output evaluation arguments, which thus introduce two more index fields onto the segment claim.
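A minimal sketch of this index bookkeeping, with hypothetical names (the real segment claim type differs):

```python
from dataclasses import dataclass

# Hypothetical shape of a segment claim's extra index fields.
@dataclass
class SegmentClaim:
    input_start: int
    input_stop: int

def segment_input(whole_input, claim):
    # Select the subsequence of input symbols this segment consumes.
    return whole_input[claim.input_start:claim.input_stop]

def can_merge(former, latter):
    # Consecutive segments must tile the input without gap or overlap.
    return former.input_stop == latter.input_start

whole_input = [3, 1, 4, 1, 5, 9, 2, 6]
a = SegmentClaim(input_start=0, input_stop=3)
b = SegmentClaim(input_start=3, input_stop=8)
assert segment_input(whole_input, a) == [3, 1, 4]
assert can_merge(a, b)
```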

### (d) Memory

**1. Representation.**

If it is possible to commit to the execution traces of all segments, then it is possible to sample the random point $\alpha$ once for all of them.

In this case, the random point $\alpha$ can be sampled from the Merkle root of the tree built from the commitments to execution traces of all segments. In every segment proof, the prover sends the commitment to the base table along with its Merkle authentication path in this data structure.

While this approach reduces the overall complexity somewhat, the downside is that the computation has to finish before you can start proving.
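The sampling step can be sketched as follows, with SHA-256 standing in for the VM's actual hash function and a Goldilocks-style prime for the field (both assumptions for illustration):

```python
import hashlib

P = (1 << 64) - (1 << 32) + 1  # assumed stand-in field modulus

def H(*parts):
    h = hashlib.sha256()
    for part in parts:
        h.update(part)
    return h.digest()

def merkle_root(leaves):
    # Plain binary Merkle tree over a power-of-two leaf count.
    layer = list(leaves)
    while len(layer) > 1:
        layer = [H(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

# Hypothetical commitments to the base tables of four segments.
commitments = [H(bytes([i])) for i in range(4)]
root = merkle_root(commitments)

# Fiat-Shamir: derive the shared random point alpha from the root.
alpha = int.from_bytes(H(root), "big") % P
assert 0 <= alpha < P
```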

**9. Combining memories**

In Triton VM there are three memory tables, and that's not counting the register state, which is also encoded as memory polynomials. This larger number, four, gives rise to the question whether it is possible to merge memories and prove their integral continuation with a single continuation argument rather than with separate ones.

Memories are combined by assigning them to distinct address spaces using an extension field. So, for instance, RAM memory is stored in addresses of the form $(*, 1)$ whereas OpStack uses $(*, 2)$. With this approach there only needs to be one pair of incoming continuation polynomials, one pair of outgoing continuation polynomials, and one pair of remainder polynomials, but they must be defined over the extension field.
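As a rough illustration, using tagged tuples in place of genuine extension-field addresses (a real implementation would work with actual extension-field elements):

```python
# Hypothetical sketch: tag each address with its memory's identifier,
# mimicking the extension-field address spaces (*, 1), (*, 2), ...
RAM, OP_STACK, JUMP_STACK = 1, 2, 3

def combine(ram, op_stack, jump_stack):
    combined = {}
    for tag, memory in ((RAM, ram), (OP_STACK, op_stack), (JUMP_STACK, jump_stack)):
        for addr, value in memory.items():
            combined[(addr, tag)] = value
    return combined

memory = combine({0: 7}, {0: 5, 1: 9}, {4: 2})
# Address 0 in RAM and address 0 in OpStack no longer collide.
assert memory[(0, RAM)] == 7 and memory[(0, OP_STACK)] == 5
assert len(memory) == 4
```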

As for the trackers, it is ill-advised to combine them, because they may require updates from distinct memory operations in the same row.

## Large Memory

The technique described here is ill-suited to computations that touch a large number of memory cells. The reason is that the degree of the memory polynomials is essentially equal to the number of addresses. It is not uncommon for computations to touch gigabytes of memory; even assuming for simplicity (and contrary to fact) that we can store 8 bytes in a field element, a memory polynomial storing 1 GiB of data would have degree roughly $2^{27}$.
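The arithmetic behind that estimate:

```python
GIB = 1 << 30            # bytes in one GiB
BYTES_PER_ELEMENT = 8    # the optimistic assumption from the text
elements = GIB // BYTES_PER_ELEMENT
assert elements == 1 << 27   # so the memory polynomial has degree ~2^27
```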

There are two strategies, explained below, for accommodating continuations of computations that touch a lot of memory. Neither requires a modification to the architecture of Triton VM, as both rely only on clever programming.

### Merkleization

Divide the memory into pages and put the pages into a sparse Merkle tree. Store the current page in its entirety in RAM; once access to another page is needed, page the current one out and the new one in.

To page out, hash the page stored in RAM to obtain the new Merkle leaf. Walk up the Merkle tree to modify the root accordingly.

To page in, guess the page nondeterministically and hash it to compute the leaf. Walk up the Merkle tree to authenticate it against the root.
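A toy sketch of paging out and in against a small Merkle tree. SHA-256 stands in for the VM's hash function, and the tree is dense rather than sparse for brevity:

```python
import hashlib

PAGE_COUNT = 8  # toy example: 8 pages

def H(*parts):
    h = hashlib.sha256()
    for part in parts:
        h.update(part)
    return h.digest()

def build_tree(leaves):
    # tree[1] is the root; leaf i lives at index PAGE_COUNT + i.
    tree = [b""] * (2 * PAGE_COUNT)
    tree[PAGE_COUNT:] = leaves
    for i in range(PAGE_COUNT - 1, 0, -1):
        tree[i] = H(tree[2 * i], tree[2 * i + 1])
    return tree

def auth_path(tree, index):
    i, path = PAGE_COUNT + index, []
    while i > 1:
        path.append(tree[i ^ 1])  # sibling at each level
        i //= 2
    return path

def root_from_path(leaf, index, path):
    # Walk up the tree, combining with siblings, to recompute the root.
    node, i = leaf, PAGE_COUNT + index
    for sibling in path:
        node = H(sibling, node) if i % 2 else H(node, sibling)
        i //= 2
    return node

pages = [bytes([i]) * 16 for i in range(PAGE_COUNT)]
tree = build_tree([H(p) for p in pages])

# Page in: the guessed page hashes to a leaf that authenticates
# against the root via its Merkle authentication path.
guess = pages[5]
assert root_from_path(H(guess), 5, auth_path(tree, 5)) == tree[1]

# Page out: replace page 5, then walk up to recompute the new root.
new_page = b"\xff" * 16
new_root = root_from_path(H(new_page), 5, auth_path(tree, 5))
assert new_root == build_tree([H(p) for p in pages[:5] + [new_page] + pages[6:]])[1]
```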

### Disk Reads

Use standard input and standard output to communicate with the operating system, which performs disk reads and writes on the program's behalf. A separate proof system needs to establish that the list of disk reads and writes is authentic, but that argument is external to Triton VM and thus out of scope here.
