Tip 8: Continuations #219

Merged 6 commits on Oct 23, 2023 (changes shown from 5 commits)

tips/tip-0008/tip-0008.md: 231 additions, 0 deletions

# TIP 0008: Continuations

| TIP | 0008 |
|:---------------|:---------------------------------------------------|
| authors: | Alan Szepieniec |
| title: | Continuations |
| status: | draft |
| created: | 2023-08-10 |
| issue tracker: | <https://github.com/TritonVM/triton-vm/pull/219> |

**Abstract.**
This note describes an architectural change to Triton VM that enables splitting the algebraic execution trace into separately provable segments and linking the resulting proofs while authenticating the transmitted state. As a result, you can parallelize the generation of proofs for long-running computations.

## Introduction

How do you parallelize the generation of a proof for a long-running computation? Easy, you split the execution trace into segments, prove the segments individually, and then merge the resulting proofs in a binary tree using recursion.

Except, there is one missing piece in this description. How do you ensure that the state of the VM between two consecutive segments does not change?

This note describes a way to commit to the state of the virtual machine in a way that enables linking two consecutive segment-proofs.

At the heart of this technique lies the representation of memory as a pair of polynomials, $K(X)$ and $V(X)$. $K(X)$ is the smallest-degree polynomial that evaluates to zero in all addresses. $V(X)$ is the lowest-degree interpolant that takes the memory's value in those points.
> **Review comment (Member):**
>
> > that evaluates to zero in all addresses
>
> that evaluates to zero in all *used*? addresses


## Construction

### Proof Types

Depending on the nature of the segment in question, the proof type is one of:
- Standalone: no segmentation at all;
- ReceiveState: the segment follows after another but is the last one in a sequence;
- SendState: the segment is followed by another but has no predecessor;
- ReceiveAndSend: the segment is sandwiched between two others.
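
As a minimal illustration, here is how a prover might classify a segment by its position in the sequence. This is a hypothetical Python sketch; the enum and function names are not part of Triton VM.

```python
from enum import Enum, auto

class ProofType(Enum):
    STANDALONE = auto()        # no segmentation at all
    RECEIVE_STATE = auto()     # has a predecessor, no successor (last segment)
    SEND_STATE = auto()        # has a successor, no predecessor (first segment)
    RECEIVE_AND_SEND = auto()  # sandwiched between two other segments

def proof_type(segment_index: int, num_segments: int) -> ProofType:
    """Classify a segment by whether it has a predecessor and/or a successor."""
    has_predecessor = segment_index > 0
    has_successor = segment_index < num_segments - 1
    if not has_predecessor and not has_successor:
        return ProofType.STANDALONE
    if has_predecessor and not has_successor:
        return ProofType.RECEIVE_STATE
    if not has_predecessor and has_successor:
        return ProofType.SEND_STATE
    return ProofType.RECEIVE_AND_SEND
```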

### The State

The state of the VM is fully determined by the following components, which fall into four categories:

- (a) the program,
- (b) the instruction pointer, which lives in a register,
- (b) the top 16 elements of the operational stack, which live in registers,
- (b) the 16 sponge state elements,
- (c) the number of elements that have been read from standard input (the input itself is part of the claim),
- (c) the number of elements that have been written to standard output (the output itself is part of the claim),
- (d) the other elements of the operational stack, which are stored in OpStack memory,
- (d) the entire JumpStack memory,
- (d) the entire RAM.

### (a) Program

The VM is initialized with the program hash on the bottom of its opstack. By making these five elements **read-only**, we get program integrity from opstack integrity and collision-resistance.

The only way in which the program hash could be overwritten is through `swap` instructions when the opstack size is 21 or less.

**Constraints.** Todo.

### (b) Registers

To prove the integrity of the state-carrying registers across segments, we convert them to a memory object. At this point, they can be transmitted together with the other three memory objects.

To convert the state-carrying registers to memory, we assign addresses to them:

| register | address |
|----------|---------|
| `st0` | 0 |
| `st1` | 1 |
| `st2` | 2 |
| `st3` | 3 |
| `st4` | 4 |
| `st5` | 5 |
| `st6` | 6 |
| `st7` | 7 |
| `st8` | 8 |
| `st9` | 9 |
| `st10` | 10 |
| `st11` | 11 |
| `st12` | 12 |
| `st13` | 13 |
| `st14` | 14 |
| `st15` | 15 |
| `sponge0` | 16 |
| `sponge1` | 17 |
| `sponge2` | 18 |
| `sponge3` | 19 |
| `sponge4` | 20 |
| `sponge5` | 21 |
| `sponge6` | 22 |
| `sponge7` | 23 |
| `sponge8` | 24 |
| `sponge9` | 25 |
| `sponge10` | 26 |
| `sponge11` | 27 |
| `sponge12` | 28 |
| `sponge13` | 29 |
| `sponge14` | 30 |
| `sponge15` | 31 |
| `ip` | 32 |

Naturally, the value of the memory object at those addresses corresponds to the value of the matching register.
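
For illustration, a small Python sketch (hypothetical helper name, toy values) of converting the state-carrying registers into such a key-value memory object:

```python
# Map st0..st15 to addresses 0..15, sponge0..sponge15 to 16..31, and ip to 32.
def registers_to_memory(stack, sponge, instruction_pointer):
    """Build the key-value memory object that carries the register state."""
    assert len(stack) == 16 and len(sponge) == 16
    memory = {address: value for address, value in enumerate(stack)}
    memory.update({16 + address: value for address, value in enumerate(sponge)})
    memory[32] = instruction_pointer
    return memory

# Example with toy register contents.
register_memory = registers_to_memory(stack=list(range(16)),
                                      sponge=[0] * 16,
                                      instruction_pointer=7)
assert register_memory[32] == 7 and len(register_memory) == 33
```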

**Constraints.** Let $Z_{\{0,\ldots,32\}}(X)$ be the zerofier for the register addresses $\{0, \ldots, 32\}$ and let $I_V(X)$ be the interpolant that takes the values of the listed registers on their respective domain points. We need to show that both:

- $Z_{\{0,\ldots,32\}}(X)$ divides $K(X)$, and
- $Z_{\{0,\ldots,32\}}(X)$ divides $V(X) - I_V(X)$.

To do this, we commit to $K'(X) = \frac{K(X)}{Z_{\{0,\ldots,32\}}(X)}$ and $V'(X) = \frac{V(X)-I_V(X)}{Z_{\{0,\ldots,32\}}(X)}$ by listing their coefficients in new dedicated columns in the base table. New columns in the extension table prove the correct evaluation of these polynomials.

Note that we do not even need to commit to $K(X)$ and $V(X)$ directly; we simulate their evaluation in $\alpha$ via $K(\alpha) = K'(\alpha) \cdot Z_{\{0,\ldots,32\}}(\alpha)$ and $V(\alpha) = V'(\alpha) \cdot Z_{\{0,\ldots,32\}}(\alpha) + I_V(\alpha)$.
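
The following Python sketch checks these relations numerically: the zerofier over the register addresses $\{0,\ldots,32\}$ from the table above divides both $K(X)$ and $V(X) - I_V(X)$, and $K(\alpha)$, $V(\alpha)$ can be recovered from the quotients alone. The register values, the extra memory cells, the challenge, and all helper names are illustrative, not Triton VM internals.

```python
P = 2**64 - 2**32 + 1  # Triton VM's base field prime

def mul_linear(poly, a):
    """Multiply a low-to-high coefficient list by (X - a) mod P."""
    res = [0] + poly
    for i, c in enumerate(poly):
        res[i] = (res[i] - a * c) % P
    return res

def divide_linear(poly, a):
    """Divide by (X - a); return (quotient, remainder). Remainder 0 means exact division."""
    quotient, rem = [0] * (len(poly) - 1), poly[-1]
    for i in range(len(poly) - 2, -1, -1):
        quotient[i], rem = rem, (poly[i] + rem * a) % P
    return quotient, rem

def evaluate(poly, x):
    acc = 0
    for c in reversed(poly):
        acc = (acc * x + c) % P
    return acc

def interpolate(points, vals):
    """Lowest-degree interpolant through the (point, value) pairs (Lagrange)."""
    result = [0] * len(points)
    for k, v in zip(points, vals):
        basis, denom = [1], 1
        for j in points:
            if j != k:
                basis, denom = mul_linear(basis, j), denom * (k - j) % P
        scale = v * pow(denom, -1, P) % P
        result = [(r + c * scale) % P for r, c in zip(result, basis)]
    return result

# Register addresses 0..32 with stand-in register contents, plus two further
# memory cells so that the quotients K'(X), V'(X) are nontrivial.
register_addresses = list(range(33))
register_values = [(7 * a + 1) % P for a in register_addresses]
all_addresses = register_addresses + [100, 101]
all_values = register_values + [42, 43]

K = [1]
for a in all_addresses:
    K = mul_linear(K, a)                         # key polynomial of the memory object
V = interpolate(all_addresses, all_values)       # value polynomial of the memory object
I_V = interpolate(register_addresses, register_values)  # interpolant of register values

# Divide K and V - I_V by the zerofier Z_{0..32}, one linear factor at a time.
Z, K_prime = [1], K
V_prime = [(a - b) % P for a, b in zip(V, I_V + [0] * (len(V) - len(I_V)))]
for a in register_addresses:
    Z = mul_linear(Z, a)
    K_prime, rem_K = divide_linear(K_prime, a)
    V_prime, rem_V = divide_linear(V_prime, a)
    assert rem_K == 0 and rem_V == 0             # the zerofier divides both, as required

# The verifier can simulate K(alpha) and V(alpha) from the committed quotients.
alpha = 123456789                                # stand-in for the Fiat-Shamir challenge
assert evaluate(K, alpha) == evaluate(K_prime, alpha) * evaluate(Z, alpha) % P
assert evaluate(V, alpha) == (evaluate(V_prime, alpha) * evaluate(Z, alpha)
                              + evaluate(I_V, alpha)) % P
```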

### (c) Input and Output

The input evaluation arguments for proofs of consecutive segments need not be linked. What matters is that these evaluation arguments work relative to the correct subsequences of input symbols. To facilitate this, the segment claim has two additional fields relative to the whole claim: a start and stop index for reading input symbols. When read in combination with the claim for the whole computation, it is easy to select the correct substring of field elements. When merging consecutive segment proofs, it is possible that the stop index of the former equals the start index of the latter.
> **Review comment (Member):**
>
> > it is possible that the stop index of the former equals the start index of the latter.
>
> it is *necessary*? that the stop index of the former equals the start index of the latter.


An analogous discussion holds for the output evaluation arguments, and thus adds two more index fields to the segment claim.
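
A minimal sketch of how such a segment claim might look and how its index fields select the relevant input and output subsequences. The Python types and field names below are illustrative, not Triton VM's actual claim format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class WholeClaim:
    program_digest: bytes
    input: List[int]      # public input of the whole computation
    output: List[int]     # public output of the whole computation

@dataclass
class SegmentClaim:
    program_digest: bytes
    input_start: int      # index of the first input symbol read in this segment
    input_stop: int       # index one past the last input symbol read in this segment
    output_start: int     # likewise for the output symbols written in this segment
    output_stop: int

def segment_io(whole: WholeClaim, segment: SegmentClaim):
    """Select the input/output subsequences this segment's evaluation arguments refer to."""
    return (whole.input[segment.input_start:segment.input_stop],
            whole.output[segment.output_start:segment.output_stop])

def indices_line_up(first: SegmentClaim, second: SegmentClaim) -> bool:
    """Check whether the stop indices of the former equal the start indices of the latter."""
    return (first.input_stop == second.input_start
            and first.output_stop == second.output_start)
```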

### (d) Memory

**1. Representation.**

Memory can be thought of as a key-value map. As such, we can commit to memory using two polynomial commitments, $K(X)$ and $V(X)$ such that for every element of memory $(k,v)$, $K(k) = 0$ and $V(k) = v$; and such that for every key $k$ not present in memory, $K(k) \neq 0$. Let $K(X)$ and $V(X)$ be the polynomials of smallest degree satisfying these descriptions. We call $K(X)$ the *key polynomial* and $V(X)$ the *value polynomial*.
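
A small Python sketch (toy memory, illustrative helper names) checking these defining properties of the key and value polynomials:

```python
P = 2**64 - 2**32 + 1  # Triton VM's base field prime

memory = {5: 17, 9: 3, 1000: 42}  # toy key-value memory

def K(x):
    """Evaluate the key polynomial prod_k (X - k) at x."""
    acc = 1
    for k in memory:
        acc = acc * (x - k) % P
    return acc

def V(x):
    """Evaluate the lowest-degree interpolant of the (address, value) pairs at x."""
    acc = 0
    for k, v in memory.items():
        term = v
        for j in memory:
            if j != k:
                term = term * (x - j) % P * pow((k - j) % P, -1, P) % P
        acc = (acc + term) % P
    return acc

for k, v in memory.items():
    assert K(k) == 0 and V(k) == v  # used addresses: key poly vanishes, value poly matches
assert K(7) != 0                    # an address not present in memory: key poly is nonzero
```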

When used for the purpose of transmitting state from one segment to the next, $K(X)$ and $V(X)$ are called *continuation polynomials*. Every receiving segment has one pair of incoming continuation polynomials. Every sending segment has one pair of outgoing continuation polynomials.

**2. Continuation**

To prove that the continuation polynomials at the end of one segment are equal to the continuation polynomials at the start of the next segment, assert the equality of their evaluations in a random point $\alpha$. In order to get $\alpha$ non-interactively, the prover must commit to the execution traces of both segments.

Specifically, let $B_A, B_B, B_C$ be the base tables of segments $A, B, C$. In order to obtain the point $\alpha$ for extending $A$, the prover must first send the commitments to (or Merkle roots of) both $B_A$ and $B_B$. Then $\alpha \leftarrow \mathsf{H}(\mathsf{com}(B_A) \Vert \mathsf{com}(B_B))$ can be used to extend $A$.
> **Review comment (Member):**
>
> In our current lingo, “extending” usually applies to base tables and means the acquisition of the extension tables. Do we want to use the same word here?
>
> In a similar vein, do we want to call the different parts / chunks / segments of the trace “segments”, where that word is also currently used for segment polynomials?

> **Reply (Collaborator, Author):**
>
> > In our current lingo, “extending” usually applies to base tables and means the acquisition of the extension tables. Do we want to use the same word here?
>
> I am using this exact meaning. To compute the continuation polynomials' values, you need extension columns and an $\alpha$.
>
> > In a similar vein, do we want to call the different parts / chunks / segments of the trace “segments”, where that word is also currently used for segment polynomials?
>
> I don't think there is any risk of confusion. Also: Risc0 uses “segment” in the same sense. Happy to consider alternatives though -- please suggest.


The same $\alpha$ can be used to extend the evaluation columns for $B$ that assert the integrity of its initial memory. However, the extension of $B$ needs a second random point $\beta$ obtained from $\beta \leftarrow \mathsf{H}(\mathsf{com}(B_B) \Vert \mathsf{com}(B_C))$.
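
A sketch of this challenge derivation in Python, using BLAKE2b as a stand-in hash; Triton VM itself uses the Tip5 hash and its own commitment format, so the concrete functions and inputs below are illustrative only.

```python
import hashlib

P = 2**64 - 2**32 + 1  # Triton VM's base field prime

def challenge(commitment_left: bytes, commitment_right: bytes) -> int:
    """Derive a field element from two trace commitments: H(com_left || com_right)."""
    digest = hashlib.blake2b(commitment_left + commitment_right).digest()
    return int.from_bytes(digest, "big") % P

# Stand-ins for the Merkle roots of the base tables of segments A, B, C.
com_B_A = hashlib.blake2b(b"base table of segment A").digest()
com_B_B = hashlib.blake2b(b"base table of segment B").digest()
com_B_C = hashlib.blake2b(b"base table of segment C").digest()

alpha = challenge(com_B_A, com_B_B)  # links the memory state between A and B
beta = challenge(com_B_B, com_B_C)   # links the memory state between B and C
```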

**3. Access**

When a value read from memory appears nondeterministically on the stack, how do we prove that it is consistent with the memory polynomials? Specifically, we want to show that:

- the address is a root of the key polynomial: $(X - \mathtt{address}) | K(X)$; and
- the value polynomial assumes the correct value in that point: $(X - \mathtt{address}) | (V(X) - \mathtt{value})$.

It is tempting to show that the quotient has low degree, but unfortunately it is too expensive to commit to a new quotient in every cycle.

Instead, we proceed as follows. Recall that the memory table is sorted by address first and clock cycle second. Let $f_K(X), f_V(X)$ denote variable polynomials called *tracker polynomials*. They are associated with the memory table and their values evolve as follows. Initially, they are equal to $K(X)$ and $V(X)$. In every first row of a contiguous region of constant memory address, find the new values $f_K^*(X), f_V^*(X)$ as

- $f_K^*(X) = \frac{f_K(X)}{X-\mathtt{address}}$; and
- $f_V^*(X) = \frac{f_V(X) - \mathtt{value}\cdot\frac{f_K^*(X)}{f_K^*(\mathtt{address})}}{X - \mathtt{address}}$.

These polynomials shrink in degree only if the first reads in each location are integral; if even one read is not integral, then they explode in degree. By adding new extension columns to the memory table, we can track the values of $f_K(X)$ and $f_V(X)$ in the random point $\alpha$.
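
A runnable Python sketch (toy memory, illustrative helper names; the polynomial helpers repeat those from the sketch in section (b) so that this snippet stands alone) of the two divisibility conditions from the start of this subsection and of the key-tracker updates: stripping $(X - \mathtt{address})$ is exact precisely for addresses present in the committed memory, so the key tracker's degree shrinks by one per integral first read.

```python
P = 2**64 - 2**32 + 1  # Triton VM's base field prime

def mul_linear(poly, a):
    """Multiply a low-to-high coefficient list by (X - a) mod P."""
    res = [0] + poly
    for i, c in enumerate(poly):
        res[i] = (res[i] - a * c) % P
    return res

def divide_linear(poly, a):
    """Divide by (X - a); return (quotient, remainder). Remainder 0 means exact division."""
    quotient, rem = [0] * (len(poly) - 1), poly[-1]
    for i in range(len(poly) - 2, -1, -1):
        quotient[i], rem = rem, (poly[i] + rem * a) % P
    return quotient, rem

def interpolate(points, vals):
    """Lowest-degree interpolant through the (point, value) pairs (Lagrange)."""
    result = [0] * len(points)
    for k, v in zip(points, vals):
        basis, denom = [1], 1
        for j in points:
            if j != k:
                basis, denom = mul_linear(basis, j), denom * (k - j) % P
        scale = v * pow(denom, -1, P) % P
        result = [(r + c * scale) % P for r, c in zip(result, basis)]
    return result

# Toy committed memory {2: 10, 5: 20, 8: 30}: K vanishes on the addresses, V interpolates.
addresses, values = [2, 5, 8], [10, 20, 30]
K = [1]
for a in addresses:
    K = mul_linear(K, a)
V = interpolate(addresses, values)

# Divisibility checks for a single read (address 5, value 20), cf. the two bullets above.
_, rem_K = divide_linear(K, 5)
_, rem_V = divide_linear([(V[0] - 20) % P] + V[1:], 5)
assert rem_K == 0 and rem_V == 0              # integral read: both divisions are exact
_, rem_bad = divide_linear([(V[0] - 99) % P] + V[1:], 5)
assert rem_bad != 0                           # wrong value: the division is not exact

# Key tracker: strip one linear factor per first read; the degree shrinks only for
# addresses that are actually present in the committed memory.
f_K = K
for a in addresses:
    f_K, rem = divide_linear(f_K, a)
    assert rem == 0
assert f_K == [1]                             # remainder key polynomial for this toy memory
_, rem_absent = divide_linear(K, 7)           # address 7 is not present in memory
assert rem_absent != 0
```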

Let $R_K(X)$ and $R_V(X)$ denote polynomials called *remainder polynomials*, defined as the result of applying these updates all the way down the memory table. They are known at the time of building the base table and are committed to by listing their coefficients in new columns in the base table. In the extension table, these remainder polynomials are evaluated in two points, $\alpha$ and $\beta$. The evaluation in $\alpha$ serves to link the remainder polynomials with the tracker polynomials and incoming continuation polynomials. The evaluation in $\beta$ is motivated in the next section.

**4. Mutation**

For every contiguous region of constant memory address, the existing constraints already establish the correct internal evolution. What is left to prove is that the final state of each addressed cell of memory is persisted into the outgoing continuation polynomials.

The argument proceeds like *3. Access* but in reverse. Specifically, a pair of tracker polynomials is initially equal to the outgoing continuation polynomials. As we work from the bottom row up, whenever we encounter a new contiguous region of constant memory address, the new tracker polynomials $f_K(X)$, $f_V(X)$ are determined in terms of the old ones $f_K^*(X)$, $f_V^*(X)$ via

- $f_K^*(X) = f_K(X) \cdot (X - \mathtt{address})$; and
- $f_V^*(X) = f_V(X) \cdot (X - \mathtt{address}) + \mathtt{value} \cdot \frac{f_K(X)}{f_K(\mathtt{address})}$.

The tracker columns track the values of $f_K(X)$ and $f_V(X)$ in $\beta$. The factor $f_K(X)$ guarantees that $f_V(X) = f_V^*(X)$ in all roots of $f_K(X)$ and the denominator normalizes this value so that the polynomial $\mathtt{value} \cdot \frac{f_K(X)}{f_K(\mathtt{address})}$ really does evaluate to $\mathtt{value}$ in $\mathtt{address}$.
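
A small numerical check in Python (toy data, illustrative helper names, repeating the polynomial helpers so the snippet stands alone) of this normalization: the correction term $\mathtt{value} \cdot \frac{f_K(X)}{f_K(\mathtt{address})}$ evaluates to $\mathtt{value}$ at the address and vanishes at the roots of $f_K(X)$, so the updated value tracker takes the required value at the newly incorporated address.

```python
P = 2**64 - 2**32 + 1  # Triton VM's base field prime

def mul_linear(poly, a):
    """Multiply a low-to-high coefficient list by (X - a) mod P."""
    res = [0] + poly
    for i, c in enumerate(poly):
        res[i] = (res[i] - a * c) % P
    return res

def evaluate(poly, x):
    acc = 0
    for c in reversed(poly):
        acc = (acc * x + c) % P
    return acc

def poly_add(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [(x + y) % P for x, y in zip(a, b)]

address, value = 5, 20

# Key tracker above the region of address 5: it vanishes on the other touched
# addresses (here 2 and 8) but not on 5 itself, so f_K(address) is invertible.
f_K = mul_linear(mul_linear([1], 2), 8)  # (X - 2)(X - 8)
f_V = [3, 7]                             # arbitrary value tracker above the region

# The correction term value * f_K(X) / f_K(address) pins `value` at the address
# and vanishes wherever f_K vanishes.
scale = value * pow(evaluate(f_K, address), -1, P) % P
correction = [c * scale % P for c in f_K]
assert evaluate(correction, address) == value
assert evaluate(correction, 2) == 0 and evaluate(correction, 8) == 0

# The updated (outgoing-side) trackers: one more linear factor in the key tracker,
# and the value tracker now takes `value` at the address.
f_K_star = mul_linear(f_K, address)
f_V_star = poly_add(mul_linear(f_V, address), correction)
assert evaluate(f_K_star, address) == 0
assert evaluate(f_V_star, address) == value
```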

Let $R_K(X)$ and $R_V(X)$ denote the *remainder polynomials*, resulting from applying these updates all the way up the memory table. These polynomials are committed to in the base table, which has a pair of columns listing their coefficients. The extension table evaluates these polynomials in two points, $\alpha$ and $\beta$. The evaluation in $\beta$ establishes the connection between the remainder polynomials on the one hand, and the tracker polynomials and outgoing continuation polynomials on the other. The evaluation in $\alpha$ is motivated in the previous section.

**5. Wait – the remainder polynomials are equal?**

For the key polynomials, this should be obvious. Let $C_K^{in}(X)$ be the incoming continuation key polynomial, $C_K^{out}(X)$ the outgoing continuation key polynomial, $R_K(X)$ the remainder key polynomial, and $\{k_i\}$ the set of touched memory addresses. Let $P(X) = \prod_i (X-k_i)$. Then $R_K(X) = \frac{C_K^{in}(X)}{P(X)}$ and $C_K^{out}(X) = R_K(X) \cdot P(X)$, which together hold iff $C_K^{in}(X) = C_K^{out}(X)$.

To see why the remainder value polynomials ought to be equal, observe that the way in which $f_V(X)$ is updated in both cases keeps the values in the zeros of $R_K(X)$ the same. From this observation it follows that $\frac{C_V^{in}(X) - I_v^{in}(X)}{P(X)} = \frac{C_V^{out}(X) - I_v^{out}(X)}{P(X)} = R_V(X)$ for the lowest-degree interpolants $I_v^{in}(X)$ and $I_v^{out}(X)$ that agree with $C_V^{in}(X)$ and $C_V^{out}(X)$ respectively on $\{k_i\}$.
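
The key-polynomial part of this argument can be checked numerically. The Python sketch below (toy addresses, illustrative helper names, helpers repeated so the snippet stands alone) strips the touched addresses from an incoming continuation key polynomial and multiplies them back in, recovering the same polynomial.

```python
P_FIELD = 2**64 - 2**32 + 1  # Triton VM's base field prime

def mul_linear(poly, a):
    """Multiply a low-to-high coefficient list by (X - a) mod P_FIELD."""
    res = [0] + poly
    for i, c in enumerate(poly):
        res[i] = (res[i] - a * c) % P_FIELD
    return res

def divide_linear(poly, a):
    """Divide by (X - a); return (quotient, remainder)."""
    quotient, rem = [0] * (len(poly) - 1), poly[-1]
    for i in range(len(poly) - 2, -1, -1):
        quotient[i], rem = rem, (poly[i] + rem * a) % P_FIELD
    return quotient, rem

touched = [2, 5, 8]    # addresses the segment touches
untouched = [11, 12]   # addresses present in memory but untouched by the segment

# Incoming continuation key polynomial: vanishes on all present addresses.
C_K_in = [1]
for a in touched + untouched:
    C_K_in = mul_linear(C_K_in, a)

# Access direction: strip the touched addresses to obtain the remainder key polynomial.
R_K = C_K_in
for a in touched:
    R_K, rem = divide_linear(R_K, a)
    assert rem == 0

# Mutation direction: multiply the touched addresses back in; the outgoing
# continuation key polynomial equals the incoming one.
C_K_out = R_K
for a in touched:
    C_K_out = mul_linear(C_K_out, a)
assert C_K_out == C_K_in
```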

**6. Summary**

This construction introduces the following base columns:

- incoming continuation key polynomial coefficients
- incoming continuation value polynomial coefficients
- outgoing continuation key polynomial coefficients
- outgoing continuation value polynomial coefficients
- remainder key polynomial coefficients
- remainder value polynomial coefficients
- tracker key polynomial normalizers

and the following extension columns:

- incoming continuation key polynomial evaluation in $\alpha$
- incoming continuation value polynomial evaluation in $\alpha$
- incoming tracker key polynomial value in $\alpha$
- incoming tracker value polynomial value in $\alpha$
- outgoing continuation key polynomial evaluation in $\beta$
- outgoing continuation value polynomial evaluation in $\beta$
- outgoing tracker key polynomial value in $\beta$
- outgoing tracker value polynomial value in $\beta$
- remainder key polynomial evaluation in $\alpha$
- remainder key polynomial evaluation in $\beta$
- remainder value polynomial evaluation in $\alpha$
- remainder value polynomial evaluation in $\beta$.

**7. Constraints**

Todo.

**8. Globally simultaneous continuation**

If it is possible to commit to the execution traces of all segments, then it is worthwhile obtaining a single random point for all transitions rather than a new one for each transition. The advantage would be half as many extension columns.

In this case, the random point $\alpha$ can be sampled from the Merkle root of the tree built from the commitments to execution traces of all segments. In every segment proof, the prover sends the commitment to the base table along with its Merkle authentication path in this data structure.

While this approach reduces the overall complexity somewhat, the downside is that the computation has to finish before you can start proving.

**9. Combining memories**

In Triton VM there are three memory tables, and that is not counting the register state, which is also encoded as memory polynomials. This total of four raises the question of whether it is possible to merge memories and prove their integral continuation with a single continuation argument rather than separately.

Memories are combined by assigning them to distinct address spaces using an extension field. For instance, RAM is stored at addresses of the form $(*, 1)$ whereas OpStack memory uses $(*, 2)$. With this approach there only needs to be one pair of incoming continuation polynomials, one pair of outgoing continuation polynomials, and one pair of remainder polynomials – but they must be defined over the extension field.

As for the trackers, it is ill-advised to combine them because they may require updates from distinct memory operations in the same row.

## Large Memory

The technique described here is ill-suited when a large number of memory cells is touched. The reason is that the degree of the memory polynomials is essentially equal to the number of addresses. It is not uncommon for computations to touch gigabytes of memory, but even assuming for simplicity (and contrary to fact) that we can store 8 bytes in a field element, a memory polynomial storing 1 GiB of data would have degree roughly $2^{27}$.

There are two strategies for accommodating continuations for computations that touch a lot of memory, explained below. Neither one requires a modification to the architecture of Triton VM, as they rely only on clever programming.

### Merkleization

Divide the memory into pages and put the pages into a sparse Merkle tree. Store the current page in its entirety in RAM; once access to another page is needed, page the current one out and the new one in.

To page out, hash the page stored in RAM to obtain the new Merkle leaf. Walk up the Merkle tree to modify the root accordingly.

To page in, guess the page nondeterministically and hash it to compute the leaf. Walk up the Merkle tree to authenticate it against the root.
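
A simplified Python sketch of this paging scheme, using BLAKE2b and a dense Merkle tree over a handful of pages instead of the sparse Merkle tree suggested above; all names and parameters are illustrative.

```python
import hashlib

NUM_PAGES = 8  # a power of two, for this toy dense tree

def H(data: bytes) -> bytes:
    return hashlib.blake2b(data).digest()

def build_tree(pages):
    """Return all tree layers, leaves first; the last layer holds the root."""
    layers = [[H(page) for page in pages]]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([H(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return layers

def auth_path(layers, index):
    """Sibling digests from leaf to root for the page at `index`."""
    path = []
    for layer in layers[:-1]:
        path.append(layer[index ^ 1])
        index //= 2
    return path

def root_from_path(page, index, path):
    """Recompute the root from a page, its index, and its authentication path."""
    node = H(page)
    for sibling in path:
        node = H(sibling + node) if index & 1 else H(node + sibling)
        index //= 2
    return node

pages = [bytes([i]) * 64 for i in range(NUM_PAGES)]
layers = build_tree(pages)
root = layers[-1][0]

# Page in: guess page 3 nondeterministically and authenticate it against the root.
guessed_page = pages[3]
assert root_from_path(guessed_page, 3, auth_path(layers, 3)) == root

# Page out: page 3 was modified in RAM; re-hash it and walk up to the new root.
modified_page = bytes(64)
new_root = root_from_path(modified_page, 3, auth_path(layers, 3))
pages[3] = modified_page
assert new_root == build_tree(pages)[-1][0]  # consistent with rebuilding the whole tree
```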

### Disk Reads

Use standard input and standard output to communicate with the operating system, which performs disk reads and writes on the program's behalf. A separate proof system needs to establish that the list of disk reads and writes is authentic, but the point is that this argument is external to Triton VM and thus out of scope here.