Introduce instructions `pick` + `i` and `place` + `i` #297

jan-ferdinand · 2024-05-30T12:05:47Z

The two new instructions simplify manipulation of the operational stack. Instruction pick + i moves the indicated stack element to the top of the stack. Instruction place + i is its dual, moving the top of the stack to the indicated position.

	old op stack	new op stack
`pick` + `i`	e.g., `_ d x c b a`	e.g., `_ d c b a x`
`place` + `i`	e.g., `_ d c b a x`	e.g., `_ d x c b a`

jan-ferdinand · 2024-07-24T09:01:16Z

Arguments in favor of introducing the new instructions:

The assembly code of the recursive verifier in tasm-lib currently has 570 occurences of the pattern swap 1 swap 2 swap 3, which corresponds to pick 3, and 566 occurences of the pattern swap 3 swap 2 swap 1, which corresponds to place 3.¹ Together, these two patterns account for ~42%² of all executed swap chains.
A common pattern for reading from memory when the memory pointer is in the shallow part of the stack is swap i read_mem n swap (i + n) pop 1 which can be replaced by pick i read_mem n place (i + n).
“swap juggling” is an acquired skill and might be a deterrent for newcomers.

Arguments against introducing the new instructions:

Arithmetic overhead.

Replacing the two mentioned patterns with pick & place would save an estimated 1.6% the number of executed instructions. ↩
yes, actually ↩

aszepieniec · 2024-07-24T09:25:05Z

To make the strongest possible argument in favor of this PR, one has to argue that the following cost metrics either don't increase or are worth it:

the number of new constraints (recall that we need to compute a linear combination with this number of terms; so it affects prover speed)
the number of new degree-lowering columns
the reduced space of opcodes.

Do we have numbers or arguments relating to these cost metrics?

jan-ferdinand · 2024-07-24T10:12:20Z

The arithmetization overview can answer the first two questions:

number of new constraints: 4
number of new degree-lowering columns: 4

The arithmetization overview also tells us that evaluating the new constraints would introduce 2382 additional rows in the processor table when evaluating the AIR in assembly.

The question regarding opcode space is slightly more involved because of instruction categories. The relevant categories are currently (without pick and place) filled as follows:

u32?	shrink?	size	num
non-u32	no shrink	single word	13
non-u32	no shrink	double word	7
non-u32	shrink stack	single word	11
non-u32	shrink stack	double word	3
u32	no shrink	single word	4
u32	no shrink	double word	0
u32	shrink stack	single word	4
u32	shrink stack	double word	0

Within each category, the maximum is 16. The instructions pick and place both fall in the category (non-u32, no-shrink, double word), increasing its size from 7 to 9.

aszepieniec

Reviewed. No comments. LGTM.

aszepieniec · 2024-07-25T11:38:10Z

To merge or not to merge? Here is an argument in favor worth considering.

The 1.6% fewer cycles translates to rows in the processor table. Assume that translates to a smaller table height before padding. (And even if this assumption is wrong the argument still kinda holds.) We get this reduced row count in exchange for 0.7% (=1 - 607/611) performance drop in two big steps, namely NTT and row hashing. While this PR does not move the row count across the next power-of-two threshold, it can achieve that in combination with other similar improvements.

I propose to merge this and future similar improvements. And then, at a later stage when we crossed the next power of two boundary, we can figure out which instructions to drop without crossing it back. In the mean time we can

avoid using the instructions explicitly so as to not complicate the potential reversion
automatically optimize generated tasm so as to reap the benefits.

Sword-Smith · 2024-07-25T13:54:27Z

To merge or not to merge? Here is an argument in favor worth considering.

The 1.6% fewer cycles translates to rows in the processor table. Assume that translates to a smaller table height before padding. (And even if this assumption is wrong the argument still kinda holds.) We get this reduced row count in exchange for 0.7% (=1 - 607/611) performance drop in two big steps, namely NTT and row hashing. While this PR does not move the row count across the next power-of-two threshold, it can achieve that in combination with other similar improvements.

How did @jan-ferdinand get to the 1.6 % fewer cycles? Do they include the added cost of the AIR-constraint evaluation? Because of this AIR-constraint evaluation cost, this PR adds 2 % to the opstack table 0.8 % to the processor table, and 0.6 % to the memory table. With the above-mentioned 0.7 % of additional prover time because of extra columns, I would say merging this is neutral from a performance perspective (assuming I didn't misunderstand anything).

I'm in favor of advancing on the consensus programs and only after those have been written, reconsider these instruction. But it's not a strong preference. I'm happy to let the majority, or just @jan-ferdinand, decide.

Sword-Smith · 2024-07-25T14:08:21Z

For reference, here is a table row count for the current verifier, run on Triton-VM 0.42 alpha 6.

[
  {
    "name": "tasmlib_verifier_stark_verify_inner_padded_height_524288_fri_exp_4",
    "benchmark_result": {
      "clock_cycle_count": 285183,
      "hash_table_height": 234025,
      "u32_table_height": 211197,
      "op_stack_table_height": 235124,
      "ram_table_height": 281599
    },
    "case": "CommonCase"
  }
]

jan-ferdinand · 2024-07-25T14:16:28Z

Please note that yesterday's comment was not meant to try to conclude the question regarding merging this PR; I mostly wanted to jot down some findings wherever they fit best. Some of this is here.
However, more of the findings are in this comment on #259, where I try to analyse the recufier's execution trace in order to identify common patterns that might be worth introducing additional instructions for. The method is somewhat crude right now, and might not be the right tool for all questions in this line.

jan-ferdinand · 2024-07-25T14:21:58Z

How did @jan-ferdinand get to the 1.6 % fewer cycles?

I analysed how often a swap pattern that could be replaced by a pick & place pattern occured when executing the recufier, then used the respective patterns lengths as well as the total number of executed instructions.

Do the [1.6% reduced number of instructions] include the added cost of the AIR-constraint evaluation?

No. That's part of the (admittedly overly short) con “arithmetic overhead.”

These two new instructions simplify manipulation of the operational stack. Instruction `pick` + `i` moves the indicated stack element to the top of the stack. Instruction `place` + `i` is its dual, moving the top of the stack to the indicated position.

jan-ferdinand requested a review from aszepieniec May 30, 2024 12:05

jan-ferdinand force-pushed the pick_place branch 3 times, most recently from 9f8c2f1 to 95d2a06 Compare June 7, 2024 08:16

jan-ferdinand force-pushed the master branch from 3727f02 to c683170 Compare June 7, 2024 09:16

jan-ferdinand force-pushed the pick_place branch 2 times, most recently from 7b63e76 to e18d3c3 Compare June 8, 2024 06:45

jan-ferdinand force-pushed the master branch 2 times, most recently from 1c2ccef to dd496f7 Compare June 8, 2024 08:02

jan-ferdinand force-pushed the pick_place branch from e18d3c3 to 3a3ad2e Compare June 13, 2024 07:31

aszepieniec approved these changes Jul 25, 2024

View reviewed changes

jan-ferdinand force-pushed the master branch from 02b483d to c6791e8 Compare August 19, 2024 12:23

jan-ferdinand force-pushed the master branch 2 times, most recently from 358c67f to 6271c3d Compare September 13, 2024 07:20

jan-ferdinand force-pushed the pick_place branch from 3a3ad2e to a2a97f4 Compare September 13, 2024 11:03

jan-ferdinand force-pushed the pick_place branch from a2a97f4 to b769392 Compare September 13, 2024 11:04

jan-ferdinand merged commit b769392 into master Sep 16, 2024
5 checks passed

jan-ferdinand deleted the pick_place branch October 3, 2024 07:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce instructions `pick` + `i` and `place` + `i` #297

Introduce instructions `pick` + `i` and `place` + `i` #297

jan-ferdinand commented May 30, 2024 •

edited

Loading

jan-ferdinand commented Jul 24, 2024 •

edited

Loading

aszepieniec commented Jul 24, 2024

jan-ferdinand commented Jul 24, 2024 •

edited

Loading

aszepieniec left a comment

aszepieniec commented Jul 25, 2024 •

edited by Sword-Smith

Loading

Sword-Smith commented Jul 25, 2024 •

edited

Loading

Sword-Smith commented Jul 25, 2024 •

edited

Loading

jan-ferdinand commented Jul 25, 2024

jan-ferdinand commented Jul 25, 2024

Introduce instructions pick + i and place + i #297

Introduce instructions pick + i and place + i #297

Conversation

jan-ferdinand commented May 30, 2024 • edited Loading

jan-ferdinand commented Jul 24, 2024 • edited Loading

Footnotes

aszepieniec commented Jul 24, 2024

jan-ferdinand commented Jul 24, 2024 • edited Loading

aszepieniec left a comment

Choose a reason for hiding this comment

aszepieniec commented Jul 25, 2024 • edited by Sword-Smith Loading

Sword-Smith commented Jul 25, 2024 • edited Loading

Sword-Smith commented Jul 25, 2024 • edited Loading

jan-ferdinand commented Jul 25, 2024

jan-ferdinand commented Jul 25, 2024

Introduce instructions `pick` + `i` and `place` + `i` #297

Introduce instructions `pick` + `i` and `place` + `i` #297

jan-ferdinand commented May 30, 2024 •

edited

Loading

jan-ferdinand commented Jul 24, 2024 •

edited

Loading

jan-ferdinand commented Jul 24, 2024 •

edited

Loading

aszepieniec commented Jul 25, 2024 •

edited by Sword-Smith

Loading

Sword-Smith commented Jul 25, 2024 •

edited

Loading

Sword-Smith commented Jul 25, 2024 •

edited

Loading