Memory boosting mode #2006
Comments
How about storing larger objects in JS memory, which doesn't have the 4GB limit? Loading them back from JS memory into Wasm memory should have very small overhead compared to recomputing.

Ohhh that's a great idea, thanks! I was thinking about storing it in the local storage of the browser, but this approach is much smarter.

Would this make sense as a default strategy? Meaning, instead of creating a memory boosting mode trading prover cost for memory space, I am wondering if that approach could bring the best of both worlds.

Good question! Generally speaking, as a user, I don't want to set a special flag just to prevent o1js from going OOM. It should not do that by itself. So yeah, I think if it's low overhead then you could simply always keep certain large objects in JS memory. An alternative would be to be smart in the prover and figure out when we have to do that. However, that adds complexity which might not be worth it.

Having JS memory, browser db, etc. as options for these precomputations was part of the plan, yea :) In the spirit of incrementalism, it feels best to me to start with the original goal of decoupling the prover from the values in the prover index. Ultimately, we may even want the option to cache some frequently-called methods but not others, so I'm leaning towards doing a second design pass once we have this special 'no precomputations' mode that we can plug into.

As we are discussing that the prover can use several memory types, can we also review the possibility of

Happy to consider that! Let's open a different issue that's focused on memory modes and other potential optimisations. I want to keep the scope of this project as small as possible and follow up with any improvements!
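The "store large objects in JS memory" idea from the first comment can be sketched roughly as follows. This is a minimal sketch, not o1js or wasm-bindgen API: `stashBytes`, `loadBytes`, and the key name are all illustrative. The point is that the Wasm side serializes a large structure to bytes and parks them on the JS heap (not subject to Wasm's 4GB limit), then copies them back on demand instead of recomputing.

```typescript
// Illustrative JS-side store for blobs evicted from Wasm memory.
const jsHeapStore = new Map<string, Uint8Array>();

// Hypothetical export called from Wasm (e.g. through a wasm-bindgen shim)
// to move a serialized structure onto the JS heap.
function stashBytes(key: string, bytes: Uint8Array): void {
  // Copy, so the blob survives even if the Wasm memory it came from is reused.
  jsHeapStore.set(key, bytes.slice());
}

// Hypothetical counterpart: load a blob back. Copying bytes into Wasm memory
// should be cheap compared to recomputing the evaluations they encode.
function loadBytes(key: string): Uint8Array | undefined {
  return jsHeapStore.get(key);
}

// Example: 16 MiB of stand-in "precomputed evaluations" parked on the JS heap.
const evals = new Uint8Array(16 * 1024 * 1024).fill(7);
stashBytes('lagrange-evals-d8', evals);
const restored = loadBytes('lagrange-evals-d8');
```

In a real integration the copy in and out of Wasm linear memory would go through the bindings layer; the sketch only shows the JS-side bookkeeping.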
Problem description
WASM bytecode allows web browsers to execute high-performance code written in lower-level languages (e.g. Rust) at near-native speed. However, WASM has a heap limit of 4GB, which can be a problem when compiling memory-demanding applications. In our case, o1js uses the Kimchi SNARK under the hood, which has been designed to provide the fastest proving time across our code stack. This is thanks to, among other things, efficient algorithms, optimized formatting, and precomputation of costly parameters. To that end, several polynomials that describe the circuit are stored ahead of time so the prover does not have to generate them whenever they are needed. This comes at the cost of a significant part of the heap being consumed by these structures. This configuration prevents us from using that precious space for other features that could require further memory resources, such as chunking circuits from zkApps. Moreover, there is a bound on the number of methods that can be defined, and part of that budget is currently taken up by the selector polynomials (also limiting the complexity of the circuits that can be deployed).
Proposal
The idea is to provide a memory boosting mode in the o1js prover interface that would send a signal throughout the stack to trade proving time for memory. That is, by default o1js would still function as it does today, but upon request by the zkApp developer, our SNARK would operate at a (slightly) slower pace in order to release enough heap space to be used by the circuit for other concerns. If the flag is set, these heavy polynomials would be computed only when needed by the SNARK instead of being read back from memory.
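The trade-off at the heart of the proposal can be sketched with a small wrapper. This is a hypothetical illustration, not o1js API: `Precomputed`, `MemoryMode`, and the mode names are invented for the example. In the default mode a value is computed once and stays resident; with the boosting flag set, it is recomputed on every access and never held in the heap.

```typescript
type MemoryMode = 'cached' | 'boosted'; // illustrative names

class Precomputed<T> {
  private cached?: T;
  constructor(private compute: () => T, private mode: MemoryMode) {
    // Default behaviour: pay memory up front, keep the value resident.
    if (mode === 'cached') this.cached = compute();
  }
  get(): T {
    // Boosting mode: pay proving time on each access, keep the heap free.
    return this.mode === 'cached' ? this.cached! : this.compute();
  }
}

// Usage with a stand-in for an expensive polynomial precomputation:
let calls = 0;
const selectorD8 = new Precomputed(() => {
  calls++;
  return new Float64Array(8);
}, 'boosted');
selectorD8.get();
selectorD8.get(); // in 'boosted' mode, each access recomputes
```

The real flag would thread through the bindings down to Kimchi, but the memory/time trade it selects is the same as in this toy wrapper.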
A second step, which would alleviate this prover overhead, consists of caching these polynomials directly in the browser. This way, the TypeScript code would fetch their evaluations from the webpage in a database-like fashion.
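One possible shape for that second step is sketched below, under stated assumptions: the `EvaluationStore` interface and `getEvaluations` helper are invented for illustration. In a browser the store could be backed by IndexedDB or the Cache API; here a plain `Map` keeps the sketch self-contained and runnable.

```typescript
// Hypothetical interface the prover could use to look up cached evaluations.
interface EvaluationStore {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, value: Uint8Array): Promise<void>;
}

// In-memory stand-in; a browser build would implement this over IndexedDB.
class InMemoryStore implements EvaluationStore {
  private db = new Map<string, Uint8Array>();
  async get(key: string) { return this.db.get(key); }
  async put(key: string, value: Uint8Array) { this.db.set(key, value); }
}

// The prover asks the store first and falls back to recomputing on a miss,
// caching the fresh result for the next proof.
async function getEvaluations(
  store: EvaluationStore,
  key: string,
  recompute: () => Uint8Array,
): Promise<Uint8Array> {
  const hit = await store.get(key);
  if (hit) return hit;
  const fresh = recompute();
  await store.put(key, fresh);
  return fresh;
}
```

With a persistent backend, only the first proof after page load would pay the recomputation cost.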
Concrete steps
The relation defining our circuit is given by a set of polynomials that play an essential role in the proving procedure. They are split into three main groups: gate selectors, lookup polynomials, and coefficient polynomials. These are fixed and known per circuit, as they define the operation that the SNARK is proving claims about, while leaking no information at all about the underlying witness. They can be seen as switches along the execution trace, turning certain features of the gates involved on and off. In particular, each of them has degree on the order of the number of rows (as they are computed by interpolating their evaluations across the domain). However, in order to compute other polynomials necessary for proof generation, we need further evaluations of them: in particular, 8 times more than the domain size (which is referred to as D8 in the code). These are the concrete evaluations that this proposal targets. Instead of loading them from memory at proof generation, the prover would compute them whenever the degrees of the constraints require them.
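The memory/time trade-off on those D8 evaluations can be made concrete with toy numbers (the real Kimchi prover works over a large prime field and uses FFTs over roots of unity; here plain numbers and Horner's rule keep the sketch self-contained, and `evalOnD8` is an invented name). A polynomial of degree below n is determined by its n coefficients, but the prover also needs its values on a domain 8 times larger. Storing those values costs 8n field elements per polynomial; exposing them as an on-demand view costs evaluation time instead.

```typescript
// p(x) = c0 + c1*x + c2*x^2 + ... evaluated via Horner's rule.
function hornerEval(coeffs: number[], x: number): number {
  return coeffs.reduceRight((acc, c) => acc * x + c, 0);
}

// On-demand view of the i-th D8 evaluation: nothing of size 8n is stored.
// The point i/(8n) is a toy stand-in for the i-th power of a root of unity
// of order 8n that the real code would use.
function evalOnD8(coeffs: number[], n: number, i: number): number {
  const x = i / (8 * n);
  return hornerEval(coeffs, x);
}

// p(x) = 1 + 2x + 3x^2 on a toy domain of size n = 4 (so 32 D8 points).
const p = [1, 2, 3];
const v0 = evalOnD8(p, 4, 0);   // p(0)   = 1
const v16 = evalOnD8(p, 4, 16); // p(0.5) = 1 + 1 + 0.75 = 2.75
```

In the proposed mode, the prover would perform this kind of evaluation exactly when a constraint's degree demands it, rather than reading a precomputed table.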
Implementation details
The first effort will be to get this mode working in Kimchi. The bindings should be updated accordingly to pass this new flag on to the OCaml code. Then we need to study how much of Pickles will be affected; hopefully not much, since the actual computation of these polynomials should live almost entirely in the Rust code. Either way, it needs to be compatible with the o1js layer, so it can be configured from the zkApp at compilation time.
Related PRs
TODO