This repo contains Circom code for a Sudoku game and uses the lambdaworks Circom adapter to prove and verify it.
The circuit has nearly 2^14 constraints, so some resource optimization is needed to run it in a reasonable time. For this purpose we enable the parallel feature, and the metal feature to use the GPU. To use the parallel feature in lambdaworks, add the following to the Cargo.toml manifest in the groth16 folder of lambdaworks:
# rayon (this line belongs under [dependencies])
rayon = { version = "1.7" }
[features]
metal = ["lambdaworks-math/metal"]
parallel = ["lambdaworks-crypto/parallel"]
Then modify the qap.rs file to use this feature by importing:
use rayon::iter::{IntoParallelRefIterator, ParallelIterator};
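To illustrate what that import enables, here is a minimal sketch of a feature-gated parallel map. It is not the actual contents of qap.rs: the names `expensive` and `map_all` are hypothetical, and the per-item work is a placeholder. The only assumption is that the crate defines a `parallel` feature and depends on rayon, as configured above. Without the feature the same function falls back to a plain sequential iterator.

```rust
// Hypothetical sketch; not the real qap.rs code.
// The rayon traits are only imported when the `parallel` feature is enabled.
#[cfg(feature = "parallel")]
use rayon::iter::{IntoParallelRefIterator, ParallelIterator};

/// Placeholder for an expensive per-item computation
/// (for example, processing one polynomial of the QAP).
fn expensive(x: &u64) -> u64 {
    x.wrapping_mul(*x).wrapping_add(1)
}

/// Applies `expensive` to every input, preserving order.
/// Runs on rayon's thread pool when `parallel` is on, sequentially otherwise.
pub fn map_all(inputs: &[u64]) -> Vec<u64> {
    #[cfg(feature = "parallel")]
    let iter = inputs.par_iter();
    #[cfg(not(feature = "parallel"))]
    let iter = inputs.iter();

    iter.map(expensive).collect()
}
```

Because both branches expose the same `map` and `collect` calls, the body of the function does not change when the feature is toggled; only the iterator type does.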
In my case this let the run use 10 parallel threads, and the whole process takes about 7 minutes, including QAP generation, setup, powers of tau, proving (including MSM), and verification!
As straightforward as that part seems, using metal wasn't an easy job. I tried to add this feature to the Groth16 protocol, but found that once the metal feature is turned on you also have to consider which field you're working in. Right now lambdaworks doesn't implement the metal interface for computing FFTs over BLS12-381, although it is implemented for a couple of other fields such as BabyBear, Stark101, and others. This is my starting point for reading more about the feature, and this repo will be updated soon!
PS: the efficiency bottleneck is the commitment phase; to improve it, MSM needs to be implemented on the GPU instead of the CPU.
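For readers unfamiliar with MSM (multi-scalar multiplication), the sketch below shows the computation in question: the sum of k_i * P_i over all scalar/point pairs. It is only an illustration under loud assumptions: integers modulo a prime under addition stand in for elliptic-curve points, all function names are hypothetical, and none of this is lambdaworks code. It compares a naive double-and-add version with a windowed bucket method (in the style of Pippenger's algorithm), whose per-window bucket accumulation is the kind of work that could plausibly be moved to a GPU.

```rust
// Toy group: integers mod MODULUS under addition (a stand-in for curve points).
// Points are assumed to be already reduced, i.e. smaller than MODULUS.
const MODULUS: u64 = 1_000_000_007;

fn add(a: u64, b: u64) -> u64 {
    (a + b) % MODULUS
}

fn double(a: u64) -> u64 {
    add(a, a)
}

/// Double-and-add scalar multiplication k * p in the toy group.
fn scalar_mul(mut k: u64, mut p: u64) -> u64 {
    let mut acc = 0;
    while k > 0 {
        if k & 1 == 1 {
            acc = add(acc, p);
        }
        p = double(p);
        k >>= 1;
    }
    acc
}

/// Naive MSM: one full scalar multiplication per pair.
pub fn msm_naive(scalars: &[u64], points: &[u64]) -> u64 {
    scalars
        .iter()
        .zip(points)
        .fold(0, |acc, (&k, &p)| add(acc, scalar_mul(k, p)))
}

/// Bucket-method MSM: split each scalar into `window_bits`-wide digits and,
/// per window, group points by digit before combining them.
pub fn msm_buckets(scalars: &[u64], points: &[u64], window_bits: u32) -> u64 {
    assert!((1..=16).contains(&window_bits));
    let num_windows = (64 + window_bits - 1) / window_bits;
    let mask = (1u64 << window_bits) - 1;
    let mut result = 0;
    // Process windows from most to least significant.
    for w in (0..num_windows).rev() {
        // Shift the running result left by one window.
        for _ in 0..window_bits {
            result = double(result);
        }
        // buckets[d - 1] accumulates every point whose digit in this window is d.
        let mut buckets = vec![0u64; (1usize << window_bits) - 1];
        for (&k, &p) in scalars.iter().zip(points) {
            let digit = ((k >> (w * window_bits)) & mask) as usize;
            if digit != 0 {
                buckets[digit - 1] = add(buckets[digit - 1], p);
            }
        }
        // Compute sum over d of d * bucket[d] using running sums only.
        let mut running = 0;
        let mut window_sum = 0;
        for &b in buckets.iter().rev() {
            running = add(running, b);
            window_sum = add(window_sum, running);
        }
        result = add(result, window_sum);
    }
    result
}
```

Both functions return the same value for the same inputs; the bucket version replaces most scalar multiplications with plain group additions, which is why it is the usual starting point for faster MSM.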