sp1 is faster when including recommended compilation flags #1

Open
jtguibas opened this issue Oct 7, 2024 · 1 comment · May be fixed by #2

jtguibas commented Oct 7, 2024

The instructions in this repo for running SP1 are missing Rust compilation flags that give free performance gains on most CPUs. The AMD EPYC 7713 used in your most recent benchmarking report supports AVX acceleration, so it should be enabled.

On my r6a.16xlarge instance, I get the following numbers:

  • Simple Arithmetic (SP1): 2.47 seconds
  • Simple Arithmetic (R0): 5.3 seconds
  • Memory Alloc (SP1): 3.84 seconds
  • Memory Alloc (R0): 9.72 seconds

These flags are recommended in our docs.
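As a quick sanity check of the gap these numbers imply, the Python sketch below divides each R0 (RISC Zero) time by the matching SP1 time. It reuses only the four times listed above; the names `TIMES` and `ratio` are illustrative and not part of either project.

```python
# Wall-clock times (seconds) from the comment above (r6a.16xlarge, flags enabled).
TIMES = {
    "Simple Arithmetic": {"SP1": 2.47, "R0": 5.3},
    "Memory Alloc": {"SP1": 3.84, "R0": 9.72},
}


def ratio(slower: float, faster: float) -> float:
    """How many times faster the `faster` run is than the `slower` one."""
    return slower / faster


for bench, t in TIMES.items():
    # Divide the R0 time by the SP1 time for the same benchmark.
    print(f"{bench}: SP1 is {ratio(t['R0'], t['SP1']):.2f}x faster than R0")
```

With these inputs it prints about 2.15x for Simple Arithmetic and 2.53x for Memory Alloc.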

@tyshko-rostyslav

Dear @jtguibas,

Thank you for your comments and suggestions!

We have enabled the native optimization flag you proposed, and it has improved SP1’s performance. Specifically, the simple arithmetic test for SP1 now runs in 14.383 s, compared to 15.623 s without the flag, and the memory allocation test runs in 16.029 s, compared to 17.900 s without the flag.
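The before/after times above can be expressed as a speedup factor and a percentage reduction in wall-clock time. This is a minimal Python sketch that uses only the four reported times; `improvement` is a hypothetical helper name, not something from the benchmark repo.

```python
# Times (seconds) from the reply: (without native flag, with native flag).
RESULTS = {
    "simple arithmetic": (15.623, 14.383),
    "memory allocation": (17.900, 16.029),
}


def improvement(baseline: float, optimized: float) -> tuple[float, float]:
    """Return (speedup factor, percent reduction in wall-clock time)."""
    speedup = baseline / optimized
    reduction_pct = (baseline - optimized) / baseline * 100
    return speedup, reduction_pct


for name, (base, opt) in RESULTS.items():
    s, r = improvement(base, opt)
    print(f"{name}: {s:.3f}x speedup, {r:.1f}% less time")
```

For these numbers it reports roughly a 1.086x speedup (7.9% less time) for simple arithmetic and a 1.117x speedup (10.5% less time) for memory allocation.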

Please note that our machine has been upgraded since we ran the original benchmarks and now operates approximately 10% faster. As a result, all zkVMs run about 10% faster in our current tests.

In general, we opted not to use hardware acceleration when performing our benchmarks, as our project targets a broad audience. We cannot assume AVX512 support by default, given that this is primarily available in high-end CPUs.
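Since AVX-512 cannot be assumed on every machine, one way to check what a given Linux x86 host advertises is to read the `flags` lines of `/proc/cpuinfo`. The sketch below assumes that Linux-specific file layout (ARM kernels list `Features` instead), and the helper names `cpu_flags` and `supports` are hypothetical. It only reports what the CPU supports, not which flags a particular build was compiled with.

```python
from collections.abc import Iterable


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect feature flags from the 'flags' lines of /proc/cpuinfo text.

    Assumes the Linux x86 layout, e.g. 'flags\t\t: fpu avx avx2 ...'.
    """
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports(cpuinfo_text: str, required: Iterable[str]) -> bool:
    """True if every required flag (e.g. 'avx2', 'avx512f') is advertised."""
    return set(required) <= cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or file unavailable: report nothing as supported
    for feature in ("avx2", "avx512f"):
        print(feature, supports(text, [feature]))
```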

We will update the current blog post to explicitly mention that hardware acceleration was not tested in our initial comparison. Additionally, we will soon include hardware acceleration results for SP1 (enabling AVX512).

Looking ahead, we plan to compare RISC0 and SP1, both with CUDA acceleration. We hope that by that time, other zkVMs will also support such acceleration, allowing for a fair and representative comparison.

Please feel free to reach out if you have any further questions or concerns.
