Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel MSM load balancing: minimize work per-thread #451

Open
mratsim opened this issue Aug 1, 2024 · 0 comments
Open

Parallel MSM load balancing: minimize work per-thread #451

mratsim opened this issue Aug 1, 2024 · 0 comments

Comments

@mratsim
Copy link
Owner

mratsim commented Aug 1, 2024

Currently the MSM scheduler is minimizing the global number of additions and doublings.

However to benefits from maximum parallelism it might be worthwhile to minimize per-thread number of additions and doublings even if there are slightly more globally.

Motivating example:

  • 256 inputs require c = 7
  • After endomorphism acceleration we have coefficients of 128 bits hence 128/7 = 18.29 mini-MSMs.
  • On a 16 threads machine, you would wait for 2 rounds of mini-MSMs with 15 out of 18 threads idle at the second round.
  • This can be fixed with latency hiding but you can only do so-much if the imbalance is that large.
  • Here moving to c = 8 for an exact 16-level parallelization or c = 4 for 32 would better utilize the cores.
  • Note that if cores are not homogeneous with one 3x faster than the other, we're at a loss with exact work division.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant