Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve powmod #135

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Improve powmod #135

wants to merge 1 commit into from

Conversation

konsumlamm
Copy link
Contributor

  • avoid redundant modulo call if exponent >= 0
  • avoid shifting exponent

Unfortunately, I didn't notice any difference in performance, but this should reduce the number of allocations.

@mratsim
Copy link

mratsim commented Jul 25, 2023

Here is a guide on how to implement fast powmod status-im/nim-stint#126

The result is +/- 10% faster than GMP, without assembly: status-im/nim-stint#126 (comment)

image

@dlesnoff
Copy link
Contributor

We trade one shr and a minus sign by a div, a shr and a mod?
@konsumlamm Have you compared the generated assembly?
Is there some optimizations with fastLog that I missed?
For which limb sizes did you benchmark it?
@mratsim is mainly proposing to switch to the clang compiler and perform some algorithmic optimizations too. Do you think it could be doable to switch to the clang compiler when clang is available on the user's computer?

@konsumlamm
Copy link
Contributor Author

We trade one shr and a minus sign by a div, a shr and a mod?

The shr that is replaced operated on BigInts, so it allocated a new BigInt every iteration. The new div, mod and shl are for uint32. The - can be left out since the rest of the code only uses exponent.limbs.

@konsumlamm
Copy link
Contributor Author

@mratsim thanks for the link! I created a new issue for further optimizations (#137).

@konsumlamm
Copy link
Contributor Author

@dlesnoff @narimiran what do you think?

@dlesnoff
Copy link
Contributor

I am not convinced. The exponent being an int in your code, it would allocate at most a two-limb BigInt in my code (and for most of the uses, it will be a single limb).

Is the cost of memory allocation so huge that it is better to perform a few more computations?

I didn't notice any difference in performance,

It does not seem so, on your computer at least. The number of allocations might impact on more constrained environments (embedded) but I do not think that we target embedded. At least, there are algorithmic changes to perform before micro-optimizations.

I would be interested to see either (or both):
– A reproducible benchmark showing the differences in performance and memory usage (not in the PR since that will be difficult but in the conversation).
– An overload of the procedure to give a separate interface for people having an int as exponent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants