-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add inverse NTTs for Kyber & Dilithium #37
Conversation
* Fix reductions
@dop-amin I cancelled the CI which was spinning indefinitely on the example dry run. |
@@ -120,7 +120,7 @@ def get_min_max_objective(slothy): | |||
vqdmulh_lane, | |||
vmull, vmlal, | |||
vsrshr, vushr, vusra, vshl, | |||
vand, vbic): ExecutionUnit.V(), | |||
vand, vbic, cmge): ExecutionUnit.V(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and below you can now refer to cmge
as ASimdCompare
instead (assuming all compare instructions have similar performance characteristics). That makes the models more readable and simplifies the addition of further comparison instructions, because you only need to tweak the arch model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I modified the models. However, if we have something like is_qform_form_of
, then passing the parent class won't work. Do we want to keep it this way or do we want the function returned by is_qform_form_of to also return True
on the child classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dop-amin Good point. I think is_qform_of
should be able to handle parent classes, yes. I bet the way I did this in aarch64_neon.py
is not terribly pythonic, but you could use all_subclass_leaves()
here.
@dop-amin Are you going to investigate the CI failure or do you need help? |
@dop-amin Is there [going to be] a sibling PR to PQAX as well adding tests for the inverse NTT? |
Hi Hanno, I think the CI just times out because it takes too long to go through all the examples. Especially the ones using heuristics seem to take long because it involves so many individual calls to the solver. Do you have a suggestion on how to go about this? We could disable the CI for examples using heuristics. |
Yes, I've been planning to submit it for a couple of days but now I finally did so. Thanks for reminding me. |
I am surprised by this because the dry run sets |
modulus_addr: .quad 8380417 | ||
ninv_addr: .quad 16382 | ||
ninv_tw_addr: .quad 4197891 | ||
intt_dilithium_1234_5678_manual_ld4: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again an optional improvement, but better to do while we're at it:
Could you go through the clean [inv]NTTs and add a comment at their entry symbol describing in what way (reduction / ordering) they deviate from a standard, fully-reduced NTT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point in general. I followed the way the current PQClean implementations handle the reductions sucht that we can have a fair comparison.
Regarding the ordering: All of our NTTs on aarch64 (should) output in the canonical ordering.
.endm | ||
|
||
.macro montg_reduce a | ||
srshr tmp.4S, \a\().4S, #23 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really a Montgomery reduction? It looks more like Barrett, leveraging somehow that q
is very close to 2^23
.
In more detail: It looks like the absolute value |a/q - a/2**23|
is at most 2**31 * |1/q - q/r| ~ 0.25
, so the rounding is at most off by one. This neatly uses that the 'buffer' from q
to the word boundary 2^32
is has about the same bitlength than the approximation q~2^23
(since q=2^23-2^13+1
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, this appears to be a shortened version of reduce32
from the reference implementation. As mentioned above, I followed the current state in PQClean.
Anyhow, the naming is off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you adjust the macro names?
str_vo data6, in, (6*(1024/8)) | ||
str_vo data7, in, (7*(1024/8)) | ||
|
||
mul_ninv data4, data5, data6, data7, data0, data1, data2, data3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be interesting to double-check if the mul_ninv
and the canonical_reduce
can be replaced by a refined Barrett multiplication the sense of https://eprint.iacr.org/2022/439.pdf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, @dop-amin -- thank you very much for this work.
Great, thanks for your feedback in the process! |
This PR introduces inverse NTTs for Kyber and Dilithium.
The type of transposition and reduction is supposed to match the code from PQClean [1,2].
TODO:
Simplify syntax as in NTTs: remove ldr/str macros that are no longer needed #36(do this separately, we want to re-run every example for this)Optional: Add(from experience, these variants perform not as well as the ones that merge more layers in the second merge)intt_kyber_1234_567
[1] https://github.com/PQClean/PQClean/tree/8e221ae797b229858a0b0d784577a8cb149d5789/crypto_sign/dilithium3/aarch64
[2] https://github.com/PQClean/PQClean/tree/8e221ae797b229858a0b0d784577a8cb149d5789/crypto_kem/kyber768/aarch64