NTT: Simplify computation of zeta index #677

Open
wants to merge 1 commit into
base: main

hanno-becker
Contributor

@hanno-becker hanno-becker commented Jan 21, 2025

The n-th layer of the NTT needs the zeta table entries at indices `2^(n-1)` through `2^n - 1`. Previously, the initial index was computed using a division by the `len` parameter, which kept the `layer` parameter a pure ghost variable: needed only for the specification, not by the code itself.

This commit gives up the ghost status of `layer` and instead uses it to compute the initial zeta index with a shift rather than a division.

`len` is not secret, so a `div` instruction carries no risk of information leakage; still, it seems cleaner and faster to work with a shift.
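
To illustrate the equivalence, here is a minimal sketch, not the actual code in `mlkem/ntt.c`. It assumes `N = 256`, layers numbered 1 to 7, and `len = N >> layer`; the helper names `zeta_start_div`, `zeta_start_shift` and `count_agreeing_layers` are hypothetical, and the exact form of the previous division is an assumption.

```c
#include <assert.h>

#define N 256u /* assumed polynomial degree (ML-KEM) */

/* Assumed shape of the old computation: first zeta index via division by len. */
static unsigned zeta_start_div(unsigned len) { return N / (2u * len); }

/* Sketch of the new computation: first zeta index via a shift on layer. */
static unsigned zeta_start_shift(unsigned layer) { return 1u << (layer - 1u); }

/* Counts the layers (1..7) on which both computations yield the same index. */
static unsigned count_agreeing_layers(void)
{
  unsigned layer, agree = 0;
  for (layer = 1; layer <= 7; layer++)
  {
    unsigned len = N >> layer; /* assumed relation: 128, 64, ..., 2 */
    assert(len >= 2u);
    if (zeta_start_div(len) == zeta_start_shift(layer))
    {
      agree++;
    }
  }
  return agree;
}
```

Under these assumptions both helpers agree on every layer, so layer n starts at index `2^(n-1)` either way; the shift only needs `layer`, which is why it can no longer be a ghost variable.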

Suggested-by: @rod-chapman

@hanno-becker hanno-becker added the `benchmark` label (this PR should be benchmarked in CI) Jan 21, 2025
@oqs-bot oqs-bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks

| Benchmark suite | Current: c6e75b7 | Previous: 21c0c39 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 29054 cycles | 29082 cycles | 1.00 |
| ML-KEM-512 encaps | 35416 cycles | 35434 cycles | 1.00 |
| ML-KEM-512 decaps | 45890 cycles | 45904 cycles | 1.00 |
| ML-KEM-768 keypair | 49303 cycles | 49301 cycles | 1.00 |
| ML-KEM-768 encaps | 55606 cycles | 55593 cycles | 1.00 |
| ML-KEM-768 decaps | 70373 cycles | 70352 cycles | 1.00 |
| ML-KEM-1024 keypair | 72088 cycles | 72021 cycles | 1.00 |
| ML-KEM-1024 encaps | 80835 cycles | 80731 cycles | 1.00 |
| ML-KEM-1024 decaps | 100707 cycles | 100643 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

| Benchmark suite | Current: c6e75b7 | Previous: 21c0c39 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 13539 cycles | 13511 cycles | 1.00 |
| ML-KEM-512 encaps | 17277 cycles | 17292 cycles | 1.00 |
| ML-KEM-512 decaps | 22790 cycles | 22847 cycles | 1.00 |
| ML-KEM-768 keypair | 22542 cycles | 22509 cycles | 1.00 |
| ML-KEM-768 encaps | 24578 cycles | 24493 cycles | 1.00 |
| ML-KEM-768 decaps | 32620 cycles | 32433 cycles | 1.01 |
| ML-KEM-1024 keypair | 30246 cycles | 31544 cycles | 0.96 |
| ML-KEM-1024 encaps | 33575 cycles | 34858 cycles | 0.96 |
| ML-KEM-1024 decaps | 44395 cycles | 45807 cycles | 0.97 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

| Benchmark suite | Current: c6e75b7 | Previous: 21c0c39 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 33177 cycles | 33248 cycles | 1.00 |
| ML-KEM-512 encaps | 38891 cycles | 38603 cycles | 1.01 |
| ML-KEM-512 decaps | 49870 cycles | 49993 cycles | 1.00 |
| ML-KEM-768 keypair | 54286 cycles | 54116 cycles | 1.00 |
| ML-KEM-768 encaps | 62161 cycles | 61116 cycles | 1.02 |
| ML-KEM-768 decaps | 76834 cycles | 75823 cycles | 1.01 |
| ML-KEM-1024 keypair | 81593 cycles | 82438 cycles | 0.99 |
| ML-KEM-1024 encaps | 92383 cycles | 92939 cycles | 0.99 |
| ML-KEM-1024 decaps | 112337 cycles | 111511 cycles | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks

| Benchmark suite | Current: c6e75b7 | Previous: 21c0c39 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 58332 cycles | 58329 cycles | 1.00 |
| ML-KEM-512 encaps | 65717 cycles | 65721 cycles | 1.00 |
| ML-KEM-512 decaps | 84560 cycles | 84520 cycles | 1.00 |
| ML-KEM-768 keypair | 98995 cycles | 98937 cycles | 1.00 |
| ML-KEM-768 encaps | 110169 cycles | 110308 cycles | 1.00 |
| ML-KEM-768 decaps | 136901 cycles | 136632 cycles | 1.00 |
| ML-KEM-1024 keypair | 149851 cycles | 150103 cycles | 1.00 |
| ML-KEM-1024 encaps | 166300 cycles | 166430 cycles | 1.00 |
| ML-KEM-1024 decaps | 202015 cycles | 202325 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

| Benchmark suite | Current: c6e75b7 | Previous: 21c0c39 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 20341 cycles | 20361 cycles | 1.00 |
| ML-KEM-512 encaps | 26921 cycles | 27150 cycles | 0.99 |
| ML-KEM-512 decaps | 35739 cycles | 35746 cycles | 1.00 |
| ML-KEM-768 keypair | 34882 cycles | 34897 cycles | 1.00 |
| ML-KEM-768 encaps | 38169 cycles | 38179 cycles | 1.00 |
| ML-KEM-768 decaps | 50940 cycles | 50949 cycles | 1.00 |
| ML-KEM-1024 keypair | 48081 cycles | 48039 cycles | 1.00 |
| ML-KEM-1024 encaps | 53914 cycles | 53911 cycles | 1.00 |
| ML-KEM-1024 decaps | 71383 cycles | 71361 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

| Benchmark suite | Current: c6e75b7 | Previous: 21c0c39 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 18132 cycles | 18112 cycles | 1.00 |
| ML-KEM-512 encaps | 23003 cycles | 22987 cycles | 1.00 |
| ML-KEM-512 decaps | 30244 cycles | 30219 cycles | 1.00 |
| ML-KEM-768 keypair | 31091 cycles | 31173 cycles | 1.00 |
| ML-KEM-768 encaps | 33876 cycles | 33895 cycles | 1.00 |
| ML-KEM-768 decaps | 44492 cycles | 44517 cycles | 1.00 |
| ML-KEM-1024 keypair | 44578 cycles | 44528 cycles | 1.00 |
| ML-KEM-1024 encaps | 49748 cycles | 49789 cycles | 1.00 |
| ML-KEM-1024 decaps | 64269 cycles | 64472 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

| Benchmark suite | Current: c6e75b7 | Previous: 21c0c39 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 14875 cycles | 14886 cycles | 1.00 |
| ML-KEM-512 encaps | 19685 cycles | 19693 cycles | 1.00 |
| ML-KEM-512 decaps | 26309 cycles | 26308 cycles | 1.00 |
| ML-KEM-768 keypair | 25593 cycles | 25598 cycles | 1.00 |
| ML-KEM-768 encaps | 28074 cycles | 28059 cycles | 1.00 |
| ML-KEM-768 decaps | 37821 cycles | 37926 cycles | 1.00 |
| ML-KEM-1024 keypair | 35850 cycles | 35896 cycles | 1.00 |
| ML-KEM-1024 encaps | 40727 cycles | 40642 cycles | 1.00 |
| ML-KEM-1024 decaps | 54105 cycles | 54329 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton4

| Benchmark suite | Current: c6e75b7 | Previous: 21c0c39 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 18125 cycles | 18116 cycles | 1.00 |
| ML-KEM-512 encaps | 22175 cycles | 22181 cycles | 1.00 |
| ML-KEM-512 decaps | 28837 cycles | 28846 cycles | 1.00 |
| ML-KEM-768 keypair | 30550 cycles | 30555 cycles | 1.00 |
| ML-KEM-768 encaps | 33627 cycles | 33624 cycles | 1.00 |
| ML-KEM-768 decaps | 43169 cycles | 43151 cycles | 1.00 |
| ML-KEM-1024 keypair | 44169 cycles | 44155 cycles | 1.00 |
| ML-KEM-1024 encaps | 49646 cycles | 49644 cycles | 1.00 |
| ML-KEM-1024 decaps | 62612 cycles | 62631 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

| Benchmark suite | Current: c6e75b7 | Previous: 21c0c39 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 39353 cycles | 39338 cycles | 1.00 |
| ML-KEM-512 encaps | 45470 cycles | 45566 cycles | 1.00 |
| ML-KEM-512 decaps | 58950 cycles | 59072 cycles | 1.00 |
| ML-KEM-768 keypair | 64919 cycles | 64869 cycles | 1.00 |
| ML-KEM-768 encaps | 73047 cycles | 73172 cycles | 1.00 |
| ML-KEM-768 decaps | 91119 cycles | 91278 cycles | 1.00 |
| ML-KEM-1024 keypair | 96453 cycles | 96609 cycles | 1.00 |
| ML-KEM-1024 encaps | 107475 cycles | 107795 cycles | 1.00 |
| ML-KEM-1024 decaps | 130836 cycles | 131159 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

| Benchmark suite | Current: c6e75b7 | Previous: 21c0c39 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 51169 cycles | 51581 cycles | 0.99 |
| ML-KEM-512 encaps | 59183 cycles | 59624 cycles | 0.99 |
| ML-KEM-512 decaps | 76098 cycles | 76584 cycles | 0.99 |
| ML-KEM-768 keypair | 83951 cycles | 84341 cycles | 1.00 |
| ML-KEM-768 encaps | 95944 cycles | 95697 cycles | 1.00 |
| ML-KEM-768 decaps | 118078 cycles | 117794 cycles | 1.00 |
| ML-KEM-1024 keypair | 125000 cycles | 125581 cycles | 1.00 |
| ML-KEM-1024 encaps | 139507 cycles | 139615 cycles | 1.00 |
| ML-KEM-1024 decaps | 168074 cycles | 168458 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton3

| Benchmark suite | Current: c6e75b7 | Previous: 21c0c39 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 18954 cycles | 18948 cycles | 1.00 |
| ML-KEM-512 encaps | 23560 cycles | 23573 cycles | 1.00 |
| ML-KEM-512 decaps | 30705 cycles | 30676 cycles | 1.00 |
| ML-KEM-768 keypair | 32336 cycles | 32337 cycles | 1.00 |
| ML-KEM-768 encaps | 35893 cycles | 35895 cycles | 1.00 |
| ML-KEM-768 decaps | 46037 cycles | 46039 cycles | 1.00 |
| ML-KEM-1024 keypair | 46567 cycles | 46584 cycles | 1.00 |
| ML-KEM-1024 encaps | 52442 cycles | 52474 cycles | 1.00 |
| ML-KEM-1024 decaps | 66268 cycles | 66237 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

| Benchmark suite | Current: c6e75b7 | Previous: 21c0c39 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 43181 cycles | 43294 cycles | 1.00 |
| ML-KEM-512 encaps | 51737 cycles | 51438 cycles | 1.01 |
| ML-KEM-512 decaps | 66976 cycles | 66703 cycles | 1.00 |
| ML-KEM-768 keypair | 71064 cycles | 71830 cycles | 0.99 |
| ML-KEM-768 encaps | 82953 cycles | 83290 cycles | 1.00 |
| ML-KEM-768 decaps | 102981 cycles | 103534 cycles | 0.99 |
| ML-KEM-1024 keypair | 106698 cycles | 107123 cycles | 1.00 |
| ML-KEM-1024 encaps | 121469 cycles | 122111 cycles | 0.99 |
| ML-KEM-1024 decaps | 147059 cycles | 147944 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

| Benchmark suite | Current: c6e75b7 | Previous: 21c0c39 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 37968 cycles | 38050 cycles | 1.00 |
| ML-KEM-512 encaps | 43383 cycles | 43372 cycles | 1.00 |
| ML-KEM-512 decaps | 55574 cycles | 55552 cycles | 1.00 |
| ML-KEM-768 keypair | 63002 cycles | 63040 cycles | 1.00 |
| ML-KEM-768 encaps | 70545 cycles | 70438 cycles | 1.00 |
| ML-KEM-768 decaps | 87034 cycles | 86902 cycles | 1.00 |
| ML-KEM-1024 keypair | 94604 cycles | 94524 cycles | 1.00 |
| ML-KEM-1024 encaps | 105404 cycles | 105306 cycles | 1.00 |
| ML-KEM-1024 decaps | 126691 cycles | 126977 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton2

| Benchmark suite | Current: c6e75b7 | Previous: 21c0c39 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 29053 cycles | 29072 cycles | 1.00 |
| ML-KEM-512 encaps | 35402 cycles | 35448 cycles | 1.00 |
| ML-KEM-512 decaps | 45900 cycles | 45890 cycles | 1.00 |
| ML-KEM-768 keypair | 49315 cycles | 49312 cycles | 1.00 |
| ML-KEM-768 encaps | 55625 cycles | 55594 cycles | 1.00 |
| ML-KEM-768 decaps | 70414 cycles | 70386 cycles | 1.00 |
| ML-KEM-1024 keypair | 72115 cycles | 72023 cycles | 1.00 |
| ML-KEM-1024 encaps | 80914 cycles | 80753 cycles | 1.00 |
| ML-KEM-1024 decaps | 100822 cycles | 100669 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

| Benchmark suite | Current: c6e75b7 | Previous: 21c0c39 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 39296 cycles | 39346 cycles | 1.00 |
| ML-KEM-512 encaps | 45353 cycles | 45340 cycles | 1.00 |
| ML-KEM-512 decaps | 57229 cycles | 57376 cycles | 1.00 |
| ML-KEM-768 keypair | 65823 cycles | 65864 cycles | 1.00 |
| ML-KEM-768 encaps | 73676 cycles | 73714 cycles | 1.00 |
| ML-KEM-768 decaps | 89651 cycles | 89699 cycles | 1.00 |
| ML-KEM-1024 keypair | 98910 cycles | 98985 cycles | 1.00 |
| ML-KEM-1024 encaps | 109926 cycles | 109973 cycles | 1.00 |
| ML-KEM-1024 decaps | 130635 cycles | 130732 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

| Benchmark suite | Current: c6e75b7 | Previous: 21c0c39 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 60692 cycles | 60695 cycles | 1.00 |
| ML-KEM-512 encaps | 69689 cycles | 69825 cycles | 1.00 |
| ML-KEM-512 decaps | 88743 cycles | 88750 cycles | 1.00 |
| ML-KEM-768 keypair | 101971 cycles | 101845 cycles | 1.00 |
| ML-KEM-768 encaps | 115210 cycles | 115088 cycles | 1.00 |
| ML-KEM-768 decaps | 140643 cycles | 140736 cycles | 1.00 |
| ML-KEM-1024 keypair | 154692 cycles | 154440 cycles | 1.00 |
| ML-KEM-1024 encaps | 171868 cycles | 171523 cycles | 1.00 |
| ML-KEM-1024 decaps | 204462 cycles | 204076 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Bananapi bpi-f3 benchmarks

| Benchmark suite | Current: c6e75b7 | Previous: 21c0c39 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 331240 cycles | 331511 cycles | 1.00 |
| ML-KEM-512 encaps | 440035 cycles | 440475 cycles | 1.00 |
| ML-KEM-512 decaps | 588468 cycles | 588796 cycles | 1.00 |
| ML-KEM-768 keypair | 547964 cycles | 548481 cycles | 1.00 |
| ML-KEM-768 encaps | 686870 cycles | 687539 cycles | 1.00 |
| ML-KEM-768 decaps | 878532 cycles | 880077 cycles | 1.00 |
| ML-KEM-1024 keypair | 809112 cycles | 809152 cycles | 1.00 |
| ML-KEM-1024 encaps | 981046 cycles | 981016 cycles | 1.00 |
| ML-KEM-1024 decaps | 1214638 cycles | 1214963 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@hanno-becker hanno-becker marked this pull request as ready for review January 21, 2025 11:01
@hanno-becker hanno-becker requested a review from a team as a code owner January 21, 2025 11:01
@oqs-bot oqs-bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks

| Benchmark suite | Current: c6e75b7 | Previous: 21c0c39 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 51606 cycles | 51874 cycles | 0.99 |
| ML-KEM-512 encaps | 58199 cycles | 58261 cycles | 1.00 |
| ML-KEM-512 decaps | 74202 cycles | 75219 cycles | 0.99 |
| ML-KEM-768 keypair | 88859 cycles | 88284 cycles | 1.01 |
| ML-KEM-768 encaps | 97722 cycles | 95963 cycles | 1.02 |
| ML-KEM-768 decaps | 120184 cycles | 119280 cycles | 1.01 |
| ML-KEM-1024 keypair | 132283 cycles | 131943 cycles | 1.00 |
| ML-KEM-1024 encaps | 145365 cycles | 144532 cycles | 1.01 |
| ML-KEM-1024 decaps | 177459 cycles | 175238 cycles | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

@rod-chapman
Contributor

CI looks good. Happy to merge.

Contributor

@rod-chapman rod-chapman left a comment

A welcome simplification, which also removes the odd "ghost" status of the `layer` parameter.

mlkem/ntt.c (resolved)
Contributor

@mkannwischer mkannwischer left a comment

Since `layer` is no longer a ghost variable, please adjust the comments above the function accordingly.
