Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: API Extension #715

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from
Draft

Proposal: API Extension #715

wants to merge 6 commits into from

Conversation

mkannwischer
Copy link
Contributor

@mkannwischer mkannwischer commented Jan 29, 2025

This PR rebases an old proposal for extending the API to allow operating on validated+expanded keys (splitting out the serialization and deserialization into separate functions which contain the input validation).
Benefit of this is that in case you can keep the keys expanded, you get much better performance. This is particularly useful for an ephemeral use-case (e.g., TLS), where the secret key never has to leave memory.

The benchmarks below show why we should consider doing this: Decapsulation gets up to 3x faster if you can cache the expanded secret key from key generation. (Encapsulation gets even up to 5x faster if you can cache, but I don't think this is useful for any major use case of ML-KEM).

This is far from being ready, but I'd like to kick off a discussion on how the API should look like.
One point I do not like about this anymore is that it changes the signature of the previous API - It would be cleaner to just add new functions and implement the old 3-function API on top of the new API. Downside of that is that we will have a lot of functions.

Once we have consensus on how to do this, there is still some work to do here: I have not addressed C90 complaince, constant-time tests, kat tests, and CBMC proofs. Also currently the user (through mlkem_native.h) requires to know the details of the key structs - this should be changed to be a generic uint8_t array of the right size.

@mkannwischer mkannwischer added the benchmark this PR should be benchmarked in CI label Jan 29, 2025
Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A76 (Raspberry Pi 5) benchmarks

Benchmark suite Current: 4c5b57e Previous: 3dc9642 Ratio
ML-KEM-512 keypair 28989 cycles 29044 cycles 1.00
ML-KEM-512 encaps 18345 cycles 34919 cycles 0.53
ML-KEM-512 decaps 29401 cycles 45430 cycles 0.65
ML-KEM-768 keypair 49525 cycles 49260 cycles 1.01
ML-KEM-768 encaps 22398 cycles 55570 cycles 0.40
ML-KEM-768 decaps 38088 cycles 70407 cycles 0.54
ML-KEM-1024 keypair 72470 cycles 71976 cycles 1.01
ML-KEM-1024 encaps 27886 cycles 80729 cycles 0.35
ML-KEM-1024 decaps 48885 cycles 100536 cycles 0.49

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 4th gen (c7i)

Benchmark suite Current: 4c5b57e Previous: 3dc9642 Ratio
ML-KEM-512 keypair 9548 cycles 9322 cycles 1.02
ML-KEM-512 encaps 4648 cycles 10791 cycles 0.43
ML-KEM-512 decaps 8705 cycles 14879 cycles 0.59
ML-KEM-768 keypair 16204 cycles 16110 cycles 1.01
ML-KEM-768 encaps 5153 cycles 17328 cycles 0.30
ML-KEM-768 decaps 11148 cycles 22971 cycles 0.49
ML-KEM-1024 keypair 21778 cycles 21807 cycles 1.00
ML-KEM-1024 encaps 6529 cycles 23551 cycles 0.28
ML-KEM-1024 decaps 14544 cycles 31213 cycles 0.47

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite Current: 4c5b57e Previous: 3dc9642 Ratio
ML-KEM-512 keypair 29127 cycles 28793 cycles 1.01
ML-KEM-512 encaps 21816 cycles 34819 cycles 0.63
ML-KEM-512 decaps 32954 cycles 44962 cycles 0.73
ML-KEM-768 keypair 48722 cycles 48491 cycles 1.00
ML-KEM-768 encaps 28989 cycles 58558 cycles 0.50
ML-KEM-768 decaps 44767 cycles 72121 cycles 0.62
ML-KEM-1024 keypair 72995 cycles 71744 cycles 1.02
ML-KEM-1024 encaps 37349 cycles 86145 cycles 0.43
ML-KEM-1024 decaps 57599 cycles 103558 cycles 0.56

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 3rd gen (c6a)

Benchmark suite Current: 4c5b57e Previous: 3dc9642 Ratio
ML-KEM-512 keypair 16990 cycles 16963 cycles 1.00
ML-KEM-512 encaps 7511 cycles 18638 cycles 0.40
ML-KEM-512 decaps 13017 cycles 23952 cycles 0.54
ML-KEM-768 keypair 28794 cycles 28897 cycles 1.00
ML-KEM-768 encaps 7667 cycles 29934 cycles 0.26
ML-KEM-768 decaps 15695 cycles 37633 cycles 0.42
ML-KEM-1024 keypair 42024 cycles 42144 cycles 1.00
ML-KEM-1024 encaps 9464 cycles 44254 cycles 0.21
ML-KEM-1024 decaps 19905 cycles 54692 cycles 0.36

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 3rd gen (c6i)

Benchmark suite Current: 4c5b57e Previous: 3dc9642 Ratio
ML-KEM-512 keypair 15875 cycles 15779 cycles 1.01
ML-KEM-512 encaps 7482 cycles 17843 cycles 0.42
ML-KEM-512 decaps 14268 cycles 24358 cycles 0.59
ML-KEM-768 keypair 27408 cycles 27053 cycles 1.01
ML-KEM-768 encaps 8093 cycles 28661 cycles 0.28
ML-KEM-768 decaps 17951 cycles 38119 cycles 0.47
ML-KEM-1024 keypair 36840 cycles 36575 cycles 1.01
ML-KEM-1024 encaps 10331 cycles 39488 cycles 0.26
ML-KEM-1024 decaps 23356 cycles 51951 cycles 0.45

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 4th gen (c7a)

Benchmark suite Current: 4c5b57e Previous: 3dc9642 Ratio
ML-KEM-512 keypair 11470 cycles 11577 cycles 0.99
ML-KEM-512 encaps 5326 cycles 13033 cycles 0.41
ML-KEM-512 decaps 10347 cycles 17949 cycles 0.58
ML-KEM-768 keypair 20014 cycles 19635 cycles 1.02
ML-KEM-768 encaps 5405 cycles 20577 cycles 0.26
ML-KEM-768 decaps 12744 cycles 27635 cycles 0.46
ML-KEM-1024 keypair 26754 cycles 26223 cycles 1.02
ML-KEM-1024 encaps 7000 cycles 28249 cycles 0.25
ML-KEM-1024 decaps 16757 cycles 37534 cycles 0.45

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite Current: 4c5b57e Previous: 3dc9642 Ratio
ML-KEM-512 keypair 39672 cycles 39696 cycles 1.00
ML-KEM-512 encaps 30099 cycles 48393 cycles 0.62
ML-KEM-512 decaps 47036 cycles 63017 cycles 0.75
ML-KEM-768 keypair 65600 cycles 65547 cycles 1.00
ML-KEM-768 encaps 41304 cycles 77185 cycles 0.54
ML-KEM-768 decaps 64125 cycles 96276 cycles 0.67
ML-KEM-1024 keypair 97560 cycles 98056 cycles 0.99
ML-KEM-1024 encaps 53345 cycles 112583 cycles 0.47
ML-KEM-1024 decaps 82066 cycles 137084 cycles 0.60

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton4

Benchmark suite Current: 4c5b57e Previous: 3dc9642 Ratio
ML-KEM-512 keypair 18174 cycles 18124 cycles 1.00
ML-KEM-512 encaps 10438 cycles 21280 cycles 0.49
ML-KEM-512 decaps 17391 cycles 27915 cycles 0.62
ML-KEM-768 keypair 30748 cycles 30584 cycles 1.01
ML-KEM-768 encaps 12009 cycles 33621 cycles 0.36
ML-KEM-768 decaps 21958 cycles 43181 cycles 0.51
ML-KEM-1024 keypair 44751 cycles 44190 cycles 1.01
ML-KEM-1024 encaps 15924 cycles 49652 cycles 0.32
ML-KEM-1024 decaps 29486 cycles 62638 cycles 0.47

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite Current: 4c5b57e Previous: 3dc9642 Ratio
ML-KEM-512 keypair 47319 cycles 47172 cycles 1.00
ML-KEM-512 encaps 33047 cycles 56388 cycles 0.59
ML-KEM-512 decaps 51798 cycles 72631 cycles 0.71
ML-KEM-768 keypair 78567 cycles 77979 cycles 1.01
ML-KEM-768 encaps 45414 cycles 89785 cycles 0.51
ML-KEM-768 decaps 70661 cycles 111101 cycles 0.64
ML-KEM-1024 keypair 114683 cycles 115046 cycles 1.00
ML-KEM-1024 encaps 59010 cycles 130352 cycles 0.45
ML-KEM-1024 decaps 91673 cycles 157863 cycles 0.58

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite Current: 4c5b57e Previous: 3dc9642 Ratio
ML-KEM-512 keypair 37564 cycles 37785 cycles 0.99
ML-KEM-512 encaps 27686 cycles 44424 cycles 0.62
ML-KEM-512 decaps 43785 cycles 58614 cycles 0.75
ML-KEM-768 keypair 61384 cycles 61433 cycles 1.00
ML-KEM-768 encaps 37479 cycles 70047 cycles 0.54
ML-KEM-768 decaps 59200 cycles 89231 cycles 0.66
ML-KEM-1024 keypair 88513 cycles 88826 cycles 1.00
ML-KEM-1024 encaps 46691 cycles 101176 cycles 0.46
ML-KEM-1024 decaps 73050 cycles 123493 cycles 0.59

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton2

Benchmark suite Current: 4c5b57e Previous: 3dc9642 Ratio
ML-KEM-512 keypair 29015 cycles 29045 cycles 1.00
ML-KEM-512 encaps 18346 cycles 34950 cycles 0.52
ML-KEM-512 decaps 29404 cycles 45444 cycles 0.65
ML-KEM-768 keypair 49526 cycles 49244 cycles 1.01
ML-KEM-768 encaps 22398 cycles 55555 cycles 0.40
ML-KEM-768 decaps 38095 cycles 70408 cycles 0.54
ML-KEM-1024 keypair 72489 cycles 71971 cycles 1.01
ML-KEM-1024 encaps 27885 cycles 80753 cycles 0.35
ML-KEM-1024 decaps 48884 cycles 100437 cycles 0.49

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton4 (no-opt)

Benchmark suite Current: 4c5b57e Previous: 3dc9642 Ratio
ML-KEM-512 keypair 37901 cycles 37939 cycles 1.00
ML-KEM-512 encaps 24585 cycles 43313 cycles 0.57
ML-KEM-512 decaps 38593 cycles 55545 cycles 0.69
ML-KEM-768 keypair 63366 cycles 63144 cycles 1.00
ML-KEM-768 encaps 33427 cycles 70605 cycles 0.47
ML-KEM-768 decaps 52681 cycles 86945 cycles 0.61
ML-KEM-1024 keypair 95216 cycles 94525 cycles 1.01
ML-KEM-1024 encaps 44220 cycles 105293 cycles 0.42
ML-KEM-1024 decaps 69331 cycles 126582 cycles 0.55

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton2 (no-opt)

Benchmark suite Current: 4c5b57e Previous: 3dc9642 Ratio
ML-KEM-512 keypair 60752 cycles 61376 cycles 0.99
ML-KEM-512 encaps 38938 cycles 69997 cycles 0.56
ML-KEM-512 decaps 60893 cycles 88962 cycles 0.68
ML-KEM-768 keypair 102315 cycles 102056 cycles 1.00
ML-KEM-768 encaps 52908 cycles 114152 cycles 0.46
ML-KEM-768 decaps 82771 cycles 139501 cycles 0.59
ML-KEM-1024 keypair 155041 cycles 154590 cycles 1.00
ML-KEM-1024 encaps 67320 cycles 170334 cycles 0.40
ML-KEM-1024 decaps 106294 cycles 202511 cycles 0.52

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton3

Benchmark suite Current: 4c5b57e Previous: 3dc9642 Ratio
ML-KEM-512 keypair 19004 cycles 18956 cycles 1.00
ML-KEM-512 encaps 10896 cycles 22521 cycles 0.48
ML-KEM-512 decaps 18360 cycles 29631 cycles 0.62
ML-KEM-768 keypair 32475 cycles 32343 cycles 1.00
ML-KEM-768 encaps 12794 cycles 35917 cycles 0.36
ML-KEM-768 decaps 23503 cycles 46302 cycles 0.51
ML-KEM-1024 keypair 47062 cycles 46668 cycles 1.01
ML-KEM-1024 encaps 16623 cycles 52528 cycles 0.32
ML-KEM-1024 decaps 31165 cycles 66312 cycles 0.47

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton3 (no-opt)

Benchmark suite Current: 4c5b57e Previous: 3dc9642 Ratio
ML-KEM-512 keypair 39326 cycles 39346 cycles 1.00
ML-KEM-512 encaps 24749 cycles 45274 cycles 0.55
ML-KEM-512 decaps 38758 cycles 57142 cycles 0.68
ML-KEM-768 keypair 65940 cycles 65866 cycles 1.00
ML-KEM-768 encaps 33405 cycles 73883 cycles 0.45
ML-KEM-768 decaps 52545 cycles 89852 cycles 0.58
ML-KEM-1024 keypair 99448 cycles 99001 cycles 1.00
ML-KEM-1024 encaps 43458 cycles 109983 cycles 0.40
ML-KEM-1024 decaps 68410 cycles 130860 cycles 0.52

This comment was automatically generated by workflow using github-action-benchmark.

@mkannwischer mkannwischer changed the title Proposal: New API Proposal: API Extension Jan 29, 2025
Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bananapi bpi-f3 benchmarks

Benchmark suite Current: 4c5b57e Previous: 3dc9642 Ratio
ML-KEM-512 keypair 329727 cycles 331334 cycles 1.00
ML-KEM-512 encaps 308683 cycles 439896 cycles 0.70
ML-KEM-512 decaps 472974 cycles 588436 cycles 0.80
ML-KEM-768 keypair 546924 cycles 548599 cycles 1.00
ML-KEM-768 encaps 433011 cycles 688259 cycles 0.63
ML-KEM-768 decaps 648468 cycles 880050 cycles 0.74
ML-KEM-1024 keypair 812647 cycles 814706 cycles 1.00
ML-KEM-1024 encaps 564190 cycles 988517 cycles 0.57
ML-KEM-1024 decaps 830107 cycles 1222676 cycles 0.68

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A55 (Snapdragon 888) benchmarks

Benchmark suite Current: 4c5b57e Previous: 3dc9642 Ratio
ML-KEM-512 keypair 58513 cycles 58238 cycles 1.00
ML-KEM-512 encaps 31247 cycles 65650 cycles 0.48
ML-KEM-512 decaps 51216 cycles 84458 cycles 0.61
ML-KEM-768 keypair 100330 cycles 98763 cycles 1.02
ML-KEM-768 encaps 41423 cycles 110184 cycles 0.38
ML-KEM-768 decaps 69816 cycles 136303 cycles 0.51
ML-KEM-1024 keypair 152515 cycles 150097 cycles 1.02
ML-KEM-1024 encaps 51824 cycles 165912 cycles 0.31
ML-KEM-1024 decaps 89795 cycles 202018 cycles 0.44

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A72 (Raspberry Pi 4) benchmarks

Benchmark suite Current: 4c5b57e Previous: 3dc9642 Ratio
ML-KEM-512 keypair 51722 cycles 51590 cycles 1.00
ML-KEM-512 encaps 28720 cycles 60136 cycles 0.48
ML-KEM-512 decaps 45557 cycles 75769 cycles 0.60
ML-KEM-768 keypair 88252 cycles 88608 cycles 1.00
ML-KEM-768 encaps 33755 cycles 96818 cycles 0.35
ML-KEM-768 decaps 58361 cycles 120138 cycles 0.49
ML-KEM-1024 keypair 132440 cycles 131788 cycles 1.00
ML-KEM-1024 encaps 42071 cycles 145353 cycles 0.29
ML-KEM-1024 decaps 75130 cycles 175673 cycles 0.43

This comment was automatically generated by workflow using github-action-benchmark.

@mkannwischer
Copy link
Contributor Author

@hanno-becker - any thoughts on this?

@hanno-becker
Copy link
Contributor

@mkannwischer What stands in the way of keeping the old API and providing the new one as an (optional) addition?

@mkannwischer mkannwischer force-pushed the newapi branch 2 times, most recently from 3ca90c7 to 1f56ed1 Compare February 25, 2025 04:26
Copy link
Contributor

@hanno-becker hanno-becker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going in a good direction I think. You may defer updating mlkem_native.h for the time being, keeping that to the core API, and using kem.h directly from the test exercising the new API.


crypto_kem_marshal_pk(pk, &pks);
crypto_kem_marshal_sk(sk, &sks);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Referring to the PCT comment, I think we need to retain the PCT here for the time being.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - If PCT is enabled, all public APIs need to do PCT - so we cannot move it here, otherwise crypto_kem_keypair_struct doesn't peform PCT. We can perform serialization in the PCT if needed, but that is very expensive.

Copy link
Contributor

@hanno-becker hanno-becker Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code does not perform the PCT on the key material that is returned to the caller -- it is thus not clear if it is FIPS compliant. Please re-introduce the previous form of the PCT.

I don't know how to best introduce a PCT in the internal-format API (without duplicating the PCT in the standard API) -- but this is secondary to me at the moment, because the FIPS-compliance of that API is still questionable anyway.

Of course, one can always have mlk_indcpa_keypair_derand_internal which does not perform the PCT, and (a) call that directly from crypto_kem_keypair, with the PCT on the standard-keys afterwards, (b) call that from mlk_indcpa_keypair_derand(), with the PCT on the internal keys afterwards. This way, you also have PCTs at every top-level API, without duplication.

To be clear: I support the PCT the way you propose to do it -- but we cannot do it until we know that it's compliant. It would at the least be blocking integration into AWS-LC.

Copy link
Contributor Author

@mkannwischer mkannwischer Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay - I have implemented what you said. we should be explicit somewhere that PCT is ONLY implemented when using crypto_kem_keypair, but not when using any of the other functions.
AWS-LC may want to consider changing this in the medium-term - it's much faster if you can work on the internal formats.

Copy link
Contributor

@hanno-becker hanno-becker Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AWS-LC may want to consider changing this in the medium-term - it's much faster if you can work on the internal formats.

Of course it is -- but it's about FIPS compliance. When it's clear that the internal PCT is still considered compliant, I'm confident it would be changed.

@mkannwischer mkannwischer force-pushed the newapi branch 5 times, most recently from 0ab73bb to c1e1a00 Compare February 25, 2025 07:26
@mkannwischer mkannwischer force-pushed the newapi branch 10 times, most recently from c2de257 to 38e570c Compare February 26, 2025 03:46
#define crypto_kem_keypair_struct MLK_NAMESPACE_K(keypair_struct)
#define crypto_kem_enc_derand_struct MLK_NAMESPACE_K(enc_derand_struct)
#define crypto_kem_enc_struct MLK_NAMESPACE_K(enc_struct)
#define crypto_kem_dec_struct MLK_NAMESPACE_K(dec_struct)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be MLK_ADD_LEVEL for the structs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I can follow. For the struct I am using using MLK_ADD_LEVEL,

@mkannwischer mkannwischer force-pushed the newapi branch 3 times, most recently from 74ce0ac to 39391ea Compare February 27, 2025 07:57
Signed-off-by: Matthias J. Kannwischer <[email protected]>
Signed-off-by: Matthias J. Kannwischer <[email protected]>
Signed-off-by: Matthias J. Kannwischer <[email protected]>
Signed-off-by: Matthias J. Kannwischer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmark this PR should be benchmarked in CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants