Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-99108: add HACL*-based 1-shot HMAC implementation #126359

Draft
wants to merge 35 commits into
base: main
Choose a base branch
from

Conversation

picnixz
Copy link
Contributor

@picnixz picnixz commented Nov 3, 2024

I'm opening a PR but I'm pretty sure that there are multiple things to discuss:

  • HACL* HMAC requires including hash functions headers (by the way, SHA1 and MD5 headers are not included in Hacl_HMAC.h, but they are included in the .c; you might want to check that on your side @msprotz).
  • I decided to create static intermediate libraries just. Otherwise, I'm getting a bunch of "undefined symbols". I know that we have a simplification PR (gh-116043: Simplify HACL* build #119320) to get rid of *.a. The reason I kept (and used) them is the following:
    • When I use normal dependencies in Setup.stdlib.in, I get a bunch of warnings "Makefile:3471: warning: overriding recipe for target 'Modules/_hacl/Hacl_Hash_MD5.o" for instance.
    • In addition, I don't know how to specify that the HMAC module should use libHacl_Hash_Blake2.a (I get "gcc: error: Modules/_hacl/libHacl_Hash_Blake2.a: No such file or directory").
      The build I suggest compiles on my machine and I would appreciate if @erlend-aasland could help me in cleanng it up.

I'm leaving my dev session in a few hours and won't be back until Wednesday. I need to test the implementatino as well (for SHA1, I think something is weird because I get different results, well mainly junk at the end but I'll need to check again when I'm back).

HACL* is at revision fc2e38f4d899ba28665c5b91caedaf35b3b37452.
@picnixz
Copy link
Contributor Author

picnixz commented Nov 3, 2024

Adding skip news for now just for the CI to be green.

@picnixz picnixz force-pushed the hacl/HMAC-99108 branch 5 times, most recently from ca5e851 to d6c92a6 Compare November 3, 2024 13:19
@picnixz
Copy link
Contributor Author

picnixz commented Nov 3, 2024

@msprotz The following code is not liked by MSVC:

  uint32_t l = 64U;
  KRML_CHECK_SIZE(sizeof (uint8_t), l);
  uint8_t key_block[l]; // <- this

I wonder whether you can do something (like inlining the variable instead) or tell me if it's possible to make MSVC happy.

@picnixz
Copy link
Contributor Author

picnixz commented Nov 3, 2024

I'm marking this PR as ready for review even though it fails to build on Windows. We'll address that later (for now, I want to get feedback on the build itself and how to get rid of .a files).

Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometime I forget that you're a cryptography genius 😅

Modules/hmacmodule.c Outdated Show resolved Hide resolved
PyMODINIT_FUNC
PyInit__hmac(void)
{
return PyModuleDef_Init(&_hmacmodule);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is primarily a note for myself when this is closer to being ready.)

Do modules without a m_traverse get to use DRC to ease contention (which will be especially bad here because of the critical sections)? If not, force the module to do that, or consider making this immortal somehow. (I'll follow up on this--it will take some work, because I don't think we're doing it anywhere else in the stdlib.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(For others: DRC means deferred reference count). This is something I also wondered, namely should I explicitly call Py_VISIT(Py_TYPE(mod)) even if I don't have a state or not? I assume not because that's what multiple modules with m_size = 0 do as well. I don't know where to find an exact reference on that behaviour though.

Modules/hmacmodule.c Outdated Show resolved Hide resolved
@Nankunda489 Nankunda489 mentioned this pull request Nov 3, 2024
configure.ac Outdated
@@ -7460,6 +7460,15 @@ for builtin_hash in $with_builtin_hashlib_hashes; do
done
IFS=$as_save_IFS

# builtin HACL* HMAC module
AC_MSG_CHECKING([for --with-builtin-hashlib-hmac=yes,no])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason to have a configure flag for this? I don't expect anyone to turn it off. If so, document why it's even an option in a comment in here, or in the flags own help text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't really sure about this. It's possible to disable the sha3 module, which is also built-in with HACL* now, so I wondered whether it could make sense to disable the HACL* implementation of HMAC for people who wanted to rely on OpenSSL (for whatever reason).

But I think we can definitely remove this option. WDYT?

@gpshead
Copy link
Member

gpshead commented Nov 11, 2024

I plan to keep this 1-shot HMAC in the (new) private _hmac module but expose it directly in hmac.py via new functions. That way, people can have both the OpenSSL-based implementation and the HACL* one if they want (and we could fallback implicitly to OpenSSL if the data is too large).

I don't think we need any new public APIs within the hmac module for this. The existing hmac.digest(key, msg, digest) function is sufficient. The existing documentation around when an optimal implementation is used can be updated. That existing doc note already means people who care about performance are already likely using that function.

@picnixz
Copy link
Contributor Author

picnixz commented Nov 11, 2024

I don't think we need any new public APIs within the hmac module for this.

Ah yes sorry, I forgot about the standalone hmac.digest. It's perfectly fine to update that implementation to use standalone HACL* HMAC when possible.

@picnixz
Copy link
Contributor Author

picnixz commented Nov 11, 2024

Ok, I need to think of a way to easily map hashlib<->openssl dispatch for HMAC. Converting into a draft for now.

@picnixz picnixz marked this pull request as draft November 11, 2024 12:39
@msprotz
Copy link
Contributor

msprotz commented Nov 11, 2024

Just to set expectations, @R1kM and I looked at what it would take to add streaming HMAC and we're relatively confident that it can be done within a few weeks. We have a couple more urgent things, then we'll get to it and post here as soon as we have an update, and then it'll simplify everything to have a single implementation that can do streaming too.

@picnixz
Copy link
Contributor Author

picnixz commented Nov 12, 2024

Ok, so I managed to fix the compilation and find a nice way to resolve OpenSSL names. However, since I need to bind OpenSSL for Windows builds, I'd like someone to create the project files for Windows since I don't have .NET installed (and honestly I don't want to install it; I could use a VM if I have time). So I'll just ignore the Windows build failures for now.

I looked at what it would take to add streaming HMAC and we're relatively confident that it can be done within a few weeks

No hurry, take your time! It'd be nice if the API for all HMAC-HASH functions are the same (or at least, that the functions can be stored using the same function pointer type).

@msprotz
Copy link
Contributor

msprotz commented Nov 12, 2024

It'd be nice if the API for all HMAC-HASH functions are the same (or at least, that the functions can be stored using the same function pointer type).

There are two styles of APIs, and we're not sure yet which one we want to go for. Your feedback is most welcome.

Agile APIs are of the form struct state { union { uint64_t *sha2_512_state; uint32_t *sha2_256 state; .... } hash_state; seen: uint64_t; hash_alg a; } void update(state *s, uint8_t *input, uint32_t input_len);

  • a single update function for all choices of algorithms
  • choice of algorithm is provided at state allocation-time (i.e. void malloc(hash_alg a);)
  • extra level of indirection in the state due to the need to have a tagged union

Plain APIs are of the form typedef uint64_t *sha2_512 state; void update_512(sha2_512 *state, uint8_t *input, uint32_t input_len); and come in different variants (and different types) for all algorithms, i.e. there's also typedef uint32_t *sha2_256 state; void update_256(sha2_256 *state, uint8_t *input, uint32_t *input_len);, etc.

  • one update function for each algorithm
  • client manages the choice of algorithm (possibly keeping a tagged union themselves, or function pointers, whatever their style is)
  • no extra level of indirection
  • massive code duplication since each algorithm re-implements the state management logic

Thoughts on which one you'd prefer?

@picnixz
Copy link
Contributor Author

picnixz commented Nov 12, 2024

Thank you for the reply! Some questions:

massive code duplication since each algorithm re-implements the state management logic"

I assume however that performances are slightly better since you don't have indirection, right? How better do they fare in general? are plain APIs really better than agile ones?


Agile API seems good because I only need to specify the hash_alg value and be done with that. I'll need to think a bit more on that but I'm leaving for today and won't be available until tomorrow. I'll try to find a good example of how I plan to use it and then I'll come back to you!

@msprotz
Copy link
Contributor

msprotz commented Nov 12, 2024

I assume however that performances are slightly better since you don't have indirection, right? How better do they fare in general? are plain APIs really better than agile ones?

generally there's no noticeable performance impact unless you repeatedly hmac a lot of very small data, and repeatedly create/destroy the underlying python object, meaning you end up calling malloc/free a lot

the plain APIs exist for engineering reasons: you can package a single hmac algorithm along with its plain API, but if the agile API is the only API offered, a client like Python has to take all of the hmac algorithms in one go (since the toplevel agile API now contains references to every individual algorithm)

if you intend to overhaul all of the hmac algorithms in one go, then I would do agile API

@picnixz
Copy link
Contributor Author

picnixz commented Nov 15, 2024

FTR, I have a branch with streaming HMAC: https://github.com/picnixz/cpython/tree/hacl/HMAC-stream-99108. For now, I'm assuming that I have a plain API that can be used as follows:

/* Function pointer type for HACL* streaming HMAC state allocation */
typedef void *(*HACL_HMAC_state_malloc_func)(void);
/* Function pointer type for HACL* streaming HMAC state deallocation */
typedef void (*HACL_HMAC_state_free_func)(void *state);
/* Function pointer type for HACL* streaming HMAC state copy */
typedef void *(*HACL_HMAC_state_copy_func)(void *state);

/* Function pointer type for HACL* streaming HMAC setup functions. */
typedef hacl_exit_code
(*HACL_HMAC_setup_func)(void *state, uint8_t *key, uint32_t len);
/* Function pointer type for HACL* streaming HMAC update functions. */
typedef hacl_exit_code
(*HACL_HMAC_update_func)(void *state, uint8_t *buf, uint32_t len);
/* Function pointer type for HACL* streaming HMAC digest functions. */
typedef hacl_exit_code
(*HACL_HMAC_digest_func)(void *state, uint8_t *out);
  • HACL_HMAC_state_malloc_func would allocate me the state for the HMAC-H algorithm
  • HACL_HMAC_state_free_func would allow me to free it
  • HACL_HMAC_state_copy_func would allow me to clone it
  • HACL_HMAC_setup_func would allow me to feed the state with the user key (given at construction time). No message is given at that moment yet.
  • HACL_HMAC_update_func would allow me to feed data in a streaming fashion.
  • HACL_HMAC_digest_func would allow me to extract the digest from the state and putting it in out.

Ideally, I would like to expose the transformation function $f\colon\left\{0,1\right\}^*\longrightarrow\left\{0,1\right\}^B$ where $B$ is the block size so that users can also pass keys whose length exceeds $2^{32}-1$ or pad keys.

For now, the branch I mentioned assumes that the state consists of the message until now and the key (just to make it work).


Now, using an agile API, I could lighten the current interface to avoid storing those function pointers and only store the hash_alg. However, I'd like to be able to feed the initial key without having to feed the message, as well as the possibility to clone a state (since we have a .copy() method).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants