-
Notifications
You must be signed in to change notification settings - Fork 705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic support for s390x #1297
Conversation
Ping? Any comments on this approach? |
crypto/internal.h
Outdated
#if defined(__GNUC__) && __GNUC__ >= 2 | ||
static inline uint32_t CRYPTO_bswap4(uint32_t x) { | ||
return __builtin_bswap32(x); | ||
static inline uint32_t CRYPTO_read_le32(const uint8_t *p) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do a merge from the BoringSSL sources to include these functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the BoringSSL versions (CRYPTO_load_u32_le etc.) only work correctly on little-endian hosts. (I guess this goes back to the BoringSSL team's decision to not support big-endian hosts at all.)
The versions I've provided in this PR work on any host, little-endian or big-endian.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I know if I merge the BoringSSL versions then they'll need to be modified to support big endian. I also am very fond of the endian-neutral implementation style you chose. However, I remember that one compiler I tested (I think MSVC) didn't recognize those idioms as of a couple years ago. Also, it's easier to make use of the BoringSSL team's performance/benchmarking work and the AWS/Galois team's verification work if we keep the code the same for little-endian platforms. That's why I'm thinking to merge in the BoringSSL teams's functions and then we can modify them to support big-endian.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see. That should work as well, thanks.
crypto/fipsmodule/ec/p256_shared.h
Outdated
@@ -50,7 +50,14 @@ typedef unsigned char P256_SCALAR_BYTES[33]; | |||
|
|||
static inline void p256_scalar_bytes_from_limbs( | |||
P256_SCALAR_BYTES bytes_out, const BN_ULONG limbs[P256_LIMBS]) { | |||
OPENSSL_memcpy(bytes_out, limbs, 32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to go through all the remaining uses of OPENSSL_memcpy and make sure they're OK.
crypto/fipsmodule/bn/montgomery.c
Outdated
!defined(OPENSSL_ARM) && !defined(OPENSSL_AARCH64) | ||
void bn_mul_mont(BN_ULONG *rp, const BN_ULONG *ap, const BN_ULONG *bp, | ||
const BN_ULONG *np, const BN_ULONG *n0, size_t num) { | ||
Limb tmp[2 * num]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this makes sense. This would be the first place we've used variable-length arrays in ring though, which makes me hesitate. I will circle back to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could use a global maximum limit along the lines of MODULUS_MAX_LIMBS in src/arithmetic/bigint.rs. I'd be happy to implement whatever change you prefer -- just let me know.
hi @uweigand @briansmith, i am also interested in getting |
ping @briansmith @uweigand |
Hi @briansmith, let me try to make another attempt to get this going. I've just updated the patch to use a different style of endian conversion, as you suggested. This keeps the CRYPTO_bswap4 implementation from boringssl and adds the corresponding CRYPTO_bswap8 implementation from there as well. The other endian-specific accessors are now all defined in term of CRYPTO_bswap4 / CRYPTO_bswap8 plus OPENSSL_memcpy (to handle any alignment issues if necessary). Does this address your concerns? If there's anything else I can do to help get this reviewed and merged, please let me know. |
Fixed merge conflict against current mainline. Any comments? |
Hi @briansmith , can this pr be merged now? |
Rebased and fixed merge conflicts against current mainline. |
Added support for cross-building and testing as requested in #1555. |
PR updated after #1558 was merged. |
Done.
Huh, I thought I added a comment but it seems to have disappeared. I've removed that change as re-testing showed it is no longer necessary in the current code base. The change was to fix an endian issue in |
Good news: Less good news: |
@briansmith: Rebased against current mainline to fix both merge conflicts. |
@briansmith - this is unfortunate. The recently added For some reason, clang chooses to use this particular z13 instruction (LOCFHR - which is generally quite rare) when implementing the I'm wondering what the best way to proceed here would be. I can imagine a number of options:
Any preferences? |
Let's pick this choice. Note that this test isn't new. It is one of the oldest ones, which is why it is written in C! |
Huh, interesting. Not sure what exactly triggered this to show up just now - the test seems to have passed before the recent boringssl merges. Anyway, with things like that, some random changes in surrounding code may trigger different instruction selection choices. I've updated the patch now to exclude this one test on s390x. |
Thanks. Maybe IBM can send me one of these 390x boxes? Then my daughter could do her homework on it when I'm not using it for testing. :) |
Well, we're not usually sending out boxes, but if you're actually interested, you could in fact get free access to s390x machines for the purpose of open-source development and testing :-) See here: https://www.ibm.com/community/z/open-source/virtual-machines-request/ |
Rebased once again after the riscv64 merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your PR is the next one to merge after the requested changes are made. Sorry about the many conflicts and delays.
src/constant_time.rs
Outdated
@@ -40,6 +40,7 @@ mod tests { | |||
use crate::{bssl, error}; | |||
|
|||
#[test] | |||
#[cfg(not(target_arch = "s390x"))] // Triggers a qemu bug before 8.0.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this is not what I was expecting. I thought we'd have an #ifdef
around a line or two within constant_time_test.cc
.
Maybe instead we should change ci.yml so that we have something like:
- target: s390x-unknown-linux-gnu
host_os: ubuntu-22.04
# XXX: `constant_time_test` fails when run under QEMU 8.0.2 and earlier
# (https://link that you shared with me in your earlier comment).
cargo_options: -- --skip constant_time_test
This way people would not ever run into the issue on bare metal testing and I could also play with it locally. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, this is indeed nicer. I've changed the PR accordingly.
.github/workflows/ci.yml
Outdated
@@ -266,6 +267,9 @@ jobs: | |||
- target: x86_64-unknown-linux-gnu | |||
host_os: ubuntu-22.04 | |||
|
|||
- target: s390x-unknown-linux-gnu | |||
host_os: ubuntu-22.04 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the analogous PPC64LE PR I just merged, the author noted that they kept the alphabetical order intact. Could you please do the same throughout this PR? It's an arbitrary rule but it avoids people needing to wonder where to put things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used alphabetic order everywhere now.
It seems like codecov doesn't understand s390x profiler output? Too bad, but we'll deal. |
Why do you say it doesn't understand the output? I saw changes reported for most of the new lines added (except for those in unused inline functions), and also for the now-skipped test_constant_time test ... |
Ah, the
It is passed to both tests and benchmarks, and benchmarks apparently do not expect the I've now changed the patch to use What do you think? If you have any other suggestion, please let me know. |
Thanks! |
Thanks for your support in getting this merged! |
This adds basic support for Linux on s390x, using the fallback implementation of all algorithms and no assembler routines. This addresses #986 and makes "cargo test" fully pass on s390x.
There were two main changes I had to make to get this working:
@briansmith, please let me know if this approach makes sense to you or if this should be done differently.
Once this basic (unoptimized) support is in, we can then build on it by adding assembler optimizations (they're already present in OpenSSL, but would need to be copied over and adapted).