Add basic support for s390x #1297

Merged: 2 commits, Oct 1, 2023

uweigand
Contributor

This adds basic support for Linux on s390x, using the fallback implementation of all algorithms and no assembler routines. This addresses #986 and makes "cargo test" fully pass on s390x.

There were two main changes I had to make to get this working:

  • Provide a bn_mul_mont fallback implementation. Note that while there is a fallback implementation on the Rust side for limbs_mont_mul, some elliptic curve code directly calls bn_mul_mont on the C side, where there is currently no fallback. However, given that we have a bn_from_montgomery_in_place fallback, it is straightforward to add bn_mul_mont (along the lines of the limbs_mont_mul fallback).
  • Support big-endian platforms. This consists of adding a number of endian access primitives in internal.h, and using them where required, currently in aes_nohw.c, poly1305.c, and the p256_scalar_bytes_from_limbs routine (for some reason, the p384 version of that routine is already endian-agnostic). I have chosen to use a plain C implementation of the primitives rather than bswap intrinsics, because that is easier to maintain, doesn't require platform #ifdef's, and still compiles to good code with any recent compiler.
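
The plain C style of endian access described above can be sketched as follows. This is only an illustration: the names `read_le32`, `read_be32`, and `write_le32` are hypothetical and are not taken from the patch, whose actual helpers live in internal.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper names; the helpers added by the PR may differ. */

/* Read a 32-bit little-endian value one byte at a time.
 * The result is the same on little-endian and big-endian hosts,
 * and the pointer does not need to be aligned. */
static inline uint32_t read_le32(const uint8_t *p) {
  assert(p != NULL);
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read a 32-bit big-endian value one byte at a time. */
static inline uint32_t read_be32(const uint8_t *p) {
  assert(p != NULL);
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write a 32-bit value as four little-endian bytes. */
static inline void write_le32(uint8_t *p, uint32_t v) {
  assert(p != NULL);
  p[0] = (uint8_t)v;
  p[1] = (uint8_t)(v >> 8);
  p[2] = (uint8_t)(v >> 16);
  p[3] = (uint8_t)(v >> 24);
}
```

Because the byte order is spelled out in the shifts, no platform #ifdef is needed; whether a given compiler folds this into a single load or byte-swap instruction depends on the compiler and optimization level.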

@briansmith, please let me know if this approach makes sense to you or if this should be done differently.

Once this basic (unoptimized) support is in, we can then build on it by adding assembler optimizations (they're already present in OpenSSL, but would need to be copied over and adapted).

@uweigand
Contributor Author

uweigand commented Jun 9, 2021

Ping? Any comments on this approach?

#if defined(__GNUC__) && __GNUC__ >= 2
static inline uint32_t CRYPTO_bswap4(uint32_t x) {
return __builtin_bswap32(x);
static inline uint32_t CRYPTO_read_le32(const uint8_t *p) {
Owner

I will do a merge from the BoringSSL sources to include these functions.

Contributor Author

Note that the BoringSSL versions (CRYPTO_load_u32_le etc.) only work correctly on little-endian hosts. (I guess this goes back to the BoringSSL team's decision to not support big-endian hosts at all.)

The versions I've provided in this PR work on any host, little-endian or big-endian.
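
To illustrate why a memcpy-only load is host-dependent, here is a minimal sketch. The function names are hypothetical and this is not the BoringSSL code; it only assumes that such a load copies the bytes into an integer without any swapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host and 0 on a big-endian host,
 * by checking which byte of a 16-bit value 1 is stored first. */
static int host_is_little_endian(void) {
  const uint16_t probe = 1;
  uint8_t first;
  memcpy(&first, &probe, 1);
  return first == 1;
}

/* Hypothetical memcpy-only load: the bytes are copied unchanged,
 * so the numeric result is in host byte order. It equals the
 * little-endian interpretation only on a little-endian host. */
static uint32_t load_u32_host(const uint8_t *p) {
  uint32_t v;
  assert(p != NULL);
  memcpy(&v, p, sizeof(v));
  return v;
}
```

For the bytes 01 02 03 04, this load yields 0x04030201 on a little-endian host but 0x01020304 on s390x, which is why such a helper needs an extra byte swap on big-endian targets.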

Owner

Yes, I know that if I merge the BoringSSL versions then they'll need to be modified to support big endian. I am also very fond of the endian-neutral implementation style you chose. However, I remember that one compiler I tested (I think MSVC) didn't recognize those idioms as of a couple of years ago. Also, it's easier to make use of the BoringSSL team's performance/benchmarking work and the AWS/Galois team's verification work if we keep the code the same for little-endian platforms. That's why I'm thinking of merging in the BoringSSL team's functions, which we can then modify to support big-endian.

Contributor Author

Ok, I see. That should work as well, thanks.

@@ -50,7 +50,14 @@ typedef unsigned char P256_SCALAR_BYTES[33];

static inline void p256_scalar_bytes_from_limbs(
P256_SCALAR_BYTES bytes_out, const BN_ULONG limbs[P256_LIMBS]) {
OPENSSL_memcpy(bytes_out, limbs, 32);
Owner

I need to go through all the remaining uses of OPENSSL_memcpy and make sure they're OK.
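
The concern with copying limbs via memcpy is that the resulting byte order follows the host. A host-independent alternative could look like the sketch below; `limbs_to_le_bytes` is a hypothetical name, and 64-bit limbs stored least significant limb first are an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the PR.
 * Serializes limbs (least significant limb first, 64-bit limbs assumed)
 * into little-endian bytes. Unlike memcpy(out, limbs, 8 * num_limbs),
 * the output does not depend on the host byte order. */
static void limbs_to_le_bytes(uint8_t *out, const uint64_t *limbs,
                              size_t num_limbs) {
  assert(out != NULL && limbs != NULL);
  for (size_t i = 0; i < num_limbs; i++) {
    for (size_t j = 0; j < 8; j++) {
      /* Byte j of limb i is bits [8j, 8j+7] of that limb. */
      out[8 * i + j] = (uint8_t)(limbs[i] >> (8 * j));
    }
  }
}
```

On a little-endian host this produces the same bytes as the memcpy; on a big-endian host the memcpy would reverse the bytes within each limb.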

crypto/fipsmodule/bn/montgomery.c (outdated)
!defined(OPENSSL_ARM) && !defined(OPENSSL_AARCH64)
void bn_mul_mont(BN_ULONG *rp, const BN_ULONG *ap, const BN_ULONG *bp,
const BN_ULONG *np, const BN_ULONG *n0, size_t num) {
Limb tmp[2 * num];
Owner

I agree this makes sense. This would be the first place we've used variable-length arrays in ring though, which makes me hesitate. I will circle back to this.

Contributor Author

I guess we could use a global maximum limit along the lines of MODULUS_MAX_LIMBS in src/arithmetic/bigint.rs. I'd be happy to implement whatever change you prefer -- just let me know.
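
A fixed upper bound in place of the variable-length array might look like the following sketch. It only shows the multiplication half (a schoolbook product into a fixed-size temporary); the Montgomery reduction step is omitted. The names `MAX_LIMBS`, `limb_t`, and `limbs_mul_fixed`, the 32-bit limb width, and the value 128 are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap, similar in spirit to MODULUS_MAX_LIMBS on the Rust
 * side; the value here is illustrative only. */
#define MAX_LIMBS 128

typedef uint32_t limb_t; /* 32-bit limbs assumed for this sketch */

/* Schoolbook multiply: r (2 * num limbs) = a * b (num limbs each).
 * Returns 1 on success, 0 if num is zero or exceeds MAX_LIMBS.
 * The temporary has a fixed size, so no variable-length array is used,
 * and r is allowed to alias a or b. */
static int limbs_mul_fixed(limb_t *r, const limb_t *a, const limb_t *b,
                           size_t num) {
  limb_t tmp[2 * MAX_LIMBS];
  assert(r != NULL && a != NULL && b != NULL);
  if (num == 0 || num > MAX_LIMBS) {
    return 0;
  }
  for (size_t i = 0; i < 2 * num; i++) {
    tmp[i] = 0;
  }
  for (size_t i = 0; i < num; i++) {
    uint64_t carry = 0;
    for (size_t j = 0; j < num; j++) {
      /* Cannot overflow: (2^32-1)^2 + 2*(2^32-1) == 2^64 - 1. */
      uint64_t t = (uint64_t)a[i] * b[j] + tmp[i + j] + carry;
      tmp[i + j] = (limb_t)t;
      carry = t >> 32;
    }
    tmp[i + num] = (limb_t)carry;
  }
  for (size_t i = 0; i < 2 * num; i++) {
    r[i] = tmp[i];
  }
  return 1;
}
```

The trade-off is a larger fixed stack footprint in exchange for a bound that is known at compile time, with oversized inputs rejected explicitly.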

@vimalk78

Hi @uweigand @briansmith, I am also interested in getting ring supported on s390x. Let me know if I can contribute in some way.
From this PR, I was able to build ring using cross build --release --target s390x-unknown-linux-gnu
By the way, how do we get the above tests to run on the PR?

@vimalk78

ping @briansmith @uweigand

@uweigand
Contributor Author

Hi @briansmith, let me try to make another attempt to get this going.

I've just updated the patch to use a different style of endian conversion, as you suggested. This keeps the CRYPTO_bswap4 implementation from BoringSSL and adds the corresponding CRYPTO_bswap8 implementation from there as well. The other endian-specific accessors are now all defined in terms of CRYPTO_bswap4 / CRYPTO_bswap8 plus OPENSSL_memcpy (to handle any alignment issues if necessary).
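
The bswap-plus-memcpy style described here can be sketched roughly as below. This is not the patch's code: `bswap4`, `load_u32_le`, `store_u32_be`, and `HOST_IS_BIG_ENDIAN` are hypothetical names, the swap is written in portable C rather than with a compiler builtin, and host detection assumes the GCC/Clang `__BYTE_ORDER__` macros are available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable 32-bit byte swap (a builtin would normally be used instead). */
static inline uint32_t bswap4(uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
         ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Assumption: GCC/Clang-style byte-order macros. If they are missing,
 * this sketch silently treats the host as little-endian. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define HOST_IS_BIG_ENDIAN 1
#else
#define HOST_IS_BIG_ENDIAN 0
#endif

/* Load a little-endian 32-bit value. memcpy handles unaligned pointers;
 * the swap is applied only on big-endian hosts. */
static inline uint32_t load_u32_le(const uint8_t *p) {
  uint32_t v;
  assert(p != NULL);
  memcpy(&v, p, sizeof(v));
  return HOST_IS_BIG_ENDIAN ? bswap4(v) : v;
}

/* Store a 32-bit value as big-endian bytes. The swap is applied only on
 * little-endian hosts. */
static inline void store_u32_be(uint8_t *p, uint32_t v) {
  assert(p != NULL);
  v = HOST_IS_BIG_ENDIAN ? v : bswap4(v);
  memcpy(p, &v, sizeof(v));
}
```

On little-endian targets the little-endian load reduces to the plain memcpy, so the generated code for those platforms stays the same as before.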

Does this address your concerns?

If there's anything else I can do to help get this reviewed and merged, please let me know.

@uweigand
Contributor Author

uweigand commented May 4, 2022

Fixed merge conflict against current mainline. Any comments?

@Xynnn007

Xynnn007 commented Nov 4, 2022

Hi @briansmith, can this PR be merged now?

@uweigand
Contributor Author

uweigand commented Nov 4, 2022

Rebased and fixed merge conflicts against current mainline.

@uweigand
Contributor Author

Added support for cross-building and testing as requested in #1555.
Split into three logically distinct commits.

@uweigand
Contributor Author

PR updated after #1558 was merged.

@uweigand
Contributor Author

> Please rebase on top of main and resubmit.

Done.

> Why did the earlier version of this have changes to p256_shared.h but the newest one doesn't?

Huh, I thought I added a comment but it seems to have disappeared. I've removed that change, as re-testing showed it is no longer necessary in the current code base.

The change was to fix an endian issue in p256_scalar_bytes_from_limbs, but that function is now only called from crypto/fipsmodule/ec/p256-nistz.c, which in turn is only built on the (little-endian) AARCH64 and X86_64 targets. So there is no need for this file to support big-endian platforms, and any code added there would be untested anyway.

@briansmith
Owner

Good news:
The refactoring to fix the --no-default-features build was merged into main.

Less good news:
There's a merge conflict in crypto/internal.h from the latest BoringSSL merge. Also, PR #1663 moved the stuff in base.h to target.h. I will merge PR #1663 ASAP so you can rebase this on top of main for the release. I am hoping to finish the BoringSSL merge today so there shouldn't be any new conflicts.

@uweigand
Contributor Author

@briansmith: Rebased against current mainline to fix both merge conflicts.

@uweigand
Contributor Author

@briansmith - this is unfortunate. The recently added test_constant_time test now fails on s390x. However, the test passes when run on a native machine, and it turns out this is actually a bug in qemu - and one that was already fixed here: https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg06965.html

For some reason, clang chooses to use this particular z13 instruction (LOCFHR - which is generally quite rare) when implementing the bssl_constant_time_test_main routine.

I'm wondering what the best way to proceed here would be. I can imagine a number of options:

  • Just disable this test on s390x (either generally, or only when running on qemu)
  • Use a qemu that has the above fix - but that would mean building qemu yourself, or switching to Ubuntu 23.04 (if github actions already support that?), as older distro versions aren't recent enough
  • Tell clang to not emit this instruction, e.g. by selecting an older target architecture (adding -march=zEC12 to CFLAGS_s390x_unknown_linux_gnu works for me)

Any preferences?

@briansmith
Owner

> Just disable this test on s390x (either generally, or only when running on qemu)

Let's pick this choice. Note that this test isn't new. It is one of the oldest ones, which is why it is written in C!

@uweigand
Contributor Author

> > Just disable this test on s390x (either generally, or only when running on qemu)

> Let's pick this choice. Note that this test isn't new. It is one of the oldest ones, which is why it is written in C!

Huh, interesting. Not sure what exactly triggered this to show up just now - the test seems to have passed before the recent boringssl merges. Anyway, with things like that, some random changes in surrounding code may trigger different instruction selection choices.

I've updated the patch now to exclude this one test on s390x.

@briansmith
Owner

briansmith commented Sep 30, 2023

Thanks. Maybe IBM can send me one of these 390x boxes? Then my daughter could do her homework on it when I'm not using it for testing. :)

@uweigand
Contributor Author

> Thanks. Maybe IBM can send me one of these 390x boxes? Then my daughter could do her homework on it when I'm not using it for testing. :)

Well, we're not usually sending out boxes, but if you're actually interested, you could in fact get free access to s390x machines for the purpose of open-source development and testing :-)

See here: https://www.ibm.com/community/z/open-source/virtual-machines-request/

@uweigand
Contributor Author

Rebased once again after the riscv64 merge.

Owner

@briansmith briansmith left a comment

Your PR is the next one to merge after the requested changes are made. Sorry about the many conflicts and delays.

@@ -40,6 +40,7 @@ mod tests {
use crate::{bssl, error};

#[test]
#[cfg(not(target_arch = "s390x"))] // Triggers a qemu bug before 8.0.3
Owner

OK, this is not what I was expecting. I thought we'd have an #ifdef around a line or two within constant_time_test.cc.

Maybe instead we should change ci.yml so that we have something like:

          - target: s390x-unknown-linux-gnu
            host_os: ubuntu-22.04
            # XXX: `constant_time_test` fails when run under QEMU 8.0.2 and earlier
            # (https://link that you shared with me in your earlier comment).
            cargo_options:  -- --skip constant_time_test

This way people would not ever run into the issue on bare metal testing and I could also play with it locally. WDYT?

Contributor Author

I see, this is indeed nicer. I've changed the PR accordingly.

@@ -266,6 +267,9 @@ jobs:
- target: x86_64-unknown-linux-gnu
host_os: ubuntu-22.04

- target: s390x-unknown-linux-gnu
host_os: ubuntu-22.04
Owner

In the analogous PPC64LE PR I just merged, the author noted that they kept the alphabetical order intact. Could you please do the same throughout this PR? It's an arbitrary rule but it avoids people needing to wonder where to put things.

Contributor Author

Used alphabetic order everywhere now.

@briansmith
Owner

It seems like codecov doesn't understand s390x profiler output? Too bad, but we'll deal.

@uweigand
Contributor Author

uweigand commented Oct 1, 2023

> It seems like codecov doesn't understand s390x profiler output? Too bad, but we'll deal.

Why do you say it doesn't understand the output? I saw changes reported for most of the new lines added (except for those in unused inline functions), and also for the now-skipped test_constant_time test ...

@uweigand
Contributor Author

uweigand commented Oct 1, 2023

Ah, the --skip test_constant_time unfortunately seems to break the full run:

     Running `qemu-s390x -L /usr/s390x-linux-gnu /home/uweigand/ring/target/s390x-unknown-linux-gnu/release/deps/aead-0c8df580230e6a2e --skip test_constant_time`
error: Found argument '--skip' which wasn't expected, or isn't valid in this context

It is passed to both tests and benchmarks, and benchmarks apparently do not expect the --skip argument.

I've now changed the patch to use -march=zEC12 instead. That seems to be the most straightforward workaround: it directly addresses the root cause (QEMU fails to correctly emulate a z13 instruction), only needs a change in one place, and is only used when doing the QEMU-based cross testing.

What do you think? If you have any other suggestion, please let me know.

@briansmith briansmith merged commit baa823b into briansmith:main Oct 1, 2023
166 of 168 checks passed
@briansmith
Owner

Thanks!

@uweigand uweigand deleted the s390x branch October 1, 2023 22:13
@uweigand
Contributor Author

uweigand commented Oct 1, 2023

Thanks for your support in getting this merged!
