Merge pull request #740 from pq-code-package/simpasm_move
Use 'raw' assembly in main source tree
mkannwischer authored Feb 5, 2025
2 parents 16c34ff + 6f7401f commit 3a4cabc
Showing 125 changed files with 23,036 additions and 12,561 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/bench.yml
@@ -43,7 +43,7 @@ jobs:
name: Arm Cortex-A55 (Snapdragon 888) benchmarks
bench_pmu: PERF
archflags: "-mcpu=cortex-a55 -march=armv8.2-a"
cflags: "-flto -static -DFORCE_AARCH64 -DMLKEM_NATIVE_FIPS202_BACKEND_FILE=\\\\\\\"fips202/native/aarch64/cortex_a55.h\\\\\\\""
cflags: "-flto -static -DFORCE_AARCH64 -DMLKEM_NATIVE_FIPS202_BACKEND_FILE=\\\\\\\"fips202/native/aarch64/meta_cortex_a55.h\\\\\\\""
bench_extra_args: -w exec-on-a55
- system: bpi
name: SpacemiT K1 8 (Banana Pi F3) benchmarks
70 changes: 46 additions & 24 deletions .github/workflows/ci.yml
@@ -18,8 +18,9 @@ concurrency:
jobs:
lint:
strategy:
fail-fast: false
matrix:
system: [ubuntu-latest]
system: [ubuntu-latest, ubuntu-24.04-arm]
name: Linting
runs-on: ${{ matrix.system }}
steps:
@@ -164,6 +165,50 @@ jobs:
- name: multilevel_build_native
run: |
CFLAGS="-O0" make run -C examples/multilevel_build_native
check_autogenerated_files:
needs: [quickcheck, quickcheck-windows, quickcheck-c90, quickcheck-lib, examples, lint, lint-markdown-link]
strategy:
fail-fast: false
matrix:
system: [ubuntu-latest, ubuntu-24.04-arm]
runs-on: ${{ matrix.system }}
name: Check autogenerated files
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: ./.github/actions/setup-shell
with:
nix-shell: 'ci-cross' # Need cross-compiler for ASM simplification
nix-cache: 'true'
gh_token: ${{ secrets.GITHUB_TOKEN }}
script: |
python3 ./scripts/autogen --dry-run --force-cross
simpasm:
strategy:
fail-fast: false
matrix:
backend:
- arg: '--aarch64-clean'
name: Clean
- arg: ''
name: Optimized
simplify:
- arg: ''
name: Simplified
- arg: '--no-simplify'
name: Unmodified
runs-on: ubuntu-24.04-arm
name: AArch64 dev backend (${{ matrix.backend.name }}, ${{ matrix.simplify.name }})
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Reinstate and test backend
uses: ./.github/actions/setup-shell
with:
nix-shell: 'ci'
gh_token: ${{ secrets.GITHUB_TOKEN }}
script: |
./scripts/autogen ${{ matrix.backend.arg }} ${{ matrix.simplify.arg }}
make clean
OPT=1 make quickcheck
build_kat:
needs: [quickcheck, quickcheck-windows, quickcheck-c90, quickcheck-lib, examples, lint, lint-markdown-link]
strategy:
@@ -267,29 +312,6 @@ jobs:
gh_token: ${{ secrets.GITHUB_TOKEN }}
compile_mode: native
cflags: "-DMLKEM_DEBUG -fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all"
simpasm:
name: ASM simplifier
strategy:
fail-fast: false
matrix:
target:
- runner: ubuntu-24.04-arm
arch: 'aarch64'
- runner: ubuntu-latest
arch: 'x86_64'
runs-on: ${{ matrix.target.runner }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Simplify arithmetic assembly
run: |
./scripts/simpasm -d mlkem/native/${{ matrix.target.arch }}/src --cflags='-Imlkem/native/${{ matrix.target.arch }}/src/' -p
- name: Simplify FIPS202 assembly (AArch64 only)
if: ${{ matrix.target.arch == 'aarch64' }}
run: |
./scripts/simpasm -d mlkem/fips202/native/aarch64/src --cflags='-Imlkem/fips202/native/aarch64/src/ -march=armv8.4-a+sha3' -p
- name: Test simplified assembly
run: |
OPT=1 make quickcheck
compiler_tests:
name: Compiler tests (${{ matrix.compiler.name }}, ${{ matrix.target.name }})
needs: [quickcheck, quickcheck-windows, quickcheck-c90, quickcheck-lib, examples, lint, lint-markdown-link]
4 changes: 2 additions & 2 deletions README.md
@@ -80,8 +80,8 @@ offers three backends for C, AArch64 and x86_64 - if you'd like contribute new b
PR.

Our AArch64 assembly is developed using [SLOTHY](https://github.com/slothy-optimizer/slothy): We write
'clean' assembly by hand and automate micro-optimizations (e.g. see the [clean](test/aarch64_clean/src/ntt_clean.S)
vs [optimized](mlkem/native/aarch64/src/ntt_opt.S) AArch64 NTT).
'clean' assembly by hand and automate micro-optimizations (e.g. see the [clean](dev/aarch64_clean/src/ntt_clean.S)
vs [optimized](mlkem/native/aarch64/src/ntt_opt.S) AArch64 NTT). See [dev/README.md](dev/README.md) for more details.

## How should I use mlkem-native?

50 changes: 50 additions & 0 deletions dev/README.md
@@ -0,0 +1,50 @@
[//]: # (SPDX-License-Identifier: CC-BY-4.0)

# Development files

This directory contains intermediate development artifacts that are not part of the final mlkem-native sources.

It is only relevant to you if you are developing mlkem-native or would like to understand the origin
of the assembly source files.

## AArch64 arithmetic assembly

#### Clean

[`aarch64_clean`](aarch64_clean) contains the 'clean' assembly underlying the AArch64 native backend of mlkem-native.
The files in this directory are handwritten and kept readable through the extensive use of register aliases and macros.

#### Optimized

[`aarch64_opt`](aarch64_opt) contains the results of running the [SLOTHY](https://github.com/slothy-optimizer/slothy/)
superoptimizer on the clean assembly files in [`aarch64_clean`](aarch64_clean). The optimized sections are 'raw'
assembly in the sense that they no longer use register macros or aliases, but the surrounding code (such as the
function preamble and postamble) typically still uses those register aliases and macros. The macro and alias
definitions themselves are also retained.

#### Final

The final AArch64 arithmetic assembly in [mlkem/native/aarch64/src](../mlkem/native/aarch64/src) is auto-generated
from the optimized assembly by the [`autogen`](../scripts/autogen) script, using the [`simpasm`](../scripts/simpasm)
script to simplify it by assembling and then disassembling the code. The resulting assembly no longer contains any
register aliases or macros.

Non-assembly files are synchronized by copying them between this directory and [`mlkem`](../mlkem).

#### Testing clean/optimized assembly

To test the clean assembly, run `autogen --aarch64-clean`. This imports the clean backend into `mlkem/native/aarch64/*`,
replacing the optimized one. Adding `--no-simplify`, as in `autogen --aarch64-clean --no-simplify` or `autogen --no-simplify`,
additionally reinstates the non-simplified clean or optimized assembly, respectively, in the main source tree.

Alternatively, you can also just manually copy the entire `aarch64_clean` and `aarch64_opt` trees into `mlkem/native/aarch64/`.
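The two `autogen` flags above combine into four configurations, which are the same four jobs spanned by the 2x2 `simpasm` matrix in `ci.yml` (backend: Clean/Optimized, simplify: Simplified/Unmodified). The following C sketch only enumerates those combinations and the backend each one leaves in `mlkem/native/aarch64/`; the helpers `autogen_cmd`, `resulting_backend` and `print_matrix` are hypothetical and do not invoke the real script.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the autogen command line for one
 * combination of the two flags (flags that are off are omitted). */
static void autogen_cmd(char *buf, size_t n, bool clean, bool simplify)
{
  assert(buf != NULL && n > 0);
  snprintf(buf, n, "autogen%s%s", clean ? " --aarch64-clean" : "",
           simplify ? "" : " --no-simplify");
}

/* Which AArch64 arithmetic assembly ends up in mlkem/native/aarch64/
 * for that combination, labelled as in the CI simpasm matrix. */
static const char *resulting_backend(bool clean, bool simplify)
{
  if (clean)
    return simplify ? "Clean, Simplified" : "Clean, Unmodified";
  return simplify ? "Optimized, Simplified" : "Optimized, Unmodified";
}

/* Print all four combinations, mirroring the 2x2 CI matrix. */
static void print_matrix(void)
{
  char cmd[64];
  for (int c = 1; c >= 0; c--)
    for (int s = 1; s >= 0; s--) {
      autogen_cmd(cmd, sizeof cmd, c, s);
      printf("%-40s -> %s\n", cmd, resulting_backend(c, s));
    }
}
```

Plain `autogen` with no flags corresponds to the default layout: optimized, simplified assembly.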

## AArch64 FIPS-202 assembly

As with the AArch64 arithmetic assembly, the final FIPS-202 assembly is the result of running [`simpasm`](../scripts/simpasm)
on the assembly in [fips202/aarch64/src](fips202/aarch64/src). Non-assembly files are synchronized by copying.

## x86_64 arithmetic assembly

As with the AArch64 arithmetic assembly, the final x86_64 arithmetic assembly is the result of running [`simpasm`](../scripts/simpasm)
on the assembly in [x86_64/src](x86_64/src). Non-assembly files are synchronized by copying.
File renamed without changes.
6 changes: 5 additions & 1 deletion test/aarch64_clean/clean.h → dev/aarch64_clean/meta.h
@@ -3,6 +3,8 @@
* SPDX-License-Identifier: Apache-2.0
*/

#ifndef MLKEM_NATIVE_DEV_AARCH64_CLEAN_META_H
#define MLKEM_NATIVE_DEV_AARCH64_CLEAN_META_H
/* ML-KEM arithmetic native profile for clean assembly */

#ifdef MLKEM_NATIVE_ARITH_PROFILE_H
@@ -19,6 +21,8 @@
/* Filename of the C backend implementation.
* This is not inlined here because this header is included in assembly
* files as well. */
#define MLKEM_NATIVE_ARITH_BACKEND_IMPL "native/aarch64_clean/src/clean_impl.h"
#define MLKEM_NATIVE_ARITH_BACKEND_IMPL "native/aarch64/src/clean_impl.h"

#endif /* MLKEM_NATIVE_ARITH_PROFILE_H */

#endif /* MLKEM_NATIVE_DEV_AARCH64_CLEAN_META_H */
File renamed without changes.
@@ -2,8 +2,8 @@
* Copyright (c) 2024-2025 The mlkem-native project authors
* SPDX-License-Identifier: Apache-2.0
*/
#ifndef MLKEM_AARCH64_NATIVE_H
#define MLKEM_AARCH64_NATIVE_H
#ifndef MLKEM_NATIVE_DEV_AARCH64_CLEAN_SRC_ARITH_NATIVE_AARCH64_H
#define MLKEM_NATIVE_DEV_AARCH64_CLEAN_SRC_ARITH_NATIVE_AARCH64_H

#include <stdint.h>
#include "../../../common.h"
@@ -75,4 +75,4 @@ void polyvec_basemul_acc_montgomery_cached_asm_k4_clean(int16_t *r,
const int16_t *b,
const int16_t *b_cache);

#endif /* MLKEM_AARCH64_NATIVE_H */
#endif /* MLKEM_NATIVE_DEV_AARCH64_CLEAN_SRC_ARITH_NATIVE_AARCH64_H */
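The renamed include guards in this commit appear to follow a path-derived pattern: `MLKEM_NATIVE_` followed by the header's repository path, upper-cased, with `/` and `.` replaced by `_` (for example, `dev/aarch64_clean/src/consts.h` becomes `MLKEM_NATIVE_DEV_AARCH64_CLEAN_SRC_CONSTS_H`). This rule is inferred from the guards visible in this diff rather than taken from project documentation, and the helper `guard_name` below is a hypothetical sketch of it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: derives an include-guard name from a
 * repository-relative header path, following the (inferred) convention
 * "MLKEM_NATIVE_" + upper-cased path with '/' and '.' mapped to '_'.
 * The result is truncated if it does not fit into `n` bytes. */
static void guard_name(const char *path, char *out, size_t n)
{
  static const char prefix[] = "MLKEM_NATIVE_";
  size_t i = sizeof prefix - 1; /* prefix length without the NUL */
  assert(n > sizeof prefix);
  memcpy(out, prefix, i);
  for (const char *p = path; *p != '\0' && i + 1 < n; p++) {
    unsigned char c = (unsigned char)*p;
    out[i++] = (c == '/' || c == '.') ? '_' : (char)toupper(c);
  }
  out[i] = '\0';
}
```

Deriving guards from the path keeps them unique even when files with the same basename (such as the two `meta.h` headers) exist in different directories.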
@@ -3,6 +3,8 @@
* SPDX-License-Identifier: Apache-2.0
*/

#ifndef MLKEM_NATIVE_DEV_AARCH64_CLEAN_SRC_CLEAN_IMPL_H
#define MLKEM_NATIVE_DEV_AARCH64_CLEAN_SRC_CLEAN_IMPL_H
/* ML-KEM arithmetic native profile for clean assembly */

#ifdef MLKEM_NATIVE_ARITH_PROFILE_IMPL_H
@@ -88,3 +90,5 @@ static INLINE int rej_uniform_native(int16_t *r, unsigned len,
}

#endif /* MLKEM_NATIVE_ARITH_PROFILE_IMPL_H */

#endif /* MLKEM_NATIVE_DEV_AARCH64_CLEAN_SRC_CLEAN_IMPL_H */
19 changes: 19 additions & 0 deletions dev/aarch64_clean/src/consts.h
@@ -0,0 +1,19 @@
/*
* Copyright (c) 2024-2025 The mlkem-native project authors
* SPDX-License-Identifier: Apache-2.0
*/

#ifndef MLKEM_NATIVE_DEV_AARCH64_CLEAN_SRC_CONSTS_H
#define MLKEM_NATIVE_DEV_AARCH64_CLEAN_SRC_CONSTS_H

#include <stdint.h>
#include "../../../common.h"

#define zetas_mulcache_native MLKEM_NAMESPACE(zetas_mulcache_native)
extern const int16_t zetas_mulcache_native[256];

#define zetas_mulcache_twisted_native \
MLKEM_NAMESPACE(zetas_mulcache_twisted_native)
extern const int16_t zetas_mulcache_twisted_native[256];

#endif /* MLKEM_NATIVE_DEV_AARCH64_CLEAN_SRC_CONSTS_H */
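The `#define name MLKEM_NAMESPACE(name)` pattern in `consts.h` renames the symbol during preprocessing, so the sources keep using the short name while the compiler and linker see a prefixed one. The real definition of `MLKEM_NAMESPACE` is not part of this diff, so the minimal sketch below assumes a hypothetical definition with the prefix `example_mlkem`; the table contents are zero placeholders, not the actual constants, and `zeta_at` is a hypothetical accessor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the project's namespacing macro:
 * MLKEM_NAMESPACE(foo) expands to example_mlkem_foo. */
#define MLKEM_NAMESPACE_(prefix, s) prefix##_##s
#define MLKEM_NAMESPACE(s) MLKEM_NAMESPACE_(example_mlkem, s)

/* Same pattern as consts.h: the short name now expands to the
 * namespaced one (the inner occurrence is not re-expanded). */
#define zetas_mulcache_native MLKEM_NAMESPACE(zetas_mulcache_native)
extern const int16_t zetas_mulcache_native[256];

/* This definition therefore creates the symbol
 * example_mlkem_zetas_mulcache_native (placeholder values only). */
const int16_t zetas_mulcache_native[256] = {0};

/* Code written against the short name is unaffected by the rename. */
static int16_t zeta_at(unsigned i)
{
  assert(i < 256);
  return zetas_mulcache_native[i];
}
```

Prefixing symbols this way avoids link-time clashes when several builds of the library, or other libraries exporting similarly named tables, end up in the same binary.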
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
18 changes: 18 additions & 0 deletions dev/aarch64_opt/README.md
@@ -0,0 +1,18 @@
[//]: # (SPDX-License-Identifier: CC-BY-4.0)

# AArch64 backend (little endian)

This directory contains a native backend for little-endian AArch64 systems. It is derived from the following research
works:

- _Neon NTT: Faster Dilithium, Kyber, and Saber on Cortex-A72 and Apple M1_, Hanno Becker, Vincent Hwang, Matthias
J. Kannwischer, Bo-Yin Yang, and Shang-Yi Yang, [https://eprint.iacr.org/2021/986](https://eprint.iacr.org/2021/986)
- _Fast and Clean: Auditable high-performance assembly via constraint solving_, Amin Abdulrahman, Hanno Becker, Matthias
J. Kannwischer, Fabien Klein, [https://eprint.iacr.org/2022/1303](https://eprint.iacr.org/2022/1303)


## Variants

This backend comes in two versions: "clean" and optimized. The "clean" backend is handwritten and meant to be easy to read and modify; for example, it heavily leverages register aliases and assembly macros. This directory contains the optimized version, which is automatically generated from the clean one via [SLOTHY](https://github.com/slothy-optimizer/slothy). Currently, the
target microarchitecture is Cortex-A55, but you can easily re-optimize the code for a different microarchitecture supported
by SLOTHY by adjusting the parameters in [optimize.sh](../aarch64_clean/src/optimize.sh).
26 changes: 26 additions & 0 deletions dev/aarch64_opt/meta.h
@@ -0,0 +1,26 @@
/*
* Copyright (c) 2024-2025 The mlkem-native project authors
* SPDX-License-Identifier: Apache-2.0
*/

#ifndef MLKEM_NATIVE_DEV_AARCH64_OPT_META_H
#define MLKEM_NATIVE_DEV_AARCH64_OPT_META_H
#ifdef MLKEM_NATIVE_ARITH_PROFILE_H
#error Only one MLKEM_ARITH assembly profile can be defined -- did you include multiple profiles?
#else
#define MLKEM_NATIVE_ARITH_PROFILE_H

/* Identifier for this backend so that source and assembly files
* in the build can be appropriately guarded. */
#define MLKEM_NATIVE_ARITH_BACKEND_AARCH64_OPT

#define MLKEM_NATIVE_ARITH_BACKEND_NAME AARCH64_OPT

/* Filename of the C backend implementation.
* This is not inlined here because this header is included in assembly
* files as well. */
#define MLKEM_NATIVE_ARITH_BACKEND_IMPL "native/aarch64/src/opt_impl.h"

#endif /* MLKEM_NATIVE_ARITH_PROFILE_H */

#endif /* MLKEM_NATIVE_DEV_AARCH64_OPT_META_H */
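As a minimal sketch of how a translation unit can consume the macros a profile header like this defines, the C snippet below inlines the relevant part of `dev/aarch64_opt/meta.h` and queries it. The helpers `STRINGIFY`, `arith_backend_name`, `is_aarch64_opt_backend` and `check_profile` are hypothetical and not part of mlkem-native. In the real build the header is included rather than inlined, and `MLKEM_NATIVE_ARITH_BACKEND_IMPL` would typically feed a computed `#include`, which is omitted here because the referenced file only exists inside the source tree.

```c
#include <assert.h>
#include <string.h>

/* Inlined profile, modelled on dev/aarch64_opt/meta.h. */
#ifdef MLKEM_NATIVE_ARITH_PROFILE_H
#error Only one MLKEM_ARITH assembly profile can be defined
#else
#define MLKEM_NATIVE_ARITH_PROFILE_H
#define MLKEM_NATIVE_ARITH_BACKEND_AARCH64_OPT
#define MLKEM_NATIVE_ARITH_BACKEND_NAME AARCH64_OPT
#define MLKEM_NATIVE_ARITH_BACKEND_IMPL "native/aarch64/src/opt_impl.h"
#endif

/* Two-level stringification so the backend-name macro is expanded
 * before being turned into a string literal. */
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

static const char *arith_backend_name(void)
{
  return STRINGIFY(MLKEM_NATIVE_ARITH_BACKEND_NAME);
}

/* The identifier macro lets C and assembly sources guard
 * backend-specific code with a plain #if defined(...). */
static int is_aarch64_opt_backend(void)
{
#if defined(MLKEM_NATIVE_ARITH_BACKEND_AARCH64_OPT)
  return 1;
#else
  return 0;
#endif
}

/* Sanity check that the profile's macros agree with each other. */
static void check_profile(void)
{
  assert(strcmp(arith_backend_name(), "AARCH64_OPT") == 0);
  assert(is_aarch64_opt_backend());
}
```

Including a second profile header after this one would take the `#error` branch, which is how the header enforces that exactly one arithmetic backend is selected.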