From 19b1e5ab5244b51d01e34142b250b57eca8b9d5f Mon Sep 17 00:00:00 2001
From: Matthew Malcomson <matthew.malcomson@arm.com>
Date: Fri, 15 Sep 2023 11:05:48 +0100
Subject: [PATCH] Extra discussion around "large" size alignment

Some questions were raised during the GCC implementation around our
"chunk" sizes.  This patch adds the extra points raised in that
discussion.

Overall this makes the decision between chunk sizes much less clear:
choosing a chunk size which matches the decision made by x86 is more
appealing than it was before.

We are currently still proposing to use a chunk size matching the
quad-word Fundamental Data Type.  The extra discussion indicates that
this is less of a clear-cut decision than before.

N.b. I also happened to notice a Stack Overflow question which
suggested the use of _BitInt(128) for u128.  Given that the discussion
there did not mention the ABI difference between this and __uint128_t,
I take it as evidence for the need to have the ABIs of the two integral
types match.
https://stackoverflow.com/questions/16088282/is-there-a-128-bit-integer-in-gcc
---
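Note for reviewers (not part of the commit): the size trade-off between
options A and B can be sketched with a little arithmetic.  This is only an
illustration under stated assumptions: the helper name and the enum constants
are hypothetical, and the chunk widths assume AArch64, where a register is
64 bits and the quad-word Fundamental Data Type is 128 bits.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed AArch64 chunk widths (illustrative names, not from the ABI):
   option A uses register-sized chunks, option B double-register-sized.  */
enum { OPTION_A_CHUNK_BITS = 64, OPTION_B_CHUNK_BITS = 128 };

/* Hypothetical helper: storage size in bytes of a "large" _BitInt(N) when
   it is laid out "as if" it were an array of chunk_bits-wide chunks.  */
static size_t bitint_size_bytes(size_t n_bits, size_t chunk_bits)
{
    assert(n_bits > 0 && chunk_bits > 0 && chunk_bits % 8 == 0);
    size_t chunks = (n_bits + chunk_bits - 1) / chunk_bits; /* round up */
    return chunks * (chunk_bits / 8);
}
```

Under these assumptions a _BitInt(129) occupies 24 bytes with option A and
32 bytes with option B, while a _BitInt(256) occupies 32 bytes with either.
Option A therefore saves space only for N in the lower half of each 128-bit
step, which is the "half of the values of N" point made in the document.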
 design-documents/bit-precise-types.rst | 39 ++++++++++++++++++--------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/design-documents/bit-precise-types.rst b/design-documents/bit-precise-types.rst
index 08f77073..96cefa54 100644
--- a/design-documents/bit-precise-types.rst
+++ b/design-documents/bit-precise-types.rst
@@ -306,7 +306,13 @@ B. Double-register sized.
 
 Option ``A`` has the following benefits:
 
+- This would mean that the alignment of a ``_BitInt(128)`` on AArch64 matches
+  that of x86.  This could reduce surprises when writing portable code.
 - Less space used for half of the values of ``N``.
+- Multiplications on large ``_BitInt(N)`` can be logically done on the limbs of
+  size ``M``, which should result in a neater compiler implementation.  E.g.
+  for AArch64 there is an ``SMULH`` instruction which could be used as part of
+  a multiplication on an entire limb.
 
 Option ``B`` has the following benefit:
 
@@ -317,19 +323,30 @@ Option ``B`` has the following benefit:
 - On AArch32 a ``_BitInt(64)`` would have the same alignment and size as an
   ``int64_t``, and on AArch64 a ``_BitInt(128)`` would have the same alignment
   and size as a ``__int128``.
-  These are the largest types defined on the relevant architectures, and
-  correspond to the largest integral Fundamental Data Type defined in the PCS
-  for both platforms.
+- Double-register sized integers match the largest Fundamental Data Types
+  defined in the PCS for both platforms.  We believe that implementors
+  familiar with the AArch64 ABI would find this mapping less surprising and
+  hence make fewer mistakes.
 
 The "large" size use-cases we have identified so far are of power-of-two sizes.
-These sizes would not benefit from the positives of either of the options
-presented here.
-
-Hence for "large" sizes we are choosing based on an estimate of which choice is
-more "generally useful".  Our estimate is that the benefits of option ``B`` are
-more generally useful than those from option ``A``.  That is we choose to define
-the size and alignment of ``_BitInt(N > [register-size])`` types by treating
-them "as if" they are an array of double-register sized Fundamental Data Types.
+These sizes would not benefit greatly from the positives of either of the
+options presented here, with the only difference being around the implementation
+of multiplication.
+
+Our estimate is that the benefits of option ``B`` are more useful for sizes
+between register and double-register than those from option ``A``.  This is not
+considered a clear-cut choice, with the main point in favour of option ``A``
+being a smaller difference from x86.
+
+Other variants are available, such as choosing alignment and size based on
+register sized chunks except for the special case of the double-register sized
+``_BitInt``.  Though such variants can provide a good combination of the
+properties above, we judge them to have an extra complexity of definition and
+an associated increased likelihood of implementation mistakes.
+
+Based on the above reasoning, we would choose to define the size and alignment
+of ``_BitInt(N > [register-size])`` types by treating them "as if" they are an
+array of double-register sized Fundamental Data Types.
 
 Representation in bits
 ----------------------