Rationale document for bit-precise types _BitInt

This type has been added to the C2x specification.  Alongside the changes describing how these types are represented at the machine level, we also add a design document describing the rationale behind the choices we made.
ARM-software · Sep 12, 2023 · 2208af1
1 parent ad4f088
commit 2208af1
Showing 1 changed file with 213 additions and 0 deletions.
diff --git a/design-documents/bit-precise-types.rst b/design-documents/bit-precise-types.rst
@@ -0,0 +1,213 @@
+..
+   Copyright (c) 2023, Arm Limited and its affiliates.  All rights reserved.
+   CC-BY-SA-4.0 AND Apache-Patent-License
+   See LICENSE file for details
+
+Rationale Document for ABI related to the C23 _BitInt type.
+***********************************************************
+
+Preamble
+========
+
+Background
+----------
+
+This document describes the rationale behind the ABI choices made for using the
+bit-precise integral types defined in C2x.  These are ``_BitInt(N)`` and
+``unsigned _BitInt(N)``.  They are defined for integral ``N``, and each value of
+``N`` gives a different type.
+
+The proposal for these types can be found at the following link:
+https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2763.pdf
+
+As the rationale in that proposal mentions, some applications have uses for a
+specific bit-width type.  When writing C code which is used to determine FPGA
+hardware, these specific bit-width types can lead to large performance and space
+savings.
+
+From the perspective of the Arm ABI we have some trade-offs to determine.  We
+need to choose a representation for these objects in memory and in registers
+along with the size and alignment of the objects.  The main trade-offs we have
+identified in this case are on performance between different types of C-level
+operations, whether certain hardware-level atomic operations are possible, and
+general familiarity of programmers with the representation.
+
+For this particular type we estimate that ``_BitInt`` types will not be used in
+such a way that operations on them are performance critical.
+
+There seem to be two different regimes for these types: the "small" regime,
+where bit-precise types can be stored in a single register, and the "large"
+regime, where bit-precise types must span multiple registers.
+
+Alignment and sizes
+-------------------
+
+These types must be at least byte-aligned so they are addressable, and at least
+rounded to a byte boundary in size for ``sizeof``.  Since these types have a
+superficial similarity to bit-fields one might expect better packing in an array
+of ``_BitInt(24)`` than an array of ``int32_t`` types (i.e. packing as good as a
+byte-array).  However, this would require a low alignment of such types, and
+that would mean loads and stores of even "small" sized ``_BitInt`` objects could
+cross cache-line boundaries -- leading to an unnecessary performance hit and
+hindering any atomic operations on these.
+
+Hence for "small" sizes we are choosing to define a ``_BitInt(N)`` size and
+alignment according to the smallest Fundamental Data Type which has a bit-size
+greater than or equal to ``N``.  Similarly for ``unsigned`` versions.
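+
+As an illustration only (this is not part of the ABI text), the "small" rule can
+be sketched in C.  The helper name ``small_bitint_size`` and its ``reg_bits``
+parameter are hypothetical, and the sketch assumes the Fundamental Data Types
+available in a single register have power-of-two sizes of 8, 16, 32 and 64 bits.
+Under this rule the returned size is also the alignment, so a ``_BitInt(24)``
+takes 4 bytes, just like an ``int32_t``.
+

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Returns the size in bytes (which is also the alignment) of a "small"
 * _BitInt(n) when registers are reg_bits wide (32 for AArch32, 64 for
 * AArch64).  Returns 0 when n is zero or does not fit in a single register,
 * i.e. when the type is not in the "small" regime. */
size_t small_bitint_size(unsigned n, unsigned reg_bits)
{
    if (n == 0 || n > reg_bits)
        return 0;

    /* Smallest power-of-two container, starting from one byte,
     * whose bit-size is greater than or equal to n. */
    size_t bits = 8;
    while (bits < n)
        bits *= 2;

    return bits / 8;
}
```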
+
+For "large" sizes the only approach considered has been to treat these
+bit-precise types as an array of ``M`` sized chunks, for some ``M``.  The two
+"reasonable" choices for this ``M`` seem to either be register sized or
+double-register sized.  Choosing a register sized chunk would mean smaller sizes
+of types for half of the values of ``N``, while choosing a double-register sized
+chunk would allow atomic operations on types in the range between the register
+and double-register sizes due to the associated extra alignment allowing
+operations like ``CASP`` on AArch64 and ``LDRD`` on AArch32.  Moreover, the
+majority of "large" size use-cases proposed so far are of power-of-two sizes
+like sha256 which would not be in the range which suffers in space-cost from
+this choice.  Finally, defining the ``_BitInt`` representation in this manner
+means that on AArch32 a ``_BitInt(64)`` has the same alignment and size as an
+``int64_t`` which is the largest type defined on that platform, and on AArch64
+a ``_BitInt(128)`` has the same alignment and size as an ``__int128`` which is
+the largest type defined on that platform.  This falls out of the fact that
+double-register size maps to the largest integral Fundamental Data Type defined
+on both platforms.
+
+Hence for "large" sizes we are choosing to define a ``_BitInt(N)`` size and
+alignment by treating them "as if" they are an array of double-register sized
+Fundamental Data Types.
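+
+The space trade-off between the two chunk sizes can be sketched as follows.
+This is an illustration rather than ABI text, and ``chunked_size_bytes`` is a
+hypothetical name.  With 64-bit registers, a ``_BitInt(129)`` would take 24
+bytes with register-sized chunks but 32 bytes with double-register-sized chunks,
+whereas a ``_BitInt(256)`` takes 32 bytes either way.
+

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Size in bytes of a "large" _BitInt(n_bits) when it is laid out "as if" it
 * were an array of chunks that are each chunk_bits wide.  Pass the register
 * width to model register-sized chunks, or twice the register width to model
 * the double-register-sized chunks chosen above. */
size_t chunked_size_bytes(size_t n_bits, size_t chunk_bits)
{
    /* Number of chunks needed, rounding up. */
    size_t chunks = (n_bits + chunk_bits - 1) / chunk_bits;

    return chunks * (chunk_bits / 8);
}
```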
+
+Representation in bits
+----------------------
+
+There are two decisions around the representation of a "small" ``_BitInt`` that
+we have identified.  (1) Whether required bits are stored in the least
+significant end of a register or most significant end of a register. (2) Whether
+the "remaining" bits after rounding up to the size specified in `Alignment and
+sizes`_ are specified or not -- with how these bits would naturally be specified
+depending on the choice made for (1).
+
+Options and their trade-offs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We have identified three viable options:
+
+A. Required bits stored in most significant end.
+   Not-required bits are specified as zero at ABI boundaries.
+B. Required bits stored in least significant end.
+   Not-required bits are unspecified at ABI boundaries.
+C. Required bits stored in least significant end.
+   Not-required bits are specified as zero- or sign-extended.
+
+While it would be possible to make different requirements for bit-precise
+integer types in memory vs in registers, we believe that the negatives of having
+to perform a transformation on loading and storing values and the programmer
+confusion associated with different representations are reason enough to not
+look into this option further, especially since the differentiating factors
+are not drastically different between the memory and register regimes.
+
+Similarly, it would be possible to define a representation that does something
+like specifying bits ``[2-7]`` of a ``_BitInt(2)`` but leaves bits ``[8-63]``
+unspecified.  This would seem to choose the worst of both worlds in terms of
+performance, since one must both ensure "overflow" from an addition of
+``_BitInt(2)`` types does not affect the specified bits **and** ensure that the
+unspecified bits do not affect multiplication or division operations.
+Hence we do not look at variations of this kind.
+
+For option ``A`` there is an extra choice around how "large" values are stored.
+One could either have the "padding" bits in the least significant "chunk", or
+the most significant "chunk".  Having these padding bits in the least
+significant chunk would mean that something like a widening cast would require
+updating every "chunk" in memory, hence we assume large values of option
+``A`` would be represented with the padding bits in the most significant chunk.
+
+
+Option ``A`` has the following benefits:
+
+- For small values in memory, on AArch64, operations like ``LDADD`` and
+  ``LD{S,U}MAX`` both work (assuming the relevant register operand is
+  appropriately shifted).
+
+- Operations ``+,-,%,==,<=,>=,<,>,<<`` all work without any extra instructions
+  (which covers more of the common operations than the other representations).
+
+It has the following negatives:
+
+- This would be a less familiar representation to programmers.  Especially the
+  fact that a ``_BitInt(8)`` would not have the same representation in a
+  register as a ``char`` would likely cause confusion (e.g. when debugging, or
+  writing assembly code).  This would likely be increased if other architectures
+  that programmers may use have a more familiar representation.
+
+- Operations ``*,/``, storing values to and loading values from memory, and
+  casting to another type would all incur extra cost.
+
+- Operations ``+,-`` on "large" values (greater than one register) would require
+  an extra instruction to "normalize" the carry-bit.
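+
+The behaviour of option ``A`` can be modelled with plain 64-bit arithmetic.  The
+sketch below is an illustration under stated assumptions: it covers only an
+unsigned ``_BitInt(n)`` with ``1 <= n <= 64`` in a 64-bit register, and the
+``a_*`` names are hypothetical.  It shows that addition needs no fix-up, while
+multiplication needs one operand shifted down first.
+

```c
#include <assert.h>
#include <stdint.h>

/* Model of option A, for illustration only: the n value bits sit at the most
 * significant end of a 64-bit register and the remaining low bits are zero.
 * Assumes 1 <= n <= 64. */
uint64_t a_encode(uint64_t value, unsigned n)
{
    return value << (64 - n);
}

uint64_t a_decode(uint64_t reg, unsigned n)
{
    return reg >> (64 - n);
}

/* Addition works directly: the carry out of the top bit is dropped, which is
 * exactly wrap-around modulo 2^n, and the low padding bits stay zero. */
uint64_t a_add(uint64_t x, uint64_t y)
{
    return x + y;
}

/* Multiplication has an extra cost: one operand is shifted down first,
 * otherwise the scale factor 2^(64-n) would be applied twice. */
uint64_t a_mul(uint64_t x, uint64_t y, unsigned n)
{
    return x * (y >> (64 - n));
}
```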
+
+
+Option ``B`` has the following benefits:
+
+- For small values in memory, the AArch64 ``LDADD`` operations work naturally.
+
+- Operations ``+,-,*,<<``, narrowing conversions, and loading/storing to memory
+  would all naturally work.
+
+- On AArch64 this would most likely match the expectation of developers, and
+  e.g. a ``_BitInt(8)`` would have the same representation as a ``char`` in
+  registers.
+
+It has the following negatives:
+
+- The AArch64 ``LD{S,U}MAX`` operations would not work naturally on small values
+  of this representation.
+
+- Operations ``/,%,==,<,>,<=,>=,>>`` and widening conversions would all
+  require extra work.
+
+- On AArch32 this could cause surprises to developers, given that on this
+  architecture small Fundamental Data Types have zero- or sign-extended
+  extra bits.  So a ``char`` would not have the same representation as a
+  ``_BitInt(8)`` on this architecture.
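+
+Option ``B`` can be modelled in the same way.  Again this is only a sketch for
+an unsigned ``_BitInt(n)`` with ``1 <= n <= 64`` in a 64-bit register, and the
+``b_*`` names are hypothetical.  Addition can ignore the unspecified upper
+bits, but a comparison must mask them away first.
+

```c
#include <assert.h>
#include <stdint.h>

/* Model of option B, for illustration only: the n value bits sit at the least
 * significant end of a 64-bit register and the upper bits are unspecified.
 * Assumes 1 <= n <= 64. */

/* Recover the value by clearing the unspecified upper bits. */
uint64_t b_value(uint64_t reg, unsigned n)
{
    uint64_t mask = (n >= 64) ? UINT64_MAX : ((UINT64_C(1) << n) - 1);
    return reg & mask;
}

/* Addition works directly: the low n bits of the sum depend only on the
 * low n bits of the operands, so garbage in the upper bits is harmless. */
uint64_t b_add(uint64_t x, uint64_t y)
{
    return x + y;
}

/* Equality needs extra work: both operands are masked before comparing. */
int b_equal(uint64_t x, uint64_t y, unsigned n)
{
    return b_value(x, n) == b_value(y, n);
}
```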
+
+
+Option ``C`` has the following benefits:
+
+- For small values in memory, the AArch64 ``LD{S,U}MAX`` operations work
+  naturally.
+
+- Operations ``==,<,<=,>=,>,>>``, widening conversions, and loading/storing to
+  memory would all naturally work.
+
+- On AArch32 this could match the expectation of developers, with a
+  ``_BitInt(8)`` in a register matching the representation of a ``char``.
+
+It has the following negatives:
+
+- The AArch64 ``LDADD`` operations would not work naturally.
+
+- Operations ``+,-,*,<<`` would all cause the need for masking at an ABI
+  boundary.
+
+- On AArch64 this would not match the expectation of developers, with
+  ``_BitInt(8)`` not matching the representation of a ``char``.
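+
+Option ``C`` is the mirror image of option ``B``.  The sketch below models only
+the zero-extended (unsigned) case for ``1 <= n <= 64`` in a 64-bit register,
+with hypothetical ``c_*`` names.  Comparisons work on the raw register, but the
+result of an addition must be masked before it crosses an ABI boundary.
+

```c
#include <assert.h>
#include <stdint.h>

/* Model of option C for an unsigned _BitInt(n), for illustration only: the n
 * value bits sit at the least significant end of a 64-bit register and the
 * upper bits are zero-extended.  Assumes 1 <= n <= 64. */

/* Restore the zero-extension invariant by clearing the upper bits. */
uint64_t c_normalize(uint64_t reg, unsigned n)
{
    uint64_t mask = (n >= 64) ? UINT64_MAX : ((UINT64_C(1) << n) - 1);
    return reg & mask;
}

/* Comparison works directly because both registers are fully specified. */
int c_less(uint64_t x, uint64_t y)
{
    return x < y;
}

/* Addition can carry into the upper bits, so the result is masked
 * to keep the representation valid. */
uint64_t c_add(uint64_t x, uint64_t y, unsigned n)
{
    return c_normalize(x + y, n);
}
```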
+
+Summary, suggestion, and reasoning
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Overall it seems that for operations on small values option ``A`` is more
+performant.  However, when acting on "large" values (i.e. greater than the size
+of one register) it loses some of that benefit.  Loading from and storing to
+memory would also come at a cost for this representation.  This is also likely
+to be the most surprising representation for developers on an Arm platform.
+
+Between option ``B`` and option ``C`` there is not a great difference in
+performance characteristics.  However, it should be noted that option ``C`` is
+likely the most natural extension of the AArch32 PCS rules for unspecified bits
+in a register containing a small Fundamental Data Type, while option ``B`` is
+the most natural extension of the similar rules in AArch64 PCS.
+
+As mentioned above, we do not expect operations on ``_BitInt`` types to be
+performance critical.  Given that providing a productive environment for
+developers is valuable and following the "principle of least surprise" is a
+good way to achieve that, we suggest choosing option ``C`` for AArch32 and
+option ``B`` for AArch64.