diff --git a/design-documents/bit-precise-types.rst b/design-documents/bit-precise-types.rst
new file mode 100644
index 00000000..fad9dcce
--- /dev/null
+++ b/design-documents/bit-precise-types.rst
@@ -0,0 +1,213 @@
+..
+   Copyright (c) 2023, Arm Limited and its affiliates. All rights reserved.
+   CC-BY-SA-4.0 AND Apache-Patent-License
+   See LICENSE file for details
+
+Rationale Document for ABI related to the C23 _BitInt type.
+***********************************************************
+
+Preamble
+========
+
+Background
+----------
+
+This document describes the rationale behind the ABI choices made for using the
+bit-precise integral types defined in C23. These are ``_BitInt(N)`` and
+``unsigned _BitInt(N)``. These are defined for integral ``N`` and each ``N`` is
+a different type.
+
+The proposal for these types can be found at the following link:
+https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2763.pdf
+
+As the rationale mentioned, some applications have uses for a specific bit-width
+type. In the case of writing C code which can be used to determine FPGA
+hardware these specific bit-width types can lead to large performance and space
+savings.
+
+From the perspective of the Arm ABI we have some trade-offs to determine. We
+need to choose a representation for these objects in memory and in registers
+along with the size and alignment of the objects. The main trade-offs we have
+identified in this case are on performance between different types of C-level
+operations, whether certain hardware-level atomic operations are possible, and
+general familiarity of programmers with the representation.
+
+For this particular type we estimate that ``_BitInt`` types will not be used in
+such a way that operations on them are performance critical.
+
+There seem to be two different regimes for these types. The "small" regime
+where bit-precise types could be stored in a single register, and the "large"
+regime where bit-precise types must span multiple registers.
+
+Alignment and sizes
+-------------------
+
+These types must be at least byte-aligned so they are addressable, and at least
+rounded to a byte boundary in size for ``sizeof``. Since these types have an
+aesthetic similarity to bit-fields one might expect better packing in an array
+of ``_BitInt(24)`` than an array of ``int32_t`` types (i.e. packing as good as a
+byte-array). However, this would require a low alignment of such types and that
+would mean loading and storing of even "small" sized ``_BitInt`` values crossing
+cache-line boundaries -- leading to an unnecessary performance hit and hindering
+any atomic operations on these.
+
+Hence for "small" sizes we are choosing to define a ``_BitInt(N)`` size and
+alignment according to the smallest Fundamental Data Type which has a bit-size
+greater than or equal to ``N``. Similarly for ``unsigned`` versions.
+
+For "large" sizes the only approach considered has been to treat these
+bit-precise types as an array of ``M`` sized chunks, for some ``M``. The two
+"reasonable" choices for this ``M`` seem to be either register sized or
+double-register sized. Choosing a register sized chunk would mean smaller sizes
+of types for half of the values of ``N``, while choosing a double-register sized
+chunk would allow atomic operations on types in the range between the register
+and double-register sizes due to the associated extra alignment allowing
+operations like ``CASP`` on AArch64 and ``LDRD`` on AArch32. Moreover, the
+majority of "large" size use-cases proposed so far are of power-of-two sizes
+like SHA-256 which would not be in the range which suffers in space-cost from
+this choice. Finally, defining the ``_BitInt`` representation in this manner
+means that on AArch32 a ``_BitInt(64)`` has the same alignment and size as an
+``int64_t`` which is the largest type defined on that platform, and on AArch64
+a ``_BitInt(128)`` has the same alignment and size as a ``__int128`` which is
+the largest type defined on that platform.
+This falls out of the fact that
+double-register size maps to the largest integral Fundamental Data Type defined
+on both platforms.
+
+Hence for "large" sizes we are choosing to define a ``_BitInt(N)`` size and
+alignment by treating them "as if" they are an array of double-register sized
+Fundamental Data Types.
+
+Representation in bits
+----------------------
+
+There are two decisions around the representation of a "small" ``_BitInt`` that
+we have identified. (1) Whether required bits are stored in the least
+significant end of a register or most significant end of a register. (2) Whether
+the "remaining" bits after rounding up to the size specified in `Alignment and
+sizes`_ are specified or not -- with how these bits would naturally be specified
+depending on the choice made for (1).
+
+Options and their trade-offs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We have identified three viable options:
+
+A. Required bits stored in most significant end.
+   Not-required bits are specified as zero at ABI boundaries.
+B. Required bits stored in least significant end.
+   Not-required bits are unspecified at ABI boundaries.
+C. Required bits stored in least significant end.
+   Not-required bits are specified as zero- or sign-extended.
+
+While it would be possible to make different requirements for bit-precise
+integer types in memory vs in registers, we believe that the negatives of having
+to perform a transformation on loading and storing values and the programmer
+confusion associated with different representations are reason enough to not
+look into this option further. Especially since the differentiating factors
+were not drastically different between memory and register regimes.
+
+Similarly, it would be possible to define a representation that does something
+like specifying bits ``[2-7]`` of a ``_BitInt(2)`` but leaves bits ``[8-63]``
+unspecified.
+This would seem to choose the worst of both worlds in terms of
+performance, since one must both ensure "overflow" from an addition of
+``_BitInt(2)`` types does not affect the specified bits **and** ensure that the
+unspecified bits do not affect multiplication or division operations.
+Hence we do not look at variations of this kind.
+
+For option ``A`` there is an extra choice around how "large" values are stored.
+One could either have the "padding" bits in the least significant "chunk", or
+the most significant "chunk". Having these padding bits in the least
+significant chunk would mean that something like a widening cast would
+require updating every "chunk" in memory, hence we assume large values of option
+``A`` would be represented with the padding bits in the most significant chunk.
+
+
+Option ``A`` has the following benefits:
+
+- For small values in memory, on AArch64, operations like ``LDADD`` and
+  ``LD{S,U}MAX`` both work (assuming the relevant register operand is
+  appropriately shifted).
+
+- Operations ``+,-,%,==,<=,>=,<,>,<<`` all work without any extra instructions
+  (which covers more of the common operations than the other representations).
+
+It has the following negatives:
+
+- This would be a less familiar representation to programmers. Especially the
+  fact that a ``_BitInt(8)`` would not have the same representation in a
+  register as a ``char`` would likely cause confusion (e.g. when debugging, or
+  writing assembly code). This would likely be increased if other architectures
+  that programmers may use have a more familiar representation.
+
+- Operations ``*,/``, saving and loading values to memory, and casting to
+  another type would all incur extra cost.
+
+- Operations ``+,-`` on "large" values (greater than one register) would require
+  an extra instruction to "normalize" the carry-bit.
+
+
+Option ``B`` has the following benefits:
+
+- For small values in memory, the AArch64 ``LDADD`` operations work naturally.
+
+- Operations ``+,-,*,<<``, narrowing conversions, and loading/storing to memory
+  would all naturally work.
+
+- On AArch64 this would most likely match the expectation of developers, and
+  e.g. a ``_BitInt(8)`` would have the same representation as a ``char`` in
+  registers.
+
+It has the following negatives:
+
+- The AArch64 ``LD{S,U}MAX`` operations would not work naturally on small values
+  of this representation.
+
+- Operations ``/,%,==,<,>,<=,>=,>>`` and widening conversions would all
+  require extra work.
+
+- On AArch32 this could cause surprises to developers, given that on this
+  architecture small Fundamental Data Types have zero- or sign-extended
+  extra bits. So a ``char`` would not have the same representation as a
+  ``_BitInt(8)`` on this architecture.
+
+
+Option ``C`` has the following benefits:
+
+- For small values in memory, the AArch64 ``LD{S,U}MAX`` operations work
+  naturally.
+
+- Operations ``==,<,<=,>=,>,>>``, widening conversions, and loading/storing to
+  memory would all naturally work.
+
+- On AArch32 this could match the expectation of developers, with a
+  ``_BitInt(8)`` in a register matching the representation of a ``char``.
+
+It has the following negatives:
+
+- The AArch64 ``LDADD`` operations would not work naturally.
+
+- Operations ``+,-,*,<<`` would all require masking at an ABI
+  boundary.
+
+- On AArch64 this would not match the expectation of developers, with
+  ``_BitInt(8)`` not matching the representation of a ``char``.
+
+Summary, suggestion, and reasoning
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Overall it seems that for operations on small values option ``A`` is more
+performant. However, when acting on "large" values (i.e. greater than the size
+of one register) it loses some of that benefit. Loading from and storing to
+memory would also come at a cost for this representation. This is also likely to
+be the most surprising representation for developers on an Arm platform.
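
To make the option ``B`` and option ``C`` trade-offs listed above concrete, the following is a minimal sketch that models a signed ``_BitInt(n)`` held in a 64-bit register as a plain ``uint64_t``. It is an illustration only: the helper names (``sign_extend``, ``add_option_b``, and so on) are hypothetical and are not part of any ABI or compiler interface, and the model assumes ``1 <= n <= 64``.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a signed _BitInt(n), 1 <= n <= 64, held in a 64-bit
 * register represented as uint64_t. None of these names come from the ABI. */

/* Sign-extend the low n bits of reg to all 64 bits (option C's invariant). */
static uint64_t sign_extend(uint64_t reg, unsigned n)
{
    assert(n >= 1 && n <= 64);
    if (n == 64)
        return reg;
    uint64_t mask = (UINT64_C(1) << n) - 1;
    uint64_t sign = UINT64_C(1) << (n - 1);
    return ((reg & mask) ^ sign) - sign;
}

/* Option B: bits above n are unspecified, so a plain add is already correct
 * in the low n bits; carries only ever propagate upwards. */
static uint64_t add_option_b(uint64_t a, uint64_t b)
{
    return a + b;
}

/* Option B: comparison must first discard the unspecified upper bits. */
static int equal_option_b(uint64_t a, uint64_t b, unsigned n)
{
    return sign_extend(a, n) == sign_extend(b, n);
}

/* Option C: the result of an add must be re-extended to restore the
 * invariant that the upper bits mirror the sign bit. */
static uint64_t add_option_c(uint64_t a, uint64_t b, unsigned n)
{
    return sign_extend(a + b, n);
}

/* Option C: both inputs are already extended, so a plain compare works. */
static int equal_option_c(uint64_t a, uint64_t b)
{
    return a == b;
}
```

In this model, with ``n = 5``, adding 7 and 12 under option ``B`` leaves the correct low five bits in the register whatever the upper bits held, while under option ``C`` the raw sum 19 must be re-extended to the 5-bit value -13 before it satisfies the representation again. The reverse holds for comparison, which needs a fix-up only under option ``B``.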
+
+Between option ``B`` and option ``C`` there is not a great difference in
+performance characteristics. However it should be noted that option ``C`` is
+likely the most natural extension of the AArch32 PCS rules for unspecified bits
+in a register containing a small Fundamental Data Type, while option ``B`` is
+the most natural extension of the similar rules in AArch64 PCS.
+
+As mentioned above, we do not expect operations on ``_BitInt`` types to be
+performance critical. Given that providing a productive environment for
+developers is valuable and following the "principle of least surprise" is a
+good way to achieve that, we suggest choosing option ``C`` for AArch32 and
+option ``B`` for AArch64.
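
Alongside the representation choice, the size and alignment rule from `Alignment and sizes`_ can be sketched as a small function. This is one reading of the rationale rather than normative ABI text: the struct and function names are hypothetical, ``reg_bytes`` is assumed to be 8 for AArch64 and 4 for AArch32, and the boundary between the two rules is assumed to sit at the double-register size, since the text states that ``_BitInt(128)`` on AArch64 matches ``__int128``.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: size and alignment in bytes. */
struct bitint_layout {
    size_t size;
    size_t align;
};

/* Sketch of the layout rule for _BitInt(n).
 * reg_bytes: register width in bytes (assumed 8 on AArch64, 4 on AArch32). */
static struct bitint_layout bitint_layout_for(unsigned n, size_t reg_bytes)
{
    assert(n >= 1);
    struct bitint_layout layout;
    size_t chunk = 2 * reg_bytes;      /* double-register size in bytes */
    size_t chunk_bits = 8 * chunk;

    if (n <= chunk_bits) {
        /* Smallest power-of-two-sized Fundamental Data Type with at
         * least n bits: 1, 2, 4, 8 (and 16 on AArch64) bytes. */
        size_t bytes = 1;
        while (8 * bytes < n)
            bytes *= 2;
        layout.size = bytes;
        layout.align = bytes;
    } else {
        /* Treated "as if" an array of double-register sized chunks. */
        size_t chunks = (n + chunk_bits - 1) / chunk_bits;
        layout.size = chunks * chunk;
        layout.align = chunk;
    }
    return layout;
}
```

Under these assumptions a ``_BitInt(24)`` on AArch64 occupies four bytes like an ``int32_t``, and a ``_BitInt(129)`` rounds up to two 16-byte chunks.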