Rationale document for bit-precise types _BitInt
This type has been added to the C2x specification. Alongside changes describing how it is represented at the machine level, we also add a design document describing the rationale behind the choices we made.
1 parent ad4f088, commit 2208af1
Showing 1 changed file with 213 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,213 @@ | ||
.. | ||
Copyright (c) 2023, Arm Limited and its affiliates. All rights reserved. | ||
CC-BY-SA-4.0 AND Apache-Patent-License | ||
See LICENSE file for details | ||
Rationale Document for ABI related to the C23 _BitInt type. | ||
*********************************************************** | ||
|
||
Preamble | ||
======== | ||
|
||
Background | ||
---------- | ||
|
||
This document describes the rationale behind the ABI choices made for using the | ||
bit-precise integral types defined in C2x. These are ``_BitInt(N)`` and | ||
``unsigned _BitInt(N)``. They are defined for integral ``N``, and each value of | ||
``N`` yields a distinct type. | ||
|
||
The proposal for these types can be found at the following link: | ||
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2763.pdf | ||
|
||
As the rationale in that proposal mentions, some applications have uses for a | ||
specific bit-width type. For example, when writing C code that is used to | ||
determine FPGA hardware, these specific bit-width types can lead to large | ||
performance and space savings. | ||
|
||
From the perspective of the Arm ABI we have some trade-offs to determine. We | ||
need to choose a representation for these objects in memory and in registers | ||
along with the size and alignment of the objects. The main trade-offs we have | ||
identified in this case are on performance between different types of C-level | ||
operations, whether certain hardware-level atomic operations are possible, and | ||
general familiarity of programmers with the representation. | ||
|
||
We estimate that ``_BitInt`` types will not be used in ways that make | ||
operations on them performance critical. | ||
|
||
There seem to be two different regimes for these types. The "small" regime | ||
where bit-precise types could be stored in a single register, and the "large" | ||
regime where bit-precise types must span multiple registers. | ||
|
||
Alignment and sizes | ||
------------------- | ||
|
||
These types must be at least byte-aligned so they are addressable, and at least | ||
rounded to a byte boundary in size for ``sizeof``. Since these types have an | ||
aesthetic similarity to bit-fields one might expect better packing in an array | ||
of ``_BitInt(24)`` than an array of ``int32_t`` types (i.e. packing as good as a | ||
byte-array). However, this would require a low alignment of such types, which | ||
would mean that loads and stores of even "small" ``_BitInt`` objects could cross | ||
cache-line boundaries, leading to an unnecessary performance hit and hindering | ||
any atomic operations on them. | ||
|
||
Hence for "small" sizes we are choosing to define a ``_BitInt(N)`` size and | ||
alignment according to the smallest Fundamental Data Type which has a bit-size | ||
greater than or equal to ``N``. The same applies to the ``unsigned`` versions. | ||
|
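The "small" rule above can be sketched in standard C. This is only an illustration under stated assumptions: the helper names are hypothetical, and we assume the relevant Fundamental Data Types for this regime are the 8, 16, 32 and 64-bit integers, whose alignment equals their size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: width in bits of the smallest container
   (8, 16, 32 or 64 bits) that can hold an n-bit value. */
size_t small_bitint_container_bits(unsigned n)
{
    assert(n >= 1 && n <= 64);
    size_t bits = 8;
    while (bits < n)
        bits *= 2;
    return bits;
}

/* Size in bytes that the rule would give a small _BitInt(n). */
size_t small_bitint_sizeof(unsigned n)
{
    return small_bitint_container_bits(n) / 8;
}

/* Alignment in bytes; assumed equal to the container size. */
size_t small_bitint_alignof(unsigned n)
{
    return small_bitint_sizeof(n);
}
```

Under these assumptions a ``_BitInt(24)`` takes 4 bytes with 4-byte alignment, so an array of them packs no better than an array of ``int32_t``.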
||
For "large" sizes the only approach considered has been to treat these | ||
bit-precise types as an array of ``M`` sized chunks, for some ``M``. The two | ||
"reasonable" choices for this ``M`` seem to be either register sized or | ||
double-register sized. Choosing a register sized chunk would mean smaller sizes | ||
of types for half of the values of ``N``, while choosing a double-register sized | ||
chunk would allow atomic operations on types in the range between the register | ||
and double-register sizes, because the associated extra alignment allows | ||
operations like ``CASP`` on AArch64 and ``LDRD`` on AArch32. Moreover, the | ||
majority of "large" size use-cases proposed so far have power-of-two sizes, | ||
such as SHA-256, which are not in the range that suffers in space cost from | ||
this choice. Finally, defining the ``_BitInt`` representation in this manner | ||
means that on AArch32 a ``_BitInt(64)`` has the same alignment and size as an | ||
``int64_t``, which is the largest type defined on that platform, and on AArch64 | ||
a ``_BitInt(128)`` has the same alignment and size as an ``__int128``, which is | ||
the largest type defined on that platform. This falls out of the fact that | ||
double-register size maps to the largest integral Fundamental Data Type defined | ||
on both platforms. | ||
|
||
Hence for "large" sizes we are choosing to define a ``_BitInt(N)`` size and | ||
alignment by treating them "as if" they are an array of double-register sized | ||
Fundamental Data Types. | ||
|
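As a rough sketch of the "large" rule, the size and alignment can be computed by rounding ``N`` up to a whole number of double-register sized chunks. The struct and function names are hypothetical, and the chunk sizes (16 bytes for AArch64, 8 bytes for AArch32) are our reading of "double-register sized" in the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this illustration. */
struct bitint_layout {
    size_t size;   /* bytes */
    size_t align;  /* bytes */
};

/* Layout of _BitInt(n) treated "as if" it were an array of chunks.
   chunk_bytes is assumed to be 16 on AArch64 and 8 on AArch32. */
struct bitint_layout large_bitint_layout(unsigned n, size_t chunk_bytes)
{
    assert(n >= 1);
    assert(chunk_bytes == 8 || chunk_bytes == 16);
    size_t chunk_bits = chunk_bytes * 8;
    size_t chunks = (n + chunk_bits - 1) / chunk_bits; /* round up */
    struct bitint_layout l = { chunks * chunk_bytes, chunk_bytes };
    return l;
}
```

With these assumptions a ``_BitInt(129)`` on AArch64 occupies 32 bytes, while a ``_BitInt(256)`` also occupies 32 bytes and so pays no space cost for the larger chunk.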
||
Representation in bits | ||
---------------------- | ||
|
||
There are two decisions around the representation of a "small" ``_BitInt`` that | ||
we have identified. (1) Whether required bits are stored in the least | ||
significant end of a register or most significant end of a register. (2) Whether | ||
the "remaining" bits after rounding up to the size specified in `Alignment and | ||
sizes`_ are specified or not -- with how these bits would naturally be specified | ||
depending on the choice made for (1). | ||
|
||
Options and their trade-offs | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
We have identified three viable options: | ||
|
||
A. Required bits stored in most significant end. | ||
Not-required bits are specified as zero at ABI boundaries. | ||
B. Required bits stored in least significant end. | ||
Not-required bits are unspecified at ABI boundaries. | ||
C. Required bits stored in least significant end. | ||
Not-required bits are specified as zero- or sign-extended. | ||
|
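To make the three options concrete, the following sketch models a 64-bit register with a ``uint64_t`` and builds each representation of an ``N``-bit value. It is an illustration only: the function names are hypothetical, and the ``junk`` parameter stands in for whatever happens to be in the unspecified bits of option ``B``.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low n bits set, for 1 <= n <= 64. */
uint64_t low_mask(unsigned n)
{
    assert(n >= 1 && n <= 64);
    return n == 64 ? UINT64_MAX : (UINT64_C(1) << n) - 1;
}

/* Option A: required bits at the most significant end, rest zero. */
uint64_t repr_a(uint64_t value, unsigned n)
{
    return (value & low_mask(n)) << (64 - n);
}

/* Option B: required bits at the least significant end; the upper
   bits are unspecified, modelled here by caller-supplied junk. */
uint64_t repr_b(uint64_t value, unsigned n, uint64_t junk)
{
    uint64_t m = low_mask(n);
    return (value & m) | (junk & ~m);
}

/* Option C, unsigned: least significant end, zero-extended. */
uint64_t repr_c_unsigned(uint64_t value, unsigned n)
{
    return value & low_mask(n);
}

/* Option C, signed: least significant end, sign-extended. */
uint64_t repr_c_signed(uint64_t value, unsigned n)
{
    uint64_t v = value & low_mask(n);
    uint64_t sign = UINT64_C(1) << (n - 1);
    return (v ^ sign) - sign; /* replicate bit n-1 into the upper bits */
}
```

For the 3-bit pattern ``101``, option ``A`` places it in bits 63 to 61, option ``B`` leaves bits 3 to 63 arbitrary, and the signed form of option ``C`` fills the upper bits with ones.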
||
While it would be possible to make different requirements for bit-precise | ||
integer types in memory vs in registers, we believe that the negatives of having | ||
to perform a transformation on loading and storing values and the programmer | ||
confusion associated with different representations are reason enough to not | ||
look into this option further. Especially since the differentiating factors | ||
were not drastically different between memory and register regimes. | ||
|
||
Similarly, it would be possible to define a representation that does something | ||
like specifying bits ``[2-7]`` of a ``_BitInt(2)`` but leaves bits ``[8-63]`` | ||
unspecified. This would seem to choose the worst of both worlds in terms of | ||
performance, since one must both ensure "overflow" from an addition of | ||
``_BitInt(2)`` types does not affect the specified bits **and** ensure that the | ||
unspecified bits do not affect multiplication or division operations. | ||
Hence we do not look at variations of this kind. | ||
|
||
For option ``A`` there is an extra choice around how "large" values are stored. | ||
One could either have the "padding" bits in the least significant "chunk", or | ||
the most significant "chunk". Having these padding bits in the least | ||
significant chunk would mean that something like a widening cast would require | ||
updating every "chunk" in memory, hence we assume large values of option | ||
``A`` would be represented with the padding bits in the most significant chunk. | ||
|
||
|
||
Option ``A`` has the following benefits: | ||
|
||
- For small values in memory, on AArch64, operations like ``LDADD`` and | ||
``LD{S,U}MAX`` both work (assuming the relevant register operand is | ||
appropriately shifted). | ||
|
||
- Operations ``+,-,%,==,<=,>=,<,>,<<`` all work without any extra instructions | ||
(which covers more of the common operations than the other representations). | ||
|
||
It has the following negatives: | ||
|
||
- This would be a less familiar representation to programmers. Especially the | ||
fact that a ``_BitInt(8)`` would not have the same representation in a | ||
register as a ``char`` would likely cause confusion (e.g. when debugging, or | ||
writing assembly code). This confusion would likely increase if other | ||
architectures that programmers may use have a more familiar representation. | ||
|
||
- Operations ``*,/``, saving and loading values to memory, and casting to | ||
another type would all incur extra cost. | ||
|
||
- Operations ``+,-`` on "large" values (greater than one register) would require | ||
an extra instruction to "normalize" the carry-bit. | ||
|
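The trade-offs of option ``A`` for small unsigned values can be sketched by modelling a 64-bit register with a ``uint64_t``. The function names are hypothetical and this is not a description of any compiler's code generation. Addition and comparison act directly on the shifted representation, while multiplication needs one operand shifted back down first.

```c
#include <assert.h>
#include <stdint.h>

/* Option A representation: value held in the top n bits, rest zero. */
uint64_t to_a(uint64_t value, unsigned n)
{
    assert(n >= 1 && n <= 64);
    return value << (64 - n); /* bits above n fall off the top */
}

/* Recover the unsigned value from its option A representation. */
uint64_t from_a_unsigned(uint64_t ra, unsigned n)
{
    assert(n >= 1 && n <= 64);
    return ra >> (64 - n);
}

/* Addition needs no fix-up: wrap-around of the 64-bit add is exactly
   wrap-around modulo 2^n of the stored values. */
uint64_t add_a(uint64_t ra, uint64_t rb)
{
    return ra + rb;
}

/* Unsigned comparison also works directly on the representation. */
int less_a_unsigned(uint64_t ra, uint64_t rb)
{
    return ra < rb;
}

/* Multiplication has an extra cost: one operand must be shifted down,
   otherwise the product would be scaled twice. */
uint64_t mul_a(uint64_t ra, uint64_t rb, unsigned n)
{
    return from_a_unsigned(ra, n) * rb;
}
```

For example, with ``N = 5`` adding 30 and 5 gives the representation of 3, which is the expected result modulo 32.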
||
|
||
Option ``B`` has the following benefits: | ||
|
||
- For small values in memory, the AArch64 ``LDADD`` operations work naturally. | ||
|
||
- Operations ``+,-,*,<<``, narrowing conversions, and loading/storing to memory | ||
would all naturally work. | ||
|
||
- On AArch64 this would most likely match the expectation of developers, and | ||
e.g. a ``_BitInt(8)`` would have the same representation as a ``char`` in | ||
registers. | ||
|
||
It has the following negatives: | ||
|
||
- The AArch64 ``LD{S,U}MAX`` operations would not work naturally on small values | ||
of this representation. | ||
|
||
- Operations ``/,%,==,<,>,<=,>=,>>`` and widening conversions would all | ||
require extra work. | ||
|
||
- On AArch32 this could cause surprises to developers, given that on this | ||
architecture small Fundamental Data Types have zero- or sign-extended | ||
extra bits. So a ``char`` would not have the same representation as a | ||
``_BitInt(8)`` on this architecture. | ||
|
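A small model of option ``B`` for unsigned values shows where the extra work appears. As before, a ``uint64_t`` stands in for a 64-bit register and the function names are hypothetical. Addition can ignore the unspecified upper bits, but comparison and widening must mask them off first.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low n bits set, for 1 <= n <= 64. */
uint64_t low_mask(unsigned n)
{
    assert(n >= 1 && n <= 64);
    return n == 64 ? UINT64_MAX : (UINT64_C(1) << n) - 1;
}

/* Addition works naturally: the low n bits of the sum depend only on
   the low n bits of the inputs, so junk in the upper bits is harmless. */
uint64_t add_b(uint64_t x, uint64_t y)
{
    return x + y;
}

/* Equality needs extra work: the unspecified bits must be ignored. */
int eq_b(uint64_t x, uint64_t y, unsigned n)
{
    return ((x ^ y) & low_mask(n)) == 0;
}

/* Widening an unsigned value needs a zero-extension step. */
uint64_t widen_b_unsigned(uint64_t x, unsigned n)
{
    return x & low_mask(n);
}
```

Two registers that both hold the 5-bit value 3 may differ in their upper bits, so a plain 64-bit compare could report them as unequal, whereas ``eq_b`` treats them as the same value.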
||
|
||
Option ``C`` has the following benefits: | ||
|
||
- For small values in memory, the AArch64 ``LD{S,U}MAX`` operations work | ||
naturally. | ||
|
||
- Operations ``==,<,<=,>=,>,>>``, widening conversions, and loading/storing to | ||
memory would all naturally work. | ||
|
||
- On AArch32 this could match the expectation of developers, with a | ||
``_BitInt(8)`` in a register matching the representation of a ``char``. | ||
|
||
It has the following negatives: | ||
|
||
- The AArch64 ``LDADD`` operations would not work naturally. | ||
|
||
- Operations ``+,-,*,<<`` would all require masking at an ABI | ||
boundary. | ||
|
||
- On AArch64 this would not match the expectation of developers, with | ||
``_BitInt(8)`` not matching the representation of a ``char``. | ||
|
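The mirror image for option ``C`` can be sketched the same way for unsigned values, again with hypothetical function names and a ``uint64_t`` standing in for a 64-bit register. Comparisons can use the whole register because the upper bits are known to be zero, but arithmetic results must be masked before they cross an ABI boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low n bits set, for 1 <= n <= 64. */
uint64_t low_mask(unsigned n)
{
    assert(n >= 1 && n <= 64);
    return n == 64 ? UINT64_MAX : (UINT64_C(1) << n) - 1;
}

/* Equality works naturally: both inputs are already zero-extended. */
int eq_c(uint64_t x, uint64_t y)
{
    return x == y;
}

/* So does ordering of unsigned values. */
int less_c_unsigned(uint64_t x, uint64_t y)
{
    return x < y;
}

/* Addition can carry into the extension bits, so the result has to
   be masked to restore the zero-extended form. */
uint64_t add_c_unsigned(uint64_t x, uint64_t y, unsigned n)
{
    return (x + y) & low_mask(n);
}
```

With ``N = 5``, adding 30 and 5 in a 64-bit register yields 35, which is not a valid zero-extended 5-bit value until it is masked down to 3.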
||
Summary, suggestion, and reasoning | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
Overall it seems that for operations on small values option ``A`` is more | ||
performant. However, when acting on "large" values (i.e. greater than the size | ||
of one register) it loses some of that benefit. Loading from and storing to | ||
memory would also come at a cost for this representation. This is also likely | ||
to be the most surprising representation for developers on an Arm platform. | ||
|
||
Between option ``B`` and option ``C`` there is not a great difference in | ||
performance characteristics. However, it should be noted that option ``C`` is | ||
likely the most natural extension of the AArch32 PCS rules for unspecified bits | ||
in a register containing a small Fundamental Data Type, while option ``B`` is | ||
the most natural extension of the similar rules in AArch64 PCS. | ||
|
||
As mentioned above, we do not expect operations on ``_BitInt`` types to be | ||
performance critical. Given that providing a productive environment for | ||
developers is valuable and following the "principle of least surprise" is a | ||
good way to achieve that, we suggest choosing option ``C`` for AArch32 and | ||
option ``B`` for AArch64. |