Rationale document for bit-precise types _BitInt
This type has been added to the C2x specification. Alongside changes
describing how these types are represented at the machine level, we also add a
design document describing the rationale behind the choices we made.
mmalcomson committed Sep 12, 2023
1 parent ad4f088 commit 2208af1
213 changes: 213 additions & 0 deletions design-documents/bit-precise-types.rst
..
   Copyright (c) 2023, Arm Limited and its affiliates. All rights reserved.
   CC-BY-SA-4.0 AND Apache-Patent-License
   See LICENSE file for details

Rationale Document for ABI related to the C23 _BitInt type.
***********************************************************

Preamble
========

Background
----------

This document describes the rationale behind the ABI choices made for the
bit-precise integer types defined in C23 (previously referred to as C2x). These
are ``_BitInt(N)`` and ``unsigned _BitInt(N)``. They are defined for an integer
constant expression ``N`` (up to the implementation-defined limit
``BITINT_MAXWIDTH``), and each value of ``N`` gives a distinct type.

The proposal for these types can be found at the following link:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2763.pdf

As the proposal's rationale mentions, some applications have uses for a type of
a specific bit width. For example, when writing C code that is used to generate
FPGA hardware, such types can lead to large performance and space savings.

From the perspective of the Arm ABI, there are several trade-offs to resolve. We
need to choose a representation for these objects in memory and in registers,
along with their size and alignment. The main trade-offs we have identified are
between the performance of different kinds of C-level operation, whether certain
hardware-level atomic operations are possible, and how familiar programmers are
with the representation.

For this particular type, we expect that operations on ``_BitInt`` types will
not generally be performance critical.

There seem to be two distinct regimes for these types: the "small" regime, where
a bit-precise type can be stored in a single register, and the "large" regime,
where it must span multiple registers.

Alignment and sizes
-------------------

These types must be at least byte-aligned so that they are addressable, and
their size must be rounded up to a whole number of bytes for ``sizeof``. Since
these types have an aesthetic similarity to bit-fields, one might expect an
array of ``_BitInt(24)`` to be packed more tightly than an array of ``int32_t``
(i.e. three bytes per element, as tightly as a byte array). However, this would
require such types to have a low alignment, which would mean that loads and
stores of even "small" ``_BitInt`` values could cross cache-line boundaries.
That would cause an unnecessary performance hit and would hinder atomic
operations on these types.
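
To make the cache-line argument concrete, the following is a minimal C sketch
that counts how many elements of an array straddle a cache-line boundary. It
assumes 64-byte cache lines and an array that starts on a line boundary; the
function name ``count_straddling`` is hypothetical and not part of any ABI or
library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts how many of the first `count` elements of an
   array straddle a cache-line boundary. Elements are `size` bytes long and
   placed `stride` bytes apart, starting at a line-aligned address. */
static size_t count_straddling(size_t count, size_t stride, size_t size,
                               size_t line)
{
    assert(size >= 1 && size <= stride && size <= line);
    size_t straddling = 0;
    for (size_t i = 0; i < count; ++i) {
        size_t first = i * stride;      /* offset of the element's first byte */
        size_t last = first + size - 1; /* offset of its last byte */
        if (first / line != last / line)
            ++straddling;
    }
    return straddling;
}
```

For example, a byte-packed array of 64 three-byte ``_BitInt(24)`` values gives
``count_straddling(64, 3, 3, 64) == 2`` (elements 21 and 42 cross a line),
while 4-byte elements with 4-byte alignment never straddle a line.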

Hence for "small" sizes we choose to define the size and alignment of a
``_BitInt(N)`` as those of the smallest Fundamental Data Type whose bit-size is
greater than or equal to ``N``. The same applies to the ``unsigned`` versions.
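
As a sketch of the "small" rule, the size (and alignment) in bytes can be
computed as below. This assumes the integral Fundamental Data Types are 8, 16,
32 and 64 bits wide, and that "small" means ``N`` is at most the register width
(32 on AArch32, 64 on AArch64). The helper name ``small_bitint_bytes`` is
hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: size (and alignment) in bytes of the smallest
   integral Fundamental Data Type with at least n bits, assuming those
   types are 8, 16, 32 and 64 bits wide. reg_bits is the register width. */
static unsigned small_bitint_bytes(unsigned n, unsigned reg_bits)
{
    assert(n >= 1 && n <= reg_bits); /* only the "small" regime */
    unsigned bits = 8;               /* smallest type: a byte */
    while (bits < n)
        bits *= 2;                   /* 8 -> 16 -> 32 -> 64 */
    return bits / 8;
}
```

Under these assumptions ``_BitInt(24)`` gets the size and alignment of a 32-bit
integer (4 bytes), and ``_BitInt(33)`` on AArch64 gets those of a 64-bit integer
(8 bytes).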

For "large" sizes, the only approach considered has been to treat these
bit-precise types as an array of ``M``-sized chunks, for some ``M``. The two
"reasonable" choices for ``M`` seem to be either register-sized or
double-register-sized. Choosing a register-sized chunk would give smaller types
for half of the values of ``N``, while choosing a double-register-sized chunk
would allow atomic operations on types whose sizes lie between the register and
double-register sizes, because the associated extra alignment allows operations
like ``CASP`` on AArch64 and ``LDRD`` on AArch32. Moreover, the majority of
"large" use-cases proposed so far are power-of-two sizes, such as the 256 bits
used by SHA-256, which do not fall in the range that pays a space cost for this
choice. Finally, defining the ``_BitInt`` representation in this manner means
that on AArch32 a ``_BitInt(64)`` has the same alignment and size as an
``int64_t``, which is the largest integral type defined on that platform, and on
AArch64 a ``_BitInt(128)`` has the same alignment and size as an ``__int128``,
which is the largest integral type defined on that platform. This follows from
the fact that the double-register size maps to the largest integral Fundamental
Data Type defined on both platforms.

Hence for "large" sizes we are choosing to define a ``_BitInt(N)`` size and
alignment by treating them "as if" they are an array of double-register sized
Fundamental Data Types.
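
The "large" rule, and the space-cost comparison between register-sized and
double-register-sized chunks discussed above, can be sketched in C as follows.
This is an illustration under stated assumptions rather than normative ABI text:
sizes are in bytes, ``chunk_bits`` is 64 or 128 on AArch64 (32 or 64 on
AArch32), and the names ``large_bitint_layout`` and
``count_smaller_with_register_chunks`` are hypothetical.

```c
#include <assert.h>

/* Size and alignment in bytes (hypothetical type for this sketch). */
struct layout {
    unsigned size;
    unsigned align;
};

/* Lay out a _BitInt(n) "as if" it were an array of chunk_bits-sized
   integers: round n up to whole chunks, align to one chunk. */
static struct layout large_bitint_layout(unsigned n, unsigned chunk_bits)
{
    assert(n > 0 && chunk_bits % 8 == 0);
    unsigned chunks = (n + chunk_bits - 1) / chunk_bits; /* round up */
    struct layout l = { chunks * (chunk_bits / 8), chunk_bits / 8 };
    return l;
}

/* Count the widths n in (lo, hi] for which register-sized chunks would
   give a strictly smaller type than double-register-sized chunks. */
static unsigned count_smaller_with_register_chunks(unsigned lo, unsigned hi,
                                                   unsigned reg_bits)
{
    unsigned count = 0;
    for (unsigned n = lo + 1; n <= hi; ++n)
        if (large_bitint_layout(n, reg_bits).size <
            large_bitint_layout(n, 2 * reg_bits).size)
            ++count;
    return count;
}
```

With AArch64 parameters, ``large_bitint_layout(129, 128)`` gives a 32-byte,
16-byte-aligned type, and ``count_smaller_with_register_chunks(128, 1024, 64)``
returns 448: register-sized chunks would be smaller for exactly half of the 896
widths from 129 to 1024, namely those where ``N`` modulo 128 lies between 1 and
64.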

Representation in bits
----------------------

We have identified two decisions about the representation of a "small"
``_BitInt``: (1) whether the required bits are stored at the least significant
or the most significant end of a register, and (2) whether the "remaining" bits
after rounding up to the size specified in `Alignment and sizes`_ are specified
or not, where the natural way to specify these bits depends on the choice made
for (1).

Options and their trade-offs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We have identified three viable options:

A. Required bits stored at the most significant end.
   Not-required bits are specified as zero at ABI boundaries.
B. Required bits stored at the least significant end.
   Not-required bits are unspecified at ABI boundaries.
C. Required bits stored at the least significant end.
   Not-required bits are specified as zero- or sign-extended at ABI boundaries.
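
To illustrate the three options, the minimal C sketch below builds the 64-bit
register image of a signed ``_BitInt(N)`` value under each one, assuming
AArch64's 64-bit registers and ``N`` between 2 and 64. The function names are
hypothetical, and the ``junk`` parameter of ``repr_b`` stands in for whatever
the unspecified bits happen to contain.

```c
#include <assert.h>
#include <stdint.h>

/* Option A: value in the most significant n bits, remaining bits zero. */
static uint64_t repr_a(int64_t v, unsigned n)
{
    assert(n >= 2 && n <= 64);
    return (uint64_t)v << (64 - n);
}

/* Option B: value in the least significant n bits; the remaining bits are
   unspecified, modelled here by copying them from `junk`. */
static uint64_t repr_b(int64_t v, unsigned n, uint64_t junk)
{
    assert(n >= 2 && n <= 64);
    uint64_t mask = n == 64 ? UINT64_MAX : (UINT64_C(1) << n) - 1;
    return ((uint64_t)v & mask) | (junk & ~mask);
}

/* Option C: value in the least significant n bits, sign-extended to 64
   bits (an unsigned _BitInt would be zero-extended instead). */
static uint64_t repr_c(int64_t v, unsigned n)
{
    assert(n >= 2 && n <= 64);
    return (uint64_t)v; /* an int64_t is already sign-extended */
}
```

For ``_BitInt(8)`` holding ``-1``, option ``A`` gives ``0xFF00000000000000``,
option ``C`` gives ``0xFFFFFFFFFFFFFFFF``, and option ``B`` gives ``0xFF`` in the
low byte with the remaining bits left as they were.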

While it would be possible to make different requirements for bit-precise
integer types in memory and in registers, we believe that the cost of
transforming values on every load and store, together with the programmer
confusion caused by having two representations, is reason enough not to pursue
this option further, especially since the differentiating factors were not
drastically different between the memory and register regimes.

Similarly, it would be possible to define a representation that does something
like specifying bits ``[2-7]`` of a ``_BitInt(2)`` but leaves bits ``[8-63]``
unspecified. This would seem to choose the worst of both worlds in terms of
performance, since one must both ensure "overflow" from an addition of
``_BitInt(2)`` types does not affect the specified bits **and** ensure that the
unspecified bits do not affect multiplication or division operations.
Hence we do not look at variations of this kind.

For option ``A`` there is an extra choice about how "large" values are stored:
the "padding" bits could be placed either in the least significant "chunk" or in
the most significant "chunk". Placing them in the least significant chunk would
mean that operations such as a widening cast require updating every "chunk" in
memory, hence we assume that large values in option ``A`` are represented with
the padding bits in the most significant chunk.


Option ``A`` has the following benefits:

- For small values in memory on AArch64, atomic operations like ``LDADD`` and
  ``LD{S,U}MAX`` both work (assuming the relevant register operand is
  appropriately shifted).

- Operations ``+,-,%,==,<=,>=,<,>,<<`` all work without any extra instructions
  (this covers more of the common operations than the other representations
  do).

It has the following negatives:

- This representation would be less familiar to programmers. In particular,
  the fact that a ``_BitInt(8)`` would not have the same representation in a
  register as a ``char`` would likely cause confusion (e.g. when debugging or
  writing assembly code). This confusion would likely be greater for
  programmers who also use other architectures with a more familiar
  representation.

- Operations ``*,/``, loading values from and storing values to memory, and
  casting to another type would all incur extra cost.

- Operations ``+,-`` on "large" values (greater than one register) would require
  an extra instruction to "normalize" the carry bit.
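
The following minimal C sketch models option ``A`` on 64-bit register images
(for example, AArch64 general-purpose registers) to show why addition and
comparison need no extra instructions while multiplication needs one extra
shift. The function names are hypothetical, and the conversion from ``uint64_t``
to ``int64_t`` relies on the two's complement behavior that GCC and Clang
provide.

```c
#include <assert.h>
#include <stdint.h>

/* Register images hold a signed _BitInt(n) in the top n bits, with the
   low 64 - n bits zero. */

/* +: a plain 64-bit add. A carry out of the n-bit field falls off the top
   of the register, so the result is already the wrapped n-bit sum, and
   the low bits stay zero. */
static uint64_t a_add(uint64_t x, uint64_t y)
{
    return x + y;
}

/* <: a plain signed 64-bit compare, because both values are scaled by the
   same power of two. */
static int a_less(uint64_t x, uint64_t y)
{
    return (int64_t)x < (int64_t)y;
}

/* *: one operand must be shifted down first; otherwise the product would
   be scaled by 2^(64-n) twice. This shift is the extra cost. */
static uint64_t a_mul(uint64_t x, uint64_t y, unsigned n)
{
    assert(n >= 2 && n <= 64);
    return (x >> (64 - n)) * y;
}
```

For ``_BitInt(8)``, multiplying the images of ``-3`` (``0xFD`` in the top byte)
and ``5`` yields ``0xF1`` in the top byte, which is ``-15``.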


Option ``B`` has the following benefits:

- For small values in memory, the AArch64 ``LDADD`` operations work naturally.

- Operations ``+,-,*,<<``, narrowing conversions, and loading/storing to memory
  would all naturally work.

- On AArch64 this would most likely match the expectation of developers; for
  example, a ``_BitInt(8)`` would have the same representation as a ``char`` in
  registers.

It has the following negatives:

- The AArch64 ``LD{S,U}MAX`` operations would not work naturally on small values
  of this representation.

- Operations ``/,%,==,<,>,<=,>=,>>`` and widening conversions would require
  extra work, since the unspecified bits must first be cleared or extended.

- On AArch32 this could surprise developers, given that on this architecture
  small Fundamental Data Types have their extra bits zero- or sign-extended. So
  a ``char`` would not have the same representation as a ``_BitInt(8)`` on this
  architecture.
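
The following minimal C sketch models option ``B`` on 64-bit register images:
addition can ignore the unspecified upper bits, because carries only propagate
towards the most significant end, but a comparison must first sign-extend both
operands. The function names are hypothetical, and ``b_sext`` relies on the
two's complement conversion and arithmetic right shift that GCC and Clang
provide.

```c
#include <assert.h>
#include <stdint.h>

/* Register images hold a signed _BitInt(n) in the low n bits; the upper
   64 - n bits are unspecified (arbitrary). */

/* Sign-extend the low n bits, discarding whatever the upper bits held. */
static int64_t b_sext(uint64_t x, unsigned n)
{
    assert(n >= 2 && n <= 64);
    return (int64_t)(x << (64 - n)) >> (64 - n);
}

/* +: a plain add. Garbage in the upper bits cannot reach the low n bits. */
static uint64_t b_add(uint64_t x, uint64_t y)
{
    return x + y;
}

/* <: both operands must be normalized first. This is the extra work. */
static int b_less(uint64_t x, uint64_t y, unsigned n)
{
    return b_sext(x, n) < b_sext(y, n);
}
```

For example, with ``N = 8`` the image ``0x12340000000000FD`` holds ``-3`` and
``0xDEAD000000000005`` holds ``5``; ``b_less`` orders them correctly, whereas a
plain signed 64-bit comparison of the raw images would not.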


Option ``C`` has the following benefits:

- For small values in memory, the AArch64 ``LD{S,U}MAX`` operations work
  naturally.

- Operations ``==,<,<=,>=,>,>>``, widening conversions, and loading/storing to
  memory would all naturally work.

- On AArch32 this could match the expectation of developers, with a
  ``_BitInt(8)`` in a register matching the representation of a ``char``.

It has the following negatives:

- The AArch64 ``LDADD`` operations would not work naturally.

- Operations ``+,-,*,<<`` would all require masking (re-extension of the result)
  at an ABI boundary.

- On AArch64 this would not match the expectation of developers, with a
  ``_BitInt(8)`` not matching the representation of a ``char``.
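
The following minimal C sketch models option ``C`` on 64-bit register images:
comparisons operate on the extended values directly, but the result of an
addition must be re-extended before it crosses an ABI boundary. The function
names are hypothetical, and ``c_normalize`` relies on the two's complement
conversion and arithmetic right shift that GCC and Clang provide.

```c
#include <assert.h>
#include <stdint.h>

/* Register images hold a signed _BitInt(n) in the low n bits, sign-extended
   into the upper 64 - n bits. */

/* Re-establish the sign extension after an operation that may have
   disturbed the upper bits. */
static uint64_t c_normalize(uint64_t x, unsigned n)
{
    assert(n >= 2 && n <= 64);
    return (uint64_t)((int64_t)(x << (64 - n)) >> (64 - n));
}

/* ==, <: plain 64-bit compares work directly on extended values. */
static int c_less(uint64_t x, uint64_t y)
{
    return (int64_t)x < (int64_t)y;
}

/* +: the raw sum may overflow the n-bit field and leave the upper bits
   inconsistent, so it is re-extended. This is the extra work. */
static uint64_t c_add(uint64_t x, uint64_t y, unsigned n)
{
    return c_normalize(x + y, n);
}
```

With ``N = 8``, ``c_add(127, 1, 8)`` wraps to ``-128`` and returns
``0xFFFFFFFFFFFFFF80``; the raw sum ``0x80`` would not be a valid option ``C``
value.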

Summary, suggestion, and reasoning
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Overall, it seems that option ``A`` is the most performant for operations on
small values. However, when acting on "large" values (i.e. greater than the size
of one register) it loses some of that benefit. Loading from and storing to
memory would also come at a cost for this representation. It is also likely to
be the most surprising representation for developers on an Arm platform.

Between option ``B`` and option ``C`` there is not a great difference in
performance characteristics. However, option ``C`` is likely the most natural
extension of the AArch32 PCS rules for unspecified bits in a register containing
a small Fundamental Data Type, while option ``B`` is the most natural extension
of the corresponding rules in the AArch64 PCS.

As mentioned above, we do not expect operations on ``_BitInt`` types to be
performance critical. Given that providing a productive environment for
developers is valuable and following the "principle of least surprise" is a
good way to achieve that, we suggest choosing option ``C`` for AArch32 and
option ``B`` for AArch64.
