From 3cb0f9fc3cba1b9f36bb6e1d6b60ba251ae2b0e7 Mon Sep 17 00:00:00 2001 From: lukeg101 <6547672+lukeg101@users.noreply.github.com> Date: Fri, 5 Apr 2024 21:54:48 +0100 Subject: [PATCH 01/17] [ATOMICSABI64]: Alpha Draft of Atomics ABI MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This is the Alpha draft of the ABI for the "C/C++ Atomics Application Binary Interface Standard for the Arm® 64-bit Architecture" This document describes the C/C++ Atomics Application Binary Interface for the Arm 64-bit architecture. This document concerns the valid mappings from C/C++ Atomic Operations to sequences of A64 instructions. This document does not support Armv7. For matters concerning the memory model, please consult §B2 of the Arm Architecture Reference Manual. We focus only on a subset of the C11 atomic operations and their mapping to A64 instructions at the time of writing. More atomics will be added. Co-Authored with Wilco Dijkstra (@Wilco1). --- CONTRIBUTING.md | 1 + README.md | 1 + atomicsabi64/Arm_logo_blue_RGB.svg | 15 + atomicsabi64/CONTRIBUTIONS | 3 + atomicsabi64/LICENSE | 22 + atomicsabi64/README.md | 38 + atomicsabi64/TRADEMARK_NOTICE | 8 + atomicsabi64/atomicsabi64.rst | 1161 ++++++++++++++++++++++++ tools/common/check-rst-syntax.sh | 3 + tools/common/generate-release-links.sh | 1 + tools/rst2pdf/generate-pdfs.sh | 3 + 11 files changed, 1256 insertions(+) create mode 100644 atomicsabi64/Arm_logo_blue_RGB.svg create mode 100644 atomicsabi64/CONTRIBUTIONS create mode 100644 atomicsabi64/LICENSE create mode 100644 atomicsabi64/README.md create mode 100644 atomicsabi64/TRADEMARK_NOTICE create mode 100644 atomicsabi64/atomicsabi64.rst diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index fa836e48..3db4550f 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -109,6 +109,7 @@ document | owner | Github handle [Morello extensions to ELF for the Arm 64-bit 
Architecture](https://github.com/ARM-software/abi-aa/tree/master/aaelf64-morello) | Silviu Baranga | @sbaranga-arm [Morello Descriptor ABI for the Arm 64-bit Architecture](https://github.com/ARM-software/abi-aa/tree/master/descabi-morello) | Silviu Baranga | @sbaranga-arm [Memtag ABI Extension to ELF for the Arm 64-bit Architecture](https://github.com/ARM-software/abi-aa/tree/master/memtagabielf64) | Mitch Phillips | @hctim +[C/C++ Atomics Application Binary Interface Standard for the Arm 64-bit Architecture](https://github.com/ARM-software/abi-aa/tree/master/atomicsabi64) | Luke Geeson | @lukeg101 3. Merging the change diff --git a/README.md b/README.md index 571a0e0d..973d82a6 100644 --- a/README.md +++ b/README.md @@ -71,6 +71,7 @@ ELF for the Arm 64-bit Architecture | [aaelf64](a DWARF for the Arm 64-bit Architecture | [aadwarf64](aadwarf64/aadwarf64.rst) | [2020Q2](legacy-documents/aadwarf64/ihi0057_E/IHI0057_E_2020Q2_aadwarf64.pdf) C++ ABI for the Arm 64-bit Architecture | [cppabi64](cppabi64/cppabi64.rst) | [2020Q2](legacy-documents/cppabi64/ihi0059_E/IHI0059E_2020Q2_cppabi64.pdf) Vector Function ABI for the Arm 64-bit Architecture | [vfabia64](vfabia64/vfabia64.rst) | [2019Q2](legacy-documents/vfabia64/101129_1920/101129_1920_01_en.pdf) +C/C++ Atomics ABI for the Arm 64-bit Architecture | [atomicsabi64](atomicsabi64/atomicsabi64.rst) | n/a ### ABI for the Arm 64-bit Architecture with SVE support diff --git a/atomicsabi64/Arm_logo_blue_RGB.svg b/atomicsabi64/Arm_logo_blue_RGB.svg new file mode 100644 index 00000000..1f9a9ba1 --- /dev/null +++ b/atomicsabi64/Arm_logo_blue_RGB.svg @@ -0,0 +1,15 @@ + + + + + + diff --git a/atomicsabi64/CONTRIBUTIONS b/atomicsabi64/CONTRIBUTIONS new file mode 100644 index 00000000..113f5fa6 --- /dev/null +++ b/atomicsabi64/CONTRIBUTIONS @@ -0,0 +1,3 @@ +Contributions to this project are licensed under an inbound=outbound +model such that any such contributions are licensed by the contributor +under the same terms as those in the 
LICENSE file. diff --git a/atomicsabi64/LICENSE b/atomicsabi64/LICENSE new file mode 100644 index 00000000..aa6d8392 --- /dev/null +++ b/atomicsabi64/LICENSE @@ -0,0 +1,22 @@ +This work is licensed under the Creative Commons +Attribution-ShareAlike 4.0 International License. To view a copy of +this license, visit http://creativecommons.org/licenses/by-sa/4.0/ or +send a letter to Creative Commons, PO Box 1866, Mountain View, CA +94042, USA. + +Grant of Patent License. Subject to the terms and conditions of this +license (both the Public License and this Patent License), each +Licensor hereby grants to You a perpetual, worldwide, non-exclusive, +no-charge, royalty-free, irrevocable (except as stated in this +section) patent license to make, have made, use, offer to sell, sell, +import, and otherwise transfer the Licensed Material, where such +license applies only to those patent claims licensable by such +Licensor that are necessarily infringed by their contribution(s) alone +or by combination of their contribution(s) with the Licensed Material +to which such contribution(s) was submitted. If You institute patent +litigation against any entity (including a cross-claim or counterclaim +in a lawsuit) alleging that the Licensed Material or a contribution +incorporated within the Licensed Material constitutes direct or +contributory patent infringement, then any licenses granted to You +under this license for that Licensed Material shall terminate as of +the date such litigation is filed. diff --git a/atomicsabi64/README.md b/atomicsabi64/README.md new file mode 100644 index 00000000..64136bd4 --- /dev/null +++ b/atomicsabi64/README.md @@ -0,0 +1,38 @@ +
+ +
+ +# Atomics ABI for the Arm® 64-bit Architecture (AArch64) + + +## About this document + +This document describes the [Application Binary Interface for the use +of code generated by compiling C/C++ atomics targeting the Arm 64-bit architecture](atomicsabi64.rst). + +## About the license + +As identified more fully in the [LICENSE](LICENSE) file, this project +is licensed under CC-BY-SA-4.0 along with an additional patent +license. The language in the additional patent license is largely +identical to that in Apache-2.0 (specifically, Section 3 of Apache-2.0 +as reflected at https://www.apache.org/licenses/LICENSE-2.0) with two +exceptions. + +First, several changes were made related to the defined terms so as to +reflect the fact that such defined terms need to align with the +terminology in CC-BY-SA-4.0 rather than Apache-2.0 (e.g., changing +“Work” to “Licensed Material”). + +Second, the defensive termination clause was changed such that the +scope of defensive termination applies to “any licenses granted to +You” (rather than “any patent licenses granted to You”). This change +is intended to help maintain a healthy ecosystem by providing +additional protection to the community against patent litigation +claims. + +## Defects report + +Please report defects in the [Atomics Application Binary Interface (ABI) +for the Arm 64-bit architecture](atomicsabi64.rst) to the [issue tracker +page on GitHub](https://github.com/ARM-software/abi-aa/issues). diff --git a/atomicsabi64/TRADEMARK_NOTICE b/atomicsabi64/TRADEMARK_NOTICE new file mode 100644 index 00000000..9a7a7252 --- /dev/null +++ b/atomicsabi64/TRADEMARK_NOTICE @@ -0,0 +1,8 @@ +The text of and illustrations in this document are licensed +under a Creative Commons Attribution–Share Alike 4.0 International +license ("CC-BY-SA-4.0”), with an additional clause on patents. +The Arm trademarks featured here are registered trademarks or +trademarks of Arm Limited (or its subsidiaries) in the US and/or +elsewhere. 
All rights reserved. Please visit +https://www.arm.com/company/policies/trademarks for more information +about Arm’s trademarks. diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst new file mode 100644 index 00000000..128067a3 --- /dev/null +++ b/atomicsabi64/atomicsabi64.rst @@ -0,0 +1,1161 @@ +.. + Copyright (c) 2024, Arm Limited and its affiliates. All rights reserved. + CC-BY-SA-4.0 AND Apache-Patent-License + See LICENSE file for details + +.. |release| replace:: 2024Q1 +.. |date-of-issue| replace:: 5\ :sup:`th` April 2024 +.. |copyright-date| replace:: 2024 +.. |footer| replace:: Copyright © |copyright-date|, Arm Limited and its + affiliates. All rights reserved. + +.. _ARMARM: https://developer.arm.com/documentation/ddi0487/latest +.. _AAELF64: https://github.com/ARM-software/abi-aa/releases +.. _CPPABI64: https://github.com/ARM-software/abi-aa/releases +.. _CSTD: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf + +********************************************************************************************* +C/C++ Atomics Application Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture +********************************************************************************************* + +.. class:: version + +|release| + +.. class:: issued + +Date of Issue: |date-of-issue| + +.. class:: logo + +.. image:: Arm_logo_blue_RGB.svg + :scale: 30% + +.. section-numbering:: + +.. raw:: pdf + + PageBreak oneColumn + + +Preamble +======== + +Abstract +-------- + +This document describes the C/C++ Atomics Application Binary Interface for the +Arm 64-bit architecture. This document concerns the valid mappings from C/C++ +Atomic Operations to sequences of A64 instructions. For matters concerning the +memory model, please consult §B2 of the Arm Architecture Reference Manual +[ARMARM_]. We focus only on a subset of the C11 atomic operations at the time +of writing. 
+ +Keywords +-------- + +C++, C, Application Binary Interface, ABI, AArch64, C++ ABI, generic C++ ABI, +Atomics, Concurrency + +Latest release and defects report +--------------------------------- + +Please check `Atomics Application Binary Interface for the Arm® Architecture +`_ for the latest +release of this document. + +Please report defects in this specification to the `issue tracker page +on GitHub +`_. + +.. raw:: pdf + + PageBreak + +Acknowledgement +--------------- + +This document arose from Luke Geeson’s PhD research on testing the +compilation of concurrent C/C++, with assistance from Wilco Dijkstra of Arm's +compiler teams. + + + +Licence +------- + +This work is licensed under the Creative Commons +Attribution-ShareAlike 4.0 International License. To view a copy of +this license, visit http://creativecommons.org/licenses/by-sa/4.0/ or +send a letter to Creative Commons, PO Box 1866, Mountain View, CA +94042, USA. + +Grant of Patent License. Subject to the terms and conditions of this +license (both the Public License and this Patent License), each +Licensor hereby grants to You a perpetual, worldwide, non-exclusive, +no-charge, royalty-free, irrevocable (except as stated in this +section) patent license to make, have made, use, offer to sell, sell, +import, and otherwise transfer the Licensed Material, where such +license applies only to those patent claims licensable by such +Licensor that are necessarily infringed by their contribution(s) alone +or by combination of their contribution(s) with the Licensed Material +to which such contribution(s) was submitted.
If You institute patent +litigation against any entity (including a cross-claim or counterclaim +in a lawsuit) alleging that the Licensed Material or a contribution +incorporated within the Licensed Material constitutes direct or +contributory patent infringement, then any licenses granted to You +under this license for that Licensed Material shall terminate as of +the date such litigation is filed. + +About the license +----------------- + +As identified more fully in the Licence_ section, this project +is licensed under CC-BY-SA-4.0 along with an additional patent +license. The language in the additional patent license is largely +identical to that in Apache-2.0 (specifically, Section 3 of Apache-2.0 +as reflected at https://www.apache.org/licenses/LICENSE-2.0) with two +exceptions. + +First, several changes were made related to the defined terms so as to +reflect the fact that such defined terms need to align with the +terminology in CC-BY-SA-4.0 rather than Apache-2.0 (e.g., changing +“Work” to “Licensed Material”). + +Second, the defensive termination clause was changed such that the +scope of defensive termination applies to “any licenses granted to +You” (rather than “any patent licenses granted to You”). This change +is intended to help maintain a healthy ecosystem by providing +additional protection to the community against patent litigation +claims. + +Contributions +------------- + +Contributions to this project are licensed under an inbound=outbound +model such that any such contributions are licensed by the contributor +under the same terms as those in the `Licence`_ section. + +Trademark notice +---------------- + +The text of and illustrations in this document are licensed by Arm +under a Creative Commons Attribution–Share Alike 4.0 International +license ("CC-BY-SA-4.0”), with an additional clause on patents. +The Arm trademarks featured here are registered trademarks or +trademarks of Arm Limited (or its subsidiaries) in the US and/or +elsewhere. 
All rights reserved. Please visit +https://www.arm.com/company/policies/trademarks for more information +about Arm’s trademarks. + +Copyright +--------- + +Copyright (c) |copyright-date|, Arm Limited and its affiliates. All rights +reserved. + +.. raw:: pdf + + PageBreak + +.. contents:: + :depth: 3 + +.. raw:: pdf + + PageBreak + +About this document +=================== + +Change control +-------------- + +Current status and anticipated changes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The following support level definitions are used by the Arm Atomics ABI +specifications: + +**Release** + Arm considers this specification to have enough implementations, which have + received sufficient testing, to verify that it is correct. The details of + these criteria are dependent on the scale and complexity of the change over + previous versions: small, simple changes might only require one + implementation, but more complex changes require multiple independent + implementations, which have been rigorously tested for cross-compatibility. + Arm anticipates that future changes to this specification will be limited to + typographical corrections, clarifications and compatible extensions. + +**Beta** + Arm considers this specification to be complete, but existing + implementations do not meet the requirements for confidence in its release + quality. Arm may need to make incompatible changes if issues emerge from its + implementation. + +**Alpha** + The content of this specification is a draft, and Arm considers the + likelihood of future incompatible changes to be significant. + +All content in this document is at the **Alpha** quality level. + +Change History +-------------- + +If there is no entry in the change history table for a release, there are no +changes to the content of the document for that release. + +.. class:: atomicsabi64-change-history + +.. 
table:: + + +---------+------------------------------+-------------------------------------------------------------------+ + | Issue | Date | Change | + +=========+==============================+===================================================================+ + | 00alp0 | 5\ :sup:`th` April 2024. | Alpha release. | + +---------+------------------------------+-------------------------------------------------------------------+ + + +References +---------- + +This document refers to, or is referred to by, the following documents. + +.. table:: + + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + | Ref | External reference or URL | Title | + +=============+==============================================================+=============================================================================+ + | ARMARM_ | DDI 0487 | Arm Architecture Reference Manual Armv8 for Armv8-A architecture profile | + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + | CSTD_ | ISO/IEC 9899:2018 | International Standard ISO/IEC 9899:2018 – Programming languages C. | + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + +Note: At the time of writing, C23 has not been published, so ISO C17 is considered the latest published standard. + +.. raw:: pdf + + PageBreak + +Terms and Abbreviations +----------------------- + +The Atomics ABI for the Arm 64-bit Architecture uses the following terms and +abbreviations. + +A64 + The instruction set available when in AArch64 state. + +AArch64 + The 64-bit general-purpose register width state of the Armv8 architecture. + +ABI + Application Binary Interface: + + 1. 
The specifications to which an executable must conform in order to + execute in a specific execution environment. For example, the + :title-reference:`Linux ABI for the Arm Architecture`. + + 2. A particular aspect of the specifications to which independently + produced relocatable files must conform in order to be statically + linkable and executable. For example, the C++ ABI for the Arm 64-bit + Architecture [CPPABI64_], or ELF for the Arm Architecture [AAELF64_]. + +Arm-based + ... based on the Arm architecture ... + +Concurrent Program + A C or C++ program that consists of one or more Threads of Execution. For + the program to be deemed *concurrent*, each Thread of Execution must + communicate with the other threads through Shared-Memory Locations using + Atomic Operations. + +Thread of Execution + A unit of computation that executes one or more Atomic Operations, + Synchronization Operations or other C language statements. The Arm + Architecture Reference Manual [ARMARM_] calls these *Observers*. Typically, a + thread is defined as a function (for example, a POSIX thread), although we do + not limit threads to such implementations. + +Atomic Operation + A C/C++ operation on a Shared-Memory Location. Typically, it is a load, + store, exchange, compare-and-exchange, or arithmetic operation (such as a + fetch-and-add). Atomics are used to build higher-level primitives such as + locks and concurrent queues. ISO C defines the range of supported atomic + operations and the ``_Atomic`` type qualifier. Operations on atomic-qualified + data are indivisible: no other Thread of Execution can observe them in a + partially completed state. + +Synchronization Operation + The order in which each Thread of Execution performs its atomic operations + may differ from the order in which they are written in the program. + Synchronization Operations are statements that constrain the order of + accesses made to Shared-Memory Locations by each thread. Synchronization
Synchronization + Operations include Thread Fences, and certain control flow structures. + +Shared-Memory Location + A memory location that can be accessed by any Thread of Execution in the + program. + +Memory Order Parameters + Describes a constraint on an Atomic Operation or Synchronization Operation. + Memory Order describes how memory accesses made by Atomic Operations may be + ordered with respect to other Atomic Operations and Synchronization + Operations. ISO C defines a ``memory_order`` enum type to capture the + possible memory order parameters. + +Thread Fence + A Thread Fence is a Synchronization Operation that constrains the order of + Accesses made by Atomic Operations on a given Thread of Execution. Fences + are equipped with a Memory Order Parameter that specifies which kinds of + accesses may be reordered before or after the fence. ISO C defines the + ``atomic_thread_fence`` to synchronize the order of accesses made by atomic + operations on ``_Atomic`` qualified data. + +Atomic Instruction + An A64 instruction that may have Memory Order semantics. For instance an A64 + LDR instruction has no atomicity, but the LDAR instruction has *acquire* + semantics. (see [ARMARM_]). + +Assembly Sequence + A sequence of Atomic Instructions. + +Mapping + A pair of Atomic Operation and Assembly Sequence. A compiler generates the + Assembly Sequence, given an Atomic Operation and Compiler Profile as input. + +Compiler Profile + A combination of a compiler and command-line flags that implements a set of + Mappings from Atomic Operations to A64 Assembly Sequences. When the compiler + is provided with a Concurrent Program and Compiler Profile, it generates an + Assembly Sequence. + +More specific terminology is defined when it is first used. + +.. raw:: pdf + + PageBreak + +Overview +======== + +The C/C++ Atomics ABI for the Arm 64-bit architecture (AABI64) comprises the +following sub-components. 
+ +* The `Mappings from Atomic Operations to Assembly Sequences`_, which defines + the mappings from C/C++ atomic operations to sequences of A64 assembly that + are interoperable with each other. + +* A `Declarative statement of Mappings compatibility`_, which states that the + aforementioned Mappings can be used together, as far as non-exhaustive + testing can validate. That is, no tested combination of Mappings induces + unexpected program behaviour when a compiled program that uses atomics is + executed on a multi-core Arm-based machine. + +Mappings from Atomic Operations to Assembly Sequences +===================================================== + +We now describe the compatible Mappings from C/C++ Atomic Operations to +Assembly Sequences. Since these Mappings can be combined in many ways, we +organise the tables by access width and list the compatible Assembly Sequences +for each Atomic Operation. + +This is an open ABI; we encourage improvements to this specification to be +submitted to the `issue tracker page on +GitHub `_. + +These mappings are not exhaustive, but aim to cover the atomics we have tested. +Please request more atomics using the issue tracker. + +Notational Conventions +---------------------- +To reduce repetition, we use the following notational conventions: + +.. 
table:: + + +-----------------------------------------+--------------------------------------+ + | Memory Order Parameter | Notation | + +=========================================+======================================+ + | ``memory_order_relaxed`` | ``relaxed`` | + +-----------------------------------------+--------------------------------------+ + | ``memory_order_acquire`` | ``acq`` | + +-----------------------------------------+--------------------------------------+ + | ``memory_order_release`` | ``rel`` | + +-----------------------------------------+--------------------------------------+ + | ``memory_order_acq_rel`` | ``acq_rel`` | + +-----------------------------------------+--------------------------------------+ + | ``memory_order_seq_cst`` | ``sc`` | + +-----------------------------------------+--------------------------------------+ + +In what follows, ``loc`` refers to the location and ``val`` refers to the value +parameter. + +The registers used in the Assembly Sequences are arbitrary and may differ +between compiler implementations. Cases where arbitrary registers may *not* be +used are covered in the Special Cases section. + +Further, an Atomic Operation may have more than one valid Mapping, depending on +which architecture extensions are available. In this case, we split the rows +of the table to list each option. + +.. table:: + + +--------------------------------------------------------+--------------------------------------+ + | Atomic Operation | Assembly Sequence | + +============================================+===========+======================================+ + | ``atomic_store_explicit(loc,val,relaxed)`` | ARCH1 | ``option A`` | + + +-----------+--------------------------------------+ + | | ARCH2 | ``option B`` | + +--------------------------------------------+-----------+--------------------------------------+ + +Here, ARCH is, for example, BASE (Armv8-A), LSE, LSE2, LSE128, RCPC, or LRCPC3.
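As an illustration of rows that are split by architecture extension, the following C11 sketch wraps three Atomic Operations on a 32-bit object. The helper names ``load_acquire``, ``swap_relaxed`` and ``add_seq_cst`` are hypothetical, and the comments only restate the options listed in the 32-bit table. Which option a compiler emits depends on its Compiler Profile: for example, GCC and Clang typically select the LSE forms when LSE is enabled (such as with ``-march=armv8-a+lse`` or ``-march=armv8.1-a``), while ``-moutline-atomics`` typically turns read-modify-write operations into calls to library helpers that choose an implementation at run time.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only. Each comment lists the
 * per-extension options this document gives for the operation; the
 * actual registers are chosen by the compiler. */

/* load(loc, acq): BASE -> LDAR, RCPC -> LDAPR */
uint32_t load_acquire(_Atomic uint32_t *loc)
{
    return atomic_load_explicit(loc, memory_order_acquire);
}

/* exchange(loc, val, relaxed): BASE -> LDXR/STXR retry loop, LSE -> SWP */
uint32_t swap_relaxed(_Atomic uint32_t *loc, uint32_t val)
{
    return atomic_exchange_explicit(loc, val, memory_order_relaxed);
}

/* fetch_add(loc, val, sc): BASE -> LDAXR/ADD/STLXR retry loop, LSE -> LDADDAL */
uint32_t add_seq_cst(_Atomic uint32_t *loc, uint32_t val)
{
    return atomic_fetch_add_explicit(loc, val, memory_order_seq_cst);
}
```

Compiling this file with ``-S`` under different Compiler Profiles and comparing the output is one way to check which option of each row a particular compiler implements.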
+ +Lastly, we write all operations in the following shorthand form: + +.. table:: + + +----------------------------------------------------+--------------------------------------+ + | Atomic Operation | Shorthand Atomic Operation | + +====================================================+======================================+ + | ``atomic_store_explicit(...)`` | ``store(...)`` | + +----------------------------------------------------+--------------------------------------+ + | ``atomic_load_explicit(...)`` | ``load(...)`` | + +----------------------------------------------------+--------------------------------------+ + | ``atomic_thread_fence(...)`` | ``fence(...)`` | + +----------------------------------------------------+--------------------------------------+ + | ``atomic_exchange_explicit(...)`` | ``exchange(...)`` | + +----------------------------------------------------+--------------------------------------+ + | ``atomic_fetch_add_explicit(...)`` | ``fetch_add(...)`` | + +----------------------------------------------------+--------------------------------------+ + | ``atomic_fetch_sub_explicit(...)`` | ``fetch_sub(...)`` | + +----------------------------------------------------+--------------------------------------+ + | ``atomic_fetch_or_explicit(...)`` | ``fetch_or(...)`` | + +----------------------------------------------------+--------------------------------------+ + | ``atomic_fetch_xor_explicit(...)`` | ``fetch_xor(...)`` | + +----------------------------------------------------+--------------------------------------+ + | ``atomic_fetch_and_explicit(...)`` | ``fetch_and(...)`` | + +----------------------------------------------------+--------------------------------------+ + | ``atomic_compare_exchange_strong_explicit(...)`` | ``compare_exchange_strong(...)`` | + +----------------------------------------------------+--------------------------------------+ + + +Mappings for 32-bit types +------------------------- + +In what follows, register ``X1`` contains the location ``loc`` and ``W2`` +contains ``val``. The result is returned in ``W0``. In the +``compare_exchange_strong`` sequences, the BASE variants compare against the +expected value held in ``W4``, while for the LSE ``CAS`` variants ``W0`` holds +the expected value before the instruction executes and receives the value read +from memory. + +.. 
table:: + + +------------------------------------------+--------------------------------------+ + | Atomic Operation | Assembly Sequence | + +==========================================+======================================+ + | ``store(loc,val,relaxed)`` | ``STR W2, [X1]`` | + +------------------------------------------+--------------------------------------+ + +| ``store(loc,val,rel)`` + ``STLR W2, [X1]`` + + +| ``store(loc,val,sc)`` + + + +------------------------------------------+--------------------------------------+ + | ``load(loc,relaxed)`` | ``LDR W0, [X1]`` | + +-------------------------------+----------+--------------------------------------+ + | ``load(loc,acq)`` | ``BASE`` | ``LDAR W0, [X1]`` | + + +----------+--------------------------------------+ + | | ``RCPC`` | ``LDAPR W0, [X1]`` | + +-------------------------------+----------+--------------------------------------+ + | ``load(loc,sc)`` | ``LDAR W0, [X1]`` | + +------------------------------------------+--------------------------------------+ + | ``fence(relaxed)`` | ``NOP`` | + +------------------------------------------+--------------------------------------+ + | ``fence(acq)`` | ``DMB ISHLD`` | + +------------------------------------------+--------------------------------------+ + | | ``fence(rel)`` | ``DMB ISH`` | + | | ``fence(acq_rel)`` | | + | | ``fence(sc)`` | | + +-------------------------------+----------+--------------------------------------+ + | ``exchange(loc,val,relaxed)`` | ``BASE`` | | ``loop:`` | + | + + | ``LDXR W0, [X1]`` + + + | | | ``STXR W3, W2, [X1]`` | + | + + | ``CBNZ W3, loop`` + + + +----------+--------------------------------------+ + | | ``LSE`` | ``SWP W2, W0, [X1]`` | + +-------------------------------+----------+--------------------------------------+ + | ``exchange(loc,val,acq)`` | ``BASE`` | | ``loop:`` | + | | | | ``LDAXR W0, [X1]`` | + + + + | ``STXR W3, W2, [X1]`` + + | | | | ``CBNZ W3, loop`` | + + +----------+--------------------------------------+ + | | 
``LSE`` | ``SWPA W2, W0, [X1]`` | + +-------------------------------+----------+--------------------------------------+ + | ``exchange(loc,val,rel)`` | ``BASE`` | | ``loop:`` | + | | | | ``LDXR W0, [X1]`` | + + + + | ``STLXR W3, W2, [X1]`` + + | | | | ``CBNZ W3, loop`` | + + +----------+--------------------------------------+ + | | ``LSE`` | ``SWPL W2, W0, [X1]`` | + +-------------------------------+----------+--------------------------------------+ + | ``exchange(loc,val,acq_rel)`` | ``BASE`` | | ``loop:`` | + | ``exchange(loc,val,sc)`` | | | ``LDAXR W0, [X1]`` | + + + + | ``STLXR W3, W2, [X1]`` + + | | | | ``CBNZ W3, loop`` | + + +----------+--------------------------------------+ + | | ``LSE`` | ``SWPAL W2, W0, [X1]`` | + +-------------------------------+----------+--------------------------------------+ + | ``fetch_add(loc,val,relaxed)``| ``BASE`` | | ``loop:`` | + | + + | ``LDXR W0, [X1]`` + + | + + | ``ADD W5, W0, W2`` + + + | | | ``STXR W3, W5, [X1]`` | + | + + | ``CBNZ W3, loop`` + + + +----------+--------------------------------------+ + | | ``LSE`` | ``LDADD W2, W0, [X1]`` | + +-------------------------------+----------+--------------------------------------+ + | ``fetch_add(loc,val,acq)`` | ``BASE`` | | ``loop:`` | + | + + | ``LDAXR W0, [X1]`` + + | + + | ``ADD W5, W0, W2`` + + + | | | ``STXR W3, W5, [X1]`` | + | + + | ``CBNZ W3, loop`` + + + +----------+--------------------------------------+ + | | ``LSE`` | ``LDADDA W2, W0, [X1]`` | + +-------------------------------+----------+--------------------------------------+ + | ``fetch_add(loc,val,rel)`` | ``BASE`` | | ``loop:`` | + | + + | ``LDXR W0, [X1]`` + + | + + | ``ADD W5, W0, W2`` + + + | | | ``STLXR W3, W5, [X1]`` | + | + + | ``CBNZ W3, loop`` + + + +----------+--------------------------------------+ + | | ``LSE`` | ``LDADDL W2, W0, [X1]`` | + +-------------------------------+----------+--------------------------------------+ + | ``fetch_add(loc,val,acq_rel)``| ``BASE`` | | ``loop:`` | + | 
``fetch_add(loc,val,sc)`` + + | ``LDAXR W0, [X1]`` + + | + + | ``ADD W5, W0, W2`` + + + | | | ``STLXR W3, W5, [X1]`` | + | + + | ``CBNZ W3, loop`` + + + +----------+--------------------------------------+ + | | ``LSE`` | ``LDADDAL W2, W0, [X1]`` | + +-------------------------------+----------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``BASE`` | | ``loop:`` | + | ``loc,&exp,val,relaxed,`` + + | ``LDXR W0, [X1]`` + + | ``relaxed)`` + + | ``CMP W0, W4`` + + | + + | ``B.NE fail`` + + + | | | ``STXR W3, W2, [X1]`` | + | + + | ``CBNZ W3, loop`` + + + + + | ``fail:`` + + + +----------+--------------------------------------+ + | | ``LSE`` | ``CAS W0, W2, [X1]`` | + +-------------------------------+----------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``BASE`` | | ``loop:`` | + | ``loc,&exp,val,acq,acq)`` + + | ``LDAXR W0, [X1]`` + + | + + | ``CMP W0, W4`` + + | + + | ``B.NE fail`` + + + | | | ``STXR W3, W2, [X1]`` | + | + + | ``CBNZ W3, loop`` + + + + + | ``fail:`` + + + +----------+--------------------------------------+ + | | ``LSE`` | ``CASA W0, W2, [X1]`` | + +-------------------------------+----------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``BASE`` | | ``loop:`` | + | ``loc,&exp,val,rel,relaxed)`` + + | ``LDXR W0, [X1]`` + + | + + | ``CMP W0, W4`` + + | + + | ``B.NE fail`` + + + | | | ``STLXR W3, W2, [X1]`` | + | + + | ``CBNZ W3, loop`` + + + + + | ``fail:`` + + + +----------+--------------------------------------+ + | | ``LSE`` | ``CASL W0, W2, [X1]`` | + +-------------------------------+----------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``BASE`` | | ``loop:`` | + | ``loc,&exp,val,acq_rel,acq)``+ + | ``LDAXR W0, [X1]`` + + | + + | ``CMP W0, W4`` + + | ``compare_exchange_strong(`` + + | ``B.NE fail`` + + + ``loc,&exp,val,sc,sc)`` | | | ``STLXR W3, W2, [X1]`` | + | + + | ``CBNZ W3, loop`` + + + + + | ``fail:`` + + + 
+----------+--------------------------------------+ + | | ``LSE`` | ``CASAL W0, W2, [X1]`` | + +-------------------------------+----------+--------------------------------------+ + +Mappings for 8-bit types +------------------------ + +The mappings for 8-bit types are the same as 32-bit types except they use the +``B`` variants of instructions. + + +Mappings for 16-bit types +------------------------- + +The mappings for 16-bit types are the same as 32-bit types except they use the +``H`` variants of instructions. + +Mappings for 64-bit types +------------------------- + +The mappings for 64-bit types are the same as 32-bit types except the registers +used are X-registers. + +Mappings for 128-bit types +-------------------------- + +Since the access width of 128-bit types is double that of the 64-bit register +width, the following Mappings use *pair* instructions, which require their own +table. + +In what follows, register ``X4`` contains the location ``loc``, ``X2`` and +``X3`` contain the input value. The result is returned in ``X0`` and ``X1``. + +.. 
table:: + + +-----------------------------------------------+--------------------------------------+ + | Atomic Operation | Assembly Sequence | + +=================================+=============+======================================+ + | ``store(loc,val,relaxed)`` | ``BASE`` | | ``loop:`` | + | | | | ``LDXP XZR, X1, [X4]`` | + | | | | ``STXP W5, X2, X3, [X4]`` | + | | | | ``CBNZ W5, loop`` | + + +-------------+--------------------------------------+ + | | ``LSE`` | | ``LDP X0, X1, [X4]`` | + | | | | ``loop:`` | + | | | | ``MOV X6, X0`` | + | | | | ``MOV X7, X1`` | + | | | | ``CASP X0, X1, X2, X3, [X4]`` | + | | | | ``CMP X0, X6`` | + | | | | ``CCMP X1, X7, 0, EQ`` | + | | | | ``B.NE loop`` | + + +-------------+--------------------------------------+ + | | ``LSE2`` | ``STP x2, X3, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``store(loc,val,rel)`` | ``BASE`` | | ``loop:`` | + | | | | ``LDXP XZR, X1, [X4]`` | + | | | | ``STLXP W5, X2, X3, [X4]`` | + | | | | ``CBNZ W5, loop`` | + + +-------------+--------------------------------------+ + | | ``LSE`` | | ``LDP X0, X1, [X4]`` | + | | | | ``loop:`` | + | | | | ``MOV X6, X0`` | + | | | | ``MOV X7, X1`` | + | | | | ``CASPL X0, X1, X2, X3, [X4]`` | + | | | | ``CMP X0, X6`` | + | | | | ``CCMP X1, X7, 0, EQ`` | + | | | | ``B.NE loop`` | + + +-------------+--------------------------------------+ + | | ``LSE2`` | | ``DMB ISH`` | + | | | | ``STP X2, X3, [X4]`` | + + +-------------+--------------------------------------+ + | | ``LRCPC3`` | ``STILP X2, X3, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``store(loc,val,sc)`` | ``BASE`` | | ``loop:`` | + | | | | ``LDXP XZR, X1, [X4]`` | + | | | | ``STLXP W5, X2, X3, [X4]`` | + | | | | ``CBNZ W5, loop`` | + + +-------------+--------------------------------------+ + | | ``LSE`` | | ``LDP X0, X1, [X4]`` | + | | | | ``loop:`` | + | | | | ``MOV X6, X0`` | + | | | | ``MOV 
X7, X1`` | + | | | | ``CASPL X0, X1, X2, X3, [X4]`` | + | | | | ``CMP X0, X6`` | + | | | | ``CCMP X1, X7, 0, EQ`` | + | | | | ``B.NE loop`` | + + +-------------+--------------------------------------+ + | | ``LSE2`` | | ``DMB ISH`` | + | | | | ``STP X2, X3, [X4]`` | + | | | | ``DMB ISH`` | + + +-------------+--------------------------------------+ + | | ``LRCPC3`` | ``STILP X2, X3, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``load(loc,relaxed)`` | ``BASE`` | | ``loop:`` | + | | | | ``LDXP X0, X1, [X4]`` | + | | | | ``STXP W5, X0, X1, [X4]`` | + | | | | ``CBNZ W5, loop`` | + + +-------------+--------------------------------------+ + | | ``LSE`` | ``CASP X0, X1, X0, X1, [X4]`` | + + +-------------+--------------------------------------+ + | | ``LSE2`` | ``LDP X0, X1, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``load(loc,acq)`` | ``BASE`` | | ``loop:`` | + | | | | ``LDAXP X0, X1, [X4]`` | + | | | | ``STXP W5, X0, X1, [X4]`` | + | | | | ``CBNZ W5, loop`` | + + +-------------+--------------------------------------+ + | | ``LSE`` | ``CASPA X0, X1, X0, X1, [X4]`` | + + +-------------+--------------------------------------+ + | | ``LSE2`` | | ``LDP X0, X1, [X4]`` | + | | | | ``DMB ISHLD`` | + + +-------------+--------------------------------------+ + | | ``LRCPC3`` | ``LDIAPP X0, X1, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``load(loc,sc)`` | ``BASE`` | | ``loop:`` | + | | | | ``LDAXP X0, X1, [X4]`` | + | | | | ``STXP W5, X0, X1, [X4]`` | + | | | | ``CBNZ W5, loop`` | + + +-------------+--------------------------------------+ + | | ``LSE`` | ``CASPA X0, X1, X0, X1, [X4]`` | + + +-------------+--------------------------------------+ + | | ``LSE2`` | | ``LDAR X5, [X4]`` | + | | | | ``LDP X0, X1, [X4]`` | + | | | | ``DMB ISHLD`` | + + +-------------+--------------------------------------+ + | | 
``LRCPC3`` | | ``LDAR X5, [X4]`` | + | | | | ``LDIAPP X0, X1, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``exchange(loc,val,relaxed)`` | ``BASE`` | | ``loop:`` | + | | | | ``LDXP X0, X1, [X4]`` | + | | | | ``STXP W5, X2, X3, [X4]`` | + | | | | ``CBNZ W5, loop`` | + + +-------------+--------------------------------------+ + | | ``LSE`` | | ``LDP X0, X1, [X4]`` | + | | | | ``loop:`` | + | | | | ``MOV X6, X0`` | + | | | | ``MOV X7, X1`` | + | | | | ``CASP X0, X1, X2, X3, [X4]`` | + | | | | ``CMP X0, X6`` | + | | | | ``CCMP X1, X7, 0, EQ`` | + | | | | ``B.NE loop`` | + + +-------------+--------------------------------------+ + | | ``LSE128`` | | ``MOV X0, X2`` | + | | | | ``MOV X1, X3`` | + | | | | ``SWPP X0, X1, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``exchange(loc,val,acq)`` | ``BASE`` | | ``loop:`` | + | | | | ``LDAXP X0, X1, [X4]`` | + | | | | ``STXP W5, X2, X3, [X4]`` | + | | | | ``CBNZ W5, loop`` | + + +-------------+--------------------------------------+ + | | ``LSE`` | | ``LDP X0, X1, [X4]`` | + | | | | ``loop:`` | + | | | | ``MOV X6, X0`` | + | | | | ``MOV X7, X1`` | + | | | | ``CASPA X0, X1, X2, X3, [X4]`` | + | | | | ``CMP X0, X6`` | + | | | | ``CCMP X1, X7, 0, EQ`` | + | | | | ``B.NE loop`` | + + +-------------+--------------------------------------+ + | | ``LSE128`` | | ``MOV X0, X2`` | + | | | | ``MOV X1, X3`` | + | | | | ``SWPPA X0, X1, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``exchange(loc,val,rel)`` | ``BASE`` | | ``loop:`` | + | | | | ``LDXP X0, X1, [X4]`` | + | | | | ``STLXP W5, X2, X3, [X4]`` | + | | | | ``CBNZ W5, loop`` | + + +-------------+--------------------------------------+ + | | ``LSE`` | | ``LDP X0, X1, [X4]`` | + | | | | ``loop:`` | + | | | | ``MOV X6, X0`` | + | | | | ``MOV X7, X1`` | + | | | | ``CASPL X0, X1, X2, X3, [X4]`` | + | | | | ``CMP X0, 
X6`` | + | | | | ``CCMP X1, X7, 0, EQ`` | + | | | | ``B.NE loop`` | + + +-------------+--------------------------------------+ + | | ``LSE128`` | | ``MOV X0, X2`` | + | | | | ``MOV X1, X3`` | + | | | | ``SWPPL X0, X1, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``exchange(loc,val,acq_rel)`` | ``BASE`` | | ``loop:`` | + | ``exchange(loc,val,sc)`` | | | ``LDAXP X0, X1, [X4]`` | + | | | | ``STLXP W5, X2, X3, [X4]`` | + | | | | ``CBNZ W5, loop`` | + + +-------------+--------------------------------------+ + | | ``LSE`` | | ``LDP X0, X1, [X4]`` | + | | | | ``loop:`` | + | | | | ``MOV X6, X0`` | + | | | | ``MOV X7, X1`` | + | | | | ``CASPAL X0, X1, X2, X3, [X4]`` | + | | | | ``CMP X0, X6`` | + | | | | ``CCMP X1, X7, 0, EQ`` | + | | | | ``B.NE loop`` | + + +-------------+--------------------------------------+ + | | ``LSE128`` | | ``MOV X0, X2`` | + | | | | ``MOV X1, X3`` | + | | | | ``SWPPAL X0, X1, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``fetch_add(loc,val,relaxed)`` | ``BASE`` | | ``loop:`` | + | | | | ``LDXP X0, X1, [X4]`` | + | | | | ``ADDS X8, X0, X2`` | + | | | | ``ADC X9, X1, X3`` | + | | | | ``STXP W5, X8, X9, [X4]`` | + | | | | ``CBNZ W5, loop`` | + + +-------------+--------------------------------------+ + | | ``LSE`` | | ``LDP X0, X1, [X4]`` | + | | | | ``loop:`` | + | | | | ``MOV X6, X0`` | + | | | | ``MOV X7, X1`` | + | | | | ``ADDS X8, X0, X2`` | + | | | | ``ADC X9, X1, X3`` | + | | | | ``CASP X0, X1, X8, X9, [X4]`` | + | | | | ``CMP X0, X6`` | + | | | | ``CCMP X1, X7, 0, EQ`` | + | | | | ``B.NE loop`` | + +---------------------------------+-------------+--------------------------------------+ + | ``fetch_add(loc,val,acq)`` | ``BASE`` | | ``loop:`` | + | | | | ``LDAXP X0, X1, [X4]`` | + | | | | ``ADDS X8, X0, X2`` | + | | | | ``ADC X9, X1, X3`` | + | | | | ``STXP W5, X8, X9, [X4]`` | + | | | | ``CBNZ W5, loop`` | + +
+-------------+--------------------------------------+ + | | ``LSE`` | | ``LDP X0, X1, [X4]`` | + | | | | ``loop:`` | + | | | | ``MOV X6, X0`` | + | | | | ``MOV X7, X1`` | + | | | | ``ADDS X8, X0, X2`` | + | | | | ``ADC X9, X1, X3`` | + | | | | ``CASPA X0, X1, X8, X9, [X4]`` | + | | | | ``CMP X0, X6`` | + | | | | ``CCMP X1, X7, 0, EQ`` | + | | | | ``B.NE loop`` | + +---------------------------------+-------------+--------------------------------------+ + | ``fetch_add(loc,val,rel)`` | ``BASE`` | | ``loop:`` | + | | | | ``LDXP X0, X1, [X4]`` | + | | | | ``ADDS X8, X0, X2`` | + | | | | ``ADC X9, X1, X3`` | + | | | | ``STLXP W5, X8, X9, [X4]`` | + | | | | ``CBNZ W5, loop`` | + + +-------------+--------------------------------------+ + | | ``LSE`` | | ``LDP X0, X1, [X4]`` | + | | | | ``loop:`` | + | | | | ``MOV X6, X0`` | + | | | | ``MOV X7, X1`` | + | | | | ``ADDS X8, X0, X2`` | + | | | | ``ADC X9, X1, X3`` | + | | | | ``CASPL X0, X1, X8, X9, [X4]`` | + | | | | ``CMP X0, X6`` | + | | | | ``CCMP X1, X7, 0, EQ`` | + | | | | ``B.NE loop`` | + +---------------------------------+-------------+--------------------------------------+ + | ``fetch_add(loc,val,acq_rel)`` | ``BASE`` | | ``loop:`` | + | ``fetch_add(loc,val,sc)`` | | | ``LDAXP X0, X1, [X4]`` | + | | | | ``ADDS X8, X0, X2`` | + | | | | ``ADC X9, X1, X3`` | + | | | | ``STLXP W5, X8, X9, [X4]`` | + | | | | ``CBNZ W5, loop`` | + + +-------------+--------------------------------------+ + | | ``LSE`` | | ``LDP X0, X1, [X4]`` | + | | | | ``loop:`` | + | | | | ``MOV X6, X0`` | + | | | | ``MOV X7, X1`` | + | | | | ``ADDS X8, X0, X2`` | + | | | | ``ADC X9, X1, X3`` | + | | | | ``CASPAL X0, X1, X8, X9, [X4]`` | + | | | | ``CMP X0, X6`` | + | | | | ``CCMP X1, X7, 0, EQ`` | + | | | | ``B.NE loop`` | + +---------------------------------+-------------+--------------------------------------+ + | ``fetch_or(loc,val,relaxed)`` | ``LSE128`` | | ``MOV X0, X2`` | + | | | | ``MOV X1, X3`` | + | | | | ``LDSETP X0, X1, [X4]`` | +
+---------------------------------+-------------+--------------------------------------+ + | ``fetch_or(loc,val,acq)`` | ``LSE128`` | | ``MOV X0, X2`` | + | | | | ``MOV X1, X3`` | + | | | | ``LDSETPA X0, X1, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``fetch_or(loc,val,rel)`` | ``LSE128`` | | ``MOV X0, X2`` | + | | | | ``MOV X1, X3`` | + | | | | ``LDSETPL X0, X1, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``fetch_or(loc,val,acq_rel)`` | ``LSE128`` | | ``MOV X0, X2`` | + | ``fetch_or(loc,val,sc)`` | | | ``MOV X1, X3`` | + | | | | ``LDSETPAL X0, X1, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``fetch_and(loc,val,relaxed)`` | ``LSE128`` | | ``MVN X0, X2`` | + | | | | ``MVN X1, X3`` | + | | | | ``LDCLRP X0, X1, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``fetch_and(loc,val,acq)`` | ``LSE128`` | | ``MVN X0, X2`` | + | | | | ``MVN X1, X3`` | + | | | | ``LDCLRPA X0, X1, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``fetch_and(loc,val,rel)`` | ``LSE128`` | | ``MVN X0, X2`` | + | | | | ``MVN X1, X3`` | + | | | | ``LDCLRPL X0, X1, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``fetch_and(loc,val,acq_rel)`` | ``LSE128`` | | ``MVN X0, X2`` | + | ``fetch_and(loc,val,sc)`` | | | ``MVN X1, X3`` | + | | | | ``LDCLRPAL X0, X1, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``BASE`` | | ``loop:`` | + | ``loc,&exp,val,relaxed,`` + + | ``LDXP X6, x7, [X4]`` + + | ``relaxed)`` + + | ``CMP X6, X0`` + + + | | | ``CCMP X7, X1, 0, EQ`` | + | + + | ``CSEL X8, X2, X6, EQ`` + + | + + | ``CSEL X9, X3, X7, EQ`` + + | + + | ``STXP W5, X8, X9, [X4]`` + + | 
+ + | ``CBNZ W5, loop`` + + + + + | ``MOV X0, X6`` + + + + + | ``MOV X1, X7`` + + + +-------------+--------------------------------------+ + | | ``LSE`` | ``CASP X0, X1, X2, X3, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``BASE`` | | ``loop:`` | + | ``loc,&exp,val,acq, acq)`` + + | ``LDAXP X6, x7, [X4]`` + + | + + | ``CMP X6, X0`` + + + | | | ``CCMP X7, X1, 0, EQ`` | + | + + | ``CSEL X8, X2, X6, EQ`` + + | + + | ``CSEL X9, X3, X7, EQ`` + + | + + | ``STXP W5, X8, X9, [X4]`` + + | + + | ``CBNZ W5, loop`` + + + + + | ``MOV X0, X6`` + + + + + | ``MOV X1, X7`` + + + +-------------+--------------------------------------+ + | | ``LSE`` | ``CASPA X0, X1, X2, X3, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``BASE`` | | ``loop:`` | + | ``loc,&exp,val,rel,rel)`` + + | ``LDXP X6, x7, [X4]`` + + | + + | ``CMP X6, X0`` + + + | | | ``CCMP X7, X1, 0, EQ`` | + | + + | ``CSEL X8, X2, X6, EQ`` + + | + + | ``CSEL X9, X3, X7, EQ`` + + | + + | ``STLXP W5, X8, X9, [X4]`` + + | + + | ``CBNZ W5, loop`` + + + + + | ``MOV X0, X6`` + + + + + | ``MOV X1, X7`` + + + +-------------+--------------------------------------+ + | | ``LSE`` | ``CASPL X0, X1, X2, X3, [X4]`` | + +---------------------------------+-------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``BASE`` | | ``loop:`` | + | ``loc,&exp,val,acq_rel,acq)`` + + | ``LDAXP X6, x7, [X4]`` + + | + + | ``CMP X6, X0`` + + + ``compare_exchange_strong(`` | | | ``CCMP X7, X1, 0, EQ`` | + | ``loc,&exp,val,sc,sc)`` + + | ``CSEL X8, X2, X6, EQ`` + + | + + | ``CSEL X9, X3, X7, EQ`` + + | + + | ``STLXP W5, X8, X9, [X4]`` + + | + + | ``CBNZ W5, loop`` + + + + + | ``MOV X0, X6`` + + + + + | ``MOV X1, X7`` + + + +-------------+--------------------------------------+ + | | ``LSE`` | ``CASPAL X0, X1, X2, X3, [X4]`` | + 
+---------------------------------+-------------+--------------------------------------+ + + +We do not list the other ``fetch_`` variants, since their Mappings follow the +same pattern and differ only in the operation performed (the implementation of +those operations is not in scope of this document). Specifically, +implementations that use loops should use the load and store instructions with +the relevant memory order, and place the appropriate Assembly Sequence for the +operation inside the loop. Where a dedicated Assembly Sequence exists, it is +stated as an exception (for instance ``fetch_or`` can be implemented using +``LDSETP`` when the LSE128 extension is enabled). + +Special Cases +------------- + +There are special cases in the Mappings presented above; these must be handled +to prevent unexpected outcomes of the compiled program. + +Re-Ordering of Read-Modify-Write Effects and Acquire Fence +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Consider the following Concurrent Program:: + + // Shared-Memory Locations + _Atomic int* x; + _Atomic int* y; + + // Memory Order Parameter + #define relaxed memory_order_relaxed + #define release memory_order_release + #define acquire memory_order_acquire + + // Threads of Execution + void thread_0 () { + atomic_store_explicit(x,1,relaxed); + atomic_thread_fence(release); + atomic_store_explicit(y,1,relaxed); + } + + void thread_1 () { + atomic_exchange_explicit(y,2,release); + atomic_thread_fence(acquire); + int r0 = atomic_load_explicit(x,relaxed); + } + + +Under ISO C, the above Concurrent Program finishes execution in one of three +possible outcomes:: + + { thread_1:r0=0; y=1; } + { thread_1:r0=1; y=1; } + { thread_1:r0=1; y=2; } + +In this case the value read by the exchange on ``thread_1`` is not used, and a +compiler is free to remove references to unused data.
It is thus legal under +ISO C for a compiler to translate the program into the following Assembly +Sequences:: + + thread_0: + MOV W9,#1 + STR W9,[X2] + DMB ISH + STR W9,[X4] + + thread_1: + MOV W9,#2 + SWPL W9, WZR, [X2] + DMB ISHLD + LDR W3,[X4] + +where ``thread_0:X2`` contains the address of ``x``, ``thread_0:X4`` contains +the address of ``y``, ``thread_1:X2`` contains the address of ``y``, and +``thread_1:X4`` contains the address of ``x``. + +Note: the ``exchange`` Atomic Operation is compiled to a ``SWPL`` Assembly +Instruction whose destination register is the zero register ``WZR``. The +``acquire`` fence on ``thread_1`` is compiled to the ``DMB ISHLD`` Assembly +Instruction. + +Executing the compiled program on an Arm-based machine from a fixed initial +state (where ``x`` and ``y`` are ``0``) produces one of the following outcomes, +according to the AArch64 Memory Model contained in §B2 of the Arm Architecture +Reference Manual [ARMARM_]:: + + { thread_1:r0=0; [y]=1; } + { thread_1:r0=0; [y]=2; } <-- Forbidden by source model, a bug! + { thread_1:r0=1; [y]=1; } + { thread_1:r0=1; [y]=2; } + +By comparing ``W3`` with the local variable ``r0`` of the original Concurrent +Program, we see there is one additional Outcome of executing the compiled +program that is not an outcome of executing the Concurrent Program. This is +because, according to the Arm Architecture Reference Manual [ARMARM_], +*instructions where the destination register is WZR or XZR are not regarded as +doing a read for the purpose of a DMB LD barrier.* + +ISO C permits a conforming implementation to delete unused data, but in this +case doing so introduces an additional Outcome of Execution.
To fix this issue, a compiler +should not rewrite the destination register to be the zero register in this +case:: + + thread_0: + MOV W9,#1 + STR W9,[X2] + DMB ISH + STR W9,[X4] + + thread_1: + MOV W9,#2 + SWPL W9, W10, [X2] + DMB ISHLD + LDR W3,[X4] + +Executing the compiled program on an Arm-based machine from a fixed initial +state (where ``x`` and ``y`` are ``0``) produces one of the following outcomes, +according to the AArch64 Memory Model contained in §B2 of the Arm Architecture +Reference Manual [ARMARM_]:: + + { thread_1:r0=0; [y]=1; } + { thread_1:r0=1; [y]=1; } + { thread_1:r0=1; [y]=2; } + +The unexpected outcome is no longer observable. Multiple Mappings are affected +by this behaviour: those that use the ``SWP`` and ``LD<op>`` (for instance +``LDADD``) Assembly Instructions. These include, but are not limited to: + +.. table:: + + +-----------------------------------------+--------------------------------------+ + | Atomic Operation | Assembly Sequence | + +=========================================+======================================+ + | ``exchange(loc,val,sc)`` | ``MOV W4, #val;`` | + | | ``SWPAL W4, W10, [X1]`` | + +-----------------------------------------+--------------------------------------+ + | ``fetch_add(loc,val,sc)`` | ``MOV W4, #val;`` | + | | ``LDADDAL W4, W10, [X1]`` | + +-----------------------------------------+--------------------------------------+ + +where ``X1`` contains the address of ``loc``. + +Const-Qualified 128-bit Atomic Loads +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Const-qualified data containing 128-bit atomic types should not be placed +in read-only memory (the ``.rodata`` section). + +Before LSE2, the only way to implement a single-copy atomic 128-bit load +is with a Read-Modify-Write sequence, which writes the loaded value back to +memory. This write is not observable by software when the memory is writeable, +but it causes a permission fault when the object is in read-only memory. +Compilers and runtimes should therefore use the LSE2/LRCPC3 sequence when +available.
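As a rough illustration of the two rules above, the following C sketch encodes which ``load(loc,relaxed)`` sequence from the 128-bit Mappings table a code generator might select, and where it might place a const-qualified object. The type ``target_features`` and the functions ``select_load128_relaxed`` and ``section_for`` are hypothetical names used only for this sketch; they are not part of any real compiler API, and the section choice is deliberately simplified (it ignores ``.bss`` and other sections).

```c
#include <assert.h>   /* assert and strcmp are used by the usage checks */
#include <stdbool.h>
#include <string.h>

/* Hypothetical target description: which extensions are available. */
typedef struct {
    bool lse;   /* FEAT_LSE  */
    bool lse2;  /* FEAT_LSE2 */
} target_features;

/* One row of the 128-bit Mappings table for load(loc,relaxed). */
typedef struct {
    const char *sequence;         /* Assembly Sequence; X4 holds loc */
    bool needs_write_permission;  /* true if the sequence may store to loc */
} load128_mapping;

/* Prefer the LSE2 sequence, which only reads memory; otherwise fall back
   to a Read-Modify-Write sequence that writes the loaded value back. */
static load128_mapping select_load128_relaxed(target_features f) {
    if (f.lse2)
        return (load128_mapping){ "LDP X0, X1, [X4]", false };
    if (f.lse)
        return (load128_mapping){ "CASP X0, X1, X0, X1, [X4]", true };
    return (load128_mapping){
        "loop: LDXP X0, X1, [X4]; STXP W5, X0, X1, [X4]; CBNZ W5, loop",
        true };
}

/* Const-qualified data containing a 128-bit atomic type is kept out of
   .rodata, independently of the features of the current target. */
static const char *section_for(bool is_const, bool contains_atomic128) {
    return (is_const && !contains_atomic128) ? ".rodata" : ".data";
}
```

For example, a target with LSE but without LSE2 gets the ``CASP`` sequence, which needs write permission on ``loc``. The sketch applies the placement rule unconditionally, as the text above states it; presumably this is because the same object may also be read by code built for a target without LSE2.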
+ + +Declarative statement of Mappings compatibility +=============================================== + +To ensure that the above Mappings are ABI-compatible, we test the compilation of +Concurrent Programs where each Atomic Operation is compiled to one of the +aforementioned Mappings. We test whether there is a compiled program that +exhibits an outcome of execution according to the AArch64 Memory Model contained +in §B2 of the Arm Architecture Reference Manual [ARMARM_] that is not an outcome +of execution of the source program under the ISO C model. In this section we +define the process by which we test compatibility. + +The Mix Testing Process +----------------------- + +We test for compiler bugs. A compiler bug is defined as an Outcome of a +compiled program execution (under the AArch64 model) that is not an Outcome of +execution of the source Concurrent Program (under the ISO C model). Consider +the hypothetical example where a source Concurrent Program finishes execution +in one of three possible outcomes:: + + { thread_0:r0=0, thread_1:r0=1 } + { thread_0:r0=1, thread_1:r0=0 } + { thread_0:r0=1, thread_1:r0=1 } + +and a compiled program has the following possible outcomes according to the +AArch64 Memory Model contained in §B2 of the Arm Architecture Reference Manual +[ARMARM_]:: + + { thread_0:X3=0, thread_1:X3=0 } <--- Forbidden by source model, compiler bug! + { thread_0:X3=0, thread_1:X3=1 } + { thread_0:X3=1, thread_1:X3=0 } + { thread_0:X3=1, thread_1:X3=1 } + +By comparing ``X3`` with the local variable ``r0`` of the original Concurrent +Program in this example, we see that there is one additional outcome of executing +the compiled program that is not an outcome of executing the source program +(under the respective models). This suggests the Mappings in question are +incompatible, and a compiler that implements them exhibits a compiler bug.
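The comparison in the hypothetical example above is a set-inclusion check: every outcome of the compiled program must also be an outcome of the source program. A minimal C sketch of that check for two-thread tests follows. The ``outcome`` type and the ``has_compiler_bug`` function are illustrative names, not part of any existing tool, and the sketch assumes both outcome sets have already been computed by a memory-model simulator.

```c
#include <assert.h>   /* assert is used by the usage checks */
#include <stdbool.h>
#include <stddef.h>

/* One outcome of a two-thread test: the final value of r0 (source program)
   or X3 (compiled program) on thread_0 and thread_1. */
typedef struct {
    int thread_0;
    int thread_1;
} outcome;

static bool contains(const outcome *set, size_t n, outcome o) {
    for (size_t i = 0; i < n; i++)
        if (set[i].thread_0 == o.thread_0 && set[i].thread_1 == o.thread_1)
            return true;
    return false;
}

/* A compiler bug is any compiled-program outcome (AArch64 model) that is
   not an outcome of the source program (ISO C model). */
static bool has_compiler_bug(const outcome *source, size_t n_source,
                             const outcome *compiled, size_t n_compiled) {
    for (size_t i = 0; i < n_compiled; i++)
        if (!contains(source, n_source, compiled[i]))
            return true;
    return false;
}

/* The hypothetical example above. */
static const outcome source_outcomes[] = { {0, 1}, {1, 0}, {1, 1} };
static const outcome compiled_outcomes[] = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };
```

Here ``has_compiler_bug(source_outcomes, 3, compiled_outcomes, 4)`` reports a bug because of the outcome ``{0, 0}``; dropping that first entry leaves only outcomes allowed by the source model. In a Mix Testing run, the same check would be applied to the outcome set of every combined compiled program.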
To ensure compatibility, we therefore test for the absence of such Outcomes in the +compiled programs when mixing all combinations of the above Mappings. We define +the *Mix Testing* process as follows: + +#. Start from a C/C++ Concurrent Program. +#. Split it into its constituent Atomic Operations. +#. Compile each Atomic Operation separately using a Compiler Profile that + generates Assembly Sequences under a given Mapping. +#. Combine the Assembly Sequences into *multiple* possible Compiled Programs. +#. Compute the outcomes of executing the source Concurrent Program under the + ISO C memory model, giving the source program outcomes *S*. +#. Compute the outcomes of each compiled program under the AArch64 memory model + [ARMARM_], giving a *set* of compiled program outcomes *C*. +#. If any *c* in *C* exhibits a compiler bug with respect to the outcomes *S*, + then the given Mappings are not interoperable. + +Using Mix Testing, we now define ABI-Compatibility of Atomic Operations. + + +Definition of ABI-Compatibility for Atomic Operations +----------------------------------------------------- + +*A compiler that implements the above set of Mappings is ABI-Compatible with +respect to other compilers that implement the Mappings if Mix Testing their +code generation finds no compiler bugs.* + +We impose some constraints on this definition: + +* This is not a correctness guarantee, but rather a statement backed up by + bounded testing. Atomics ABI-compatibility is thus tested for the Mappings + above by generating C/C++ Concurrent Programs that permute combinations of + Atomic Operations on each Thread of Execution. We bound our test size between + 2 and 5 Threads of Execution, where each Thread has at least 1 Atomic + Operation or Synchronization Operation and at most 5 Atomic Operations or + Synchronization Operations. We do not make any statement about the + ABI-Compatibility of Concurrent Programs outside these bounds.
+* We test Concurrent Programs with a fixed initial state, a fixed loop unroll + factor (equal to 1 loop unroll), and without function calls or recursion. +* The above Mappings are not exhaustive. We hope that Arm's partners will + submit requests for other Mappings to the ABI team using the issue tracker + page on GitHub. +* This document makes no statement about the ABI-Compatibility of optimised + Concurrent Programs, nor does it make a statement concerning the performance + of compiled programs under the above Mappings when executed on a given + Arm-based machine. +* This document makes no statement about the ABI-Compatibility of compilers + that implement Mappings other than those stated in this document. + diff --git a/tools/common/check-rst-syntax.sh b/tools/common/check-rst-syntax.sh index cd99217e..842bec51 100755 --- a/tools/common/check-rst-syntax.sh +++ b/tools/common/check-rst-syntax.sh @@ -38,6 +38,9 @@ declare -a docs=( # semihosting "semihosting" + + # atomics + "atomicsabi64" ) for doc in "${docs[@]}"; do diff --git a/tools/common/generate-release-links.sh b/tools/common/generate-release-links.sh index db008878..2774e788 100755 --- a/tools/common/generate-release-links.sh +++ b/tools/common/generate-release-links.sh @@ -57,6 +57,7 @@ cat < Date: Thu, 2 May 2024 15:34:22 +0100 Subject: [PATCH 02/17] [atomicsabi64]: address code review comments --- atomicsabi64/README.md | 2 +- atomicsabi64/atomicsabi64.rst | 1324 ++++++++++++++++++++------------- 2 files changed, 789 insertions(+), 537 deletions(-) diff --git a/atomicsabi64/README.md b/atomicsabi64/README.md index 64136bd4..24bea6b6 100644 --- a/atomicsabi64/README.md +++ b/atomicsabi64/README.md @@ -2,7 +2,7 @@ -# Atomics ABI for the Arm® 64-bit Architecture (AArch64) +# C/C++ Atomics ABI for the Arm® 64-bit Architecture (AArch64) ## About this document diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst index 128067a3..f0520c33 100644 --- a/atomicsabi64/atomicsabi64.rst +++
b/atomicsabi64/atomicsabi64.rst @@ -13,6 +13,7 @@ .. _AAELF64: https://github.com/ARM-software/abi-aa/releases .. _CPPABI64: https://github.com/ARM-software/abi-aa/releases .. _CSTD: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf +.. _PAPER: https://doi.org/10.1109/CGO57630.2024.10444836 ********************************************************************************************* C/C++ Atomics Application Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture @@ -45,7 +46,7 @@ Abstract -------- This document describes the C/C++ Atomics Application Binary Interface for the -Arm 64-bit architecture. This document concerns the valid mappings from C/C++ +Arm 64-bit architecture. This document concerns the valid Mappings from C/C++ Atomic Operations to sequences of A64 instructions. For matters concerning the memory model, please consult §B2 of the Arm Architecture Reference Manual [ARMARM_]. We focus only on a subset of the C11 atomic operations at the time @@ -60,7 +61,7 @@ Atomics, Concurrency Latest release and defects report --------------------------------- -Please check `Atomics Application Binary Interface for the Arm® Architecture +Please check `C/C++ Atomics Application Binary Interface Standard for the Arm 64-bit Architecture `_ for the latest release of this document. @@ -230,8 +231,15 @@ This document refers to, or is referred to by, the following documents. +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ | CSTD_ | ISO/IEC 9899:2018 | International Standard ISO/IEC 9899:2018 – Programming languages C. 
| +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + | AAELF64_ | ELF for the Arm 64-bit Architecture (AArch64) | ELF for the Arm 64-bit Architecture (AArch64) | + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + | PAPER_ | CGO paper | Compiler Testing with Relaxed Memory Models | + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + + -Note: At the time of writing C23 is not released, as such ISO C17 is considered the latest published document. +Note: At the time of writing, C23 has not been released; as such, ISO C17 is +considered the latest published document. .. raw:: pdf @@ -240,7 +248,7 @@ Note: At the time of writing C23 is not released, as such ISO C17 is considered Terms and Abbreviations ----------------------- -The Atomics ABI for the Arm 64-bit Architecture uses the following terms and +The C/C++ Atomics ABI for the Arm 64-bit Architecture uses the following terms and abbreviations. A64 @@ -264,12 +272,6 @@ ABI Arm-based ... based on the Arm architecture ... -Concurrent Program - A C or C++ program that consists of one or more Threads of Execution. Each - Thread of Execution must communicate with other threads in the Concurrent - Program through Shared-Memory Locations, using Atomic Operations to be - deemed *concurrent*. - Thread of Execution A unit of computation that executes one or more Atomic Operations, Synchronization Operations or other C language statements. The Arm @@ -285,18 +287,26 @@ Atomic Operation operations and the ``atomic`` type. Operations on atomic-qualified data are guaranteed not to be interrupted by another Thread of Execution. +Concurrent Program + A C or C++ program that consists of one or more Threads of Execution.
Each + Thread of Execution must communicate with other threads in the Concurrent + Program through Shared-Memory Locations, using both Atomic Operations and + Non-Atomic Operations (Operations that lack the atomic qualifier) to be + deemed *concurrent*. This document focuses on compiling such programs for + Arm-based machines that run the A64 instruction set. + Synchronization Operation The order that atomic operations are executed by each Thread of Execution may not be the same as the order they are written in the program. Synchronization Operations are statements that constrain the order of accesses made to Shared-Memory Locations by each thread. Synchronization - Operations include Thread Fences, and certain control flow structures. + Operations include Thread Fences. Shared-Memory Location A memory location that can be accessed by any Thread of Execution in the program. -Memory Order Parameters +Memory Order Parameter Describes a constraint on an Atomic Operation or Synchronization Operation. Memory Order describes how memory accesses made by Atomic Operations may be ordered with respect to other Atomic Operations and Synchronization @@ -311,23 +321,16 @@ Thread Fence ``atomic_thread_fence`` to synchronize the order of accesses made by atomic operations on ``_Atomic`` qualified data. -Atomic Instruction - An A64 instruction that may have Memory Order semantics. For instance an A64 - LDR instruction has no atomicity, but the LDAR instruction has *acquire* - semantics. (see [ARMARM_]). - Assembly Sequence - A sequence of Atomic Instructions. + A sequence of A64 instructions, optionally including Atomic Instructions. Mapping - A pair of Atomic Operation and Assembly Sequence. A compiler generates the - Assembly Sequence, given an Atomic Operation and Compiler Profile as input. + A Mapping takes an Atomic Operation and Compiler Profile as input, + producing an Assembly Sequence as output. 
Compiler Profile - A combination of a compiler and command-line flags that implements a set of - Mappings from Atomic Operations to A64 Assembly Sequences. When the compiler - is provided with a Concurrent Program and Compiler Profile, it generates an - Assembly Sequence. + A Compiler implementation and command-line flags or attributes that use + Mappings. More specific terminology is defined when it is first used. @@ -342,8 +345,8 @@ The C/C++ Atomics ABI for the Arm 64-bit architecture (AABI64) comprises the following sub-components. * The `Mappings from Atomic Operations to Assembly Sequences`_, which defines - the mappings from C/C++ atomic operations to sequences of A64 assembly that - are interoperable with respect to each other. + the Mappings from C/C++ atomic operations to one or more Assembly + Sequences that are interoperable with respect to each other. * A `Declarative statement of Mappings compatibility`_, as far as non-exhaustive testing can validate, that the aforementioned Mappings can be @@ -355,7 +358,7 @@ Mappings from Atomic Operations to Assembly Sequences ===================================================== We now describe the compatible Mappings for C/C++ Atomic Operations and -Assembly Sequences. Since there is a large number of ways these mappings may be +Assembly Sequences. Since there is a large number of ways these Mappings may be combined, we break down the tables by the width of the access, and list compatible Assembly Sequences for each Atomic Operation. @@ -363,7 +366,7 @@ This is an open ABI, we encourage improvements to this specification to be submitted to the `issue tracker page on GitHub `_. -These mappings are not exhaustive, but aim to cover the atomics we have tested. +These Mappings are not exhaustive, but aim to cover the atomics we have tested. Please request more atomics using the issue tracker. Notational Conventions @@ -409,6 +412,8 @@ options.
+--------------------------------------------+-----------+--------------------------------------+ Where ARCH is for example BASE (armv8), LSE, LSE2, LSE128, RCPC, or LRCPC3. +ARCH describes the required extension, with BASE meaning Armv8-A with no +extensions and LSE is shorthand for FEAT_LSE (likewise for the other extensions). Lastly, all operations are in a shorthand form: @@ -443,6 +448,12 @@ Mappings for 32-bit types In what follows, register ``X1`` contains the location ``loc`` and ``W2`` contains ``val``. The result is returned in ``W0``. + +-------------------------------------------------------------------------------------------+ + | Note | + +===========================================================================================+ + | ``*`` Using ``WZR`` or ``XZR`` for the destination register is invalid (Section 4.7). | + +-------------------------------------------------------------------------------------------+ + .. table:: +------------------------------------------+--------------------------------------+ @@ -450,8 +461,8 @@ contains ``val``. The result is returned in ``W0``. +==========================================+======================================+ | ``store(loc,val,relaxed)`` | ``STR W2, [X1]`` | +------------------------------------------+--------------------------------------+ - +| ``store(loc,val,rel)`` + ``STLR W2, [X1]`` + - +| ``store(loc,val,sc)`` + + + | ``store(loc,val,rel)`` | ``STLR W2, [X1]`` | + | ``store(loc,val,sc)`` | | +------------------------------------------+--------------------------------------+ | ``load(loc,relaxed)`` | ``LDR W2, [X1]`` | +-------------------------------+----------+--------------------------------------+ @@ -465,128 +476,168 @@ contains ``val``. The result is returned in ``W0``. 
+------------------------------------------+--------------------------------------+ | ``fence(acq)`` | ``DMB ISHLD`` | +------------------------------------------+--------------------------------------+ - | | ``fence(rel)`` | ``DMB ISH`` | - | | ``fence(acq_rel)`` | | - | | ``fence(sc)`` | | + | ``fence(rel)`` | ``DMB ISH`` | + | ``fence(acq_rel)`` | | + | ``fence(sc)`` | | +-------------------------------+----------+--------------------------------------+ - | ``exchange(loc,val,relaxed)`` | ``BASE`` | | ``loop:`` | - | + + | ``LDXR W0, [X1]`` + - + | | | ``STXR W3, W2, [X1]`` | - | + + | ``CBNZ W3, loop`` + - + +----------+--------------------------------------+ - | | ``LSE`` | ``SWP W2, W0, [X1]`` | + | ``exchange(loc,val,relaxed)`` | ``BASE`` | ``loop:`` | + | | | ``LDXR W0, [X1]`` | + | | | | + | | | ``STXR W3, W2, [X1]`` | + | | | | + | | | ``CBNZ W3, loop`` | + | +----------+--------------------------------------+ + | | ``LSE`` | ``SWP W2, W0, [X1]`` * | +-------------------------------+----------+--------------------------------------+ - | ``exchange(loc,val,acq)`` | ``BASE`` | | ``loop:`` | - | | | | ``LDAXR W0, [X1]`` | - + + + | ``STXR W3, W2, [X1]`` + - | | | | ``CBNZ W3, loop`` | - + +----------+--------------------------------------+ - | | ``LSE`` | ``SWPA W2, W0, [X1]`` | + | ``exchange(loc,val,acq)`` | ``BASE`` | ``loop:`` | + | | | ``LDAXR W0, [X1]`` | + | | | | + | | | ``STXR W3, W2, [X1]`` | + | | | | + | | | ``CBNZ W3, loop`` | + | +----------+--------------------------------------+ + | | ``LSE`` | ``SWPA W2, W0, [X1]`` * | +-------------------------------+----------+--------------------------------------+ - | ``exchange(loc,val,rel)`` | ``BASE`` | | ``loop:`` | - | | | | ``LDXR W0, [X1]`` | - + + + | ``STLXR W3, W2, [X1]`` + - | | | | ``CBNZ W3, loop`` | - + +----------+--------------------------------------+ - | | ``LSE`` | ``SWPL W2, W0, [X1]`` | + | ``exchange(loc,val,rel)`` | ``BASE`` | ``loop:`` | + | | | ``LDXR W0, [X1]`` | + | | | | + | | 
| ``STLXR W3, W2, [X1]`` | + | | | | + | | | ``CBNZ W3, loop`` | + | +----------+--------------------------------------+ + | | ``LSE`` | ``SWPL W2, W0, [X1]`` * | +-------------------------------+----------+--------------------------------------+ - | ``exchange(loc,val,acq_rel)`` | ``BASE`` | | ``loop:`` | - | ``exchange(loc,val,sc)`` | | | ``LDAXR W0, [X1]`` | - + + + | ``STLXR W3, W2, [X1]`` + - | | | | ``CBNZ W3, loop`` | - + +----------+--------------------------------------+ - | | ``LSE`` | ``SWPAL W2, W0, [X1]`` | + | ``exchange(loc,val,acq_rel)`` | ``BASE`` | ``loop:`` | + | ``exchange(loc,val,sc)`` | | ``LDAXR W0, [X1]`` | + | | | | + | | | ``STLXR W3, W2, [X1]`` | + | | | | + | | | ``CBNZ W3, loop`` | + | +----------+--------------------------------------+ + | | ``LSE`` | ``SWPAL W2, W0, [X1]`` * | +-------------------------------+----------+--------------------------------------+ - | ``fetch_add(loc,val,relaxed)``| ``BASE`` | | ``loop:`` | - | + + | ``LDXR W0, [X1]`` + - | + + | ``ADD W2, W2, W0`` + - + | | | ``STXR W3, W2, [X1]`` | - | + + | ``CBNZ W3, loop`` + + | ``fetch_add(loc,val,relaxed)``| ``BASE`` | ``loop:`` | + | | | ``LDXR W0, [X1]`` | + | | | | + | | | ``ADD W2, W2, W0`` | + | | | | + | | | ``STXR W3, W2, [X1]`` | + | | | | + | | | ``CBNZ W3, loop`` | + +----------+--------------------------------------+ - | | ``LSE`` | ``LDADD W2, W0, [X1]`` | + | | ``LSE`` | ``LDADD W2, W0, [X1]`` * | +-------------------------------+----------+--------------------------------------+ - | ``fetch_add(loc,val,acq)`` | ``BASE`` | | ``loop:`` | - | + + | ``LDAXR W0, [X1]`` + - | + + | ``ADD W2, W2, W0`` + - + | | | ``STXR W3, W2, [X1]`` | - | + + | ``CBNZ W3, loop`` + - + +----------+--------------------------------------+ - | | ``LSE`` | ``LDADDA W2, W0, [X1]`` | + | ``fetch_add(loc,val,acq)`` | ``BASE`` | ``loop:`` | + | | | ``LDAXR W0, [X1]`` | + | | | | + | | | ``ADD W2, W2, W0`` | + | | | | + | | | ``STXR W3, W2, [X1]`` | + | | | | + | | | ``CBNZ W3, 
loop`` | + | +----------+--------------------------------------+ + | | ``LSE`` | ``LDADDA W2, W0, [X1]`` * | +-------------------------------+----------+--------------------------------------+ - | ``fetch_add(loc,val,rel)`` | ``BASE`` | | ``loop:`` | - | + + | ``LDXR W0, [X1]`` + - | + + | ``ADD W2, W2, W0`` + - + | | | ``STLXR W3, W2, [X1]`` | - | + + | ``CBNZ W3, loop`` + - + +----------+--------------------------------------+ - | | ``LSE`` | ``LDADDL W2, W0, [X1]`` | + | ``fetch_add(loc,val,rel)`` | ``BASE`` | ``loop:`` | + | | | ``LDXR W0, [X1]`` | + | | | | + | | | ``ADD W2, W2, W0`` | + | | | | + | | | ``STLXR W3, W2, [X1]`` | + | | | | + | | | ``CBNZ W3, loop`` | + | +----------+--------------------------------------+ + | | ``LSE`` | ``LDADDL W2, W0, [X1]`` * | +-------------------------------+----------+--------------------------------------+ - | ``fetch_add(loc,val,acq_rel)``| ``BASE`` | | ``loop:`` | - | ``fetch_add(loc,val,sc)`` + + | ``LDXAR W0, [X1]`` + - | + + | ``ADD W2, W2, W0`` + - + | | | ``STLXR W3, W2, [X1]`` | - | + + | ``CBNZ W3, loop`` + - + +----------+--------------------------------------+ - | | ``LSE`` | ``LDADDAL W2, W0, [X1]`` | + | ``fetch_add(loc,val,acq_rel)``| ``BASE`` | ``loop:`` | + | ``fetch_add(loc,val,sc)`` | | ``LDAXR W0, [X1]`` | + | | | | + | | | ``ADD W2, W2, W0`` | + | | | | + | | | ``STLXR W3, W2, [X1]`` | + | | | | + | | | ``CBNZ W3, loop`` | + | +----------+--------------------------------------+ + | | ``LSE`` | ``LDADDAL W2, W0, [X1]`` * | +-------------------------------+----------+--------------------------------------+ - | ``compare_exchange_strong(`` | ``BASE`` | | ``loop:`` | - | ``loc,&exp,val,relaxed,`` + + | ``LDXR W0, [X1]`` + - | ``relaxed)`` + + | ``CMP W0, W4`` + - | + + | ``B.NE fail`` + - + | | | ``STXR W3, W2, [X1]`` | - | + + | ``CBNZ W3, loop`` + - + + + | ``fail:`` + - + +----------+--------------------------------------+ - | | ``LSE`` | ``CAS W0, W2, [X1]`` | + | ``compare_exchange_strong(`` |
``BASE`` | ``loop:`` | + | ``loc,&exp,val,relaxed,`` | | ``LDXR W0, [X1]`` | + | ``relaxed)`` | | | + | | | ``CMP W0, W4`` | + | | | | + | | | ``B.NE fail`` | + | | | | + | | | ``STXR W3, W2, [X1]`` | + | | | | + | | | ``CBNZ W3, loop`` | + | | | | + | | | ``fail:`` | + | +----------+--------------------------------------+ + | | ``LSE`` | ``CAS W0, W2, [X1]`` * | +-------------------------------+----------+--------------------------------------+ - | ``compare_exchange_strong(`` | ``BASE`` | | ``loop:`` | - | ``loc,&exp,val,acq,acq)`` + + | ``LDAXR W0, [X1]`` + - | + + | ``CMP W0, W4`` + - | + + | ``B.NE fail`` + - + | | | ``STXR W3, W2, [X1]`` | - | + + | ``CBNZ W3, loop`` + - + + + | ``fail:`` + - + +----------+--------------------------------------+ - | | ``LSE`` | ``CASA W0, W2, [X1]`` | + | ``compare_exchange_strong(`` | ``BASE`` | ``loop:`` | + | ``loc,&exp,val,acq,acq)`` | | ``LDAXR W0, [X1]`` | + | | | | + | | | ``CMP W0, W4`` | + | | | | + | | | ``B.NE fail`` | + | | | | + | | | ``STXR W3, W2, [X1]`` | + | | | | + | | | ``CBNZ W3, loop`` | + | | | | + | | | ``fail:`` | + | +----------+--------------------------------------+ + | | ``LSE`` | ``CASA W0, W2, [X1]`` * | +-------------------------------+----------+--------------------------------------+ - | ``compare_exchange_strong(`` | ``BASE`` | | ``loop:`` | - | ``loc,&exp,val,rel,rel)`` + + | ``LDXR W0, [X1]`` + - | + + | ``CMP W0, W4`` + - | + + | ``B.NE fail`` + - + | | | ``STLXR W3, W2, [X1]`` | - | + + | ``CBNZ W3, loop`` + - + + + | ``fail:`` + - + +----------+--------------------------------------+ - | | ``LSE`` | ``CASL W0, W2, [X1]`` | + | ``compare_exchange_strong(`` | ``BASE`` | ``loop:`` | + | ``loc,&exp,val,rel,rel)`` | | ``LDXR W0, [X1]`` | + | | | | + | | | ``CMP W0, W4`` | + | | | | + | | | ``B.NE fail`` | + | | | | + | | | ``STLXR W3, W2, [X1]`` | + | | | | + | | | ``CBNZ W3, loop`` | + | | | | + | | | ``fail:`` | + | +----------+--------------------------------------+ + | | ``LSE`` | ``CASL 
W0, W2, [X1]`` * | +-------------------------------+----------+--------------------------------------+ - | ``compare_exchange_strong(`` | ``BASE`` | | ``loop:`` | - | ``loc,&exp,val,acq_rel,acq)``+ + | ``LDAXR W0, [X1]`` + - | + + | ``CMP W0, W4`` + - | ``compare_exchange_strong(`` + + | ``B.NE fail`` + - + ``loc,&exp,val,sc,sc)`` | | | ``STLXR W3, W2, [X1]`` | - | + + | ``CBNZ W3, loop`` + - + + + | ``fail:`` + - + +----------+--------------------------------------+ - | | ``LSE`` | ``CASAL W0, W2, [X1]`` | + | ``compare_exchange_strong(`` | ``BASE`` | ``loop:`` | + | ``loc,&exp,val,acq_rel,acq)``| | ``LDAXR W0, [X1]`` | + | ``compare_exchange_strong(`` | | | + | ``loc,&exp,val,sc,sc)`` | | ``CMP W0, W4`` | + | | | | + | | | ``B.NE fail`` | + | | | | + | | | ``STLXR W3, W2, [X1]`` | + | | | | + | | | ``CBNZ W3, loop`` | + | | | | + | | | ``fail:`` | + | +----------+--------------------------------------+ + | | ``LSE`` | ``CASAL W0, W2, [X1]`` * | +-------------------------------+----------+--------------------------------------+ Mappings for 8-bit types ------------------------ -The mappings for 8-bit types are the same as 32-bit types except they use the +The Mappings for 8-bit types are the same as for 32-bit types, except that they use the ``B`` variants of instructions. Mappings for 16-bit types ------------------------- -The mappings for 16-bit types are the same as 32-bit types except they use the +The Mappings for 16-bit types are the same as for 32-bit types, except that they use the ``H`` variants of instructions. Mappings for 64-bit types ------------------------- -The mappings for 64-bit types are the same as 32-bit types except the registers +The Mappings for 64-bit types are the same as for 32-bit types, except that the registers used are X-registers.
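The width rules above can be illustrated from the C side. The sketch below is not part of this ABI. It performs the same ``fetch_add(loc,val,sc)`` operation at each of the four widths using C11 ``atomic_fetch_add_explicit``. The helper names (``fetch_add_u8`` and so on) are hypothetical. The instructions listed in the comment are what the LSE rows of the 32-bit table suggest once the ``B``, ``H`` and X-register rules are applied, assuming a Compiler Profile that enables FEAT_LSE. A given compiler may choose different instructions and registers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helpers: each performs fetch_add(loc,val,sc) at one width.
 * Assuming a Compiler Profile that enables FEAT_LSE, the Mappings above
 * suggest one instruction per width, for example:
 *   8-bit   LDADDALB W2, W0, [X1]   (B variant)
 *   16-bit  LDADDALH W2, W0, [X1]   (H variant)
 *   32-bit  LDADDAL  W2, W0, [X1]
 *   64-bit  LDADDAL  X2, X0, [X1]   (X-registers)
 * The register numbers only mirror the tables. A real compiler allocates
 * registers according to the procedure call standard.
 */
uint8_t fetch_add_u8(_Atomic uint8_t *loc, uint8_t val) {
    return atomic_fetch_add_explicit(loc, val, memory_order_seq_cst);
}

uint16_t fetch_add_u16(_Atomic uint16_t *loc, uint16_t val) {
    return atomic_fetch_add_explicit(loc, val, memory_order_seq_cst);
}

uint32_t fetch_add_u32(_Atomic uint32_t *loc, uint32_t val) {
    return atomic_fetch_add_explicit(loc, val, memory_order_seq_cst);
}

uint64_t fetch_add_u64(_Atomic uint64_t *loc, uint64_t val) {
    return atomic_fetch_add_explicit(loc, val, memory_order_seq_cst);
}

/* fetch_add returns the value held before the addition (the value the
 * tables place in W0/X0). Unsigned overflow wraps at each width. */
void check_fetch_add_widths(void) {
    _Atomic uint8_t b = 250;
    assert(fetch_add_u8(&b, 10) == 250);
    assert(atomic_load(&b) == 4);          /* 260 mod 2^8 */

    _Atomic uint16_t h = 65535;
    assert(fetch_add_u16(&h, 1) == 65535);
    assert(atomic_load(&h) == 0);          /* 65536 mod 2^16 */

    _Atomic uint32_t w = UINT32_MAX;
    assert(fetch_add_u32(&w, 2) == UINT32_MAX);
    assert(atomic_load(&w) == 1);          /* wraps modulo 2^32 */

    _Atomic uint64_t x = 40;
    assert(fetch_add_u64(&x, 2) == 40);
    assert(atomic_load(&x) == 42);
}
```

Compiling such a file for an LSE-enabled target and inspecting the disassembly is one way to compare a Compiler Profile against these tables.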
Mappings for 128-bit types @@ -604,329 +655,500 @@ In what follows, register ``X4`` contains the location ``loc``, ``X2`` and +-----------------------------------------------+--------------------------------------+ | Atomic Operation | Assembly Sequence | +=================================+=============+======================================+ - | ``store(loc,val,relaxed)`` | ``BASE`` | | ``loop:`` | - | | | | ``LDXP XZR, X1, [X4]`` | - | | | | ``STXP W5, X2, X3, [X4]`` | - | | | | ``CBNZ W5, loop`` | - + +-------------+--------------------------------------+ - | | ``LSE`` | | ``LDP X0, X1, [X4]`` | - | | | | ``loop:`` | - | | | | ``MOV X6, X0`` | - | | | | ``MOV X7, X1`` | - | | | | ``CASP X0, X1, X2, X3, [X4]`` | - | | | | ``CMP X0, X6`` | - | | | | ``CCMP X1, X7, 0, EQ`` | - | | | | ``B.NE loop`` | - + +-------------+--------------------------------------+ + | ``store(loc,val,relaxed)`` | ``BASE`` | ``loop:`` | + | | | ``LDXP XZR, X1, [X4]`` | + | | | | + | | | ``STXP W5, X2, X3, [X4]`` | + | | | | + | | | ``CBNZ W5, loop`` | + | +-------------+--------------------------------------+ + | | ``LSE`` | ``LDP X0, X1, [X4]`` | + | | | | + | | | ``loop:`` | + | | | ``MOV X6, X0`` | + | | | | + | | | ``MOV X7, X1`` | + | | | | + | | | ``CASP X0, X1, X2, X3, [X4]`` | + | | | | + | | | ``CMP X0, X6`` | + | | | | + | | | ``CCMP X1, X7, 0, EQ`` | + | | | | + | | | ``B.NE loop`` | + | +-------------+--------------------------------------+ | | ``LSE2`` | ``STP x2, X3, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``store(loc,val,rel)`` | ``BASE`` | | ``loop:`` | - | | | | ``LDXP XZR, X1, [X4]`` | - | | | | ``STLXP W5, X2, X3, [X4]`` | - | | | | ``CBNZ W5, loop`` | - + +-------------+--------------------------------------+ - | | ``LSE`` | | ``LDP X0, X1, [X4]`` | - | | | | ``loop:`` | - | | | | ``MOV X6, X0`` | - | | | | ``MOV X7, X1`` | - | | | | ``CASPL X0, X1, X2, X3, [X4]`` | - | | | | ``CMP X0, X6`` | - | | | | 
``CCMP X1, X7, 0, EQ`` | - | | | | ``B.NE loop`` | - + +-------------+--------------------------------------+ - | | ``LSE2`` | | ``DMB ISH`` | - | | | | ``STP X2, X3, [X4]`` | - + +-------------+--------------------------------------+ + | ``store(loc,val,rel)`` | ``BASE`` | ``loop:`` | + | | | ``LDXP XZR, X1, [X4]`` | + | | | ``STLXP W5, X2, X3, [X4]`` | + | | | ``CBNZ W5, loop`` | + | +-------------+--------------------------------------+ + | | ``LSE`` | ``LDP X0, X1, [X4]`` | + | | | | + | | | ``loop:`` | + | | | ``MOV X6, X0`` | + | | | | + | | | ``MOV X7, X1`` | + | | | | + | | | ``CASPL X0, X1, X2, X3, [X4]`` | + | | | | + | | | ``CMP X0, X6`` | + | | | | + | | | ``CCMP X1, X7, 0, EQ`` | + | | | | + | | | ``B.NE loop`` | + | +-------------+--------------------------------------+ + | | ``LSE2`` | ``DMB ISH`` | + | | | | + | | | ``STP X2, X3, [X4]`` | + | +-------------+--------------------------------------+ | | ``LRCPC3`` | ``STILP X2, X3, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``store(loc,val,sc)`` | ``BASE`` | | ``loop:`` | - | | | | ``LDXP XZR, X1, [X4]`` | - | | | | ``STLXP W5, X2, X3, [X4]`` | - | | | | ``CBNZ W5, loop`` | - + +-------------+--------------------------------------+ - | | ``LSE`` | | ``LDP X0, X1, [X4]`` | - | | | | ``loop:`` | - | | | | ``MOV X6, X0`` | - | | | | ``MOV X7, X1`` | - | | | | ``CASPL X0, X1, X2, X3, [X4]`` | - | | | | ``CMP X0, X6`` | - | | | | ``CCMP X1, X7, 0, EQ`` | - | | | | ``B.NE loop`` | - + +-------------+--------------------------------------+ - | | ``LSE2`` | | ``DMB ISH`` | - | | | | ``STP X2, X3, [X4]`` | - | | | | ``DMB ISH`` | - + +-------------+--------------------------------------+ + | ``store(loc,val,sc)`` | ``BASE`` | ``loop:`` | + | | | ``LDXP XZR, X1, [X4]`` | + | | | | + | | | ``STLXP W5, X2, X3, [X4]`` | + | | | | + | | | ``CBNZ W5, loop`` | + | +-------------+--------------------------------------+ + | | ``LSE`` | ``LDP X0, X1, [X4]`` | + | 
| | | + | | | ``loop:`` | + | | | ``MOV X6, X0`` | + | | | | + | | | ``MOV X7, X1`` | + | | | | + | | | ``CASPL X0, X1, X2, X3, [X4]`` | + | | | | + | | | ``CMP X0, X6`` | + | | | | + | | | ``CCMP X1, X7, 0, EQ`` | + | | | | + | | | ``B.NE loop`` | + | +-------------+--------------------------------------+ + | | ``LSE2`` | ``DMB ISH`` | + | | | | + | | | ``STP X2, X3, [X4]`` | + | | | | + | | | ``DMB ISH`` | + | +-------------+--------------------------------------+ | | ``LRCPC3`` | ``STILP X2, X3, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``load(loc,relaxed)`` | ``BASE`` | | ``loop:`` | - | | | | ``LDXP X0, X1, [X4]`` | - | | | | ``STXP W5, X0, X1, [X4]`` | - | | | | ``CBNZ W5, loop`` | - + +-------------+--------------------------------------+ + | ``load(loc,relaxed)`` | ``BASE`` | ``loop:`` | + | | | ``LDXP X0, X1, [X4]`` | + | | | | + | | | ``STXP W5, X0, X1, [X4]`` | + | | | | + | | | ``CBNZ W5, loop`` | + | +-------------+--------------------------------------+ | | ``LSE`` | ``CASP X0, X1, X0, X1, [X4]`` | - + +-------------+--------------------------------------+ + | +-------------+--------------------------------------+ | | ``LSE2`` | ``LDP X0, X1, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``load(loc,acq)`` | ``BASE`` | | ``loop:`` | - | | | | ``LDAXP X0, X1, [X4]`` | - | | | | ``STXP W5, X0, X1, [X4]`` | - | | | | ``CBNZ W5, loop`` | - + +-------------+--------------------------------------+ + | ``load(loc,acq)`` | ``BASE`` | ``loop:`` | + | | | ``LDAXP X0, X1, [X4]`` | + | | | | + | | | ``STXP W5, X0, X1, [X4]`` | + | | | | + | | | ``CBNZ W5, loop`` | + | +-------------+--------------------------------------+ | | ``LSE`` | ``CASPA X0, X1, X0, X1, [X4]`` | - + +-------------+--------------------------------------+ - | | ``LSE2`` | | ``LDP X0, X1, [X4]`` | - | | | | ``DMB ISHLD`` | - + +-------------+--------------------------------------+ 
+ | +-------------+--------------------------------------+ + | | ``LSE2`` | ``LDP X0, X1, [X4]`` | + | | | | + | | | ``DMB ISHLD`` | + | +-------------+--------------------------------------+ | | ``LRCPC3`` | ``LDIAPP X0, X1, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``load(loc,sc)`` | ``BASE`` | | ``loop:`` | - | | | | ``LDAXP X0, X1, [X4]`` | - | | | | ``STXP W5, X0, X1, [X4]`` | - | | | | ``CBNZ W5, loop`` | - + +-------------+--------------------------------------+ + | ``load(loc,sc)`` | ``BASE`` | ``loop:`` | + | | | ``LDAXP X0, X1, [X4]`` | + | | | | + | | | ``STXP W5, X0, X1, [X4]`` | + | | | | + | | | ``CBNZ W5, loop`` | + | +-------------+--------------------------------------+ | | ``LSE`` | ``CASPA X0, X1, X0, X1, [X4]`` | - + +-------------+--------------------------------------+ - | | ``LSE2`` | | ``LDAR X5, [X4]`` | - | | | | ``LDP X0, X1, [X4]`` | - | | | | ``DMB ISHLD`` | - + +-------------+--------------------------------------+ - | | ``LRCPC3`` | | ``LDAR X5, [X4]`` | - | | | | ``LDIAPP X0, X1, [X4]`` | + | +-------------+--------------------------------------+ + | | ``LSE2`` | ``LDAR X5, [X4]`` | + | | | | + | | | ``LDP X0, X1, [X4]`` | + | | | | + | | | ``DMB ISHLD`` | + | +-------------+--------------------------------------+ + | | ``LRCPC3`` | ``LDAR X5, [X4]`` | + | | | | + | | | ``LDIAPP X0, X1, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``exchange(loc,val,relaxed)`` | ``BASE`` | | ``loop:`` | - | | | | ``LDXP X0, X1, [X4]`` | - | | | | ``STXP W5, X2, X3, [X4]`` | - | | | | ``CBNZ W5, loop`` | - + +-------------+--------------------------------------+ - | | ``LSE`` | | ``LDP X0, X1, [X4]`` | - | | | | ``loop:`` | - | | | | ``MOV X6, X0`` | - | | | | ``MOV X7, X1`` | - | | | | ``CASP X0, X1, X2, X3, [X4]`` | - | | | | ``CMP X0, X6`` | - | | | | ``CCMP X1, X7, 0, EQ`` | - | | | | ``B.NE loop`` | - + 
+-------------+--------------------------------------+ - | | ``LSE128`` | | ``MOV X0, X2`` | - | | | | ``MOV X1, X3`` | - | | | | ``SWPP X0, X1, [X4]`` | + | ``exchange(loc,val,relaxed)`` | ``BASE`` | ``loop:`` | + | | | ``LDXP X0, X1, [X4]`` | + | | | | + | | | ``STXP W5, X2, X3, [X4]`` | + | | | | + | | | ``CBNZ W5, loop`` | + | +-------------+--------------------------------------+ + | | ``LSE`` | ``LDP X0, X1, [X4]`` | + | | | | + | | | ``loop:`` | + | | | ``MOV X6, X0`` | + | | | | + | | | ``MOV X7, X1`` | + | | | | + | | | ``CASP X0, X1, X2, X3, [X4]`` | + | | | | + | | | ``CMP X0, X6`` | + | | | | + | | | ``CCMP X1, X7, 0, EQ`` | + | | | | + | | | ``B.NE loop`` | + | +-------------+--------------------------------------+ + | | ``LSE128`` | ``MOV X0, X2`` | + | | | | + | | | ``MOV X1, X3`` | + | | | | + | | | ``SWPP X0, X1, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``exchange(loc,val,acq)`` | ``BASE`` | | ``loop:`` | - | | | | ``LDAXP X0, X1, [X4]`` | - | | | | ``STXP W5, X2, X3, [X4]`` | - | | | | ``CBNZ W5, loop`` | - + +-------------+--------------------------------------+ - | | ``LSE`` | | ``LDP X0, X1, [X4]`` | - | | | | ``loop:`` | - | | | | ``MOV X6, X0`` | - | | | | ``MOV X7, X1`` | - | | | | ``CASPA X0, X1, X2, X3, [X4]`` | - | | | | ``CMP X0, X6`` | - | | | | ``CCMP X1, X7, 0, EQ`` | - | | | | ``B.NE loop`` | - + +-------------+--------------------------------------+ - | | ``LSE128`` | | ``MOV X0, X2`` | - | | | | ``MOV X1, X3`` | - | | | | ``SWPPA X0, X1, [X4]`` | + | ``exchange(loc,val,acq)`` | ``BASE`` | ``loop:`` | + | | | ``LDAXP X0, X1, [X4]`` | + | | | | + | | | ``STXP W5, X2, X3, [X4]`` | + | | | | + | | | ``CBNZ W5, loop`` | + | +-------------+--------------------------------------+ + | | ``LSE`` | ``LDP X0, X1, [X4]`` | + | | | | + | | | ``loop:`` | + | | | ``MOV X6, X0`` | + | | | | + | | | ``MOV X7, X1`` | + | | | | + | | | ``CASPA X0, X1, X2, X3, [X4]`` | + | | | | + | | | 
``CMP X0, X6`` | + | | | | + | | | ``CCMP X1, X7, 0, EQ`` | + | | | | + | | | ``B.NE loop`` | + | +-------------+--------------------------------------+ + | | ``LSE128`` | ``MOV X0, X2`` | + | | | | + | | | ``MOV X1, X3`` | + | | | | + | | | ``SWPPA X0, X1, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``exchange(loc,val,rel)`` | ``BASE`` | | ``loop:`` | - | | | | ``LDXP X0, X1, [X4]`` | - | | | | ``STLXP W5, X2, X3, [X4]`` | - | | | | ``CBNZ W5, loop`` | - + +-------------+--------------------------------------+ - | | ``LSE`` | | ``LDP X0, X1, [X4]`` | - | | | | ``loop:`` | - | | | | ``MOV X6, X0`` | - | | | | ``MOV X7, X1`` | - | | | | ``CASPL X0, X1, X2, X3, [X4]`` | - | | | | ``CMP X0, X6`` | - | | | | ``CCMP X1, X7, 0, EQ`` | - | | | | ``B.NE loop`` | - + +-------------+--------------------------------------+ - | | ``LSE128`` | | ``MOV X0, X2`` | - | | | | ``MOV X1, X3`` | - | | | | ``SWPPL X0, X1, [X4]`` | + | ``exchange(loc,val,rel)`` | ``BASE`` | ``loop:`` | + | | | ``LDXP X0, X1, [X4]`` | + | | | | + | | | ``STLXP W5, X2, X3, [X4]`` | + | | | | + | | | ``CBNZ W5, loop`` | + | +-------------+--------------------------------------+ + | | ``LSE`` | ``LDP X0, X1, [X4]`` | + | | | | + | | | ``loop:`` | + | | | ``MOV X6, X0`` | + | | | | + | | | ``MOV X7, X1`` | + | | | | + | | | ``CASPL X0, X1, X2, X3, [X4]`` | + | | | | + | | | ``CMP X0, X6`` | + | | | | + | | | ``CCMP X1, X7, 0, EQ`` | + | | | | + | | | ``B.NE loop`` | + | +-------------+--------------------------------------+ + | | ``LSE128`` | ``MOV X0, X2`` | + | | | | + | | | ``MOV X1, X3`` | + | | | | + | | | ``SWPPL X0, X1, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``exchange(loc,val,acq_rel)`` | ``BASE`` | | ``loop:`` | - | ``exchange(loc,val,sc)`` | | | ``LDAXP X0, X1, [X4]`` | - | | | | ``STLXP W5, X2, X3, [X4]`` | - | | | | ``CBNZ W5, loop`` | - + 
+-------------+--------------------------------------+ - | | ``LSE`` | | ``LDP X0, X1, [X4]`` | - | | | | ``loop:`` | - | | | | ``MOV X6, X0`` | - | | | | ``MOV X7, X1`` | - | | | | ``CASPAL X0, X1, X2, X3, [X4]`` | - | | | | ``CMP X0, X6`` | - | | | | ``CCMP X1, X7, 0, EQ`` | - | | | | ``B.NE loop`` | - + +-------------+--------------------------------------+ - | | ``LSE128`` | | ``MOV X0, X2`` | - | | | | ``MOV X1, X3`` | - | | | | ``SWPPAL X0, X1, [X4]`` | + | ``exchange(loc,val,acq_rel)`` | ``BASE`` | ``loop:`` | + | ``exchange(loc,val,sc)`` | | ``LDAXP X0, X1, [X4]`` | + | | | | + | | | ``STLXP W5, X2, X3, [X4]`` | + | | | | + | | | ``CBNZ W5, loop`` | + | +-------------+--------------------------------------+ + | | ``LSE`` | ``LDP X0, X1, [X4]`` | + | | | | + | | | ``loop:`` | + | | | ``MOV X6, X0`` | + | | | | + | | | ``MOV X7, X1`` | + | | | | + | | | ``CASPAL X0, X1, X2, X3, [X4]`` | + | | | | + | | | ``CMP X0, X6`` | + | | | | + | | | ``CCMP X1, X7, 0, EQ`` | + | | | | + | | | ``B.NE loop`` | + | +-------------+--------------------------------------+ + | | ``LSE128`` | ``MOV X0, X2`` | + | | | | + | | | ``MOV X1, X3`` | + | | | | + | | | ``SWPPAL X0, X1, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``fetch_add(loc,val,relaxed)`` | ``BASE`` | | ``loop:`` | - | | | | ``LDXP X0, X1, [X4]`` | - | | | | ``ADDS X0, X0, X2`` | - | | | | ``ADC X1, X1, X3`` | - | | | | ``STXP W5, X2, X3, [X4]`` | - | | | | ``CBNZ W5, loop`` | - + +-------------+--------------------------------------+ - | | ``LSE`` | | ``LDP X0, X1, [X4]`` | - | | | | ``loop:`` | - | | | | ``MOV X6, X0`` | - | | | | ``MOV X7, X1`` | - | | | | ``ADDS X8, X0, X2`` | - | | | | ``ADC X9, X1, X3`` | - | | | | ``CASP X0, X1, X8, X9, [X4]`` | - | | | | ``CMP X0, X6`` | - | | | | ``CCMP X1, X7, 0, EQ`` | - | | | | ``B.NE loop`` | + | ``fetch_add(loc,val,relaxed)`` | ``BASE`` | ``loop:`` | + | | | ``LDXP X0, X1, [X4]`` | + | | | | + | | | ``ADDS X0, 
X0, X2`` | + | | | | + | | | ``ADC X1, X1, X3`` | + | | | | + | | | ``STXP W5, X2, X3, [X4]`` | + | | | | + | | | ``CBNZ W5, loop`` | + | +-------------+--------------------------------------+ + | | ``LSE`` | ``LDP X0, X1, [X4]`` | + | | | | + | | | ``loop:`` | + | | | ``MOV X6, X0`` | + | | | | + | | | ``MOV X7, X1`` | + | | | | + | | | ``ADDS X8, X0, X2`` | + | | | | + | | | ``ADC X9, X1, X3`` | + | | | | + | | | ``CASP X0, X1, X8, X9, [X4]`` | + | | | | + | | | ``CMP X0, X6`` | + | | | | + | | | ``CCMP X1, X7, 0, EQ`` | + | | | | + | | | ``B.NE loop`` | +---------------------------------+-------------+--------------------------------------+ - | ``fetch_add(loc,val,acq)`` | ``BASE`` | | ``loop:`` | - | | | | ``LDAXP X0, X1, [X4]`` | - | | | | ``ADDS X0, X0, X2`` | - | | | | ``ADC X1, X1, X3`` | - | | | | ``STXP W5, X2, X3, [X4]`` | - | | | | ``CBNZ W5, loop`` | - + +-------------+--------------------------------------+ - | | ``LSE`` | | ``LDP X0, X1, [X4]`` | - | | | | ``loop:`` | - | | | | ``MOV X6, X0`` | - | | | | ``MOV X7, X1`` | - | | | | ``ADDS X8, X0, X2`` | - | | | | ``ADC X9, X1, X3`` | - | | | | ``CASPA X0, X1, X8, X9, [X4]`` | - | | | | ``CMP X0, X6`` | - | | | | ``CCMP X1, X7, 0, EQ`` | - | | | | ``B.NE loop`` | + | ``fetch_add(loc,val,acq)`` | ``BASE`` | ``loop:`` | + | | | ``LDAXP X0, X1, [X4]`` | + | | | | + | | | ``ADDS X0, X0, X2`` | + | | | | + | | | ``ADC X1, X1, X3`` | + | | | | + | | | ``STXP W5, X2, X3, [X4]`` | + | | | | + | | | ``CBNZ W5, loop`` | + | +-------------+--------------------------------------+ + | | ``LSE`` | ``LDP X0, X1, [X4]`` | + | | | | + | | | ``loop:`` | + | | | ``MOV X6, X0`` | + | | | | + | | | ``MOV X7, X1`` | + | | | | + | | | ``ADDS X8, X0, X2`` | + | | | | + | | | ``ADC X9, X1, X3`` | + | | | | + | | | ``CASPA X0, X1, X8, X9, [X4]`` | + | | | | + | | | ``CMP X0, X6`` | + | | | | + | | | ``CCMP X1, X7, 0, EQ`` | + | | | | + | | | ``B.NE loop`` | 
+---------------------------------+-------------+--------------------------------------+ - | ``fetch_add(loc,val,rel)`` | ``BASE`` | | ``loop:`` | - | | | | ``LDXP X0, X1, [X4]`` | - | | | | ``ADDS X0, X0, X2`` | - | | | | ``ADC X1, X1, X3`` | - | | | | ``STLXP W5, X2, X3, [X4]`` | - | | | | ``CBNZ W5, loop`` | - + +-------------+--------------------------------------+ - | | ``LSE`` | | ``LDP X0, X1, [X4]`` | - | | | | ``loop:`` | - | | | | ``MOV X6, X0`` | - | | | | ``MOV X7, X1`` | - | | | | ``ADDS X8, X0, X2`` | - | | | | ``ADC X9, X1, X3`` | - | | | | ``CASPL X0, X1, X8, X9, [X4]`` | - | | | | ``CMP X0, X6`` | - | | | | ``CCMP X1, X7, 0, EQ`` | - | | | | ``B.NE loop`` | + | ``fetch_add(loc,val,rel)`` | ``BASE`` | ``loop:`` | + | | | ``LDXP X0, X1, [X4]`` | + | | | | + | | | ``ADDS X0, X0, X2`` | + | | | | + | | | ``ADC X1, X1, X3`` | + | | | | + | | | ``STLXP W5, X2, X3, [X4]`` | + | | | | + | | | ``CBNZ W5, loop`` | + | +-------------+--------------------------------------+ + | | ``LSE`` | ``LDP X0, X1, [X4]`` | + | | | | + | | | ``loop:`` | + | | | ``MOV X6, X0`` | + | | | | + | | | ``MOV X7, X1`` | + | | | | + | | | ``ADDS X8, X0, X2`` | + | | | | + | | | ``ADC X9, X1, X3`` | + | | | | + | | | ``CASPL X0, X1, X8, X9, [X4]`` | + | | | | + | | | ``CMP X0, X6`` | + | | | | + | | | ``CCMP X1, X7, 0, EQ`` | + | | | | + | | | ``B.NE loop`` | +---------------------------------+-------------+--------------------------------------+ - | ``fetch_add(loc,val,acq_rel)`` | ``BASE`` | | ``loop:`` | - | ``fetch_add(loc,val,sc)`` | | | ``LDAXP X0, X1, [X4]`` | - | | | | ``ADDS X0, X0, X2`` | - | | | | ``ADC X1, X1, X3`` | - | | | | ``STXLP W5, X2, X3, [X4]`` | - | | | | ``CBNZ W5, loop`` | - + +-------------+--------------------------------------+ - | | ``LSE`` | | ``LDP X0, X1, [X4]`` | - | | | | ``loop:`` | - | | | | ``MOV X6, X0`` | - | | | | ``MOV X7, X1`` | - | | | | ``ADDS X8, X0, X2`` | - | | | | ``ADC X9, X1, X3`` | - | | | | ``CASPAL X0, X1, X8, X9, [X4]`` | - | | | 
| ``CMP X0, X6`` | - | | | | ``CCMP X1, X7, 0, EQ`` | - | | | | ``B.NE loop`` | + | ``fetch_add(loc,val,acq_rel)`` | ``BASE`` | ``loop:`` | + | ``fetch_add(loc,val,sc)`` | | ``LDAXP X0, X1, [X4]`` | + | | | | + | | | ``ADDS X0, X0, X2`` | + | | | | + | | | ``ADC X1, X1, X3`` | + | | | | + | | | ``STLXP W5, X2, X3, [X4]`` | + | | | | + | | | ``CBNZ W5, loop`` | + | +-------------+--------------------------------------+ + | | ``LSE`` | ``LDP X0, X1, [X4]`` | + | | | | + | | | ``loop:`` | + | | | ``MOV X6, X0`` | + | | | | + | | | ``MOV X7, X1`` | + | | | | + | | | ``ADDS X8, X0, X2`` | + | | | | + | | | ``ADC X9, X1, X3`` | + | | | | + | | | ``CASPAL X0, X1, X8, X9, [X4]`` | + | | | | + | | | ``CMP X0, X6`` | + | | | | + | | | ``CCMP X1, X7, 0, EQ`` | + | | | | + | | | ``B.NE loop`` | +---------------------------------+-------------+--------------------------------------+ - | ``fetch_or(loc,val,relaxed)`` | ``LSE128`` | | ``MOV X0, X2`` | - | | | | ``MOV X1, X3`` | - | | | | ``LDSETP X0, X1, [X4]`` | + | ``fetch_or(loc,val,relaxed)`` | ``LSE128`` | ``MOV X0, X2`` | + | | | | + | | | ``MOV X1, X3`` | + | | | | + | | | ``LDSETP X0, X1, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``fetch_or(loc,val,acq)`` | ``LSE128`` | | ``MOV X0, X2`` | - | | | | ``MOV X1, X3`` | - | | | | ``LDSETPA X0, X1, [X4]`` | + | ``fetch_or(loc,val,acq)`` | ``LSE128`` | ``MOV X0, X2`` | + | | | | + | | | ``MOV X1, X3`` | + | | | | + | | | ``LDSETPA X0, X1, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``fetch_or(loc,val,rel)`` | ``LSE128`` | | ``MOV X0, X2`` | - | | | | ``MOV X1, X3`` | - | | | | ``LDSETPL X0, X1, [X4]`` | + | ``fetch_or(loc,val,rel)`` | ``LSE128`` | ``MOV X0, X2`` | + | | | | + | | | ``MOV X1, X3`` | + | | | | + | | | ``LDSETPL X0, X1, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``fetch_or(loc,val,acq_rel)``
| ``LSE128`` | | ``MOV X0, X2`` | - | ``fetch_or(loc,val,sc)`` | | | ``MOV X1, X3`` | - | | | | ``LDSETPAL X0, X1, [X4]`` | + | ``fetch_or(loc,val,acq_rel)`` | ``LSE128`` | ``MOV X0, X2`` | + | ``fetch_or(loc,val,sc)`` | | | + | | | ``MOV X1, X3`` | + | | | | + | | | ``LDSETPAL X0, X1, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``fetch_and(loc,val,relaxed)`` | ``LSE128`` | | ``MVN X0, X2`` | - | | | | ``MVN X1, X3`` | - | | | | ``LDCLRP X0, X1, [X4]`` | + | ``fetch_and(loc,val,relaxed)`` | ``LSE128`` | ``MVN X0, X2`` | + | | | | + | | | ``MVN X1, X3`` | + | | | | + | | | ``LDCLRP X0, X1, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``fetch_and(loc,val,acq)`` | ``LSE128`` | | ``MVN X0, X2`` | - | | | | ``MVN X1, X3`` | - | | | | ``LDCLRPA X0, X1, [X4]`` | + | ``fetch_and(loc,val,acq)`` | ``LSE128`` | ``MVN X0, X2`` | + | | | | + | | | ``MVN X1, X3`` | + | | | | + | | | ``LDCLRPA X0, X1, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``fetch_and(loc,val,rel)`` | ``LSE128`` | | ``MVN X0, X2`` | - | | | | ``MVN X1, X3`` | - | | | | ``LDCLRPL X0, X1, [X4]`` | + | ``fetch_and(loc,val,rel)`` | ``LSE128`` | ``MVN X0, X2`` | + | | | | + | | | ``MVN X1, X3`` | + | | | | + | | | ``LDCLRPL X0, X1, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``fetch_and(loc,val,acq_rel)`` | ``LSE128`` | | ``MVN X0, X2`` | - | ``fetch_and(loc,val,sc)`` | | | ``MVN X1, X3`` | - | | | | ``LDCLRPAL X0, X1, [X4]`` | + | ``fetch_and(loc,val,acq_rel)`` | ``LSE128`` | ``MVN X0, X2`` | + | ``fetch_and(loc,val,sc)`` | | | + | | | ``MVN X1, X3`` | + | | | | + | | | ``LDCLRPAL X0, X1, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``compare_exchange_strong(`` | ``BASE`` | | ``loop:`` | - | ``loc,&exp,val,relaxed,`` + + | ``LDXP 
X6, x7, [X4]`` + - | ``relaxed)`` + + | ``CMP X6, X0`` + - + | | | ``CCMP X7, X1, 0, EQ`` | - | + + | ``CSEL X8, X2, X6, EQ`` + - | + + | ``CSEL X9, X3, X7, EQ`` + - | + + | ``STXP W5, X8, X9, [X4]`` + - | + + | ``CBNZ W5, loop`` + - + + + | ``MOV X0, X6`` + - + + + | ``MOV X1, X7`` + - + +-------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``BASE`` | ``loop:`` | + | ``loc,&exp,val,relaxed,`` | | ``LDXP X6, x7, [X4]`` | + | ``relaxed)`` | | | + | | | ``CMP X6, X0`` | + | | | | + | | | ``CCMP X7, X1, 0, EQ`` | + | | | | + | | | ``CSEL X8, X2, X6, EQ`` | + | | | | + | | | ``CSEL X9, X3, X7, EQ`` | + | | | | + | | | ``STXP W5, X8, X9, [X4]`` | + | | | | + | | | ``CBNZ W5, loop`` | + | | | | + | | | ``MOV X0, X6`` | + | | | | + | | | ``MOV X1, X7`` | + | +-------------+--------------------------------------+ | | ``LSE`` | ``CASP X0, X1, X2, X3, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``compare_exchange_strong(`` | ``BASE`` | | ``loop:`` | - | ``loc,&exp,val,acq, acq)`` + + | ``LDAXP X6, x7, [X4]`` + - | + + | ``CMP X6, X0`` + - + | | | ``CCMP X7, X1, 0, EQ`` | - | + + | ``CSEL X8, X2, X6, EQ`` + - | + + | ``CSEL X9, X3, X7, EQ`` + - | + + | ``STXP W5, X8, X9, [X4]`` + - | + + | ``CBNZ W5, loop`` + - + + + | ``MOV X0, X6`` + - + + + | ``MOV X1, X7`` + - + +-------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``BASE`` | ``loop:`` | + | ``loc,&exp,val,acq, acq)`` | | ``LDAXP X6, x7, [X4]`` | + | | | | + | | | ``CMP X6, X0`` | + | | | | + | | | ``CCMP X7, X1, 0, EQ`` | + | | | | + | | | ``CSEL X8, X2, X6, EQ`` | + | | | | + | | | ``CSEL X9, X3, X7, EQ`` | + | | | | + | | | ``STXP W5, X8, X9, [X4]`` | + | | | | + | | | ``CBNZ W5, loop`` | + | | | | + | | | ``MOV X0, X6`` | + | | | | + | | | ``MOV X1, X7`` | + | +-------------+--------------------------------------+ | | ``LSE`` | ``CASPA X0, X1, X2, X3, [X4]`` | 
+---------------------------------+-------------+--------------------------------------+ - | ``compare_exchange_strong(`` | ``BASE`` | | ``loop:`` | - | ``loc,&exp,val,rel,rel)`` + + | ``LDXP X6, x7, [X4]`` + - | + + | ``CMP X6, X0`` + - + | | | ``CCMP X7, X1, 0, EQ`` | - | + + | ``CSEL X8, X2, X6, EQ`` + - | + + | ``CSEL X9, X3, X7, EQ`` + - | + + | ``STLXP W5, X8, X9, [X4]`` + - | + + | ``CBNZ W5, loop`` + - + + + | ``MOV X0, X6`` + - + + + | ``MOV X1, X7`` + - + +-------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``BASE`` | ``loop:`` | + | ``loc,&exp,val,rel,rel)`` | | ``LDXP X6, x7, [X4]`` | + | | | | + | | | ``CMP X6, X0`` | + | | | | + | | | ``CCMP X7, X1, 0, EQ`` | + | | | | + | | | ``CSEL X8, X2, X6, EQ`` | + | | | | + | | | ``CSEL X9, X3, X7, EQ`` | + | | | | + | | | ``STLXP W5, X8, X9, [X4]`` | + | | | | + | | | ``CBNZ W5, loop`` | + | | | | + | | | ``MOV X0, X6`` | + | | | | + | | | ``MOV X1, X7`` | + | +-------------+--------------------------------------+ | | ``LSE`` | ``CASPL X0, X1, X2, X3, [X4]`` | +---------------------------------+-------------+--------------------------------------+ - | ``compare_exchange_strong(`` | ``BASE`` | | ``loop:`` | - | ``loc,&exp,val,acq_rel,acq)`` + + | ``LDAXP X6, x7, [X4]`` + - | + + | ``CMP X6, X0`` + - + ``compare_exchange_strong(`` | | | ``CCMP X7, X1, 0, EQ`` | - | ``loc,&exp,val,sc,sc)`` + + | ``CSEL X8, X2, X6, EQ`` + - | + + | ``CSEL X9, X3, X7, EQ`` + - | + + | ``STLXP W5, X8, X9, [X4]`` + - | + + | ``CBNZ W5, loop`` + - + + + | ``MOV X0, X6`` + - + + + | ``MOV X1, X7`` + - + +-------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``BASE`` | ``loop:`` | + | ``loc,&exp,val,acq_rel,acq)`` | | ``LDAXP X6, x7, [X4]`` | + | ``compare_exchange_strong(`` | | | + | ``loc,&exp,val,sc,sc)`` | | ``CMP X6, X0`` | + | | | | + | | | ``CCMP X7, X1, 0, EQ`` | + | | | | + | | | ``CSEL X8, X2, X6, EQ`` | + | | | | + | | | ``CSEL X9, X3, X7, EQ`` | + | | | 
| + | | | ``STLXP W5, X8, X9, [X4]`` | + | | | | + | | | ``CBNZ W5, loop`` | + | | | | + | | | ``MOV X0, X6`` | + | | | | + | | | ``MOV X1, X7`` | + | +-------------+--------------------------------------+ | | ``LSE`` | ``CASPAL X0, X1, X2, X3, [X4]`` | +---------------------------------+-------------+--------------------------------------+ -We do not list other variants of ``fetch_`` since their mappings should be +We do not list other variants of ``fetch_`` since their Mappings should be the same (modulo implementations of that are not in scope of this -document). Precisely implementations that use loops should use the instructions +document). Specifically, implementations that use loops should use the instructions that load or store from memory with the relevant memory order, and the appropriate Assembly Sequence inside the loop. Exceptions, where Assembly Sequences exist, are stated (for instance ``fetch_or`` can be implemented using @@ -936,12 +1158,156 @@ Special Cases ------------- There are special cases in the Mappings presented above, these must be handled -in order to prevent unexpected outcomes of the compiled program. +in order to prevent unexpected outcomes of the compiled program. The special +cases are identified below. + +* Destination Register Should Not Be Zero Register for Read-Modify-Writes +* Const-Qualified 128-bit Atomic Loads Should Be Marked Mutable + +Destination Register Should Not Be Zero Register for Read-Modify-Writes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +A compiler is not permitted to rewrite the destination register to be the +zero register for atomic operations that make use of ``SWP`` and ``LD`` +Assembly instructions. These include, but are not limited to, the following: + +..
table:: + + +-----------------------------------------+--------------------------------------+ + | Atomic Operation | Assembly Sequence | + +=========================================+======================================+ + | ``exchange(loc,val,sc)`` | ``MOV W4, #val;`` | + | | ``SWP W4, W10, [X1]`` | + +-----------------------------------------+--------------------------------------+ + | ``fetch_add(loc,val,sc)`` | ``MOV W4, #val;`` | + | | ``LDADD W4, W10, [X1]`` | + +-----------------------------------------+--------------------------------------+ + +Where ``X1`` contains the address of ``loc``. + +We annotate the affected Mappings with ``*`` in section 4.2. + +Please refer to +`Appendix: Read-Modify-Write Destination Register Semantics`_ for an explanation of +why this rule is needed. + +Const-Qualified 128-bit Atomic Loads Should Be Marked Mutable +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Const-qualified data containing 128-bit atomic types should not be placed +in read-only memory (such as the ``.rodata`` section). + +Before LSE2, the only way to implement a single-copy atomic 128-bit load +is to use a Read-Modify-Write sequence that writes back the value it read. This +write is not visible to software if the memory is writeable, but it faults if the +data is in read-only memory. Compilers and runtimes should use the +LSE2/LRCPC3 sequence when available. + + +Declarative statement of Mappings compatibility +=============================================== + +To ensure that the above Mappings are ABI-compatible we tested the compilation of +Concurrent Programs, where each Atomic Operation is compiled to one of the +aforementioned Mappings. We test if there is a compiled program that exhibits +an outcome of execution according to the AArch64 Memory Model contained in §B2 +of the Arm Architecture Reference Manual [ARMARM_] that is not an outcome of +execution of the source program under the ISO C model. In this section we +define the process by which we test compatibility.
Please refer to +`Appendix: Mix Testing`_ for information on how ABI-compatibility is tested. + + +Definition of ABI-Compatibility for Atomic Operations +----------------------------------------------------- + +*A compiler that implements the above set of Mappings is ABI-Compatible with +respect to other compilers that implement the Mappings, if Mix Testing their +code generation finds no Compiler Bugs.* + +We impose some constraints on this definition: + +* This is not a correctness guarantee, but rather a statement backed up by + bounded testing. C/C++ Atomics ABI-compatibility is thus tested for the Mappings + above by generating C/C++ Concurrent Programs that permute combinations of + Atomic Operations on each Thread of Execution. We bound our test size between + 2 and 5 Threads of Execution, where each Thread has at least 1 Atomic + Operation or Synchronization Operation and at most 5 Atomic Operations or + Synchronization Operations. We do not make any statement about the + ABI-Compatibility of Concurrent Programs outside these bounds. +* We test Concurrent Programs with a fixed initial state, a fixed loop unroll factor + (equal to 1 loop unroll), and no function calls or recursion. +* The above Mappings are not exhaustive; we recommend that Arm's partners + submit requests for other Mappings to the ABI team using the `issue tracker page on GitHub `_. +* This document makes no statement about the ABI-Compatibility of optimised + Concurrent Programs, nor does it make any statement concerning the performance of + compiled programs under the above Mappings when executed on a given Arm-based + machine. +* This document makes no statement about the ABI-Compatibility of compilers + that implement Mappings other than what is stated in this document. + +Appendix: Mix Testing +===================== + +The status of this appendix is informative.
+ + +The Mix Testing Process +----------------------- + +We test for Compiler Bugs. A Compiler Bug is defined as an outcome of a +compiled program execution (under the AArch64 Memory Model contained in +§B2 of the Arm Architecture Reference Manual [ARMARM_]) that is not +an outcome of execution of the source Concurrent Program (under the +ISO C memory model). Consider the hypothetical example where a source +Concurrent Program finishes execution in one of three possible outcomes +(see [PAPER_] for this notation):: + + { thread_0:r0=0, thread_1:r0=1 } + { thread_0:r0=1, thread_1:r0=0 } + { thread_0:r0=1, thread_1:r0=1 } + +and the compiled program has the following possible outcomes according to the +AArch64 Memory Model contained in §B2 of the Arm Architecture Reference Manual +[ARMARM_]:: + + { thread_0:X3=0, thread_1:X3=0 } <--- Forbidden by source model, Compiler Bug! + { thread_0:X3=0, thread_1:X3=1 } + { thread_0:X3=1, thread_1:X3=0 } + { thread_0:X3=1, thread_1:X3=1 } + +By comparing ``X3`` and the local variable ``r0`` of the original Concurrent +Program in this example we see there is one additional outcome of executing the +compiled program that is not an outcome of executing the source program (under +the respective models). This suggests the Mappings in question are +incompatible, and a compiler that implements them exhibits a Compiler Bug. To +ensure compatibility we therefore test for the absence of such outcomes of the +compiled programs when mixing all combinations of the above Mappings. We define +the *Mix Testing* process as follows: + +#. Take an arbitrary Concurrent Program that, when executed under the C/C++ memory + model, produces the set of outcomes *S*. +#. Split out the individual Atomic Operations from the initial concurrent + program into individual source files. +#. Compile each individual source file containing an Atomic Operation + using each Compiler Profile under test that generates Assembly Sequences + under a given Mapping.
+#. Combine the Assembly Sequences from above into *multiple* possible Compiled + Programs. +#. Compute the outcomes of each compiled program under the AArch64 Memory Model + contained in §B2 of the Arm Architecture Reference Manual [ARMARM_]. Get a + *set* of compiled program outcomes *C*. +#. If any set of compiled program outcomes *c* in *C* exhibits a Compiler Bug + (that is, *c* is not a subset of *S*), then the given Mappings are not + interoperable. + + +Appendix: Read-Modify-Write Destination Register Semantics +========================================================== -Re-Ordering of Read-Modify-Write Effects and Acquire Fence -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +The following example shows why a compiler must not rewrite the destination +register of a Read-Modify-Write to be the zero register. -Consider the following Concurrent Program:: +Consider the following Concurrent Program: + +.. code-block:: c // Shared-Memory Locations _Atomic int* x; @@ -967,16 +1333,16 @@ Consider the following Concurrent Program:: Under ISO C, the above Concurrent Program finishes execution in one of three -possible outcomes:: +possible outcomes (see [PAPER_] for this notation):: { thread_1:r0=0; y=1; } { thread_1:r0=1; y=1; } { thread_1:r0=1; y=2; } In this case the value read by the exchange on ``thread_1`` is not used, and a -compiler is free to remove references to unused data. It is thus legal under -ISO C for a compiler to translate the program into the following Assembly -Sequences:: +compiler is free to remove references to unused data. However, it is not legal +according to this ABI for a compliant implementation to translate the program into +the following Assembly Sequences:: thread_0: MOV W9,#1 @@ -991,11 +1357,10 @@ Sequences:: LDR W3,[X4] where ``thread_0:X2`` contains the address of ``x``, ``thread_0:X4`` contains -the address of ``y``, and -``thread_1:X2`` contains the address of ``y``, ``thread_1:X4`` contains the -address of ``x``.
+the address of ``y``, and ``thread_1:X2`` contains the address of ``y``, +``thread_1:X4`` contains the address of ``x``. -Note: the ``exchange`` Atomic Operation is compiled to a ``SWP`` Assembly +The ``exchange`` Atomic Operation is compiled to a ``SWP`` Assembly Instruction, where its destination register is the zero register ``WZR``. The ``acquire`` fence on ``thread_1`` is compiled to the ``DMB ISHLD`` Assembly Instruction. @@ -1011,16 +1376,15 @@ Reference Manual [ARMARM_]:: { thread_1:r0=1; [y]=2; } By comparing ``W3`` and the local variable ``r0`` of the original Concurrent -Program we see there is one additional Outcome of executing the compiled +Program we see there is one additional outcome of executing the compiled program that is not an outcome of executing the Concurrent Program. This is due to the fact that according to the Arm Architecture Reference Manual [ARMARM_] *instructions where the destination register is WZR or XZR, are not regarded as doing a read for the purpose of a DMB LD barrier.* -ISO C permits a conforming implementation to delete unused data, but in this -case it introduces another Outcome of Execution. To fix this issue, a compiler -should not rewrite the destination register to be the zero register in this -case:: +Rewriting the destination register to ``WZR`` therefore introduces an additional +outcome of execution. To fix this issue, a compiler is not permitted to rewrite the +destination register to be the zero register in this case:: thread_0: MOV W9,#1 @@ -1044,118 +1408,6 @@ Reference Manual [ARMARM_]:: { thread_1:r0=1; [y]=2; } As such the unexpected outcome has disappeared. There are multiple Mappings -that exhibit this behaviour, those effected make use of ``SWP`` and ``LD`` -Assembly instructions. These include but are not limited to: - ..
table:: - - +-----------------------------------------+--------------------------------------+ - | Atomic Operation | Assembly Sequence | - +=========================================+======================================+ - | ``exchange(loc,val,sc)`` | ``MOV W4, #val;`` | - | | ``SWP W4, W10, [X1]`` | - +-----------------------------------------+--------------------------------------+ - | ``fetch_add(loc,val,sc)`` | ``MOV W4, #val;`` | - | | ``LDADD W4, W10, [X1]`` | - +-----------------------------------------+--------------------------------------+ - -Where ``X1`` contains the address of ``loc``. - -Const-Qualified 128-bit Atomic Loads -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Const-qualified data containing 128-bit atomic types should not be placed -in readonly memory (the ``.rodata`` section). - -Before LSE2, the only way to implement a single-copy 128-bit atomic load -is by using a Read-Modify-Write sequence. The write is not visible to -software if the memory is writeable. Compilers and runtimes should use the -LSE2/LRCPC3 sequence when available. - - -Declarative statement of Mappings compatibility -=============================================== - -To ensure that the above Mappings are ABI-compatible we test the compilation of -Concurrent Programs, where each Atomic Operation is compiled to one of the -aforementioned Mappings. We test if there is a compiled program that exhibits -an outcome of execution according to the AArch64 Memory Model contained in §B2 -of the Arm Architecture Reference Manual [ARMARM_] that is not an outcome of -execution of the source program under the ISO C model. In this section we -define the process by which we test compatibility. - -The Mix Testing Process ------------------------ - -We test for Compiler bugs, a compiler bug is defined as an Outcome of a -compiled program execution (under the AArch64 model) that is not an Outcome of -execution of the source Concurrent Program (under the ISO C model). 
Consider -the hypothetical example where a source Concurrent Program finishes execution -in one of three possible outcomes:: - - { thread_0:r0=0, thread_1:r0=1 } - { thread_0:r0=1, thread_1:r0=0 } - { thread_0:r0=1, thread_1:r0=1 } - -and one possible compiled program outcome has the following according to the -AArch64 Memory Model contained in §B2 of the Arm Architecture Reference Manual -[ARMARM_]:: - - { thread_0:X3=0, thread_1:X3=0 } <--- Forbidden by source model, compiler bug! - { thread_0:X3=0, thread_1:X3=1 } - { thread_0:X3=1, thread_1:X3=0 } - { thread_0:X3=1, thread_1:X3=1 } - -By comparing ``X3`` and the local variable ``r0`` of the original Concurrent -Program in this example we see there is one additional outcome of executing the -compiled program that is not an outcome of executing the source program (under -the respective models). This suggests the Mappings under question are -incompatible, and a compiler that implements them exhibits a compiler bug. To -ensure compatibility we therefore test for the absence of such Outcomes of the -compiled programs when mixing all combinations of the above Mappings. We define -the *Mix Testing* process as follows: - -#. Given a C/C++ Concurrent Program. -#. Split it into its representative Atomic Operations. -#. Compile each Atomic Operation separately using a Compiler Profile that - generates Assembly Sequences under a given Mapping. -#. Combine the Assembly Sequences into *multiple* possible Compiled Programs. -#. Compute the outcomes of executing the Source Concurrent Program under the - ISO C memory model. Get source program outcomes *S*. -#. Compute the outcomes of each compiled program under the AArch64 memory model - [ARMARM_]. Get a *set* of compiled program outcomes *C*. -#. If any *c* in *C* exhibits a compiler bug with respect to the outcomes *S* - then the given mappings are not interoperable. - -Using Mix Testing we now define ABI-Compatibility of Atomic Operations. 
- - -Definition of ABI-Compatibility for Atomic Operations ------------------------------------------------------ - -*A compiler that implements the above set of Mappings is ABI-Compatible with -respect to other compilers that implement the Mappings, if Mix Testing their -code generation finds no compiler bugs.* - -We impose some constraints on this definition: - -* This is not a correctness guarantee, but rather a statement backed up by - bounded testing. Atomics ABI-compatibility is thus tested for the Mappings - above by generating C/C++ Concurrent Programs that permute combinations of - Atomic Operations on each Thread of Execution. We bound our test size between - 2 and 5 Threads of Execution, where each Thread has at least 1 Atomic - Operation or Synchronization Operation and at most 5 Atomic Operations or - Synchronization Operations. We do not make any statement about the - ABI-Compatibility of Concurrent Programs outside these bounds. -* We test Concurrent Programs with a fixed initial state, loop unroll factor - (equal to 1 loop unroll), and function calls or recursion. -* The above Mappings are not exhaustive, We hope that Arm's partners will - submit requests for other Mappings to the ABI team using the issue tracker - page on GitHub. -* This document makes no statement about the ABI-Compatibility of optimised - Concurrent Programs, nor does a statement concerning the performance of - compiled programs under the above Mappings when executed on a given Arm-based - machine. -* This document makes no statement about the ABI-Compatibility of compilers - that implement Mappings other than what is stated in this document. +that exhibit this behaviour, those affected make use of ``SWP`` and ``LD`` +Assembly instructions. 
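The 128-bit rows of the Mappings table in this patch depend on two register-level details that are easy to miss when reading the Assembly Sequences. The C sketch below models both on plain 64-bit halves. It is illustrative only: the ``u128_pair`` type and all function names are hypothetical. It models the effect of one successful iteration on a single thread and deliberately ignores atomicity and memory ordering.

The first detail concerns ``fetch_and``, which is mapped to ``MVN`` followed by ``LDCLRP``. ``LDCLRP`` clears the bits that are set in its operand, so memory becomes ``old AND NOT operand``. Inverting ``val`` first therefore leaves ``old AND val``.

The second detail concerns the ``BASE`` ``compare_exchange_strong`` loop. ``CMP``/``CCMP`` produce ``EQ`` only if both halves match. The two ``CSEL`` instructions then select the desired value on a match and the value just read otherwise, so ``STXP`` writes back an unchanged value when the comparison fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a 128-bit value held as two 64-bit halves
   (lo/hi assumed to correspond to pairs such as X0/X1, X2/X3, X6/X7). */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_pair;

/* Single-thread effect of LDCLRP: memory becomes old AND NOT operand,
   and the old value is returned. No atomicity or ordering is modelled. */
static u128_pair ldclrp_model(u128_pair *mem, u128_pair operand) {
    u128_pair old = *mem;
    mem->lo = old.lo & ~operand.lo;
    mem->hi = old.hi & ~operand.hi;
    return old;
}

/* fetch_and(loc, val): MVN X0, X2 and MVN X1, X3 invert val first,
   so LDCLRP leaves old AND val in memory. */
static u128_pair fetch_and_model(u128_pair *mem, u128_pair val) {
    u128_pair inverted = { ~val.lo, ~val.hi };
    return ldclrp_model(mem, inverted);
}

/* One pass of the BASE compare_exchange_strong loop, assuming STXP succeeds. */
static u128_pair cas_iteration_model(u128_pair *mem, u128_pair expected,
                                     u128_pair desired) {
    u128_pair old = *mem;                                     /* LDXP X6, X7, [X4]      */
    bool eq = old.lo == expected.lo && old.hi == expected.hi; /* CMP, then CCMP ..., EQ */
    *mem = eq ? desired : old;                                /* two CSELs, then STXP   */
    return old;                                               /* MOV X0, X6; MOV X1, X7 */
}

/* Usage with arbitrary values. */
static void example_usage(void) {
    u128_pair m = { 0xF0F0, 0xFF00 };
    u128_pair old = fetch_and_model(&m, (u128_pair){ 0x00FF, 0x0F0F });
    assert(old.lo == 0xF0F0 && old.hi == 0xFF00);
    assert(m.lo == 0x00F0 && m.hi == 0x0F00);

    u128_pair c = { 1, 2 };
    u128_pair seen = cas_iteration_model(&c, (u128_pair){ 1, 2 }, (u128_pair){ 5, 6 });
    assert(seen.lo == 1 && seen.hi == 2 && c.lo == 5 && c.hi == 6);
    /* Comparison fails: the value read is written back unchanged. */
    seen = cas_iteration_model(&c, (u128_pair){ 1, 2 }, (u128_pair){ 7, 8 });
    assert(seen.lo == 5 && seen.hi == 6 && c.lo == 5 && c.hi == 6);
}
```

Writing back the old value on failure means the loop still exits only after a successful ``STXP``. Our understanding is that, before LSE2, a successful ``STXP`` is what makes the pair read by ``LDXP`` single-copy atomic. This is also why such a sequence cannot be used on data placed in read-only memory.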
From 265eb01f2153be0eda22995c9f87bba2a01d0b82 Mon Sep 17 00:00:00 2001 From: lukeg101 <6547672+lukeg101@users.noreply.github.com> Date: Tue, 9 Jul 2024 17:17:01 +0100 Subject: [PATCH 03/17] Added design document --- atomicsabi64/atomicsabi64.rst | 176 +----------------- design-documents/atomics-ABI.rst | 308 +++++++++++++++++++++++++++++++ 2 files changed, 316 insertions(+), 168 deletions(-) create mode 100644 design-documents/atomics-ABI.rst diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst index f0520c33..b92563d1 100644 --- a/atomicsabi64/atomicsabi64.rst +++ b/atomicsabi64/atomicsabi64.rst @@ -4,7 +4,7 @@ See LICENSE file for details .. |release| replace:: 2024Q1 -.. |date-of-issue| replace:: 5\ :sup:`th` April 2024 +.. |date-of-issue| replace:: 5\ :sup:`th` July 2024 .. |copyright-date| replace:: 2024 .. |footer| replace:: Copyright © |copyright-date|, Arm Limited and its affiliates. All rights reserved. @@ -80,6 +80,11 @@ This document came about in the process of Luke Geeson’s PhD on testing the compilation of concurrent C/C++ with assistance from Wilco Dijkstra from Arm's Compiler Teams. +This ABI arises from a paper to appear at OOPSLA 2024: +*Mix Testing: Specifying and Testing ABI Compatibility Of C/C++ Atomics Implementations* +by Luke Geeson, James Brotherston, Wilco Dijkstra, Alastair Donaldson, Lee Smith, +Tyler Sorensen, and John Wickerson. + Licence @@ -213,7 +218,7 @@ changes to the content of the document for that release. +---------+------------------------------+-------------------------------------------------------------------+ | Issue | Date | Change | +=========+==============================+===================================================================+ - | 00alp0 | 5\ :sup:`th` April 2024. | Alpha release. | + | 00alp0 | 5\ :sup:`th` July 2024. | Beta release. 
| +---------+------------------------------+-------------------------------------------------------------------+ @@ -1187,10 +1192,6 @@ Where ``X1`` contains the address of ``loc``. We annotate Mappings affected with ``*`` in section 4.2. -Please refer to -`Appendix: Read-Modify-Write Destination Register Semantics`_ for information on why -this example must be documented. - Const-Qualified 128-bit Atomic Loads Should Be Marked Mutable ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -1212,9 +1213,7 @@ aforementioned Mappings. We test if there is a compiled program that exhibits an outcome of execution according to the AArch64 Memory Model contained in §B2 of the Arm Architecture Reference Manual [ARMARM_] that is not an outcome of execution of the source program under the ISO C model. In this section we -define the process by which we test compatibility. Please refer to -`Appendix: Mix Testing`_ for information on how ABI-compatibility is tested. - +define the process by which we test compatibility. Definition of ABI-Compatibility for Atomic Operations ----------------------------------------------------- @@ -1250,164 +1249,5 @@ Appendix: Mix Testing The status of this appendix is informative. -The Mix Testing Process ------------------------ -We test for Compiler bugs, a Compiler Bug is defined as an outcome of a -compiled program execution (under the AArch64 Memory Model contained in -§B2 of the Arm Architecture Reference Manual [ARMARM_]) that is not -an outcome of execution of the source Concurrent Program (under the -ISO C memory model). 
Consider the hypothetical example where a source -Concurrent Program finishes execution in one of three possible outcomes -(a reference for this notation is found here [PAPER_]):: - - { thread_0:r0=0, thread_1:r0=1 } - { thread_0:r0=1, thread_1:r0=0 } - { thread_0:r0=1, thread_1:r0=1 } - -and one possible compiled program outcome has the following according to the -AArch64 Memory Model contained in §B2 of the Arm Architecture Reference Manual -[ARMARM_]:: - - { thread_0:X3=0, thread_1:X3=0 } <--- Forbidden by source model, Compiler Bug! - { thread_0:X3=0, thread_1:X3=1 } - { thread_0:X3=1, thread_1:X3=0 } - { thread_0:X3=1, thread_1:X3=1 } - -By comparing ``X3`` and the local variable ``r0`` of the original Concurrent -Program in this example we see there is one additional outcome of executing the -compiled program that is not an outcome of executing the source program (under -the respective models). This suggests the Mappings under question are -incompatible, and a compiler that implements them exhibits a Compiler Bug. To -ensure compatibility we therefore test for the absence of such outcomes of the -compiled programs when mixing all combinations of the above Mappings. We define -the *Mix Testing* process as follows: - -#. Take an arbitrary Concurrent Program, when executed on the C/C++ memory - model will produce outcomes *S*. -#. Split out the individual Atomic Operations from the initial concurrent - program into individual source files. -#. Compile each individual source file containing an Atomic Operation - using each Compiler Profile under test that generates Assembly Sequences - under a given Mapping. -#. Combine the Assembly Sequences from above into *multiple* possible Compiled - Programs. -#. Compute the outcomes of each compiled program under the AArch64 Memory Model - contained in §B2 of the Arm Architecture Reference Manual [ARMARM_]. Get a - *set* of compiled program outcomes *C*. -#. 
If any compiled program set of outcomes *c* in *C* exhibits a Compiler Bug - (Check that *c* is a subset of *S*) with then the given Mappings are not - interoperable. - - -Appendix: Read-Modify-Write Destination Register Semantics -========================================================== - -We elaborate on why in the following example. - -Consider the following Concurrent Program: - -code-block:: - - // Shared-Memory Locations - _Atomic int* x; - _Atomic int* y; - - // Memory Order Parameter - #define relaxed memory_order_relaxed - #define release memory_order_release - #define acquire memory_order_acquire - - // Threads of Execution - void thread_0 () { - atomic_store_explicit(x,1,relaxed); - atomic_thread_fence(release); - atomic_store_explicit(y,1,relaxed); - } - - void thread_1 () { - atomic_exchange_explicit(y,2,release); - atomic_thread_fence(acquire); - int r0 = atomic_load_explicit(x,relaxed); - } - - -Under ISO C, the above Concurrent Program finishes execution in one of three -possible outcomes (a reference for this notation is found here [PAPER_]):: - - { thread_1:r0=0; y=1; } - { thread_1:r0=1; y=1; } - { thread_1:r0=1; y=2; } - -In this case the value read by the exchange on ``thread_1`` is not used, and a -compiler is free to remove references to unused data. It is not legal according -to this ABI for a compliant implementation piler to translate the program into -the following Assembly Sequences:: - - thread_0: - MOV W9,#1 - STR W9,[X2] - DMB ISH - STR W3,[X4] - - thread_1: - MOV W9,#2 - SWP W9, WZR, [X2] - DMB ISHLD - LDR W3,[X4] - -where ``thread_0:X2`` contains the address of ``x``, ``thread_0:X4`` contains -the address of ``y``, and ``thread_1:X2`` contains the address of ``y``, -``thread_1:X4`` contains the address of ``x``. - -The ``exchange`` Atomic Operation is compiled to a ``SWP`` Assembly -Instruction, where its destination register is the zero register ``WZR``. 
The -``acquire`` fence on ``thread_1`` is compiled to the ``DMB ISHLD`` Assembly -Instruction. - -Executing the compiled program on an Arm-based machine from a fixed initial -state (where ``x`` and ``y`` are ``0``) produces one of the following outcomes, -according to the AArch64 Memory Model contained in §B2 of the Arm Architecture -Reference Manual [ARMARM_]:: - - { thread_1:r0=0; [y]=1; } - { thread_1:r0=0; [y]=2; } <-- Forbidden by source model, a bug! - { thread_1:r0=1; [y]=1; } - { thread_1:r0=1; [y]=2; } - -By comparing ``W3`` and the local variable ``r0`` of the original Concurrent -Program we see there is one additional outcome of executing the compiled -program that is not an outcome of executing the Concurrent Program. This is due -to the fact that according to the Arm Architecture Reference Manual [ARMARM_] -*instructions where the destination register is WZR or XZR, are not regarded as -doing a read for the purpose of a DMB LD barrier.* - -In this case the compiler introduces another outcome of Execution. To fix this -issue, a compiler is not permitted to rewrite the destination register to be the -zero register in this case:: - - thread_0: - MOV W9,#1 - STR W9,[X2] - DMB ISH - STR W3,[X4] - - thread_1: - MOV W9,#2 - SWP W9, W10, [X2] - DMB ISHLD - LDR W3,[X4] - -Executing the compiled program on an Arm-based machine from a fixed initial -state (where ``x`` and ``y`` are ``0``) produces one of the following outcomes, -according to the AArch64 Memory Model contained in §B2 of the Arm Architecture -Reference Manual [ARMARM_]:: - - { thread_1:r0=0; [y]=1; } - { thread_1:r0=1; [y]=1; } - { thread_1:r0=1; [y]=2; } - -As such the unexpected outcome has disappeared. There are multiple Mappings -that exhibit this behaviour, those affected make use of ``SWP`` and ``LD`` -Assembly instructions. 
diff --git a/design-documents/atomics-ABI.rst b/design-documents/atomics-ABI.rst new file mode 100644 index 00000000..0b4f890c --- /dev/null +++ b/design-documents/atomics-ABI.rst @@ -0,0 +1,308 @@ +.. + Copyright (c) 2023, Arm Limited and its affiliates. All rights reserved. + CC-BY-SA-4.0 AND Apache-Patent-License + See LICENSE file for details + +.. _ARMARM: https://developer.arm.com/documentation/ddi0487/latest +.. _PAPER: https://doi.org/10.1109/CGO57630.2024.10444836 + +Rationale Document for C11 Atomics ABI +*************************************** + +Preamble +======== + +Background +---------- + +This document describes the rationale behind the ABI choices made for mapping +from C11 atomic operations to Arm AArch64 assembly sequences. + +From the perspective of the Arm ABI, we have some decisions to +make: + +- We need to choose a baseline ABI (a set of mappings) that is compatible with all versions of the Armv8 architecture. +- The mappings should cover atomic accesses of various sign, size, and type accessible through C11 atomic operations using compiler profiles. + +The main trade-offs we have identified or have been made aware of are: + +- Performance of different mappings versus compatibility with all architecture versions. +- Whether certain compiler operations lead to unexpected behaviours. + +These decisions are motivated by the use cases expanded upon below: + +- The need for a baseline ABI +- Knowing when an implementation departs from that baseline +- Backwards compatibility of atomics as new mappings are added +- Compatibility between compilers and runtimes +- The need to constrain optimisations on specific atomic operations +- Documenting the interoperable mappings +- Providing a basis upon which ABI compatibility can be tested + +References +---------- + +This document refers to, or is referred to by, the following documents. + +..
table:: + + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + | Ref | External reference or URL | Title | + +=============+==============================================================+=============================================================================+ + | ARMARM_ | DDI 0487 | Arm Architecture Reference Manual Armv8 for Armv8-A architecture profile | + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + | PAPER_ | CGO paper | Compiler Testing with Relaxed Memory Models | + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + + + +Note: At the time of writing, C23 has not been released; as such, ISO C17 is considered +the latest published standard. + +Use-cases known of so far +------------------------- + + +A Baseline: Describing current implementations +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The ABI we provide is a baseline specification that compilers should or do implement. +The ABI provides a basis for compatibility across all versions of the Armv8 architecture. Most +of the mappings in the ABI are already implemented in LLVM and GCC, and this ABI ratifies +a decade of established practice and provides alternatives where the current practice +is incompatible. + + +Sub-ABIs and ABI-islands: Departing from the baseline (or 'mainland') +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +We do *not* require that compilers implement this ABI. Implementers can specify their own +ABI, whether it is a subset of the allowed mappings of the baseline ABI (a sub-ABI), or +uses different mappings altogether (an ABI-island).
Currently, sub-ABIs and ABI-islands implicitly +arise with each new architecture release, and implementers quickly find new candidate mappings +that are performant on their machines. Such mappings are proposed or added to mainstream +compilers. However due to the lack of a baseline specification or widespread +concurrency expertise, testing such mappings has been a challenge and concurrency bugs have been +unintentionally introduced into compilers when new mappings are added. + +We need a baseline ABI in order to determine if a given sub-ABI respects or departs +from the baseline. Adding command-line options is a logical consequence of defining such an ABI, +and makes it possible to track ABI compatibility of concurrent programs at compile or link-time, +rather than runtime. It is the responsibility of the sub-ABI maintainer to ensure code built +under their ABI does not mix with code built under the baseline. But a baseline must exist, +for sub-ABI compatibility to be decided in the first place. + +A baseline provides the means to describe or contain ABI-islands. Where a compiler implementation +departs from the baseline completely (an ABI-island), it would be the responsibility of the +maintainer of that implementation to ensure their programs are not mixed with programs built for +baseline ABI compatibility, or provide adequate warnings at compile time. + +Further, numerous different parties have asked the ABI team whether +the same atomics mapping is correct. Writing down the known cases helps engineers +answer these queries without the concurrency expertise required to come up with +current compatible mappings. A future section of the ABI could document common +queries received by the ABI team, in order to assist implementers and engineers +with such issues. 
+
+Backwards Compatibility and New Architecture Features
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Put another way, a baseline ABI helps decide whether new mappings are compatible
+with compiler implementations targeting older versions of the Armv8 architecture.
+Certain instructions (such as the Load/Store-Pair instructions [ARMARM_]) have different
+single-copy atomicity guarantees depending on the architecture version (for example, a
+16-byte aligned ``LDP`` or ``STP`` is single-copy atomic only when FEAT_LSE2 is implemented).
+A baseline determines which assembly sequences can be composed correctly (at least as far as testing can determine).
+
+
+Compatibility Between Compilers and Runtimes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The above issues also apply when ensuring that object files compiled with different compilers can be mixed.
+For instance, LLVM and GCC code should be interoperable. At the time of writing, we identified a number of
+places where this does not hold, both when targeting the same architecture version and when mixing
+different (compatible) architecture versions. Further, the problem is not limited to statically compiled code. We found
+one instance where proposed mappings implemented in a JIT compiler would not be interoperable with
+the statically compiled code the runtime links against. Even if a JIT compiles under its own set of mappings
+and is not subject to an ABI, it may still depend on other libraries or components that do have an ABI.
+
+
+Constrain optimisations
+~~~~~~~~~~~~~~~~~~~~~~~
+
+There have been several instances where optimisations were applied incorrectly to atomic
+code generation, or where proposed optimisations would induce unexpected
+concurrent program behaviour. This has happened frequently enough that we need to
+collect these cases together and explain why they must not occur.
+
+For example, consider the following Concurrent Program::
+
+    // Shared-Memory Locations
+    _Atomic int* x;
+    _Atomic int* y;
+
+    // Memory Order Parameter
+    #define relaxed memory_order_relaxed
+    #define release memory_order_release
+    #define acquire memory_order_acquire
+
+    // Threads of Execution
+    void thread_0 () {
+        atomic_store_explicit(x,1,relaxed);
+        atomic_thread_fence(release);
+        atomic_store_explicit(y,1,relaxed);
+    }
+
+    void thread_1 () {
+        atomic_exchange_explicit(y,2,release);
+        atomic_thread_fence(acquire);
+        int r0 = atomic_load_explicit(x,relaxed);
+    }
+
+
+Under ISO C, the above Concurrent Program finishes execution with one of three
+possible outcomes (see [PAPER_] for a description of this notation)::
+
+    { thread_1:r0=0; y=1; }
+    { thread_1:r0=1; y=1; }
+    { thread_1:r0=1; y=2; }
+
+In this case the value read by the exchange on ``thread_1`` is not used, and a
+compiler is generally free to remove references to unused data. However, it is not
+legal according to this ABI for a compliant implementation to translate the program
+into the following Assembly Sequences::
+
+    thread_0:
+        MOV W9,#1
+        STR W9,[X2]
+        DMB ISH
+        STR W3,[X4]
+
+    thread_1:
+        MOV W9,#2
+        SWP W9, WZR, [X2]
+        DMB ISHLD
+        LDR W3,[X4]
+
+where ``thread_0:X2`` contains the address of ``x``, ``thread_0:X4`` contains
+the address of ``y``, ``thread_1:X2`` contains the address of ``y``, and
+``thread_1:X4`` contains the address of ``x``.
+
+The ``exchange`` Atomic Operation is compiled to a ``SWP`` Assembly
+Instruction whose destination register is the zero register ``WZR``. The
+``acquire`` fence on ``thread_1`` is compiled to the ``DMB ISHLD`` Assembly
+Instruction.
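To look for the forbidden outcome empirically with a particular compiler and machine, you can run the Concurrent Program many times and count how often ``{ thread_1:r0=0; y=2; }`` appears. The following is a minimal sketch under stated assumptions and is not part of the ABI. It uses POSIX threads (build with ``-pthread``), and it replaces the pointer-typed locations ``x`` and ``y`` with global atomics. The name ``count_forbidden`` is a hypothetical helper. Because each iteration starts fresh threads, tight races are rare, so a count of zero is only weak evidence. Dedicated litmus-testing tools, or the model-based comparison of outcomes used in this document, give much stronger guarantees.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared-memory locations: globals here, not the litmus test's pointers. */
static _Atomic int x;
static _Atomic int y;
/* Value of x seen by thread_1; read by the caller only after joining. */
static int r0;

static void *thread_0(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    return NULL;
}

static void *thread_1(void *arg) {
    (void)arg;
    /* The old value returned by the exchange is ignored, as in the source. */
    (void)atomic_exchange_explicit(&y, 2, memory_order_release);
    atomic_thread_fence(memory_order_acquire);
    r0 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Hypothetical helper: runs the test `iterations` times and returns how often
   the outcome { thread_1:r0=0; y=2; } (forbidden by ISO C) was observed,
   or -1 if a thread could not be created. */
static long count_forbidden(long iterations) {
    long forbidden = 0;
    for (long i = 0; i < iterations; i++) {
        pthread_t t0, t1;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        r0 = -1;
        if (pthread_create(&t1, NULL, thread_1, NULL) != 0)
            return -1;
        if (pthread_create(&t0, NULL, thread_0, NULL) != 0) {
            pthread_join(t1, NULL);
            return -1;
        }
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        if (r0 == 0 && atomic_load(&y) == 2)
            forbidden++;
    }
    return forbidden;
}
```

With a correct compiler, ``count_forbidden(100000)`` is expected to return ``0`` on any machine. On AArch64, a non-zero result would point at a miscompilation such as the ``SWP`` into ``WZR`` described in this section.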
+
+Executing the compiled program on an Arm-based machine from a fixed initial
+state (where ``x`` and ``y`` are ``0``) produces one of the following outcomes,
+according to the AArch64 Memory Model contained in §B2 of the Arm Architecture
+Reference Manual [ARMARM_]::
+
+    { thread_1:r0=0; [y]=1; }
+    { thread_1:r0=0; [y]=2; } <-- Forbidden by source model, a bug!
+    { thread_1:r0=1; [y]=1; }
+    { thread_1:r0=1; [y]=2; }
+
+By comparing ``W3`` and the local variable ``r0`` of the original Concurrent
+Program, we see that there is one additional outcome of executing the compiled
+program that is not an outcome of executing the Concurrent Program. This is because,
+according to the Arm Architecture Reference Manual [ARMARM_],
+*instructions where the destination register is WZR or XZR, are not regarded as
+doing a read for the purpose of a DMB LD barrier.*
+
+In this case the compiler introduces an additional outcome of execution. To fix this
+issue, a compiler is not permitted to rewrite the destination register of the exchange
+to the zero register; the result must be written to a register such as ``W10``::
+
+    thread_0:
+        MOV W9,#1
+        STR W9,[X2]
+        DMB ISH
+        STR W3,[X4]
+
+    thread_1:
+        MOV W9,#2
+        SWP W9, W10, [X2]
+        DMB ISHLD
+        LDR W3,[X4]
+
+Executing the compiled program on an Arm-based machine from a fixed initial
+state (where ``x`` and ``y`` are ``0``) produces one of the following outcomes,
+according to the AArch64 Memory Model contained in §B2 of the Arm Architecture
+Reference Manual [ARMARM_]::
+
+    { thread_1:r0=0; [y]=1; }
+    { thread_1:r0=1; [y]=1; }
+    { thread_1:r0=1; [y]=2; }
+
+As such, the unexpected outcome disappears. Multiple Mappings are affected by
+this behaviour: those that use ``SWP`` or ``LD<OP>`` (for example ``LDADD``)
+Assembly Instructions.
+
+Documentation
+~~~~~~~~~~~~~
+
+Collective knowledge of atomics ABIs is scattered across numerous online discussions.
+These discussions are neither authoritative nor persistent. Some discussions
+are now inaccessible and others are out of date. This is problematic given the
This is problematic given the +inherent complexity of relaxed memory concurrency, the difficulty of finding bugs, +and the possibility of user error. We believe an ABI is necessary to document +this corner of code generation. + + +The Mix Testing Process +----------------------- + +ABI compatibility must be testable. Concurrency is not trivial, and the ABI +presents a simplification of part of the problem that is understandable by +engineers. We provide novel, yet simple, techniques and tools for +testing ABI compatibility. These techniques reduce the difficulty of checking +compatibility from a problem of understanding concurrent executions, to the +familiar testing domain of comparing program outcomes of tests. This document +does not preclude other means of testing compatibility however. + +We test for Compiler bugs, a Compiler Bug is defined as an outcome of a +compiled program execution (under the AArch64 Memory Model contained in +§B2 of the Arm Architecture Reference Manual [ARMARM_]) that is not +an outcome of execution of the source Concurrent Program (under the +ISO C memory model). Consider the hypothetical example where a source +Concurrent Program finishes execution in one of three possible outcomes +(a reference for this notation is found here [PAPER_]):: + + { thread_0:r0=0, thread_1:r0=1 } + { thread_0:r0=1, thread_1:r0=0 } + { thread_0:r0=1, thread_1:r0=1 } + +and one possible compiled program outcome has the following according to the +AArch64 Memory Model contained in §B2 of the Arm Architecture Reference Manual +[ARMARM_]:: + + { thread_0:X3=0, thread_1:X3=0 } <--- Forbidden by source model, Compiler Bug! 
+ { thread_0:X3=0, thread_1:X3=1 } + { thread_0:X3=1, thread_1:X3=0 } + { thread_0:X3=1, thread_1:X3=1 } + +By comparing ``X3`` and the local variable ``r0`` of the original Concurrent +Program in this example we see there is one additional outcome of executing the +compiled program that is not an outcome of executing the source program (under +the respective models). This suggests the Mappings under question are +incompatible, and a compiler that implements them exhibits a Compiler Bug. To +ensure compatibility we therefore test for the absence of such outcomes of the +compiled programs when mixing all combinations of the above Mappings. We define +the *Mix Testing* process as follows: + +#. Take an arbitrary Concurrent Program, when executed on the C/C++ memory + model will produce outcomes *S*. +#. Split out the individual Atomic Operations from the initial concurrent + program into individual source files. +#. Compile each individual source file containing an Atomic Operation + using each Compiler Profile under test that generates Assembly Sequences + under a given Mapping. +#. Combine the Assembly Sequences from above into *multiple* possible Compiled + Programs. +#. Compute the outcomes of each compiled program under the AArch64 Memory Model + contained in §B2 of the Arm Architecture Reference Manual [ARMARM_]. Get a + *set* of compiled program outcomes *C*. +#. If any compiled program set of outcomes *c* in *C* exhibits a Compiler Bug + (Check that *c* is a subset of *S*) with then the given Mappings are not + interoperable. 
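The subset check in the final step is easy to mechanise once the outcomes have been computed by a model-based tool. The following C sketch assumes each outcome is recorded as the final value of ``r0`` (or ``X3``) on each thread of the hypothetical example above. The type and function names (``outcome``, ``outcome_set``, ``is_subset``, ``mappings_interoperable``) are illustrative only and are not part of any existing tool. The sketch does not compute outcomes itself; that step requires a memory-model simulator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One outcome of the two-thread example: the final value of r0 (source
   program) or X3 (compiled program) on each thread. */
typedef struct {
    int thread_0_r0;
    int thread_1_r0;
} outcome;

/* A set of outcomes, stored as an array without duplicates. */
typedef struct {
    const outcome *items;
    size_t count;
} outcome_set;

static bool contains(outcome_set s, outcome o) {
    for (size_t i = 0; i < s.count; i++) {
        if (s.items[i].thread_0_r0 == o.thread_0_r0 &&
            s.items[i].thread_1_r0 == o.thread_1_r0)
            return true;
    }
    return false;
}

/* c is a subset of s iff every outcome in c also occurs in s; any outcome
   of c that is missing from s is a Compiler Bug. */
static bool is_subset(outcome_set c, outcome_set s) {
    for (size_t i = 0; i < c.count; i++) {
        if (!contains(s, c.items[i]))
            return false;
    }
    return true;
}

/* Final Mix Testing step: the Mappings are interoperable only if the outcome
   set of every Compiled Program in C is a subset of S. */
static bool mappings_interoperable(const outcome_set *compiled, size_t n,
                                   outcome_set source) {
    for (size_t i = 0; i < n; i++) {
        if (!is_subset(compiled[i], source))
            return false;
    }
    return true;
}

/* The hypothetical example above: S has three outcomes, while the compiled
   program additionally allows { thread_0:X3=0, thread_1:X3=0 }. */
static const outcome source_items[] = {{0, 1}, {1, 0}, {1, 1}};
static const outcome compiled_items[] = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
static const outcome_set example_source = {
    source_items, sizeof source_items / sizeof source_items[0]};
static const outcome_set example_compiled = {
    compiled_items, sizeof compiled_items / sizeof compiled_items[0]};
```

With this data, ``is_subset(example_compiled, example_source)`` is ``false`` because ``{0, 0}`` does not occur in *S*. Therefore ``mappings_interoperable(&example_compiled, 1, example_source)`` reports the Mappings as not interoperable.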
+ From 7b9b325e5ef0c596b3d7e62a90e545af7d60fba8 Mon Sep 17 00:00:00 2001 From: lukeg101 <6547672+lukeg101@users.noreply.github.com> Date: Wed, 17 Jul 2024 15:05:35 +0100 Subject: [PATCH 04/17] Remove mix testing from compat defn --- atomicsabi64/atomicsabi64.rst | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst index b92563d1..30aefece 100644 --- a/atomicsabi64/atomicsabi64.rst +++ b/atomicsabi64/atomicsabi64.rst @@ -1218,9 +1218,8 @@ define the process by which we test compatibility. Definition of ABI-Compatibility for Atomic Operations ----------------------------------------------------- -*A compiler that implements the above set of Mappings is ABI-Compatible with -respect to other compilers that implement the Mappings, if Mix Testing their -code generation finds no Compiler Bugs.* +*A compiler that implements the above set of Mappings and special cases is ABI-Compatible with +respect to other compilers that implement the Mappings and special cases.* We impose some constraints on this definition: From 6a140f83fd2aa5d4da1a98930e63e4b61689513a Mon Sep 17 00:00:00 2001 From: Wilco Dijkstra Date: Mon, 19 Aug 2024 11:36:17 +0100 Subject: [PATCH 05/17] Update tables to improve formatting. 
--- atomicsabi64/atomicsabi64.rst | 1308 ++++++++++++++++----------------- 1 file changed, 639 insertions(+), 669 deletions(-) diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst index 30aefece..92ffc2ad 100644 --- a/atomicsabi64/atomicsabi64.rst +++ b/atomicsabi64/atomicsabi64.rst @@ -385,13 +385,13 @@ To reduce repetition, we use the following notational conventions +=========================================+======================================+ | ``memory_order_relaxed`` | ``relaxed`` | +-----------------------------------------+--------------------------------------+ - | ``memory_order_acquire`` | ``acq`` | + | ``memory_order_acquire`` | ``acquire`` | +-----------------------------------------+--------------------------------------+ - | ``memory_order_release`` | ``rel`` | + | ``memory_order_release`` | ``release`` | +-----------------------------------------+--------------------------------------+ | ``memory_order_acq_rel`` | ``acq_rel`` | +-----------------------------------------+--------------------------------------+ - | ``memory_order_seq_cst`` | ``sc`` | + | ``memory_order_seq_cst`` | ``seq_cst`` | +-----------------------------------------+--------------------------------------+ In what follows ``loc`` refers to the location, ``val`` refers to a value @@ -416,9 +416,7 @@ options. | | ARCH2 | ``option B`` | +--------------------------------------------+-----------+--------------------------------------+ -Where ARCH is for example BASE (armv8), LSE, LSE2, LSE128, RCPC, or LRCPC3. -ARCH describes the required extension, with BASE meaning Armv8-A with no -extensions and LSE is shorthand for FEAT_LSE (likewise for the other extensions). +Where ARCH is either the base architecture (Armv8-A) or an extension like FEAT_LSE. Lastly, all operations are in a shorthand form: @@ -453,178 +451,202 @@ Mappings for 32-bit types In what follows, register ``X1`` contains the location ``loc`` and ``W2`` contains ``val``. 
The result is returned in ``W0``. - +-------------------------------------------------------------------------------------------+ - | Note | - +===========================================================================================+ - | ``*`` Using ``WZR`` or ``XZR`` for the destination register is invalid (Section 4.7). | - +-------------------------------------------------------------------------------------------+ - .. table:: - +------------------------------------------+--------------------------------------+ - | Atomic Operation | Assembly Sequence | - +==========================================+======================================+ - | ``store(loc,val,relaxed)`` | ``STR W2, [X1]`` | - +------------------------------------------+--------------------------------------+ - | ``store(loc,val,rel)`` | ``STLR W2, [X1]`` | - | ``store(loc,val,sc)`` | | - +------------------------------------------+--------------------------------------+ - | ``load(loc,relaxed)`` | ``LDR W2, [X1]`` | - +-------------------------------+----------+--------------------------------------+ - | ``load(loc,acq)`` | ``BASE`` | ``LDAR W2, [X1]`` | - + +----------+--------------------------------------+ - | | ``RCPC`` | ``LDAPR W2, [X1]`` | - +-------------------------------+----------+--------------------------------------+ - | ``load(loc,sc)`` | ``LDAR W2, [X1]`` | - +------------------------------------------+--------------------------------------+ - | ``fence(relaxed)`` | ``NOP`` | - +------------------------------------------+--------------------------------------+ - | ``fence(acq)`` | ``DMB ISHLD`` | - +------------------------------------------+--------------------------------------+ - | ``fence(rel)`` | ``DMB ISH`` | - | ``fence(acq_rel)`` | | - | ``fence(sc)`` | | - +-------------------------------+----------+--------------------------------------+ - | ``exchange(loc,val,relaxed)`` | ``BASE`` | ``loop:`` | - | | | ``LDXR W0, [X1]`` | - | | | | - | | | ``STXR W3, W2, [X1]`` | - | 
| | | - | | | ``CBNZ W3, loop`` | - | +----------+--------------------------------------+ - | | ``LSE`` | ``SWP W2, W0, [X1]`` * | - +-------------------------------+----------+--------------------------------------+ - | ``exchange(loc,val,acq)`` | ``BASE`` | ``loop:`` | - | | | ``LDAXR W0, [X1]`` | - | | | | - | | | ``STXR W3, W2, [X1]`` | - | | | | - | | | ``CBNZ W3, loop`` | - | +----------+--------------------------------------+ - | | ``LSE`` | ``SWPA W2, W0, [X1]`` * | - +-------------------------------+----------+--------------------------------------+ - | ``exchange(loc,val,rel)`` | ``BASE`` | ``loop:`` | - | | | ``LDXR W0, [X1]`` | - | | | | - | | | ``STLXR W3, W2, [X1]`` | - | | | | - | | | ``CBNZ W3, loop`` | - | +----------+--------------------------------------+ - | | ``LSE`` | ``SWPL W2, W0, [X1]`` * | - +-------------------------------+----------+--------------------------------------+ - | ``exchange(loc,val,acq_rel)`` | ``BASE`` | ``loop:`` | - | ``exchange(loc,val,sc)`` | | ``LDAXR W0, [X1]`` | - | | | | - | | | ``STLXR W3, W2, [X1]`` | - | | | | - | | | ``CBNZ W3, loop`` | - | +----------+--------------------------------------+ - | | ``LSE`` | ``SWPAL W2, W0, [X1]`` * | - +-------------------------------+----------+--------------------------------------+ - | ``fetch_add(loc,val,relaxed)``| ``BASE`` | ``loop:`` | - | | | ``LDXR W0, [X1]`` | - | | | | - | | | ``ADD W2, W2, W0`` | - | | | | - | | | ``STXR W3, W2, [X1]`` | - | | | | - | | | ``CBNZ W3, loop`` | - + +----------+--------------------------------------+ - | | ``LSE`` | ``LDADD W2, W0, [X1]`` * | - +-------------------------------+----------+--------------------------------------+ - | ``fetch_add(loc,val,acq)`` | ``BASE`` | ``loop:`` | - | | | ``LDAXR W0, [X1]`` | - | | | | - | | | ``ADD W2, W2, W0`` | - | | | | - | | | ``STXR W3, W2, [X1]`` | - | | | | - | | | ``CBNZ W3, loop`` | - | +----------+--------------------------------------+ - | | ``LSE`` | ``LDADDA W2, W0, [X1]`` * | - 
+-------------------------------+----------+--------------------------------------+ - | ``fetch_add(loc,val,rel)`` | ``BASE`` | ``loop:`` | - | | | ``LDXR W0, [X1]`` | - | | | | - | | | ``ADD W2, W2, W0`` | - | | | | - | | | ``STLXR W3, W2, [X1]`` | - | | | | - | | | ``CBNZ W3, loop`` | - | +----------+--------------------------------------+ - | | ``LSE`` | ``LDADDL W2, W0, [X1]`` * | - +-------------------------------+----------+--------------------------------------+ - | ``fetch_add(loc,val,acq_rel)``| ``BASE`` | ``loop:`` | - | ``fetch_add(loc,val,sc)`` | | ``LDXAR W0, [X1]`` | - | | | | - | | | ``ADD W2, W2, W0`` | - | | | | - | | | ``STLXR W3, W2, [X1]`` | - | | | | - | | | ``CBNZ W3, loop`` | - | +----------+--------------------------------------+ - | | ``LSE`` | ``LDADDAL W2, W0, [X1]`` * | - +-------------------------------+----------+--------------------------------------+ - | ``compare_exchange_strong(`` | ``BASE`` | ``loop:`` | - | ``loc,&exp,val,relaxed,`` | | ``LDXR W0, [X1]`` | - | ``relaxed)`` | | | - | | | ``CMP W0, W4`` | - | | | | - | | | ``B.NE fail`` | - | | | | - | | | ``STXR W3, W2, [X1]`` | - | | | | - | | | ``CBNZ W3, loop`` | - | | | | - | | | ``fail:`` | - | +----------+--------------------------------------+ - | | ``LSE`` | ``CAS W0, W2, [X1]`` * | - +-------------------------------+----------+--------------------------------------+ - | ``compare_exchange_strong(`` | ``BASE`` | ``loop:`` | - | ``loc,&exp,val,acq,acq)`` | | ``LDAXR W0, [X1]`` | - | | | | - | | | ``CMP W0, W4`` | - | | | | - | | | ``B.NE fail`` | - | | | | - | | | ``STXR W3, W2, [X1]`` | - | | | | - | | | ``CBNZ W3, loop`` | - | | | | - | | | ``fail:`` | - | +----------+--------------------------------------+ - | | ``LSE`` | ``CASA W0, W2, [X1]`` * | - +-------------------------------+----------+--------------------------------------+ - | ``compare_exchange_strong(`` | ``BASE`` | ``loop:`` | - | ``loc,&exp,val,rel,rel)`` | | ``LDXR W0, [X1]`` | - | | | | - | | | ``CMP W0, 
W4`` | - | | | | - | | | ``B.NE fail`` | - | | | | - | | | ``STLXR W3, W2, [X1]`` | - | | | | - | | | ``CBNZ W3, loop`` | - | | | | - | | | ``fail:`` | - | +----------+--------------------------------------+ - | | ``LSE`` | ``CASL W0, W2, [X1]`` * | - +-------------------------------+----------+--------------------------------------+ - | ``compare_exchange_strong(`` | ``BASE`` | ``loop:`` | - | ``loc,&exp,val,acq_rel,acq)``| | ``LDAXR W0, [X1]`` | - | ``compare_exchange_strong(`` | | | - | ``loc,&exp,val,sc,sc)`` | | ``CMP W0, W4`` | - | | | | - | | | ``B.NE fail`` | - | | | | - | | | ``STLXR W3, W2, [X1]`` | - | | | | - | | | ``CBNZ W3, loop`` | - | | | | - | | | ``fail:`` | - | +----------+--------------------------------------+ - | | ``LSE`` | ``CASAL W0, W2, [X1]`` * | - +-------------------------------+----------+--------------------------------------+ + +-----------------------------------------------------+--------------------------------------+ + | Atomic Operation | Assembly Sequence | + +=====================================================+======================================+ + | ``store(loc,val,relaxed)`` | .. code-block:: none | + | | | + | | STR W2, [X1] | + +-----------------------------------------------------+--------------------------------------+ + | ``store(loc,val,release)`` | .. code-block:: none | + | | | + | ``store(loc,val,seq_cst)`` | STLR W2, [X1] | + +-----------------------------------------------------+--------------------------------------+ + | ``load(loc,relaxed)`` | .. code-block:: none | + | | | + | | LDR W2, [X1] | + +-------------------------------------+---------------+--------------------------------------+ + | ``load(loc,acquire)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | LDAR W2, [X1] | + + +---------------+--------------------------------------+ + | | ``FEAT_RCPC`` | .. 
code-block:: none | + | | | | + | | | LDAPR W2, [X1] | + +-------------------------------------+---------------+--------------------------------------+ + | ``load(loc,seq_cst)`` | .. code-block:: none | + | | | + | | LDAR W2, [X1] | + +-----------------------------------------------------+--------------------------------------+ + | ``fence(relaxed)`` | .. code-block:: none | + | | | + | | NOP | + +-----------------------------------------------------+--------------------------------------+ + | ``fence(acquire)`` | .. code-block:: none | + | | | + | | DMB ISHLD | + +-----------------------------------------------------+--------------------------------------+ + | ``fence(release)`` | .. code-block:: none | + | | | + | ``fence(acq_rel)`` | DMB ISH | + | | | + | ``fence(seq_cst)`` | | + +-------------------------------------+---------------+--------------------------------------+ + | ``exchange(loc,val,relaxed)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXR W0, [X1] | + | | | STXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | SWP W2, W0, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``exchange(loc,val,acquire)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDAXR W0, [X1] | + | | | STXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | SWPA W2, W0, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``exchange(loc,val,release)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXR W0, [X1] | + | | | STLXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. 
code-block:: none | + | | | | + | | | SWPL W2, W0, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``exchange(loc,val,acq_rel)`` | ``Armv8-A`` | .. code-block:: none | + | ``exchange(loc,val,seq_cst)`` | | | + | | | loop: | + | | | LDAXR W0, [X1] | + | | | STLXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | SWPAL W2, W0, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_add(loc,val,relaxed)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXR W0, [X1] | + | | | ADD W4, W2, W0 | + | | | STXR W3, W4, [X1] | + | | | CBNZ W3, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDADD W2, W0, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_add(loc,val,acquire)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDAXR W0, [X1] | + | | | ADD W4, W2, W0 | + | | | STXR W3, W4, [X1] | + | | | CBNZ W3, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDADDA W2, W0, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_add(loc,val,release)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXR W0, [X1] | + | | | ADD W4, W2, W0 | + | | | STLXR W3, W4, [X1] | + | | | CBNZ W3, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDADDL W2, W0, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_add(loc,val,acq_rel)`` | ``Armv8-A`` | .. code-block:: none | + | ``fetch_add(loc,val,seq_cst)`` | | | + | | | loop: | + | | | LDAXR W0, [X1] | + | | | ADD W4, W2, W0 | + | | | STLXR W3, W4, [X1] | + | | | CBNZ W3, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDADDAL W2, W0, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none | + | ``loc,&exp,val,relaxed,relaxed)`` | | | + | | | loop: | + | | | LDXR W0, [X1] | + | | | CMP W0, W4 | + | | | B.NE fail | + | | | STXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | | | fail: | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | CAS W0, W2, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none | + | ``loc,&exp,val,acquire,acquire)`` | | | + | | | loop: | + | | | LDAXR W0, [X1] | + | | | CMP W0, W4 | + | | | B.NE fail | + | | | STXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | | | fail: | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | CASA W0, W2, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none | + | ``loc,&exp,val,release,release)`` | | | + | | | loop: | + | | | LDXR W0, [X1] | + | | | CMP W0, W4 | + | | | B.NE fail | + | | | STLXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | | | fail: | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | CASL W0, W2, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``Armv8-A`` | .. 
code-block:: none | + | ``loc,&exp,val,acq_rel,acquire)`` | | | + | ``compare_exchange_strong(`` | | loop: | + | ``loc,&exp,val,seq_cst,seq_cst)`` | | LDAXR W0, [X1] | + | | | CMP W0, W4 | + | | | B.NE fail | + | | | STLXR W3, W2, [X1] | + | | | CBNZ W3, loop | + | | | fail: | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | CASAL W0, W2, [X1] * | + +-------------------------------------+---------------+--------------------------------------+ + | Note | + +--------------------------------------------------------------------------------------------+ + | ``*`` Using ``WZR`` or ``XZR`` for the destination register is invalid (Section 4.7). | + +--------------------------------------------------------------------------------------------+ + Mappings for 8-bit types ------------------------ @@ -642,7 +664,7 @@ The Mappings for 16-bit types are the same as 32-bit types except they use the Mappings for 64-bit types ------------------------- -The Msappings for 64-bit types are the same as 32-bit types except the registers +The Mappings for 64-bit types are the same as 32-bit types except the registers used are X-registers. Mappings for 128-bit types @@ -657,498 +679,446 @@ In what follows, register ``X4`` contains the location ``loc``, ``X2`` and .. 
table:: - +-----------------------------------------------+--------------------------------------+ - | Atomic Operation | Assembly Sequence | - +=================================+=============+======================================+ - | ``store(loc,val,relaxed)`` | ``BASE`` | ``loop:`` | - | | | ``LDXP XZR, X1, [X4]`` | - | | | | - | | | ``STXP W5, X2, X3, [X4]`` | - | | | | - | | | ``CBNZ W5, loop`` | - | +-------------+--------------------------------------+ - | | ``LSE`` | ``LDP X0, X1, [X4]`` | - | | | | - | | | ``loop:`` | - | | | ``MOV X6, X0`` | - | | | | - | | | ``MOV X7, X1`` | - | | | | - | | | ``CASP X0, X1, X2, X3, [X4]`` | - | | | | - | | | ``CMP X0, X6`` | - | | | | - | | | ``CCMP X1, X7, 0, EQ`` | - | | | | - | | | ``B.NE loop`` | - | +-------------+--------------------------------------+ - | | ``LSE2`` | ``STP x2, X3, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``store(loc,val,rel)`` | ``BASE`` | ``loop:`` | - | | | ``LDXP XZR, X1, [X4]`` | - | | | ``STLXP W5, X2, X3, [X4]`` | - | | | ``CBNZ W5, loop`` | - | +-------------+--------------------------------------+ - | | ``LSE`` | ``LDP X0, X1, [X4]`` | - | | | | - | | | ``loop:`` | - | | | ``MOV X6, X0`` | - | | | | - | | | ``MOV X7, X1`` | - | | | | - | | | ``CASPL X0, X1, X2, X3, [X4]`` | - | | | | - | | | ``CMP X0, X6`` | - | | | | - | | | ``CCMP X1, X7, 0, EQ`` | - | | | | - | | | ``B.NE loop`` | - | +-------------+--------------------------------------+ - | | ``LSE2`` | ``DMB ISH`` | - | | | | - | | | ``STP X2, X3, [X4]`` | - | +-------------+--------------------------------------+ - | | ``LRCPC3`` | ``STILP X2, X3, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``store(loc,val,sc)`` | ``BASE`` | ``loop:`` | - | | | ``LDXP XZR, X1, [X4]`` | - | | | | - | | | ``STLXP W5, X2, X3, [X4]`` | - | | | | - | | | ``CBNZ W5, loop`` | - | +-------------+--------------------------------------+ 
- | | ``LSE`` | ``LDP X0, X1, [X4]`` | - | | | | - | | | ``loop:`` | - | | | ``MOV X6, X0`` | - | | | | - | | | ``MOV X7, X1`` | - | | | | - | | | ``CASPL X0, X1, X2, X3, [X4]`` | - | | | | - | | | ``CMP X0, X6`` | - | | | | - | | | ``CCMP X1, X7, 0, EQ`` | - | | | | - | | | ``B.NE loop`` | - | +-------------+--------------------------------------+ - | | ``LSE2`` | ``DMB ISH`` | - | | | | - | | | ``STP X2, X3, [X4]`` | - | | | | - | | | ``DMB ISH`` | - | +-------------+--------------------------------------+ - | | ``LRCPC3`` | ``STILP X2, X3, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``load(loc,relaxed)`` | ``BASE`` | ``loop:`` | - | | | ``LDXP X0, X1, [X4]`` | - | | | | - | | | ``STXP W5, X0, X1, [X4]`` | - | | | | - | | | ``CBNZ W5, loop`` | - | +-------------+--------------------------------------+ - | | ``LSE`` | ``CASP X0, X1, X0, X1, [X4]`` | - | +-------------+--------------------------------------+ - | | ``LSE2`` | ``LDP X0, X1, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``load(loc,acq)`` | ``BASE`` | ``loop:`` | - | | | ``LDAXP X0, X1, [X4]`` | - | | | | - | | | ``STXP W5, X0, X1, [X4]`` | - | | | | - | | | ``CBNZ W5, loop`` | - | +-------------+--------------------------------------+ - | | ``LSE`` | ``CASPA X0, X1, X0, X1, [X4]`` | - | +-------------+--------------------------------------+ - | | ``LSE2`` | ``LDP X0, X1, [X4]`` | - | | | | - | | | ``DMB ISHLD`` | - | +-------------+--------------------------------------+ - | | ``LRCPC3`` | ``LDIAPP X0, X1, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``load(loc,sc)`` | ``BASE`` | ``loop:`` | - | | | ``LDAXP X0, X1, [X4]`` | - | | | | - | | | ``STXP W5, X0, X1, [X4]`` | - | | | | - | | | ``CBNZ W5, loop`` | - | +-------------+--------------------------------------+ - | | ``LSE`` | ``CASPA X0, X1, X0, X1, [X4]`` | - | 
+-------------+--------------------------------------+ - | | ``LSE2`` | ``LDAR X5, [X4]`` | - | | | | - | | | ``LDP X0, X1, [X4]`` | - | | | | - | | | ``DMB ISHLD`` | - | +-------------+--------------------------------------+ - | | ``LRCPC3`` | ``LDAR X5, [X4]`` | - | | | | - | | | ``LDIAPP X0, X1, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``exchange(loc,val,relaxed)`` | ``BASE`` | ``loop:`` | - | | | ``LDXP X0, X1, [X4]`` | - | | | | - | | | ``STXP W5, X2, X3, [X4]`` | - | | | | - | | | ``CBNZ W5, loop`` | - | +-------------+--------------------------------------+ - | | ``LSE`` | ``LDP X0, X1, [X4]`` | - | | | | - | | | ``loop:`` | - | | | ``MOV X6, X0`` | - | | | | - | | | ``MOV X7, X1`` | - | | | | - | | | ``CASP X0, X1, X2, X3, [X4]`` | - | | | | - | | | ``CMP X0, X6`` | - | | | | - | | | ``CCMP X1, X7, 0, EQ`` | - | | | | - | | | ``B.NE loop`` | - | +-------------+--------------------------------------+ - | | ``LSE128`` | ``MOV X0, X2`` | - | | | | - | | | ``MOV X1, X3`` | - | | | | - | | | ``SWPP X0, X1, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``exchange(loc,val,acq)`` | ``BASE`` | ``loop:`` | - | | | ``LDAXP X0, X1, [X4]`` | - | | | | - | | | ``STXP W5, X2, X3, [X4]`` | - | | | | - | | | ``CBNZ W5, loop`` | - | +-------------+--------------------------------------+ - | | ``LSE`` | ``LDP X0, X1, [X4]`` | - | | | | - | | | ``loop:`` | - | | | ``MOV X6, X0`` | - | | | | - | | | ``MOV X7, X1`` | - | | | | - | | | ``CASPA X0, X1, X2, X3, [X4]`` | - | | | | - | | | ``CMP X0, X6`` | - | | | | - | | | ``CCMP X1, X7, 0, EQ`` | - | | | | - | | | ``B.NE loop`` | - | +-------------+--------------------------------------+ - | | ``LSE128`` | ``MOV X0, X2`` | - | | | | - | | | ``MOV X1, X3`` | - | | | | - | | | ``SWPPA X0, X1, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | 
``exchange(loc,val,rel)`` | ``BASE`` | ``loop:`` | - | | | ``LDXP X0, X1, [X4]`` | - | | | | - | | | ``STLXP W5, X2, X3, [X4]`` | - | | | | - | | | ``CBNZ W5, loop`` | - | +-------------+--------------------------------------+ - | | ``LSE`` | ``LDP X0, X1, [X4]`` | - | | | | - | | | ``loop:`` | - | | | ``MOV X6, X0`` | - | | | | - | | | ``MOV X7, X1`` | - | | | | - | | | ``CASPL X0, X1, X2, X3, [X4]`` | - | | | | - | | | ``CMP X0, X6`` | - | | | | - | | | ``CCMP X1, X7, 0, EQ`` | - | | | | - | | | ``B.NE loop`` | - | +-------------+--------------------------------------+ - | | ``LSE128`` | ``MOV X0, X2`` | - | | | | - | | | ``MOV X1, X3`` | - | | | | - | | | ``SWPPL X0, X1, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``exchange(loc,val,acq_rel)`` | ``BASE`` | ``loop:`` | - | ``exchange(loc,val,sc)`` | | ``LDAXP X0, X1, [X4]`` | - | | | | - | | | ``STLXP W5, X2, X3, [X4]`` | - | | | | - | | | ``CBNZ W5, loop`` | - | +-------------+--------------------------------------+ - | | ``LSE`` | ``LDP X0, X1, [X4]`` | - | | | | - | | | ``loop:`` | - | | | ``MOV X6, X0`` | - | | | | - | | | ``MOV X7, X1`` | - | | | | - | | | ``CASPAL X0, X1, X2, X3, [X4]`` | - | | | | - | | | ``CMP X0, X6`` | - | | | | - | | | ``CCMP X1, X7, 0, EQ`` | - | | | | - | | | ``B.NE loop`` | - | +-------------+--------------------------------------+ - | | ``LSE128`` | ``MOV X0, X2`` | - | | | | - | | | ``MOV X1, X3`` | - | | | | - | | | ``SWPPAL X0, X1, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``fetch_add(loc,val,relaxed)`` | ``BASE`` | ``loop:`` | - | | | ``LDXP X0, X1, [X4]`` | - | | | | - | | | ``ADDS X0, X0, X2`` | - | | | | - | | | ``ADC X1, X1, X3`` | - | | | | - | | | ``STXP W5, X2, X3, [X4]`` | - | | | | - | | | ``CBNZ W5, loop`` | - | +-------------+--------------------------------------+ - | | ``LSE`` | ``LDP X0, X1, [X4]`` | - | | | | - | | | ``loop:`` | - | | | ``MOV 
X6, X0`` | - | | | | - | | | ``MOV X7, X1`` | - | | | | - | | | ``ADDS X8, X0, X2`` | - | | | | - | | | ``ADC X9, X1, X3`` | - | | | | - | | | ``CASP X0, X1, X8, X9, [X4]`` | - | | | | - | | | ``CMP X0, X6`` | - | | | | - | | | ``CCMP X1, X7, 0, EQ`` | - | | | | - | | | ``B.NE loop`` | - +---------------------------------+-------------+--------------------------------------+ - | ``fetch_add(loc,val,acq)`` | ``BASE`` | ``loop:`` | - | | | ``LDAXP X0, X1, [X4]`` | - | | | | - | | | ``ADDS X0, X0, X2`` | - | | | | - | | | ``ADC X1, X1, X3`` | - | | | | - | | | ``STXP W5, X2, X3, [X4]`` | - | | | | - | | | ``CBNZ W5, loop`` | - | +-------------+--------------------------------------+ - | | ``LSE`` | ``LDP X0, X1, [X4]`` | - | | | | - | | | ``loop:`` | - | | | ``MOV X6, X0`` | - | | | | - | | | ``MOV X7, X1`` | - | | | | - | | | ``ADDS X8, X0, X2`` | - | | | | - | | | ``ADC X9, X1, X3`` | - | | | | - | | | ``CASPA X0, X1, X8, X9, [X4]`` | - | | | | - | | | ``CMP X0, X6`` | - | | | | - | | | ``CCMP X1, X7, 0, EQ`` | - | | | | - | | | ``B.NE loop`` | - +---------------------------------+-------------+--------------------------------------+ - | ``fetch_add(loc,val,rel)`` | ``BASE`` | ``loop:`` | - | | | ``LDXP X0, X1, [X4]`` | - | | | | - | | | ``ADDS X0, X0, X2`` | - | | | | - | | | ``ADC X1, X1, X3`` | - | | | | - | | | ``STLXP W5, X2, X3, [X4]`` | - | | | | - | | | ``CBNZ W5, loop`` | - | +-------------+--------------------------------------+ - | | ``LSE`` | ``LDP X0, X1, [X4]`` | - | | | | - | | | ``loop:`` | - | | | ``MOV X6, X0`` | - | | | | - | | | ``MOV X7, X1`` | - | | | | - | | | ``ADDS X8, X0, X2`` | - | | | | - | | | ``ADC X9, X1, X3`` | - | | | | - | | | ``CASPL X0, X1, X8, X9, [X4]`` | - | | | | - | | | ``CMP X0, X6`` | - | | | | - | | | ``CCMP X1, X7, 0, EQ`` | - | | | | - | | | ``B.NE loop`` | - +---------------------------------+-------------+--------------------------------------+ - | ``fetch_add(loc,val,acq_rel)`` | ``BASE`` | ``loop:`` | - | 
``fetch_add(loc,val,sc)`` | | ``LDAXP X0, X1, [X4]`` | - | | | | - | | | ``ADDS X0, X0, X2`` | - | | | | - | | | ``ADC X1, X1, X3`` | - | | | | - | | | ``STXLP W5, X2, X3, [X4]`` | - | | | | - | | | ``CBNZ W5, loop`` | - | +-------------+--------------------------------------+ - | | ``LSE`` | ``LDP X0, X1, [X4]`` | - | | | | - | | | ``loop:`` | - | | | ``MOV X6, X0`` | - | | | | - | | | ``MOV X7, X1`` | - | | | | - | | | ``ADDS X8, X0, X2`` | - | | | | - | | | ``ADC X9, X1, X3`` | - | | | | - | | | ``CASPAL X0, X1, X8, X9, [X4]`` | - | | | | - | | | ``CMP X0, X6`` | - | | | | - | | | ``CCMP X1, X7, 0, EQ`` | - | | | | - | | | ``B.NE loop`` | - +---------------------------------+-------------+--------------------------------------+ - | ``fetch_or(loc,val,relaxed)`` | ``LSE128`` | ``MOV X0, X2`` | - | | | | - | | | ``MOV X1, X3`` | - | | | | - | | | ``LDSETP X0, X1, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``fetch_or(loc,val,acq)`` | ``LSE128`` | ``MOV X0, X2`` | - | | | | - | | | ``MOV X1, X3`` | - | | | | - | | | ``LDSETPA X0, X1, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``fetch_or(loc,val,rel)`` | ``LSE128`` | ``MOV X0, X2`` | - | | | | - | | | ``MOV X1, X3`` | - | | | | - | | | ``LDSETPL X0, X1, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``fetch_or(loc,val,acq_rel)`` | ``LSE128`` | ``MOV X0, X2`` | - | ``fetch_or(loc,val,sc)`` | | | - | | | ``MOV X1, X3`` | - | | | | - | | | ``LDSETPAL X0, X1, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``fetch_and(loc,val,relaxed)`` | ``LSE128`` | ``MVN X0, X2`` | - | | | | - | | | ``MVN X1, X3`` | - | | | | - | | | ``LDCLRP X0, X1, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``fetch_and(loc,val,acq)`` | ``LSE128`` | ``MVN X0, 
X2`` | - | | | | - | | | ``MVN X1, X3`` | - | | | | - | | | ``LDCLRPA X0, X1, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``fetch_and(loc,val,rel)`` | ``LSE128`` | ``MVN X0, X2`` | - | | | | - | | | ``MVN X1, X3`` | - | | | | - | | | ``LDCLRPL X0, X1, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``fetch_and(loc,val,acq_rel)`` | ``LSE128`` | ``MVN X0, X2`` | - | ``fetch_and(loc,val,sc)`` | | | - | | | ``MVN X1, X3`` | - | | | | - | | | ``LDCLRPAL X0, X1, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``compare_exchange_strong(`` | ``BASE`` | ``loop:`` | - | ``loc,&exp,val,relaxed,`` | | ``LDXP X6, x7, [X4]`` | - | ``relaxed)`` | | | - | | | ``CMP X6, X0`` | - | | | | - | | | ``CCMP X7, X1, 0, EQ`` | - | | | | - | | | ``CSEL X8, X2, X6, EQ`` | - | | | | - | | | ``CSEL X9, X3, X7, EQ`` | - | | | | - | | | ``STXP W5, X8, X9, [X4]`` | - | | | | - | | | ``CBNZ W5, loop`` | - | | | | - | | | ``MOV X0, X6`` | - | | | | - | | | ``MOV X1, X7`` | - | +-------------+--------------------------------------+ - | | ``LSE`` | ``CASP X0, X1, X2, X3, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``compare_exchange_strong(`` | ``BASE`` | ``loop:`` | - | ``loc,&exp,val,acq, acq)`` | | ``LDAXP X6, x7, [X4]`` | - | | | | - | | | ``CMP X6, X0`` | - | | | | - | | | ``CCMP X7, X1, 0, EQ`` | - | | | | - | | | ``CSEL X8, X2, X6, EQ`` | - | | | | - | | | ``CSEL X9, X3, X7, EQ`` | - | | | | - | | | ``STXP W5, X8, X9, [X4]`` | - | | | | - | | | ``CBNZ W5, loop`` | - | | | | - | | | ``MOV X0, X6`` | - | | | | - | | | ``MOV X1, X7`` | - | +-------------+--------------------------------------+ - | | ``LSE`` | ``CASPA X0, X1, X2, X3, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``compare_exchange_strong(`` | 
``BASE`` | ``loop:`` | - | ``loc,&exp,val,rel,rel)`` | | ``LDXP X6, x7, [X4]`` | - | | | | - | | | ``CMP X6, X0`` | - | | | | - | | | ``CCMP X7, X1, 0, EQ`` | - | | | | - | | | ``CSEL X8, X2, X6, EQ`` | - | | | | - | | | ``CSEL X9, X3, X7, EQ`` | - | | | | - | | | ``STLXP W5, X8, X9, [X4]`` | - | | | | - | | | ``CBNZ W5, loop`` | - | | | | - | | | ``MOV X0, X6`` | - | | | | - | | | ``MOV X1, X7`` | - | +-------------+--------------------------------------+ - | | ``LSE`` | ``CASPL X0, X1, X2, X3, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ - | ``compare_exchange_strong(`` | ``BASE`` | ``loop:`` | - | ``loc,&exp,val,acq_rel,acq)`` | | ``LDAXP X6, x7, [X4]`` | - | ``compare_exchange_strong(`` | | | - | ``loc,&exp,val,sc,sc)`` | | ``CMP X6, X0`` | - | | | | - | | | ``CCMP X7, X1, 0, EQ`` | - | | | | - | | | ``CSEL X8, X2, X6, EQ`` | - | | | | - | | | ``CSEL X9, X3, X7, EQ`` | - | | | | - | | | ``STLXP W5, X8, X9, [X4]`` | - | | | | - | | | ``CBNZ W5, loop`` | - | | | | - | | | ``MOV X0, X6`` | - | | | | - | | | ``MOV X1, X7`` | - | +-------------+--------------------------------------+ - | | ``LSE`` | ``CASPAL X0, X1, X2, X3, [X4]`` | - +---------------------------------+-------------+--------------------------------------+ + +-----------------------------------------------------+--------------------------------------+ + | Atomic Operation | Assembly Sequence | + +=====================================+===============+======================================+ + | ``store(loc,val,relaxed)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXP XZR, X1, [X4] | + | | | STXP W5, X2, X3, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. 
code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | CASP X0, X1, X2, X3, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE2`` | .. code-block:: none | + | | | | + | | | STP X2, X3, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``store(loc,val,release)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXP XZR, X1, [X4] | + | | | STLXP W5, X2, X3, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | CASPL X0, X1, X2, X3, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + + +---------------+--------------------------------------+ + | | ``FEAT_LSE2`` | .. code-block:: none | + | | | | + | | | DMB ISH | + | | | STP X2, X3, [X4] | + | +---------------+--------------------------------------+ + | |``FEAT_LRCPC3``| .. code-block:: none | + | | | | + | | | STILP X2, X3, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``store(loc,val,seq_cst)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDAXP XZR, X1, [X4] | + | | | STLXP W5, X2, X3, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | CASPAL X0, X1, X2, X3, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + + +---------------+--------------------------------------+ + | | ``FEAT_LSE2`` | .. 
code-block:: none | + | | | | + | | | DMB ISH | + | | | STP X2, X3, [X4] | + | | | DMB ISH | + | +---------------+--------------------------------------+ + | |``FEAT_LRCPC3``| .. code-block:: none | + | | | | + | | | STILP X2, X3, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``load(loc,relaxed)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXP X0, X1, [X4] | + | | | STXP W5, X0, X1, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | CASP X0, X1, X0, X1, [X4] | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE2`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``load(loc,acquire)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDAXP X0, X1, [X4] | + | | | STXP W5, X0, X1, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | CASPA X0, X1, X0, X1, [X4] | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE2`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | DMB ISHLD | + | +---------------+--------------------------------------+ + | |``FEAT_LRCPC3``| .. code-block:: none | + | | | | + | | | LDIAPP X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``load(loc,seq_cst)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDAXP X0, X1, [X4] | + | | | STXP W5, X0, X1, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | ..
code-block:: none | + | | | | + | | | CASPA X0, X1, X0, X1, [X4] | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE2`` | .. code-block:: none | + | | | | + | | | LDAR X5, [X4] | + | | | LDP X0, X1, [X4] | + | | | DMB ISHLD | + | +---------------+--------------------------------------+ + | |``FEAT_LRCPC3``| .. code-block:: none | + | | | | + | | | LDAR X5, [X4] | + | | | LDIAPP X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``exchange(loc,val,relaxed)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXP X0, X1, [X4] | + | | | STXP W5, X2, X3, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | CASP X0, X1, X2, X3, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + | +---------------+--------------------------------------+ + | |``FEAT_LSE128``| .. code-block:: none | + | | | | + | | | MOV X0, X2 | + | | | MOV X1, X3 | + | | | SWPP X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``exchange(loc,val,acquire)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDAXP X0, X1, [X4] | + | | | STXP W5, X2, X3, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | CASPA X0, X1, X2, X3, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + | +---------------+--------------------------------------+ + | |``FEAT_LSE128``| .. 
code-block:: none | + | | | | + | | | MOV X0, X2 | + | | | MOV X1, X3 | + | | | SWPPA X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``exchange(loc,val,release)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXP X0, X1, [X4] | + | | | STLXP W5, X2, X3, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | CASPL X0, X1, X2, X3, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + | +---------------+--------------------------------------+ + | |``FEAT_LSE128``| .. code-block:: none | + | | | | + | | | MOV X0, X2 | + | | | MOV X1, X3 | + | | | SWPPL X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``exchange(loc,val,acq_rel)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | ``exchange(loc,val,seq_cst)`` | | loop: | + | | | LDAXP X0, X1, [X4] | + | | | STLXP W5, X2, X3, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | CASPAL X0, X1, X2, X3, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + | +---------------+--------------------------------------+ + | |``FEAT_LSE128``| .. code-block:: none | + | | | | + | | | MOV X0, X2 | + | | | MOV X1, X3 | + | | | SWPPAL X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_add(loc,val,relaxed)`` | ``Armv8-A`` | .. 
code-block:: none | + | | | | + | | | loop: | + | | | LDXP X0, X1, [X4] | + | | | ADDS X0, X0, X2 | + | | | ADC X1, X1, X3 | + | | | STXP W5, X0, X1, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | ADDS X8, X0, X2 | + | | | ADC X9, X1, X3 | + | | | CASP X0, X1, X8, X9, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_add(loc,val,acquire)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDAXP X0, X1, [X4] | + | | | ADDS X0, X0, X2 | + | | | ADC X1, X1, X3 | + | | | STXP W5, X0, X1, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | ADDS X8, X0, X2 | + | | | ADC X9, X1, X3 | + | | | CASPA X0, X1, X8, X9, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_add(loc,val,release)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | | | loop: | + | | | LDXP X0, X1, [X4] | + | | | ADDS X0, X0, X2 | + | | | ADC X1, X1, X3 | + | | | STLXP W5, X0, X1, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. 
code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | ADDS X8, X0, X2 | + | | | ADC X9, X1, X3 | + | | | CASPL X0, X1, X8, X9, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_add(loc,val,acq_rel)`` | ``Armv8-A`` | .. code-block:: none | + | | | | + | ``fetch_add(loc,val,seq_cst)`` | | loop: | + | | | LDAXP X0, X1, [X4] | + | | | ADDS X0, X0, X2 | + | | | ADC X1, X1, X3 | + | | | STLXP W5, X0, X1, [X4] | + | | | CBNZ W5, loop | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | LDP X0, X1, [X4] | + | | | loop: | + | | | MOV X6, X0 | + | | | MOV X7, X1 | + | | | ADDS X8, X0, X2 | + | | | ADC X9, X1, X3 | + | | | CASPAL X0, X1, X8, X9, [X4] | + | | | CMP X0, X6 | + | | | CCMP X1, X7, 0, EQ | + | | | B.NE loop | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_or(loc,val,relaxed)`` |``FEAT_LSE128``| .. code-block:: none | + | | | | + | | | MOV X0, X2 | + | | | MOV X1, X3 | + | | | LDSETP X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_or(loc,val,acquire)`` |``FEAT_LSE128``| .. code-block:: none | + | | | | + | | | MOV X0, X2 | + | | | MOV X1, X3 | + | | | LDSETPA X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_or(loc,val,release)`` |``FEAT_LSE128``| .. code-block:: none | + | | | | + | | | MOV X0, X2 | + | | | MOV X1, X3 | + | | | LDSETPL X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_or(loc,val,acq_rel)`` |``FEAT_LSE128``| .. 
code-block:: none | + | | | | + | ``fetch_or(loc,val,seq_cst)`` | | MOV X0, X2 | + | | | MOV X1, X3 | + | | | LDSETPAL X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_and(loc,val,relaxed)`` |``FEAT_LSE128``| .. code-block:: none | + | | | | + | | | MVN X0, X2 | + | | | MVN X1, X3 | + | | | LDCLRP X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_and(loc,val,acquire)`` |``FEAT_LSE128``| .. code-block:: none | + | | | | + | | | MVN X0, X2 | + | | | MVN X1, X3 | + | | | LDCLRPA X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_and(loc,val,release)`` |``FEAT_LSE128``| .. code-block:: none | + | | | | + | | | MVN X0, X2 | + | | | MVN X1, X3 | + | | | LDCLRPL X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``fetch_and(loc,val,acq_rel)`` |``FEAT_LSE128``| .. code-block:: none | + | | | | + | ``fetch_and(loc,val,seq_cst)`` | | MVN X0, X2 | + | | | MVN X1, X3 | + | | | LDCLRPAL X0, X1, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none | + | ``loc,exp,val,relaxed,relaxed)`` | | | + | | | loop: | + | | | LDXP X6, X7, [X4] | + | | | CMP X6, X0 | + | | | CCMP X7, X1, 0, EQ | + | | | CSEL X8, X2, X6, EQ | + | | | CSEL X9, X3, X7, EQ | + | | | STXP W5, X8, X9, [X4] | + | | | CBNZ W5, loop | + | | | MOV X0, X6 | + | | | MOV X1, X7 | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | CASP X0, X1, X2, X3, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``Armv8-A`` | ..
code-block:: none | + | ``loc,exp,val,acquire,acquire)`` | | | + | | | loop: | + | ``compare_exchange_strong(`` | | LDAXP X6, X7, [X4] | + | ``loc,exp,val,acquire,relaxed)`` | | CMP X6, X0 | + | | | CCMP X7, X1, 0, EQ | + | | | CSEL X8, X2, X6, EQ | + | | | CSEL X9, X3, X7, EQ | + | | | STXP W5, X8, X9, [X4] | + | | | CBNZ W5, loop | + | | | MOV X0, X6 | + | | | MOV X1, X7 | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | CASPA X0, X1, X2, X3, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none | + | ``loc,exp,val,release,relaxed)`` | | | + | | | loop: | + | | | LDXP X6, X7, [X4] | + | | | CMP X6, X0 | + | | | CCMP X7, X1, 0, EQ | + | | | CSEL X8, X2, X6, EQ | + | | | CSEL X9, X3, X7, EQ | + | | | STLXP W5, X8, X9, [X4] | + | | | CBNZ W5, loop | + | | | MOV X0, X6 | + | | | MOV X1, X7 | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. code-block:: none | + | | | | + | | | CASPL X0, X1, X2, X3, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none | + | ``loc,exp,val,acq_rel,acquire)`` | | | + | | | loop: | + | ``compare_exchange_strong(`` | | LDAXP X6, X7, [X4] | + | ``loc,exp,val,seq_cst,acquire)`` | | CMP X6, X0 | + | | | CCMP X7, X1, 0, EQ | + | | | CSEL X8, X2, X6, EQ | + | | | CSEL X9, X3, X7, EQ | + | | | STLXP W5, X8, X9, [X4] | + | | | CBNZ W5, loop | + | | | MOV X0, X6 | + | | | MOV X1, X7 | + | +---------------+--------------------------------------+ + | | ``FEAT_LSE`` | .. 
code-block:: none | + | | | | + | | | CASPAL X0, X1, X2, X3, [X4] | + +-------------------------------------+---------------+--------------------------------------+ + + + We do not list other variants of ``fetch_`` since their Mappings should be From 72faa506e77dff140e14df3d44690e7e1b5aeafb Mon Sep 17 00:00:00 2001 From: Wilco Dijkstra Date: Mon, 19 Aug 2024 12:21:53 +0100 Subject: [PATCH 06/17] Further fixes to tables and special cases. --- atomicsabi64/atomicsabi64.rst | 58 ++++++++++++----------------------- 1 file changed, 19 insertions(+), 39 deletions(-) diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst index 92ffc2ad..9530aa25 100644 --- a/atomicsabi64/atomicsabi64.rst +++ b/atomicsabi64/atomicsabi64.rst @@ -394,9 +394,6 @@ To reduce repetition, we use the following notational conventions | ``memory_order_seq_cst`` | ``seq_cst`` | +-----------------------------------------+--------------------------------------+ -In what follows ``loc`` refers to the location, ``val`` refers to a value -parameter. - Arbitrary registers may be used in the Assembly Sequences that may change in compiler implementations. Cases where arbitrary registers may *not* be used are covered in the Special Cases section. @@ -449,7 +446,8 @@ Mappings for 32-bit types ------------------------- In what follows, register ``X1`` contains the location ``loc`` and ``W2`` -contains ``val``. The result is returned in ``W0``. +contains ``val``. ``W0`` contains input ``exp`` in compare-exchange. The result is +returned in ``W0``. .. table:: @@ -587,7 +585,8 @@ contains ``val``. The result is returned in ``W0``. | | | LDADDAL W0, W2, [X1] * | +-------------------------------------+---------------+--------------------------------------+ | ``compare_exchange_strong(`` | ``Armv8-A`` | .. 
code-block:: none | - | ``loc,&exp,val,relaxed,relaxed)`` | | | + | ``loc,exp,val,relaxed,relaxed)`` | | | + | | | MOV W4, W0 | | | | loop: | | | | LDXR W0, [X1] | | | | CMP W0, W4 | @@ -601,7 +600,8 @@ contains ``val``. The result is returned in ``W0``. | | | CAS W0, W2, [X1] * | +-------------------------------------+---------------+--------------------------------------+ | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none | - | ``loc,&exp,val,acquire,acquire)`` | | | + | ``loc,exp,val,acquire,acquire)`` | | | + | | | MOV W4, W0 | | | | loop: | | | | LDAXR W0, [X1] | | | | CMP W0, W4 | @@ -615,7 +615,8 @@ contains ``val``. The result is returned in ``W0``. | | | CASA W0, W2, [X1] * | +-------------------------------------+---------------+--------------------------------------+ | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none | - | ``loc,&exp,val,release,release)`` | | | + | ``loc,exp,val,release,release)`` | | | + | | | MOV W4, W0 | | | | loop: | | | | LDXR W0, [X1] | | | | CMP W0, W4 | @@ -629,9 +630,10 @@ contains ``val``. The result is returned in ``W0``. | | | CASL W0, W2, [X1] * | +-------------------------------------+---------------+--------------------------------------+ | ``compare_exchange_strong(`` | ``Armv8-A`` | .. code-block:: none | - | ``loc,&exp,val,acq_rel,acquire)`` | | | + | ``loc,exp,val,acq_rel,acquire)`` | | | + | | | MOV W4, W0 | | ``compare_exchange_strong(`` | | loop: | - | ``loc,&exp,val,seq_cst,seq_cst)`` | | LDAXR W0, [X1] | + | ``loc,exp,val,seq_cst,seq_cst)`` | | LDAXR W0, [X1] | | | | CMP W0, W4 | | | | B.NE fail | | | | STLXR W3, W2, [X1] | @@ -675,7 +677,8 @@ width, the following Mappings use *pair* instructions, which require their own table. In what follows, register ``X4`` contains the location ``loc``, ``X2`` and -``X3`` contain the input value. The result is returned in ``X0`` and ``X1``. +``X3`` contain the input value ``val``. ``X0`` and ``X1`` contain input ``exp`` in +compare-exchange. 
The result is returned in ``X0`` and ``X1``. .. table:: @@ -1132,35 +1135,12 @@ Sequences exist, are stated (for instance ``fetch_or`` can be implemented using Special Cases ------------- -There are special cases in the Mappings presented above, these must be handled -in order to prevent unexpected outcomes of the compiled program. The special -cases are identified below. - -* Re-Ordering of Read-Modify-Write Effects and Acquire Fence -* Const-Qualified 128-bit Atomic Loads - -Destination Register Should Not Be Zero Register for Read-Modify-Writes -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -A compiler is not permitted to rewrite the destination register to be the -zero register for atomic operations that make use of ``SWP`` and ``LD`` -Assembly instructions. These include but are not limited to: - -.. table:: - - +-----------------------------------------+--------------------------------------+ - | Atomic Operation | Assembly Sequence | - +=========================================+======================================+ - | ``exchange(loc,val,sc)`` | ``MOV W4, #val;`` | - | | ``SWP W4, W10, [X1]`` | - +-----------------------------------------+--------------------------------------+ - | ``fetch_add(loc,val,sc)`` | ``MOV W4, #val;`` | - | | ``LDADD W4, W10, [X1]`` | - +-----------------------------------------+--------------------------------------+ - -Where ``X1`` contains the address of ``loc``. +Read-Modify-Write atomics must not use the zero register +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -We annotate Mappings affected with ``*`` in section 4.2. +``CAS``, ``SWP`` and ``LD`` instructions must not use the zero register if +the result is not used since it allows reordering of the read past a +``DMB ISHLD`` barrier. Affected instructions are marked with ``*`` in section 4.2. 
Const-Qualified 128-bit Atomic Loads Should Be Marked Mutable ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -1170,7 +1150,7 @@ in read-only memory (such as the ``.rodata`` section). Before LSE2, the only way to implement a single-copy 128-bit atomic load is by using a Read-Modify-Write sequence. The write is not visible to -software if the memory is writeable. Compilers and runtimes should use the +software if the memory is writeable. Compilers and runtimes should prefer the LSE2/LRCPC3 sequence when available. From f32935316f283c74efedb0bec6b7ca7c9a8d917c Mon Sep 17 00:00:00 2001 From: lukeg101 <6547672+lukeg101@users.noreply.github.com> Date: Mon, 19 Aug 2024 15:23:04 +0100 Subject: [PATCH 07/17] Addresses Ties Feedback --- atomicsabi64/atomicsabi64.rst | 49 ++++++++++++------------- design-documents/atomics-ABI.rst | 63 ++++++++++++++++---------------- 2 files changed, 54 insertions(+), 58 deletions(-) diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst index 9530aa25..afc7894a 100644 --- a/atomicsabi64/atomicsabi64.rst +++ b/atomicsabi64/atomicsabi64.rst @@ -4,7 +4,7 @@ See LICENSE file for details .. |release| replace:: 2024Q1 -.. |date-of-issue| replace:: 5\ :sup:`th` July 2024 +.. |date-of-issue| replace:: 19\ :sup:`th` August 2024 .. |copyright-date| replace:: 2024 .. |footer| replace:: Copyright © |copyright-date|, Arm Limited and its affiliates. All rights reserved. @@ -14,6 +14,7 @@ .. _CPPABI64: https://github.com/ARM-software/abi-aa/releases .. _CSTD: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf .. _PAPER: https://doi.org/10.1109/CGO57630.2024.10444836 +.. 
_OOPSLA: https://2024.splashcon.org/track/splash-2024-oopsla#event-overview ********************************************************************************************* C/C++ Atomics Application Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture @@ -47,10 +48,9 @@ Abstract This document describes the C/C++ Atomics Application Binary Interface for the Arm 64-bit architecture. This document concerns the valid Mappings from C/C++ -Atomic Operations to sequences of A64 instructions. For matters concerning the -memory model, please consult §B2 of the Arm Architecture Reference Manual -[ARMARM_]. We focus only on a subset of the C11 atomic operations at the time -of writing. +Atomic Operations to sequences of A64 instructions. Regarding the memory model, +please consult §B2 of the Arm Architecture Reference Manual [ARMARM_]. This +document only focusses on a subset of C11 atomic operations. Keywords -------- @@ -76,11 +76,11 @@ on GitHub Acknowledgement --------------- -This document came about in the process of Luke Geeson’s PhD on testing the +This ABI was written as part of Luke Geeson’s PhD on testing the compilation of concurrent C/C++ with assistance from Wilco Dijkstra from Arm's Compiler Teams. -This ABI arises from a paper to appear at OOPSLA 2024: +It is an offshoot from a paper that will be presented at OOPSLA 2024 [OOPSLA_]: *Mix Testing: Specifying and Testing ABI Compatibility Of C/C++ Atomics Implementations* by Luke Geeson, James Brotherston, Wilco Dijkstra, Alastair Donaldson, Lee Smith, Tyler Sorensen, and John Wickerson. @@ -203,7 +203,7 @@ specifications: The content of this specification is a draft, and Arm considers the likelihood of future incompatible changes to be significant. -All content in this document is at the **Alpha** quality level. +All content in this document is at the **Release** quality level. Change History -------------- @@ -218,7 +218,7 @@ changes to the content of the document for that release. 
+---------+------------------------------+-------------------------------------------------------------------+ | Issue | Date | Change | +=========+==============================+===================================================================+ - | 00alp0 | 5\ :sup:`th` July 2024. | Beta release. | + | 00rel0 | 19\ :sup:`th` August 2024. | Release. | +---------+------------------------------+-------------------------------------------------------------------+ @@ -282,7 +282,7 @@ Thread of Execution Synchronization Operations or other C language statements. The Arm Architecture Reference Manual [ARMARM_] calls these *Observers*. Typically a thread is defined as a function (e.g. a POSIX thread) although we do not - limit threads to such implementations. + limit threads to this type of implementation. Atomic Operation A C/C++ operation on a Shared-Memory Location. Typically either a load, @@ -301,7 +301,7 @@ Concurrent Program Arm-based machines that run the A64 instruction set. Synchronization Operation - The order that atomic operations are executed by each Thread of Execution + The order in which atomic operations are executed by each Thread of Execution may not be the same as the order they are written in the program. Synchronization Operations are statements that constrain the order of accesses made to Shared-Memory Locations by each thread. Synchronization @@ -347,10 +347,10 @@ Overview ======== The C/C++ Atomics ABI for the Arm 64-bit architecture (AABI64) comprises the -following sub-components. +following sub-components: -* The `Mappings from Atomic Operations to Assembly Sequences`_, which defines - the Mappings from C/C++ atomic operations to sto one of more Assembly +* The `Mappings from Atomic Operations to Assembly Sequences`_ defines + the Mappings from C/C++ atomic operations to the Assembly Sequences that are interoperable with respect to each other. 
* A `Declarative statement of Mappings compatibility`_, as far as @@ -367,13 +367,10 @@ Assembly Sequences. Since there is a large number of ways these Mappings may be combined, we break down the tables by the width of the access, and list compatible Assembly Sequences for each Atomic Operation. -This is an open ABI, we encourage improvements to this specification to be -submitted to the `issue tracker page on +This is an open ABI, we encourage suggestions and improvements to this +specification to be submitted to the `issue tracker page on GitHub `_. -These Mappings are not exhaustive, but aim to cover the atomics we have tested. -Please request more atomics using the issue tracker. - Notational Conventions ---------------------- To reduce repetition, we use the following notational conventions @@ -1125,12 +1122,12 @@ compare-exchange. The result is returned in ``X0`` and ``X1``. We do not list other variants of ``fetch_`` since their Mappings should be -the same (modulo implementations of that are not in scope of this +the same (modulo implementations of that are not in scope for this document). Precisely, implementations that use loops should use the instructions that load or store from memory with the relevant memory order, and the appropriate Assembly Sequence inside the loop. Exceptions, where Assembly -Sequences exist, are stated (for instance ``fetch_or`` can be implemented using -``LDSETP`` when the LSE128 extension is enabled). +Sequences exist, are stated. For instance ``fetch_or`` can be implemented using +``LDSETP`` when the LSE128 extension is enabled. Special Cases ------------- @@ -1157,7 +1154,7 @@ LSE2/LRCPC3 sequence when available. 
Declarative statement of Mappings compatibility =============================================== -To ensure that the above Mappings are ABI-compatible we tested the compilation of +To ensure that the above Mappings are ABI-compatible we test the compilation of Concurrent Programs, where each Atomic Operation is compiled to one of the aforementioned Mappings. We test if there is a compiled program that exhibits an outcome of execution according to the AArch64 Memory Model contained in §B2 @@ -1168,7 +1165,7 @@ define the process by which we test compatibility. Definition of ABI-Compatibility for Atomic Operations ----------------------------------------------------- -*A compiler that implements the above set of Mappings and special cases is ABI-Compatible with +*A compiler that implements these Mappings and special cases is ABI-Compatible with respect to other compilers that implement the Mappings and special cases.* We impose some constraints on this definition: @@ -1183,10 +1180,10 @@ We impose some constraints on this definition: ABI-Compatibility of Concurrent Programs outside these bounds. * We test Concurrent Programs with a fixed initial state, loop unroll factor (equal to 1 loop unroll), and function calls or recursion. -* The above Mappings are not exhaustive, we recommend that Arm's partners +* The above Mappings are not exhaustive. We recommend that Arm's partners submit requests for other Mappings to the ABI team using the `issue tracker page on GitHub `_. * This document makes no statement about the ABI-Compatibility of optimised - Concurrent Programs, nor does a statement concerning the performance of + Concurrent Programs, nor does it make a statement concerning the performance of compiled programs under the above Mappings when executed on a given Arm-based machine.
* This document makes no statement about the ABI-Compatibility of compilers diff --git a/design-documents/atomics-ABI.rst b/design-documents/atomics-ABI.rst index 0b4f890c..f63c461e 100644 --- a/design-documents/atomics-ABI.rst +++ b/design-documents/atomics-ABI.rst @@ -89,7 +89,7 @@ We need a baseline ABI in order to determine if a given sub-ABI respects or depa from the baseline. Adding command-line options is a logical consequence of defining such an ABI, and makes it possible to track ABI compatibility of concurrent programs at compile or link-time, rather than runtime. It is the responsibility of the sub-ABI maintainer to ensure code built -under their ABI does not mix with code built under the baseline. But a baseline must exist, +under their ABI does not mix with code built under the baseline. But a baseline must exist for sub-ABI compatibility to be decided in the first place. A baseline provides the means to describe or contain ABI-islands. Where a compiler implementation @@ -97,12 +97,11 @@ departs from the baseline completely (an ABI-island), it would be the responsibi maintainer of that implementation to ensure their programs are not mixed with programs built for baseline ABI compatibility, or provide adequate warnings at compile time. -Further, numerous different parties have asked the ABI team whether -the same atomics mapping is correct. Writing down the known cases helps engineers -answer these queries without the concurrency expertise required to come up with -current compatible mappings. A future section of the ABI could document common -queries received by the ABI team, in order to assist implementers and engineers -with such issues. +Further, numerous parties have asked the ABI team whether the same atomics mapping is correct. +Writing down the known cases helps engineers answer these queries without the concurrency +expertise required to come up with current compatible mappings. 
A future section of the ABI +could document common queries received by the ABI team, in order to assist implementers and +engineers with such issues. Backwards Compatibility and New Architecture Features ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -110,7 +109,7 @@ Backwards Compatibility and New Architecture Features Put another way, a baseline ABI assists in deciding whether new mappings are compatible with compiler implementations targeting older versions of the Armv8 architecture. Certain instructions (such as Load/Store-Pair instructions [ARMARM_]) have different -single-copy atomicity guarantees with respect different architecture versions. A baseline +single-copy atomicity guarantees with respect to different architecture versions. A baseline decides which assembly sequences can be composed correctly (at least as far as testing can decide). @@ -119,11 +118,11 @@ Compatibility Between Compilers and Runtimes The above issues also apply when ensuring object files compiled with different compilers can be mixed. For instance LLVM and GCC code should be interoperable. At the time of writing we identified a number of -places where this does not apply, both when compiling to target the same architecture version, and mixing +places where this does not apply, both when compiling to target the same architecture version, and when mixing different (compatible) architecture versions. Further, the above is not limited to statically compiled code. We found -one instance where proposed mappings implemented in a JiT compiler would not be interoperable with respect -to the statically compiled code the runtime links against. Even if a JiT compiles under one set of mappings, -and is not subject to an ABI, it may still depend on other libraries or components that do have an ABI. +one instance where proposed mappings implemented in a JiT compiler would not be interoperable with statically +compiled code the runtime links against. 
Even if a JiT compiles under one set of mappings, and is not subject to +an ABI, it may still depend on other libraries or components that do have an ABI. Constrain optimisations @@ -168,7 +167,7 @@ possible outcomes (a reference for this notation is found here [PAPER_]):: In this case the value read by the exchange on ``thread_1`` is not used, and a compiler is free to remove references to unused data. It is not legal according -to this ABI for a compliant implementation piler to translate the program into +to this ABI for a compliant implementation to translate the program into the following Assembly Sequences:: thread_0: @@ -204,14 +203,14 @@ Reference Manual [ARMARM_]:: By comparing ``W3`` and the local variable ``r0`` of the original Concurrent Program we see there is one additional outcome of executing the compiled -program that is not an outcome of executing the Concurrent Program. This is due -to the fact that according to the Arm Architecture Reference Manual [ARMARM_] -*instructions where the destination register is WZR or XZR, are not regarded as -doing a read for the purpose of a DMB LD barrier.* +program that is not an outcome of executing the Concurrent Program. The Arm +Architecture Reference Manual [ARMARM_] states that *instructions where the +destination register is WZR or XZR, are not regarded as doing a read for the +purpose of a DMB LD barrier.* In this case the compiler introduces another outcome of Execution. To fix this issue, a compiler is not permitted to rewrite the destination register to be the -zero register in this case:: +zero register:: thread_0: MOV W9,#1 @@ -235,8 +234,8 @@ Reference Manual [ARMARM_]:: { thread_1:r0=1; [y]=2; } As such the unexpected outcome has disappeared. There are multiple Mappings -that exhibit this behaviour, those affected make use of ``SWP`` and ``LD`` -Assembly instructions. +that exhibit this behaviour. Assembly Sequences affected make use of ``SWP`` +and ``LD`` Assembly instructions. 
Documentation ~~~~~~~~~~~~~ @@ -254,13 +253,13 @@ The Mix Testing Process ABI compatibility must be testable. Concurrency is not trivial, and the ABI presents a simplification of part of the problem that is understandable by -engineers. We provide novel, yet simple, techniques and tools for -testing ABI compatibility. These techniques reduce the difficulty of checking -compatibility from a problem of understanding concurrent executions, to the -familiar testing domain of comparing program outcomes of tests. This document -does not preclude other means of testing compatibility however. +engineers. We provide a simple technique for testing ABI compatibility. +These techniques reduce the difficulty of checking compatibility from a +problem of understanding concurrent executions, to the familiar testing +domain of comparing program outcomes of tests. This document does not +preclude other means of testing compatibility. -We test for Compiler bugs, a Compiler Bug is defined as an outcome of a +We test for Compiler bugs. A Compiler Bug is defined as an outcome of a compiled program execution (under the AArch64 Memory Model contained in §B2 of the Arm Architecture Reference Manual [ARMARM_]) that is not an outcome of execution of the source Concurrent Program (under the @@ -272,9 +271,9 @@ Concurrent Program finishes execution in one of three possible outcomes { thread_0:r0=1, thread_1:r0=0 } { thread_0:r0=1, thread_1:r0=1 } -and one possible compiled program outcome has the following according to the -AArch64 Memory Model contained in §B2 of the Arm Architecture Reference Manual -[ARMARM_]:: +and one possible compiled program outcome has the following outcomes +according to the AArch64 Memory Model contained in §B2 of the Arm +Architecture Reference Manual [ARMARM_]:: { thread_0:X3=0, thread_1:X3=0 } <--- Forbidden by source model, Compiler Bug! 
{ thread_0:X3=0, thread_1:X3=1 } @@ -290,8 +289,8 @@ ensure compatibility we therefore test for the absence of such outcomes of the compiled programs when mixing all combinations of the above Mappings. We define the *Mix Testing* process as follows: -#. Take an arbitrary Concurrent Program, when executed on the C/C++ memory - model will produce outcomes *S*. +#. Take an arbitrary Concurrent Program. When executed on the C/C++ memory + model, it will produce outcomes *S*. #. Split out the individual Atomic Operations from the initial concurrent program into individual source files. #. Compile each individual source file containing an Atomic Operation @@ -303,6 +302,6 @@ the *Mix Testing* process as follows: contained in §B2 of the Arm Architecture Reference Manual [ARMARM_]. Get a *set* of compiled program outcomes *C*. #. If any compiled program set of outcomes *c* in *C* exhibits a Compiler Bug - (Check that *c* is a subset of *S*) with then the given Mappings are not + (Check that *c* is a subset of *S*), the given Mappings are not interoperable. From 7bc4213ae40b1f97436db646d399e342fef79933 Mon Sep 17 00:00:00 2001 From: lukeg101 <6547672+lukeg101@users.noreply.github.com> Date: Mon, 19 Aug 2024 15:34:18 +0100 Subject: [PATCH 08/17] Addresses Sally's Feedback --- atomicsabi64/atomicsabi64.rst | 10 +++++----- design-documents/atomics-ABI.rst | 27 +++++++++++---------------- 2 files changed, 16 insertions(+), 21 deletions(-) diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst index afc7894a..dd56e109 100644 --- a/atomicsabi64/atomicsabi64.rst +++ b/atomicsabi64/atomicsabi64.rst @@ -48,9 +48,9 @@ Abstract This document describes the C/C++ Atomics Application Binary Interface for the Arm 64-bit architecture. This document concerns the valid Mappings from C/C++ -Atomic Operations to sequences of A64 instructions. Regarding the memory model, -please consult §B2 of the Arm Architecture Reference Manual [ARMARM_]. 
This -document only focusses on a subset of C11 atomic operations. +Atomic Operations to sequences of A64 instructions. For further information +on the memory model, refer to §B2 of the Arm Architecture Reference Manual [ARMARM_]. +This document only focusses on a subset of C11 atomic operations. Keywords -------- @@ -243,8 +243,8 @@ This document refers to, or is referred to by, the following documents. -Note: At the time of writing C23 is not released, as such ISO C17 is considered -the latest published document. +Note: At the time of writing, C23 is not released. Therefore, ISO C17 is considered +the most recently published document. .. raw:: pdf diff --git a/design-documents/atomics-ABI.rst b/design-documents/atomics-ABI.rst index f63c461e..3da9d71a 100644 --- a/design-documents/atomics-ABI.rst +++ b/design-documents/atomics-ABI.rst @@ -106,8 +106,7 @@ engineers with such issues. Backwards Compatibility and New Architecture Features ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Put another way, a baseline ABI assists in deciding whether new mappings are compatible -with compiler implementations targeting older versions of the Armv8 architecture. +Put another way, A baseline ABI helps with the decisions of compatibility of new mappings. Certain instructions (such as Load/Store-Pair instructions [ARMARM_]) have different single-copy atomicity guarantees with respect to different architecture versions. A baseline decides which assembly sequences can be composed correctly (at least as far as testing can decide). @@ -119,21 +118,17 @@ Compatibility Between Compilers and Runtimes The above issues also apply when ensuring object files compiled with different compilers can be mixed. For instance LLVM and GCC code should be interoperable. At the time of writing we identified a number of places where this does not apply, both when compiling to target the same architecture version, and when mixing -different (compatible) architecture versions. 
Further, the above is not limited to statically compiled code. We found -one instance where proposed mappings implemented in a JiT compiler would not be interoperable with statically -compiled code the runtime links against. Even if a JiT compiles under one set of mappings, and is not subject to -an ABI, it may still depend on other libraries or components that do have an ABI. +different (compatible) architecture versions. Further, the above issues are not limited to statically compiled +code. We found one instance where proposed mappings implemented in a JiT compiler would not be interoperable +with statically compiled code the runtime links against. Even if a JiT compiles under one set of mappings, and +is not subject to an ABI, it may still depend on other libraries or components that do have an ABI. Constrain optimisations ~~~~~~~~~~~~~~~~~~~~~~~ -There have been several instances where optimisations have been incorrectly applied, -or attempts to apply optimisations to atomic code generation that induce unexpected -concurrent program behaviour. This has happened frequently enough that we need to -collect these cases together to outline why they should not occur. For example - -Consider the following Concurrent Program:: +Optimisations have been incorrectly applied to atomic code generation often enough to justify collecting these cases together to outline why they should not occur. +For example, consider the following Concurrent Program:: // Shared-Memory Locations _Atomic int* x; @@ -203,10 +198,10 @@ Reference Manual [ARMARM_]:: By comparing ``W3`` and the local variable ``r0`` of the original Concurrent Program we see there is one additional outcome of executing the compiled -program that is not an outcome of executing the Concurrent Program. The Arm -Architecture Reference Manual [ARMARM_] states that *instructions where the -destination register is WZR or XZR, are not regarded as doing a read for the -purpose of a DMB LD barrier.* +program that is not an outcome of executing the Concurrent Program.
This is +because the Arm Architecture Reference Manual [ARMARM_] states that +*instructions where the destination register is WZR or XZR, are not regarded +as doing a read for the purpose of a DMB LD barrier.* In this case the compiler introduces another outcome of Execution. To fix this issue, a compiler is not permitted to rewrite the destination register to be the From 9ac36144e2e2e9f3a97cece16b6c59805516740e Mon Sep 17 00:00:00 2001 From: Wilco Dijkstra Date: Mon, 19 Aug 2024 15:38:26 +0100 Subject: [PATCH 09/17] Cleanup Terms and Abbreviations. --- atomicsabi64/atomicsabi64.rst | 78 +++++++++-------------------------- 1 file changed, 19 insertions(+), 59 deletions(-) diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst index dd56e109..b241c2c4 100644 --- a/atomicsabi64/atomicsabi64.rst +++ b/atomicsabi64/atomicsabi64.rst @@ -48,9 +48,8 @@ Abstract This document describes the C/C++ Atomics Application Binary Interface for the Arm 64-bit architecture. This document concerns the valid Mappings from C/C++ -Atomic Operations to sequences of A64 instructions. For further information +Atomic Operations to sequences of AArch64 instructions. For further information on the memory model, refer to §B2 of the Arm Architecture Reference Manual [ARMARM_]. -This document only focusses on a subset of C11 atomic operations. Keywords -------- @@ -256,9 +255,6 @@ Terms and Abbreviations The C/C++ Atomics ABI for the Arm 64-bit Architecture uses the following terms and abbreviations. -A64 - The instruction set available when in AArch64 state. - AArch64 The 64-bit general-purpose register width state of the Armv8 architecture. @@ -277,67 +273,32 @@ ABI Arm-based ... based on the Arm architecture ... -Thread of Execution - A unit of computation that executes one or more Atomic Operations, - Synchronization Operations or other C language statements. The Arm - Architecture Reference Manual [ARMARM_] calls these *Observers*. 
Typically a - thread is defined as a function (e.g. a POSIX thread) although we do not - limit threads to this type of implementation. +Thread + A unit of computation (e.g. a POSIX thread) of a process, managed by the OS. Atomic Operation - A C/C++ operation on a Shared-Memory Location. Typically either a load, - store, exchange, compare, or arithmetic instruction (such as a fetch and add - operation). Atomics are used to define higher level primitives including - locks and concurrent queues. ISO C defines the range of supported atomic - operations and the ``atomic`` type. Operations on atomic-qualified data are - guaranteed not to be interrupted by another Thread of Execution. + An indivisible operation on a memory location. This can be a load, store, + exchange, compare, or arithmetic operation. Atomics may be used to define + higher level primitives including locks and concurrent queues. ISO C/C++ + defines a range of supported atomic types and operations. Concurrent Program - A C or C++ program that consists of one or more Threads of Execution. Each - Thread of Execution must communicate with other threads in the Concurrent - Program through Shared-Memory Locations, using both Atomic Operations and - Non-Atomic Operations (Operations that lack the atomic qualifier) to be - deemed *concurrent*. This document focuses on compiling such programs for - Arm-based machines that run the A64 instruction set. - -Synchronization Operation - The order in which atomic operations are executed by each Thread of Execution - may not be the same as the order they are written in the program. - Synchronization Operations are statements that constrain the order of - accesses made to Shared-Memory Locations by each thread. Synchronization - Operations include Thread Fences. - -Shared-Memory Location - A memory location that can be accessed by any Thread of Execution in the - program. + A C or C++ program that consists of one or more threads.
Threads may + communicate with each other through memory locations, using both Atomic + Operations and standard memory accesses. Memory Order Parameter - Describes a constraint on an Atomic Operation or Synchronization Operation. - Memory Order describes how memory accesses made by Atomic Operations may be - ordered with respect to other Atomic Operations and Synchronization - Operations. ISO C defines a ``memory_order`` enum type to capture the - possible memory order parameters. - -Thread Fence - A Thread Fence is a Synchronization Operation that constrains the order of - Accesses made by Atomic Operations on a given Thread of Execution. Fences - are equipped with a Memory Order Parameter that specifies which kinds of - accesses may be reordered before or after the fence. ISO C defines the - ``atomic_thread_fence`` to synchronize the order of accesses made by atomic - operations on ``_Atomic`` qualified data. + The order of memory accesses as executed by each thread may not be the same + as the order they are written in the program. The Memory Order describes + how memory accesses are ordered with respect to other memory accesses or + Atomic Operations. ISO C/C++ defines a ``memory_order`` enum type for the set + of memory orders. Assembly Sequence - A sequence of A64 instructions, optionally including Atomic Instructions. + A sequence of AArch64 instructions. Mapping - A Mapping takes an Atomic Operation and Compiler Profile as input, - producing an Assembly Sequence as output. - -Compiler Profile - A Compiler implementation and command-line flags or attributes that use - Mappings. - -More specific terminology is defined when it is first used. + A Mapping from an Atomic Operation to an Assembly Sequence. .. raw:: pdf @@ -1174,9 +1135,8 @@ We impose some constraints on this definition: bounded testing. 
C/C++ Atomics ABI-compatibility is thus tested for the Mappings above by generating C/C++ Concurrent Programs that permute combinations of Atomic Operations on each Thread of Execution. We bound our test size between - 2 and 5 Threads of Execution, where each Thread has at least 1 Atomic - Operation or Synchronization Operation and at most 5 Atomic Operations or - Synchronization Operations. We do not make any statement about the + 2 and 5 threads, where each thread has at least 1 atomic operation or fence and + at most 5 atomic operations or fences. We do not make any statement about the ABI-Compatibility of Concurrent Programs outside these bounds. * We test Concurrent Programs with a fixed initial state, loop unroll factor (equal to 1 loop unroll), and function calls or recursion. From 13860e3de3840ad81fd83ae53ee65e3b24f6ac0c Mon Sep 17 00:00:00 2001 From: Wilco Dijkstra Date: Mon, 19 Aug 2024 16:35:59 +0100 Subject: [PATCH 10/17] More cleanups and changes from review comments. --- atomicsabi64/atomicsabi64.rst | 225 +++++++++++----------------------- 1 file changed, 72 insertions(+), 153 deletions(-) diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst index b241c2c4..fe559350 100644 --- a/atomicsabi64/atomicsabi64.rst +++ b/atomicsabi64/atomicsabi64.rst @@ -15,6 +15,7 @@ .. _CSTD: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf .. _PAPER: https://doi.org/10.1109/CGO57630.2024.10444836 .. _OOPSLA: https://2024.splashcon.org/track/splash-2024-oopsla#event-overview +.. _RATIONALE: https://github.com/ARM-software/abi-aa/design-documents/atomics-ABI.rst ********************************************************************************************* C/C++ Atomics Application Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture @@ -47,9 +48,9 @@ Abstract -------- This document describes the C/C++ Atomics Application Binary Interface for the -Arm 64-bit architecture. 
This document concerns the valid Mappings from C/C++ -Atomic Operations to sequences of AArch64 instructions. For further information -on the memory model, refer to §B2 of the Arm Architecture Reference Manual [ARMARM_]. +Arm 64-bit architecture. This document lists the valid Mappings from C/C++ +Atomic Operations to sequences of AArch64 instructions. For further information +on the memory model, refer to §B2 of the Arm Architecture Reference Manual [ARMARM_]. Keywords -------- @@ -219,7 +220,7 @@ changes to the content of the document for that release. +=========+==============================+===================================================================+ | 00rel0 | 19\ :sup:`th` August 2024. | Release. | +---------+------------------------------+-------------------------------------------------------------------+ - + References ---------- @@ -237,14 +238,14 @@ This document refers to, or is referred to by, the following documents. +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ | AAELF64_ | ELF for the Arm 64-bit Architecture (AArch64) | ELF for the Arm 64-bit Architecture (AArch64) | +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + | CPPABI64_ | C++ ABI for the Arm 64-bit Architecture (AArch64) | C++ ABI for the Arm 64-bit Architecture (AArch64) | + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + | RATIONALE_ | Rationale Document for C11 Atomics ABI | Rationale Document for C11 Atomics ABI | + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ | PAPER_ | CGO paper | Compiler Testing with Relaxed Memory Models | 
+-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ - -Note: At the time of writing, C23 is not released. Therefore, ISO C17 is considered -the most recently published document. - .. raw:: pdf PageBreak @@ -294,11 +295,8 @@ Memory Order Parameter Atomic Operations. ISO C/C++ defines a ``memory_order`` enum type for the set of memory orders. -Assembly Sequence - A sequence of AArch64 instructions. - Mapping - A Mapping from an Atomic Operation to an Assembly Sequence. + A Mapping from an Atomic Operation to a sequence of AArch64 instructions. .. raw:: pdf @@ -307,78 +305,22 @@ Mapping Overview ======== -The C/C++ Atomics ABI for the Arm 64-bit architecture (AABI64) comprises the -following sub-components: - -* The `Mappings from Atomic Operations to Assembly Sequences`_ defines - the Mappings from C/C++ atomic operations to the Assembly - Sequences that are interoperable with respect to each other. +`AArch64 atomics`_ defines the Mappings from C/C++ atomic operations +to AArch64 that are interoperable. -* A `Declarative statement of Mappings compatibility`_, as far as - non-exhaustive testing can validate, that the aforementioned Mappings can be - used together. That is, there is no tested combination of Mappings that - induces unexpected program behaviour when a compiled program that uses - atomics is executed on a multi-core Arm-based machine. +Arbitrary registers may be used in the Mappings. Instructions marked with ``*`` +in the tables cannot use ``WZR`` or ``XZR`` as a destination register. This is +further detailed in `Special Cases`_. -Mappings from Atomic Operations to Assembly Sequences -===================================================== +Only some variants of ``fetch_`` are listed since the Mappings are identical +except for a different ````. -We now describe the compatible Mappings for C/C++ Atomic Operations and -Assembly Sequences. 
Since there is a large number of ways these Mappings may be -combined, we break down the tables by the width of the access, and list -compatible Assembly Sequences for each Atomic Operation. - -This is an open ABI, we encourage suggestions and improvements to this -specification to be submitted to the `issue tracker page on -GitHub `_. - -Notational Conventions ----------------------- -To reduce repetition, we use the following notational conventions - -.. table:: - - +-----------------------------------------+--------------------------------------+ - | Memory Order Parameter | Notation | - +=========================================+======================================+ - | ``memory_order_relaxed`` | ``relaxed`` | - +-----------------------------------------+--------------------------------------+ - | ``memory_order_acquire`` | ``acquire`` | - +-----------------------------------------+--------------------------------------+ - | ``memory_order_release`` | ``release`` | - +-----------------------------------------+--------------------------------------+ - | ``memory_order_acq_rel`` | ``acq_rel`` | - +-----------------------------------------+--------------------------------------+ - | ``memory_order_seq_cst`` | ``seq_cst`` | - +-----------------------------------------+--------------------------------------+ - -Arbitrary registers may be used in the Assembly Sequences that may change in -compiler implementations. Cases where arbitrary registers may *not* be used are -covered in the Special Cases section. - -Further, in what follows there may be multiple valid Mappings from Atomic -Operation to Assembly Sequence, as made available by a given architecture -extension. In this case we split the rows of the table to represent multiple -options. - -.. 
table:: - - +--------------------------------------------------------+--------------------------------------+ - | Atomic Operation | Assembly Sequence | - +============================================+===========+======================================+ - | ``atomic_store_explicit(loc,val,relaxed)`` | ARCH1 | ``option A`` | - + +-----------+--------------------------------------+ - | | ARCH2 | ``option B`` | - +--------------------------------------------+-----------+--------------------------------------+ - -Where ARCH is either the base architecture (Armv8-A) or an extension like FEAT_LSE. - -Lastly, all operations are in a shorthand form: +Atomic operations and Memory Order are abbreviated as follows: .. table:: +----------------------------------------------------+--------------------------------------+ - | Atomic Operation | ShortHand Atomic Operation | + | Atomic Operation | Short form | +====================================================+======================================+ | ``atomic_store_explicit(...)`` | ``store(...)`` | +----------------------------------------------------+--------------------------------------+ @@ -388,17 +330,54 @@ Lastly, all operations are in a shorthand form: +----------------------------------------------------+--------------------------------------+ | ``atomic_exchange_explicit(...)`` | ``exchange(...)`` | +----------------------------------------------------+--------------------------------------+ - | ``atomic_fetch_add_explicit(...)`` | ``fetch_add(...)`` | + | ``atomic_fetch_add_explicit(...)`` | ``fetch_add(...)`` | +----------------------------------------------------+--------------------------------------+ - | ``atomic_fetch_sub_explicit(...)`` | ``fetch_sub(...)`` | + | ``atomic_fetch_sub_explicit(...)`` | ``fetch_sub(...)`` | +----------------------------------------------------+--------------------------------------+ - | ``atomic_fetch_or_explicit(...)`` | ``fetch_or(...)`` | + | ``atomic_fetch_or_explicit(...)`` | 
``fetch_or(...)`` | +----------------------------------------------------+--------------------------------------+ - | ``atomic_fetch_xor_explicit(...)`` | ``fetch_xor(...)`` | + | ``atomic_fetch_xor_explicit(...)`` | ``fetch_xor(...)`` | +----------------------------------------------------+--------------------------------------+ - | ``atomic_fetch_and_explicit(...)`` | ``fetch_and(...)`` | + | ``atomic_fetch_and_explicit(...)`` | ``fetch_and(...)`` | +----------------------------------------------------+--------------------------------------+ +.. table:: + + +----------------------------------------------------+--------------------------------------+ + | Memory Order Parameter | Short form | + +====================================================+======================================+ + | ``memory_order_relaxed`` | ``relaxed`` | + +----------------------------------------------------+--------------------------------------+ + | ``memory_order_acquire`` | ``acquire`` | + +----------------------------------------------------+--------------------------------------+ + | ``memory_order_release`` | ``release`` | + +----------------------------------------------------+--------------------------------------+ + | ``memory_order_acq_rel`` | ``acq_rel`` | + +----------------------------------------------------+--------------------------------------+ + | ``memory_order_seq_cst`` | ``seq_cst`` | + +----------------------------------------------------+--------------------------------------+ + +If there are multiple Mappings for an Atomic Operation, the rows of the table +show the options: + +.. 
table:: + + +----------------------------------------------------+--------------------------------------+ + | Atomic Operation | AArch64 | + +========================================+===========+======================================+ + | ``store(loc,val,relaxed)`` | ARCH1 | ``option A`` | + + +-----------+--------------------------------------+ + | | ARCH2 | ``option B`` | + +----------------------------------------+-----------+--------------------------------------+ + +Where ARCH is either the base architecture (Armv8-A) or an extension like FEAT_LSE. + + +Suggestions and improvements to this specification may be submitted to: +`issue tracker page on GitHub `_. + +AArch64 atomics +=============== Mappings for 32-bit types ------------------------- @@ -410,7 +389,7 @@ returned in ``W0``. .. table:: +-----------------------------------------------------+--------------------------------------+ - | Atomic Operation | Assembly Sequence | + | Atomic Operation | AArch64 | +=====================================================+======================================+ | ``store(loc,val,relaxed)`` | .. code-block:: none | | | | @@ -602,17 +581,13 @@ returned in ``W0``. | | | | | | | CASAL W0, W2, [X1] * | +-------------------------------------+---------------+--------------------------------------+ - | Note | - +--------------------------------------------------------------------------------------------+ - | ``*`` Using ``WZR`` or ``XZR`` for the destination register is invalid (Section 4.7). | - +--------------------------------------------------------------------------------------------+ Mappings for 8-bit types ------------------------ The Mappings for 8-bit types are the same as 32-bit types except they use the -``B`` variants of instructions. +``B`` variants of instructions. 
Mappings for 16-bit types @@ -634,14 +609,14 @@ Since the access width of 128-bit types is double that of the 64-bit register width, the following Mappings use *pair* instructions, which require their own table. -In what follows, register ``X4`` contains the location ``loc``, ``X2`` and +In what follows, register ``X4`` contains the location ``loc``, ``X2`` and ``X3`` contain the input value ``val``. ``X0`` and ``X1`` contain input ``exp`` in compare-exchange. The result is returned in ``X0`` and ``X1``. .. table:: +-----------------------------------------------------+--------------------------------------+ - | Atomic Operation | Assembly Sequence | + | Atomic Operation | AArch64 | +=====================================+===============+======================================+ | ``store(loc,val,relaxed)`` | ``Armv8-A`` | .. code-block:: none | | | | | @@ -1080,80 +1055,24 @@ compare-exchange. The result is returned in ``X0`` and ``X1``. - - -We do not list other variants of ``fetch_`` since their Mappings should be -the same (modulo implementations of that are not in scope for this -document). Precisely, implementations that use loops should use the instructions -that load or store from memory with the relevant memory order, and the -appropriate Assembly Sequence inside the loop. Exceptions, where Assembly -Sequences exist, are stated. For instance ``fetch_or`` can be implemented using -``LDSETP`` when the LSE128 extension is enabled. - Special Cases -------------- +============= Read-Modify-Write atomics must not use the zero register -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +-------------------------------------------------------- ``CAS``, ``SWP`` and ``LD`` instructions must not use the zero register if the result is not used since it allows reordering of the read past a -``DMB ISHLD`` barrier. Affected instructions are marked with ``*`` in section 4.2. +``DMB ISHLD`` barrier. Affected instructions are marked with ``*``. 
-Const-Qualified 128-bit Atomic Loads Should Be Marked Mutable -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Const-Qualified 128-bit Atomic Loads +------------------------------------ Const-qualified data containing 128-bit atomic types should not be placed in read-only memory (such as the ``.rodata`` section). -Before LSE2, the only way to implement a single-copy 128-bit atomic load +Before FEAT_LSE2, the only way to implement a single-copy 128-bit atomic load is by using a Read-Modify-Write sequence. The write is not visible to software if the memory is writeable. Compilers and runtimes should prefer the -LSE2/LRCPC3 sequence when available. - - -Declarative statement of Mappings compatibility -=============================================== - -To ensure that the above Mappings are ABI-compatible we test the compilation of -Concurrent Programs, where each Atomic Operation is compiled to one of the -aforementioned Mappings. We test if there is a compiled program that exhibits -an outcome of execution according to the AArch64 Memory Model contained in §B2 -of the Arm Architecture Reference Manual [ARMARM_] that is not an outcome of -execution of the source program under the ISO C model. In this section we -define the process by which we test compatibility. - -Definition of ABI-Compatibility for Atomic Operations ------------------------------------------------------ - -*A compiler that implement these Mappings and special cases is ABI-Compatible with -respect to other compilers that implement the Mappings and special cases.* - -We impose some constraints on this definition: - -* This is not a correctness guarantee, but rather a statement backed up by - bounded testing. C/C++ Atomics ABI-compatibility is thus tested for the Mappings - above by generating C/C++ Concurrent Programs that permute combinations of - Atomic Operations on each Thread of Execution. 
We bound our test size between - 2 and 5 threads, where each thread has at least 1 atomic operation or fence and - at most 5 atomic operations or fences. We do not make any statement about the - ABI-Compatibility of Concurrent Programs outside these bounds. -* We test Concurrent Programs with a fixed initial state, loop unroll factor - (equal to 1 loop unroll), and function calls or recursion. -* The above Mappings are not exhaustive. We recommend that Arm's partners - submit requests for other Mappings to the ABI team using the `issue tracker page on GitHub `_. -* This document makes no statement about the ABI-Compatibility of optimised - Concurrent Programs, nor does it make a statement concerning the performance of - compiled programs under the above Mappings when executed on a given Arm-based - machine. -* This document makes no statement about the ABI-Compatibility of compilers - that implement Mappings other than what is stated in this document. - -Appendix: Mix Testing -===================== - -The status of this appendix is informative. - - - +FEAT_LSE2/FEAT_LRCPC3 sequence when available. From caf08b8ca5aa13209e0edffbc8d0f9e2347cc6a2 Mon Sep 17 00:00:00 2001 From: Wilco Dijkstra Date: Thu, 22 Aug 2024 15:18:21 +0100 Subject: [PATCH 11/17] Further cleanups, split off fences. --- atomicsabi64/atomicsabi64.rst | 85 +++++++++++++++++++---------------- 1 file changed, 47 insertions(+), 38 deletions(-) diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst index fe559350..ce03c411 100644 --- a/atomicsabi64/atomicsabi64.rst +++ b/atomicsabi64/atomicsabi64.rst @@ -48,7 +48,7 @@ Abstract -------- This document describes the C/C++ Atomics Application Binary Interface for the -Arm 64-bit architecture. This document lists the valid Mappings from C/C++ +Arm 64-bit architecture. This document lists the valid mappings from C/C++ Atomic Operations to sequences of AArch64 instructions. 
For further information on the memory model, refer to §B2 of the Arm Architecture Reference Manual [ARMARM_]. @@ -296,7 +296,7 @@ Memory Order Parameter of memory orders. Mapping - A Mapping from an Atomic Operation to a sequence of AArch64 instructions. + A mapping from an Atomic Operation to a sequence of AArch64 instructions. .. raw:: pdf @@ -305,14 +305,14 @@ Mapping Overview ======== -`AArch64 atomics`_ defines the Mappings from C/C++ atomic operations +`AArch64 atomic mappings`_ defines the mappings from C/C++ atomic operations to AArch64 that are interoperable. -Arbitrary registers may be used in the Mappings. Instructions marked with ``*`` +Arbitrary registers may be used in the mappings. Instructions marked with ``*`` in the tables cannot use ``WZR`` or ``XZR`` as a destination register. This is further detailed in `Special Cases`_. -Only some variants of ``fetch_`` are listed since the Mappings are identical +Only some variants of ``fetch_`` are listed since the mappings are identical except for a different ````. Atomic operations and Memory Order are abbreviated as follows: @@ -357,7 +357,7 @@ Atomic operations and Memory Order are abbreviated as follows: | ``memory_order_seq_cst`` | ``seq_cst`` | +----------------------------------------------------+--------------------------------------+ -If there are multiple Mappings for an Atomic Operation, the rows of the table +If there are multiple mappings for an Atomic Operation, the rows of the table show the options: .. table:: @@ -376,11 +376,34 @@ Where ARCH is either the base architecture (Armv8-A) or an extension like FEAT_L Suggestions and improvements to this specification may be submitted to: `issue tracker page on GitHub `_. 
-AArch64 atomics -=============== -Mappings for 32-bit types -------------------------- + +AArch64 atomic mappings +======================= + +Synchronization Fences +---------------------- + + +-----------------------------------------------------+--------------------------------------+ + | Fence | AArch64 | + +=====================================================+======================================+ + | ``atomic_thread_fence(relaxed)`` | .. code-block:: none | + | | | + | | NOP | + +-----------------------------------------------------+--------------------------------------+ + | ``atomic_thread_fence(acquire)`` | .. code-block:: none | + | | | + | | DMB ISHLD | + +-----------------------------------------------------+--------------------------------------+ + | ``atomic_thread_fence(release)`` | .. code-block:: none | + | | | + | ``atomic_thread_fence(acq_rel)`` | DMB ISH | + | | | + | ``atomic_thread_fence(seq_cst)`` | | + +-------------------------------------+---------------+--------------------------------------+ + +32-bit types +------------ In what follows, register ``X1`` contains the location ``loc`` and ``W2`` contains ``val``. ``W0`` contains input ``exp`` in compare-exchange. The result is @@ -414,20 +437,6 @@ returned in ``W0``. | ``load(loc,seq_cst)`` | .. code-block:: none | | | | | | LDAR W2, [X1] | - +-----------------------------------------------------+--------------------------------------+ - | ``fence(relaxed)`` | .. code-block:: none | - | | | - | | NOP | - +-----------------------------------------------------+--------------------------------------+ - | ``fence(acquire)`` | .. code-block:: none | - | | | - | | DMB ISHLD | - +-----------------------------------------------------+--------------------------------------+ - | ``fence(release)`` | .. 
code-block:: none | - | | | - | ``fence(acq_rel)`` | DMB ISH | - | | | - | ``fence(seq_cst)`` | | +-------------------------------------+---------------+--------------------------------------+ | ``exchange(loc,val,relaxed)`` | ``Armv8-A`` | .. code-block:: none | | | | | @@ -583,30 +592,30 @@ returned in ``W0``. +-------------------------------------+---------------+--------------------------------------+ -Mappings for 8-bit types ------------------------- +8-bit types +----------- -The Mappings for 8-bit types are the same as 32-bit types except they use the +The mappings for 8-bit types are the same as 32-bit types except they use the ``B`` variants of instructions. -Mappings for 16-bit types -------------------------- +16-bit types +------------ -The Mappings for 16-bit types are the same as 32-bit types except they use the +The mappings for 16-bit types are the same as 32-bit types except they use the ``H`` variants of instructions. -Mappings for 64-bit types -------------------------- +64-bit types +------------ -The Mappings for 64-bit types are the same as 32-bit types except the registers +The mappings for 64-bit types are the same as 32-bit types except the registers used are X-registers. -Mappings for 128-bit types --------------------------- +128-bit types +------------- Since the access width of 128-bit types is double that of the 64-bit register -width, the following Mappings use *pair* instructions, which require their own +width, the following mappings use *pair* instructions, which require their own table. In what follows, register ``X4`` contains the location ``loc``, ``X2`` and @@ -1058,8 +1067,8 @@ compare-exchange. The result is returned in ``X0`` and ``X1``. 
Special Cases ============= -Read-Modify-Write atomics must not use the zero register --------------------------------------------------------- +Unused result in Read-Modify-Write atomics +------------------------------------------ ``CAS``, ``SWP`` and ``LD`` instructions must not use the zero register if the result is not used since it allows reordering of the read past a From 42cf2978820325ad969841eee4a69100665aaf12 Mon Sep 17 00:00:00 2001 From: lukeg101 <6547672+lukeg101@users.noreply.github.com> Date: Wed, 28 Aug 2024 16:33:47 +0100 Subject: [PATCH 12/17] Ties Feedback on phrasing --- design-documents/atomics-ABI.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design-documents/atomics-ABI.rst b/design-documents/atomics-ABI.rst index 3da9d71a..e8a44949 100644 --- a/design-documents/atomics-ABI.rst +++ b/design-documents/atomics-ABI.rst @@ -266,7 +266,7 @@ Concurrent Program finishes execution in one of three possible outcomes { thread_0:r0=1, thread_1:r0=0 } { thread_0:r0=1, thread_1:r0=1 } -and one possible compiled program outcome has the following outcomes +and one compiled program execution run has the following possible outcomes according to the AArch64 Memory Model contained in §B2 of the Arm Architecture Reference Manual [ARMARM_]:: From 7ba0888c2d3d77f905ad885ce6997d85d7335b2c Mon Sep 17 00:00:00 2001 From: lukeg101 <6547672+lukeg101@users.noreply.github.com> Date: Wed, 28 Aug 2024 16:58:42 +0100 Subject: [PATCH 13/17] peter feedback --- design-documents/atomics-ABI.rst | 55 +++++++++++++++++++++----------- 1 file changed, 36 insertions(+), 19 deletions(-) diff --git a/design-documents/atomics-ABI.rst b/design-documents/atomics-ABI.rst index e8a44949..8f7725b9 100644 --- a/design-documents/atomics-ABI.rst +++ b/design-documents/atomics-ABI.rst @@ -5,10 +5,26 @@ .. _ARMARM: https://developer.arm.com/documentation/ddi0487/latest .. _PAPER: https://doi.org/10.1109/CGO57630.2024.10444836 +.. 
_ATOMICS64: https://github.com/ARM-software/abi-aa/atomicsabi64/atomicsabi64.rst Rationale Document for C11 Atomics ABI. *************************************** +Scope +===== + +This document contains the design rationale for C/C++ Atomics Application +Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture +defined in (ATOMICS64_). Nothing in this document +is part of the specification. The purpose is to record the rationale +for the specification as well as alternatives that were considered. +Any contradictions between this rationale and the specification shall +be resolved in favor of the specification. + +This document assumes that the reader is familiar with (ATOMICS64_) +and the 32-bit build attributes defined in (ATOMICS64_) and will use +concepts defined in these documents. + Preamble ======== @@ -24,19 +40,19 @@ make: - We need to choose a baseline ABI (a set of mappings), that is compatible for all versions of the Armv8 architecture. - The mappings should cover atomic accesses of various sign, size, and type accessible through C11 atomic operations using compiler profiles. -The main trade-offs we have identified or have been made aware of are: +We have identified the following trade-offs: - Performance of different mappings versus compatibility with all architectures. - Whether certain compiler operations lead to unexpected behaviours. As motivated by the use cases expanded upon below: -- The need for a baseline ABI -- Knowing when an implementation departs from that baseline -- Backwards compatibility of atomics as new mappings are added -- Compatibility between compilers and runtimes -- The need to constrain optimisations on specific atomic operations -- Documenting the interoperable mappings +- The need for a baseline ABI. +- Knowing when an implementation departs from that baseline. +- Backwards compatibility of atomics as new mappings are added. +- Compatibility between compilers and runtimes. 
+- The need to constrain optimisations on specific atomic operations. +- Documenting the interoperable mappings. - providing a basis upon which ABI compatibility can be tested. References @@ -59,18 +75,18 @@ This document refers to, or is referred to by, the following documents. Note: At the time of writing C23 is not released, as such ISO C17 is considered the latest published document. -Use-cases known of so far -------------------------- +Known use-cases +--------------- A Baseline: Describing current implementations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The ABI we provide is a baseline specification that compilers should or do implement. -The ABI provides a grounds to be compatible across all versions of the Armv8 architecture. Most -of the mappings in the ABI are already implemented in LLVM and GCC and this ABI ratifies -a decade of established practice, and provides alternatives where the current practice -is incompatible. +The ABI we provide is a baseline specification that compilers should implement. +Compilers that implement the baseline specification are compatible across all versions +of the Armv8 architecture. Most of the mappings in the ABI are already implemented in +LLVM and GCC and this ABI ratifies a decade of established practice, and provides +alternatives where the current practice is incompatible. Sub-ABIs and ABI-islands: Departing from the baseline (or 'mainland') @@ -88,14 +104,15 @@ unintentionally introduced into compilers when new mappings are added. We need a baseline ABI in order to determine if a given sub-ABI respects or departs from the baseline. Adding command-line options is a logical consequence of defining such an ABI, and makes it possible to track ABI compatibility of concurrent programs at compile or link-time, -rather than runtime. It is the responsibility of the sub-ABI maintainer to ensure code built +rather than runtime. 
It is the responsibility of the sub-ABI user to ensure code built under their ABI does not mix with code built under the baseline. But a baseline must exist for sub-ABI compatibility to be decided in the first place. -A baseline provides the means to describe or contain ABI-islands. Where a compiler implementation -departs from the baseline completely (an ABI-island), it would be the responsibility of the -maintainer of that implementation to ensure their programs are not mixed with programs built for -baseline ABI compatibility, or provide adequate warnings at compile time. +Where a compiler implementation departs from the baseline completely (an ABI-island), +Arm cannot provide any statement on the compatibility of the extensions with respect +to the baseline specification. In the ABI-island, which could be a known incompatibility +with the base-line then users should not mix ABIs. It is QoI whether a toolchain is +able to diagnose incompatibility. Further, numerous parties have asked the ABI team whether the same atomics mapping is correct. Writing down the known cases helps engineers answer these queries without the concurrency From 0e385573d086883c20c5c0b285e1512897ee6f36 Mon Sep 17 00:00:00 2001 From: lukeg101 <6547672+lukeg101@users.noreply.github.com> Date: Wed, 28 Aug 2024 17:02:25 +0100 Subject: [PATCH 14/17] peter feedback --- design-documents/atomics-ABI.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design-documents/atomics-ABI.rst b/design-documents/atomics-ABI.rst index 8f7725b9..902c4467 100644 --- a/design-documents/atomics-ABI.rst +++ b/design-documents/atomics-ABI.rst @@ -116,7 +116,7 @@ able to diagnose incompatibility. Further, numerous parties have asked the ABI team whether the same atomics mapping is correct. Writing down the known cases helps engineers answer these queries without the concurrency -expertise required to come up with current compatible mappings. 
A future section of the ABI +expertise required to come up with current compatible mappings. A future section of this document could document common queries received by the ABI team, in order to assist implementers and engineers with such issues. From 93734e8b96db7b7d7419769d13df1c1533a58d5c Mon Sep 17 00:00:00 2001 From: lukeg101 <6547672+lukeg101@users.noreply.github.com> Date: Thu, 29 Aug 2024 11:47:32 +0100 Subject: [PATCH 15/17] alpha release --- atomicsabi64/atomicsabi64.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst index ce03c411..5e769ffd 100644 --- a/atomicsabi64/atomicsabi64.rst +++ b/atomicsabi64/atomicsabi64.rst @@ -218,7 +218,7 @@ changes to the content of the document for that release. +---------+------------------------------+-------------------------------------------------------------------+ | Issue | Date | Change | +=========+==============================+===================================================================+ - | 00rel0 | 19\ :sup:`th` August 2024. | Release. | + | 00alp0 | 19\ :sup:`th` August 2024. | Alpha Release. | +---------+------------------------------+-------------------------------------------------------------------+ From e5076e3cfbe87c7fef17f8f1f4e1fba7cfa38451 Mon Sep 17 00:00:00 2001 From: lukeg101 <6547672+lukeg101@users.noreply.github.com> Date: Thu, 29 Aug 2024 12:31:02 +0100 Subject: [PATCH 16/17] Ties II --- atomicsabi64/atomicsabi64.rst | 2 +- design-documents/atomics-ABI.rst | 28 +++++++++++++++------------- 2 files changed, 16 insertions(+), 14 deletions(-) diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst index 5e769ffd..20ae47bd 100644 --- a/atomicsabi64/atomicsabi64.rst +++ b/atomicsabi64/atomicsabi64.rst @@ -373,7 +373,7 @@ show the options: Where ARCH is either the base architecture (Armv8-A) or an extension like FEAT_LSE. 
-Suggestions and improvements to this specification may be submitted to: +Suggestions and improvements to this specification may be submitted to the `issue tracker page on GitHub `_. diff --git a/design-documents/atomics-ABI.rst b/design-documents/atomics-ABI.rst index 902c4467..2f12ecdd 100644 --- a/design-documents/atomics-ABI.rst +++ b/design-documents/atomics-ABI.rst @@ -15,14 +15,14 @@ Scope This document contains the design rationale for C/C++ Atomics Application Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture -defined in (ATOMICS64_). Nothing in this document +defined in ATOMICS64_. Nothing in this document is part of the specification. The purpose is to record the rationale for the specification as well as alternatives that were considered. Any contradictions between this rationale and the specification shall be resolved in favor of the specification. -This document assumes that the reader is familiar with (ATOMICS64_) -and the 32-bit build attributes defined in (ATOMICS64_) and will use +This document assumes that the reader is familiar with ATOMICS64_ +and will use concepts defined in these documents. Preamble @@ -45,7 +45,7 @@ We have identified the following trade-offs: - Performance of different mappings versus compatibility with all architectures. - Whether certain compiler operations lead to unexpected behaviours. -As motivated by the use cases expanded upon below: +The use cases expanded upon below motivate why we need an atomics ABI: - The need for a baseline ABI. - Knowing when an implementation departs from that baseline. @@ -62,13 +62,15 @@ This document refers to, or is referred to by, the following documents. ..
table:: - +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ - | Ref | External reference or URL | Title | - +=============+==============================================================+=============================================================================+ - | ARMARM_ | DDI 0487 | Arm Architecture Reference Manual Armv8 for Armv8-A architecture profile | - +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ - | PAPER_ | CGO paper | Compiler Testing with Relaxed Memory Models | - +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ + | Ref | External reference or URL | Title | + +=============+==============================================================+===============================================================================================+ + | ARMARM_ | DDI 0487 | Arm Architecture Reference Manual Armv8 for Armv8-A architecture profile | + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ + | PAPER_ | CGO paper | Compiler Testing with Relaxed Memory Models | + +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ + | ATOMICS64_ | Atomics ABI | C/C++ Atomics Application Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture | + 
+-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+ @@ -195,7 +197,7 @@ the following Assembly Sequences:: LDR W3,[X4] where ``thread_0:X2`` contains the address of ``x``, ``thread_0:X4`` contains -the address of ``y``, and ``thread_1:X2`` contains the address of ``y``, +the address of ``y``, ``thread_1:X2`` contains the address of ``y``, and ``thread_1:X4`` contains the address of ``x``. The ``exchange`` Atomic Operation is compiled to a ``SWP`` Assembly @@ -266,7 +268,7 @@ The Mix Testing Process ABI compatibility must be testable. Concurrency is not trivial, and the ABI presents a simplification of part of the problem that is understandable by engineers. We provide a simple technique for testing ABI compatibility. -These techniques reduce the difficulty of checking compatibility from a +This technique reduces the difficulty of checking compatibility from a problem of understanding concurrent executions, to the familiar testing domain of comparing program outcomes of tests. This document does not preclude other means of testing compatibility. From eb315cff357334a4c7665579a64d375e9c9f10be Mon Sep 17 00:00:00 2001 From: lukeg101 <6547672+lukeg101@users.noreply.github.com> Date: Thu, 29 Aug 2024 16:05:54 +0100 Subject: [PATCH 17/17] alpha release on current status line --- atomicsabi64/atomicsabi64.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst index 20ae47bd..cf3d915c 100644 --- a/atomicsabi64/atomicsabi64.rst +++ b/atomicsabi64/atomicsabi64.rst @@ -203,7 +203,7 @@ specifications: The content of this specification is a draft, and Arm considers the likelihood of future incompatible changes to be significant. -All content in this document is at the **Release** quality level. +All content in this document is at the **Alpha** quality level. 
Change History --------------