From 3cb0f9fc3cba1b9f36bb6e1d6b60ba251ae2b0e7 Mon Sep 17 00:00:00 2001
From: lukeg101 <6547672+lukeg101@users.noreply.github.com>
Date: Fri, 5 Apr 2024 21:54:48 +0100
Subject: [PATCH 01/17] [ATOMICSABI64]: Alpha Draft of Atomics ABI
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This is the Alpha draft of the ABI for the
"C/C++ Atomics Application Binary Interface Standard for the Arm® 64-bit Architecture"

This document describes the C/C++ Atomics Application Binary Interface for the Arm 64-bit architecture.

This document concerns the valid mappings from C/C++ Atomic Operations to sequences of A64 instructions.
This document does not support Armv7.

For matters concerning the memory model, please consult §B2 of the Arm Architecture Reference Manual.

We focus only on a subset of the C11 atomic operations and their mapping to A64 instructions at the time of writing.

More atomics will be added.

Co-Authored with Wilco Dijkstra (@Wilco1).
---
 CONTRIBUTING.md                        |    1 +
 README.md                              |    1 +
 atomicsabi64/Arm_logo_blue_RGB.svg     |   15 +
 atomicsabi64/CONTRIBUTIONS             |    3 +
 atomicsabi64/LICENSE                   |   22 +
 atomicsabi64/README.md                 |   38 +
 atomicsabi64/TRADEMARK_NOTICE          |    8 +
 atomicsabi64/atomicsabi64.rst          | 1161 ++++++++++++++++++++++++
 tools/common/check-rst-syntax.sh       |    3 +
 tools/common/generate-release-links.sh |    1 +
 tools/rst2pdf/generate-pdfs.sh         |    3 +
 11 files changed, 1256 insertions(+)
 create mode 100644 atomicsabi64/Arm_logo_blue_RGB.svg
 create mode 100644 atomicsabi64/CONTRIBUTIONS
 create mode 100644 atomicsabi64/LICENSE
 create mode 100644 atomicsabi64/README.md
 create mode 100644 atomicsabi64/TRADEMARK_NOTICE
 create mode 100644 atomicsabi64/atomicsabi64.rst

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index fa836e48..3db4550f 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -109,6 +109,7 @@ document | owner | Github handle
 [Morello extensions to ELF for the Arm 64-bit Architecture](https://github.com/ARM-software/abi-aa/tree/master/aaelf64-morello) | Silviu Baranga | @sbaranga-arm
 [Morello Descriptor ABI for the Arm 64-bit Architecture](https://github.com/ARM-software/abi-aa/tree/master/descabi-morello) | Silviu Baranga | @sbaranga-arm
 [Memtag ABI Extension to ELF for the Arm 64-bit Architecture](https://github.com/ARM-software/abi-aa/tree/master/memtagabielf64) | Mitch Phillips | @hctim
+[C/C++ Atomics Application Binary Interface Standard for the Arm 64-bit Architecture](https://github.com/ARM-software/abi-aa/tree/master/atomicsabi64) | Luke Geeson | @lukeg101
 
 3. Merging the change
 
diff --git a/README.md b/README.md
index 571a0e0d..973d82a6 100644
--- a/README.md
+++ b/README.md
@@ -71,6 +71,7 @@ ELF for the Arm 64-bit Architecture                                | [aaelf64](a
 DWARF for the Arm 64-bit Architecture                              | [aadwarf64](aadwarf64/aadwarf64.rst) | [2020Q2](legacy-documents/aadwarf64/ihi0057_E/IHI0057_E_2020Q2_aadwarf64.pdf)
 C++ ABI for the Arm 64-bit Architecture                            | [cppabi64](cppabi64/cppabi64.rst)    | [2020Q2](legacy-documents/cppabi64/ihi0059_E/IHI0059E_2020Q2_cppabi64.pdf)
 Vector Function ABI for the Arm 64-bit Architecture                | [vfabia64](vfabia64/vfabia64.rst)    | [2019Q2](legacy-documents/vfabia64/101129_1920/101129_1920_01_en.pdf)
+C/C++ Atomics ABI for the Arm 64-bit Architecture                  | [atomicsabi64](atomicsabi64/atomicsabi64.rst)    | n/a
 
 
 ### ABI for the Arm 64-bit Architecture with SVE support
diff --git a/atomicsabi64/Arm_logo_blue_RGB.svg b/atomicsabi64/Arm_logo_blue_RGB.svg
new file mode 100644
index 00000000..1f9a9ba1
--- /dev/null
+++ b/atomicsabi64/Arm_logo_blue_RGB.svg
@@ -0,0 +1,15 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!-- Generator: Adobe Illustrator 21.1.0, SVG Export Plug-In . SVG Version: 6.00 Build 0)  -->
+<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
+	 viewBox="0 0 245 80" style="enable-background:new 0 0 245 80;" width="245" height="80" xml:space="preserve">
+<style type="text/css">
+	.st0{fill:#0091BD;}
+</style>
+<g transform="translate(-275,-265)"><path class="st0" d="M331.4,271.8h15.9v68.1h-15.9v-7.1c-7,8.1-15.5,9.2-20.4,9.2c-21,0-33-17.5-33-36.2c0-22.2,15.2-35.8,33.2-35.8
+	c5,0,13.8,1.3,20.2,9.7V271.8z M294.2,306.1c0,11.8,7.4,21.7,18.9,21.7c10,0,19.3-7.3,19.3-21.5c0-14.9-9.2-22-19.3-22
+	C301.6,284.3,294.2,294,294.2,306.1z M366.1,271.8H382v6.1c1.8-2.1,4.4-4.4,6.6-5.7c3.1-1.8,6.1-2.3,9.7-2.3c3.9,0,8.1,0.6,12.5,3.2
+	l-6.5,14.4c-3.6-2.3-6.5-2.4-8.1-2.4c-3.4,0-6.8,0.5-9.9,3.7c-4.4,4.7-4.4,11.2-4.4,15.7v35.3h-15.9V271.8z M421,271.8h15.9v6.3
+	c5.3-6.5,11.6-8.1,16.8-8.1c7.1,0,13.8,3.4,17.6,10c5.7-8.1,14.2-10,20.2-10c8.3,0,15.5,3.9,19.4,10.7c1.3,2.3,3.6,7.3,3.6,17.2
+	v42.1h-15.9v-37.5c0-7.6-0.8-10.7-1.5-12.1c-1-2.6-3.4-6-9.1-6c-3.9,0-7.3,2.1-9.4,5c-2.8,3.9-3.1,9.7-3.1,15.5v35.1h-15.9v-37.5
+	c0-7.6-0.8-10.7-1.5-12.1c-1-2.6-3.4-6-9.1-6c-3.9,0-7.3,2.1-9.4,5c-2.8,3.9-3.1,9.7-3.1,15.5v35.1H421V271.8z"/></g>
+</svg>
diff --git a/atomicsabi64/CONTRIBUTIONS b/atomicsabi64/CONTRIBUTIONS
new file mode 100644
index 00000000..113f5fa6
--- /dev/null
+++ b/atomicsabi64/CONTRIBUTIONS
@@ -0,0 +1,3 @@
+Contributions to this project are licensed under an inbound=outbound
+model such that any such contributions are licensed by the contributor
+under the same terms as those in the LICENSE file.
diff --git a/atomicsabi64/LICENSE b/atomicsabi64/LICENSE
new file mode 100644
index 00000000..aa6d8392
--- /dev/null
+++ b/atomicsabi64/LICENSE
@@ -0,0 +1,22 @@
+This work is licensed under the Creative Commons
+Attribution-ShareAlike 4.0 International License. To view a copy of
+this license, visit http://creativecommons.org/licenses/by-sa/4.0/ or
+send a letter to Creative Commons, PO Box 1866, Mountain View, CA
+94042, USA.
+
+Grant of Patent License. Subject to the terms and conditions of this
+license (both the Public License and this Patent License), each
+Licensor hereby grants to You a perpetual, worldwide, non-exclusive,
+no-charge, royalty-free, irrevocable (except as stated in this
+section) patent license to make, have made, use, offer to sell, sell,
+import, and otherwise transfer the Licensed Material, where such
+license applies only to those patent claims licensable by such
+Licensor that are necessarily infringed by their contribution(s) alone
+or by combination of their contribution(s) with the Licensed Material
+to which such contribution(s) was submitted. If You institute patent
+litigation against any entity (including a cross-claim or counterclaim
+in a lawsuit) alleging that the Licensed Material or a contribution
+incorporated within the Licensed Material constitutes direct or
+contributory patent infringement, then any licenses granted to You
+under this license for that Licensed Material shall terminate as of
+the date such litigation is filed.
diff --git a/atomicsabi64/README.md b/atomicsabi64/README.md
new file mode 100644
index 00000000..64136bd4
--- /dev/null
+++ b/atomicsabi64/README.md
@@ -0,0 +1,38 @@
+<div align="center">
+   <img src="Arm_logo_blue_RGB.svg" />
+</div>
+
+# Atomics ABI for the Arm® 64-bit Architecture (AArch64)
+
+
+## About this document
+
+This document describes the [Application Binary Interface for code
+generated by compiling C/C++ atomics for the Arm 64-bit architecture](atomicsabi64.rst).
+
+## About the license
+
+As identified more fully in the [LICENSE](LICENSE) file, this project
+is licensed under CC-BY-SA-4.0 along with an additional patent
+license.  The language in the additional patent license is largely
+identical to that in Apache-2.0 (specifically, Section 3 of Apache-2.0
+as reflected at https://www.apache.org/licenses/LICENSE-2.0) with two
+exceptions.
+
+First, several changes were made related to the defined terms so as to
+reflect the fact that such defined terms need to align with the
+terminology in CC-BY-SA-4.0 rather than Apache-2.0 (e.g., changing
+“Work” to “Licensed Material”).
+
+Second, the defensive termination clause was changed such that the
+scope of defensive termination applies to “any licenses granted to
+You” (rather than “any patent licenses granted to You”).  This change
+is intended to help maintain a healthy ecosystem by providing
+additional protection to the community against patent litigation
+claims.
+
+## Defects report
+
+Please report defects in the [Atomics Application Binary Interface (ABI)
+for the Arm 64-bit architecture](atomicsabi64.rst) to the [issue tracker
+page on GitHub](https://github.com/ARM-software/abi-aa/issues).
diff --git a/atomicsabi64/TRADEMARK_NOTICE b/atomicsabi64/TRADEMARK_NOTICE
new file mode 100644
index 00000000..9a7a7252
--- /dev/null
+++ b/atomicsabi64/TRADEMARK_NOTICE
@@ -0,0 +1,8 @@
+The text of and illustrations in this document are licensed
+under a Creative Commons Attribution–Share Alike 4.0 International
+license (“CC-BY-SA-4.0”), with an additional clause on patents.
+The Arm trademarks featured here are registered trademarks or
+trademarks of Arm Limited (or its subsidiaries) in the US and/or
+elsewhere. All rights reserved. Please visit
+https://www.arm.com/company/policies/trademarks for more information
+about Arm’s trademarks.
diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst
new file mode 100644
index 00000000..128067a3
--- /dev/null
+++ b/atomicsabi64/atomicsabi64.rst
@@ -0,0 +1,1161 @@
+..
+   Copyright (c) 2024, Arm Limited and its affiliates.  All rights reserved.
+   CC-BY-SA-4.0 AND Apache-Patent-License
+   See LICENSE file for details
+
+.. |release| replace:: 2024Q1
+.. |date-of-issue| replace:: 5\ :sup:`th` April 2024
+.. |copyright-date| replace:: 2024
+.. |footer| replace:: Copyright © |copyright-date|, Arm Limited and its
+                      affiliates. All rights reserved.
+
+.. _ARMARM: https://developer.arm.com/documentation/ddi0487/latest
+.. _AAELF64: https://github.com/ARM-software/abi-aa/releases
+.. _CPPABI64: https://github.com/ARM-software/abi-aa/releases
+.. _CSTD: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
+
+*********************************************************************************************
+C/C++ Atomics Application Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture
+*********************************************************************************************
+
+.. class:: version
+
+|release|
+
+.. class:: issued
+
+Date of Issue: |date-of-issue|
+
+.. class:: logo
+
+.. image:: Arm_logo_blue_RGB.svg
+   :scale: 30%
+
+.. section-numbering::
+
+.. raw:: pdf
+
+   PageBreak oneColumn
+
+
+Preamble
+========
+
+Abstract
+--------
+
+This document describes the C/C++ Atomics Application Binary Interface for the
+Arm 64-bit architecture. It specifies the valid mappings from C/C++ Atomic
+Operations to sequences of A64 instructions. For matters concerning the memory
+model, please consult §B2 of the Arm Architecture Reference Manual [ARMARM_].
+At the time of writing, this document covers only a subset of the C11 atomic
+operations.
+
+Keywords
+--------
+
+C++, C, Application Binary Interface, ABI, AArch64, C++ ABI, generic C++ ABI,
+Atomics, Concurrency
+
+Latest release and defects report
+---------------------------------
+
+Please check `Atomics Application Binary Interface for the Arm® Architecture
+<https://github.com/ARM-software/abi-aa>`_ for the latest
+release of this document.
+
+Please report defects in this specification to the `issue tracker page
+on GitHub
+<https://github.com/ARM-software/abi-aa/issues>`_.
+
+.. raw:: pdf
+
+   PageBreak
+
+Acknowledgement
+---------------
+
+This document arose from Luke Geeson’s PhD work on testing the compilation of
+concurrent C/C++, with assistance from Wilco Dijkstra of Arm’s Compiler
+Teams.
+
+
+
+Licence
+-------
+
+This work is licensed under the Creative Commons
+Attribution-ShareAlike 4.0 International License. To view a copy of
+this license, visit http://creativecommons.org/licenses/by-sa/4.0/ or
+send a letter to Creative Commons, PO Box 1866, Mountain View, CA
+94042, USA.
+
+Grant of Patent License. Subject to the terms and conditions of this
+license (both the Public License and this Patent License), each
+Licensor hereby grants to You a perpetual, worldwide, non-exclusive,
+no-charge, royalty-free, irrevocable (except as stated in this
+section) patent license to make, have made, use, offer to sell, sell,
+import, and otherwise transfer the Licensed Material, where such
+license applies only to those patent claims licensable by such
+Licensor that are necessarily infringed by their contribution(s) alone
+or by combination of their contribution(s) with the Licensed Material
+to which such contribution(s) was submitted. If You institute patent
+litigation against any entity (including a cross-claim or counterclaim
+in a lawsuit) alleging that the Licensed Material or a contribution
+incorporated within the Licensed Material constitutes direct or
+contributory patent infringement, then any licenses granted to You
+under this license for that Licensed Material shall terminate as of
+the date such litigation is filed.
+
+About the license
+-----------------
+
+As identified more fully in the Licence_ section, this project
+is licensed under CC-BY-SA-4.0 along with an additional patent
+license.  The language in the additional patent license is largely
+identical to that in Apache-2.0 (specifically, Section 3 of Apache-2.0
+as reflected at https://www.apache.org/licenses/LICENSE-2.0) with two
+exceptions.
+
+First, several changes were made related to the defined terms so as to
+reflect the fact that such defined terms need to align with the
+terminology in CC-BY-SA-4.0 rather than Apache-2.0 (e.g., changing
+“Work” to “Licensed Material”).
+
+Second, the defensive termination clause was changed such that the
+scope of defensive termination applies to “any licenses granted to
+You” (rather than “any patent licenses granted to You”).  This change
+is intended to help maintain a healthy ecosystem by providing
+additional protection to the community against patent litigation
+claims.
+
+Contributions
+-------------
+
+Contributions to this project are licensed under an inbound=outbound
+model such that any such contributions are licensed by the contributor
+under the same terms as those in the `Licence`_ section.
+
+Trademark notice
+----------------
+
+The text of and illustrations in this document are licensed by Arm
+under a Creative Commons Attribution–Share Alike 4.0 International
+license (“CC-BY-SA-4.0”), with an additional clause on patents.
+The Arm trademarks featured here are registered trademarks or
+trademarks of Arm Limited (or its subsidiaries) in the US and/or
+elsewhere. All rights reserved. Please visit
+https://www.arm.com/company/policies/trademarks for more information
+about Arm’s trademarks.
+
+Copyright
+---------
+
+Copyright (c) |copyright-date|, Arm Limited and its affiliates.  All rights
+reserved.
+
+.. raw:: pdf
+
+   PageBreak
+
+.. contents::
+   :depth: 3
+
+.. raw:: pdf
+
+   PageBreak
+
+About this document
+===================
+
+Change control
+--------------
+
+Current status and anticipated changes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following support level definitions are used by the Arm Atomics ABI
+specifications:
+
+**Release**
+   Arm considers this specification to have enough implementations, which have
+   received sufficient testing, to verify that it is correct. The details of
+   these criteria are dependent on the scale and complexity of the change over
+   previous versions: small, simple changes might only require one
+   implementation, but more complex changes require multiple independent
+   implementations, which have been rigorously tested for cross-compatibility.
+   Arm anticipates that future changes to this specification will be limited to
+   typographical corrections, clarifications and compatible extensions.
+
+**Beta**
+   Arm considers this specification to be complete, but existing
+   implementations do not meet the requirements for confidence in its release
+   quality. Arm may need to make incompatible changes if issues emerge from its
+   implementation.
+
+**Alpha**
+   The content of this specification is a draft, and Arm considers the
+   likelihood of future incompatible changes to be significant.
+
+All content in this document is at the **Alpha** quality level.
+
+Change History
+--------------
+
+If there is no entry in the change history table for a release, there are no
+changes to the content of the document for that release.
+
+.. class:: atomicsabi64-change-history
+
+.. table::
+
+  +---------+------------------------------+-------------------------------------------------------------------+
+  | Issue   | Date                         | Change                                                            |
+  +=========+==============================+===================================================================+
+  | 00alp0  | 5\ :sup:`th` April 2024.     | Alpha release.                                                    |
+  +---------+------------------------------+-------------------------------------------------------------------+
+  
+
+References
+----------
+
+This document refers to, or is referred to by, the following documents.
+
+.. table::
+
+  +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+  | Ref         | External reference or URL                                    | Title                                                                       |
+  +=============+==============================================================+=============================================================================+
+  | ARMARM_     | DDI 0487                                                     | Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile   |
+  +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+  | CSTD_       | ISO/IEC 9899:2018                                            | International Standard ISO/IEC 9899:2018 – Programming languages C.         |
+  +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+
+Note: At the time of writing, C23 has not been released; ISO C17 is therefore considered the latest published revision.
+
+.. raw:: pdf
+
+   PageBreak
+
+Terms and Abbreviations
+-----------------------
+
+The Atomics ABI for the Arm 64-bit Architecture uses the following terms and
+abbreviations.
+
+A64
+   The instruction set available when in AArch64 state.
+
+AArch64
+   The 64-bit general-purpose register width state of the Armv8 architecture.
+
+ABI
+   Application Binary Interface:
+
+   1. The specifications to which an executable must conform in order to
+      execute in a specific execution environment. For example, the
+      :title-reference:`Linux ABI for the Arm Architecture`.
+
+   2. A particular aspect of the specifications to which independently
+      produced relocatable files must conform in order to be statically
+      linkable and executable.  For example, the C++ ABI for the Arm 64-bit
+      Architecture [CPPABI64_], or ELF for the Arm Architecture [AAELF64_].
+
+Arm-based
+   ... based on the Arm architecture ...
+
+Concurrent Program
+   A C or C++ program that consists of one or more Threads of Execution. To be
+   deemed *concurrent*, each Thread of Execution must communicate with the
+   other threads in the Concurrent Program through Shared-Memory Locations,
+   using Atomic Operations.
+
+Thread of Execution
+   A unit of computation that executes one or more Atomic Operations,
+   Synchronization Operations or other C language statements. The Arm
+   Architecture Reference Manual [ARMARM_] calls these *Observers*. Typically a
+   thread is defined as a function (e.g. a POSIX thread) although we do not
+   limit threads to such implementations.
+
+Atomic Operation
+   A C/C++ operation on a Shared-Memory Location, typically a load, store,
+   exchange, compare, or arithmetic operation (such as fetch-and-add). Atomic
+   Operations are used to define higher-level primitives including locks and
+   concurrent queues. ISO C defines the range of supported atomic operations
+   and the ``_Atomic`` type qualifier. Operations on ``_Atomic`` qualified data
+   are guaranteed not to be interrupted by another Thread of Execution.
+
+Synchronization Operation
+   The order in which Atomic Operations are executed by each Thread of
+   Execution may not be the same as the order in which they are written in the
+   program. Synchronization Operations are statements that constrain the order
+   of accesses made to Shared-Memory Locations by each thread. Synchronization
+   Operations include Thread Fences and certain control flow structures.
+
+Shared-Memory Location
+   A memory location that can be accessed by any Thread of Execution in the
+   program.
+
+Memory Order Parameters
+   Describe a constraint on an Atomic Operation or Synchronization Operation.
+   Memory Order describes how memory accesses made by Atomic Operations may be
+   ordered with respect to other Atomic Operations and Synchronization
+   Operations. ISO C defines a ``memory_order`` enum type to capture the
+   possible memory order parameters.
+
+Thread Fence
+   A Thread Fence is a Synchronization Operation that constrains the order of
+   accesses made by Atomic Operations on a given Thread of Execution. Fences
+   are equipped with a Memory Order Parameter that specifies which kinds of
+   accesses may be reordered before or after the fence. ISO C defines the
+   ``atomic_thread_fence`` function to synchronize the order of accesses made
+   by atomic operations on ``_Atomic`` qualified data.
+
+Atomic Instruction
+   An A64 instruction that may have Memory Order semantics. For instance, an
+   A64 LDR instruction has no ordering semantics, but the LDAR instruction has
+   *acquire* semantics (see [ARMARM_]).
+
+Assembly Sequence
+   A sequence of Atomic Instructions.
+
+Mapping
+   A pair of Atomic Operation and Assembly Sequence. A compiler generates the
+   Assembly Sequence, given an Atomic Operation and Compiler Profile as input.
+
+Compiler Profile
+   A combination of a compiler and command-line flags that implements a set of
+   Mappings from Atomic Operations to A64 Assembly Sequences. When the compiler
+   is provided with a Concurrent Program and Compiler Profile, it generates an
+   Assembly Sequence.
+
+More specific terminology is defined when it is first used.
+
+.. raw:: pdf
+
+   PageBreak
+
+Overview
+========
+
+The C/C++ Atomics ABI for the Arm 64-bit architecture (AABI64) comprises the
+following sub-components.
+
+* The `Mappings from Atomic Operations to Assembly Sequences`_, which defines
+  the mappings from C/C++ atomic operations to sequences of A64 assembly that
+  are interoperable with respect to each other.
+
+* A `Declarative statement of Mappings compatibility`_, as far as
+  non-exhaustive testing can validate, that the aforementioned Mappings can be
+  used together. That is, there is no tested combination of Mappings that
+  induces unexpected program behaviour when a compiled program that uses
+  atomics is executed on a multi-core Arm-based machine.
+
+Mappings from Atomic Operations to Assembly Sequences
+=====================================================
+
+We now describe the compatible Mappings for C/C++ Atomic Operations and
+Assembly Sequences. Since there is a large number of ways these mappings may be
+combined, we break down the tables by the width of the access, and list
+compatible Assembly Sequences for each Atomic Operation.
+
+This is an open ABI; we encourage improvements to this specification to be
+submitted to the `issue tracker page on
+GitHub <https://github.com/ARM-software/abi-aa/issues>`_.
+
+These mappings are not exhaustive, but aim to cover the atomics we have tested.
+Please request more atomics using the issue tracker.
+
+Notational Conventions
+----------------------
+To reduce repetition, we use the following notational conventions:
+
+.. table::
+
+  +-----------------------------------------+--------------------------------------+
+  | Memory Order Parameter                  | Notation                             | 
+  +=========================================+======================================+
+  | ``memory_order_relaxed``                | ``relaxed``                          |
+  +-----------------------------------------+--------------------------------------+
+  | ``memory_order_acquire``                | ``acq``                              |
+  +-----------------------------------------+--------------------------------------+
+  | ``memory_order_release``                | ``rel``                              |
+  +-----------------------------------------+--------------------------------------+
+  | ``memory_order_acq_rel``                | ``acq_rel``                          |
+  +-----------------------------------------+--------------------------------------+
+  | ``memory_order_seq_cst``                | ``sc``                               |
+  +-----------------------------------------+--------------------------------------+
+
+In what follows, ``loc`` refers to the memory location and ``val`` refers to a
+value parameter.
+
+The registers used in the Assembly Sequences are arbitrary and may differ
+between compiler implementations. Cases where arbitrary registers may *not* be
+used are covered in the Special Cases section.
+
+Further, in what follows there may be multiple valid Mappings from Atomic
+Operation to Assembly Sequence, as made available by a given architecture
+extension. In this case we split the rows of the table to represent multiple
+options.
+
+.. table::
+
+  +--------------------------------------------------------+--------------------------------------+
+  | Atomic Operation                                       | Assembly Sequence                    | 
+  +============================================+===========+======================================+
+  | ``atomic_store_explicit(loc,val,relaxed)`` | ARCH1     | ``option A``                         |
+  +                                            +-----------+--------------------------------------+
+  |                                            | ARCH2     | ``option B``                         |
+  +--------------------------------------------+-----------+--------------------------------------+
+
+Where ARCH is, for example, BASE (Armv8), LSE, LSE2, LSE128, RCPC, or LRCPC3.
+
+Lastly, all operations are in a shorthand form:
+
+.. table::
+
+  +----------------------------------------------------+--------------------------------------+
+  | Atomic Operation                                   | Shorthand Atomic Operation           |
+  +====================================================+======================================+
+  | ``atomic_store_explicit(...)``                     | ``store(...)``                       |
+  +----------------------------------------------------+--------------------------------------+
+  | ``atomic_load_explicit(...)``                      | ``load(...)``                        |
+  +----------------------------------------------------+--------------------------------------+
+  | ``atomic_thread_fence(...)``                       | ``fence(...)``                       |
+  +----------------------------------------------------+--------------------------------------+
+  | ``atomic_exchange_explicit(...)``                  | ``exchange(...)``                    |
+  +----------------------------------------------------+--------------------------------------+
+  | ``atomic_fetch_add_explicit(...)``                 | ``fetch_add(...)``                   | 
+  +----------------------------------------------------+--------------------------------------+
+  | ``atomic_fetch_sub_explicit(...)``                 | ``fetch_sub(...)``                   | 
+  +----------------------------------------------------+--------------------------------------+
+  | ``atomic_fetch_or_explicit(...)``                  | ``fetch_or(...)``                    | 
+  +----------------------------------------------------+--------------------------------------+
+  | ``atomic_fetch_xor_explicit(...)``                 | ``fetch_xor(...)``                   | 
+  +----------------------------------------------------+--------------------------------------+
+  | ``atomic_fetch_and_explicit(...)``                 | ``fetch_and(...)``                   | 
+  +----------------------------------------------------+--------------------------------------+
+
+
+Mappings for 32-bit types
+-------------------------
+
+In what follows, register ``X1`` contains the location ``loc`` and ``W2``
+contains ``val``. The result is returned in ``W0``.
+
+.. table::
+
+  +------------------------------------------+--------------------------------------+
+  | Atomic Operation                         | Assembly Sequence                    | 
+  +==========================================+======================================+
+  | ``store(loc,val,relaxed)``               | ``STR   W2, [X1]``                   |
+  +------------------------------------------+--------------------------------------+
+  | | ``store(loc,val,rel)``                 | ``STLR  W2, [X1]``                   |
+  | | ``store(loc,val,sc)``                  |                                      |
+  +------------------------------------------+--------------------------------------+
+  | ``load(loc,relaxed)``                    | ``LDR   W2, [X1]``                   |
+  +-------------------------------+----------+--------------------------------------+
+  | ``load(loc,acq)``             | ``BASE`` | ``LDAR  W2, [X1]``                   |
+  +                               +----------+--------------------------------------+
+  |                               | ``RCPC`` | ``LDAPR W2, [X1]``                   |
+  +-------------------------------+----------+--------------------------------------+
+  | ``load(loc,sc)``                         | ``LDAR  W2, [X1]``                   |
+  +------------------------------------------+--------------------------------------+
+  | ``fence(relaxed)``                       | ``NOP``                              |
+  +------------------------------------------+--------------------------------------+
+  | ``fence(acq)``                           | ``DMB ISHLD``                        |
+  +------------------------------------------+--------------------------------------+
+  | | ``fence(rel)``                         | ``DMB ISH``                          |
+  | | ``fence(acq_rel)``                     |                                      |
+  | | ``fence(sc)``                          |                                      |
+  +-------------------------------+----------+--------------------------------------+
+  | ``exchange(loc,val,relaxed)`` | ``BASE`` | | ``loop:``                          |
+  |                               +          + |   ``LDXR   W0, [X1]``              +
+  +                               |          | |   ``STXR   W3, W2, [X1]``          |
+  |                               +          + |   ``CBNZ   W3, loop``              +
+  +                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``SWP    W2, W0, [X1]``              | 
+  +-------------------------------+----------+--------------------------------------+
+  | ``exchange(loc,val,acq)``     | ``BASE`` | | ``loop:``                          |
+  |                               |          | |   ``LDAXR  W0, [X1]``              |
+  +                               +          + |   ``STXR   W3, W2, [X1]``          +
+  |                               |          | |   ``CBNZ   W3, loop``              |
+  +                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``SWPA   W2, W0, [X1]``              |
+  +-------------------------------+----------+--------------------------------------+
+  | ``exchange(loc,val,rel)``     | ``BASE`` | | ``loop:``                          |
+  |                               |          | |   ``LDXR   W0, [X1]``              |
+  +                               +          + |   ``STLXR  W3, W2, [X1]``          +
+  |                               |          | |   ``CBNZ   W3, loop``              |
+  +                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``SWPL   W2, W0, [X1]``              |
+  +-------------------------------+----------+--------------------------------------+
+  | ``exchange(loc,val,acq_rel)`` | ``BASE`` | | ``loop:``                          |
+  | ``exchange(loc,val,sc)``      |          | |   ``LDAXR  W0, [X1]``              |
+  +                               +          + |   ``STLXR  W3, W2, [X1]``          +
+  |                               |          | |   ``CBNZ   W3, loop``              |
+  +                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``SWPAL  W2, W0, [X1]``              |
+  +-------------------------------+----------+--------------------------------------+
+  | ``fetch_add(loc,val,relaxed)``| ``BASE`` | | ``loop:``                          |
+  |                               +          + |   ``LDXR   W0, [X1]``              +
+  |                               +          + |   ``ADD    W4, W2, W0``            +
+  +                               |          | |   ``STXR   W3, W4, [X1]``          |
+  |                               +          + |   ``CBNZ   W3, loop``              +
+  +                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``LDADD    W2, W0, [X1]``            |
+  +-------------------------------+----------+--------------------------------------+
+  | ``fetch_add(loc,val,acq)``    | ``BASE`` | | ``loop:``                          |
+  |                               +          + |   ``LDAXR  W0, [X1]``              +
+  |                               +          + |   ``ADD    W4, W2, W0``            +
+  +                               |          | |   ``STXR   W3, W4, [X1]``          |
+  |                               +          + |   ``CBNZ   W3, loop``              +
+  +                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``LDADDA   W2, W0, [X1]``            |
+  +-------------------------------+----------+--------------------------------------+
+  | ``fetch_add(loc,val,rel)``    | ``BASE`` | | ``loop:``                          |
+  |                               +          + |   ``LDXR   W0, [X1]``              +
+  |                               +          + |   ``ADD    W4, W2, W0``            +
+  +                               |          | |   ``STLXR  W3, W4, [X1]``          |
+  |                               +          + |   ``CBNZ   W3, loop``              +
+  +                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``LDADDL   W2, W0, [X1]``            |
+  +-------------------------------+----------+--------------------------------------+
+  | ``fetch_add(loc,val,acq_rel)``| ``BASE`` | | ``loop:``                          |
+  | ``fetch_add(loc,val,sc)``     +          + |   ``LDAXR  W0, [X1]``              +
+  |                               +          + |   ``ADD    W4, W2, W0``            +
+  +                               |          | |   ``STLXR  W3, W4, [X1]``          |
+  |                               +          + |   ``CBNZ   W3, loop``              +
+  +                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``LDADDAL  W2, W0, [X1]``            |
+  +-------------------------------+----------+--------------------------------------+
+  | ``compare_exchange_strong(``  | ``BASE`` | | ``loop:``                          |
+  |   ``loc,&exp,val,relaxed,``   +          + |   ``LDXR   W0, [X1]``              +
+  |   ``relaxed)``                +          + |   ``CMP    W0, W4``                +
+  |                               +          + |   ``B.NE    fail``                 +
+  +                               |          | |   ``STXR   W3, W2, [X1]``          |
+  |                               +          + |   ``CBNZ   W3, loop``              +
+  +                               +          + | ``fail:``                          +
+  +                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``CAS    W0, W2, [X1]``              |
+  +-------------------------------+----------+--------------------------------------+
+  | ``compare_exchange_strong(``  | ``BASE`` | | ``loop:``                          |
+  |   ``loc,&exp,val,acq,acq)``   +          + |   ``LDAXR  W0, [X1]``              +
+  |                               +          + |   ``CMP    W0, W4``                +
+  |                               +          + |   ``B.NE    fail``                 +
+  +                               |          | |   ``STXR   W3, W2, [X1]``          |
+  |                               +          + |   ``CBNZ   W3, loop``              +
+  +                               +          + | ``fail:``                          +
+  +                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``CASA   W0, W2, [X1]``              |
+  +-------------------------------+----------+--------------------------------------+
+  | ``compare_exchange_strong(``  | ``BASE`` | | ``loop:``                          |
+  |   ``loc,&exp,val,rel,rel)``   +          + |   ``LDXR   W0, [X1]``              +
+  |                               +          + |   ``CMP    W0, W4``                +
+  |                               +          + |   ``B.NE    fail``                 +
+  +                               |          | |   ``STLXR  W3, W2, [X1]``          |
+  |                               +          + |   ``CBNZ   W3, loop``              +
+  +                               +          + | ``fail:``                          +
+  +                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``CASL   W0, W2, [X1]``              |
+  +-------------------------------+----------+--------------------------------------+
+  | ``compare_exchange_strong(``  | ``BASE`` | | ``loop:``                          |
+  |  ``loc,&exp,val,acq_rel,acq)``+          + |   ``LDAXR  W0, [X1]``              +
+  |                               +          + |   ``CMP    W0, W4``                +
+  | ``compare_exchange_strong(``  +          + |   ``B.NE    fail``                 +
+  +   ``loc,&exp,val,sc,sc)``     |          | |   ``STLXR  W3, W2, [X1]``          |
+  |                               +          + |   ``CBNZ   W3, loop``              +
+  +                               +          + | ``fail:``                          +
+  +                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``CASAL  W0, W2, [X1]``              |
+  +-------------------------------+----------+--------------------------------------+
+
+Mappings for 8-bit types
+------------------------
+
+The mappings for 8-bit types are the same as those for 32-bit types, except
+that they use the ``B`` variants of the instructions, for example ``LDARB``.
+
+
+Mappings for 16-bit types
+-------------------------
+
+The mappings for 16-bit types are the same as those for 32-bit types, except
+that they use the ``H`` variants of the instructions, for example ``LDARH``.
+
+Mappings for 64-bit types
+-------------------------
+
+The mappings for 64-bit types are the same as those for 32-bit types, except
+that the registers holding data values are X-registers.
+
+Mappings for 128-bit types
+--------------------------
+
+Since 128-bit types are twice the width of a 64-bit register, the following
+mappings use *pair* instructions, which require their own
+table.
+
+In what follows, register ``X4`` contains the location ``loc``, and ``X2`` and
+``X3`` contain the input value. The result is returned in ``X0`` and ``X1``.
+
+.. table::
+
+  +-----------------------------------------------+--------------------------------------+
+  | Atomic Operation                              | Assembly Sequence                    |
+  +=================================+=============+======================================+
+  | ``store(loc,val,relaxed)``      | ``BASE``    | | ``loop:``                          |
+  |                                 |             | |   ``LDXP   XZR, X1, [X4]``         |
+  |                                 |             | |   ``STXP   W5, X2, X3, [X4]``      |
+  |                                 |             | |   ``CBNZ   W5, loop``              |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
+  |                                 |             | | ``loop:``                          |
+  |                                 |             | |   ``MOV    X6, X0``                |
+  |                                 |             | |   ``MOV    X7, X1``                |
+  |                                 |             | |   ``CASP   X0, X1, X2, X3, [X4]``  |
+  |                                 |             | |   ``CMP    X0, X6``                |
+  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
+  |                                 |             | |   ``B.NE   loop``                  |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE2``    | ``STP   X2, X3, [X4]``               |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``store(loc,val,rel)``          | ``BASE``    | | ``loop:``                          |
+  |                                 |             | |   ``LDXP    XZR, X1, [X4]``        |
+  |                                 |             | |   ``STLXP   W5, X2, X3, [X4]``     |
+  |                                 |             | |   ``CBNZ    W5, loop``             |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
+  |                                 |             | | ``loop:``                          |
+  |                                 |             | |   ``MOV    X6, X0``                |
+  |                                 |             | |   ``MOV    X7, X1``                |
+  |                                 |             | |   ``CASPL  X0, X1, X2, X3, [X4]``  |
+  |                                 |             | |   ``CMP    X0, X6``                |
+  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
+  |                                 |             | |   ``B.NE   loop``                  |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE2``    | | ``DMB   ISH``                      |
+  |                                 |             | | ``STP   X2, X3, [X4]``             |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LRCPC3``  | ``STILP   X2, X3, [X4]``             |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``store(loc,val,sc)``           | ``BASE``    | | ``loop:``                          |
+  |                                 |             | |   ``LDXP    XZR, X1, [X4]``        |
+  |                                 |             | |   ``STLXP   W5, X2, X3, [X4]``     |
+  |                                 |             | |   ``CBNZ    W5, loop``             |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
+  |                                 |             | | ``loop:``                          |
+  |                                 |             | |   ``MOV    X6, X0``                |
+  |                                 |             | |   ``MOV    X7, X1``                |
+  |                                 |             | |   ``CASPL  X0, X1, X2, X3, [X4]``  |
+  |                                 |             | |   ``CMP    X0, X6``                |
+  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
+  |                                 |             | |   ``B.NE   loop``                  |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE2``    | | ``DMB   ISH``                      |
+  |                                 |             | | ``STP   X2, X3, [X4]``             |
+  |                                 |             | | ``DMB   ISH``                      |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LRCPC3``  | ``STILP   X2, X3, [X4]``             |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``load(loc,relaxed)``           | ``BASE``    | | ``loop:``                          |
+  |                                 |             | |   ``LDXP   X0, X1, [X4]``          |
+  |                                 |             | |   ``STXP   W5, X0, X1, [X4]``      |
+  |                                 |             | |   ``CBNZ   W5, loop``              |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | ``CASP   X0, X1, X0, X1, [X4]``      |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE2``    | ``LDP   X0, X1, [X4]``               |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``load(loc,acq)``               | ``BASE``    | | ``loop:``                          |
+  |                                 |             | |   ``LDAXP  X0, X1, [X4]``          |
+  |                                 |             | |   ``STXP   W5, X0, X1, [X4]``      |
+  |                                 |             | |   ``CBNZ   W5, loop``              |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | ``CASPA  X0, X1, X0, X1, [X4]``      |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE2``    | | ``LDP   X0, X1, [X4]``             |
+  |                                 |             | | ``DMB   ISHLD``                    |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LRCPC3``  | ``LDIAPP   X0, X1, [X4]``            |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``load(loc,sc)``                | ``BASE``    | | ``loop:``                          |
+  |                                 |             | |   ``LDAXP   X0, X1, [X4]``         |
+  |                                 |             | |   ``STXP    W5, X0, X1, [X4]``     |
+  |                                 |             | |   ``CBNZ    W5, loop``             |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | ``CASPA  X0, X1, X0, X1, [X4]``      |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE2``    | | ``LDAR  X5, [X4]``                 |
+  |                                 |             | | ``LDP   X0, X1, [X4]``             |
+  |                                 |             | | ``DMB   ISHLD``                    |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LRCPC3``  | | ``LDAR   X5, [X4]``                |
+  |                                 |             | | ``LDIAPP X0, X1, [X4]``            |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``exchange(loc,val,relaxed)``   | ``BASE``    | | ``loop:``                          |
+  |                                 |             | |   ``LDXP   X0, X1, [X4]``          |
+  |                                 |             | |   ``STXP   W5, X2, X3, [X4]``      |
+  |                                 |             | |   ``CBNZ   W5, loop``              |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
+  |                                 |             | | ``loop:``                          |
+  |                                 |             | |   ``MOV    X6, X0``                |
+  |                                 |             | |   ``MOV    X7, X1``                |
+  |                                 |             | |   ``CASP   X0, X1, X2, X3, [X4]``  |
+  |                                 |             | |   ``CMP    X0, X6``                |
+  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
+  |                                 |             | |   ``B.NE   loop``                  |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE128``  | | ``MOV    X0, X2``                  |
+  |                                 |             | | ``MOV    X1, X3``                  |
+  |                                 |             | | ``SWPP   X0, X1, [X4]``            |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``exchange(loc,val,acq)``       | ``BASE``    | | ``loop:``                          |
+  |                                 |             | |   ``LDAXP  X0, X1, [X4]``          |
+  |                                 |             | |   ``STXP   W5, X2, X3, [X4]``      |
+  |                                 |             | |   ``CBNZ   W5, loop``              |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
+  |                                 |             | | ``loop:``                          |
+  |                                 |             | |   ``MOV    X6, X0``                |
+  |                                 |             | |   ``MOV    X7, X1``                |
+  |                                 |             | |   ``CASPA  X0, X1, X2, X3, [X4]``  |
+  |                                 |             | |   ``CMP    X0, X6``                |
+  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
+  |                                 |             | |   ``B.NE   loop``                  |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE128``  | | ``MOV    X0, X2``                  |
+  |                                 |             | | ``MOV    X1, X3``                  |
+  |                                 |             | | ``SWPPA  X0, X1, [X4]``            |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``exchange(loc,val,rel)``       | ``BASE``    | | ``loop:``                          |
+  |                                 |             | |   ``LDXP   X0, X1, [X4]``          |
+  |                                 |             | |   ``STLXP  W5, X2, X3, [X4]``      |
+  |                                 |             | |   ``CBNZ   W5, loop``              |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
+  |                                 |             | | ``loop:``                          |
+  |                                 |             | |   ``MOV    X6, X0``                |
+  |                                 |             | |   ``MOV    X7, X1``                |
+  |                                 |             | |   ``CASPL  X0, X1, X2, X3, [X4]``  |
+  |                                 |             | |   ``CMP    X0, X6``                |
+  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
+  |                                 |             | |   ``B.NE   loop``                  |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE128``  | | ``MOV    X0, X2``                  |
+  |                                 |             | | ``MOV    X1, X3``                  |
+  |                                 |             | | ``SWPPL  X0, X1, [X4]``            |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``exchange(loc,val,acq_rel)``   | ``BASE``    | | ``loop:``                          |
+  | ``exchange(loc,val,sc)``        |             | |   ``LDAXP  X0, X1, [X4]``          |
+  |                                 |             | |   ``STLXP  W5, X2, X3, [X4]``      |
+  |                                 |             | |   ``CBNZ   W5, loop``              |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
+  |                                 |             | | ``loop:``                          |
+  |                                 |             | |   ``MOV    X6, X0``                |
+  |                                 |             | |   ``MOV    X7, X1``                |
+  |                                 |             | |   ``CASPAL X0, X1, X2, X3, [X4]``  |
+  |                                 |             | |   ``CMP    X0, X6``                |
+  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
+  |                                 |             | |   ``B.NE   loop``                  |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE128``  | | ``MOV    X0, X2``                  |
+  |                                 |             | | ``MOV    X1, X3``                  |
+  |                                 |             | | ``SWPPAL X0, X1, [X4]``            |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``fetch_add(loc,val,relaxed)``  | ``BASE``    | | ``loop:``                          |
+  |                                 |             | |   ``LDXP   X0, X1, [X4]``          |
+  |                                 |             | |   ``ADDS   X6, X0, X2``            |
+  |                                 |             | |   ``ADC    X7, X1, X3``            |
+  |                                 |             | |   ``STXP   W5, X6, X7, [X4]``      |
+  |                                 |             | |   ``CBNZ   W5, loop``              |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
+  |                                 |             | | ``loop:``                          |
+  |                                 |             | |   ``MOV    X6, X0``                |
+  |                                 |             | |   ``MOV    X7, X1``                |
+  |                                 |             | |   ``ADDS   X8, X0, X2``            |
+  |                                 |             | |   ``ADC    X9, X1, X3``            |
+  |                                 |             | |   ``CASP   X0, X1, X8, X9, [X4]``  |
+  |                                 |             | |   ``CMP    X0, X6``                |
+  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
+  |                                 |             | |   ``B.NE   loop``                  |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``fetch_add(loc,val,acq)``      | ``BASE``    | | ``loop:``                          |
+  |                                 |             | |   ``LDAXP  X0, X1, [X4]``          |
+  |                                 |             | |   ``ADDS   X6, X0, X2``            |
+  |                                 |             | |   ``ADC    X7, X1, X3``            |
+  |                                 |             | |   ``STXP   W5, X6, X7, [X4]``      |
+  |                                 |             | |   ``CBNZ   W5, loop``              |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
+  |                                 |             | | ``loop:``                          |
+  |                                 |             | |   ``MOV    X6, X0``                |
+  |                                 |             | |   ``MOV    X7, X1``                |
+  |                                 |             | |   ``ADDS   X8, X0, X2``            |
+  |                                 |             | |   ``ADC    X9, X1, X3``            |
+  |                                 |             | |   ``CASPA  X0, X1, X8, X9, [X4]``  |
+  |                                 |             | |   ``CMP    X0, X6``                |
+  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
+  |                                 |             | |   ``B.NE   loop``                  |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``fetch_add(loc,val,rel)``      | ``BASE``    | | ``loop:``                          |
+  |                                 |             | |   ``LDXP   X0, X1, [X4]``          |
+  |                                 |             | |   ``ADDS   X6, X0, X2``            |
+  |                                 |             | |   ``ADC    X7, X1, X3``            |
+  |                                 |             | |   ``STLXP  W5, X6, X7, [X4]``      |
+  |                                 |             | |   ``CBNZ   W5, loop``              |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
+  |                                 |             | | ``loop:``                          |
+  |                                 |             | |   ``MOV    X6, X0``                |
+  |                                 |             | |   ``MOV    X7, X1``                |
+  |                                 |             | |   ``ADDS   X8, X0, X2``            |
+  |                                 |             | |   ``ADC    X9, X1, X3``            |
+  |                                 |             | |   ``CASPL  X0, X1, X8, X9, [X4]``  |
+  |                                 |             | |   ``CMP    X0, X6``                |
+  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
+  |                                 |             | |   ``B.NE   loop``                  |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``fetch_add(loc,val,acq_rel)``  | ``BASE``    | | ``loop:``                          |
+  | ``fetch_add(loc,val,sc)``       |             | |   ``LDAXP  X0, X1, [X4]``          |
+  |                                 |             | |   ``ADDS   X6, X0, X2``            |
+  |                                 |             | |   ``ADC    X7, X1, X3``            |
+  |                                 |             | |   ``STLXP  W5, X6, X7, [X4]``      |
+  |                                 |             | |   ``CBNZ   W5, loop``              |
+  +                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
+  |                                 |             | | ``loop:``                          |
+  |                                 |             | |   ``MOV    X6, X0``                |
+  |                                 |             | |   ``MOV    X7, X1``                |
+  |                                 |             | |   ``ADDS   X8, X0, X2``            |
+  |                                 |             | |   ``ADC    X9, X1, X3``            |
+  |                                 |             | |   ``CASPAL X0, X1, X8, X9, [X4]``  |
+  |                                 |             | |   ``CMP    X0, X6``                |
+  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
+  |                                 |             | |   ``B.NE   loop``                  |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``fetch_or(loc,val,relaxed)``   | ``LSE128``  | | ``MOV      X0, X2``                |
+  |                                 |             | | ``MOV      X1, X3``                |
+  |                                 |             | | ``LDSETP   X0, X1, [X4]``          |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``fetch_or(loc,val,acq)``       | ``LSE128``  | | ``MOV      X0, X2``                |
+  |                                 |             | | ``MOV      X1, X3``                |
+  |                                 |             | | ``LDSETPA  X0, X1, [X4]``          |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``fetch_or(loc,val,rel)``       | ``LSE128``  | | ``MOV      X0, X2``                |
+  |                                 |             | | ``MOV      X1, X3``                |
+  |                                 |             | | ``LDSETPL  X0, X1, [X4]``          |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``fetch_or(loc,val,acq_rel)``   | ``LSE128``  | | ``MOV      X0, X2``                |
+  | ``fetch_or(loc,val,sc)``        |             | | ``MOV      X1, X3``                |
+  |                                 |             | | ``LDSETPAL X0, X1, [X4]``          |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``fetch_and(loc,val,relaxed)``  | ``LSE128``  | | ``MVN      X0, X2``                |
+  |                                 |             | | ``MVN      X1, X3``                |
+  |                                 |             | | ``LDCLRP   X0, X1, [X4]``          |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``fetch_and(loc,val,acq)``      | ``LSE128``  | | ``MVN      X0, X2``                |
+  |                                 |             | | ``MVN      X1, X3``                |
+  |                                 |             | | ``LDCLRPA  X0, X1, [X4]``          |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``fetch_and(loc,val,rel)``      | ``LSE128``  | | ``MVN      X0, X2``                |
+  |                                 |             | | ``MVN      X1, X3``                |
+  |                                 |             | | ``LDCLRPL  X0, X1, [X4]``          |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``fetch_and(loc,val,acq_rel)``  | ``LSE128``  | | ``MVN      X0, X2``                |
+  | ``fetch_and(loc,val,sc)``       |             | | ``MVN      X1, X3``                |
+  |                                 |             | | ``LDCLRPAL X0, X1, [X4]``          |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``compare_exchange_strong(``    | ``BASE``    | | ``loop:``                          |
+  |   ``loc,&exp,val,relaxed,``     |             | |   ``LDXP   X6, X7, [X4]``          |
+  |   ``relaxed)``                  |             | |   ``CMP    X6, X0``                |
+  |                                 |             | |   ``CCMP   X7, X1, 0, EQ``         |
+  |                                 |             | |   ``CSEL   X8, X2, X6, EQ``        |
+  |                                 |             | |   ``CSEL   X9, X3, X7, EQ``        |
+  |                                 |             | |   ``STXP   W5, X8, X9, [X4]``      |
+  |                                 |             | |   ``CBNZ   W5, loop``              |
+  |                                 |             | | ``MOV   X0, X6``                   |
+  |                                 |             | | ``MOV   X1, X7``                   |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | ``CASP    X0, X1, X2, X3, [X4]``     |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``compare_exchange_strong(``    | ``BASE``    | | ``loop:``                          |
+  |   ``loc,&exp,val,acq,acq)``     |             | |   ``LDAXP  X6, X7, [X4]``          |
+  |                                 |             | |   ``CMP    X6, X0``                |
+  |                                 |             | |   ``CCMP   X7, X1, 0, EQ``         |
+  |                                 |             | |   ``CSEL   X8, X2, X6, EQ``        |
+  |                                 |             | |   ``CSEL   X9, X3, X7, EQ``        |
+  |                                 |             | |   ``STXP   W5, X8, X9, [X4]``      |
+  |                                 |             | |   ``CBNZ   W5, loop``              |
+  |                                 |             | | ``MOV   X0, X6``                   |
+  |                                 |             | | ``MOV   X1, X7``                   |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | ``CASPA   X0, X1, X2, X3, [X4]``     |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``compare_exchange_strong(``    | ``BASE``    | | ``loop:``                          |
+  |   ``loc,&exp,val,rel,rel)``     |             | |   ``LDXP   X6, X7, [X4]``          |
+  |                                 |             | |   ``CMP    X6, X0``                |
+  |                                 |             | |   ``CCMP   X7, X1, 0, EQ``         |
+  |                                 |             | |   ``CSEL   X8, X2, X6, EQ``        |
+  |                                 |             | |   ``CSEL   X9, X3, X7, EQ``        |
+  |                                 |             | |   ``STLXP  W5, X8, X9, [X4]``      |
+  |                                 |             | |   ``CBNZ   W5, loop``              |
+  |                                 |             | | ``MOV   X0, X6``                   |
+  |                                 |             | | ``MOV   X1, X7``                   |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | ``CASPL   X0, X1, X2, X3, [X4]``     |
+  +---------------------------------+-------------+--------------------------------------+
+  | ``compare_exchange_strong(``    | ``BASE``    | | ``loop:``                          |
+  |   ``loc,&exp,val,acq_rel,acq)`` |             | |   ``LDAXP  X6, X7, [X4]``          |
+  |                                 |             | |   ``CMP    X6, X0``                |
+  | ``compare_exchange_strong(``    |             | |   ``CCMP   X7, X1, 0, EQ``         |
+  |   ``loc,&exp,val,sc,sc)``       |             | |   ``CSEL   X8, X2, X6, EQ``        |
+  |                                 |             | |   ``CSEL   X9, X3, X7, EQ``        |
+  |                                 |             | |   ``STLXP  W5, X8, X9, [X4]``      |
+  |                                 |             | |   ``CBNZ   W5, loop``              |
+  |                                 |             | | ``MOV   X0, X6``                   |
+  |                                 |             | | ``MOV   X1, X7``                   |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | ``CASPAL  X0, X1, X2, X3, [X4]``     |
+  +---------------------------------+-------------+--------------------------------------+
+
+
+We do not list other variants of ``fetch_<op>`` since their Mappings should be
+the same (modulo implementations of ``<op>`` that are not in the scope of this
+document). Specifically, implementations that use loops should use the
+instructions that load from or store to memory with the relevant memory order,
+and the appropriate ``<op>`` Assembly Sequence inside the loop. Exceptions,
+where dedicated Assembly Sequences exist, are stated (for instance ``fetch_or``
+can be implemented using ``LDSETP`` when the LSE128 extension is enabled).
+
+Special Cases
+-------------
+
+Some of the Mappings presented above have special cases. These must be handled
+in order to prevent unexpected outcomes of the compiled program.
+
+Re-Ordering of Read-Modify-Write Effects and Acquire Fence
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Consider the following Concurrent Program::
+
+  // Shared-Memory Locations
+  _Atomic int* x;
+  _Atomic int* y;
+
+  // Memory Order Parameter
+  #define relaxed memory_order_relaxed
+  #define release memory_order_release
+  #define acquire memory_order_acquire
+
+  // Threads of Execution
+  void thread_0 () {
+    atomic_store_explicit(x,1,relaxed);
+    atomic_thread_fence(release);
+    atomic_store_explicit(y,1,relaxed);
+  }
+
+  void thread_1 () {
+    atomic_exchange_explicit(y,2,release);
+    atomic_thread_fence(acquire);
+    int r0 = atomic_load_explicit(x,relaxed);
+  }
+
+
+Under ISO C, the above Concurrent Program finishes execution in one of three
+possible outcomes::
+
+  { thread_1:r0=0; y=1; }
+  { thread_1:r0=1; y=1; }
+  { thread_1:r0=1; y=2; }
+
+In this case the value read by the exchange on ``thread_1`` is not used, and a
+compiler is free to remove references to unused data. It is thus legal under
+ISO C for a compiler to translate the program into the following Assembly
+Sequences::
+
+  thread_0:
+    MOV W9,#1
+    STR W9,[X2]
+    DMB ISH
+    STR W9,[X4]
+
+  thread_1:
+    MOV W9,#2
+    SWP W9, WZR, [X2]
+    DMB ISHLD
+    LDR W3,[X4]
+
+where ``thread_0:X2`` contains the address of ``x``, ``thread_0:X4`` contains
+the address of ``y``, and
+``thread_1:X2`` contains the address of ``y``, ``thread_1:X4`` contains the
+address of ``x``.
+
+Note: the ``exchange`` Atomic Operation is compiled to a ``SWP`` Assembly
+Instruction whose destination register is the zero register ``WZR``. The
+``acquire`` fence on ``thread_1`` is compiled to the ``DMB ISHLD`` Assembly
+Instruction.
+
+Executing the compiled program on an Arm-based machine from a fixed initial
+state (where ``x`` and ``y`` are ``0``) produces one of the following outcomes,
+according to the AArch64 Memory Model contained in §B2 of the Arm Architecture
+Reference Manual [ARMARM_]::
+
+  { thread_1:r0=0; [y]=1; }
+  { thread_1:r0=0; [y]=2; } <-- Forbidden by source model, a bug!
+  { thread_1:r0=1; [y]=1; }
+  { thread_1:r0=1; [y]=2; }
+
+By comparing ``W3`` and the local variable ``r0`` of the original Concurrent
+Program we see there is one additional Outcome of executing the compiled
+program that is not an outcome of executing the Concurrent Program. This is
+because, according to the Arm Architecture Reference Manual [ARMARM_],
+*instructions where the destination register is WZR or XZR, are not regarded as
+doing a read for the purpose of a DMB LD barrier.*
+
+ISO C permits a conforming implementation to delete unused data, but in this
+case doing so introduces another Outcome of Execution. To fix this issue, a
+compiler should not rewrite the destination register to be the zero register in
+this case::
+
+  thread_0:
+    MOV W9,#1
+    STR W9,[X2]
+    DMB ISH
+    STR W9,[X4]
+
+  thread_1:
+    MOV W9,#2
+    SWP W9, W10, [X2]
+    DMB ISHLD
+    LDR W3,[X4]
+
+Executing the compiled program on an Arm-based machine from a fixed initial
+state (where ``x`` and ``y`` are ``0``) produces one of the following outcomes,
+according to the AArch64 Memory Model contained in §B2 of the Arm Architecture
+Reference Manual [ARMARM_]::
+
+  { thread_1:r0=0; [y]=1; }
+  { thread_1:r0=1; [y]=1; }
+  { thread_1:r0=1; [y]=2; }
+
+As such, the unexpected outcome has disappeared. Multiple Mappings exhibit this
+behaviour; those affected make use of the ``SWP`` and ``LD<OP>`` Assembly
+Instructions. These include, but are not limited to:
+
+.. table::
+
+  +-----------------------------------------+--------------------------------------+
+  | Atomic Operation                        | Assembly Sequence                    |
+  +=========================================+======================================+
+  | ``exchange(loc,val,sc)``                | ``MOV W4, #val;``                    |
+  |                                         | ``SWPAL W4, W10, [X1]``              |
+  +-----------------------------------------+--------------------------------------+
+  | ``fetch_add(loc,val,sc)``               | ``MOV W4, #val;``                    |
+  |                                         | ``LDADDAL W4, W10, [X1]``            |
+  +-----------------------------------------+--------------------------------------+
+
+Where ``X1`` contains the address of ``loc``.
+
+Const-Qualified 128-bit Atomic Loads
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Const-qualified data containing 128-bit atomic types should not be placed
+in read-only memory (the ``.rodata`` section).
+
+Before LSE2, the only way to implement a single-copy atomic 128-bit load is
+to use a Read-Modify-Write sequence. The write is not visible to software if
+the memory is writeable, but it faults if the memory is read-only. Compilers
+and runtimes should use the LSE2/LRCPC3 sequence when available.
+
+
+Declarative statement of Mappings compatibility
+===============================================
+
+To ensure that the above Mappings are ABI-compatible we test the compilation of
+Concurrent Programs, where each Atomic Operation is compiled to one of the
+aforementioned Mappings. We test if there is a compiled program that exhibits
+an outcome of execution according to the AArch64 Memory Model contained in §B2
+of the Arm Architecture Reference Manual [ARMARM_] that is not an outcome of
+execution of the source program under the ISO C model. In this section we
+define the process by which we test compatibility.
+
+The Mix Testing Process
+-----------------------
+
+We test for compiler bugs. A compiler bug is defined as an Outcome of a
+compiled program execution (under the AArch64 model) that is not an Outcome of
+execution of the source Concurrent Program (under the ISO C model). Consider
+the hypothetical example where a source Concurrent Program finishes execution
+in one of three possible outcomes::
+
+  { thread_0:r0=0, thread_1:r0=1 }
+  { thread_0:r0=1, thread_1:r0=0 }
+  { thread_0:r0=1, thread_1:r0=1 }
+
+and one possible compiled program has the following outcomes according to the
+AArch64 Memory Model contained in §B2 of the Arm Architecture Reference Manual
+[ARMARM_]::
+
+  { thread_0:X3=0, thread_1:X3=0 } <--- Forbidden by source model, compiler bug!
+  { thread_0:X3=0, thread_1:X3=1 }
+  { thread_0:X3=1, thread_1:X3=0 }
+  { thread_0:X3=1, thread_1:X3=1 }
+
+By comparing ``X3`` and the local variable ``r0`` of the original Concurrent
+Program in this example we see there is one additional outcome of executing the
+compiled program that is not an outcome of executing the source program (under
+the respective models). This suggests the Mappings in question are
+incompatible, and a compiler that implements them exhibits a compiler bug. To
+ensure compatibility we therefore test for the absence of such Outcomes of the
+compiled programs when mixing all combinations of the above Mappings. We define
+the *Mix Testing* process as follows:
+
+#. Take a C/C++ Concurrent Program.
+#. Split it into its constituent Atomic Operations.
+#. Compile each Atomic Operation separately using a Compiler Profile that
+   generates Assembly Sequences under a given Mapping.
+#. Combine the Assembly Sequences into *multiple* possible Compiled Programs.
+#. Compute the outcomes of executing the Source Concurrent Program under the
+   ISO C memory model. Get source program outcomes *S*.
+#. Compute the outcomes of each compiled program under the AArch64 memory model
+   [ARMARM_]. Get a *set* of compiled program outcomes *C*.
+#. If any *c* in *C* exhibits a compiler bug with respect to the outcomes *S*
+   then the given mappings are not interoperable.
+
+Using Mix Testing we now define ABI-Compatibility of Atomic Operations.
+
+
+Definition of ABI-Compatibility for Atomic Operations
+-----------------------------------------------------
+
+*A compiler that implements the above set of Mappings is ABI-Compatible with
+respect to other compilers that implement the Mappings, if Mix Testing their
+code generation finds no compiler bugs.*
+
+We impose some constraints on this definition:
+
+* This is not a correctness guarantee, but rather a statement backed up by
+  bounded testing. Atomics ABI-compatibility is thus tested for the Mappings
+  above by generating C/C++ Concurrent Programs that permute combinations of
+  Atomic Operations on each Thread of Execution. We bound our test size between
+  2 and 5 Threads of Execution, where each Thread has at least 1 Atomic
+  Operation or Synchronization Operation and at most 5 Atomic Operations or
+  Synchronization Operations. We do not make any statement about the
+  ABI-Compatibility of Concurrent Programs outside these bounds.
+* We test Concurrent Programs with a fixed initial state, loop unroll factor
+  (equal to 1 loop unroll), and function calls or recursion. 
+* The above Mappings are not exhaustive. We hope that Arm's partners will
+  submit requests for other Mappings to the ABI team using the issue tracker
+  page on GitHub.
+* This document makes no statement about the ABI-Compatibility of optimised
+  Concurrent Programs, nor does it make a statement concerning the performance
+  of compiled programs under the above Mappings when executed on a given
+  Arm-based machine.
+* This document makes no statement about the ABI-Compatibility of compilers
+  that implement Mappings other than what is stated in this document.
+
diff --git a/tools/common/check-rst-syntax.sh b/tools/common/check-rst-syntax.sh
index cd99217e..842bec51 100755
--- a/tools/common/check-rst-syntax.sh
+++ b/tools/common/check-rst-syntax.sh
@@ -38,6 +38,9 @@ declare -a docs=(
 
     # semihosting
     "semihosting"
+
+    # atomics
+    "atomicsabi64"
 )
 
 for doc in "${docs[@]}"; do
diff --git a/tools/common/generate-release-links.sh b/tools/common/generate-release-links.sh
index db008878..2774e788 100755
--- a/tools/common/generate-release-links.sh
+++ b/tools/common/generate-release-links.sh
@@ -57,6 +57,7 @@ cat <<EOF
 - $(spec aadwarf64 "DWARF for the Arm 64-bit Architecture")
 - $(spec cppabi64 "C++ ABI for the Arm 64-bit Architecture")
 - $(spec vfabia64 "Vector Function ABI for the Arm 64-bit Architecture")
+- $(spec atomicsabi64 "C/C++ Atomics Application Binary Interface Standard for the Arm 64-bit Architecture")
 
 #### PAuth ABI Extension
 - $(spec pauthabielf64 "PAuth ABI Extension to ELF for the Arm 64-bit Architecture")
diff --git a/tools/rst2pdf/generate-pdfs.sh b/tools/rst2pdf/generate-pdfs.sh
index 4878dc3d..96784e9a 100755
--- a/tools/rst2pdf/generate-pdfs.sh
+++ b/tools/rst2pdf/generate-pdfs.sh
@@ -45,6 +45,9 @@ declare -a docs=(
 
     # semihosting
     "semihosting"
+
+    # atomics
+    "atomicsabi64"
 )
 
 for doc in "${docs[@]}"; do

From 13d2d86f72ba39a7a92881bccf0980573016e0c4 Mon Sep 17 00:00:00 2001
From: lukeg101 <6547672+lukeg101@users.noreply.github.com>
Date: Thu, 2 May 2024 15:34:22 +0100
Subject: [PATCH 02/17] [atomicsabi64]: address code review comments

---
 atomicsabi64/README.md        |    2 +-
 atomicsabi64/atomicsabi64.rst | 1324 ++++++++++++++++++++-------------
 2 files changed, 789 insertions(+), 537 deletions(-)

diff --git a/atomicsabi64/README.md b/atomicsabi64/README.md
index 64136bd4..24bea6b6 100644
--- a/atomicsabi64/README.md
+++ b/atomicsabi64/README.md
@@ -2,7 +2,7 @@
    <img src="Arm_logo_blue_RGB.svg" />
 </div>
 
-# Atomics ABI for the Arm® 64-bit Architecture (AArch64)
+# C/C++ Atomics ABI for the Arm® 64-bit Architecture (AArch64)
 
 
 ## About this document
diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst
index 128067a3..f0520c33 100644
--- a/atomicsabi64/atomicsabi64.rst
+++ b/atomicsabi64/atomicsabi64.rst
@@ -13,6 +13,7 @@
 .. _AAELF64: https://github.com/ARM-software/abi-aa/releases
 .. _CPPABI64: https://github.com/ARM-software/abi-aa/releases
 .. _CSTD: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
+.. _PAPER: https://doi.org/10.1109/CGO57630.2024.10444836
 
 *********************************************************************************************
 C/C++ Atomics Application Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture
@@ -45,7 +46,7 @@ Abstract
 --------
 
 This document describes the C/C++ Atomics Application Binary Interface for the
-Arm 64-bit architecture. This document concerns the valid mappings from C/C++
+Arm 64-bit architecture. This document concerns the valid Mappings from C/C++
 Atomic Operations to sequences of A64 instructions. For matters concerning the
 memory model, please consult §B2 of the Arm Architecture Reference Manual
 [ARMARM_]. We focus only on a subset of the C11 atomic operations at the time
@@ -60,7 +61,7 @@ Atomics, Concurrency
 Latest release and defects report
 ---------------------------------
 
-Please check `Atomics Application Binary Interface for the Arm® Architecture
+Please check `C/C++ Atomics Application Binary Interface Standard for the Arm 64-bit Architecture
 <https://github.com/ARM-software/abi-aa>`_ for the latest
 release of this document.
 
@@ -230,8 +231,15 @@ This document refers to, or is referred to by, the following documents.
   +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
   | CSTD_       | ISO/IEC 9899:2018                                            | International Standard ISO/IEC 9899:2018 – Programming languages C.         |
   +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+  | AAELF64_    | ELF for the Arm 64-bit Architecture (AArch64)                | ELF for the Arm 64-bit Architecture (AArch64)                               |
+  +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+  | PAPER_      | CGO paper                                                    | Compiler Testing with Relaxed Memory Models                                 |
+  +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+
+
 
-Note: At the time of writing C23 is not released, as such ISO C17 is considered the latest published document.
+Note: At the time of writing C23 is not released, as such ISO C17 is considered
+the latest published document.
 
 .. raw:: pdf
 
@@ -240,7 +248,7 @@ Note: At the time of writing C23 is not released, as such ISO C17 is considered
 Terms and Abbreviations
 -----------------------
 
-The Atomics ABI for the Arm 64-bit Architecture uses the following terms and
+The C/C++ Atomics ABI for the Arm 64-bit Architecture uses the following terms and
 abbreviations.
 
 A64
@@ -264,12 +272,6 @@ ABI
 Arm-based
    ... based on the Arm architecture ...
 
-Concurrent Program
-   A C or C++ program that consists of one or more Threads of Execution. Each
-   Thread of Execution must communicate with other threads in the Concurrent
-   Program through Shared-Memory Locations, using Atomic Operations to be
-   deemed *concurrent*.
-
 Thread of Execution
    A unit of computation that executes one or more Atomic Operations,
    Synchronization Operations or other C language statements. The Arm
@@ -285,18 +287,26 @@ Atomic Operation
    operations and the ``atomic`` type. Operations on atomic-qualified data are
    guaranteed not to be interrupted by another Thread of Execution.
 
+Concurrent Program
+   A C or C++ program that consists of one or more Threads of Execution. Each
+   Thread of Execution must communicate with other threads in the Concurrent
+   Program through Shared-Memory Locations, using both Atomic Operations and
+   Non-Atomic Operations (Operations that lack the atomic qualifier) to be
+   deemed *concurrent*. This document focuses on compiling such programs for
+   Arm-based machines that run the A64 instruction set.
+
 Synchronization Operation
    The order that atomic operations are executed by each Thread of Execution
    may not be the same as the order they are written in the program.
    Synchronization Operations are statements that constrain the order of
    accesses made to Shared-Memory Locations by each thread. Synchronization
-   Operations include Thread Fences, and certain control flow structures.
+   Operations include Thread Fences.
 
 Shared-Memory Location
    A memory location that can be accessed by any Thread of Execution in the
    program.
 
-Memory Order Parameters
+Memory Order Parameter
    Describes a constraint on an Atomic Operation or Synchronization Operation.
    Memory Order describes how memory accesses made by Atomic Operations may be
    ordered with respect to other Atomic Operations and Synchronization
@@ -311,23 +321,16 @@ Thread Fence
    ``atomic_thread_fence`` to synchronize the order of accesses made by atomic
    operations on ``_Atomic`` qualified data.
 
-Atomic Instruction
-   An A64 instruction that may have Memory Order semantics. For instance an A64
-   LDR instruction has no atomicity, but the LDAR instruction has *acquire*
-   semantics. (see [ARMARM_]). 
-
 Assembly Sequence
-   A sequence of Atomic Instructions.
+   A sequence of A64 instructions, optionally including Atomic Instructions.
 
 Mapping
-   A pair of Atomic Operation and Assembly Sequence. A compiler generates the
-   Assembly Sequence, given an Atomic Operation and Compiler Profile as input.
+   A Mapping takes an Atomic Operation and Compiler Profile as input, 
+   producing an Assembly Sequence as output.
 
 Compiler Profile
-   A combination of a compiler and command-line flags that implements a set of
-   Mappings from Atomic Operations to A64 Assembly Sequences. When the compiler
-   is provided with a Concurrent Program and Compiler Profile, it generates an
-   Assembly Sequence.
+   A Compiler implementation and command-line flags or attributes that use
+   Mappings.
 
 More specific terminology is defined when it is first used.
 
@@ -342,8 +345,8 @@ The C/C++ Atomics ABI for the Arm 64-bit architecture (AABI64) comprises the
 following sub-components.
 
 * The `Mappings from Atomic Operations to Assembly Sequences`_, which defines
-  the mappings from C/C++ atomic operations to sequences of A64 assembly that
-  are interoperable with respect to each other.
+  the Mappings from C/C++ atomic operations to one or more Assembly
+  Sequences that are interoperable with respect to each other.
 
 * A `Declarative statement of Mappings compatibility`_, as far as
   non-exhaustive testing can validate, that the aforementioned Mappings can be
@@ -355,7 +358,7 @@ Mappings from Atomic Operations to Assembly Sequences
 =====================================================
 
 We now describe the compatible Mappings for C/C++ Atomic Operations and
-Assembly Sequences. Since there is a large number of ways these mappings may be
+Assembly Sequences. Since there is a large number of ways these Mappings may be
 combined, we break down the tables by the width of the access, and list
 compatible Assembly Sequences for each Atomic Operation.
 
@@ -363,7 +366,7 @@ This is an open ABI, we encourage improvements to this specification to be
 submitted to the `issue tracker page on
 GitHub <https://github.com/ARM-software/abi-aa/issues>`_.
 
-These mappings are not exhaustive, but aim to cover the atomics we have tested.
+These Mappings are not exhaustive, but aim to cover the atomics we have tested.
 Please request more atomics using the issue tracker.
 
 Notational Conventions
@@ -409,6 +412,8 @@ options.
   +--------------------------------------------+-----------+--------------------------------------+
 
 Where ARCH is for example BASE (armv8), LSE, LSE2, LSE128, RCPC, or LRCPC3.
+ARCH describes the required extension, with BASE meaning Armv8-A with no
+extensions and LSE being shorthand for FEAT_LSE (likewise for the other extensions).
 
 Lastly, all operations are in a shorthand form:
 
@@ -443,6 +448,12 @@ Mappings for 32-bit types
 In what follows, register ``X1`` contains the location ``loc`` and ``W2``
 contains ``val``. The result is returned in ``W0``.
 
+  +-------------------------------------------------------------------------------------------+
+  | Note                                                                                      |
+  +===========================================================================================+
+  | ``*`` Using ``WZR`` or ``XZR`` for the destination register is invalid (Section 4.7).     |
+  +-------------------------------------------------------------------------------------------+
+
 .. table::
 
   +------------------------------------------+--------------------------------------+
@@ -450,8 +461,8 @@ contains ``val``. The result is returned in ``W0``.
   +==========================================+======================================+
   | ``store(loc,val,relaxed)``               | ``STR   W2, [X1]``                   |
   +------------------------------------------+--------------------------------------+
-  +| ``store(loc,val,rel)``                  + ``STLR  W2, [X1]``                   +
-  +| ``store(loc,val,sc)``                   +                                      +
+  | ``store(loc,val,rel)``                   | ``STLR  W2, [X1]``                   |
+  | ``store(loc,val,sc)``                    |                                      |
   +------------------------------------------+--------------------------------------+
   | ``load(loc,relaxed)``                    | ``LDR   W2, [X1]``                   |
   +-------------------------------+----------+--------------------------------------+
@@ -465,128 +476,168 @@ contains ``val``. The result is returned in ``W0``.
   +------------------------------------------+--------------------------------------+
   | ``fence(acq)``                           | ``DMB ISHLD``                        |
   +------------------------------------------+--------------------------------------+
-  | | ``fence(rel)``                         | ``DMB ISH``                          |
-  | | ``fence(acq_rel)``                     |                                      |
-  | | ``fence(sc)``                          |                                      |
+  | ``fence(rel)``                           | ``DMB ISH``                          |
+  | ``fence(acq_rel)``                       |                                      |
+  | ``fence(sc)``                            |                                      |
   +-------------------------------+----------+--------------------------------------+
-  | ``exchange(loc,val,relaxed)`` | ``BASE`` | | ``loop:``                          |
-  |                               +          + |   ``LDXR   W0, [X1]``              +
-  +                               |          | |   ``STXR   W3, W2, [X1]``          |
-  |                               +          + |   ``CBNZ   W3, loop``              +
-  +                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``SWP    W2, W0, [X1]``              | 
+  | ``exchange(loc,val,relaxed)`` | ``BASE`` | ``loop:``                            |
+  |                               |          |   ``LDXR   W0, [X1]``                |
+  |                               |          |                                      |
+  |                               |          |   ``STXR   W3, W2, [X1]``            |
+  |                               |          |                                      |
+  |                               |          |   ``CBNZ   W3, loop``                |
+  |                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``SWP    W2, W0, [X1]`` *            | 
   +-------------------------------+----------+--------------------------------------+
-  | ``exchange(loc,val,acq)``     | ``BASE`` | | ``loop:``                          |
-  |                               |          | |   ``LDAXR  W0, [X1]``              |
-  +                               +          + |   ``STXR   W3, W2, [X1]``          +
-  |                               |          | |   ``CBNZ   W3, loop``              |
-  +                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``SWPA   W2, W0, [X1]``              |  
+  | ``exchange(loc,val,acq)``     | ``BASE`` | ``loop:``                            |
+  |                               |          |   ``LDAXR  W0, [X1]``                |
+  |                               |          |                                      |
+  |                               |          |   ``STXR   W3, W2, [X1]``            |
+  |                               |          |                                      |
+  |                               |          |   ``CBNZ   W3, loop``                |
+  |                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``SWPA   W2, W0, [X1]`` *            |
   +-------------------------------+----------+--------------------------------------+
-  | ``exchange(loc,val,rel)``     | ``BASE`` | | ``loop:``                          |
-  |                               |          | |   ``LDXR   W0, [X1]``              |
-  +                               +          + |   ``STLXR  W3, W2, [X1]``          +
-  |                               |          | |   ``CBNZ   W3, loop``              |
-  +                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``SWPL   W2, W0, [X1]``              | 
+  | ``exchange(loc,val,rel)``     | ``BASE`` | ``loop:``                            |
+  |                               |          |   ``LDXR   W0, [X1]``                |
+  |                               |          |                                      |
+  |                               |          |   ``STLXR  W3, W2, [X1]``            |
+  |                               |          |                                      |
+  |                               |          |   ``CBNZ   W3, loop``                |
+  |                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``SWPL   W2, W0, [X1]`` *            |
   +-------------------------------+----------+--------------------------------------+
-  | ``exchange(loc,val,acq_rel)`` | ``BASE`` | | ``loop:``                          |
-  | ``exchange(loc,val,sc)``      |          | |   ``LDAXR  W0, [X1]``              |
-  +                               +          + |   ``STLXR  W3, W2, [X1]``          +
-  |                               |          | |   ``CBNZ   W3, loop``              |
-  +                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``SWPAL  W2, W0, [X1]``              | 
+  | ``exchange(loc,val,acq_rel)`` | ``BASE`` | ``loop:``                            |
+  | ``exchange(loc,val,sc)``      |          |   ``LDAXR  W0, [X1]``                |
+  |                               |          |                                      |
+  |                               |          |   ``STLXR  W3, W2, [X1]``            |
+  |                               |          |                                      |
+  |                               |          |   ``CBNZ   W3, loop``                |
+  |                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``SWPAL  W2, W0, [X1]`` *            |
   +-------------------------------+----------+--------------------------------------+
-  | ``fetch_add(loc,val,relaxed)``| ``BASE`` | | ``loop:``                          |
-  |                               +          + |   ``LDXR   W0, [X1]``              +
-  |                               +          + |   ``ADD    W2, W2, W0``            +
-  +                               |          | |   ``STXR   W3, W2, [X1]``          |
-  |                               +          + |   ``CBNZ   W3, loop``              +
+  | ``fetch_add(loc,val,relaxed)``| ``BASE`` | ``loop:``                            |
+  |                               |          |   ``LDXR   W0, [X1]``                |
+  |                               |          |                                      |
+  |                               |          |   ``ADD    W2, W2, W0``              |
+  |                               |          |                                      |
+  |                               |          |   ``STXR   W3, W2, [X1]``            |
+  |                               |          |                                      |
+  |                               |          |   ``CBNZ   W3, loop``                |
   +                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``LDADD    W2, W0, [X1]``            |
+  |                               | ``LSE``  | ``LDADD    W2, W0, [X1]`` *          |
   +-------------------------------+----------+--------------------------------------+
-  | ``fetch_add(loc,val,acq)``    | ``BASE`` | | ``loop:``                          |
-  |                               +          + |   ``LDAXR  W0, [X1]``              +
-  |                               +          + |   ``ADD    W2, W2, W0``            +
-  +                               |          | |   ``STXR   W3, W2, [X1]``          |
-  |                               +          + |   ``CBNZ   W3, loop``              +
-  +                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``LDADDA   W2, W0, [X1]``            | 
+  | ``fetch_add(loc,val,acq)``    | ``BASE`` | ``loop:``                            |
+  |                               |          |   ``LDAXR  W0, [X1]``                |
+  |                               |          |                                      |
+  |                               |          |   ``ADD    W2, W2, W0``              |
+  |                               |          |                                      |
+  |                               |          |   ``STXR   W3, W2, [X1]``            |
+  |                               |          |                                      |
+  |                               |          |   ``CBNZ   W3, loop``                |
+  |                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``LDADDA   W2, W0, [X1]`` *          |
   +-------------------------------+----------+--------------------------------------+
-  | ``fetch_add(loc,val,rel)``    | ``BASE`` | | ``loop:``                          |
-  |                               +          + |   ``LDXR   W0, [X1]``              +
-  |                               +          + |   ``ADD    W2, W2, W0``            +
-  +                               |          | |   ``STLXR  W3, W2, [X1]``          |
-  |                               +          + |   ``CBNZ   W3, loop``              +
-  +                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``LDADDL   W2, W0, [X1]``            |
+  | ``fetch_add(loc,val,rel)``    | ``BASE`` | ``loop:``                            |
+  |                               |          |   ``LDXR   W0, [X1]``                |
+  |                               |          |                                      |
+  |                               |          |   ``ADD    W2, W2, W0``              |
+  |                               |          |                                      |
+  |                               |          |   ``STLXR  W3, W2, [X1]``            |
+  |                               |          |                                      |
+  |                               |          |   ``CBNZ   W3, loop``                |
+  |                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``LDADDL   W2, W0, [X1]`` *          |
   +-------------------------------+----------+--------------------------------------+
-  | ``fetch_add(loc,val,acq_rel)``| ``BASE`` | | ``loop:``                          |
-  | ``fetch_add(loc,val,sc)``     +          + |   ``LDXAR  W0, [X1]``              +
-  |                               +          + |   ``ADD    W2, W2, W0``            +
-  +                               |          | |   ``STLXR  W3, W2, [X1]``          |
-  |                               +          + |   ``CBNZ   W3, loop``              +
-  +                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``LDADDAL  W2, W0, [X1]``            |
+  | ``fetch_add(loc,val,acq_rel)``| ``BASE`` | ``loop:``                            |
+  | ``fetch_add(loc,val,sc)``     |          |   ``LDAXR  W0, [X1]``                |
+  |                               |          |                                      |
+  |                               |          |   ``ADD    W2, W2, W0``              |
+  |                               |          |                                      |
+  |                               |          |   ``STLXR  W3, W2, [X1]``            |
+  |                               |          |                                      |
+  |                               |          |   ``CBNZ   W3, loop``                |
+  |                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``LDADDAL  W2, W0, [X1]`` *          |
   +-------------------------------+----------+--------------------------------------+
-  | ``compare_exchange_strong(``  | ``BASE`` | | ``loop:``                          |
-  |   ``loc,&exp,val,relaxed,``   +          + |   ``LDXR   W0, [X1]``              +
-  |   ``relaxed)``                +          + |   ``CMP    W0, W4``                +
-  |                               +          + |   ``B.NE    fail``                 +
-  +                               |          | |   ``STXR   W3, W2, [X1]``          |
-  |                               +          + |   ``CBNZ   W3, loop``              +
-  +                               +          + | ``fail:``                          +
-  +                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``CAS    W0, W2, [X1]``              |
+  | ``compare_exchange_strong(``  | ``BASE`` | ``loop:``                            |
+  |   ``loc,&exp,val,relaxed,``   |          |   ``LDXR   W0, [X1]``                |
+  |   ``relaxed)``                |          |                                      |
+  |                               |          |   ``CMP    W0, W4``                  |
+  |                               |          |                                      |
+  |                               |          |   ``B.NE    fail``                   |
+  |                               |          |                                      |
+  |                               |          |   ``STXR   W3, W2, [X1]``            |
+  |                               |          |                                      |
+  |                               |          |   ``CBNZ   W3, loop``                |
+  |                               |          |                                      |
+  |                               |          | ``fail:``                            |
+  |                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``CAS    W0, W2, [X1]`` *            |
   +-------------------------------+----------+--------------------------------------+
-  | ``compare_exchange_strong(``  | ``BASE`` | | ``loop:``                          |
-  |   ``loc,&exp,val,acq,acq)``   +          + |   ``LDAXR  W0, [X1]``              +
-  |                               +          + |   ``CMP    W0, W4``                +
-  |                               +          + |   ``B.NE    fail``                 +
-  +                               |          | |   ``STXR   W3, W2, [X1]``          |
-  |                               +          + |   ``CBNZ   W3, loop``              +
-  +                               +          + | ``fail:``                          +
-  +                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``CASA   W0, W2, [X1]``              |
+  | ``compare_exchange_strong(``  | ``BASE`` | ``loop:``                            |
+  |   ``loc,&exp,val,acq,acq)``   |          |   ``LDAXR  W0, [X1]``                |
+  |                               |          |                                      |
+  |                               |          |   ``CMP    W0, W4``                  |
+  |                               |          |                                      |
+  |                               |          |   ``B.NE    fail``                   |
+  |                               |          |                                      |
+  |                               |          |   ``STXR   W3, W2, [X1]``            |
+  |                               |          |                                      |
+  |                               |          |   ``CBNZ   W3, loop``                |
+  |                               |          |                                      |
+  |                               |          | ``fail:``                            |
+  |                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``CASA   W0, W2, [X1]`` *            |
   +-------------------------------+----------+--------------------------------------+
-  | ``compare_exchange_strong(``  | ``BASE`` | | ``loop:``                          |
-  |   ``loc,&exp,val,rel,rel)``   +          + |   ``LDXR   W0, [X1]``              +
-  |                               +          + |   ``CMP    W0, W4``                +
-  |                               +          + |   ``B.NE    fail``                 +
-  +                               |          | |   ``STLXR  W3, W2, [X1]``          |
-  |                               +          + |   ``CBNZ   W3, loop``              +
-  +                               +          + | ``fail:``                          +
-  +                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``CASL   W0, W2, [X1]``              |
+  | ``compare_exchange_strong(``  | ``BASE`` | ``loop:``                            |
+  |   ``loc,&exp,val,rel,rel)``   |          |   ``LDXR   W0, [X1]``                |
+  |                               |          |                                      |
+  |                               |          |   ``CMP    W0, W4``                  |
+  |                               |          |                                      |
+  |                               |          |   ``B.NE    fail``                   |
+  |                               |          |                                      |
+  |                               |          |   ``STLXR  W3, W2, [X1]``            |
+  |                               |          |                                      |
+  |                               |          |   ``CBNZ   W3, loop``                |
+  |                               |          |                                      |
+  |                               |          | ``fail:``                            |
+  |                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``CASL   W0, W2, [X1]`` *            |
   +-------------------------------+----------+--------------------------------------+
-  | ``compare_exchange_strong(``  | ``BASE`` | | ``loop:``                          |
-  |  ``loc,&exp,val,acq_rel,acq)``+          + |   ``LDAXR  W0, [X1]``              +
-  |                               +          + |   ``CMP    W0, W4``                +
-  | ``compare_exchange_strong(``  +          + |   ``B.NE    fail``                 +
-  +   ``loc,&exp,val,sc,sc)``     |          | |   ``STLXR  W3, W2, [X1]``          |
-  |                               +          + |   ``CBNZ   W3, loop``              +
-  +                               +          + | ``fail:``                          +
-  +                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``CASAL  W0, W2, [X1]``              |
+  | ``compare_exchange_strong(``  | ``BASE`` | ``loop:``                            |
+  |  ``loc,&exp,val,acq_rel,acq)``|          |   ``LDAXR  W0, [X1]``                |
+  | ``compare_exchange_strong(``  |          |                                      |
+  |   ``loc,&exp,val,sc,sc)``     |          |   ``CMP    W0, W4``                  |
+  |                               |          |                                      |
+  |                               |          |   ``B.NE    fail``                   |
+  |                               |          |                                      |
+  |                               |          |   ``STLXR  W3, W2, [X1]``            |
+  |                               |          |                                      |
+  |                               |          |   ``CBNZ   W3, loop``                |
+  |                               |          |                                      |
+  |                               |          | ``fail:``                            |
+  |                               +----------+--------------------------------------+
+  |                               | ``LSE``  | ``CASAL  W0, W2, [X1]`` *            |
   +-------------------------------+----------+--------------------------------------+
 
 Mappings for 8-bit types
 ------------------------
 
-The mappings for 8-bit types are the same as 32-bit types except they use the
+The mappings for 8-bit types are the same as 32-bit types except they use the
 ``B`` variants of instructions. 
 
 
 Mappings for 16-bit types
 -------------------------
 
-The mappings for 16-bit types are the same as 32-bit types except they use the
+The mappings for 16-bit types are the same as 32-bit types except they use the
 ``H`` variants of instructions.
 
 Mappings for 64-bit types
 -------------------------
 
-The mappings for 64-bit types are the same as 32-bit types except the registers
+The mappings for 64-bit types are the same as 32-bit types except the registers
 used are X-registers.
 
 Mappings for 128-bit types
@@ -604,329 +655,500 @@ In what follows, register ``X4`` contains the location ``loc``, ``X2`` and
   +-----------------------------------------------+--------------------------------------+
   | Atomic Operation                              | Assembly Sequence                    |
   +=================================+=============+======================================+
-  | ``store(loc,val,relaxed)``      | ``BASE``    | | ``loop:``                          |
-  |                                 |             | |   ``LDXP   XZR, X1, [X4]``         |
-  |                                 |             | |   ``STXP   W5, X2, X3, [X4]``      |
-  |                                 |             | |   ``CBNZ   W5, loop``              |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
-  |                                 |             | | ``loop:``                          |
-  |                                 |             | |   ``MOV    X6, X0``                |
-  |                                 |             | |   ``MOV    X7, X1``                |
-  |                                 |             | |   ``CASP   X0, X1, X2, X3, [X4]``  |
-  |                                 |             | |   ``CMP    X0, X6``                |
-  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
-  |                                 |             | |   ``B.NE   loop``                  |
-  +                                 +-------------+--------------------------------------+
+  | ``store(loc,val,relaxed)``      | ``BASE``    | ``loop:``                            |
+  |                                 |             |   ``LDXP   XZR, X1, [X4]``           |
+  |                                 |             |                                      |
+  |                                 |             |   ``STXP   W5, X2, X3, [X4]``        |
+  |                                 |             |                                      |
+  |                                 |             |   ``CBNZ   W5, loop``                |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
+  |                                 |             |                                      |
+  |                                 |             | ``loop:``                            |
+  |                                 |             |   ``MOV    X6, X0``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``MOV    X7, X1``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CASP   X0, X1, X2, X3, [X4]``    |
+  |                                 |             |                                      |
+  |                                 |             |   ``CMP    X0, X6``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
+  |                                 |             |                                      |
+  |                                 |             |   ``B.NE   loop``                    |
+  |                                 +-------------+--------------------------------------+
   |                                 | ``LSE2``    | ``STP   x2, X3, [X4]``               |
   +---------------------------------+-------------+--------------------------------------+
-  | ``store(loc,val,rel)``          | ``BASE``    | | ``loop:``                          |
-  |                                 |             | |   ``LDXP    XZR, X1, [X4]``        |
-  |                                 |             | |   ``STLXP   W5, X2, X3, [X4]``     |
-  |                                 |             | |   ``CBNZ    W5, loop``             |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
-  |                                 |             | | ``loop:``                          |
-  |                                 |             | |   ``MOV    X6, X0``                |
-  |                                 |             | |   ``MOV    X7, X1``                |
-  |                                 |             | |   ``CASPL  X0, X1, X2, X3, [X4]``  |
-  |                                 |             | |   ``CMP    X0, X6``                |
-  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
-  |                                 |             | |   ``B.NE   loop``                  |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE2``    | | ``DMB   ISH``                      |
-  |                                 |             | | ``STP   X2, X3, [X4]``             |
-  +                                 +-------------+--------------------------------------+
+  | ``store(loc,val,rel)``          | ``BASE``    | ``loop:``                            |
+  |                                 |             |   ``LDXP    XZR, X1, [X4]``          |
+  |                                 |             |   ``STLXP   W5, X2, X3, [X4]``       |
+  |                                 |             |   ``CBNZ    W5, loop``               |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
+  |                                 |             |                                      |
+  |                                 |             | ``loop:``                            |
+  |                                 |             |   ``MOV    X6, X0``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``MOV    X7, X1``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CASPL  X0, X1, X2, X3, [X4]``    |
+  |                                 |             |                                      |
+  |                                 |             |   ``CMP    X0, X6``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
+  |                                 |             |                                      |
+  |                                 |             |   ``B.NE   loop``                    |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE2``    | ``DMB   ISH``                        |
+  |                                 |             |                                      |
+  |                                 |             | ``STP   X2, X3, [X4]``               |
+  |                                 +-------------+--------------------------------------+
   |                                 | ``LRCPC3``  | ``STILP   X2, X3, [X4]``             |
   +---------------------------------+-------------+--------------------------------------+
-  | ``store(loc,val,sc)``           | ``BASE``    | | ``loop:``                          |
-  |                                 |             | |   ``LDXP    XZR, X1, [X4]``        |
-  |                                 |             | |   ``STLXP   W5, X2, X3, [X4]``     |
-  |                                 |             | |   ``CBNZ    W5, loop``             |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
-  |                                 |             | | ``loop:``                          |
-  |                                 |             | |   ``MOV    X6, X0``                |
-  |                                 |             | |   ``MOV    X7, X1``                |
-  |                                 |             | |   ``CASPL  X0, X1, X2, X3, [X4]``  |
-  |                                 |             | |   ``CMP    X0, X6``                |
-  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
-  |                                 |             | |   ``B.NE   loop``                  |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE2``    | | ``DMB   ISH``                      |
-  |                                 |             | | ``STP   X2, X3, [X4]``             |
-  |                                 |             | | ``DMB   ISH``                      |
-  +                                 +-------------+--------------------------------------+
+  | ``store(loc,val,sc)``           | ``BASE``    | ``loop:``                            |
+  |                                 |             |   ``LDXP    XZR, X1, [X4]``          |
+  |                                 |             |                                      |
+  |                                 |             |   ``STLXP   W5, X2, X3, [X4]``       |
+  |                                 |             |                                      |
+  |                                 |             |   ``CBNZ    W5, loop``               |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
+  |                                 |             |                                      |
+  |                                 |             | ``loop:``                            |
+  |                                 |             |   ``MOV    X6, X0``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``MOV    X7, X1``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CASPL  X0, X1, X2, X3, [X4]``    |
+  |                                 |             |                                      |
+  |                                 |             |   ``CMP    X0, X6``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
+  |                                 |             |                                      |
+  |                                 |             |   ``B.NE   loop``                    |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE2``    | ``DMB   ISH``                        |
+  |                                 |             |                                      |
+  |                                 |             | ``STP   X2, X3, [X4]``               |
+  |                                 |             |                                      |
+  |                                 |             | ``DMB   ISH``                        |
+  |                                 +-------------+--------------------------------------+
   |                                 | ``LRCPC3``  | ``STILP   X2, X3, [X4]``             |
   +---------------------------------+-------------+--------------------------------------+
-  | ``load(loc,relaxed)``           | ``BASE``    | | ``loop:``                          |
-  |                                 |             | |   ``LDXP   X0, X1, [X4]``          |
-  |                                 |             | |   ``STXP   W5, X0, X1, [X4]``      |
-  |                                 |             | |   ``CBNZ   W5, loop``              |
-  +                                 +-------------+--------------------------------------+
+  | ``load(loc,relaxed)``           | ``BASE``    | ``loop:``                            |
+  |                                 |             |   ``LDXP   X0, X1, [X4]``            |
+  |                                 |             |                                      |
+  |                                 |             |   ``STXP   W5, X0, X1, [X4]``        |
+  |                                 |             |                                      |
+  |                                 |             |   ``CBNZ   W5, loop``                |
+  |                                 +-------------+--------------------------------------+
   |                                 | ``LSE``     | ``CASP   X0, X1, X0, X1, [X4]``      |
-  +                                 +-------------+--------------------------------------+
+  |                                 +-------------+--------------------------------------+
   |                                 | ``LSE2``    | ``LDP   X0, X1, [X4]``               |
   +---------------------------------+-------------+--------------------------------------+
-  | ``load(loc,acq)``               | ``BASE``    | | ``loop:``                          |
-  |                                 |             | |   ``LDAXP  X0, X1, [X4]``          |
-  |                                 |             | |   ``STXP   W5, X0, X1, [X4]``      |
-  |                                 |             | |   ``CBNZ   W5, loop``              |
-  +                                 +-------------+--------------------------------------+
+  | ``load(loc,acq)``               | ``BASE``    | ``loop:``                            |
+  |                                 |             |   ``LDAXP  X0, X1, [X4]``            |
+  |                                 |             |                                      |
+  |                                 |             |   ``STXP   W5, X0, X1, [X4]``        |
+  |                                 |             |                                      |
+  |                                 |             |   ``CBNZ   W5, loop``                |
+  |                                 +-------------+--------------------------------------+
   |                                 | ``LSE``     | ``CASPA  X0, X1, X0, X1, [X4]``      |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE2``    | | ``LDP   X0, X1, [X4]``             |
-  |                                 |             | | ``DMB   ISHLD``                    |
-  +                                 +-------------+--------------------------------------+
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE2``    | ``LDP   X0, X1, [X4]``               |
+  |                                 |             |                                      |
+  |                                 |             | ``DMB   ISHLD``                      |
+  |                                 +-------------+--------------------------------------+
   |                                 | ``LRCPC3``  | ``LDIAPP   X0, X1, [X4]``            |
   +---------------------------------+-------------+--------------------------------------+
-  | ``load(loc,sc)``                | ``BASE``    | | ``loop:``                          |
-  |                                 |             | |   ``LDAXP   X0, X1, [X4]``         |
-  |                                 |             | |   ``STXP    W5, X0, X1, [X4]``     |
-  |                                 |             | |   ``CBNZ    W5, loop``             |
-  +                                 +-------------+--------------------------------------+
+  | ``load(loc,sc)``                | ``BASE``    | ``loop:``                            |
+  |                                 |             |   ``LDAXP   X0, X1, [X4]``           |
+  |                                 |             |                                      |
+  |                                 |             |   ``STXP    W5, X0, X1, [X4]``       |
+  |                                 |             |                                      |
+  |                                 |             |   ``CBNZ    W5, loop``               |
+  |                                 +-------------+--------------------------------------+
   |                                 | ``LSE``     | ``CASPA  X0, X1, X0, X1, [X4]``      |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE2``    | | ``LDAR  X5, [X4]``                 |
-  |                                 |             | | ``LDP   X0, X1, [X4]``             |
-  |                                 |             | | ``DMB   ISHLD``                    |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LRCPC3``  | | ``LDAR   X5, [X4]``                |
-  |                                 |             | | ``LDIAPP X0, X1, [X4]``            |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE2``    | ``LDAR  X5, [X4]``                   |
+  |                                 |             |                                      |
+  |                                 |             | ``LDP   X0, X1, [X4]``               |
+  |                                 |             |                                      |
+  |                                 |             | ``DMB   ISHLD``                      |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LRCPC3``  | ``LDAR   X5, [X4]``                  |
+  |                                 |             |                                      |
+  |                                 |             | ``LDIAPP X0, X1, [X4]``              |
   +---------------------------------+-------------+--------------------------------------+
-  | ``exchange(loc,val,relaxed)``   | ``BASE``    | | ``loop:``                          |
-  |                                 |             | |   ``LDXP   X0, X1, [X4]``          |
-  |                                 |             | |   ``STXP   W5, X2, X3, [X4]``      |
-  |                                 |             | |   ``CBNZ   W5, loop``              |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
-  |                                 |             | | ``loop:``                          |
-  |                                 |             | |   ``MOV    X6, X0``                |
-  |                                 |             | |   ``MOV    X7, X1``                |
-  |                                 |             | |   ``CASP   X0, X1, X2, X3, [X4]``  |
-  |                                 |             | |   ``CMP    X0, X6``                |
-  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
-  |                                 |             | |   ``B.NE   loop``                  |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE128``  | | ``MOV    X0, X2``                  |
-  |                                 |             | | ``MOV    X1, X3``                  |
-  |                                 |             | | ``SWPP   X0, X1, [X4]``            |
+  | ``exchange(loc,val,relaxed)``   | ``BASE``    | ``loop:``                            |
+  |                                 |             |   ``LDXP   X0, X1, [X4]``            |
+  |                                 |             |                                      |
+  |                                 |             |   ``STXP   W5, X2, X3, [X4]``        |
+  |                                 |             |                                      |
+  |                                 |             |   ``CBNZ   W5, loop``                |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
+  |                                 |             |                                      |
+  |                                 |             | ``loop:``                            |
+  |                                 |             |   ``MOV    X6, X0``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``MOV    X7, X1``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CASP   X0, X1, X2, X3, [X4]``    |
+  |                                 |             |                                      |
+  |                                 |             |   ``CMP    X0, X6``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
+  |                                 |             |                                      |
+  |                                 |             |   ``B.NE   loop``                    |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE128``  | ``MOV    X0, X2``                    |
+  |                                 |             |                                      |
+  |                                 |             | ``MOV    X1, X3``                    |
+  |                                 |             |                                      |
+  |                                 |             | ``SWPP   X0, X1, [X4]``              |
   +---------------------------------+-------------+--------------------------------------+
-  | ``exchange(loc,val,acq)``       | ``BASE``    | | ``loop:``                          |
-  |                                 |             | |   ``LDAXP  X0, X1, [X4]``          |
-  |                                 |             | |   ``STXP   W5, X2, X3, [X4]``      |
-  |                                 |             | |   ``CBNZ   W5, loop``              |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
-  |                                 |             | | ``loop:``                          |
-  |                                 |             | |   ``MOV    X6, X0``                |
-  |                                 |             | |   ``MOV    X7, X1``                |
-  |                                 |             | |   ``CASPA  X0, X1, X2, X3, [X4]``  |
-  |                                 |             | |   ``CMP    X0, X6``                |
-  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
-  |                                 |             | |   ``B.NE   loop``                  |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE128``  | | ``MOV    X0, X2``                  |
-  |                                 |             | | ``MOV    X1, X3``                  |
-  |                                 |             | | ``SWPPA  X0, X1, [X4]``            |
+  | ``exchange(loc,val,acq)``       | ``BASE``    | ``loop:``                            |
+  |                                 |             |   ``LDAXP  X0, X1, [X4]``            |
+  |                                 |             |                                      |
+  |                                 |             |   ``STXP   W5, X2, X3, [X4]``        |
+  |                                 |             |                                      |
+  |                                 |             |   ``CBNZ   W5, loop``                |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
+  |                                 |             |                                      |
+  |                                 |             | ``loop:``                            |
+  |                                 |             |   ``MOV    X6, X0``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``MOV    X7, X1``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CASPA  X0, X1, X2, X3, [X4]``    |
+  |                                 |             |                                      |
+  |                                 |             |   ``CMP    X0, X6``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
+  |                                 |             |                                      |
+  |                                 |             |   ``B.NE   loop``                    |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE128``  | ``MOV    X0, X2``                    |
+  |                                 |             |                                      |
+  |                                 |             | ``MOV    X1, X3``                    |
+  |                                 |             |                                      |
+  |                                 |             | ``SWPPA  X0, X1, [X4]``              |
   +---------------------------------+-------------+--------------------------------------+
-  | ``exchange(loc,val,rel)``       | ``BASE``    | | ``loop:``                          |
-  |                                 |             | |   ``LDXP   X0, X1, [X4]``          |
-  |                                 |             | |   ``STLXP  W5, X2, X3, [X4]``      |
-  |                                 |             | |   ``CBNZ   W5, loop``              |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
-  |                                 |             | | ``loop:``                          |
-  |                                 |             | |   ``MOV    X6, X0``                |
-  |                                 |             | |   ``MOV    X7, X1``                |
-  |                                 |             | |   ``CASPL  X0, X1, X2, X3, [X4]``  |
-  |                                 |             | |   ``CMP    X0, X6``                |
-  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
-  |                                 |             | |   ``B.NE   loop``                  |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE128``  | | ``MOV    X0, X2``                  |
-  |                                 |             | | ``MOV    X1, X3``                  |
-  |                                 |             | | ``SWPPL  X0, X1, [X4]``            |
+  | ``exchange(loc,val,rel)``       | ``BASE``    | ``loop:``                            |
+  |                                 |             |   ``LDXP   X0, X1, [X4]``            |
+  |                                 |             |                                      |
+  |                                 |             |   ``STLXP  W5, X2, X3, [X4]``        |
+  |                                 |             |                                      |
+  |                                 |             |   ``CBNZ   W5, loop``                |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
+  |                                 |             |                                      |
+  |                                 |             | ``loop:``                            |
+  |                                 |             |   ``MOV    X6, X0``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``MOV    X7, X1``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CASPL  X0, X1, X2, X3, [X4]``    |
+  |                                 |             |                                      |
+  |                                 |             |   ``CMP    X0, X6``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
+  |                                 |             |                                      |
+  |                                 |             |   ``B.NE   loop``                    |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE128``  | ``MOV    X0, X2``                    |
+  |                                 |             |                                      |
+  |                                 |             | ``MOV    X1, X3``                    |
+  |                                 |             |                                      |
+  |                                 |             | ``SWPPL  X0, X1, [X4]``              |
   +---------------------------------+-------------+--------------------------------------+
-  | ``exchange(loc,val,acq_rel)``   | ``BASE``    | | ``loop:``                          |
-  | ``exchange(loc,val,sc)``        |             | |   ``LDAXP  X0, X1, [X4]``          |
-  |                                 |             | |   ``STLXP  W5, X2, X3, [X4]``      |
-  |                                 |             | |   ``CBNZ   W5, loop``              |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
-  |                                 |             | | ``loop:``                          |
-  |                                 |             | |   ``MOV    X6, X0``                |
-  |                                 |             | |   ``MOV    X7, X1``                |
-  |                                 |             | |   ``CASPAL X0, X1, X2, X3, [X4]``  |
-  |                                 |             | |   ``CMP    X0, X6``                |
-  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
-  |                                 |             | |   ``B.NE   loop``                  |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE128``  | | ``MOV    X0, X2``                  |
-  |                                 |             | | ``MOV    X1, X3``                  |
-  |                                 |             | | ``SWPPAL X0, X1, [X4]``            |
+  | ``exchange(loc,val,acq_rel)``   | ``BASE``    | ``loop:``                            |
+  | ``exchange(loc,val,sc)``        |             |   ``LDAXP  X0, X1, [X4]``            |
+  |                                 |             |                                      |
+  |                                 |             |   ``STLXP  W5, X2, X3, [X4]``        |
+  |                                 |             |                                      |
+  |                                 |             |   ``CBNZ   W5, loop``                |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
+  |                                 |             |                                      |
+  |                                 |             | ``loop:``                            |
+  |                                 |             |   ``MOV    X6, X0``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``MOV    X7, X1``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CASPAL X0, X1, X2, X3, [X4]``    |
+  |                                 |             |                                      |
+  |                                 |             |   ``CMP    X0, X6``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
+  |                                 |             |                                      |
+  |                                 |             |   ``B.NE   loop``                    |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE128``  | ``MOV    X0, X2``                    |
+  |                                 |             |                                      |
+  |                                 |             | ``MOV    X1, X3``                    |
+  |                                 |             |                                      |
+  |                                 |             | ``SWPPAL X0, X1, [X4]``              |
   +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_add(loc,val,relaxed)``  | ``BASE``    | | ``loop:``                          |
-  |                                 |             | |   ``LDXP   X0, X1, [X4]``          |
-  |                                 |             | |   ``ADDS   X0, X0, X2``            |
-  |                                 |             | |   ``ADC    X1, X1, X3``            |
-  |                                 |             | |   ``STXP   W5, X2, X3, [X4]``      |
-  |                                 |             | |   ``CBNZ   W5, loop``              |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
-  |                                 |             | | ``loop:``                          |
-  |                                 |             | |   ``MOV    X6, X0``                |
-  |                                 |             | |   ``MOV    X7, X1``                |
-  |                                 |             | |   ``ADDS   X8, X0, X2``            |
-  |                                 |             | |   ``ADC    X9, X1, X3``            |
-  |                                 |             | |   ``CASP   X0, X1, X8, X9, [X4]``  |
-  |                                 |             | |   ``CMP    X0, X6``                |
-  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
-  |                                 |             | |   ``B.NE   loop``                  |
+  | ``fetch_add(loc,val,relaxed)``  | ``BASE``    | ``loop:``                            |
+  |                                 |             |   ``LDXP   X0, X1, [X4]``            |
+  |                                 |             |                                      |
+  |                                 |             |   ``ADDS   X6, X0, X2``              |
+  |                                 |             |                                      |
+  |                                 |             |   ``ADC    X7, X1, X3``              |
+  |                                 |             |                                      |
+  |                                 |             |   ``STXP   W5, X6, X7, [X4]``        |
+  |                                 |             |                                      |
+  |                                 |             |   ``CBNZ   W5, loop``                |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
+  |                                 |             |                                      |
+  |                                 |             | ``loop:``                            |
+  |                                 |             |   ``MOV    X6, X0``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``MOV    X7, X1``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``ADDS   X8, X0, X2``              |
+  |                                 |             |                                      |
+  |                                 |             |   ``ADC    X9, X1, X3``              |
+  |                                 |             |                                      |
+  |                                 |             |   ``CASP   X0, X1, X8, X9, [X4]``    |
+  |                                 |             |                                      |
+  |                                 |             |   ``CMP    X0, X6``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
+  |                                 |             |                                      |
+  |                                 |             |   ``B.NE   loop``                    |
   +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_add(loc,val,acq)``      | ``BASE``    | | ``loop:``                          |
-  |                                 |             | |   ``LDAXP  X0, X1, [X4]``          |
-  |                                 |             | |   ``ADDS   X0, X0, X2``            |
-  |                                 |             | |   ``ADC    X1, X1, X3``            |
-  |                                 |             | |   ``STXP   W5, X2, X3, [X4]``      |
-  |                                 |             | |   ``CBNZ   W5, loop``              |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
-  |                                 |             | | ``loop:``                          |
-  |                                 |             | |   ``MOV    X6, X0``                |
-  |                                 |             | |   ``MOV    X7, X1``                |
-  |                                 |             | |   ``ADDS   X8, X0, X2``            |
-  |                                 |             | |   ``ADC    X9, X1, X3``            |
-  |                                 |             | |   ``CASPA  X0, X1, X8, X9, [X4]``  |
-  |                                 |             | |   ``CMP    X0, X6``                |
-  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
-  |                                 |             | |   ``B.NE   loop``                  |
+  | ``fetch_add(loc,val,acq)``      | ``BASE``    | ``loop:``                            |
+  |                                 |             |   ``LDAXP  X0, X1, [X4]``            |
+  |                                 |             |                                      |
+  |                                 |             |   ``ADDS   X6, X0, X2``              |
+  |                                 |             |                                      |
+  |                                 |             |   ``ADC    X7, X1, X3``              |
+  |                                 |             |                                      |
+  |                                 |             |   ``STXP   W5, X6, X7, [X4]``        |
+  |                                 |             |                                      |
+  |                                 |             |   ``CBNZ   W5, loop``                |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
+  |                                 |             |                                      |
+  |                                 |             | ``loop:``                            |
+  |                                 |             |   ``MOV    X6, X0``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``MOV    X7, X1``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``ADDS   X8, X0, X2``              |
+  |                                 |             |                                      |
+  |                                 |             |   ``ADC    X9, X1, X3``              |
+  |                                 |             |                                      |
+  |                                 |             |   ``CASPA  X0, X1, X8, X9, [X4]``    |
+  |                                 |             |                                      |
+  |                                 |             |   ``CMP    X0, X6``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
+  |                                 |             |                                      |
+  |                                 |             |   ``B.NE   loop``                    |
   +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_add(loc,val,rel)``      | ``BASE``    | | ``loop:``                          |
-  |                                 |             | |   ``LDXP   X0, X1, [X4]``          |
-  |                                 |             | |   ``ADDS   X0, X0, X2``            |
-  |                                 |             | |   ``ADC    X1, X1, X3``            |
-  |                                 |             | |   ``STLXP  W5, X2, X3, [X4]``      |
-  |                                 |             | |   ``CBNZ   W5, loop``              |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
-  |                                 |             | | ``loop:``                          |
-  |                                 |             | |   ``MOV    X6, X0``                |
-  |                                 |             | |   ``MOV    X7, X1``                |
-  |                                 |             | |   ``ADDS   X8, X0, X2``            |
-  |                                 |             | |   ``ADC    X9, X1, X3``            |
-  |                                 |             | |   ``CASPL  X0, X1, X8, X9, [X4]``  |
-  |                                 |             | |   ``CMP    X0, X6``                |
-  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
-  |                                 |             | |   ``B.NE   loop``                  |
+  | ``fetch_add(loc,val,rel)``      | ``BASE``    | ``loop:``                            |
+  |                                 |             |   ``LDXP   X0, X1, [X4]``            |
+  |                                 |             |                                      |
+  |                                 |             |   ``ADDS   X0, X0, X2``              |
+  |                                 |             |                                      |
+  |                                 |             |   ``ADC    X1, X1, X3``              |
+  |                                 |             |                                      |
+  |                                 |             |   ``STLXP  W5, X2, X3, [X4]``        |
+  |                                 |             |                                      |
+  |                                 |             |   ``CBNZ   W5, loop``                |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
+  |                                 |             |                                      |
+  |                                 |             | ``loop:``                            |
+  |                                 |             |   ``MOV    X6, X0``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``MOV    X7, X1``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``ADDS   X8, X0, X2``              |
+  |                                 |             |                                      |
+  |                                 |             |   ``ADC    X9, X1, X3``              |
+  |                                 |             |                                      |
+  |                                 |             |   ``CASPL  X0, X1, X8, X9, [X4]``    |
+  |                                 |             |                                      |
+  |                                 |             |   ``CMP    X0, X6``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
+  |                                 |             |                                      |
+  |                                 |             |   ``B.NE   loop``                    |
   +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_add(loc,val,acq_rel)``  | ``BASE``    | | ``loop:``                          |
-  | ``fetch_add(loc,val,sc)``       |             | |   ``LDAXP  X0, X1, [X4]``          |
-  |                                 |             | |   ``ADDS   X0, X0, X2``            |
-  |                                 |             | |   ``ADC    X1, X1, X3``            |
-  |                                 |             | |   ``STXLP  W5, X2, X3, [X4]``      |
-  |                                 |             | |   ``CBNZ   W5, loop``              |
-  +                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | | ``LDP   X0, X1, [X4]``             |
-  |                                 |             | | ``loop:``                          |
-  |                                 |             | |   ``MOV    X6, X0``                |
-  |                                 |             | |   ``MOV    X7, X1``                |
-  |                                 |             | |   ``ADDS   X8, X0, X2``            |
-  |                                 |             | |   ``ADC    X9, X1, X3``            |
-  |                                 |             | |   ``CASPAL X0, X1, X8, X9, [X4]``  |
-  |                                 |             | |   ``CMP    X0, X6``                |
-  |                                 |             | |   ``CCMP   X1, X7, 0, EQ``         |
-  |                                 |             | |   ``B.NE   loop``                  |
+  | ``fetch_add(loc,val,acq_rel)``  | ``BASE``    | ``loop:``                            |
+  | ``fetch_add(loc,val,sc)``       |             |   ``LDAXP  X0, X1, [X4]``            |
+  |                                 |             |                                      |
+  |                                 |             |   ``ADDS   X0, X0, X2``              |
+  |                                 |             |                                      |
+  |                                 |             |   ``ADC    X1, X1, X3``              |
+  |                                 |             |                                      |
+  |                                 |             |   ``STLXP  W5, X2, X3, [X4]``        |
+  |                                 |             |                                      |
+  |                                 |             |   ``CBNZ   W5, loop``                |
+  |                                 +-------------+--------------------------------------+
+  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
+  |                                 |             |                                      |
+  |                                 |             | ``loop:``                            |
+  |                                 |             |   ``MOV    X6, X0``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``MOV    X7, X1``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``ADDS   X8, X0, X2``              |
+  |                                 |             |                                      |
+  |                                 |             |   ``ADC    X9, X1, X3``              |
+  |                                 |             |                                      |
+  |                                 |             |   ``CASPAL X0, X1, X8, X9, [X4]``    |
+  |                                 |             |                                      |
+  |                                 |             |   ``CMP    X0, X6``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
+  |                                 |             |                                      |
+  |                                 |             |   ``B.NE   loop``                    |
   +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_or(loc,val,relaxed)``   | ``LSE128``  | | ``MOV      X0, X2``                |
-  |                                 |             | | ``MOV      X1, X3``                |
-  |                                 |             | | ``LDSETP   X0, X1, [X4]``          |
+  | ``fetch_or(loc,val,relaxed)``   | ``LSE128``  | ``MOV      X0, X2``                  |
+  |                                 |             |                                      |
+  |                                 |             | ``MOV      X1, X3``                  |
+  |                                 |             |                                      |
+  |                                 |             | ``LDSETP   X0, X1, [X4]``            |
   +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_or(loc,val,acq)``       | ``LSE128``  | | ``MOV      X0, X2``                |
-  |                                 |             | | ``MOV      X1, X3``                |
-  |                                 |             | | ``LDSETPA  X0, X1, [X4]``          |
+  | ``fetch_or(loc,val,acq)``       | ``LSE128``  | ``MOV      X0, X2``                  |
+  |                                 |             |                                      |
+  |                                 |             | ``MOV      X1, X3``                  |
+  |                                 |             |                                      |
+  |                                 |             | ``LDSETPA  X0, X1, [X4]``            |
   +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_or(loc,val,rel)``       | ``LSE128``  | | ``MOV      X0, X2``                |
-  |                                 |             | | ``MOV      X1, X3``                |
-  |                                 |             | | ``LDSETPL  X0, X1, [X4]``          |
+  | ``fetch_or(loc,val,rel)``       | ``LSE128``  | ``MOV      X0, X2``                  |
+  |                                 |             |                                      |
+  |                                 |             | ``MOV      X1, X3``                  |
+  |                                 |             |                                      |
+  |                                 |             | ``LDSETPL  X0, X1, [X4]``            |
   +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_or(loc,val,acq_rel)``   | ``LSE128``  | | ``MOV      X0, X2``                |
-  | ``fetch_or(loc,val,sc)``        |             | | ``MOV      X1, X3``                |
-  |                                 |             | | ``LDSETPAL X0, X1, [X4]``          |
+  | ``fetch_or(loc,val,acq_rel)``   | ``LSE128``  | ``MOV      X0, X2``                  |
+  | ``fetch_or(loc,val,sc)``        |             |                                      |
+  |                                 |             | ``MOV      X1, X3``                  |
+  |                                 |             |                                      |
+  |                                 |             | ``LDSETPAL X0, X1, [X4]``            |
   +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_and(loc,val,relaxed)``  | ``LSE128``  | | ``MVN      X0, X2``                |
-  |                                 |             | | ``MVN      X1, X3``                |
-  |                                 |             | | ``LDCLRP   X0, X1, [X4]``          |
+  | ``fetch_and(loc,val,relaxed)``  | ``LSE128``  | ``MVN      X0, X2``                  |
+  |                                 |             |                                      |
+  |                                 |             | ``MVN      X1, X3``                  |
+  |                                 |             |                                      |
+  |                                 |             | ``LDCLRP   X0, X1, [X4]``            |
   +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_and(loc,val,acq)``      | ``LSE128``  | | ``MVN      X0, X2``                |
-  |                                 |             | | ``MVN      X1, X3``                |
-  |                                 |             | | ``LDCLRPA  X0, X1, [X4]``          |
+  | ``fetch_and(loc,val,acq)``      | ``LSE128``  | ``MVN      X0, X2``                  |
+  |                                 |             |                                      |
+  |                                 |             | ``MVN      X1, X3``                  |
+  |                                 |             |                                      |
+  |                                 |             | ``LDCLRPA  X0, X1, [X4]``            |
   +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_and(loc,val,rel)``      | ``LSE128``  | | ``MVN      X0, X2``                |
-  |                                 |             | | ``MVN      X1, X3``                |
-  |                                 |             | | ``LDCLRPL  X0, X1, [X4]``          |
+  | ``fetch_and(loc,val,rel)``      | ``LSE128``  | ``MVN      X0, X2``                  |
+  |                                 |             |                                      |
+  |                                 |             | ``MVN      X1, X3``                  |
+  |                                 |             |                                      |
+  |                                 |             | ``LDCLRPL  X0, X1, [X4]``            |
   +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_and(loc,val,acq_rel)``  | ``LSE128``  | | ``MVN      X0, X2``                |
-  | ``fetch_and(loc,val,sc)``       |             | | ``MVN      X1, X3``                |
-  |                                 |             | | ``LDCLRPAL X0, X1, [X4]``          |
+  | ``fetch_and(loc,val,acq_rel)``  | ``LSE128``  | ``MVN      X0, X2``                  |
+  | ``fetch_and(loc,val,sc)``       |             |                                      |
+  |                                 |             | ``MVN      X1, X3``                  |
+  |                                 |             |                                      |
+  |                                 |             | ``LDCLRPAL X0, X1, [X4]``            |
   +---------------------------------+-------------+--------------------------------------+
-  | ``compare_exchange_strong(``    | ``BASE``    | | ``loop:``                          |
-  |   ``loc,&exp,val,relaxed,``     +             + |   ``LDXP   X6, x7, [X4]``          +
-  |   ``relaxed)``                  +             + |   ``CMP    X6, X0``                +
-  +                                 |             | |   ``CCMP   X7, X1, 0, EQ``         |
-  |                                 +             + |   ``CSEL   X8, X2, X6, EQ``        +
-  |                                 +             + |   ``CSEL   X9, X3, X7, EQ``        +
-  |                                 +             + |   ``STXP   W5, X8, X9, [X4]``      +
-  |                                 +             + |   ``CBNZ   W5, loop``              +
-  +                                 +             + | ``MOV   X0, X6``                   +
-  +                                 +             + | ``MOV   X1, X7``                   +
-  +                                 +-------------+--------------------------------------+
+  | ``compare_exchange_strong(``    | ``BASE``    | ``loop:``                            |
+  |   ``loc,&exp,val,relaxed,``     |             |   ``LDXP   X6, X7, [X4]``            |
+  |   ``relaxed)``                  |             |                                      |
+  |                                 |             |   ``CMP    X6, X0``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CCMP   X7, X1, 0, EQ``           |
+  |                                 |             |                                      |
+  |                                 |             |   ``CSEL   X8, X2, X6, EQ``          |
+  |                                 |             |                                      |
+  |                                 |             |   ``CSEL   X9, X3, X7, EQ``          |
+  |                                 |             |                                      |
+  |                                 |             |   ``STXP   W5, X8, X9, [X4]``        |
+  |                                 |             |                                      |
+  |                                 |             |   ``CBNZ   W5, loop``                |
+  |                                 |             |                                      |
+  |                                 |             | ``MOV   X0, X6``                     |
+  |                                 |             |                                      |
+  |                                 |             | ``MOV   X1, X7``                     |
+  |                                 +-------------+--------------------------------------+
   |                                 | ``LSE``     | ``CASP    X0, X1, X2, X3, [X4]``     |
   +---------------------------------+-------------+--------------------------------------+
-  | ``compare_exchange_strong(``    | ``BASE``    | | ``loop:``                          |
-  |   ``loc,&exp,val,acq, acq)``    +             + |   ``LDAXP  X6, x7, [X4]``          +
-  |                                 +             + |   ``CMP    X6, X0``                +
-  +                                 |             | |   ``CCMP   X7, X1, 0, EQ``         |
-  |                                 +             + |   ``CSEL   X8, X2, X6, EQ``        +
-  |                                 +             + |   ``CSEL   X9, X3, X7, EQ``        +
-  |                                 +             + |   ``STXP   W5, X8, X9, [X4]``      +
-  |                                 +             + |   ``CBNZ   W5, loop``              +
-  +                                 +             + | ``MOV   X0, X6``                   +
-  +                                 +             + | ``MOV   X1, X7``                   +
-  +                                 +-------------+--------------------------------------+
+  | ``compare_exchange_strong(``    | ``BASE``    | ``loop:``                            |
+  |   ``loc,&exp,val,acq,acq)``     |             |   ``LDAXP  X6, X7, [X4]``            |
+  |                                 |             |                                      |
+  |                                 |             |   ``CMP    X6, X0``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CCMP   X7, X1, 0, EQ``           |
+  |                                 |             |                                      |
+  |                                 |             |   ``CSEL   X8, X2, X6, EQ``          |
+  |                                 |             |                                      |
+  |                                 |             |   ``CSEL   X9, X3, X7, EQ``          |
+  |                                 |             |                                      |
+  |                                 |             |   ``STXP   W5, X8, X9, [X4]``        |
+  |                                 |             |                                      |
+  |                                 |             |   ``CBNZ   W5, loop``                |
+  |                                 |             |                                      |
+  |                                 |             | ``MOV   X0, X6``                     |
+  |                                 |             |                                      |
+  |                                 |             | ``MOV   X1, X7``                     |
+  |                                 +-------------+--------------------------------------+
   |                                 | ``LSE``     | ``CASPA   X0, X1, X2, X3, [X4]``     |
   +---------------------------------+-------------+--------------------------------------+
-  | ``compare_exchange_strong(``    | ``BASE``    | | ``loop:``                          |
-  |   ``loc,&exp,val,rel,rel)``     +             + |   ``LDXP   X6, x7, [X4]``          +
-  |                                 +             + |   ``CMP    X6, X0``                +
-  +                                 |             | |   ``CCMP   X7, X1, 0, EQ``         |
-  |                                 +             + |   ``CSEL   X8, X2, X6, EQ``        +
-  |                                 +             + |   ``CSEL   X9, X3, X7, EQ``        +
-  |                                 +             + |   ``STLXP  W5, X8, X9, [X4]``      +
-  |                                 +             + |   ``CBNZ   W5, loop``              +
-  +                                 +             + | ``MOV   X0, X6``                   +
-  +                                 +             + | ``MOV   X1, X7``                   +
-  +                                 +-------------+--------------------------------------+
+  | ``compare_exchange_strong(``    | ``BASE``    | ``loop:``                            |
+  |   ``loc,&exp,val,rel,rel)``     |             |   ``LDXP   X6, X7, [X4]``            |
+  |                                 |             |                                      |
+  |                                 |             |   ``CMP    X6, X0``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CCMP   X7, X1, 0, EQ``           |
+  |                                 |             |                                      |
+  |                                 |             |   ``CSEL   X8, X2, X6, EQ``          |
+  |                                 |             |                                      |
+  |                                 |             |   ``CSEL   X9, X3, X7, EQ``          |
+  |                                 |             |                                      |
+  |                                 |             |   ``STLXP  W5, X8, X9, [X4]``        |
+  |                                 |             |                                      |
+  |                                 |             |   ``CBNZ   W5, loop``                |
+  |                                 |             |                                      |
+  |                                 |             | ``MOV   X0, X6``                     |
+  |                                 |             |                                      |
+  |                                 |             | ``MOV   X1, X7``                     |
+  |                                 +-------------+--------------------------------------+
   |                                 | ``LSE``     | ``CASPL   X0, X1, X2, X3, [X4]``     |
   +---------------------------------+-------------+--------------------------------------+
-  | ``compare_exchange_strong(``    | ``BASE``    | | ``loop:``                          |
-  |   ``loc,&exp,val,acq_rel,acq)`` +             + |   ``LDAXP  X6, x7, [X4]``          +
-  |                                 +             + |   ``CMP    X6, X0``                +
-  + ``compare_exchange_strong(``    |             | |   ``CCMP   X7, X1, 0, EQ``         |
-  |   ``loc,&exp,val,sc,sc)``       +             + |   ``CSEL   X8, X2, X6, EQ``        +
-  |                                 +             + |   ``CSEL   X9, X3, X7, EQ``        +
-  |                                 +             + |   ``STLXP  W5, X8, X9, [X4]``      +
-  |                                 +             + |   ``CBNZ   W5, loop``              +
-  +                                 +             + | ``MOV   X0, X6``                   +
-  +                                 +             + | ``MOV   X1, X7``                   +
-  +                                 +-------------+--------------------------------------+
+  | ``compare_exchange_strong(``    | ``BASE``    | ``loop:``                            |
+  |   ``loc,&exp,val,acq_rel,acq)`` |             |   ``LDAXP  X6, X7, [X4]``            |
+  | ``compare_exchange_strong(``    |             |                                      |
+  |   ``loc,&exp,val,sc,sc)``       |             |   ``CMP    X6, X0``                  |
+  |                                 |             |                                      |
+  |                                 |             |   ``CCMP   X7, X1, 0, EQ``           |
+  |                                 |             |                                      |
+  |                                 |             |   ``CSEL   X8, X2, X6, EQ``          |
+  |                                 |             |                                      |
+  |                                 |             |   ``CSEL   X9, X3, X7, EQ``          |
+  |                                 |             |                                      |
+  |                                 |             |   ``STLXP  W5, X8, X9, [X4]``        |
+  |                                 |             |                                      |
+  |                                 |             |   ``CBNZ   W5, loop``                |
+  |                                 |             |                                      |
+  |                                 |             | ``MOV   X0, X6``                     |
+  |                                 |             |                                      |
+  |                                 |             | ``MOV   X1, X7``                     |
+  |                                 +-------------+--------------------------------------+
   |                                 | ``LSE``     | ``CASPAL  X0, X1, X2, X3, [X4]``     |
   +---------------------------------+-------------+--------------------------------------+
 
 
-We do not list other variants of ``fetch_<op>`` since their mappings should be
+We do not list other variants of ``fetch_<op>`` since their Mappings should be
 the same (modulo implementations of <op> that are not in scope of this
-document). Precisely implementations that use loops should use the instructions
+document). Precisely, implementations that use loops should use the instructions
 that load or store from memory with the relevant memory order, and the
 appropriate <op> Assembly Sequence inside the loop. Exceptions, where Assembly 
 Sequences exist, are stated (for instance ``fetch_or`` can be implemented using
@@ -936,12 +1158,156 @@ Special Cases
 -------------
 
 There are special cases in the Mappings presented above, these must be handled
-in order to prevent unexpected outcomes of the compiled program.
+in order to prevent unexpected outcomes of the compiled program. The special 
+cases are identified below.
+
+* Destination Register Should Not Be Zero Register for Read-Modify-Writes
+* Const-Qualified 128-bit Atomic Loads Should Be Marked Mutable
+
+Destination Register Should Not Be Zero Register for Read-Modify-Writes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A compiler is not permitted to rewrite the destination register to be the
+zero register for atomic operations that make use of ``SWP`` and ``LD<OP>``
+Assembly instructions. These include but are not limited to:
+
+.. table::
+
+  +-----------------------------------------+--------------------------------------+
+  | Atomic Operation                        | Assembly Sequence                    |
+  +=========================================+======================================+
+  | ``exchange(loc,val,sc)``                | ``MOV W4, #val;``                    |
+  |                                         | ``SWP W4, W10, [X1]``                |
+  +-----------------------------------------+--------------------------------------+
+  | ``fetch_add(loc,val,sc)``               | ``MOV W4, #val;``                    |
+  |                                         | ``LDADD W4, W10, [X1]``              |
+  +-----------------------------------------+--------------------------------------+
+
+Where ``X1`` contains the address of ``loc``.
+
+We annotate Mappings affected with ``*`` in section 4.2.
+
+Please refer to
+`Appendix: Read-Modify-Write Destination Register Semantics`_ for an explanation
+of why this restriction is needed.
+
+Const-Qualified 128-bit Atomic Loads Should Be Marked Mutable
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Const-qualified data containing 128-bit atomic types should not be placed
+in read-only memory (such as the ``.rodata`` section).
+
+Before LSE2, the only way to implement a single-copy 128-bit atomic load
+is by using a Read-Modify-Write sequence. The write is not visible to
+software if the memory is writeable, but it can fault if the memory is read-only.
+Compilers and runtimes should use the LSE2/LRCPC3 sequence when available.
+
+
+Declarative statement of Mappings compatibility
+===============================================
+
+To ensure that the above Mappings are ABI-compatible we test the compilation of
+Concurrent Programs, where each Atomic Operation is compiled to one of the
+aforementioned Mappings. We test whether there is a compiled program that exhibits
+an outcome of execution according to the AArch64 Memory Model contained in §B2
+of the Arm Architecture Reference Manual [ARMARM_] that is not an outcome of
+execution of the source program under the ISO C model. In this section we
+define the process by which we test compatibility. Please refer to
+`Appendix: Mix Testing`_ for information on how ABI-compatibility is tested.
+
+
+Definition of ABI-Compatibility for Atomic Operations
+-----------------------------------------------------
+
+*A compiler that implements the above set of Mappings is ABI-Compatible with
+respect to other compilers that implement the Mappings, if Mix Testing their
+code generation finds no Compiler Bugs.*
+
+We impose some constraints on this definition:
+
+* This is not a correctness guarantee, but rather a statement backed up by
+  bounded testing. C/C++ Atomics ABI-compatibility is thus tested for the Mappings
+  above by generating C/C++ Concurrent Programs that permute combinations of
+  Atomic Operations on each Thread of Execution. We bound our test size between
+  2 and 5 Threads of Execution, where each Thread has at least 1 Atomic
+  Operation or Synchronization Operation and at most 5 Atomic Operations or
+  Synchronization Operations. We do not make any statement about the
+  ABI-Compatibility of Concurrent Programs outside these bounds.
+* We test Concurrent Programs with a fixed initial state, loop unroll factor
+  (equal to 1 loop unroll), and function calls or recursion. 
+* The above Mappings are not exhaustive; we recommend that Arm's partners
+  submit requests for other Mappings to the ABI team using the `issue tracker page on GitHub <https://github.com/ARM-software/abi-aa/issues>`_.
+* This document makes no statement about the ABI-Compatibility of optimised
+  Concurrent Programs, nor does it make a statement concerning the performance of
+  compiled programs under the above Mappings when executed on a given Arm-based
+  machine.
+* This document makes no statement about the ABI-Compatibility of compilers
+  that implement Mappings other than what is stated in this document.
+
+Appendix: Mix Testing
+=====================
+
+The status of this appendix is informative.
+
+
+The Mix Testing Process
+-----------------------
+
+We test for Compiler Bugs. A Compiler Bug is defined as an outcome of a
+compiled program execution (under the AArch64 Memory Model contained in
+§B2 of the Arm Architecture Reference Manual [ARMARM_]) that is not 
+an outcome of execution of the source Concurrent Program (under the 
+ISO C memory model). Consider the hypothetical example where a source
+Concurrent Program finishes execution in one of three possible outcomes
+(a reference for this notation is found here [PAPER_])::
+
+  { thread_0:r0=0, thread_1:r0=1 }
+  { thread_0:r0=1, thread_1:r0=0 }
+  { thread_0:r0=1, thread_1:r0=1 }
+
+and a possible compiled program has the following outcomes according to the
+AArch64 Memory Model contained in §B2 of the Arm Architecture Reference Manual
+[ARMARM_]::
+
+  { thread_0:X3=0, thread_1:X3=0 } <--- Forbidden by source model, Compiler Bug!
+  { thread_0:X3=0, thread_1:X3=1 }
+  { thread_0:X3=1, thread_1:X3=0 }
+  { thread_0:X3=1, thread_1:X3=1 }
+
+By comparing ``X3`` and the local variable ``r0`` of the original Concurrent
+Program in this example we see there is one additional outcome of executing the
+compiled program that is not an outcome of executing the source program (under
+the respective models). This suggests the Mappings under question are
+incompatible, and a compiler that implements them exhibits a Compiler Bug. To
+ensure compatibility we therefore test for the absence of such outcomes of the
+compiled programs when mixing all combinations of the above Mappings. We define
+the *Mix Testing* process as follows:
+
+#. Take an arbitrary Concurrent Program, when executed on the C/C++ memory
+   model will produce outcomes *S*.
+#. Split out the individual Atomic Operations from the initial concurrent
+   program into individual source files.
+#. Compile each individual source file containing an Atomic Operation 
+   using each Compiler Profile under test that generates Assembly Sequences
+   under a given Mapping.
+#. Combine the Assembly Sequences from above into *multiple* possible Compiled
+   Programs.
+#. Compute the outcomes of each compiled program under the AArch64 Memory Model
+   contained in §B2 of the Arm Architecture Reference Manual [ARMARM_]. Get a
+   *set* of compiled program outcomes *C*.
+#. If any compiled program set of outcomes *c* in *C* exhibits a Compiler Bug
+   (Check that *c* is a subset of *S*) with then the given Mappings are not
+   interoperable. 
+
+
+Appendix: Read-Modify-Write Destination Register Semantics
+==========================================================
 
-Re-Ordering of Read-Modify-Write Effects and Acquire Fence
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+We elaborate on why in the following example.
 
-Consider the following Concurrent Program::
+Consider the following Concurrent Program:
+
+code-block::
 
   // Shared-Memory Locations
   _Atomic int* x;
@@ -967,16 +1333,16 @@ Consider the following Concurrent Program::
 
 
 Under ISO C, the above Concurrent Program finishes execution in one of three
-possible outcomes::
+possible outcomes (a reference for this notation is found here [PAPER_])::
 
   { thread_1:r0=0; y=1; }
   { thread_1:r0=1; y=1; }
   { thread_1:r0=1; y=2; }
 
 In this case the value read by the exchange on ``thread_1`` is not used, and a
-compiler is free to remove references to unused data. It is thus legal under
-ISO C for a compiler to translate the program into the following Assembly
-Sequences::
+compiler is free to remove references to unused data. It is not legal according
+to this ABI for a compliant implementation piler to translate the program into
+the following Assembly Sequences::
 
   thread_0:
     MOV W9,#1
@@ -991,11 +1357,10 @@ Sequences::
     LDR W3,[X4]
 
 where ``thread_0:X2`` contains the address of ``x``, ``thread_0:X4`` contains
-the address of ``y``, and
-``thread_1:X2`` contains the address of ``y``, ``thread_1:X4`` contains the
-address of ``x``.
+the address of ``y``, and ``thread_1:X2`` contains the address of ``y``,
+``thread_1:X4`` contains the address of ``x``.
 
-Note: the ``exchange`` Atomic Operation is compiled to a ``SWP`` Assembly
+The ``exchange`` Atomic Operation is compiled to a ``SWP`` Assembly
 Instruction, where its destination register is the zero register ``WZR``. The 
 ``acquire`` fence on ``thread_1`` is compiled to the ``DMB ISHLD`` Assembly 
 Instruction.
@@ -1011,16 +1376,15 @@ Reference Manual [ARMARM_]::
   { thread_1:r0=1; [y]=2; }
 
 By comparing ``W3`` and the local variable ``r0`` of the original Concurrent
-Program we see there is one additional Outcome of executing the compiled
+Program we see there is one additional outcome of executing the compiled
 program that is not an outcome of executing the Concurrent Program. This is due
 to the fact that according to the Arm Architecture Reference Manual [ARMARM_] 
 *instructions where the destination register is WZR or XZR, are not regarded as
 doing a read for the purpose of a DMB LD barrier.*
 
-ISO C permits a conforming implementation to delete unused data, but in this
-case it introduces another Outcome of Execution. To fix this issue, a compiler
-should not rewrite the destination register to be the zero register in this
-case::
+In this case the compiler introduces another outcome of Execution. To fix this
+issue, a compiler is not permitted to rewrite the destination register to be the
+zero register in this case::
 
   thread_0:
     MOV W9,#1
@@ -1044,118 +1408,6 @@ Reference Manual [ARMARM_]::
   { thread_1:r0=1; [y]=2; }
 
 As such the unexpected outcome has disappeared. There are multiple Mappings
-that exhibit this behaviour, those effected make use of ``SWP`` and ``LD<OP>``
-Assembly instructions. These include but are not limited to:
-
-.. table::
-
-  +-----------------------------------------+--------------------------------------+
-  | Atomic Operation                        | Assembly Sequence                    |
-  +=========================================+======================================+
-  | ``exchange(loc,val,sc)``                | ``MOV W4, #val;``                    |
-  |                                         | ``SWP W4, W10, [X1]``                |
-  +-----------------------------------------+--------------------------------------+
-  | ``fetch_add(loc,val,sc)``               | ``MOV W4, #val;``                    |
-  |                                         | ``LDADD W4, W10, [X1]``              |
-  +-----------------------------------------+--------------------------------------+
-
-Where ``X1`` contains the address of ``loc``.
-
-Const-Qualified 128-bit Atomic Loads
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Const-qualified data containing 128-bit atomic types should not be placed
-in readonly memory (the ``.rodata`` section).
-
-Before LSE2, the only way to implement a single-copy 128-bit atomic load
-is by using a Read-Modify-Write sequence. The write is not visible to
-software if the memory is writeable. Compilers and runtimes should use the
-LSE2/LRCPC3 sequence when available.
-
-
-Declarative statement of Mappings compatibility
-===============================================
-
-To ensure that the above Mappings are ABI-compatible we test the compilation of
-Concurrent Programs, where each Atomic Operation is compiled to one of the
-aforementioned Mappings. We test if there is a compiled program that exhibits
-an outcome of execution according to the AArch64 Memory Model contained in §B2
-of the Arm Architecture Reference Manual [ARMARM_] that is not an outcome of
-execution of the source program under the ISO C model. In this section we
-define the process by which we test compatibility.
-
-The Mix Testing Process
------------------------
-
-We test for Compiler bugs, a compiler bug is defined as an Outcome of a
-compiled program execution (under the AArch64 model) that is not an Outcome of
-execution of the source Concurrent Program (under the ISO C model). Consider
-the hypothetical example where a source Concurrent Program finishes execution
-in one of three possible outcomes::
-
-  { thread_0:r0=0, thread_1:r0=1 }
-  { thread_0:r0=1, thread_1:r0=0 }
-  { thread_0:r0=1, thread_1:r0=1 }
-
-and one possible compiled program outcome has the following according to the
-AArch64 Memory Model contained in §B2 of the Arm Architecture Reference Manual
-[ARMARM_]::
-
-  { thread_0:X3=0, thread_1:X3=0 } <--- Forbidden by source model, compiler bug!
-  { thread_0:X3=0, thread_1:X3=1 }
-  { thread_0:X3=1, thread_1:X3=0 }
-  { thread_0:X3=1, thread_1:X3=1 }
-
-By comparing ``X3`` and the local variable ``r0`` of the original Concurrent
-Program in this example we see there is one additional outcome of executing the
-compiled program that is not an outcome of executing the source program (under
-the respective models). This suggests the Mappings under question are
-incompatible, and a compiler that implements them exhibits a compiler bug. To
-ensure compatibility we therefore test for the absence of such Outcomes of the
-compiled programs when mixing all combinations of the above Mappings. We define
-the *Mix Testing* process as follows:
-
-#. Given a C/C++ Concurrent Program.
-#. Split it into its representative Atomic Operations.
-#. Compile each Atomic Operation separately using a Compiler Profile that
-   generates Assembly Sequences under a given Mapping.
-#. Combine the Assembly Sequences into *multiple* possible Compiled Programs.
-#. Compute the outcomes of executing the Source Concurrent Program under the
-   ISO C memory model. Get source program outcomes *S*.
-#. Compute the outcomes of each compiled program under the AArch64 memory model
-   [ARMARM_]. Get a *set* of compiled program outcomes *C*.
-#. If any *c* in *C* exhibits a compiler bug with respect to the outcomes *S*
-   then the given mappings are not interoperable.
-
-Using Mix Testing we now define ABI-Compatibility of Atomic Operations.
-
-
-Definition of ABI-Compatibility for Atomic Operations
------------------------------------------------------
-
-*A compiler that implements the above set of Mappings is ABI-Compatible with
-respect to other compilers that implement the Mappings, if Mix Testing their
-code generation finds no compiler bugs.*
-
-We impose some constraints on this definition:
-
-* This is not a correctness guarantee, but rather a statement backed up by
-  bounded testing. Atomics ABI-compatibility is thus tested for the Mappings
-  above by generating C/C++ Concurrent Programs that permute combinations of
-  Atomic Operations on each Thread of Execution. We bound our test size between
-  2 and 5 Threads of Execution, where each Thread has at least 1 Atomic
-  Operation or Synchronization Operation and at most 5 Atomic Operations or
-  Synchronization Operations. We do not make any statement about the
-  ABI-Compatibility of Concurrent Programs outside these bounds.
-* We test Concurrent Programs with a fixed initial state, loop unroll factor
-  (equal to 1 loop unroll), and function calls or recursion. 
-* The above Mappings are not exhaustive, We hope that Arm's partners will
-  submit requests for other Mappings to the ABI team using the issue tracker
-  page on GitHub.
-* This document makes no statement about the ABI-Compatibility of optimised
-  Concurrent Programs, nor does a statement concerning the performance of
-  compiled programs under the above Mappings when executed on a given Arm-based
-  machine.
-* This document makes no statement about the ABI-Compatibility of compilers
-  that implement Mappings other than what is stated in this document.
+that exhibit this behaviour, those affected make use of ``SWP`` and ``LD<OP>``
+Assembly instructions.
 

From 265eb01f2153be0eda22995c9f87bba2a01d0b82 Mon Sep 17 00:00:00 2001
From: lukeg101 <6547672+lukeg101@users.noreply.github.com>
Date: Tue, 9 Jul 2024 17:17:01 +0100
Subject: [PATCH 03/17] Added design document

---
 atomicsabi64/atomicsabi64.rst    | 176 +-----------------
 design-documents/atomics-ABI.rst | 308 +++++++++++++++++++++++++++++++
 2 files changed, 316 insertions(+), 168 deletions(-)
 create mode 100644 design-documents/atomics-ABI.rst

diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst
index f0520c33..b92563d1 100644
--- a/atomicsabi64/atomicsabi64.rst
+++ b/atomicsabi64/atomicsabi64.rst
@@ -4,7 +4,7 @@
    See LICENSE file for details
 
 .. |release| replace:: 2024Q1
-.. |date-of-issue| replace:: 5\ :sup:`th` April 2024
+.. |date-of-issue| replace:: 5\ :sup:`th` July 2024
 .. |copyright-date| replace:: 2024
 .. |footer| replace:: Copyright © |copyright-date|, Arm Limited and its
                       affiliates. All rights reserved.
@@ -80,6 +80,11 @@ This document came about in the process of Luke Geeson’s PhD on testing the
 compilation of concurrent C/C++ with assistance from Wilco Dijkstra from Arm's
 Compiler Teams.
 
+This ABI arises from a paper to appear at OOPSLA 2024:
+*Mix Testing: Specifying and Testing ABI Compatibility Of C/C++ Atomics Implementations*
+by Luke Geeson, James Brotherston, Wilco Dijkstra, Alastair Donaldson, Lee Smith,
+Tyler Sorensen, and John Wickerson.
+
 
 
 Licence
@@ -213,7 +218,7 @@ changes to the content of the document for that release.
   +---------+------------------------------+-------------------------------------------------------------------+
   | Issue   | Date                         | Change                                                            |
   +=========+==============================+===================================================================+
-  | 00alp0  | 5\ :sup:`th` April 2024.     | Alpha release.                                                    |
+  | 00alp0  | 5\ :sup:`th` July 2024.      | Beta release.                                                     |
   +---------+------------------------------+-------------------------------------------------------------------+
   
 
@@ -1187,10 +1192,6 @@ Where ``X1`` contains the address of ``loc``.
 
 We annotate Mappings affected with ``*`` in section 4.2.
 
-Please refer to 
-`Appendix: Read-Modify-Write Destination Register Semantics`_ for information on why
-this example must be documented.
-
 Const-Qualified 128-bit Atomic Loads Should Be Marked Mutable
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -1212,9 +1213,7 @@ aforementioned Mappings. We test if there is a compiled program that exhibits
 an outcome of execution according to the AArch64 Memory Model contained in §B2
 of the Arm Architecture Reference Manual [ARMARM_] that is not an outcome of
 execution of the source program under the ISO C model. In this section we
-define the process by which we test compatibility. Please refer to 
-`Appendix: Mix Testing`_ for information on how ABI-compatibility is tested.
-
+define the process by which we test compatibility. 
 
 Definition of ABI-Compatibility for Atomic Operations
 -----------------------------------------------------
@@ -1250,164 +1249,5 @@ Appendix: Mix Testing
 The status of this appendix is informative.
 
 
-The Mix Testing Process
------------------------
 
-We test for Compiler bugs, a Compiler Bug is defined as an outcome of a
-compiled program execution (under the AArch64 Memory Model contained in
-§B2 of the Arm Architecture Reference Manual [ARMARM_]) that is not 
-an outcome of execution of the source Concurrent Program (under the 
-ISO C memory model). Consider the hypothetical example where a source
-Concurrent Program finishes execution in one of three possible outcomes
-(a reference for this notation is found here [PAPER_])::
-
-  { thread_0:r0=0, thread_1:r0=1 }
-  { thread_0:r0=1, thread_1:r0=0 }
-  { thread_0:r0=1, thread_1:r0=1 }
-
-and one possible compiled program outcome has the following according to the
-AArch64 Memory Model contained in §B2 of the Arm Architecture Reference Manual
-[ARMARM_]::
-
-  { thread_0:X3=0, thread_1:X3=0 } <--- Forbidden by source model, Compiler Bug!
-  { thread_0:X3=0, thread_1:X3=1 }
-  { thread_0:X3=1, thread_1:X3=0 }
-  { thread_0:X3=1, thread_1:X3=1 }
-
-By comparing ``X3`` and the local variable ``r0`` of the original Concurrent
-Program in this example we see there is one additional outcome of executing the
-compiled program that is not an outcome of executing the source program (under
-the respective models). This suggests the Mappings under question are
-incompatible, and a compiler that implements them exhibits a Compiler Bug. To
-ensure compatibility we therefore test for the absence of such outcomes of the
-compiled programs when mixing all combinations of the above Mappings. We define
-the *Mix Testing* process as follows:
-
-#. Take an arbitrary Concurrent Program, when executed on the C/C++ memory
-   model will produce outcomes *S*.
-#. Split out the individual Atomic Operations from the initial concurrent
-   program into individual source files.
-#. Compile each individual source file containing an Atomic Operation 
-   using each Compiler Profile under test that generates Assembly Sequences
-   under a given Mapping.
-#. Combine the Assembly Sequences from above into *multiple* possible Compiled
-   Programs.
-#. Compute the outcomes of each compiled program under the AArch64 Memory Model
-   contained in §B2 of the Arm Architecture Reference Manual [ARMARM_]. Get a
-   *set* of compiled program outcomes *C*.
-#. If any compiled program set of outcomes *c* in *C* exhibits a Compiler Bug
-   (Check that *c* is a subset of *S*) with then the given Mappings are not
-   interoperable. 
-
-
-Appendix: Read-Modify-Write Destination Register Semantics
-==========================================================
-
-We elaborate on why in the following example.
-
-Consider the following Concurrent Program:
-
-code-block::
-
-  // Shared-Memory Locations
-  _Atomic int* x;
-  _Atomic int* y;
-
-  // Memory Order Parameter
-  #define relaxed memory_order_relaxed
-  #define release memory_order_release
-  #define acquire memory_order_acquire
-
-  // Threads of Execution
-  void thread_0 () {
-    atomic_store_explicit(x,1,relaxed);
-    atomic_thread_fence(release);
-    atomic_store_explicit(y,1,relaxed);
-  }
-
-  void thread_1 () {
-    atomic_exchange_explicit(y,2,release);
-    atomic_thread_fence(acquire);
-    int r0 = atomic_load_explicit(x,relaxed);
-  }
-
-
-Under ISO C, the above Concurrent Program finishes execution in one of three
-possible outcomes (a reference for this notation is found here [PAPER_])::
-
-  { thread_1:r0=0; y=1; }
-  { thread_1:r0=1; y=1; }
-  { thread_1:r0=1; y=2; }
-
-In this case the value read by the exchange on ``thread_1`` is not used, and a
-compiler is free to remove references to unused data. It is not legal according
-to this ABI for a compliant implementation piler to translate the program into
-the following Assembly Sequences::
-
-  thread_0:
-    MOV W9,#1
-    STR W9,[X2]
-    DMB ISH
-    STR W3,[X4]
-
-  thread_1:
-    MOV W9,#2
-    SWP W9, WZR, [X2]
-    DMB ISHLD
-    LDR W3,[X4]
-
-where ``thread_0:X2`` contains the address of ``x``, ``thread_0:X4`` contains
-the address of ``y``, and ``thread_1:X2`` contains the address of ``y``,
-``thread_1:X4`` contains the address of ``x``.
-
-The ``exchange`` Atomic Operation is compiled to a ``SWP`` Assembly
-Instruction, where its destination register is the zero register ``WZR``. The 
-``acquire`` fence on ``thread_1`` is compiled to the ``DMB ISHLD`` Assembly 
-Instruction.
-
-Executing the compiled program on an Arm-based machine from a fixed initial
-state (where ``x`` and ``y`` are ``0``) produces one of the following outcomes,
-according to the AArch64 Memory Model contained in §B2 of the Arm Architecture
-Reference Manual [ARMARM_]::
-
-  { thread_1:r0=0; [y]=1; }
-  { thread_1:r0=0; [y]=2; } <-- Forbidden by source model, a bug!
-  { thread_1:r0=1; [y]=1; }
-  { thread_1:r0=1; [y]=2; }
-
-By comparing ``W3`` and the local variable ``r0`` of the original Concurrent
-Program we see there is one additional outcome of executing the compiled
-program that is not an outcome of executing the Concurrent Program. This is due
-to the fact that according to the Arm Architecture Reference Manual [ARMARM_] 
-*instructions where the destination register is WZR or XZR, are not regarded as
-doing a read for the purpose of a DMB LD barrier.*
-
-In this case the compiler introduces another outcome of Execution. To fix this
-issue, a compiler is not permitted to rewrite the destination register to be the
-zero register in this case::
-
-  thread_0:
-    MOV W9,#1
-    STR W9,[X2]
-    DMB ISH
-    STR W3,[X4]
-
-  thread_1:
-    MOV W9,#2
-    SWP W9, W10, [X2]
-    DMB ISHLD
-    LDR W3,[X4]
-
-Executing the compiled program on an Arm-based machine from a fixed initial
-state (where ``x`` and ``y`` are ``0``) produces one of the following outcomes,
-according to the AArch64 Memory Model contained in §B2 of the Arm Architecture
-Reference Manual [ARMARM_]::
-
-  { thread_1:r0=0; [y]=1; }
-  { thread_1:r0=1; [y]=1; }
-  { thread_1:r0=1; [y]=2; }
-
-As such the unexpected outcome has disappeared. There are multiple Mappings
-that exhibit this behaviour, those affected make use of ``SWP`` and ``LD<OP>``
-Assembly instructions.
 
diff --git a/design-documents/atomics-ABI.rst b/design-documents/atomics-ABI.rst
new file mode 100644
index 00000000..0b4f890c
--- /dev/null
+++ b/design-documents/atomics-ABI.rst
@@ -0,0 +1,308 @@
+..
+   Copyright (c) 2023, Arm Limited and its affiliates.  All rights reserved.
+   CC-BY-SA-4.0 AND Apache-Patent-License
+   See LICENSE file for details
+
+.. _ARMARM: https://developer.arm.com/documentation/ddi0487/latest
+.. _PAPER: https://doi.org/10.1109/CGO57630.2024.10444836
+
+Rationale Document for C11 Atomics ABI.
+***************************************
+
+Preamble
+========
+
+Background
+----------
+
+This document describes the rationale behind the ABI choices made for mapping
+from C11 atomic operations to Arm AArch64 assembly sequences.
+
+From the perspective of the Arm ABI we have some decisions to
+make:
+
+- We need to choose a baseline ABI (a set of mappings) that is compatible with all versions of the Armv8 architecture.
+- The mappings should cover atomic accesses of various sign, size, and type accessible through C11 atomic operations using compiler profiles.
+
+The main trade-offs we have identified or have been made aware of are:
+
+- Performance of different mappings versus compatibility with all architectures.
+- Whether certain compiler operations lead to unexpected behaviours.
+
+These decisions are motivated by the use cases expanded upon below:
+
+- The need for a baseline ABI
+- Knowing when an implementation departs from that baseline
+- Backwards compatibility of atomics as new mappings are added
+- Compatibility between compilers and runtimes
+- The need to constrain optimisations on specific atomic operations
+- Documenting the interoperable mappings
+- Providing a basis upon which ABI compatibility can be tested
+
+References
+----------
+
+This document refers to, or is referred to by, the following documents.
+
+.. table::
+
+  +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+  | Ref         | External reference or URL                                    | Title                                                                       |
+  +=============+==============================================================+=============================================================================+
+  | ARMARM_     | DDI 0487                                                     | Arm Architecture Reference Manual Armv8 for Armv8-A architecture profile    |
+  +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+  | PAPER_      | CGO paper                                                    | Compiler Testing with Relaxed Memory Models                                 |
+  +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+
+
+
+Note: At the time of writing C23 is not released; as such, ISO C17 is considered
+the latest published document.
+
+Use-cases known of so far
+-------------------------
+
+
+A Baseline: Describing current implementations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ABI we provide is a baseline specification that compilers should or do implement.
+The ABI provides grounds for compatibility across all versions of the Armv8 architecture. Most
+of the mappings in the ABI are already implemented in LLVM and GCC. This ABI ratifies
+a decade of established practice, and provides alternatives where the current practice
+is incompatible.
+
+
+Sub-ABIs and ABI-islands: Departing from the baseline (or 'mainland')
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We do *not* require that compilers implement this ABI. Implementers can specify their own
+ABI, whether it is a subset of the allowed mappings of the baseline ABI (a sub-ABI), or 
+uses different mappings altogether (an ABI-island). Currently, sub-ABIs and ABI-islands implicitly
+arise with each new architecture release, and implementers quickly find new candidate mappings
+that are performant on their machines. Such mappings are proposed or added to mainstream
+compilers. However, due to the lack of a baseline specification or widespread
+concurrency expertise, testing such mappings has been a challenge and concurrency bugs have been
+unintentionally introduced into compilers when new mappings are added.
+
+We need a baseline ABI in order to determine if a given sub-ABI respects or departs
+from the baseline. Adding command-line options is a logical consequence of defining such an ABI, 
+and makes it possible to track ABI compatibility of concurrent programs at compile or link-time,
+rather than runtime. It is the responsibility of the sub-ABI maintainer to ensure code built
+under their ABI does not mix with code built under the baseline. But a baseline must exist
+for sub-ABI compatibility to be decided in the first place.
+
+A baseline provides the means to describe or contain ABI-islands. Where a compiler implementation
+departs from the baseline completely (an ABI-island), it would be the responsibility of the
+maintainer of that implementation to ensure their programs are not mixed with programs built for 
+baseline ABI compatibility, or provide adequate warnings at compile time. 
+
+Further, numerous different parties have asked the ABI team whether
+the same atomics mapping is correct. Writing down the known cases helps engineers
+answer these queries without the concurrency expertise required to come up with
+current compatible mappings. A future section of the ABI could document common
+queries received by the ABI team, in order to assist implementers and engineers
+with such issues.
+
+Backwards Compatibility and New Architecture Features
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A baseline ABI also assists in deciding whether new mappings are compatible
+with compiler implementations targeting older versions of the Armv8 architecture.
+Certain instructions (such as Load/Store-Pair instructions [ARMARM_]) have different
+single-copy atomicity guarantees with respect to different architecture versions. A baseline
+decides which assembly sequences can be composed correctly (at least as far as testing can decide).
+
+
+Compatibility Between Compilers and Runtimes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The above issues also apply when ensuring object files compiled with different compilers can be mixed.
+For instance, LLVM and GCC code should be interoperable. At the time of writing we identified a number of
+places where this does not hold, both when compiling to target the same architecture version, and when mixing
+different (compatible) architecture versions. Further, the above is not limited to statically compiled code. We found
+one instance where proposed mappings implemented in a JIT compiler would not be interoperable with respect
+to the statically compiled code the runtime links against. Even if a JIT compiles under one set of mappings,
+and is not subject to an ABI, it may still depend on other libraries or components that do have an ABI.
+
+
+Constrain optimisations
+~~~~~~~~~~~~~~~~~~~~~~~
+
+There have been several instances where optimisations have been incorrectly applied
+to atomic code generation, or where attempted optimisations induce unexpected
+concurrent program behaviour. This has happened frequently enough that we need to
+collect these cases together to outline why they should not occur.
+
+Consider the following Concurrent Program::
+
+  // Shared-Memory Locations
+  _Atomic int* x;
+  _Atomic int* y;
+
+  // Memory Order Parameter
+  #define relaxed memory_order_relaxed
+  #define release memory_order_release
+  #define acquire memory_order_acquire
+
+  // Threads of Execution
+  void thread_0 () {
+    atomic_store_explicit(x,1,relaxed);
+    atomic_thread_fence(release);
+    atomic_store_explicit(y,1,relaxed);
+  }
+
+  void thread_1 () {
+    atomic_exchange_explicit(y,2,release);
+    atomic_thread_fence(acquire);
+    int r0 = atomic_load_explicit(x,relaxed);
+  }
+
+
+Under ISO C, the above Concurrent Program finishes execution in one of three
+possible outcomes (a reference for this notation is found here [PAPER_])::
+
+  { thread_1:r0=0; y=1; }
+  { thread_1:r0=1; y=1; }
+  { thread_1:r0=1; y=2; }
+
+In this case the value read by the exchange on ``thread_1`` is not used, and a
+compiler is free to remove references to unused data. It is, however, not legal
+according to this ABI for a compliant implementation to translate the program
+into the following Assembly Sequences::
+
+  thread_0:
+    MOV W9,#1
+    STR W9,[X2]
+    DMB ISH
+    STR W9,[X4]
+
+  thread_1:
+    MOV W9,#2
+    SWP W9, WZR, [X2]
+    DMB ISHLD
+    LDR W3,[X4]
+
+where ``thread_0:X2`` contains the address of ``x``, ``thread_0:X4`` contains
+the address of ``y``, and ``thread_1:X2`` contains the address of ``y``,
+``thread_1:X4`` contains the address of ``x``.
+
+The ``exchange`` Atomic Operation is compiled to a ``SWP`` Assembly
+Instruction, where its destination register is the zero register ``WZR``. The 
+``acquire`` fence on ``thread_1`` is compiled to the ``DMB ISHLD`` Assembly 
+Instruction.
+
+Executing the compiled program on an Arm-based machine from a fixed initial
+state (where ``x`` and ``y`` are ``0``) produces one of the following outcomes,
+according to the AArch64 Memory Model contained in §B2 of the Arm Architecture
+Reference Manual [ARMARM_]::
+
+  { thread_1:r0=0; [y]=1; }
+  { thread_1:r0=0; [y]=2; } <-- Forbidden by source model, a bug!
+  { thread_1:r0=1; [y]=1; }
+  { thread_1:r0=1; [y]=2; }
+
+By comparing ``W3`` and the local variable ``r0`` of the original Concurrent
+Program we see there is one additional outcome of executing the compiled
+program that is not an outcome of executing the Concurrent Program. This is
+because, according to the Arm Architecture Reference Manual [ARMARM_],
+*instructions where the destination register is WZR or XZR, are not regarded as
+doing a read for the purpose of a DMB LD barrier.*
+
+Here the compiler introduces an additional outcome of execution. To fix this
+issue, a compiler is not permitted to rewrite the destination register to be the
+zero register. The exchange must instead write to a general-purpose register::
+
+  thread_0:
+    MOV W9,#1
+    STR W9,[X2]
+    DMB ISH
+    STR W9,[X4]
+
+  thread_1:
+    MOV W9,#2
+    SWP W9, W10, [X2]
+    DMB ISHLD
+    LDR W3,[X4]
+
+Executing the compiled program on an Arm-based machine from a fixed initial
+state (where ``x`` and ``y`` are ``0``) produces one of the following outcomes,
+according to the AArch64 Memory Model contained in §B2 of the Arm Architecture
+Reference Manual [ARMARM_]::
+
+  { thread_1:r0=0; [y]=1; }
+  { thread_1:r0=1; [y]=1; }
+  { thread_1:r0=1; [y]=2; }
+
+As such, the unexpected outcome has disappeared. Multiple Mappings exhibit
+this behaviour; those affected make use of ``SWP`` and ``LD<OP>``
+Assembly instructions.
+
+Documentation
+~~~~~~~~~~~~~
+
+The collective knowledge of atomics ABIs exists as numerous online discussions.
+These discussions are neither authoritative nor persistent. Some discussions 
+are now inaccessible and others are out of date. This is problematic given the
+inherent complexity of relaxed memory concurrency, the difficulty of finding bugs,
+and the possibility of user error. We believe an ABI is necessary to document
+this corner of code generation.
+
+
+The Mix Testing Process
+-----------------------
+
+ABI compatibility must be testable. Concurrency is not trivial, and the ABI
+presents a simplification of part of the problem that is understandable by
+engineers. We provide novel, yet simple, techniques and tools for
+testing ABI compatibility. These techniques reduce the difficulty of checking
+compatibility from a problem of understanding concurrent executions to the
+familiar testing domain of comparing program outcomes. However, this document
+does not preclude other means of testing compatibility.
+
+We test for Compiler Bugs. A Compiler Bug is defined as an outcome of a
+compiled program execution (under the AArch64 Memory Model contained in
+§B2 of the Arm Architecture Reference Manual [ARMARM_]) that is not
+an outcome of execution of the source Concurrent Program (under the
+ISO C memory model). Consider the hypothetical example where a source
+Concurrent Program finishes execution in one of three possible outcomes
+(this notation is described in [PAPER_])::
+
+  { thread_0:r0=0, thread_1:r0=1 }
+  { thread_0:r0=1, thread_1:r0=0 }
+  { thread_0:r0=1, thread_1:r0=1 }
+
+and one possible compiled program has the following outcomes according to the
+AArch64 Memory Model contained in §B2 of the Arm Architecture Reference Manual
+[ARMARM_]::
+
+  { thread_0:X3=0, thread_1:X3=0 } <--- Forbidden by source model, Compiler Bug!
+  { thread_0:X3=0, thread_1:X3=1 }
+  { thread_0:X3=1, thread_1:X3=0 }
+  { thread_0:X3=1, thread_1:X3=1 }
+
+By comparing ``X3`` and the local variable ``r0`` of the original Concurrent
+Program in this example we see there is one additional outcome of executing the
+compiled program that is not an outcome of executing the source program (under
+the respective models). This suggests the Mappings in question are
+incompatible, and a compiler that implements them exhibits a Compiler Bug. To
+ensure compatibility we therefore test for the absence of such outcomes of the
+compiled programs when mixing all combinations of the above Mappings. We define
+the *Mix Testing* process as follows:
+
+#. Take an arbitrary Concurrent Program that, when executed under the C/C++
+   memory model, produces the set of outcomes *S*.
+#. Split out the individual Atomic Operations from the initial Concurrent
+   Program into individual source files.
+#. Compile each individual source file containing an Atomic Operation 
+   using each Compiler Profile under test that generates Assembly Sequences
+   under a given Mapping.
+#. Combine the Assembly Sequences from above into *multiple* possible Compiled
+   Programs.
+#. Compute the outcomes of each compiled program under the AArch64 Memory Model
+   contained in §B2 of the Arm Architecture Reference Manual [ARMARM_]. This
+   gives a *set* *C* containing one set of outcomes per compiled program.
+#. If any compiled program's set of outcomes *c* in *C* exhibits a Compiler Bug
+   (that is, *c* is not a subset of *S*), then the given Mappings are not
+   interoperable.
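+
+The combination and checking steps of this process can be sketched as follows.
+This is a minimal illustration and not part of the ABI: the function names
+``mix`` and ``find_compiler_bugs``, the program names ``mix_a`` and ``mix_b``,
+and the encoding of an outcome as a string are assumptions made for this
+example. Computing the outcomes of a compiled program is assumed to be done by
+an external tool that implements the AArch64 Memory Model.
+
+.. code-block:: python
+
+   from itertools import product
+
+
+   def mix(sequences_per_operation):
+       """Pick one Assembly Sequence per Atomic Operation, in every
+       possible combination, giving the candidate compiled programs."""
+       return [tuple(choice) for choice in product(*sequences_per_operation)]
+
+
+   def find_compiler_bugs(source_outcomes, compiled_outcomes):
+       """For each compiled program, return the outcomes that the source
+       model forbids. An empty result means every c is a subset of S."""
+       bugs = {}
+       for program, outcomes in compiled_outcomes.items():
+           extra = set(outcomes) - set(source_outcomes)
+           if extra:
+               bugs[program] = extra
+       return bugs
+
+
+   # Outcomes S of the source program from the example above (hypothetical
+   # string encoding: "thread_0 value,thread_1 value").
+   S = {"0,1", "1,0", "1,1"}
+
+   # Outcomes of two hypothetical compiled programs; in practice these
+   # would be computed under the AArch64 Memory Model.
+   C = {
+       "mix_a": {"0,1", "1,0", "1,1"},
+       "mix_b": {"0,0", "0,1", "1,0", "1,1"},
+   }
+
+   print(find_compiler_bugs(S, C))  # {'mix_b': {'0,0'}}
+
+Here ``mix_b`` allows the outcome ``0,0``, which is not in *S*, so the Mappings
+used to build it would not be interoperable.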
+

From 7b9b325e5ef0c596b3d7e62a90e545af7d60fba8 Mon Sep 17 00:00:00 2001
From: lukeg101 <6547672+lukeg101@users.noreply.github.com>
Date: Wed, 17 Jul 2024 15:05:35 +0100
Subject: [PATCH 04/17] Remove mix testing from compat defn

---
 atomicsabi64/atomicsabi64.rst | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst
index b92563d1..30aefece 100644
--- a/atomicsabi64/atomicsabi64.rst
+++ b/atomicsabi64/atomicsabi64.rst
@@ -1218,9 +1218,8 @@ define the process by which we test compatibility.
 Definition of ABI-Compatibility for Atomic Operations
 -----------------------------------------------------
 
-*A compiler that implements the above set of Mappings is ABI-Compatible with
-respect to other compilers that implement the Mappings, if Mix Testing their
-code generation finds no Compiler Bugs.*
+*A compiler that implements the above set of Mappings and special cases is ABI-Compatible with
+respect to other compilers that implement the Mappings and special cases.*
 
 We impose some constraints on this definition:
 

From 6a140f83fd2aa5d4da1a98930e63e4b61689513a Mon Sep 17 00:00:00 2001
From: Wilco Dijkstra <wdijkstr@arm.com>
Date: Mon, 19 Aug 2024 11:36:17 +0100
Subject: [PATCH 05/17] Update tables to improve formatting.

---
 atomicsabi64/atomicsabi64.rst | 1308 ++++++++++++++++-----------------
 1 file changed, 639 insertions(+), 669 deletions(-)

diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst
index 30aefece..92ffc2ad 100644
--- a/atomicsabi64/atomicsabi64.rst
+++ b/atomicsabi64/atomicsabi64.rst
@@ -385,13 +385,13 @@ To reduce repetition, we use the following notational conventions
   +=========================================+======================================+
   | ``memory_order_relaxed``                | ``relaxed``                          |
   +-----------------------------------------+--------------------------------------+
-  | ``memory_order_acquire``                | ``acq``                              |
+  | ``memory_order_acquire``                | ``acquire``                          |
   +-----------------------------------------+--------------------------------------+
-  | ``memory_order_release``                | ``rel``                              |
+  | ``memory_order_release``                | ``release``                          |
   +-----------------------------------------+--------------------------------------+
   | ``memory_order_acq_rel``                | ``acq_rel``                          |
   +-----------------------------------------+--------------------------------------+
-  | ``memory_order_seq_cst``                | ``sc``                               |
+  | ``memory_order_seq_cst``                | ``seq_cst``                          |
   +-----------------------------------------+--------------------------------------+
 
 In what follows ``loc`` refers to the location, ``val`` refers to a value
@@ -416,9 +416,7 @@ options.
   |                                            | ARCH2     | ``option B``                         |
   +--------------------------------------------+-----------+--------------------------------------+
 
-Where ARCH is for example BASE (armv8), LSE, LSE2, LSE128, RCPC, or LRCPC3.
-ARCH describes the required extension, with BASE meaning Armv8-A with no
-extensions and LSE is shorthand for FEAT_LSE (likewise for the other extensions).
+Where ARCH is either the base architecture (Armv8-A) or an extension like FEAT_LSE.
 
 Lastly, all operations are in a shorthand form:
 
@@ -453,178 +451,202 @@ Mappings for 32-bit types
 In what follows, register ``X1`` contains the location ``loc`` and ``W2``
 contains ``val``. The result is returned in ``W0``.
 
-  +-------------------------------------------------------------------------------------------+
-  | Note                                                                                      |
-  +===========================================================================================+
-  | ``*`` Using ``WZR`` or ``XZR`` for the destination register is invalid (Section 4.7).     |
-  +-------------------------------------------------------------------------------------------+
-
 .. table::
 
-  +------------------------------------------+--------------------------------------+
-  | Atomic Operation                         | Assembly Sequence                    | 
-  +==========================================+======================================+
-  | ``store(loc,val,relaxed)``               | ``STR   W2, [X1]``                   |
-  +------------------------------------------+--------------------------------------+
-  | ``store(loc,val,rel)``                   | ``STLR  W2, [X1]``                   |
-  | ``store(loc,val,sc)``                    |                                      |
-  +------------------------------------------+--------------------------------------+
-  | ``load(loc,relaxed)``                    | ``LDR   W2, [X1]``                   |
-  +-------------------------------+----------+--------------------------------------+
-  | ``load(loc,acq)``             | ``BASE`` | ``LDAR  W2, [X1]``                   |
-  +                               +----------+--------------------------------------+
-  |                               | ``RCPC`` | ``LDAPR W2, [X1]``                   |
-  +-------------------------------+----------+--------------------------------------+
-  | ``load(loc,sc)``                         | ``LDAR  W2, [X1]``                   |
-  +------------------------------------------+--------------------------------------+
-  | ``fence(relaxed)``                       | ``NOP``                              |
-  +------------------------------------------+--------------------------------------+
-  | ``fence(acq)``                           | ``DMB ISHLD``                        |
-  +------------------------------------------+--------------------------------------+
-  | ``fence(rel)``                           | ``DMB ISH``                          |
-  | ``fence(acq_rel)``                       |                                      |
-  | ``fence(sc)``                            |                                      |
-  +-------------------------------+----------+--------------------------------------+
-  | ``exchange(loc,val,relaxed)`` | ``BASE`` | ``loop:``                            |
-  |                               |          |   ``LDXR   W0, [X1]``                |
-  |                               |          |                                      |
-  |                               |          |   ``STXR   W3, W2, [X1]``            |
-  |                               |          |                                      |
-  |                               |          |   ``CBNZ   W3, loop``                |
-  |                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``SWP    W2, W0, [X1]`` *            | 
-  +-------------------------------+----------+--------------------------------------+
-  | ``exchange(loc,val,acq)``     | ``BASE`` | ``loop:``                            |
-  |                               |          |   ``LDAXR  W0, [X1]``                |
-  |                               |          |                                      |
-  |                               |          |   ``STXR   W3, W2, [X1]``            |
-  |                               |          |                                      |
-  |                               |          |   ``CBNZ   W3, loop``                |
-  |                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``SWPA   W2, W0, [X1]`` *            |  
-  +-------------------------------+----------+--------------------------------------+
-  | ``exchange(loc,val,rel)``     | ``BASE`` | ``loop:``                            |
-  |                               |          |   ``LDXR   W0, [X1]``                |
-  |                               |          |                                      |
-  |                               |          |   ``STLXR  W3, W2, [X1]``            |
-  |                               |          |                                      |
-  |                               |          |   ``CBNZ   W3, loop``                |
-  |                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``SWPL   W2, W0, [X1]`` *            | 
-  +-------------------------------+----------+--------------------------------------+
-  | ``exchange(loc,val,acq_rel)`` | ``BASE`` | ``loop:``                            |
-  | ``exchange(loc,val,sc)``      |          |   ``LDAXR  W0, [X1]``                |
-  |                               |          |                                      |
-  |                               |          |   ``STLXR  W3, W2, [X1]``            |
-  |                               |          |                                      |
-  |                               |          |   ``CBNZ   W3, loop``                |
-  |                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``SWPAL  W2, W0, [X1]`` *            | 
-  +-------------------------------+----------+--------------------------------------+
-  | ``fetch_add(loc,val,relaxed)``| ``BASE`` | ``loop:``                            |
-  |                               |          |   ``LDXR   W0, [X1]``                |
-  |                               |          |                                      |
-  |                               |          |   ``ADD    W2, W2, W0``              |
-  |                               |          |                                      |
-  |                               |          |   ``STXR   W3, W2, [X1]``            |
-  |                               |          |                                      |
-  |                               |          |   ``CBNZ   W3, loop``                |
-  +                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``LDADD    W2, W0, [X1]`` *          |
-  +-------------------------------+----------+--------------------------------------+
-  | ``fetch_add(loc,val,acq)``    | ``BASE`` | ``loop:``                            |
-  |                               |          |   ``LDAXR  W0, [X1]``                |
-  |                               |          |                                      |
-  |                               |          |   ``ADD    W2, W2, W0``              |
-  |                               |          |                                      |
-  |                               |          |   ``STXR   W3, W2, [X1]``            |
-  |                               |          |                                      |
-  |                               |          |   ``CBNZ   W3, loop``                |
-  |                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``LDADDA   W2, W0, [X1]`` *          | 
-  +-------------------------------+----------+--------------------------------------+
-  | ``fetch_add(loc,val,rel)``    | ``BASE`` | ``loop:``                            |
-  |                               |          |   ``LDXR   W0, [X1]``                |
-  |                               |          |                                      |
-  |                               |          |   ``ADD    W2, W2, W0``              |
-  |                               |          |                                      |
-  |                               |          |   ``STLXR  W3, W2, [X1]``            |
-  |                               |          |                                      |
-  |                               |          |   ``CBNZ   W3, loop``                |
-  |                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``LDADDL   W2, W0, [X1]`` *          |
-  +-------------------------------+----------+--------------------------------------+
-  | ``fetch_add(loc,val,acq_rel)``| ``BASE`` | ``loop:``                            |
-  | ``fetch_add(loc,val,sc)``     |          |   ``LDXAR  W0, [X1]``                |
-  |                               |          |                                      |
-  |                               |          |   ``ADD    W2, W2, W0``              |
-  |                               |          |                                      |
-  |                               |          |   ``STLXR  W3, W2, [X1]``            |
-  |                               |          |                                      |
-  |                               |          |   ``CBNZ   W3, loop``                | 
-  |                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``LDADDAL  W2, W0, [X1]`` *          |
-  +-------------------------------+----------+--------------------------------------+
-  | ``compare_exchange_strong(``  | ``BASE`` | ``loop:``                            |
-  |   ``loc,&exp,val,relaxed,``   |          |   ``LDXR   W0, [X1]``                |
-  |   ``relaxed)``                |          |                                      |
-  |                               |          |   ``CMP    W0, W4``                  |
-  |                               |          |                                      |
-  |                               |          |   ``B.NE    fail``                   |
-  |                               |          |                                      |
-  |                               |          |   ``STXR   W3, W2, [X1]``            |
-  |                               |          |                                      |
-  |                               |          |   ``CBNZ   W3, loop``                |
-  |                               |          |                                      |
-  |                               |          | ``fail:``                            |
-  |                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``CAS    W0, W2, [X1]`` *            |
-  +-------------------------------+----------+--------------------------------------+
-  | ``compare_exchange_strong(``  | ``BASE`` | ``loop:``                            |
-  |   ``loc,&exp,val,acq,acq)``   |          |   ``LDAXR  W0, [X1]``                |
-  |                               |          |                                      |
-  |                               |          |   ``CMP    W0, W4``                  |
-  |                               |          |                                      |
-  |                               |          |   ``B.NE    fail``                   |
-  |                               |          |                                      |
-  |                               |          |   ``STXR   W3, W2, [X1]``            |
-  |                               |          |                                      |
-  |                               |          |   ``CBNZ   W3, loop``                |
-  |                               |          |                                      |
-  |                               |          | ``fail:``                            |
-  |                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``CASA   W0, W2, [X1]`` *            |
-  +-------------------------------+----------+--------------------------------------+
-  | ``compare_exchange_strong(``  | ``BASE`` | ``loop:``                            |
-  |   ``loc,&exp,val,rel,rel)``   |          |   ``LDXR   W0, [X1]``                |
-  |                               |          |                                      |
-  |                               |          |   ``CMP    W0, W4``                  |
-  |                               |          |                                      |
-  |                               |          |   ``B.NE    fail``                   |
-  |                               |          |                                      |
-  |                               |          |   ``STLXR  W3, W2, [X1]``            |
-  |                               |          |                                      |
-  |                               |          |   ``CBNZ   W3, loop``                |
-  |                               |          |                                      |
-  |                               |          | ``fail:``                            |
-  |                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``CASL   W0, W2, [X1]`` *            |
-  +-------------------------------+----------+--------------------------------------+
-  | ``compare_exchange_strong(``  | ``BASE`` | ``loop:``                            |
-  |  ``loc,&exp,val,acq_rel,acq)``|          |   ``LDAXR  W0, [X1]``                |
-  | ``compare_exchange_strong(``  |          |                                      |
-  |   ``loc,&exp,val,sc,sc)``     |          |   ``CMP    W0, W4``                  |
-  |                               |          |                                      |
-  |                               |          |   ``B.NE    fail``                   |
-  |                               |          |                                      |
-  |                               |          |   ``STLXR  W3, W2, [X1]``            |
-  |                               |          |                                      |
-  |                               |          |   ``CBNZ   W3, loop``                |
-  |                               |          |                                      |
-  |                               |          | ``fail:``                            |
-  |                               +----------+--------------------------------------+
-  |                               | ``LSE``  | ``CASAL  W0, W2, [X1]`` *            |
-  +-------------------------------+----------+--------------------------------------+
+  +-----------------------------------------------------+--------------------------------------+
+  | Atomic Operation                                    | Assembly Sequence                    |
+  +=====================================================+======================================+
+  | ``store(loc,val,relaxed)``                          | .. code-block:: none                 |
+  |                                                     |                                      |
+  |                                                     |    STR   W2, [X1]                    |
+  +-----------------------------------------------------+--------------------------------------+
+  | ``store(loc,val,release)``                          | .. code-block:: none                 |
+  |                                                     |                                      |
+  | ``store(loc,val,seq_cst)``                          |    STLR  W2, [X1]                    |
+  +-----------------------------------------------------+--------------------------------------+
+  | ``load(loc,relaxed)``                               | .. code-block:: none                 |
+  |                                                     |                                      |
+  |                                                     |    LDR    W2, [X1]                   |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``load(loc,acquire)``               | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    LDAR  W2, [X1]                    |
+  +                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_RCPC`` | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    LDAPR  W2, [X1]                   |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``load(loc,seq_cst)``                               | .. code-block:: none                 |
+  |                                                     |                                      |
+  |                                                     |    LDAR   W2, [X1]                   |
+  +-----------------------------------------------------+--------------------------------------+
+  | ``fence(relaxed)``                                  | .. code-block:: none                 |
+  |                                                     |                                      |
+  |                                                     |    NOP                               |
+  +-----------------------------------------------------+--------------------------------------+
+  | ``fence(acquire)``                                  | .. code-block:: none                 |
+  |                                                     |                                      |
+  |                                                     |    DMB ISHLD                         |
+  +-----------------------------------------------------+--------------------------------------+
+  | ``fence(release)``                                  | .. code-block:: none                 |
+  |                                                     |                                      |
+  | ``fence(acq_rel)``                                  |    DMB ISH                           |
+  |                                                     |                                      |
+  | ``fence(seq_cst)``                                  |                                      |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``exchange(loc,val,relaxed)``       | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDXR   W0, [X1]                 |
+  |                                     |               |      STXR   W3, W2, [X1]             |
+  |                                     |               |      CBNZ   W3, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    SWP    W2, W0, [X1] *             |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``exchange(loc,val,acquire)``       | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDAXR  W0, [X1]                 |
+  |                                     |               |      STXR   W3, W2, [X1]             |
+  |                                     |               |      CBNZ   W3, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    SWPA   W2, W0, [X1] *             |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``exchange(loc,val,release)``       | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDXR   W0, [X1]                 |
+  |                                     |               |      STLXR  W3, W2, [X1]             |
+  |                                     |               |      CBNZ   W3, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    SWPL   W2, W0, [X1] *             |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``exchange(loc,val,acq_rel)``       | ``Armv8-A``   | .. code-block:: none                 |
+  | ``exchange(loc,val,seq_cst)``       |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDAXR  W0, [X1]                 |
+  |                                     |               |      STLXR  W3, W2, [X1]             |
+  |                                     |               |      CBNZ   W3, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    SWPAL  W2, W0, [X1] *             |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``fetch_add(loc,val,relaxed)``      | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDXR   W0, [X1]                 |
+  |                                     |               |      ADD    W2, W2, W0               |
+  |                                     |               |      STXR   W3, W2, [X1]             |
+  |                                     |               |      CBNZ   W3, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    LDADD  W2, W0, [X1] *             |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``fetch_add(loc,val,acquire)``      | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDAXR  W0, [X1]                 |
+  |                                     |               |      ADD    W2, W2, W0               |
+  |                                     |               |      STXR   W3, W2, [X1]             |
+  |                                     |               |      CBNZ   W3, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    LDADDA W0, W2, [X1] *             |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``fetch_add(loc,val,release)``      | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDXR   W0, [X1]                 |
+  |                                     |               |      ADD    W2, W2, W0               |
+  |                                     |               |      STLXR  W3, W2, [X1]             |
+  |                                     |               |      CBNZ   W3, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    LDADDL W0, W2, [X1] *             |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``fetch_add(loc,val,acq_rel)``      | ``Armv8-A``   | .. code-block:: none                 |
+  | ``fetch_add(loc,val,seq_cst)``      |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDAXR  W0, [X1]                 |
+  |                                     |               |      ADD    W2, W2, W0               |
+  |                                     |               |      STLXR  W3, W2, [X1]             |
+  |                                     |               |      CBNZ   W3, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    LDADDAL W0, W2, [X1] *            |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``compare_exchange_strong(``        | ``Armv8-A``   | .. code-block:: none                 |
+  |   ``loc,&exp,val,relaxed,relaxed)`` |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDXR   W0, [X1]                 |
+  |                                     |               |      CMP    W0, W4                   |
+  |                                     |               |      B.NE   fail                     |
+  |                                     |               |      STXR   W3, W2, [X1]             |
+  |                                     |               |      CBNZ   W3, loop                 |
+  |                                     |               |    fail:                             |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    CAS    W0, W2, [X1] *             |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``compare_exchange_strong(``        | ``Armv8-A``   | .. code-block:: none                 |
+  |   ``loc,&exp,val,acquire,acquire)`` |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDAXR  W0, [X1]                 |
+  |                                     |               |      CMP    W0, W4                   |
+  |                                     |               |      B.NE   fail                     |
+  |                                     |               |      STXR   W3, W2, [X1]             |
+  |                                     |               |      CBNZ   W3, loop                 |
+  |                                     |               |    fail:                             |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    CASA   W0, W2, [X1] *             |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``compare_exchange_strong(``        | ``Armv8-A``   | .. code-block:: none                 |
+  |   ``loc,&exp,val,release,release)`` |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDXR   W0, [X1]                 |
+  |                                     |               |      CMP    W0, W4                   |
+  |                                     |               |      B.NE   fail                     |
+  |                                     |               |      STLXR  W3, W2, [X1]             |
+  |                                     |               |      CBNZ   W3, loop                 |
+  |                                     |               |    fail:                             |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    CASL   W0, W2, [X1] *             |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``compare_exchange_strong(``        | ``Armv8-A``   | .. code-block:: none                 |
+  |   ``loc,&exp,val,acq_rel,acquire)`` |               |                                      |
+  | ``compare_exchange_strong(``        |               |    loop:                             |
+  |   ``loc,&exp,val,seq_cst,seq_cst)`` |               |      LDAXR  W0, [X1]                 |
+  |                                     |               |      CMP    W0, W4                   |
+  |                                     |               |      B.NE   fail                     |
+  |                                     |               |      STLXR  W3, W2, [X1]             |
+  |                                     |               |      CBNZ   W3, loop                 |
+  |                                     |               |    fail:                             |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    CASAL  W0, W2, [X1] *             |
+  +-------------------------------------+---------------+--------------------------------------+
+  | Note                                                                                       |
+  +--------------------------------------------------------------------------------------------+
+  | ``*`` Using ``WZR`` or ``XZR`` for the destination register is invalid (Section 4.7).      |
+  +--------------------------------------------------------------------------------------------+
+
 
 Mappings for 8-bit types
 ------------------------
@@ -642,7 +664,7 @@ The Mappings for 16-bit types are the same as 32-bit types except they use the
 Mappings for 64-bit types
 -------------------------
 
-The Msappings for 64-bit types are the same as 32-bit types except the registers
+The Mappings for 64-bit types are the same as 32-bit types except the registers
 used are X-registers.
 
 Mappings for 128-bit types
@@ -657,498 +679,446 @@ In what follows, register ``X4`` contains the location ``loc``, ``X2`` and
 
 .. table::
 
-  +-----------------------------------------------+--------------------------------------+
-  | Atomic Operation                              | Assembly Sequence                    |
-  +=================================+=============+======================================+
-  | ``store(loc,val,relaxed)``      | ``BASE``    | ``loop:``                            |
-  |                                 |             |   ``LDXP   XZR, X1, [X4]``           |
-  |                                 |             |                                      |
-  |                                 |             |   ``STXP   W5, X2, X3, [X4]``        |
-  |                                 |             |                                      |
-  |                                 |             |   ``CBNZ   W5, loop``                |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
-  |                                 |             |                                      |
-  |                                 |             | ``loop:``                            |
-  |                                 |             |   ``MOV    X6, X0``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``MOV    X7, X1``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CASP   X0, X1, X2, X3, [X4]``    |
-  |                                 |             |                                      |
-  |                                 |             |   ``CMP    X0, X6``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
-  |                                 |             |                                      |
-  |                                 |             |   ``B.NE   loop``                    |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE2``    | ``STP   x2, X3, [X4]``               |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``store(loc,val,rel)``          | ``BASE``    | ``loop:``                            |
-  |                                 |             |   ``LDXP    XZR, X1, [X4]``          |
-  |                                 |             |   ``STLXP   W5, X2, X3, [X4]``       |
-  |                                 |             |   ``CBNZ    W5, loop``               |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
-  |                                 |             |                                      |
-  |                                 |             | ``loop:``                            |
-  |                                 |             |   ``MOV    X6, X0``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``MOV    X7, X1``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CASPL  X0, X1, X2, X3, [X4]``    |
-  |                                 |             |                                      |
-  |                                 |             |   ``CMP    X0, X6``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
-  |                                 |             |                                      |
-  |                                 |             |   ``B.NE   loop``                    |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE2``    | ``DMB   ISH``                        |
-  |                                 |             |                                      |
-  |                                 |             | ``STP   X2, X3, [X4]``               |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LRCPC3``  | ``STILP   X2, X3, [X4]``             |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``store(loc,val,sc)``           | ``BASE``    | ``loop:``                            |
-  |                                 |             |   ``LDXP    XZR, X1, [X4]``          |
-  |                                 |             |                                      |
-  |                                 |             |   ``STLXP   W5, X2, X3, [X4]``       |
-  |                                 |             |                                      |
-  |                                 |             |   ``CBNZ    W5, loop``               |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
-  |                                 |             |                                      |
-  |                                 |             | ``loop:``                            |
-  |                                 |             |   ``MOV    X6, X0``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``MOV    X7, X1``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CASPL  X0, X1, X2, X3, [X4]``    |
-  |                                 |             |                                      |
-  |                                 |             |   ``CMP    X0, X6``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
-  |                                 |             |                                      |
-  |                                 |             |   ``B.NE   loop``                    |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE2``    | ``DMB   ISH``                        |
-  |                                 |             |                                      |
-  |                                 |             | ``STP   X2, X3, [X4]``               |
-  |                                 |             |                                      |
-  |                                 |             | ``DMB   ISH``                        |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LRCPC3``  | ``STILP   X2, X3, [X4]``             |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``load(loc,relaxed)``           | ``BASE``    | ``loop:``                            |
-  |                                 |             |   ``LDXP   X0, X1, [X4]``            |
-  |                                 |             |                                      |
-  |                                 |             |   ``STXP   W5, X0, X1, [X4]``        |
-  |                                 |             |                                      |
-  |                                 |             |   ``CBNZ   W5, loop``                |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | ``CASP   X0, X1, X0, X1, [X4]``      |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE2``    | ``LDP   X0, X1, [X4]``               |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``load(loc,acq)``               | ``BASE``    | ``loop:``                            |
-  |                                 |             |   ``LDAXP  X0, X1, [X4]``            |
-  |                                 |             |                                      |
-  |                                 |             |   ``STXP   W5, X0, X1, [X4]``        |
-  |                                 |             |                                      |
-  |                                 |             |   ``CBNZ   W5, loop``                |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | ``CASPA  X0, X1, X0, X1, [X4]``      |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE2``    | ``LDP   X0, X1, [X4]``               |
-  |                                 |             |                                      |
-  |                                 |             | ``DMB   ISHLD``                      |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LRCPC3``  | ``LDIAPP   X0, X1, [X4]``            |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``load(loc,sc)``                | ``BASE``    | ``loop:``                            |
-  |                                 |             |   ``LDAXP   X0, X1, [X4]``           |
-  |                                 |             |                                      |
-  |                                 |             |   ``STXP    W5, X0, X1, [X4]``       |
-  |                                 |             |                                      |
-  |                                 |             |   ``CBNZ    W5, loop``               |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | ``CASPA  X0, X1, X0, X1, [X4]``      |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE2``    | ``LDAR  X5, [X4]``                   |
-  |                                 |             |                                      |
-  |                                 |             | ``LDP   X0, X1, [X4]``               |
-  |                                 |             |                                      |
-  |                                 |             | ``DMB   ISHLD``                      |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LRCPC3``  | ``LDAR   X5, [X4]``                  |
-  |                                 |             |                                      |
-  |                                 |             | ``LDIAPP X0, X1, [X4]``              |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``exchange(loc,val,relaxed)``   | ``BASE``    | ``loop:``                            |
-  |                                 |             |   ``LDXP   X0, X1, [X4]``            |
-  |                                 |             |                                      |
-  |                                 |             |   ``STXP   W5, X2, X3, [X4]``        |
-  |                                 |             |                                      |
-  |                                 |             |   ``CBNZ   W5, loop``                |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
-  |                                 |             |                                      |
-  |                                 |             | ``loop:``                            |
-  |                                 |             |   ``MOV    X6, X0``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``MOV    X7, X1``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CASP   X0, X1, X2, X3, [X4]``    |
-  |                                 |             |                                      |
-  |                                 |             |   ``CMP    X0, X6``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
-  |                                 |             |                                      |
-  |                                 |             |   ``B.NE   loop``                    |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE128``  | ``MOV    X0, X2``                    |
-  |                                 |             |                                      |
-  |                                 |             | ``MOV    X1, X3``                    |
-  |                                 |             |                                      |
-  |                                 |             | ``SWPP   X0, X1, [X4]``              |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``exchange(loc,val,acq)``       | ``BASE``    | ``loop:``                            |
-  |                                 |             |   ``LDAXP  X0, X1, [X4]``            |
-  |                                 |             |                                      |
-  |                                 |             |   ``STXP   W5, X2, X3, [X4]``        |
-  |                                 |             |                                      |
-  |                                 |             |   ``CBNZ   W5, loop``                |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
-  |                                 |             |                                      |
-  |                                 |             | ``loop:``                            |
-  |                                 |             |   ``MOV    X6, X0``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``MOV    X7, X1``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CASPA  X0, X1, X2, X3, [X4]``    |
-  |                                 |             |                                      |
-  |                                 |             |   ``CMP    X0, X6``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
-  |                                 |             |                                      |
-  |                                 |             |   ``B.NE   loop``                    |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE128``  | ``MOV    X0, X2``                    |
-  |                                 |             |                                      |
-  |                                 |             | ``MOV    X1, X3``                    |
-  |                                 |             |                                      |
-  |                                 |             | ``SWPPA  X0, X1, [X4]``              |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``exchange(loc,val,rel)``       | ``BASE``    | ``loop:``                            |
-  |                                 |             |   ``LDXP   X0, X1, [X4]``            |
-  |                                 |             |                                      |
-  |                                 |             |   ``STLXP  W5, X2, X3, [X4]``        |
-  |                                 |             |                                      |
-  |                                 |             |   ``CBNZ   W5, loop``                |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
-  |                                 |             |                                      |
-  |                                 |             | ``loop:``                            |
-  |                                 |             |   ``MOV    X6, X0``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``MOV    X7, X1``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CASPL  X0, X1, X2, X3, [X4]``    |
-  |                                 |             |                                      |
-  |                                 |             |   ``CMP    X0, X6``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
-  |                                 |             |                                      |
-  |                                 |             |   ``B.NE   loop``                    |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE128``  | ``MOV    X0, X2``                    |
-  |                                 |             |                                      |
-  |                                 |             | ``MOV    X1, X3``                    |
-  |                                 |             |                                      |
-  |                                 |             | ``SWPPL  X0, X1, [X4]``              |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``exchange(loc,val,acq_rel)``   | ``BASE``    | ``loop:``                            |
-  | ``exchange(loc,val,sc)``        |             |   ``LDAXP  X0, X1, [X4]``            |
-  |                                 |             |                                      |
-  |                                 |             |   ``STLXP  W5, X2, X3, [X4]``        |
-  |                                 |             |                                      |
-  |                                 |             |   ``CBNZ   W5, loop``                |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
-  |                                 |             |                                      |
-  |                                 |             | ``loop:``                            |
-  |                                 |             |   ``MOV    X6, X0``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``MOV    X7, X1``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CASPAL X0, X1, X2, X3, [X4]``    |
-  |                                 |             |                                      |
-  |                                 |             |   ``CMP    X0, X6``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
-  |                                 |             |                                      |
-  |                                 |             |   ``B.NE   loop``                    |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE128``  | ``MOV    X0, X2``                    |
-  |                                 |             |                                      |
-  |                                 |             | ``MOV    X1, X3``                    |
-  |                                 |             |                                      |
-  |                                 |             | ``SWPPAL X0, X1, [X4]``              |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_add(loc,val,relaxed)``  | ``BASE``    | ``loop:``                            |
-  |                                 |             |   ``LDXP   X0, X1, [X4]``            |
-  |                                 |             |                                      |
-  |                                 |             |   ``ADDS   X0, X0, X2``              |
-  |                                 |             |                                      |
-  |                                 |             |   ``ADC    X1, X1, X3``              |
-  |                                 |             |                                      |
-  |                                 |             |   ``STXP   W5, X2, X3, [X4]``        |
-  |                                 |             |                                      |
-  |                                 |             |   ``CBNZ   W5, loop``                |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
-  |                                 |             |                                      |
-  |                                 |             | ``loop:``                            |
-  |                                 |             |   ``MOV    X6, X0``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``MOV    X7, X1``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``ADDS   X8, X0, X2``              |
-  |                                 |             |                                      |
-  |                                 |             |   ``ADC    X9, X1, X3``              |
-  |                                 |             |                                      |
-  |                                 |             |   ``CASP   X0, X1, X8, X9, [X4]``    |
-  |                                 |             |                                      |
-  |                                 |             |   ``CMP    X0, X6``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
-  |                                 |             |                                      |
-  |                                 |             |   ``B.NE   loop``                    |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_add(loc,val,acq)``      | ``BASE``    | ``loop:``                            |
-  |                                 |             |   ``LDAXP  X0, X1, [X4]``            |
-  |                                 |             |                                      |
-  |                                 |             |   ``ADDS   X0, X0, X2``              |
-  |                                 |             |                                      |
-  |                                 |             |   ``ADC    X1, X1, X3``              |
-  |                                 |             |                                      |
-  |                                 |             |   ``STXP   W5, X2, X3, [X4]``        |
-  |                                 |             |                                      |
-  |                                 |             |   ``CBNZ   W5, loop``                |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
-  |                                 |             |                                      |
-  |                                 |             | ``loop:``                            |
-  |                                 |             |   ``MOV    X6, X0``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``MOV    X7, X1``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``ADDS   X8, X0, X2``              |
-  |                                 |             |                                      |
-  |                                 |             |   ``ADC    X9, X1, X3``              |
-  |                                 |             |                                      |
-  |                                 |             |   ``CASPA  X0, X1, X8, X9, [X4]``    |
-  |                                 |             |                                      |
-  |                                 |             |   ``CMP    X0, X6``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
-  |                                 |             |                                      |
-  |                                 |             |   ``B.NE   loop``                    |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_add(loc,val,rel)``      | ``BASE``    | ``loop:``                            |
-  |                                 |             |   ``LDXP   X0, X1, [X4]``            |
-  |                                 |             |                                      |
-  |                                 |             |   ``ADDS   X0, X0, X2``              |
-  |                                 |             |                                      |
-  |                                 |             |   ``ADC    X1, X1, X3``              |
-  |                                 |             |                                      |
-  |                                 |             |   ``STLXP  W5, X2, X3, [X4]``        |
-  |                                 |             |                                      |
-  |                                 |             |   ``CBNZ   W5, loop``                |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
-  |                                 |             |                                      |
-  |                                 |             | ``loop:``                            |
-  |                                 |             |   ``MOV    X6, X0``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``MOV    X7, X1``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``ADDS   X8, X0, X2``              |
-  |                                 |             |                                      |
-  |                                 |             |   ``ADC    X9, X1, X3``              |
-  |                                 |             |                                      |
-  |                                 |             |   ``CASPL  X0, X1, X8, X9, [X4]``    |
-  |                                 |             |                                      |
-  |                                 |             |   ``CMP    X0, X6``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
-  |                                 |             |                                      |
-  |                                 |             |   ``B.NE   loop``                    |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_add(loc,val,acq_rel)``  | ``BASE``    | ``loop:``                            |
-  | ``fetch_add(loc,val,sc)``       |             |   ``LDAXP  X0, X1, [X4]``            |
-  |                                 |             |                                      |
-  |                                 |             |   ``ADDS   X0, X0, X2``              |
-  |                                 |             |                                      |
-  |                                 |             |   ``ADC    X1, X1, X3``              |
-  |                                 |             |                                      |
-  |                                 |             |   ``STXLP  W5, X2, X3, [X4]``        |
-  |                                 |             |                                      |
-  |                                 |             |   ``CBNZ   W5, loop``                |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | ``LDP   X0, X1, [X4]``               |
-  |                                 |             |                                      |
-  |                                 |             | ``loop:``                            |
-  |                                 |             |   ``MOV    X6, X0``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``MOV    X7, X1``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``ADDS   X8, X0, X2``              |
-  |                                 |             |                                      |
-  |                                 |             |   ``ADC    X9, X1, X3``              |
-  |                                 |             |                                      |
-  |                                 |             |   ``CASPAL X0, X1, X8, X9, [X4]``    |
-  |                                 |             |                                      |
-  |                                 |             |   ``CMP    X0, X6``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CCMP   X1, X7, 0, EQ``           |
-  |                                 |             |                                      |
-  |                                 |             |   ``B.NE   loop``                    |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_or(loc,val,relaxed)``   | ``LSE128``  | ``MOV      X0, X2``                  |
-  |                                 |             |                                      |
-  |                                 |             | ``MOV      X1, X3``                  |
-  |                                 |             |                                      |
-  |                                 |             | ``LDSETP   X0, X1, [X4]``            |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_or(loc,val,acq)``       | ``LSE128``  | ``MOV      X0, X2``                  |
-  |                                 |             |                                      |
-  |                                 |             | ``MOV      X1, X3``                  |
-  |                                 |             |                                      |
-  |                                 |             | ``LDSETPA  X0, X1, [X4]``            |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_or(loc,val,rel)``       | ``LSE128``  | ``MOV      X0, X2``                  |
-  |                                 |             |                                      |
-  |                                 |             | ``MOV      X1, X3``                  |
-  |                                 |             |                                      |
-  |                                 |             | ``LDSETPL  X0, X1, [X4]``            |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_or(loc,val,acq_rel)``   | ``LSE128``  | ``MOV      X0, X2``                  |
-  | ``fetch_or(loc,val,sc)``        |             |                                      |
-  |                                 |             | ``MOV      X1, X3``                  |
-  |                                 |             |                                      |
-  |                                 |             | ``LDSETPAL X0, X1, [X4]``            |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_and(loc,val,relaxed)``  | ``LSE128``  | ``MVN      X0, X2``                  |
-  |                                 |             |                                      |
-  |                                 |             | ``MVN      X1, X3``                  |
-  |                                 |             |                                      |
-  |                                 |             | ``LDCLRP   X0, X1, [X4]``            |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_and(loc,val,acq)``      | ``LSE128``  | ``MVN      X0, X2``                  |
-  |                                 |             |                                      |
-  |                                 |             | ``MVN      X1, X3``                  |
-  |                                 |             |                                      |
-  |                                 |             | ``LDCLRPA  X0, X1, [X4]``            |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_and(loc,val,rel)``      | ``LSE128``  | ``MVN      X0, X2``                  |
-  |                                 |             |                                      |
-  |                                 |             | ``MVN      X1, X3``                  |
-  |                                 |             |                                      |
-  |                                 |             | ``LDCLRPL  X0, X1, [X4]``            |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``fetch_and(loc,val,acq_rel)``  | ``LSE128``  | ``MVN      X0, X2``                  |
-  | ``fetch_and(loc,val,sc)``       |             |                                      |
-  |                                 |             | ``MVN      X1, X3``                  |
-  |                                 |             |                                      |
-  |                                 |             | ``LDCLRPAL X0, X1, [X4]``            |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``compare_exchange_strong(``    | ``BASE``    | ``loop:``                            |
-  |   ``loc,&exp,val,relaxed,``     |             |   ``LDXP   X6, x7, [X4]``            |
-  |   ``relaxed)``                  |             |                                      |
-  |                                 |             |   ``CMP    X6, X0``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CCMP   X7, X1, 0, EQ``           |
-  |                                 |             |                                      |
-  |                                 |             |   ``CSEL   X8, X2, X6, EQ``          |
-  |                                 |             |                                      |
-  |                                 |             |   ``CSEL   X9, X3, X7, EQ``          |
-  |                                 |             |                                      |
-  |                                 |             |   ``STXP   W5, X8, X9, [X4]``        |
-  |                                 |             |                                      |
-  |                                 |             |   ``CBNZ   W5, loop``                |
-  |                                 |             |                                      |
-  |                                 |             | ``MOV   X0, X6``                     |
-  |                                 |             |                                      |
-  |                                 |             | ``MOV   X1, X7``                     |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | ``CASP    X0, X1, X2, X3, [X4]``     |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``compare_exchange_strong(``    | ``BASE``    | ``loop:``                            |
-  |   ``loc,&exp,val,acq, acq)``    |             |   ``LDAXP  X6, x7, [X4]``            |
-  |                                 |             |                                      |
-  |                                 |             |   ``CMP    X6, X0``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CCMP   X7, X1, 0, EQ``           |
-  |                                 |             |                                      |
-  |                                 |             |   ``CSEL   X8, X2, X6, EQ``          |
-  |                                 |             |                                      |
-  |                                 |             |   ``CSEL   X9, X3, X7, EQ``          |
-  |                                 |             |                                      |
-  |                                 |             |   ``STXP   W5, X8, X9, [X4]``        |
-  |                                 |             |                                      |
-  |                                 |             |   ``CBNZ   W5, loop``                |
-  |                                 |             |                                      |
-  |                                 |             | ``MOV   X0, X6``                     |
-  |                                 |             |                                      |
-  |                                 |             | ``MOV   X1, X7``                     |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | ``CASPA   X0, X1, X2, X3, [X4]``     |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``compare_exchange_strong(``    | ``BASE``    | ``loop:``                            |
-  |   ``loc,&exp,val,rel,rel)``     |             |   ``LDXP   X6, x7, [X4]``            |
-  |                                 |             |                                      |
-  |                                 |             |   ``CMP    X6, X0``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CCMP   X7, X1, 0, EQ``           |
-  |                                 |             |                                      |
-  |                                 |             |   ``CSEL   X8, X2, X6, EQ``          |
-  |                                 |             |                                      |
-  |                                 |             |   ``CSEL   X9, X3, X7, EQ``          |
-  |                                 |             |                                      |
-  |                                 |             |   ``STLXP  W5, X8, X9, [X4]``        |
-  |                                 |             |                                      |
-  |                                 |             |   ``CBNZ   W5, loop``                |
-  |                                 |             |                                      |
-  |                                 |             | ``MOV   X0, X6``                     |
-  |                                 |             |                                      |
-  |                                 |             | ``MOV   X1, X7``                     |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | ``CASPL   X0, X1, X2, X3, [X4]``     |
-  +---------------------------------+-------------+--------------------------------------+
-  | ``compare_exchange_strong(``    | ``BASE``    | ``loop:``                            |
-  |   ``loc,&exp,val,acq_rel,acq)`` |             |   ``LDAXP  X6, x7, [X4]``            |
-  | ``compare_exchange_strong(``    |             |                                      |
-  |   ``loc,&exp,val,sc,sc)``       |             |   ``CMP    X6, X0``                  |
-  |                                 |             |                                      |
-  |                                 |             |   ``CCMP   X7, X1, 0, EQ``           |
-  |                                 |             |                                      |
-  |                                 |             |   ``CSEL   X8, X2, X6, EQ``          |
-  |                                 |             |                                      |
-  |                                 |             |   ``CSEL   X9, X3, X7, EQ``          |
-  |                                 |             |                                      |
-  |                                 |             |   ``STLXP  W5, X8, X9, [X4]``        |
-  |                                 |             |                                      |
-  |                                 |             |   ``CBNZ   W5, loop``                |
-  |                                 |             |                                      |
-  |                                 |             | ``MOV   X0, X6``                     |
-  |                                 |             |                                      |
-  |                                 |             | ``MOV   X1, X7``                     |
-  |                                 +-------------+--------------------------------------+
-  |                                 | ``LSE``     | ``CASPAL  X0, X1, X2, X3, [X4]``     |
-  +---------------------------------+-------------+--------------------------------------+
+  +-----------------------------------------------------+--------------------------------------+
+  | Atomic Operation                                    | Assembly Sequence                    |
+  +=====================================+===============+======================================+
+  | ``store(loc,val,relaxed)``          | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDXP   XZR, X1, [X4]            |
+  |                                     |               |      STXP   W5, X2, X3, [X4]         |
+  |                                     |               |      CBNZ   W5, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |      LDP   X0, X1, [X4]              |
+  |                                     |               |    loop:                             |
+  |                                     |               |      MOV    X6, X0                   |
+  |                                     |               |      MOV    X7, X1                   |
+  |                                     |               |      CASP   X0, X1, X2, X3, [X4]     |
+  |                                     |               |      CMP    X0, X6                   |
+  |                                     |               |      CCMP   X1, X7, 0, EQ            |
+  |                                     |               |      B.NE   loop                     |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE2`` | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    STP   X2, X3, [X4]                |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``store(loc,val,release)``          | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDXP   XZR, X1, [X4]            |
+  |                                     |               |      STLXP  W5, X2, X3, [X4]         |
+  |                                     |               |      CBNZ   W5, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |      LDP   X0, X1, [X4]              |
+  |                                     |               |    loop:                             |
+  |                                     |               |      MOV    X6, X0                   |
+  |                                     |               |      MOV    X7, X1                   |
+  |                                     |               |      CASPL  X0, X1, X2, X3, [X4]     |
+  |                                     |               |      CMP    X0, X6                   |
+  |                                     |               |      CCMP   X1, X7, 0, EQ            |
+  |                                     |               |      B.NE   loop                     |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE2`` | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    DMB   ISH                         |
+  |                                     |               |    STP   X2, X3, [X4]                |
+  |                                     +---------------+--------------------------------------+
+  |                                     |``FEAT_LRCPC3``| .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    STILP   X2, X3, [X4]              |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``store(loc,val,seq_cst)``          | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDAXP   XZR, X1, [X4]           |
+  |                                     |               |      STLXP   W5, X2, X3, [X4]        |
+  |                                     |               |      CBNZ    W5, loop                |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |      LDP   X0, X1, [X4]              |
+  |                                     |               |    loop:                             |
+  |                                     |               |      MOV    X6, X0                   |
+  |                                     |               |      MOV    X7, X1                   |
+  |                                     |               |      CASPAL X0, X1, X2, X3, [X4]     |
+  |                                     |               |      CMP    X0, X6                   |
+  |                                     |               |      CCMP   X1, X7, 0, EQ            |
+  |                                     |               |      B.NE   loop                     |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE2`` | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    DMB   ISH                         |
+  |                                     |               |    STP   X2, X3, [X4]                |
+  |                                     |               |    DMB   ISH                         |
+  |                                     +---------------+--------------------------------------+
+  |                                     |``FEAT_LRCPC3``| .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    STILP   X2, X3, [X4]              |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``load(loc,relaxed)``               | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDXP   X0, X1, [X4]             |
+  |                                     |               |      STXP   W5, X0, X1, [X4]         |
+  |                                     |               |      CBNZ   W5, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    CASP   X0, X1, X0, X1, [X4]       |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE2`` | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    LDP   X0, X1, [X4]                |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``load(loc,acquire)``               | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDAXP  X0, X1, [X4]             |
+  |                                     |               |      STXP   W5, X0, X1, [X4]         |
+  |                                     |               |      CBNZ   W5, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    CASPA  X0, X1, X0, X1, [X4]       |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE2`` | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    LDP   X0, X1, [X4]                |
+  |                                     |               |    DMB   ISHLD                       |
+  |                                     +---------------+--------------------------------------+
+  |                                     |``FEAT_LRCPC3``| .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    LDIAPP X0, X1, [X4]               |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``load(loc,seq_cst)``               | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDAXP  X0, X1, [X4]             |
+  |                                     |               |      STXP   W5, X0, X1, [X4]         |
+  |                                     |               |      CBNZ   W5, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    CASPA  X0, X1, X0, X1, [X4]       |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE2`` | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    LDAR  X5, [X4]                    |
+  |                                     |               |    LDP   X0, X1, [X4]                |
+  |                                     |               |    DMB   ISHLD                       |
+  |                                     +---------------+--------------------------------------+
+  |                                     |``FEAT_LRCPC3``| .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    LDAR   X5, [X4]                   |
+  |                                     |               |    LDIAPP X0, X1, [X4]               |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``exchange(loc,val,relaxed)``       | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDXP   X0, X1, [X4]             |
+  |                                     |               |      STXP   W5, X2, X3, [X4]         |
+  |                                     |               |      CBNZ   W5, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |      LDP   X0, X1, [X4]              |
+  |                                     |               |    loop:                             |
+  |                                     |               |      MOV    X6, X0                   |
+  |                                     |               |      MOV    X7, X1                   |
+  |                                     |               |      CASP   X0, X1, X2, X3, [X4]     |
+  |                                     |               |      CMP    X0, X6                   |
+  |                                     |               |      CCMP   X1, X7, 0, EQ            |
+  |                                     |               |      B.NE   loop                     |
+  |                                     +---------------+--------------------------------------+
+  |                                     |``FEAT_LSE128``| .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    MOV    X0, X2                     |
+  |                                     |               |    MOV    X1, X3                     |
+  |                                     |               |    SWPP   X0, X1, [X4]               |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``exchange(loc,val,acquire)``       | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDAXP  X0, X1, [X4]             |
+  |                                     |               |      STXP   W5, X2, X3, [X4]         |
+  |                                     |               |      CBNZ   W5, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |      LDP   X0, X1, [X4]              |
+  |                                     |               |    loop:                             |
+  |                                     |               |      MOV    X6, X0                   |
+  |                                     |               |      MOV    X7, X1                   |
+  |                                     |               |      CASPA  X0, X1, X2, X3, [X4]     |
+  |                                     |               |      CMP    X0, X6                   |
+  |                                     |               |      CCMP   X1, X7, 0, EQ            |
+  |                                     |               |      B.NE   loop                     |
+  |                                     +---------------+--------------------------------------+
+  |                                     |``FEAT_LSE128``| .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    MOV    X0, X2                     |
+  |                                     |               |    MOV    X1, X3                     |
+  |                                     |               |    SWPPA  X0, X1, [X4]               |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``exchange(loc,val,release)``       | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDXP   X0, X1, [X4]             |
+  |                                     |               |      STLXP  W5, X2, X3, [X4]         |
+  |                                     |               |      CBNZ   W5, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |      LDP   X0, X1, [X4]              |
+  |                                     |               |    loop:                             |
+  |                                     |               |      MOV    X6, X0                   |
+  |                                     |               |      MOV    X7, X1                   |
+  |                                     |               |      CASPL  X0, X1, X2, X3, [X4]     |
+  |                                     |               |      CMP    X0, X6                   |
+  |                                     |               |      CCMP   X1, X7, 0, EQ            |
+  |                                     |               |      B.NE   loop                     |
+  |                                     +---------------+--------------------------------------+
+  |                                     |``FEAT_LSE128``| .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    MOV    X0, X2                     |
+  |                                     |               |    MOV    X1, X3                     |
+  |                                     |               |    SWPPL  X0, X1, [X4]               |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``exchange(loc,val,acq_rel)``       | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  | ``exchange(loc,val,seq_cst)``       |               |    loop:                             |
+  |                                     |               |      LDAXP  X0, X1, [X4]             |
+  |                                     |               |      STLXP  W5, X2, X3, [X4]         |
+  |                                     |               |      CBNZ   W5, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |      LDP   X0, X1, [X4]              |
+  |                                     |               |    loop:                             |
+  |                                     |               |      MOV    X6, X0                   |
+  |                                     |               |      MOV    X7, X1                   |
+  |                                     |               |      CASPAL X0, X1, X2, X3, [X4]     |
+  |                                     |               |      CMP    X0, X6                   |
+  |                                     |               |      CCMP   X1, X7, 0, EQ            |
+  |                                     |               |      B.NE   loop                     |
+  |                                     +---------------+--------------------------------------+
+  |                                     |``FEAT_LSE128``| .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    MOV    X0, X2                     |
+  |                                     |               |    MOV    X1, X3                     |
+  |                                     |               |    SWPPAL X0, X1, [X4]               |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``fetch_add(loc,val,relaxed)``      | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDXP   X0, X1, [X4]             |
+  |                                     |               |      ADDS   X0, X0, X2               |
+  |                                     |               |      ADC    X1, X1, X3               |
+  |                                     |               |      STXP   W5, X0, X1, [X4]         |
+  |                                     |               |      CBNZ   W5, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |      LDP   X0, X1, [X4]              |
+  |                                     |               |    loop:                             |
+  |                                     |               |      MOV    X6, X0                   |
+  |                                     |               |      MOV    X7, X1                   |
+  |                                     |               |      ADDS   X8, X0, X2               |
+  |                                     |               |      ADC    X9, X1, X3               |
+  |                                     |               |      CASP   X0, X1, X8, X9, [X4]     |
+  |                                     |               |      CMP    X0, X6                   |
+  |                                     |               |      CCMP   X1, X7, 0, EQ            |
+  |                                     |               |      B.NE   loop                     |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``fetch_add(loc,val,acquire)``      | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDAXP  X0, X1, [X4]             |
+  |                                     |               |      ADDS   X0, X0, X2               |
+  |                                     |               |      ADC    X1, X1, X3               |
+  |                                     |               |      STXP   W5, X0, X1, [X4]         |
+  |                                     |               |      CBNZ   W5, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |      LDP   X0, X1, [X4]              |
+  |                                     |               |    loop:                             |
+  |                                     |               |      MOV    X6, X0                   |
+  |                                     |               |      MOV    X7, X1                   |
+  |                                     |               |      ADDS   X8, X0, X2               |
+  |                                     |               |      ADC    X9, X1, X3               |
+  |                                     |               |      CASPA  X0, X1, X8, X9, [X4]     |
+  |                                     |               |      CMP    X0, X6                   |
+  |                                     |               |      CCMP   X1, X7, 0, EQ            |
+  |                                     |               |      B.NE   loop                     |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``fetch_add(loc,val,release)``      | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDXP   X0, X1, [X4]             |
+  |                                     |               |      ADDS   X0, X0, X2               |
+  |                                     |               |      ADC    X1, X1, X3               |
+  |                                     |               |      STLXP  W5, X0, X1, [X4]         |
+  |                                     |               |      CBNZ   W5, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |      LDP   X0, X1, [X4]              |
+  |                                     |               |    loop:                             |
+  |                                     |               |      MOV    X6, X0                   |
+  |                                     |               |      MOV    X7, X1                   |
+  |                                     |               |      ADDS   X8, X0, X2               |
+  |                                     |               |      ADC    X9, X1, X3               |
+  |                                     |               |      CASPL  X0, X1, X8, X9, [X4]     |
+  |                                     |               |      CMP    X0, X6                   |
+  |                                     |               |      CCMP   X1, X7, 0, EQ            |
+  |                                     |               |      B.NE   loop                     |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``fetch_add(loc,val,acq_rel)``      | ``Armv8-A``   | .. code-block:: none                 |
+  |                                     |               |                                      |
+  | ``fetch_add(loc,val,seq_cst)``      |               |    loop:                             |
+  |                                     |               |      LDAXP  X0, X1, [X4]             |
+  |                                     |               |      ADDS   X0, X0, X2               |
+  |                                     |               |      ADC    X1, X1, X3               |
+  |                                     |               |      STLXP  W5, X0, X1, [X4]         |
+  |                                     |               |      CBNZ   W5, loop                 |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |      LDP   X0, X1, [X4]              |
+  |                                     |               |    loop:                             |
+  |                                     |               |      MOV    X6, X0                   |
+  |                                     |               |      MOV    X7, X1                   |
+  |                                     |               |      ADDS   X8, X0, X2               |
+  |                                     |               |      ADC    X9, X1, X3               |
+  |                                     |               |      CASPAL X0, X1, X8, X9, [X4]     |
+  |                                     |               |      CMP    X0, X6                   |
+  |                                     |               |      CCMP   X1, X7, 0, EQ            |
+  |                                     |               |      B.NE   loop                     |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``fetch_or(loc,val,relaxed)``       |``FEAT_LSE128``| .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    MOV    X0, X2                     |
+  |                                     |               |    MOV    X1, X3                     |
+  |                                     |               |    LDSETP X0, X1, [X4]               |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``fetch_or(loc,val,acquire)``       |``FEAT_LSE128``| .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    MOV     X0, X2                    |
+  |                                     |               |    MOV     X1, X3                    |
+  |                                     |               |    LDSETPA X0, X1, [X4]              |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``fetch_or(loc,val,release)``       |``FEAT_LSE128``| .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    MOV     X0, X2                    |
+  |                                     |               |    MOV     X1, X3                    |
+  |                                     |               |    LDSETPL X0, X1, [X4]              |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``fetch_or(loc,val,acq_rel)``       |``FEAT_LSE128``| .. code-block:: none                 |
+  |                                     |               |                                      |
+  | ``fetch_or(loc,val,seq_cst)``       |               |    MOV      X0, X2                   |
+  |                                     |               |    MOV      X1, X3                   |
+  |                                     |               |    LDSETPAL X0, X1, [X4]             |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``fetch_and(loc,val,relaxed)``      |``FEAT_LSE128``| .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    MVN    X0, X2                     |
+  |                                     |               |    MVN    X1, X3                     |
+  |                                     |               |    LDCLRP X0, X1, [X4]               |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``fetch_and(loc,val,acquire)``      |``FEAT_LSE128``| .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    MVN     X0, X2                    |
+  |                                     |               |    MVN     X1, X3                    |
+  |                                     |               |    LDCLRPA X0, X1, [X4]              |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``fetch_and(loc,val,release)``      |``FEAT_LSE128``| .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    MVN     X0, X2                    |
+  |                                     |               |    MVN     X1, X3                    |
+  |                                     |               |    LDCLRPL X0, X1, [X4]              |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``fetch_and(loc,val,acq_rel)``      |``FEAT_LSE128``| .. code-block:: none                 |
+  |                                     |               |                                      |
+  | ``fetch_and(loc,val,seq_cst)``      |               |    MVN      X0, X2                   |
+  |                                     |               |    MVN      X1, X3                   |
+  |                                     |               |    LDCLRPAL X0, X1, [X4]             |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``compare_exchange_strong(``        | ``Armv8-A``   | .. code-block:: none                 |
+  |   ``loc,exp,val,relaxed,relaxed)``  |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDXP   X6, X7, [X4]             |
+  |                                     |               |      CMP    X6, X0                   |
+  |                                     |               |      CCMP   X7, X1, 0, EQ            |
+  |                                     |               |      CSEL   X8, X2, X6, EQ           |
+  |                                     |               |      CSEL   X9, X3, X7, EQ           |
+  |                                     |               |      STXP   W5, X8, X9, [X4]         |
+  |                                     |               |      CBNZ   W5, loop                 |
+  |                                     |               |      MOV    X0, X6                   |
+  |                                     |               |      MOV    X1, X7                   |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    CASP    X0, X1, X2, X3, [X4]      |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``compare_exchange_strong(``        | ``Armv8-A``   | .. code-block:: none                 |
+  |   ``loc,exp,val,acquire,acquire)``  |               |                                      |
+  |                                     |               |    loop:                             |
+  | ``compare_exchange_strong(``        |               |      LDAXP  X6, X7, [X4]             |
+  |   ``loc,exp,val,acquire,relaxed)``  |               |      CMP    X6, X0                   |
+  |                                     |               |      CCMP   X7, X1, 0, EQ            |
+  |                                     |               |      CSEL   X8, X2, X6, EQ           |
+  |                                     |               |      CSEL   X9, X3, X7, EQ           |
+  |                                     |               |      STXP   W5, X8, X9, [X4]         |
+  |                                     |               |      CBNZ   W5, loop                 |
+  |                                     |               |      MOV    X0, X6                   |
+  |                                     |               |      MOV    X1, X7                   |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    CASPA   X0, X1, X2, X3, [X4]      |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``compare_exchange_strong(``        | ``Armv8-A``   | .. code-block:: none                 |
+  |   ``loc,exp,val,release,relaxed)``  |               |                                      |
+  |                                     |               |    loop:                             |
+  |                                     |               |      LDXP   X6, X7, [X4]             |
+  |                                     |               |      CMP    X6, X0                   |
+  |                                     |               |      CCMP   X7, X1, 0, EQ            |
+  |                                     |               |      CSEL   X8, X2, X6, EQ           |
+  |                                     |               |      CSEL   X9, X3, X7, EQ           |
+  |                                     |               |      STLXP  W5, X8, X9, [X4]         |
+  |                                     |               |      CBNZ   W5, loop                 |
+  |                                     |               |      MOV    X0, X6                   |
+  |                                     |               |      MOV    X1, X7                   |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    CASPL   X0, X1, X2, X3, [X4]      |
+  +-------------------------------------+---------------+--------------------------------------+
+  | ``compare_exchange_strong(``        | ``Armv8-A``   | .. code-block:: none                 |
+  |   ``loc,exp,val,acq_rel,acquire)``  |               |                                      |
+  |                                     |               |    loop:                             |
+  | ``compare_exchange_strong(``        |               |      LDAXP  X6, X7, [X4]             |
+  |   ``loc,exp,val,seq_cst,acquire)``  |               |      CMP    X6, X0                   |
+  |                                     |               |      CCMP   X7, X1, 0, EQ            |
+  |                                     |               |      CSEL   X8, X2, X6, EQ           |
+  |                                     |               |      CSEL   X9, X3, X7, EQ           |
+  |                                     |               |      STLXP  W5, X8, X9, [X4]         |
+  |                                     |               |      CBNZ   W5, loop                 |
+  |                                     |               |      MOV    X0, X6                   |
+  |                                     |               |      MOV    X1, X7                   |
+  |                                     +---------------+--------------------------------------+
+  |                                     | ``FEAT_LSE``  | .. code-block:: none                 |
+  |                                     |               |                                      |
+  |                                     |               |    CASPAL  X0, X1, X2, X3, [X4]      |
+  +-------------------------------------+---------------+--------------------------------------+
+
+
+
 
 
 We do not list other variants of ``fetch_<op>`` since their Mappings should be

From 72faa506e77dff140e14df3d44690e7e1b5aeafb Mon Sep 17 00:00:00 2001
From: Wilco Dijkstra <wdijkstr@arm.com>
Date: Mon, 19 Aug 2024 12:21:53 +0100
Subject: [PATCH 06/17] Further fixes to tables and special cases.

---
 atomicsabi64/atomicsabi64.rst | 58 ++++++++++++-----------------------
 1 file changed, 19 insertions(+), 39 deletions(-)

diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst
index 92ffc2ad..9530aa25 100644
--- a/atomicsabi64/atomicsabi64.rst
+++ b/atomicsabi64/atomicsabi64.rst
@@ -394,9 +394,6 @@ To reduce repetition, we use the following notational conventions
   | ``memory_order_seq_cst``                | ``seq_cst``                          |
   +-----------------------------------------+--------------------------------------+
 
-In what follows ``loc`` refers to the location, ``val`` refers to a value
-parameter.
-
 Arbitrary registers may be used in the Assembly Sequences that may change in
 compiler implementations. Cases where arbitrary registers may *not* be used are
 covered in the Special Cases section.
@@ -449,7 +446,8 @@ Mappings for 32-bit types
 -------------------------
 
 In what follows, register ``X1`` contains the location ``loc`` and ``W2``
-contains ``val``. The result is returned in ``W0``.
+contains ``val``. ``W0`` contains input ``exp`` in compare-exchange.  The result is
+returned in ``W0``.
 
 .. table::
 
@@ -587,7 +585,8 @@ contains ``val``. The result is returned in ``W0``.
   |                                     |               |    LDADDAL W0, W2, [X1] *            |
   +-------------------------------------+---------------+--------------------------------------+
   | ``compare_exchange_strong(``        | ``Armv8-A``   | .. code-block:: none                 |
-  |   ``loc,&exp,val,relaxed,relaxed)`` |               |                                      |
+  |   ``loc,exp,val,relaxed,relaxed)``  |               |                                      |
+  |                                     |               |      MOV    W4, W0                   |
   |                                     |               |    loop:                             |
   |                                     |               |      LDXR   W0, [X1]                 |
   |                                     |               |      CMP    W0, W4                   |
@@ -601,7 +600,8 @@ contains ``val``. The result is returned in ``W0``.
   |                                     |               |    CAS    W0, W2, [X1] *             |
   +-------------------------------------+---------------+--------------------------------------+
   | ``compare_exchange_strong(``        | ``Armv8-A``   | .. code-block:: none                 |
-  |   ``loc,&exp,val,acquire,acquire)`` |               |                                      |
+  |   ``loc,exp,val,acquire,acquire)``  |               |                                      |
+  |                                     |               |      MOV    W4, W0                   |
   |                                     |               |    loop:                             |
   |                                     |               |      LDAXR  W0, [X1]                 |
   |                                     |               |      CMP    W0, W4                   |
@@ -615,7 +615,8 @@ contains ``val``. The result is returned in ``W0``.
   |                                     |               |    CASA   W0, W2, [X1] *             |
   +-------------------------------------+---------------+--------------------------------------+
   | ``compare_exchange_strong(``        | ``Armv8-A``   | .. code-block:: none                 |
-  |   ``loc,&exp,val,release,release)`` |               |                                      |
+  |   ``loc,exp,val,release,release)``  |               |                                      |
+  |                                     |               |      MOV    W4, W0                   |
   |                                     |               |    loop:                             |
   |                                     |               |      LDXR   W0, [X1]                 |
   |                                     |               |      CMP    W0, W4                   |
@@ -629,9 +630,10 @@ contains ``val``. The result is returned in ``W0``.
   |                                     |               |    CASL   W0, W2, [X1] *             |
   +-------------------------------------+---------------+--------------------------------------+
   | ``compare_exchange_strong(``        | ``Armv8-A``   | .. code-block:: none                 |
-  |   ``loc,&exp,val,acq_rel,acquire)`` |               |                                      |
+  |   ``loc,exp,val,acq_rel,acquire)``  |               |                                      |
+  |                                     |               |      MOV    W4, W0                   |
   | ``compare_exchange_strong(``        |               |    loop:                             |
-  |   ``loc,&exp,val,seq_cst,seq_cst)`` |               |      LDAXR  W0, [X1]                 |
+  |   ``loc,exp,val,seq_cst,seq_cst)``  |               |      LDAXR  W0, [X1]                 |
   |                                     |               |      CMP    W0, W4                   |
   |                                     |               |      B.NE   fail                     |
   |                                     |               |      STLXR  W3, W2, [X1]             |
@@ -675,7 +677,8 @@ width, the following Mappings use *pair* instructions, which require their own
 table.
 
 In what follows, register ``X4`` contains the location ``loc``, ``X2`` and 
-``X3`` contain the input value. The result is returned in ``X0`` and ``X1``.
+``X3`` contain the input value ``val``. ``X0`` and ``X1`` contain input ``exp`` in
+compare-exchange. The result is returned in ``X0`` and ``X1``.
 
 .. table::
 
@@ -1132,35 +1135,12 @@ Sequences exist, are stated (for instance ``fetch_or`` can be implemented using
 Special Cases
 -------------
 
-There are special cases in the Mappings presented above, these must be handled
-in order to prevent unexpected outcomes of the compiled program. The special 
-cases are identified below.
-
-* Re-Ordering of Read-Modify-Write Effects and Acquire Fence
-* Const-Qualified 128-bit Atomic Loads
-
-Destination Register Should Not Be Zero Register for Read-Modify-Writes
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-A compiler is not permitted to rewrite the destination register to be the
-zero register for atomic operations that make use of ``SWP`` and ``LD<OP>``
-Assembly instructions. These include but are not limited to:
-
-.. table::
-
-  +-----------------------------------------+--------------------------------------+
-  | Atomic Operation                        | Assembly Sequence                    |
-  +=========================================+======================================+
-  | ``exchange(loc,val,sc)``                | ``MOV W4, #val;``                    |
-  |                                         | ``SWP W4, W10, [X1]``                |
-  +-----------------------------------------+--------------------------------------+
-  | ``fetch_add(loc,val,sc)``               | ``MOV W4, #val;``                    |
-  |                                         | ``LDADD W4, W10, [X1]``              |
-  +-----------------------------------------+--------------------------------------+
-
-Where ``X1`` contains the address of ``loc``.
+Read-Modify-Write atomics must not use the zero register
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-We annotate Mappings affected with ``*`` in section 4.2.
+``CAS``, ``SWP`` and ``LD<OP>`` instructions must not use the zero register as
+the destination when the result is unused, since doing so allows the read to be
+reordered past a ``DMB ISHLD`` barrier. Affected instructions are marked with ``*`` in section 4.2.
 
 Const-Qualified 128-bit Atomic Loads Should Be Marked Mutable
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1170,7 +1150,7 @@ in read-only memory (such as the ``.rodata`` section).
 
 Before LSE2, the only way to implement a single-copy 128-bit atomic load
 is by using a Read-Modify-Write sequence. The write is not visible to
-software if the memory is writeable. Compilers and runtimes should use the
+software if the memory is writeable. Compilers and runtimes should prefer the
 LSE2/LRCPC3 sequence when available.
 
 

From f32935316f283c74efedb0bec6b7ca7c9a8d917c Mon Sep 17 00:00:00 2001
From: lukeg101 <6547672+lukeg101@users.noreply.github.com>
Date: Mon, 19 Aug 2024 15:23:04 +0100
Subject: [PATCH 07/17] Addresses Ties Feedback

---
 atomicsabi64/atomicsabi64.rst    | 49 ++++++++++++-------------
 design-documents/atomics-ABI.rst | 63 ++++++++++++++++----------------
 2 files changed, 54 insertions(+), 58 deletions(-)

diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst
index 9530aa25..afc7894a 100644
--- a/atomicsabi64/atomicsabi64.rst
+++ b/atomicsabi64/atomicsabi64.rst
@@ -4,7 +4,7 @@
    See LICENSE file for details
 
 .. |release| replace:: 2024Q1
-.. |date-of-issue| replace:: 5\ :sup:`th` July 2024
+.. |date-of-issue| replace:: 19\ :sup:`th` August 2024
 .. |copyright-date| replace:: 2024
 .. |footer| replace:: Copyright © |copyright-date|, Arm Limited and its
                       affiliates. All rights reserved.
@@ -14,6 +14,7 @@
 .. _CPPABI64: https://github.com/ARM-software/abi-aa/releases
 .. _CSTD: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
 .. _PAPER: https://doi.org/10.1109/CGO57630.2024.10444836
+.. _OOPSLA: https://2024.splashcon.org/track/splash-2024-oopsla#event-overview
 
 *********************************************************************************************
 C/C++ Atomics Application Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture
@@ -47,10 +48,9 @@ Abstract
 
 This document describes the C/C++ Atomics Application Binary Interface for the
 Arm 64-bit architecture. This document concerns the valid Mappings from C/C++
-Atomic Operations to sequences of A64 instructions. For matters concerning the
-memory model, please consult §B2 of the Arm Architecture Reference Manual
-[ARMARM_]. We focus only on a subset of the C11 atomic operations at the time
-of writing.
+Atomic Operations to sequences of A64 instructions. Regarding the memory model, 
+please consult §B2 of the Arm Architecture Reference Manual [ARMARM_]. This 
+document only focusses on a subset of C11 atomic operations.
 
 Keywords
 --------
@@ -76,11 +76,11 @@ on GitHub
 Acknowledgement
 ---------------
 
-This document came about in the process of Luke Geeson’s PhD on testing the
+This ABI was written as part of Luke Geeson’s PhD on testing the
 compilation of concurrent C/C++ with assistance from Wilco Dijkstra from Arm's
 Compiler Teams.
 
-This ABI arises from a paper to appear at OOPSLA 2024:
+It is an offshoot of a paper that will be presented at OOPSLA 2024 [OOPSLA_]:
 *Mix Testing: Specifying and Testing ABI Compatibility Of C/C++ Atomics Implementations*
 by Luke Geeson, James Brotherston, Wilco Dijkstra, Alastair Donaldson, Lee Smith,
 Tyler Sorensen, and John Wickerson.
@@ -203,7 +203,7 @@ specifications:
    The content of this specification is a draft, and Arm considers the
    likelihood of future incompatible changes to be significant.
 
-All content in this document is at the **Alpha** quality level.
+All content in this document is at the **Release** quality level.
 
 Change History
 --------------
@@ -218,7 +218,7 @@ changes to the content of the document for that release.
   +---------+------------------------------+-------------------------------------------------------------------+
   | Issue   | Date                         | Change                                                            |
   +=========+==============================+===================================================================+
-  | 00alp0  | 5\ :sup:`th` July 2024.      | Beta release.                                                     |
+  | 00rel0  | 19\ :sup:`th` August 2024.   | Release.                                                          |
   +---------+------------------------------+-------------------------------------------------------------------+
   
 
@@ -282,7 +282,7 @@ Thread of Execution
    Synchronization Operations or other C language statements. The Arm
    Architecture Reference Manual [ARMARM_] calls these *Observers*. Typically a
    thread is defined as a function (e.g. a POSIX thread) although we do not
-   limit threads to such implementations.
+   limit threads to this type of implementation.
 
 Atomic Operation
    A C/C++ operation on a Shared-Memory Location. Typically either a load,
@@ -301,7 +301,7 @@ Concurrent Program
    Arm-based machines that run the A64 instruction set.
 
 Synchronization Operation
-   The order that atomic operations are executed by each Thread of Execution
+   The order in which atomic operations are executed by each Thread of Execution
    may not be the same as the order they are written in the program.
    Synchronization Operations are statements that constrain the order of
    accesses made to Shared-Memory Locations by each thread. Synchronization
@@ -347,10 +347,10 @@ Overview
 ========
 
 The C/C++ Atomics ABI for the Arm 64-bit architecture (AABI64) comprises the
-following sub-components.
+following sub-components:
 
-* The `Mappings from Atomic Operations to Assembly Sequences`_, which defines
-  the Mappings from C/C++ atomic operations to sto one of more Assembly 
+* The `Mappings from Atomic Operations to Assembly Sequences`_ defines
+  the Mappings from C/C++ atomic operations to the Assembly 
   Sequences that are interoperable with respect to each other.
 
 * A `Declarative statement of Mappings compatibility`_, as far as
@@ -367,13 +367,10 @@ Assembly Sequences. Since there is a large number of ways these Mappings may be
 combined, we break down the tables by the width of the access, and list
 compatible Assembly Sequences for each Atomic Operation.
 
-This is an open ABI, we encourage improvements to this specification to be
-submitted to the `issue tracker page on
+This is an open ABI. We encourage suggestions and improvements to this 
+specification to be submitted to the `issue tracker page on
 GitHub <https://github.com/ARM-software/abi-aa/issues>`_.
 
-These Mappings are not exhaustive, but aim to cover the atomics we have tested.
-Please request more atomics using the issue tracker.
-
 Notational Conventions
 ----------------------
 To reduce repetition, we use the following notational conventions
@@ -1125,12 +1122,12 @@ compare-exchange. The result is returned in ``X0`` and ``X1``.
 
 
 We do not list other variants of ``fetch_<op>`` since their Mappings should be
-the same (modulo implementations of <op> that are not in scope of this
+the same (modulo implementations of <op> that are not in scope for this
 document). Precisely, implementations that use loops should use the instructions
 that load or store from memory with the relevant memory order, and the
 appropriate <op> Assembly Sequence inside the loop. Exceptions, where Assembly 
-Sequences exist, are stated (for instance ``fetch_or`` can be implemented using
-``LDSETP`` when the LSE128 extension is enabled).
+Sequences exist, are stated. For instance ``fetch_or`` can be implemented using
+``LDSETP`` when the LSE128 extension is enabled.
 
 Special Cases
 -------------
@@ -1157,7 +1154,7 @@ LSE2/LRCPC3 sequence when available.
 Declarative statement of Mappings compatibility
 ===============================================
 
-To ensure that the above Mappings are ABI-compatible we tested the compilation of
+To ensure that the above Mappings are ABI-compatible we test the compilation of
 Concurrent Programs, where each Atomic Operation is compiled to one of the
 aforementioned Mappings. We test if there is a compiled program that exhibits
 an outcome of execution according to the AArch64 Memory Model contained in §B2
@@ -1168,7 +1165,7 @@ define the process by which we test compatibility.
 Definition of ABI-Compatibility for Atomic Operations
 -----------------------------------------------------
 
-*A compiler that implements the above set of Mappings and special cases is ABI-Compatible with
+*A compiler that implements these Mappings and special cases is ABI-Compatible with
 respect to other compilers that implement the Mappings and special cases.*
 
 We impose some constraints on this definition:
@@ -1183,10 +1180,10 @@ We impose some constraints on this definition:
   ABI-Compatibility of Concurrent Programs outside these bounds.
 * We test Concurrent Programs with a fixed initial state, loop unroll factor
   (equal to 1 loop unroll), and function calls or recursion. 
-* The above Mappings are not exhaustive, we recommend that Arm's partners
+* The above Mappings are not exhaustive. We recommend that Arm's partners
   submit requests for other Mappings to the ABI team using the `issue tracker page on GitHub <https://github.com/ARM-software/abi-aa/issues>`_.
 * This document makes no statement about the ABI-Compatibility of optimised
-  Concurrent Programs, nor does a statement concerning the performance of
+  Concurrent Programs, nor does it make a statement concerning the performance of
   compiled programs under the above Mappings when executed on a given Arm-based
   machine.
 * This document makes no statement about the ABI-Compatibility of compilers
diff --git a/design-documents/atomics-ABI.rst b/design-documents/atomics-ABI.rst
index 0b4f890c..f63c461e 100644
--- a/design-documents/atomics-ABI.rst
+++ b/design-documents/atomics-ABI.rst
@@ -89,7 +89,7 @@ We need a baseline ABI in order to determine if a given sub-ABI respects or depa
 from the baseline. Adding command-line options is a logical consequence of defining such an ABI, 
 and makes it possible to track ABI compatibility of concurrent programs at compile or link-time,
 rather than runtime. It is the responsibility of the sub-ABI maintainer to ensure code built
-under their ABI does not mix with code built under the baseline. But a baseline must exist, 
+under their ABI does not mix with code built under the baseline. But a baseline must exist 
 for sub-ABI compatibility to be decided in the first place.
 
 A baseline provides the means to describe or contain ABI-islands. Where a compiler implementation
@@ -97,12 +97,11 @@ departs from the baseline completely (an ABI-island), it would be the responsibi
 maintainer of that implementation to ensure their programs are not mixed with programs built for 
 baseline ABI compatibility, or provide adequate warnings at compile time. 
 
-Further, numerous different parties have asked the ABI team whether
-the same atomics mapping is correct. Writing down the known cases helps engineers
-answer these queries without the concurrency expertise required to come up with
-current compatible mappings. A future section of the ABI could document common
-queries received by the ABI team, in order to assist implementers and engineers
-with such issues.
+Further, numerous parties have asked the ABI team whether the same atomics mapping is correct. 
+Writing down the known cases helps engineers answer these queries without the concurrency 
+expertise required to come up with current compatible mappings. A future section of the ABI 
+could document common queries received by the ABI team, in order to assist implementers and 
+engineers with such issues.
 
 Backwards Compatibility and New Architecture Features
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -110,7 +109,7 @@ Backwards Compatibility and New Architecture Features
 Put another way, a baseline ABI assists in deciding whether new mappings are compatible
 with compiler implementations targeting older versions of the Armv8 architecture.
 Certain instructions (such as Load/Store-Pair instructions [ARMARM_]) have different
-single-copy atomicity guarantees with respect different architecture versions. A baseline
+single-copy atomicity guarantees with respect to different architecture versions. A baseline
 decides which assembly sequences can be composed correctly (at least as far as testing can decide).
 
 
@@ -119,11 +118,11 @@ Compatibility Between Compilers and Runtimes
 
 The above issues also apply when ensuring object files compiled with different compilers can be mixed. 
 For instance LLVM and GCC code should be interoperable. At the time of writing we identified a number of
-places where this does not apply, both when compiling to target the same architecture version, and mixing
+places where this does not apply, both when compiling to target the same architecture version, and when mixing
 different (compatible) architecture versions. Further, the above is not limited to statically compiled code. We found
-one instance where proposed mappings implemented in a JiT compiler would not be interoperable with respect
-to the statically compiled code the runtime links against. Even if a JiT compiles under one set of mappings,
-and is not subject to an ABI, it may still depend on other libraries or components that do have an ABI.
+one instance where proposed mappings implemented in a JiT compiler would not be interoperable with statically 
+compiled code the runtime links against. Even if a JiT compiles under one set of mappings, and is not subject to 
+an ABI, it may still depend on other libraries or components that do have an ABI.
 
 
 Constrain optimisations
@@ -168,7 +167,7 @@ possible outcomes (a reference for this notation is found here [PAPER_])::
 
 In this case the value read by the exchange on ``thread_1`` is not used, and a
 compiler is free to remove references to unused data. It is not legal according
-to this ABI for a compliant implementation piler to translate the program into
+to this ABI for a compliant implementation to translate the program into
 the following Assembly Sequences::
 
   thread_0:
@@ -204,14 +203,14 @@ Reference Manual [ARMARM_]::
 
 By comparing ``W3`` and the local variable ``r0`` of the original Concurrent
 Program we see there is one additional outcome of executing the compiled
-program that is not an outcome of executing the Concurrent Program. This is due
-to the fact that according to the Arm Architecture Reference Manual [ARMARM_] 
-*instructions where the destination register is WZR or XZR, are not regarded as
-doing a read for the purpose of a DMB LD barrier.*
+program that is not an outcome of executing the Concurrent Program. The Arm 
+Architecture Reference Manual [ARMARM_] states that *instructions where the 
+destination register is WZR or XZR, are not regarded as doing a read for the 
+purpose of a DMB LD barrier.*
 
 In this case the compiler introduces another outcome of Execution. To fix this
 issue, a compiler is not permitted to rewrite the destination register to be the
-zero register in this case::
+zero register::
 
   thread_0:
     MOV W9,#1
@@ -235,8 +234,8 @@ Reference Manual [ARMARM_]::
   { thread_1:r0=1; [y]=2; }
 
 As such the unexpected outcome has disappeared. There are multiple Mappings
-that exhibit this behaviour, those affected make use of ``SWP`` and ``LD<OP>``
-Assembly instructions.
+that exhibit this behaviour. The affected Assembly Sequences use ``SWP`` 
+and ``LD<OP>`` instructions.
 
 Documentation
 ~~~~~~~~~~~~~
@@ -254,13 +253,13 @@ The Mix Testing Process
 
 ABI compatibility must be testable. Concurrency is not trivial, and the ABI
 presents a simplification of part of the problem that is understandable by
-engineers. We provide novel, yet simple, techniques and tools for
-testing ABI compatibility. These techniques reduce the difficulty of checking
-compatibility from a problem of understanding concurrent executions, to the
-familiar testing domain of comparing program outcomes of tests. This document
-does not preclude other means of testing compatibility however.
+engineers. We provide a simple technique for testing ABI compatibility.
+This technique reduces the difficulty of checking compatibility from a 
+problem of understanding concurrent executions, to the familiar testing 
+domain of comparing program outcomes of tests. This document does not 
+preclude other means of testing compatibility.
 
-We test for Compiler bugs, a Compiler Bug is defined as an outcome of a
+We test for Compiler bugs. A Compiler Bug is defined as an outcome of a
 compiled program execution (under the AArch64 Memory Model contained in
 §B2 of the Arm Architecture Reference Manual [ARMARM_]) that is not 
 an outcome of execution of the source Concurrent Program (under the 
@@ -272,9 +271,9 @@ Concurrent Program finishes execution in one of three possible outcomes
   { thread_0:r0=1, thread_1:r0=0 }
   { thread_0:r0=1, thread_1:r0=1 }
 
-and one possible compiled program outcome has the following according to the
-AArch64 Memory Model contained in §B2 of the Arm Architecture Reference Manual
-[ARMARM_]::
+and one possible compiled program has the following outcomes 
+according to the AArch64 Memory Model contained in §B2 of the Arm 
+Architecture Reference Manual [ARMARM_]::
 
   { thread_0:X3=0, thread_1:X3=0 } <--- Forbidden by source model, Compiler Bug!
   { thread_0:X3=0, thread_1:X3=1 }
@@ -290,8 +289,8 @@ ensure compatibility we therefore test for the absence of such outcomes of the
 compiled programs when mixing all combinations of the above Mappings. We define
 the *Mix Testing* process as follows:
 
-#. Take an arbitrary Concurrent Program, when executed on the C/C++ memory
-   model will produce outcomes *S*.
+#. Take an arbitrary Concurrent Program. When executed on the C/C++ memory
+   model, it will produce outcomes *S*.
 #. Split out the individual Atomic Operations from the initial concurrent
    program into individual source files.
 #. Compile each individual source file containing an Atomic Operation 
@@ -303,6 +302,6 @@ the *Mix Testing* process as follows:
    contained in §B2 of the Arm Architecture Reference Manual [ARMARM_]. Get a
    *set* of compiled program outcomes *C*.
 #. If any compiled program set of outcomes *c* in *C* exhibits a Compiler Bug
+   (that is, *c* is not a subset of *S*), the given Mappings are not
+   (Check that *c* is a subset of *S*), the given Mappings are not
    interoperable. 
 

From 7bc4213ae40b1f97436db646d399e342fef79933 Mon Sep 17 00:00:00 2001
From: lukeg101 <6547672+lukeg101@users.noreply.github.com>
Date: Mon, 19 Aug 2024 15:34:18 +0100
Subject: [PATCH 08/17] Addresses Sally's Feedback

---
 atomicsabi64/atomicsabi64.rst    | 10 +++++-----
 design-documents/atomics-ABI.rst | 27 +++++++++++----------------
 2 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst
index afc7894a..dd56e109 100644
--- a/atomicsabi64/atomicsabi64.rst
+++ b/atomicsabi64/atomicsabi64.rst
@@ -48,9 +48,9 @@ Abstract
 
 This document describes the C/C++ Atomics Application Binary Interface for the
 Arm 64-bit architecture. This document concerns the valid Mappings from C/C++
-Atomic Operations to sequences of A64 instructions. Regarding the memory model, 
-please consult §B2 of the Arm Architecture Reference Manual [ARMARM_]. This 
-document only focusses on a subset of C11 atomic operations.
+Atomic Operations to sequences of A64 instructions. For further information 
+on the memory model, refer to §B2 of the Arm Architecture Reference Manual [ARMARM_]. 
+This document only focusses on a subset of C11 atomic operations.
 
 Keywords
 --------
@@ -243,8 +243,8 @@ This document refers to, or is referred to by, the following documents.
 
 
 
-Note: At the time of writing C23 is not released, as such ISO C17 is considered
-the latest published document.
+Note: At the time of writing, C23 is not released. Therefore, ISO C17 is considered 
+the most recently published document.
 
 .. raw:: pdf
 
diff --git a/design-documents/atomics-ABI.rst b/design-documents/atomics-ABI.rst
index f63c461e..3da9d71a 100644
--- a/design-documents/atomics-ABI.rst
+++ b/design-documents/atomics-ABI.rst
@@ -106,8 +106,7 @@ engineers with such issues.
 Backwards Compatibility and New Architecture Features
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Put another way, a baseline ABI assists in deciding whether new mappings are compatible
-with compiler implementations targeting older versions of the Armv8 architecture.
+Put another way, a baseline ABI helps decide whether new mappings are compatible.
 Certain instructions (such as Load/Store-Pair instructions [ARMARM_]) have different
 single-copy atomicity guarantees with respect to different architecture versions. A baseline
 decides which assembly sequences can be composed correctly (at least as far as testing can decide).
@@ -119,21 +118,17 @@ Compatibility Between Compilers and Runtimes
 The above issues also apply when ensuring object files compiled with different compilers can be mixed. 
 For instance LLVM and GCC code should be interoperable. At the time of writing we identified a number of
 places where this does not apply, both when compiling to target the same architecture version, and when mixing
-different (compatible) architecture versions. Further, the above is not limited to statically compiled code. We found
-one instance where proposed mappings implemented in a JiT compiler would not be interoperable with statically 
-compiled code the runtime links against. Even if a JiT compiles under one set of mappings, and is not subject to 
-an ABI, it may still depend on other libraries or components that do have an ABI.
+different (compatible) architecture versions. Further, the above issues are not limited to statically compiled 
+code. We found one instance where proposed mappings implemented in a JiT compiler would not be interoperable 
+with statically compiled code the runtime links against. Even if a JiT compiles under one set of mappings, and 
+is not subject to an ABI, it may still depend on other libraries or components that do have an ABI.
 
 
 Constrain optimisations
 ~~~~~~~~~~~~~~~~~~~~~~~
 
-There have been several instances where optimisations have been incorrectly applied,
-or attempts to apply optimisations to atomic code generation that induce unexpected
-concurrent program behaviour. This has happened frequently enough that we need to
-collect these cases together to outline why they should not occur. For example
-
-Consider the following Concurrent Program::
+Optimisations applied to atomic code generation have induced unexpected concurrent program behaviour often enough to justify collecting these cases and outlining why they should not occur. 
+For example, consider the following Concurrent Program::
 
   // Shared-Memory Locations
   _Atomic int* x;
@@ -203,10 +198,10 @@ Reference Manual [ARMARM_]::
 
 By comparing ``W3`` and the local variable ``r0`` of the original Concurrent
 Program we see there is one additional outcome of executing the compiled
-program that is not an outcome of executing the Concurrent Program. The Arm 
-Architecture Reference Manual [ARMARM_] states that *instructions where the 
-destination register is WZR or XZR, are not regarded as doing a read for the 
-purpose of a DMB LD barrier.*
+program that is not an outcome of executing the Concurrent Program. This is 
+because the Arm Architecture Reference Manual [ARMARM_] states that 
+*instructions where the destination register is WZR or XZR, are not regarded 
+as doing a read for the purpose of a DMB LD barrier.*
 
 In this case the compiler introduces another outcome of Execution. To fix this
 issue, a compiler is not permitted to rewrite the destination register to be the

From 9ac36144e2e2e9f3a97cece16b6c59805516740e Mon Sep 17 00:00:00 2001
From: Wilco Dijkstra <wdijkstr@arm.com>
Date: Mon, 19 Aug 2024 15:38:26 +0100
Subject: [PATCH 09/17] Cleanup Terms and Abbreviations.

---
 atomicsabi64/atomicsabi64.rst | 78 +++++++++--------------------------
 1 file changed, 19 insertions(+), 59 deletions(-)

diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst
index dd56e109..b241c2c4 100644
--- a/atomicsabi64/atomicsabi64.rst
+++ b/atomicsabi64/atomicsabi64.rst
@@ -48,9 +48,8 @@ Abstract
 
 This document describes the C/C++ Atomics Application Binary Interface for the
 Arm 64-bit architecture. This document concerns the valid Mappings from C/C++
-Atomic Operations to sequences of A64 instructions. For further information 
+Atomic Operations to sequences of AArch64 instructions. For further information 
 on the memory model, refer to §B2 of the Arm Architecture Reference Manual [ARMARM_]. 
-This document only focusses on a subset of C11 atomic operations.
 
 Keywords
 --------
@@ -256,9 +255,6 @@ Terms and Abbreviations
 The C/C++ Atomics ABI for the Arm 64-bit Architecture uses the following terms and
 abbreviations.
 
-A64
-   The instruction set available when in AArch64 state.
-
 AArch64
    The 64-bit general-purpose register width state of the Armv8 architecture.
 
@@ -277,67 +273,32 @@ ABI
 Arm-based
    ... based on the Arm architecture ...
 
-Thread of Execution
-   A unit of computation that executes one or more Atomic Operations,
-   Synchronization Operations or other C language statements. The Arm
-   Architecture Reference Manual [ARMARM_] calls these *Observers*. Typically a
-   thread is defined as a function (e.g. a POSIX thread) although we do not
-   limit threads to this type of implementation.
+Thread
+   A unit of computation (e.g. a POSIX thread) of a process, managed by the OS.
 
 Atomic Operation
-   A C/C++ operation on a Shared-Memory Location. Typically either a load,
-   store, exchange, compare, or arithmetic instruction (such as a fetch and add
-   operation). Atomics are used to define higher level primitives including
-   locks and concurrent queues. ISO C defines the range of supported atomic
-   operations and the ``atomic`` type. Operations on atomic-qualified data are
-   guaranteed not to be interrupted by another Thread of Execution.
+   An indivisible operation on a memory location. This can be a load, store,
+   exchange, compare, or arithmetic operation. Atomics may be used to define
+   higher level primitives including locks and concurrent queues. ISO C/C++
+   defines a range of supported atomic types and operations.
 
 Concurrent Program
-   A C or C++ program that consists of one or more Threads of Execution. Each
-   Thread of Execution must communicate with other threads in the Concurrent
-   Program through Shared-Memory Locations, using both Atomic Operations and
-   Non-Atomic Operations (Operations that lack the atomic qualifier) to be
-   deemed *concurrent*. This document focuses on compiling such programs for
-   Arm-based machines that run the A64 instruction set.
-
-Synchronization Operation
-   The order in which atomic operations are executed by each Thread of Execution
-   may not be the same as the order they are written in the program.
-   Synchronization Operations are statements that constrain the order of
-   accesses made to Shared-Memory Locations by each thread. Synchronization
-   Operations include Thread Fences.
-
-Shared-Memory Location
-   A memory location that can be accessed by any Thread of Execution in the
-   program.
+   A C or C++ program that consists of one or more threads. Threads may
+   communicate with each other through memory locations, using both Atomic
+   Operations and standard memory accesses.
 
 Memory Order Parameter
-   Describes a constraint on an Atomic Operation or Synchronization Operation.
-   Memory Order describes how memory accesses made by Atomic Operations may be
-   ordered with respect to other Atomic Operations and Synchronization
-   Operations. ISO C defines a ``memory_order`` enum type to capture the
-   possible memory order parameters.
-
-Thread Fence 
-   A Thread Fence is a Synchronization Operation that constrains the order of
-   Accesses made by Atomic Operations on a given Thread of Execution. Fences
-   are equipped with a Memory Order Parameter that specifies which kinds of
-   accesses may be reordered before or after the fence. ISO C defines the
-   ``atomic_thread_fence`` to synchronize the order of accesses made by atomic
-   operations on ``_Atomic`` qualified data.
+   The order of memory accesses as executed by each thread may not be the same
+   as the order they are written in the program. The Memory Order describes
+   how memory accesses are ordered with respect to other memory accesses or
+   Atomic Operations. ISO C/C++ defines a ``memory_order`` enum type for the set
+   of memory orders.
 
 Assembly Sequence
-   A sequence of A64 instructions, optionally including Atomic Instructions.
+   A sequence of AArch64 instructions.
 
 Mapping
-   A Mapping takes an Atomic Operation and Compiler Profile as input, 
-   producing an Assembly Sequence as output.
-
-Compiler Profile
-   A Compiler implementation and command-line flags or attributes that use
-   Mappings.
-
-More specific terminology is defined when it is first used.
+   A Mapping from an Atomic Operation to an Assembly Sequence.
 
 .. raw:: pdf
 
@@ -1174,9 +1135,8 @@ We impose some constraints on this definition:
   bounded testing. C/C++ Atomics ABI-compatibility is thus tested for the Mappings
   above by generating C/C++ Concurrent Programs that permute combinations of
   Atomic Operations on each Thread of Execution. We bound our test size between
-  2 and 5 Threads of Execution, where each Thread has at least 1 Atomic
-  Operation or Synchronization Operation and at most 5 Atomic Operations or
-  Synchronization Operations. We do not make any statement about the
+  2 and 5 threads, where each thread has at least 1 atomic operation or fence and
+  at most 5 atomic operations or fences. We do not make any statement about the
   ABI-Compatibility of Concurrent Programs outside these bounds.
 * We test Concurrent Programs with a fixed initial state, loop unroll factor
   (equal to 1 loop unroll), and function calls or recursion. 

From 13860e3de3840ad81fd83ae53ee65e3b24f6ac0c Mon Sep 17 00:00:00 2001
From: Wilco Dijkstra <wdijkstr@arm.com>
Date: Mon, 19 Aug 2024 16:35:59 +0100
Subject: [PATCH 10/17] More cleanups and changes from review comments.

---
 atomicsabi64/atomicsabi64.rst | 225 +++++++++++-----------------------
 1 file changed, 72 insertions(+), 153 deletions(-)
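
Reviewer note (illustrative only, not part of the change): the "Special Cases"
text touched below says that ``CAS``, ``SWP`` and ``LD<OP>`` must not use the
zero register when the result is unused, because that would allow the read to
be reordered past a ``DMB ISHLD`` barrier. A minimal C sketch of the source
pattern this rule protects is given here. The names ``counter``, ``data`` and
``bump_then_read`` are hypothetical, and which instruction a compiler selects
for the fetch-add (for example ``LDADD`` when FEAT_LSE is available) is an
assumption that depends on the compiler and its flags.

```c
#include <stdatomic.h>

/* Hypothetical shared locations, zero-initialised at program start. */
static _Atomic int counter;
static _Atomic int data;

/* The fetch-add result is deliberately discarded. If the compiler maps
 * this to an LD<OP>-style instruction, the Special Cases section requires
 * a real destination register rather than WZR, so that the read still
 * counts as a read for the acquire fence that follows. */
int bump_then_read(void)
{
    (void)atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);

    /* Listed in the fence table as DMB ISHLD. */
    atomic_thread_fence(memory_order_acquire);

    return atomic_load_explicit(&data, memory_order_relaxed);
}
```

Run on a single thread, the function simply increments ``counter`` and returns
the current value of ``data``. The ordering concern only shows up when other
threads access the same locations concurrently.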

diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst
index b241c2c4..fe559350 100644
--- a/atomicsabi64/atomicsabi64.rst
+++ b/atomicsabi64/atomicsabi64.rst
@@ -15,6 +15,7 @@
 .. _CSTD: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
 .. _PAPER: https://doi.org/10.1109/CGO57630.2024.10444836
 .. _OOPSLA: https://2024.splashcon.org/track/splash-2024-oopsla#event-overview
+.. _RATIONALE: https://github.com/ARM-software/abi-aa/design-documents/atomics-ABI.rst
 
 *********************************************************************************************
 C/C++ Atomics Application Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture
@@ -47,9 +48,9 @@ Abstract
 --------
 
 This document describes the C/C++ Atomics Application Binary Interface for the
-Arm 64-bit architecture. This document concerns the valid Mappings from C/C++
-Atomic Operations to sequences of AArch64 instructions. For further information 
-on the memory model, refer to §B2 of the Arm Architecture Reference Manual [ARMARM_]. 
+Arm 64-bit architecture. This document lists the valid Mappings from C/C++
+Atomic Operations to sequences of AArch64 instructions. For further information
+on the memory model, refer to §B2 of the Arm Architecture Reference Manual [ARMARM_].
 
 Keywords
 --------
@@ -219,7 +220,7 @@ changes to the content of the document for that release.
   +=========+==============================+===================================================================+
   | 00rel0  | 19\ :sup:`th` August 2024.   | Release.                                                          |
   +---------+------------------------------+-------------------------------------------------------------------+
-  
+
 
 References
 ----------
@@ -237,14 +238,14 @@ This document refers to, or is referred to by, the following documents.
   +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
   | AAELF64_    | ELF for the Arm 64-bit Architecture (AArch64)                | ELF for the Arm 64-bit Architecture (AArch64)                               |
   +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+  | CPPABI64_   | C++ ABI for the Arm 64-bit Architecture (AArch64)            | C++ ABI for the Arm 64-bit Architecture (AArch64)                           |
+  +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+  | RATIONALE_  | Rationale Document for C11 Atomics ABI                       | Rationale Document for C11 Atomics ABI                                      |
+  +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
   | PAPER_      | CGO paper                                                    | Compiler Testing with Relaxed Memory Models                                 |
   +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
 
 
-
-Note: At the time of writing, C23 is not released. Therefore, ISO C17 is considered 
-the most recently published document.
-
 .. raw:: pdf
 
    PageBreak
@@ -294,11 +295,8 @@ Memory Order Parameter
    Atomic Operations. ISO C/C++ defines a ``memory_order`` enum type for the set
    of memory orders.
 
-Assembly Sequence
-   A sequence of AArch64 instructions.
-
 Mapping
-   A Mapping from an Atomic Operation to an Assembly Sequence.
+   A Mapping from an Atomic Operation to a sequence of AArch64 instructions.
 
 .. raw:: pdf
 
@@ -307,78 +305,22 @@ Mapping
 Overview
 ========
 
-The C/C++ Atomics ABI for the Arm 64-bit architecture (AABI64) comprises the
-following sub-components:
-
-* The `Mappings from Atomic Operations to Assembly Sequences`_ defines
-  the Mappings from C/C++ atomic operations to the Assembly 
-  Sequences that are interoperable with respect to each other.
+`AArch64 atomics`_ defines the Mappings from C/C++ atomic operations
+to AArch64 that are interoperable.
 
-* A `Declarative statement of Mappings compatibility`_, as far as
-  non-exhaustive testing can validate, that the aforementioned Mappings can be
-  used together. That is, there is no tested combination of Mappings that
-  induces unexpected program behaviour when a compiled program that uses
-  atomics is executed on a multi-core Arm-based machine.
+Arbitrary registers may be used in the Mappings. Instructions marked with ``*``
+in the tables cannot use ``WZR`` or ``XZR`` as a destination register. This is
+further detailed in `Special Cases`_.
 
-Mappings from Atomic Operations to Assembly Sequences
-=====================================================
+Only some variants of ``fetch_<op>`` are listed since the Mappings are identical
+except for a different ``<op>``.
 
-We now describe the compatible Mappings for C/C++ Atomic Operations and
-Assembly Sequences. Since there is a large number of ways these Mappings may be
-combined, we break down the tables by the width of the access, and list
-compatible Assembly Sequences for each Atomic Operation.
-
-This is an open ABI, we encourage suggestions and improvements to this 
-specification to be submitted to the `issue tracker page on
-GitHub <https://github.com/ARM-software/abi-aa/issues>`_.
-
-Notational Conventions
-----------------------
-To reduce repetition, we use the following notational conventions
-
-.. table::
-
-  +-----------------------------------------+--------------------------------------+
-  | Memory Order Parameter                  | Notation                             | 
-  +=========================================+======================================+
-  | ``memory_order_relaxed``                | ``relaxed``                          |
-  +-----------------------------------------+--------------------------------------+
-  | ``memory_order_acquire``                | ``acquire``                          |
-  +-----------------------------------------+--------------------------------------+
-  | ``memory_order_release``                | ``release``                          |
-  +-----------------------------------------+--------------------------------------+
-  | ``memory_order_acq_rel``                | ``acq_rel``                          |
-  +-----------------------------------------+--------------------------------------+
-  | ``memory_order_seq_cst``                | ``seq_cst``                          |
-  +-----------------------------------------+--------------------------------------+
-
-Arbitrary registers may be used in the Assembly Sequences that may change in
-compiler implementations. Cases where arbitrary registers may *not* be used are
-covered in the Special Cases section.
-
-Further, in what follows there may be multiple valid Mappings from Atomic
-Operation to Assembly Sequence, as made available by a given architecture
-extension. In this case we split the rows of the table to represent multiple
-options.
-
-.. table::
-
-  +--------------------------------------------------------+--------------------------------------+
-  | Atomic Operation                                       | Assembly Sequence                    | 
-  +============================================+===========+======================================+
-  | ``atomic_store_explicit(loc,val,relaxed)`` | ARCH1     | ``option A``                         |
-  +                                            +-----------+--------------------------------------+
-  |                                            | ARCH2     | ``option B``                         |
-  +--------------------------------------------+-----------+--------------------------------------+
-
-Where ARCH is either the base architecture (Armv8-A) or an extension like FEAT_LSE.
-
-Lastly, all operations are in a shorthand form:
+Atomic operations and Memory Order are abbreviated as follows:
 
 .. table::
 
   +----------------------------------------------------+--------------------------------------+
-  | Atomic Operation                                   | ShortHand Atomic Operation           | 
+  | Atomic Operation                                   | Short form                           |
   +====================================================+======================================+
   | ``atomic_store_explicit(...)``                     | ``store(...)``                       |
   +----------------------------------------------------+--------------------------------------+
@@ -388,17 +330,54 @@ Lastly, all operations are in a shorthand form:
   +----------------------------------------------------+--------------------------------------+
   | ``atomic_exchange_explicit(...)``                  | ``exchange(...)``                    |
   +----------------------------------------------------+--------------------------------------+
-  | ``atomic_fetch_add_explicit(...)``                 | ``fetch_add(...)``                   | 
+  | ``atomic_fetch_add_explicit(...)``                 | ``fetch_add(...)``                   |
   +----------------------------------------------------+--------------------------------------+
-  | ``atomic_fetch_sub_explicit(...)``                 | ``fetch_sub(...)``                   | 
+  | ``atomic_fetch_sub_explicit(...)``                 | ``fetch_sub(...)``                   |
   +----------------------------------------------------+--------------------------------------+
-  | ``atomic_fetch_or_explicit(...)``                  | ``fetch_or(...)``                    | 
+  | ``atomic_fetch_or_explicit(...)``                  | ``fetch_or(...)``                    |
   +----------------------------------------------------+--------------------------------------+
-  | ``atomic_fetch_xor_explicit(...)``                 | ``fetch_xor(...)``                   | 
+  | ``atomic_fetch_xor_explicit(...)``                 | ``fetch_xor(...)``                   |
   +----------------------------------------------------+--------------------------------------+
-  | ``atomic_fetch_and_explicit(...)``                 | ``fetch_and(...)``                   | 
+  | ``atomic_fetch_and_explicit(...)``                 | ``fetch_and(...)``                   |
   +----------------------------------------------------+--------------------------------------+
 
+.. table::
+
+  +----------------------------------------------------+--------------------------------------+
+  | Memory Order Parameter                             | Short form                           |
+  +====================================================+======================================+
+  | ``memory_order_relaxed``                           | ``relaxed``                          |
+  +----------------------------------------------------+--------------------------------------+
+  | ``memory_order_acquire``                           | ``acquire``                          |
+  +----------------------------------------------------+--------------------------------------+
+  | ``memory_order_release``                           | ``release``                          |
+  +----------------------------------------------------+--------------------------------------+
+  | ``memory_order_acq_rel``                           | ``acq_rel``                          |
+  +----------------------------------------------------+--------------------------------------+
+  | ``memory_order_seq_cst``                           | ``seq_cst``                          |
+  +----------------------------------------------------+--------------------------------------+
+
+If there are multiple Mappings for an Atomic Operation, the rows of the table
+show the options:
+
+.. table::
+
+  +----------------------------------------------------+--------------------------------------+
+  | Atomic Operation                                   | AArch64                              |
+  +========================================+===========+======================================+
+  | ``store(loc,val,relaxed)``             | ARCH1     | ``option A``                         |
+  +                                        +-----------+--------------------------------------+
+  |                                        | ARCH2     | ``option B``                         |
+  +----------------------------------------+-----------+--------------------------------------+
+
+Where ARCH is either the base architecture (Armv8-A) or an extension like FEAT_LSE.
+
+
+Suggestions and improvements to this specification may be submitted to:
+`issue tracker page on GitHub <https://github.com/ARM-software/abi-aa/issues>`_.
+
+AArch64 atomics
+===============
 
 Mappings for 32-bit types
 -------------------------
@@ -410,7 +389,7 @@ returned in ``W0``.
 .. table::
 
   +-----------------------------------------------------+--------------------------------------+
-  | Atomic Operation                                    | Assembly Sequence                    |
+  | Atomic Operation                                    | AArch64                              |
   +=====================================================+======================================+
   | ``store(loc,val,relaxed)``                          | .. code-block:: none                 |
   |                                                     |                                      |
@@ -602,17 +581,13 @@ returned in ``W0``.
   |                                     |               |                                      |
   |                                     |               |    CASAL  W0, W2, [X1] *             |
   +-------------------------------------+---------------+--------------------------------------+
-  | Note                                                                                       |
-  +--------------------------------------------------------------------------------------------+
-  | ``*`` Using ``WZR`` or ``XZR`` for the destination register is invalid (Section 4.7).      |
-  +--------------------------------------------------------------------------------------------+
 
 
 Mappings for 8-bit types
 ------------------------
 
 The Mappings for 8-bit types are the same as 32-bit types except they use the
-``B`` variants of instructions. 
+``B`` variants of instructions.
 
 
 Mappings for 16-bit types
@@ -634,14 +609,14 @@ Since the access width of 128-bit types is double that of the 64-bit register
 width, the following Mappings use *pair* instructions, which require their own
 table.
 
-In what follows, register ``X4`` contains the location ``loc``, ``X2`` and 
+In what follows, register ``X4`` contains the location ``loc``, ``X2`` and
 ``X3`` contain the input value ``val``. ``X0`` and ``X1`` contain input ``exp`` in
 compare-exchange. The result is returned in ``X0`` and ``X1``.
 
 .. table::
 
   +-----------------------------------------------------+--------------------------------------+
-  | Atomic Operation                                    | Assembly Sequence                    |
+  | Atomic Operation                                    | AArch64                              |
   +=====================================+===============+======================================+
   | ``store(loc,val,relaxed)``          | ``Armv8-A``   | .. code-block:: none                 |
   |                                     |               |                                      |
@@ -1080,80 +1055,24 @@ compare-exchange. The result is returned in ``X0`` and ``X1``.
 
 
 
-
-
-We do not list other variants of ``fetch_<op>`` since their Mappings should be
-the same (modulo implementations of <op> that are not in scope for this
-document). Precisely, implementations that use loops should use the instructions
-that load or store from memory with the relevant memory order, and the
-appropriate <op> Assembly Sequence inside the loop. Exceptions, where Assembly 
-Sequences exist, are stated. For instance ``fetch_or`` can be implemented using
-``LDSETP`` when the LSE128 extension is enabled.
-
 Special Cases
--------------
+=============
 
 Read-Modify-Write atomics must not use the zero register
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+--------------------------------------------------------
 
 ``CAS``, ``SWP`` and ``LD<OP>`` instructions must not use the zero register if
 the result is not used since it allows reordering of the read past a
-``DMB ISHLD`` barrier. Affected instructions are marked with ``*`` in section 4.2.
+``DMB ISHLD`` barrier. Affected instructions are marked with ``*``.
 
-Const-Qualified 128-bit Atomic Loads Should Be Marked Mutable
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Const-Qualified 128-bit Atomic Loads
+------------------------------------
 
 Const-qualified data containing 128-bit atomic types should not be placed
 in read-only memory (such as the ``.rodata`` section).
 
-Before LSE2, the only way to implement a single-copy 128-bit atomic load
+Before FEAT_LSE2, the only way to implement a single-copy 128-bit atomic load
 is by using a Read-Modify-Write sequence. The write is not visible to
 software if the memory is writeable. Compilers and runtimes should prefer the
-LSE2/LRCPC3 sequence when available.
-
-
-Declarative statement of Mappings compatibility
-===============================================
-
-To ensure that the above Mappings are ABI-compatible we test the compilation of
-Concurrent Programs, where each Atomic Operation is compiled to one of the
-aforementioned Mappings. We test if there is a compiled program that exhibits
-an outcome of execution according to the AArch64 Memory Model contained in §B2
-of the Arm Architecture Reference Manual [ARMARM_] that is not an outcome of
-execution of the source program under the ISO C model. In this section we
-define the process by which we test compatibility. 
-
-Definition of ABI-Compatibility for Atomic Operations
------------------------------------------------------
-
-*A compiler that implement these Mappings and special cases is ABI-Compatible with
-respect to other compilers that implement the Mappings and special cases.*
-
-We impose some constraints on this definition:
-
-* This is not a correctness guarantee, but rather a statement backed up by
-  bounded testing. C/C++ Atomics ABI-compatibility is thus tested for the Mappings
-  above by generating C/C++ Concurrent Programs that permute combinations of
-  Atomic Operations on each Thread of Execution. We bound our test size between
-  2 and 5 threads, where each thread has at least 1 atomic operation or fence and
-  at most 5 atomic operations or fences. We do not make any statement about the
-  ABI-Compatibility of Concurrent Programs outside these bounds.
-* We test Concurrent Programs with a fixed initial state, loop unroll factor
-  (equal to 1 loop unroll), and function calls or recursion. 
-* The above Mappings are not exhaustive. We recommend that Arm's partners
-  submit requests for other Mappings to the ABI team using the `issue tracker page on GitHub <https://github.com/ARM-software/abi-aa/issues>`_.
-* This document makes no statement about the ABI-Compatibility of optimised
-  Concurrent Programs, nor does it make a statement concerning the performance of
-  compiled programs under the above Mappings when executed on a given Arm-based
-  machine.
-* This document makes no statement about the ABI-Compatibility of compilers
-  that implement Mappings other than what is stated in this document.
-
-Appendix: Mix Testing
-=====================
-
-The status of this appendix is informative.
-
-
-
+FEAT_LSE2/FEAT_LRCPC3 sequence when available.
 

From caf08b8ca5aa13209e0edffbc8d0f9e2347cc6a2 Mon Sep 17 00:00:00 2001
From: Wilco Dijkstra <wdijkstr@arm.com>
Date: Thu, 22 Aug 2024 15:18:21 +0100
Subject: [PATCH 11/17] Further cleanups, split off fences.

---
 atomicsabi64/atomicsabi64.rst | 85 +++++++++++++++++++----------------
 1 file changed, 47 insertions(+), 38 deletions(-)
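
Reviewer note (illustrative only, not part of the change): the new
"Synchronization Fences" table maps ``atomic_thread_fence(release)`` to
``DMB ISH`` and ``atomic_thread_fence(acquire)`` to ``DMB ISHLD``. As a rough
sketch of where those two fences sit in source code, here is a simple
publish/consume pair in C. The names ``flag``, ``payload``, ``publish`` and
``try_consume`` are hypothetical, and the instruction comments repeat the
table entries rather than guaranteed compiler output.

```c
#include <stdatomic.h>

/* Hypothetical shared state: a plain payload guarded by an atomic flag. */
static _Atomic int flag;
static int payload;

/* Producer side: write the payload, then publish it.
 * The release fence orders the payload write before the flag store. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);   /* table entry: DMB ISH */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

/* Consumer side: returns 1 and fills *out once the flag is seen set,
 * otherwise returns 0 and leaves *out untouched.
 * The acquire fence orders the flag load before the payload read. */
int try_consume(int *out)
{
    if (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        return 0;
    atomic_thread_fence(memory_order_acquire);   /* table entry: DMB ISHLD */
    *out = payload;
    return 1;
}
```

In a real program ``publish`` and ``try_consume`` would run on different
threads. Called in sequence on one thread they behave like an ordinary
write followed by a read.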

diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst
index fe559350..ce03c411 100644
--- a/atomicsabi64/atomicsabi64.rst
+++ b/atomicsabi64/atomicsabi64.rst
@@ -48,7 +48,7 @@ Abstract
 --------
 
 This document describes the C/C++ Atomics Application Binary Interface for the
-Arm 64-bit architecture. This document lists the valid Mappings from C/C++
+Arm 64-bit architecture. This document lists the valid mappings from C/C++
 Atomic Operations to sequences of AArch64 instructions. For further information
 on the memory model, refer to §B2 of the Arm Architecture Reference Manual [ARMARM_].
 
@@ -296,7 +296,7 @@ Memory Order Parameter
    of memory orders.
 
 Mapping
-   A Mapping from an Atomic Operation to a sequence of AArch64 instructions.
+   A mapping from an Atomic Operation to a sequence of AArch64 instructions.
 
 .. raw:: pdf
 
@@ -305,14 +305,14 @@ Mapping
 Overview
 ========
 
-`AArch64 atomics`_ defines the Mappings from C/C++ atomic operations
+`AArch64 atomic mappings`_ defines the mappings from C/C++ atomic operations
 to AArch64 that are interoperable.
 
-Arbitrary registers may be used in the Mappings. Instructions marked with ``*``
+Arbitrary registers may be used in the mappings. Instructions marked with ``*``
 in the tables cannot use ``WZR`` or ``XZR`` as a destination register. This is
 further detailed in `Special Cases`_.
 
-Only some variants of ``fetch_<op>`` are listed since the Mappings are identical
+Only some variants of ``fetch_<op>`` are listed since the mappings are identical
 except for a different ``<op>``.
 
 Atomic operations and Memory Order are abbreviated as follows:
@@ -357,7 +357,7 @@ Atomic operations and Memory Order are abbreviated as follows:
   | ``memory_order_seq_cst``                           | ``seq_cst``                          |
   +----------------------------------------------------+--------------------------------------+
 
-If there are multiple Mappings for an Atomic Operation, the rows of the table
+If there are multiple mappings for an Atomic Operation, the rows of the table
 show the options:
 
 .. table::
@@ -376,11 +376,34 @@ Where ARCH is either the base architecture (Armv8-A) or an extension like FEAT_L
 Suggestions and improvements to this specification may be submitted to:
 `issue tracker page on GitHub <https://github.com/ARM-software/abi-aa/issues>`_.
 
-AArch64 atomics
-===============
 
-Mappings for 32-bit types
--------------------------
+
+AArch64 atomic mappings
+=======================
+
+Synchronization Fences
+----------------------
+
+  +-----------------------------------------------------+--------------------------------------+
+  | Fence                                               | AArch64                              |
+  +=====================================================+======================================+
+  | ``atomic_thread_fence(relaxed)``                    | .. code-block:: none                 |
+  |                                                     |                                      |
+  |                                                     |    NOP                               |
+  +-----------------------------------------------------+--------------------------------------+
+  | ``atomic_thread_fence(acquire)``                    | .. code-block:: none                 |
+  |                                                     |                                      |
+  |                                                     |    DMB ISHLD                         |
+  +-----------------------------------------------------+--------------------------------------+
+  | ``atomic_thread_fence(release)``                    | .. code-block:: none                 |
+  |                                                     |                                      |
+  | ``atomic_thread_fence(acq_rel)``                    |    DMB ISH                           |
+  |                                                     |                                      |
+  | ``atomic_thread_fence(seq_cst)``                    |                                      |
+  +-------------------------------------+---------------+--------------------------------------+
+
+32-bit types
+------------
 
 In what follows, register ``X1`` contains the location ``loc`` and ``W2``
 contains ``val``. ``W0`` contains input ``exp`` in compare-exchange.  The result is
@@ -414,20 +437,6 @@ returned in ``W0``.
   | ``load(loc,seq_cst)``                               | .. code-block:: none                 |
   |                                                     |                                      |
   |                                                     |    LDAR   W2, [X1]                   |
-  +-----------------------------------------------------+--------------------------------------+
-  | ``fence(relaxed)``                                  | .. code-block:: none                 |
-  |                                                     |                                      |
-  |                                                     |    NOP                               |
-  +-----------------------------------------------------+--------------------------------------+
-  | ``fence(acquire)``                                  | .. code-block:: none                 |
-  |                                                     |                                      |
-  |                                                     |    DMB ISHLD                         |
-  +-----------------------------------------------------+--------------------------------------+
-  | ``fence(release)``                                  | .. code-block:: none                 |
-  |                                                     |                                      |
-  | ``fence(acq_rel)``                                  |    DMB ISH                           |
-  |                                                     |                                      |
-  | ``fence(seq_cst)``                                  |                                      |
   +-------------------------------------+---------------+--------------------------------------+
   | ``exchange(loc,val,relaxed)``       | ``Armv8-A``   | .. code-block:: none                 |
   |                                     |               |                                      |
@@ -583,30 +592,30 @@ returned in ``W0``.
   +-------------------------------------+---------------+--------------------------------------+
 
 
-Mappings for 8-bit types
-------------------------
+8-bit types
+-----------
 
-The Mappings for 8-bit types are the same as 32-bit types except they use the
+The mappings for 8-bit types are the same as 32-bit types except they use the
 ``B`` variants of instructions.
 
 
-Mappings for 16-bit types
--------------------------
+16-bit types
+------------
 
-The Mappings for 16-bit types are the same as 32-bit types except they use the
+The mappings for 16-bit types are the same as 32-bit types except they use the
 ``H`` variants of instructions.
 
-Mappings for 64-bit types
--------------------------
+64-bit types
+------------
 
-The Mappings for 64-bit types are the same as 32-bit types except the registers
+The mappings for 64-bit types are the same as 32-bit types except the registers
 used are X-registers.
 
-Mappings for 128-bit types
---------------------------
+128-bit types
+-------------
 
 Since the access width of 128-bit types is double that of the 64-bit register
-width, the following Mappings use *pair* instructions, which require their own
+width, the following mappings use *pair* instructions, which require their own
 table.
 
 In what follows, register ``X4`` contains the location ``loc``, ``X2`` and
@@ -1058,8 +1067,8 @@ compare-exchange. The result is returned in ``X0`` and ``X1``.
 Special Cases
 =============
 
-Read-Modify-Write atomics must not use the zero register
---------------------------------------------------------
+Unused result in Read-Modify-Write atomics
+------------------------------------------
 
 ``CAS``, ``SWP`` and ``LD<OP>`` instructions must not use the zero register if
 the result is not used since it allows reordering of the read past a

From 42cf2978820325ad969841eee4a69100665aaf12 Mon Sep 17 00:00:00 2001
From: lukeg101 <6547672+lukeg101@users.noreply.github.com>
Date: Wed, 28 Aug 2024 16:33:47 +0100
Subject: [PATCH 12/17] Ties Feedback on phrasing

---
 design-documents/atomics-ABI.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/design-documents/atomics-ABI.rst b/design-documents/atomics-ABI.rst
index 3da9d71a..e8a44949 100644
--- a/design-documents/atomics-ABI.rst
+++ b/design-documents/atomics-ABI.rst
@@ -266,7 +266,7 @@ Concurrent Program finishes execution in one of three possible outcomes
   { thread_0:r0=1, thread_1:r0=0 }
   { thread_0:r0=1, thread_1:r0=1 }
 
-and one possible compiled program outcome has the following outcomes 
+and one execution of the compiled program has the following possible outcomes
 according to the AArch64 Memory Model contained in §B2 of the Arm 
 Architecture Reference Manual [ARMARM_]::
 

From 7ba0888c2d3d77f905ad885ce6997d85d7335b2c Mon Sep 17 00:00:00 2001
From: lukeg101 <6547672+lukeg101@users.noreply.github.com>
Date: Wed, 28 Aug 2024 16:58:42 +0100
Subject: [PATCH 13/17] peter feedback

---
 design-documents/atomics-ABI.rst | 55 +++++++++++++++++++++-----------
 1 file changed, 36 insertions(+), 19 deletions(-)

diff --git a/design-documents/atomics-ABI.rst b/design-documents/atomics-ABI.rst
index e8a44949..8f7725b9 100644
--- a/design-documents/atomics-ABI.rst
+++ b/design-documents/atomics-ABI.rst
@@ -5,10 +5,26 @@
 
 .. _ARMARM: https://developer.arm.com/documentation/ddi0487/latest
 .. _PAPER: https://doi.org/10.1109/CGO57630.2024.10444836
+.. _ATOMICS64: https://github.com/ARM-software/abi-aa/atomicsabi64/atomicsabi64.rst
 
 Rationale Document for C11 Atomics ABI.
 ***************************************
 
+Scope
+=====
+
+This document contains the design rationale for C/C++ Atomics Application 
+Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture 
+defined in (ATOMICS64_). Nothing in this document
+is part of the specification. The purpose is to record the rationale
+for the specification as well as alternatives that were considered.
+Any contradictions between this rationale and the specification shall
+be resolved in favor of the specification.
+
+This document assumes that the reader is familiar with (ATOMICS64_)
+and the 32-bit build attributes defined in (ATOMICS64_) and will use
+concepts defined in these documents.
+
 Preamble
 ========
 
@@ -24,19 +40,19 @@ make:
 - We need to choose a baseline ABI (a set of mappings), that is compatible for all versions of the Armv8 architecture.
 - The mappings should cover atomic accesses of various sign, size, and type accessible through C11 atomic operations using compiler profiles.
 
-The main trade-offs we have identified or have been made aware of are:
+We have identified the following trade-offs:
 
 - Performance of different mappings versus compatibility with all architectures.
 - Whether certain compiler operations lead to unexpected behaviours.
 
 As motivated by the use cases expanded upon below:
 
-- The need for a baseline ABI
-- Knowing when an implementation departs from that baseline
-- Backwards compatibility of atomics as new mappings are added
-- Compatibility between compilers and runtimes
-- The need to constrain optimisations on specific atomic operations
-- Documenting the interoperable mappings
+- The need for a baseline ABI.
+- Knowing when an implementation departs from that baseline.
+- Backwards compatibility of atomics as new mappings are added.
+- Compatibility between compilers and runtimes.
+- The need to constrain optimisations on specific atomic operations.
+- Documenting the interoperable mappings.
 - providing a basis upon which ABI compatibility can be tested.
 
 References
@@ -59,18 +75,18 @@ This document refers to, or is referred to by, the following documents.
 Note: At the time of writing C23 is not released, as such ISO C17 is considered
 the latest published document.
 
-Use-cases known of so far
--------------------------
+Known use-cases
+---------------
 
 
 A Baseline: Describing current implementations
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The ABI we provide is a baseline specification that compilers should or do implement.
-The ABI provides a grounds to be compatible across all versions of the Armv8 architecture. Most
-of the mappings in the ABI are already implemented in LLVM and GCC and this ABI ratifies
-a decade of established practice, and provides alternatives where the current practice
-is incompatible.
+The ABI we provide is a baseline specification that compilers should implement.
+Compilers that implement the baseline specification are compatible across all versions 
+of the Armv8 architecture. Most of the mappings in the ABI are already implemented in 
+LLVM and GCC and this ABI ratifies a decade of established practice, and provides 
+alternatives where the current practice is incompatible.
 
 
 Sub-ABIs and ABI-islands: Departing from the baseline (or 'mainland')
@@ -88,14 +104,15 @@ unintentionally introduced into compilers when new mappings are added.
 We need a baseline ABI in order to determine if a given sub-ABI respects or departs
 from the baseline. Adding command-line options is a logical consequence of defining such an ABI, 
 and makes it possible to track ABI compatibility of concurrent programs at compile or link-time,
-rather than runtime. It is the responsibility of the sub-ABI maintainer to ensure code built
+rather than runtime. It is the responsibility of the sub-ABI user to ensure code built
 under their ABI does not mix with code built under the baseline. But a baseline must exist 
 for sub-ABI compatibility to be decided in the first place.
 
-A baseline provides the means to describe or contain ABI-islands. Where a compiler implementation
-departs from the baseline completely (an ABI-island), it would be the responsibility of the
-maintainer of that implementation to ensure their programs are not mixed with programs built for 
-baseline ABI compatibility, or provide adequate warnings at compile time. 
+Where a compiler implementation departs from the baseline completely (an ABI-island), 
+Arm cannot provide any statement on the compatibility of the extensions with respect 
+to the baseline specification. Because an ABI-island could be a known incompatibility
+with the baseline, users should not mix ABIs. It is a quality-of-implementation (QoI) matter whether a toolchain is
+able to diagnose incompatibility.
 
 Further, numerous parties have asked the ABI team whether the same atomics mapping is correct. 
 Writing down the known cases helps engineers answer these queries without the concurrency 

From 0e385573d086883c20c5c0b285e1512897ee6f36 Mon Sep 17 00:00:00 2001
From: lukeg101 <6547672+lukeg101@users.noreply.github.com>
Date: Wed, 28 Aug 2024 17:02:25 +0100
Subject: [PATCH 14/17] peter feedback

---
 design-documents/atomics-ABI.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/design-documents/atomics-ABI.rst b/design-documents/atomics-ABI.rst
index 8f7725b9..902c4467 100644
--- a/design-documents/atomics-ABI.rst
+++ b/design-documents/atomics-ABI.rst
@@ -116,7 +116,7 @@ able to diagnose incompatibility.
 
 Further, numerous parties have asked the ABI team whether the same atomics mapping is correct. 
 Writing down the known cases helps engineers answer these queries without the concurrency 
-expertise required to come up with current compatible mappings. A future section of the ABI 
+expertise required to come up with current compatible mappings. A future section of this document 
 could document common queries received by the ABI team, in order to assist implementers and 
 engineers with such issues.
 

From 93734e8b96db7b7d7419769d13df1c1533a58d5c Mon Sep 17 00:00:00 2001
From: lukeg101 <6547672+lukeg101@users.noreply.github.com>
Date: Thu, 29 Aug 2024 11:47:32 +0100
Subject: [PATCH 15/17] alpha release

---
 atomicsabi64/atomicsabi64.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst
index ce03c411..5e769ffd 100644
--- a/atomicsabi64/atomicsabi64.rst
+++ b/atomicsabi64/atomicsabi64.rst
@@ -218,7 +218,7 @@ changes to the content of the document for that release.
   +---------+------------------------------+-------------------------------------------------------------------+
   | Issue   | Date                         | Change                                                            |
   +=========+==============================+===================================================================+
-  | 00rel0  | 19\ :sup:`th` August 2024.   | Release.                                                          |
+  | 00alp0  | 19\ :sup:`th` August 2024.   | Alpha Release.                                                    |
   +---------+------------------------------+-------------------------------------------------------------------+
 
 

From e5076e3cfbe87c7fef17f8f1f4e1fba7cfa38451 Mon Sep 17 00:00:00 2001
From: lukeg101 <6547672+lukeg101@users.noreply.github.com>
Date: Thu, 29 Aug 2024 12:31:02 +0100
Subject: [PATCH 16/17] Ties II

---
 atomicsabi64/atomicsabi64.rst    |  2 +-
 design-documents/atomics-ABI.rst | 28 +++++++++++++++-------------
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst
index 5e769ffd..20ae47bd 100644
--- a/atomicsabi64/atomicsabi64.rst
+++ b/atomicsabi64/atomicsabi64.rst
@@ -373,7 +373,7 @@ show the options:
 Where ARCH is either the base architecture (Armv8-A) or an extension like FEAT_LSE.
 
 
-Suggestions and improvements to this specification may be submitted to:
+Suggestions and improvements to this specification may be submitted to the
 `issue tracker page on GitHub <https://github.com/ARM-software/abi-aa/issues>`_.
 
 
diff --git a/design-documents/atomics-ABI.rst b/design-documents/atomics-ABI.rst
index 902c4467..2f12ecdd 100644
--- a/design-documents/atomics-ABI.rst
+++ b/design-documents/atomics-ABI.rst
@@ -15,14 +15,14 @@ Scope
 
 This document contains the design rationale for C/C++ Atomics Application 
 Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture 
-defined in (ATOMICS64_). Nothing in this document
+defined in ATOMICS64_. Nothing in this document
 is part of the specification. The purpose is to record the rationale
 for the specification as well as alternatives that were considered.
 Any contradictions between this rationale and the specification shall
 be resolved in favor of the specification.
 
-This document assumes that the reader is familiar with (ATOMICS64_)
-and the 32-bit build attributes defined in (ATOMICS64_) and will use
+This document assumes that the reader is familiar with ATOMICS64_
+and the 32-bit build attributes defined in ATOMICS64_ and will use
 concepts defined in these documents.
 
 Preamble
@@ -45,7 +45,7 @@ We have identified the following trade-offs:
 - Performance of different mappings versus compatibility with all architectures.
 - Whether certain compiler operations lead to unexpected behaviours.
 
-As motivated by the use cases expanded upon below:
+The use cases expanded upon below motivate why we need an atomics ABI:
 
 - The need for a baseline ABI.
 - Knowing when an implementation departs from that baseline.
@@ -62,13 +62,15 @@ This document refers to, or is referred to by, the following documents.
 
 .. table::
 
-  +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
-  | Ref         | External reference or URL                                    | Title                                                                       |
-  +=============+==============================================================+=============================================================================+
-  | ARMARM_     | DDI 0487                                                     | Arm Architecture Reference Manual Armv8 for Armv8-A architecture profile    |
-  +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
-  | PAPER_      | CGO paper                                                    | Compiler Testing with Relaxed Memory Models                                 |
-  +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+
+  +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
+  | Ref         | External reference or URL                                    | Title                                                                                         |
+  +=============+==============================================================+===============================================================================================+
+  | ARMARM_     | DDI 0487                                                     | Arm Architecture Reference Manual Armv8 for Armv8-A architecture profile                      |
+  +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
+  | PAPER_      | CGO paper                                                    | Compiler Testing with Relaxed Memory Models                                                   |
+  +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
+  | ATOMICS64_  | Atomics ABI                                                  | C/C++ Atomics Application Binary Interface Standard for the Arm\ :sup:`®` 64-bit Architecture |
+  +-------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
 
 
 
@@ -195,7 +197,7 @@ the following Assembly Sequences::
     LDR W3,[X4]
 
 where ``thread_0:X2`` contains the address of ``x``, ``thread_0:X4`` contains
-the address of ``y``, and ``thread_1:X2`` contains the address of ``y``,
+the address of ``y``, ``thread_1:X2`` contains the address of ``y``, and
 ``thread_1:X4`` contains the address of ``x``.
 
 The ``exchange`` Atomic Operation is compiled to a ``SWP`` Assembly
@@ -266,7 +268,7 @@ The Mix Testing Process
 ABI compatibility must be testable. Concurrency is not trivial, and the ABI
 presents a simplification of part of the problem that is understandable by
 engineers. We provide a simple technique for testing ABI compatibility.
-These techniques reduce the difficulty of checking compatibility from a 
+This technique reduces the difficulty of checking compatibility from a 
 problem of understanding concurrent executions, to the familiar testing 
 domain of comparing program outcomes of tests. This document does not 
 preclude other means of testing compatibility.

From eb315cff357334a4c7665579a64d375e9c9f10be Mon Sep 17 00:00:00 2001
From: lukeg101 <6547672+lukeg101@users.noreply.github.com>
Date: Thu, 29 Aug 2024 16:05:54 +0100
Subject: [PATCH 17/17] alpha release on current status line

---
 atomicsabi64/atomicsabi64.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/atomicsabi64/atomicsabi64.rst b/atomicsabi64/atomicsabi64.rst
index 20ae47bd..cf3d915c 100644
--- a/atomicsabi64/atomicsabi64.rst
+++ b/atomicsabi64/atomicsabi64.rst
@@ -203,7 +203,7 @@ specifications:
    The content of this specification is a draft, and Arm considers the
    likelihood of future incompatible changes to be significant.
 
-All content in this document is at the **Release** quality level.
+All content in this document is at the **Alpha** quality level.
 
 Change History
 --------------