diff --git a/main/acle.md b/main/acle.md index 2ab32fe9..5f23e1b6 100644 --- a/main/acle.md +++ b/main/acle.md @@ -411,6 +411,8 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin and maximum intrinsics (FEAT_FAMINMAX). * Added specifications for table lookup intrinsics (FEAT_LUT, FEAT_SME_LUTv2). * Release support level of the [Custom Datapath Extension](#custom-datapath-extension). +* Added [**Alpha**](#current-status-and-anticipated-changes) + support for modal 8-bit floating point intrinsics. ### References @@ -751,6 +753,9 @@ The predefined types are: * The `__bf16` type for 16-bit brain floating-point values (see [Half-precision brain floating-point](#half-precision-brain-floating-point)). +* The `__mfp8` type for the modal 8-bit floating-point values (see +[Modal 8-bit floating point types](#modal-8-bit-floating-point)). + ### Implementation-defined type properties ACLE and the Arm ABI allow implementations some freedom in order to @@ -1280,6 +1285,12 @@ sequence of instructions to achieve the conversion. Providing emulation libraries for half-precision floating point conversions when not implemented in hardware is implementation-defined. +### Modal 8-bit floating-point + +ACLE defines the `__mfp8` type, which can be used for the E5M2 and E4M3 +8-bit floating-point formats ("FP8"). It is a storage and interchange +only type with no arithmetic operations other than intrinsic calls. + # Architecture and CPU names ## Introduction @@ -2147,6 +2158,48 @@ and if the associated ACLE intrinsics are available. lookup table instructions with 4-bit indices and 8-bit elements (FEAT_SME_LUTv2) and if the associated ACLE intrinsics are available. +### Modal 8-bit floating point extensions + +`__ARM_FEATURE_FP8` is defined to 1 if there is hardware support for FP8 conversion +instructions (FEAT_FP8) and if the associated ACLE intrinsics are available. 
+ +`__ARM_FEATURE_FP8FMA` is defined to 1 if there is hardware support for +FP8 multiply-accumulate to half-precision and single-precision instructions +(FEAT_FP8FMA) and if the associated ACLE intrinsics are available. + +`__ARM_FEATURE_FP8DOT2` is defined to 1 if there is hardware support for +FP8 2-way dot product to half-precision instructions (FEAT_FP8DOT2) +and if the associated ACLE intrinsics are available. + +`__ARM_FEATURE_FP8DOT4` is defined to 1 if there is hardware support for +FP8 4-way dot product to single-precision instructions (FEAT_FP8DOT4) +and if the associated ACLE intrinsics are available. + +`__ARM_FEATURE_SSVE_FP8DOT4` is defined to 1 if there is hardware support for +SVE2 FP8 4-way dot product to single-precision instructions +in Streaming SVE mode (FEAT_SSVE_FP8DOT4) and if the associated ACLE +intrinsics are available. + +`__ARM_FEATURE_SSVE_FP8DOT2` is defined to 1 if there is hardware support for +SVE2 FP8 2-way dot product to half-precision instructions +in Streaming SVE mode (FEAT_SSVE_FP8DOT2) and if the associated ACLE intrinsics +are available. + +`__ARM_FEATURE_SSVE_FP8FMA` is defined to 1 if there is hardware support for +SVE2 FP8 multiply-accumulate to half-precision and single-precision +instructions in Streaming SVE mode (FEAT_SSVE_FP8FMA) and if the associated +ACLE intrinsics are available. + +`__ARM_FEATURE_SME_F8F32` is defined to 1 if there is hardware support for SME2 +FP8 dot product, multiply-accumulate, and outer product to single-precision +instructions (FEAT_SME_F8F32) and if the associated ACLE intrinsics are +available. + +`__ARM_FEATURE_SME_F8F16` is defined to 1 if there is hardware support for SME2 +FP8 dot product, multiply-accumulate, and outer product to half-precision +instructions (FEAT_SME_F8F16) and if the associated ACLE intrinsics are +available. + ### Other floating-point and vector extensions #### Fused multiply-accumulate (FMA) @@ -2437,6 +2490,10 @@ be found in [[BA]](#BA). 
| [`__ARM_FEATURE_FAMINMAX`](#floating-point-absolute-minimum-and-maximum-extension) | Floating-point absolute minimum and maximum extension | 1 | | [`__ARM_FEATURE_FMA`](#fused-multiply-accumulate-fma) | Floating-point fused multiply-accumulate | 1 | | [`__ARM_FEATURE_FP16_FML`](#fp16-fml-extension) | FP16 FML extension (Arm v8.4-A, optional Armv8.2-A, Armv8.3-A) | 1 | +| [`__ARM_FEATURE_FP8`](#modal-8-bit-floating-point-extensions) | Modal 8-bit floating-point extensions | 1 | +| [`__ARM_FEATURE_FP8DOT2`](#modal-8-bit-floating-point-extensions) | Modal 8-bit floating-point extensions | 1 | +| [`__ARM_FEATURE_FP8DOT4`](#modal-8-bit-floating-point-extensions) | Modal 8-bit floating-point extensions | 1 | +| [`__ARM_FEATURE_FP8FMA`](#modal-8-bit-floating-point-extensions) | Modal 8-bit floating-point extensions | 1 | | [`__ARM_FEATURE_FRINT`](#availability-of-armv8.5-a-floating-point-rounding-intrinsics) | Floating-point rounding extension (Arm v8.5-A) | 1 | | [`__ARM_FEATURE_IDIV`](#hardware-integer-divide) | Hardware Integer Divide | 1 | | [`__ARM_FEATURE_JCVT`](#javascript-floating-point-conversion) | Javascript conversion (ARMv8.3-A) | 1 | @@ -2466,9 +2523,14 @@ be found in [[BA]](#BA). 
| [`__ARM_FEATURE_SME_B16B16`](#non-widening-brain-16-bit-floating-point-support) | Non-widening brain 16-bit floating-point SME intrinsics (FEAT_SME_B16B16) | 1 | | [`__ARM_FEATURE_SME_F16F16`](#half-precision-floating-point-sme-intrinsics) | Half-precision floating-point SME intrinsics (FEAT_SME_F16F16) | 1 | | [`__ARM_FEATURE_SME_F64F64`](#double-precision-floating-point-outer-product-intrinsics) | Double precision floating-point outer product intrinsics (FEAT_SME_F64F64) | 1 | +| [`__ARM_FEATURE_SME_F8F16`](#modal-8-bit-floating-point-extensions) | Modal 8-bit floating-point extensions | 1 | +| [`__ARM_FEATURE_SME_F8F32`](#modal-8-bit-floating-point-extensions) | Modal 8-bit floating-point extensions | 1 | | [`__ARM_FEATURE_SME_I16I64`](#16-bit-to-64-bit-integer-widening-outer-product-intrinsics) | 16-bit to 64-bit integer widening outer product intrinsics (FEAT_SME_I16I64) | 1 | | [`__ARM_FEATURE_SME_LOCALLY_STREAMING`](#scalable-matrix-extension-sme) | Support for the `arm_locally_streaming` attribute | 1 | | [`__ARM_FEATURE_SME_LUTv2`](#lookup-table-extensions) | Lookup table extensions (FEAT_SME_LUTv2) | 1 | +| [`__ARM_FEATURE_SSVE_FP8DOT2`](#modal-8-bit-floating-point-extensions) | Modal 8-bit floating-point extensions | 1 | +| [`__ARM_FEATURE_SSVE_FP8DOT4`](#modal-8-bit-floating-point-extensions) | Modal 8-bit floating-point extensions | 1 | +| [`__ARM_FEATURE_SSVE_FP8FMA`](#modal-8-bit-floating-point-extensions) | Modal 8-bit floating-point extensions | 1 | | [`__ARM_FEATURE_SVE`](#scalable-vector-extension-sve) | Scalable Vector Extension (FEAT_SVE) | 1 | | [`__ARM_FEATURE_SVE_B16B16`](#non-widening-brain-16-bit-floating-point-support) | Non-widening brain 16-bit floating-point intrinsics (FEAT_SVE_B16B16) | 1 | | [`__ARM_FEATURE_SVE_BF16`](#brain-16-bit-floating-point-support) | SVE support for the 16-bit brain floating-point extension (FEAT_BF16) | 1 | @@ -5619,6 +5681,137 @@ each architecture includes its predecessor instruction set. 
| WFI | | 8,6K, 6-M | `__wfi` | | YIELD | | 8,6K, 6-M | `__yield` | +# About FP8 intrinsics + +The specification for FP8 intrinsics is in +[**Alpha** state](#current-status-and-anticipated-changes). + +Each 8-bit floating point intrinsic call has a parameter to define the format +and scale of the operands, and the overflow behavior, as applicable to each +operation. This parameter is typically declared as `fpm_t fpm`. + +```c + typedef uint64_t fpm_t; +``` + +The bits of an argument to an `fpm` parameter are interpreted as follows: + +| **Bit range** | **Name** | **Meaning** | +| ------------- | -------------- | ------------------------------------------------------------------ | +| 0-2 | `src1_format` | first source operand format: 0 - FP8 E5M2, 1 - FP8 E4M3 | +| 3-5 | `src2_format` | second source operand format: 0 - FP8 E5M2, 1 - FP8 E4M3 | +| 6-8 | `dst_format` | destination format: 0 - FP8 E5M2, 1 - FP8 E4M3 | +| 9-13 | | must be zero | +| 14 | `overflow_mul` | overflow behavior for multiplication instructions: | +| | | 0 - generate infinity, 1 - generate maximum normal number | +| 15 | `overflow_cvt` | overflow behavior for conversion instructions: | +| | | 0 - generate infinity or NaN, 1 - generate maximum normal number | +| 16-22 | `lscale` | downscaling value | +| 23 | | must be zero | +| 24-31 | `nscale` | scaling value for conversions | +| 32-37 | `lscale2` | downscaling value for conversions of the second input stream | +| 38-63 | | must be zero | + +Bit patterns other than as described above are invalid. Passing an invalid value as an argument +to an FP8 intrinsic results in undefined behavior. + +The ACLE declares several helper types and intrinsics to +facilitate construction of `fpm` arguments. The helper intrinsics do not have +side effects and their return values depend only on their parameters. + +Passing an out of range argument to a helper intrinsic results in the intrinsic +returning an indeterminate value. 
Passing such an indeterminate value as
+an argument to an FP8 intrinsic results in undefined behavior.
+
+The helper types and intrinsics are available after including any of
+[`<arm_neon.h>`](#arm_neon.h), [`<arm_sve.h>`](#arm_sve.h), or
+[`<arm_sme.h>`](#arm_sme.h).
+
+Note: where a helper intrinsic description refers to "updating the FP8 mode" it
+means the intrinsic only modifies the bits of the input `fpm_t` parameter that
+correspond to the new mode and returns the resulting value. No side effects
+(such as changing processor state) occur.
+
+Individual FP8 intrinsics are described in their respective
+Advanced SIMD (NEON), SVE, and SME sections.
+
+## Support enumerations
+
+``` c
+enum __ARM_FPM_FORMAT {
+  __ARM_FPM_E5M2,
+  __ARM_FPM_E4M3,
+};
+
+enum __ARM_FPM_OVERFLOW {
+  __ARM_FPM_INFNAN,
+  __ARM_FPM_SATURATE,
+};
+```
+
+## Helper intrinsics
+
+``` c
+  fpm_t __arm_fpm_init();
+```
+Initializes a value, suitable for use as an `fpm` argument ("FP8 mode").
+The value corresponds to a mode of operation where:
+ * The source and destination operands are interpreted as E5M2.
+ * Overflow behavior is to yield infinity or NaN (depending on operation).
+ * No scaling occurs.
+
+``` c
+  fpm_t __arm_set_fpm_src1_format(fpm_t fpm, enum __ARM_FPM_FORMAT format);
+  fpm_t __arm_set_fpm_src2_format(fpm_t fpm, enum __ARM_FPM_FORMAT format);
+```
+Updates the FP8 mode to set the first or the second source operand format,
+respectively.
+
+``` c
+  fpm_t __arm_set_fpm_dst_format(fpm_t fpm, enum __ARM_FPM_FORMAT format);
+```
+Updates the FP8 mode to set the destination format.
+
+``` c
+  fpm_t __arm_set_fpm_overflow_cvt(fpm_t fpm, enum __ARM_FPM_OVERFLOW behavior);
+```
+Updates the FP8 mode to set the overflow behavior for conversion operations.
+
+``` c
+  fpm_t __arm_set_fpm_overflow_mul(fpm_t fpm, enum __ARM_FPM_OVERFLOW behavior);
+```
+Updates the FP8 mode to set the overflow behavior for multiplicative
+operations.
+ +``` c + fpm_t __arm_set_fpm_lscale(fpm_t fpm, uint64_t scale); +``` +Updates the FP8 mode to set the downscaling value subtracted from: +* The product or the sum-of-products exponent, for multiplication instructions + with FP8 operands. +* The result exponent, for instructions converting the first FP8 + input data stream to other floating-point formats. + +The valid range for the `scale` parameter is [0, 127], inclusive. + +``` c + fpm_t __arm_set_fpm_lscale2(fpm_t fpm, uint64_t scale); +``` +Updates the FP8 mode to set the downscaling value subtracted from the +result exponent for instructions converting the second FP8 input data +stream to other floating-point formats. + +The valid range for the `scale` parameter is [0, 63], inclusive. + +``` c + fpm_t __arm_set_fpm_nscale(fpm_t fpm, int64_t scale); +``` +Updates the FP8 mode to set the scaling value added to the operand's +exponent for instructions converting other floating-point formats to an +FP8 format. + +The valid range for the `scale` parameter is [-128, 127], inclusive. + # Advanced SIMD (Neon) intrinsics ## Introduction @@ -5682,14 +5875,14 @@ a `uint16_t` result containing the sum. ### Vector data types -Vector data types are named as a lane type and a multiple. Lane type names are -based on the types defined in ``. For example,. `int16x4_t` is a -vector of four `int16_t` values. The base types are `int8_t`, `uint8_t`, -`int16_t`, `uint16_t`, `int32_t`, `uint32_t`, `int64_t`, -`uint64_t`, `float16_t`, `float32_t`, `poly8_t`, `poly16_t`, -`poly64_t`, `poly128_t` and `bfloat16_t`. The multiples are such that -the resulting vector types are 64-bit and 128-bit. In AArch64, `float64_t` is -also a base type. +Vector data types are named as a lane type and a multiple. Lane type +names are based on the types defined in ``. For example, +`int16x4_t` is a vector of four `int16_t` values. 
The base types are +`int8_t`, `uint8_t`, `int16_t`, `uint16_t`, `int32_t`, `uint32_t`, +`int64_t`, `uint64_t`, `float16_t`, `float32_t`, `poly8_t`, `poly16_t`, +`poly64_t`, `poly128_t`, and `bfloat16_t`. The multiples are such that the +resulting vector types are 64-bit and 128-bit. In AArch64, `float64_t` +and `mfloat8_t` are also base types. Not all types can be used in all operations. Generally, the operations available on a type correspond to the operations available on the @@ -5707,6 +5900,9 @@ bfloat types are only available when the `__bf16` type is defined, that is, when supported by the hardware. The bfloat types are all opaque types. That is to say they can only be used by intrinsics. +The FP8 types are all opaque types. That is to say they can only be used +by intrinsics. + ### Advanced SIMD Scalar data types AArch64 supports Advanced SIMD scalar operations that work on standard @@ -5745,6 +5941,8 @@ it. If the `__bf16` type is defined, `bfloat16_t` is defined as an alias for it. +If the `__mfp8` type is defined, `mfloat8_t` is defined as an alias for it. + `poly8_t`, `poly16_t`, `poly64_t` and `poly128_t` are defined as unsigned integer types. 
It is unspecified whether these are the same type as
`uint8_t`, `uint16_t`, `uint64_t` and `uint128_t` for overloading and
@@ -6500,6 +6698,7 @@ In addition, the header file defines the following scalar data types:
| `float16_t` | equivalent to `__fp16` |
| `float32_t` | equivalent to `float` |
| `float64_t` | equivalent to `double` |
+| `mfloat8_t` | equivalent to `__mfp8` |
If the feature macro `__ARM_FEATURE_BF16_SCALAR_ARITHMETIC` is
defined, [`<arm_sve.h>`](#arm_sve.h) also includes
@@ -6514,7 +6713,7 @@ single vectors:
| **Signed integer** | **Unsigned integer** | **Floating-point** | |
| -------------------- | -------------------- | -------------------- | -------------------- |
-| `svint8_t` | `svuint8_t` | | |
+| `svint8_t` | `svuint8_t` | | `svmfloat8_t` |
| `svint16_t` | `svuint16_t` | `svfloat16_t` | `svbfloat16_t` |
| `svint32_t` | `svuint32_t` | `svfloat32_t` | |
| `svint64_t` | `svuint64_t` | `svfloat64_t` | |
@@ -6534,17 +6733,17 @@ vectors, as follows:
| **Signed integer** | **Unsigned integer** | **Floating-point** | |
| -------------------- | -------------------- | --------------------- | -------------------- |
-| `svint8x2_t` | `svuint8x2_t` | | |
+| `svint8x2_t` | `svuint8x2_t` | | `svmfloat8x2_t` |
| `svint16x2_t` | `svuint16x2_t` | `svfloat16x2_t` | `svbfloat16x2_t` |
| `svint32x2_t` | `svuint32x2_t` | `svfloat32x2_t` | |
| `svint64x2_t` | `svuint64x2_t` | `svfloat64x2_t` | |
| | | | |
-| `svint8x3_t` | `svuint8x3_t` | | |
+| `svint8x3_t` | `svuint8x3_t` | | `svmfloat8x3_t` |
| `svint16x3_t` | `svuint16x3_t` | `svfloat16x3_t` | `svbfloat16x3_t` |
| `svint32x3_t` | `svuint32x3_t` | `svfloat32x3_t` | |
| `svint64x3_t` | `svuint64x3_t` | `svfloat64x3_t` | |
| | | | |
-| `svint8x4_t` | `svuint8x4_t` | | |
+| `svint8x4_t` | `svuint8x4_t` | | `svmfloat8x4_t` |
| `svint16x4_t` | `svuint16x4_t` | `svfloat16x4_t` | `svbfloat16x4_t` |
| `svint32x4_t` | `svuint32x4_t` | `svfloat32x4_t` | |
| `svint64x4_t` | `svuint64x4_t` | `svfloat64x4_t` | |
@@ -8938,7 +9137,7 @@
Broadcast indexed element within each quadword vector segment. ``` c // Variants are also available for: // _s8, _u16, _s16, _u32, _s32, _u64, _s64 - // _bf16, _f16, _f32, _f64 + // _mf8, _bf16, _f16, _f32, _f64 svuint8_t svdup_laneq[_u8](svuint8_t zn, uint64_t imm_idx); ``` @@ -8949,7 +9148,7 @@ Extract vector segment from each pair of quadword segments. ``` c // Variants are also available for: // _s8, _s16, _u16, _s32, _u32, _s64, _u64 - // _bf16, _f16, _f32, _f64 + // _mf8, _bf16, _f16, _f32, _f64 svuint8_t svextq[_u8](svuint8_t zdn, svuint8_t zm, uint64_t imm); ``` #### LD1D, LD1W @@ -8976,18 +9175,17 @@ Gather Load Quadword. ``` c // Variants are also available for: // _u8, _u16, _s16, _u32, _s32, _u64, _s64 - // _bf16, _f16, _f32, _f64 + // _mf8, _bf16, _f16, _f32, _f64 svint8_t svld1q_gather[_u64base]_s8(svbool_t pg, svuint64_t zn); svint8_t svld1q_gather[_u64base]_offset_s8(svbool_t pg, svuint64_t zn, int64_t offset); svint8_t svld1q_gather_[u64]offset[_s8](svbool_t pg, const int8_t *base, svuint64_t offset); - // Variants are also available for: // _u16, _u32, _s32, _u64, _s64 // _bf16, _f16, _f32, _f64 svint16_t svld1q_gather_[u64]index[_s16](svbool_t pg, const int16_t *base, svuint64_t index); svint8_t svld1q_gather[_u64base]_index_s8(svbool_t pg, svuint64_t zn, int64_t index); - ``` +``` #### LD2Q, LD3Q, LD4Q @@ -8996,7 +9194,7 @@ Contiguous load two, three, or four quadword structures. ``` c // Variants are also available for: // _u8, _u16, _s16, _u32, _s32, _u64, _s64 - // _bf16, _f16, _f32, _f64 + // _mf8, _bf16, _f16, _f32, _f64 svint8x2_t svld2q[_s8](svbool_t pg, const int8_t *rn); svint8x2_t svld2q_vnum[_s8](svbool_t pg, const int8_t *rn, uint64_t vnum); svint8x3_t svld3q[_s8](svbool_t pg, const int8_t *rn); @@ -9071,7 +9269,7 @@ Scatter store quadwords. 
``` c // Variants are also available for: // _u8, _u16, _s16, _u32, _s32, _u64, _s64 - // _bf16, _f16, _f32, _f64 + // _mf8, _bf16, _f16, _f32, _f64 void svst1q_scatter[_u64base][_s8](svbool_t pg, svuint64_t zn, svint8_t data); void svst1q_scatter[_u64base]_offset[_s8](svbool_t pg, svuint64_t zn, int64_t offset, svint8_t data); void svst1q_scatter_[u64]offset[_s8](svbool_t pg, const uint8_t *base, svuint64_t offset, svint8_t data); @@ -9081,7 +9279,7 @@ Scatter store quadwords. // _bf16, _f16, _f32, _f64 void svst1q_scatter[_u64base]_index[_s8](svbool_t pg, svuint64_t zn, int64_t index, svint8_t data); void svst1q_scatter_[u64]index_[s16](svbool_t pg, const int16_t *base, svuint64_t index, svint16_t data); - ``` +``` #### ST2Q, ST3Q, ST4Q @@ -9090,7 +9288,7 @@ Contiguous store. ``` c // Variants are also available for: // _s8 _u16, _s16, _u32, _s32, _u64, _s64 - // _bf16, _f16, _f32, _f64 + // _mf8, _bf16, _f16, _f32, _f64 void svst2q[_u8](svbool_t pg, uint8_t *rn, svuint8x2_t zt); void svst2q_vnum[_u8](svbool_t pg, uint8_t *rn, int64_t vnum, svuint8x2_t zt); void svst3q[_u8](svbool_t pg, uint8_t *rn, svuint8x3_t zt); @@ -9106,7 +9304,7 @@ Programmable table lookup within each quadword vector segment (zeroing). ``` c // Variants are also available for: // _u8, _u16, _s16, _u32, _s32, _u64, _s64 - // _bf16, _f16, _f32, _f64 + // _mf8, _bf16, _f16, _f32, _f64 svint8_t svtblq[_s8](svint8_t zn, svuint8_t zm); ``` @@ -9117,7 +9315,7 @@ Programmable table lookup within each quadword vector segment (merging). ``` c // Variants are also available for: // _u8, _u16, _s16, _u32, _s32, _u64, _s64 - // _bf16, _f16, _f32, _f64 + // _mf8, _bf16, _f16, _f32, _f64 svint8_t svtbxq[_s8](svint8_t fallback, svint8_t zn, svuint8_t zm); ``` @@ -9128,7 +9326,7 @@ Concatenate elements within each pair of quadword vector segments. 
``` c // Variants are also available for: // _s8, _u16, _s16, _u32, _s32, _u64, _s64 - // _bf16, _f16, _f32, _f64 + // _mf8, _bf16, _f16, _f32, _f64 svuint8_t svuzpq1[_u8](svuint8_t zn, svuint8_t zm); svuint8_t svuzpq2[_u8](svuint8_t zn, svuint8_t zm); ``` @@ -9140,7 +9338,7 @@ Interleave elements from halves of each pair of quadword vector segments. ``` c // Variants are also available for: // _s8, _u16, _s16, _u32, _s32, _u64, _s64 - // _bf16, _f16, _f32, _f64 + // _mf8, _bf16, _f16, _f32, _f64 svuint8_t svzipq1[_u8](svuint8_t zn, svuint8_t zm); svuint8_t svzipq2[_u8](svuint8_t zn, svuint8_t zm); ``` @@ -10204,7 +10402,7 @@ For example, in the `_u8` intrinsic, the return value and the `zd` parameter both have type `svuint8_t`. ``` c - // And similarly for u8. + // And similarly for u8, mf8 svint8_t svread_hor_za8[_s8]_m(svint8_t zd, svbool_t pg, uint64_t tile, uint32_t slice) __arm_streaming __arm_in("za"); @@ -10224,7 +10422,7 @@ parameter both have type `svuint8_t`. uint64_t tile, uint32_t slice) __arm_streaming __arm_in("za"); - // And similarly for s16, s32, s64, u8, u16, u32, u64, bf16, f16, f32, f64 + // And similarly for s16, s32, s64, u8, u16, u32, u64, mf8, bf16, f16, f32, f64 svint8_t svread_hor_za128[_s8]_m(svint8_t zd, svbool_t pg, uint64_t tile, uint32_t slice) __arm_streaming __arm_in("za"); @@ -10237,7 +10435,7 @@ the type of the `zn` parameter varies with the type suffix. For example, the `zn` parameter to the `_u8` intrinsic has type `svuint8_t`. ``` c - // And similarly for u8. + // And similarly for u8, mf8. void svwrite_hor_za8[_s8]_m(uint64_t tile, uint32_t slice, svbool_t pg, svint8_t zn) __arm_streaming __arm_inout("za"); @@ -10257,7 +10455,7 @@ the `zn` parameter to the `_u8` intrinsic has type `svuint8_t`. 
svint64_t zn) __arm_streaming __arm_inout("za"); - // And similarly for s16, s32, s64, u8, u16, u32, u64, bf16, f16, f32, f64 + // And similarly for s16, s32, s64, u8, u16, u32, u64, mf8, bf16, f16, f32, f64 void svwrite_hor_za128[_s8]_m(uint64_t tile, uint32_t slice, svbool_t pg, svint8_t zn) __arm_streaming __arm_inout("za"); @@ -11735,33 +11933,33 @@ Zero ZT0 Lookup table read with 2-bit and 4-bit indexes ``` c - // Variants are also available for _zt_u8, _zt_s16, _zt_u16, _zt_f16, + // Variants are also available for _zt_u8, _zt_mf8, _zt_s16, _zt_u16, _zt_f16, // _zt_bf16, _zt_s32, _zt_u32 and _zt_f32 svint8_t svluti2_lane_zt_s8(uint64_t zt, svuint8_t zn, uint64_t imm_idx) __arm_streaming __arm_in("zt0"); - // Variants are also available for _zt_u8, _zt_s16, _zt_u16, _zt_f16, + // Variants are also available for _zt_u8, _zt_mf8, _zt_s16, _zt_u16, _zt_f16, // _zt_bf16, _zt_s32, _zt_u32 and _zt_f32 svint8x2_t svluti2_lane_zt_s8_x2(uint64_t zt, svuint8_t zn, uint64_t imm_idx) __arm_streaming __arm_in("zt0"); - // Variants are also available for _zt_u8, _zt_s16, _zt_u16, _zt_f16, + // Variants are also available for _zt_u8, _zt_mf8, _zt_s16, _zt_u16, _zt_f16, // _zt_bf16, _zt_s32, _zt_u32 and _zt_f32 svint8x4_t svluti2_lane_zt_s8_x4(uint64_t zt, svuint8_t zn, uint64_t imm_idx) __arm_streaming __arm_in("zt0"); - // Variants are also available for _zt_u8, _zt_s16, _zt_u16, _zt_f16, + // Variants are also available for _zt_u8, _zt_mf8, _zt_s16, _zt_u16, _zt_f16, // _zt_bf16, _zt_s32, _zt_u32 and _zt_f32 svint8_t svluti4_lane_zt_s8(uint64_t zt, svuint8_t zn, uint64_t imm_idx) __arm_streaming __arm_in("zt0"); - // Variants are also available for _zt_u8, _zt_s16, _zt_u16, _zt_f16, + // Variants are also available for _zt_u8, _zt_mf8, _zt_s16, _zt_u16, _zt_f16, // _zt_bf16, _zt_s32, _zt_u32 and _zt_f32 svint8x2_t svluti4_lane_zt_s8_x2(uint64_t zt, svuint8_t zn, uint64_t imm_idx) @@ -11780,84 +11978,84 @@ Lookup table read with 2-bit and 4-bit indexes Move multi-vectors 
to/from ZA ``` c - // Variants are also available for _za8_u8, _za16_s16, _za16_u16, + // Variants are also available for _za8_u8, _za8_mf8, _za16_s16, _za16_u16, // _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32, // _za64_s64, _za64_u64 and _za64_f64 svint8x2_t svread_hor_za8_s8_vg2(uint64_t tile, uint32_t slice) __arm_streaming __arm_in("za"); - // Variants are also available for _za8_u8, _za16_s16, _za16_u16, + // Variants are also available for _za8_u8, _za8_mf8, _za16_s16, _za16_u16, // _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32, // _za64_s64, _za64_u64 and _za64_f64 svint8x4_t svread_hor_za8_s8_vg4(uint64_t tile, uint32_t slice) __arm_streaming __arm_in("za"); - // Variants are also available for _za8_u8, _za16_s16, _za16_u16, + // Variants are also available for _za8_u8, _za8_mf8, _za16_s16, _za16_u16, // _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32, // _za64_s64, _za64_u64 and _za64_f64 svint8x2_t svread_ver_za8_s8_vg2(uint64_t tile, uint32_t slice) __arm_streaming __arm_in("za"); - // Variants are also available for _za8_u8, _za16_s16, _za16_u16, + // Variants are also available for _za8_u8, _za8_mf8, _za16_s16, _za16_u16, // _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32, // _za64_s64, _za64_u64 and _za64_f64 svint8x4_t svread_ver_za8_s8_vg4(uint64_t tile, uint32_t slice) __arm_streaming __arm_in("za"); - // Variants are also available for _za8_u8, _za16_s16, _za16_u16, + // Variants are also available for _za8_u8, _za8_mf8, _za16_s16, _za16_u16, // _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32, // _za64_s64, _za64_u64 and _za64_f64 svint8x2_t svread_za8_s8_vg1x2(uint32_t slice) __arm_streaming __arm_in("za"); - // Variants are also available for _za8_u8, _za16_s16, _za16_u16, + // Variants are also available for _za8_u8, _za8_mf8, _za16_s16, _za16_u16, // _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32, // _za64_s64, _za64_u64 and _za64_f64 svint8x4_t svread_za8_s8_vg1x4(uint32_t slice) 
__arm_streaming __arm_in("za"); - // Variants are also available for _za8[_u8], _za16[_s16], _za16[_u16], + // Variants are also available for _za8[_u8], _za8[_mf8], _za16[_s16], _za16[_u16], // _za16[_f16], _za16[_bf16], _za32[_s32], _za32[_u32], _za32[_f32], // _za64[_s64], _za64[_u64] and _za64[_f64] void svwrite_hor_za8[_s8]_vg2(uint64_t tile, uint32_t slice, svint8x2_t zn) __arm_streaming __arm_inout("za"); - // Variants are also available for _za8[_u8], _za16[_s16], _za16[_u16], + // Variants are also available for _za8[_u8], _za8[_mf8], _za16[_s16], _za16[_u16], // _za16[_f16], _za16[_bf16], _za32[_s32], _za32[_u32], _za32[_f32], // _za64[_s64], _za64[_u64] and _za64[_f64] void svwrite_hor_za8[_s8]_vg4(uint64_t tile, uint32_t slice, svint8x4_t zn) __arm_streaming __arm_inout("za"); - // Variants are also available for _za8[_u8], _za16[_s16], _za16[_u16], + // Variants are also available for _za8[_u8], _za8[_mf8], _za16[_s16], _za16[_u16], // _za16[_f16], _za16[_bf16], _za32[_s32], _za32[_u32], _za32[_f32], // _za64[_s64], _za64[_u64] and _za64[_f64] void svwrite_ver_za8[_s8]_vg2(uint64_t tile, uint32_t slice, svint8x2_t zn) __arm_streaming __arm_inout("za"); - // Variants are also available for _za8[_u8], _za16[_s16], _za16[_u16], + // Variants are also available for _za8[_u8], _za8[_mf8], _za16[_s16], _za16[_u16], // _za16[_f16], _za16[_bf16], _za32[_s32], _za32[_u32], _za32[_f32], // _za64[_s64], _za64[_u64] and _za64[_f64] void svwrite_ver_za8[_s8]_vg4(uint64_t tile, uint32_t slice, svint8x4_t zn) __arm_streaming __arm_inout("za"); - // Variants are also available for _za8[_u8], _za16[_s16], _za16[_u16], + // Variants are also available for _za8[_u8], _za8[_mf8], _za16[_s16], _za16[_u16], // _za16[_f16], _za16[_bf16], _za32[_s32], _za32[_u32], _za32[_f32], // _za64[_s64], _za64[_u64] and _za64[_f64] void svwrite_za8[_s8]_vg1x2(uint32_t slice, svint8x2_t zn) __arm_streaming __arm_inout("za"); - // Variants are also available for _za8[_u8], _za16[_s16], 
_za16[_u16],
+ // Variants are also available for _za8[_u8], _za8[_mf8], _za16[_s16], _za16[_u16],
 // _za16[_f16], _za16[_bf16], _za32[_s32], _za32[_u32], _za32[_f32],
 // _za64[_s64], _za64[_u64] and _za64[_f64]
 void svwrite_za8[_s8]_vg1x4(uint32_t slice, svint8x4_t zn)
@@ -11909,13 +12107,13 @@ Multi-vector clamp to minimum/maximum vector
Multi-vector conditionally select elements from two vectors
``` c
- // Variants are also available for _s8_x2, _u16_x2, _s16_x2, _f16_x2,
+ // Variants are also available for _s8_x2, _mf8_x2, _u16_x2, _s16_x2, _f16_x2,
 // _bf16_x2, _u32_x2, _s32_x2, _f32_x2, _u64_x2, _s64_x2 and _f64_x2
 svuint8x2_t svsel[_u8_x2](svcount_t png, svuint8x2_t zn, svuint8x2_t zm)
 __arm_streaming;
- // Variants are also available for _s8_x4, _u16_x4, _s16_x4, _f16_x4,
+ // Variants are also available for _s8_x4, _mf8_x4, _u16_x4, _s16_x4, _f16_x4,
 // _bf16_x4, _u32_x4, _s32_x4, _f32_x4, _u64_x4, _s64_x4 and _f64_x4
 svuint8x4_t svsel[_u8_x4](svcount_t png, svuint8x4_t zn, svuint8x4_t zm)
 __arm_streaming;
@@ -12065,12 +12263,12 @@ Multi-vector pack/unpack
Multi-vector zip.
``` c
- // Variants are also available for _u8_x2, _u16_x2, _s16_x2, _f16_x2,
+ // Variants are also available for _u8_x2, _mf8_x2, _u16_x2, _s16_x2, _f16_x2,
 // _bf16_x2, _u32_x2, _s32_x2, _f32_x2, _u64_x2, _s64_x2 and _f64_x2
 svint8x2_t svzip[_s8_x2](svint8x2_t zn) __arm_streaming;
- // Variants are also available for _u8_x4, _u16_x4, _s16_x4, _f16_x4,
+ // Variants are also available for _u8_x4, _mf8_x4, _u16_x4, _s16_x4, _f16_x4,
 // _bf16_x4, _u32_x4, _s32_x4, _f32_x4, _u64_x4, _s64_x4 and _f64_x4
 svint8x4_t svzip[_s8_x4](svint8x4_t zn) __arm_streaming;
```
@@ -12080,12 +12278,12 @@ element types.
``` c - // Variants are also available for _u8_x2, _u16_x2, _s16_x2, _f16_x2, + // Variants are also available for _u8_x2, _mf8_x2, _u16_x2, _s16_x2, _f16_x2, // _bf16_x2, _u32_x2, _s32_x2, _f32_x2, _u64_x2, _s64_x2 and _f64_x2 svint8x2_t svzipq[_s8_x2](svint8x2_t zn) __arm_streaming; - // Variants are also available for _u8_x4, _u16_x4, _s16_x4, _f16_x4, + // Variants are also available for _u8_x4, _mf8_x4, _u16_x4, _s16_x4, _f16_x4, // _bf16_x4, _u32_x4, _s32_x4, _f32_x4, _u64_x4, _s64_x4 and _f64_x4 svint8x4_t svzipq[_s8_x4](svint8x4_t zn) __arm_streaming; ``` @@ -12095,12 +12293,12 @@ element types. Multi-vector unzip. ``` c - // Variants are also available for _u8_x2, _u16_x2, _s16_x2, _f16_x2, + // Variants are also available for _u8_x2, _mf8_x2, _u16_x2, _s16_x2, _f16_x2, // _bf16_x2, _u32_x2, _s32_x2, _f32_x2, _u64_x2, _s64_x2 and _f64_x2 svint8x2_t svuzp[_s8_x2](svint8x2_t zn) __arm_streaming; - // Variants are also available for _u8_x4, _u16_x4, _s16_x4, _f16_x4, + // Variants are also available for _u8_x4, _mf8_x4, _u16_x4, _s16_x4, _f16_x4, // _bf16_x4, _u32_x4, _s32_x4, _f32_x4, _u64_x4, _s64_x4 and _f64_x4 svint8x4_t svuzp[_s8_x4](svint8x4_t zn) __arm_streaming; ``` @@ -12109,12 +12307,12 @@ The `svuzpq` intrinsics operate on quad-words, but for convenience accept all element types. 
``` c - // Variants are also available for _u8_x2, _u16_x2, _s16_x2, _f16_x2, + // Variants are also available for _u8_x2, _mf8_x2, _u16_x2, _s16_x2, _f16_x2, // _bf16_x2, _u32_x2, _s32_x2, _f32_x2, _u64_x2, _s64_x2 and _f64_x2 svint8x2_t svuzpq[_s8_x2](svint8x2_t zn) __arm_streaming; - // Variants are also available for _u8_x4, _u16_x4, _s16_x4, _f16_x4, + // Variants are also available for _u8_x4, _mf8_x4, _u16_x4, _s16_x4, _f16_x4, // _bf16_x4, _u32_x4, _s32_x4, _f32_x4, _u64_x4, _s64_x4 and _f64_x4 svint8x4_t svuzpq[_s8_x4](svint8x4_t zn) __arm_streaming; ``` @@ -12341,20 +12539,20 @@ Multi-vector dot-product (2-way) Contiguous load to multi-vector ``` c - // Variants are also available for _s8 + // Variants are also available for _s8, _mf8 svuint8x2_t svld1[_u8]_x2(svcount_t png, const uint8_t *rn); - // Variants are also available for _s8 + // Variants are also available for _s8, _mf8 svuint8x4_t svld1[_u8]_x4(svcount_t png, const uint8_t *rn); - // Variants are also available for _s8 + // Variants are also available for _s8, _mf8 svuint8x2_t svld1_vnum[_u8]_x2(svcount_t png, const uint8_t *rn, int64_t vnum); - // Variants are also available for _s8 + // Variants are also available for _s8, _mf8 svuint8x4_t svld1_vnum[_u8]_x4(svcount_t png, const uint8_t *rn, int64_t vnum); @@ -12418,20 +12616,20 @@ Contiguous load to multi-vector Contiguous non-temporal load to multi-vector ``` c - // Variants are also available for _s8 + // Variants are also available for _s8, _mf8 svuint8x2_t svldnt1[_u8]_x2(svcount_t png, const uint8_t *rn); - // Variants are also available for _s8 + // Variants are also available for _s8, _mf8 svuint8x4_t svldnt1[_u8]_x4(svcount_t png, const uint8_t *rn); - // Variants are also available for _s8 + // Variants are also available for _s8, _mf8 svuint8x2_t svldnt1_vnum[_u8]_x2(svcount_t png, const uint8_t *rn, int64_t vnum); - // Variants are also available for _s8 + // Variants are also available for _s8, _mf8 svuint8x4_t 
svldnt1_vnum[_u8]_x4(svcount_t png, const uint8_t *rn, int64_t vnum); @@ -12555,19 +12753,19 @@ Reverse doublewords in elements. // All the intrinsics below are [SME] // Variants are available for: // _s8, _s16, _u16, _s32, _u32, _s64, _u64 - // _bf16, _f16, _f32, _f64 + // _mf8, _bf16, _f16, _f32, _f64 svuint8_t svrevd[_u8]_m(svuint8_t zd, svbool_t pg, svuint8_t zn); // Variants are available for: // _s8, _s16, _u16, _s32, _u32, _s64, _u64 - // _bf16, _f16, _f32, _f64 + // _mf8, _bf16, _f16, _f32, _f64 svuint8_t svrevd[_u8]_z(svbool_t pg, svuint8_t zn); // Variants are available for: // _s8, _s16, _u16, _s32, _u32, _s64, _u64 - // _bf16, _f16, _f32, _f64 + // _mf8, _bf16, _f16, _f32, _f64 svuint8_t svrevd[_u8]_x(svbool_t pg, svuint8_t zn); ``` @@ -12602,20 +12800,20 @@ Multi-vector saturating rounding shift right unsigned narrow and interleave Contiguous store of multi-vector operand ``` c - // Variants are also available for _s8_x2 + // Variants are also available for _s8_x2, _mf8_x2 void svst1[_u8_x2](svcount_t png, uint8_t *rn, svuint8x2_t zt); - // Variants are also available for _s8_x4 + // Variants are also available for _s8_x4, _mf8_x4 void svst1[_u8_x4](svcount_t png, uint8_t *rn, svuint8x4_t zt); - // Variants are also available for _s8_x2 + // Variants are also available for _s8_x2, _mf8_x2 void svst1_vnum[_u8_x2](svcount_t png, uint8_t *rn, int64_t vnum, svuint8x2_t zt); - // Variants are also available for _s8_x4 + // Variants are also available for _s8_x4, _mf8_x4 void svst1_vnum[_u8_x4](svcount_t png, uint8_t *rn, int64_t vnum, svuint8x4_t zt); @@ -12679,20 +12877,20 @@ Contiguous store of multi-vector operand Contiguous non-temporal store of multi-vector operand ``` c - // Variants are also available for _s8_x2 + // Variants are also available for _s8_x2, _mf8_x2 void svstnt1[_u8_x2](svcount_t png, uint8_t *rn, svuint8x2_t zt); - // Variants are also available for _s8_x4 + // Variants are also available for _s8_x4, _mf8_x4 void 
svstnt1[_u8_x4](svcount_t png, uint8_t *rn, svuint8x4_t zt); - // Variants are also available for _s8_x2 + // Variants are also available for _s8_x2, _mf8_x2 void svstnt1_vnum[_u8_x2](svcount_t png, uint8_t *rn, int64_t vnum, svuint8x2_t zt); - // Variants are also available for _s8_x4 + // Variants are also available for _s8_x4, _mf8_x4 void svstnt1_vnum[_u8_x4](svcount_t png, uint8_t *rn, int64_t vnum, svuint8x4_t zt); @@ -12858,6 +13056,384 @@ Lookup table read with 4-bit indexes and 8-bit elements. svint8x4_t svluti4_zt_s8_x4(uint64_t zt0, svuint8x2_t zn) __arm_streaming __arm_in("zt0"); ``` +### SVE2 and SME2 modal 8-bit floating-point intrinsics + +The intrinsics in this section are defined by the header file +[`<arm_sve.h>`](#arm_sve.h) when `__ARM_FEATURE_FP8` is defined, +and `__ARM_FEATURE_SVE2` or `__ARM_FEATURE_SME2` is defined. Individual +intrinsics may have additional target feature requirements. + +#### BF1CVT, BF2CVT, F1CVT, F2CVT + +8-bit floating-point convert to half-precision and BFloat16. +``` c + // Variants are also available for: _bf16 + svfloat16_t svcvt1_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm); + svfloat16_t svcvt2_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm); +``` + +#### BF1CVTLT, BF2CVTLT, F1CVTLT, F2CVTLT + +8-bit floating-point convert to half-precision and BFloat16 (top). +``` c + // Variants are also available for: _bf16 + svfloat16_t svcvtlt1_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm); + svfloat16_t svcvtlt2_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm); +``` + +#### BFCVTN, FCVTN + +Half-precision and BFloat16 convert, narrow and interleave to 8-bit +floating-point. +``` c + // Variant is also available for: _bf16_x2 + svmfloat8_t svcvtn_mf8[_f16_x2]_fpm(svfloat16x2_t zn, fpm_t fpm); +``` + +#### FCVTNT, FCVTNB + +Single-precision convert, narrow and interleave to 8-bit floating-point (top and bottom).
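The top/bottom lane placement can be sketched in portable C, with a trivial stand-in for the actual FP32-to-FP8 value conversion (the real instructions convert according to the format and scaling selected through the `fpm_t` argument):

``` c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the FP32 -> FP8 value conversion; truncation is used
   here purely to make the lane placement visible, not to model the
   FPMR-controlled conversion. */
static uint8_t to_fp8_stub(float v) { return (uint8_t)v; }

/* svcvtnt-style placement: each converted result lands in an odd
   ("top") byte lane of zd and the even lanes are preserved; the
   svcvtnb form fills the even ("bottom") lanes instead. */
static void cvtnt_sketch(uint8_t *zd, const float *zn, size_t n)
{
    for (size_t i = 0; i < n; i++)
        zd[2 * i + 1] = to_fp8_stub(zn[i]);
}
```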
+``` c + svmfloat8_t svcvtnt_mf8[_f32_x2]_fpm(svmfloat8_t zd, svfloat32x2_t zn, fpm_t fpm); + svmfloat8_t svcvtnb_mf8[_f32_x2]_fpm(svmfloat8_t zd, svfloat32x2_t zn, fpm_t fpm); +``` + +#### FDOT (4-way, vectors) + +8-bit floating-point dot product to single-precision. +``` c + // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT4) || __ARM_FEATURE_SSVE_FP8DOT4 + svfloat32_t svdot[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm); +``` + +#### FDOT (4-way, indexed) + +8-bit floating-point indexed dot product to single-precision. +``` c + // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT4) || __ARM_FEATURE_SSVE_FP8DOT4 + svfloat32_t svdot_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, + uint64_t imm0_3, fpm_t fpm); +``` + +#### FDOT (2-way, vectors, FP8 to FP16) + +8-bit floating-point dot product to half-precision. +``` c + // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT2) || __ARM_FEATURE_SSVE_FP8DOT2 + svfloat16_t svdot[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm); +``` + +#### FDOT (2-way, indexed, FP8 to FP16) + +8-bit floating-point dot product to half-precision. +``` c + // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT2) || __ARM_FEATURE_SSVE_FP8DOT2 + svfloat16_t svdot_lane[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, svmfloat8_t zm, + uint64_t imm0_7, fpm_t fpm); +``` + +#### FMLALB (vectors, FP8 to FP16) + +8-bit floating-point multiply-add long to half-precision (bottom). +``` c + // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) || __ARM_FEATURE_SSVE_FP8FMA + svfloat16_t svmlalb[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm); + svfloat16_t svmlalb[_n_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, mfloat8_t zm, fpm_t fpm); +``` + +#### FMLALB (indexed, FP8 to FP16) + +8-bit floating-point multiply-add long to half-precision (bottom, indexed). 
+``` c + // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) || __ARM_FEATURE_SSVE_FP8FMA + svfloat16_t svmlalb_lane[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, svmfloat8_t zm, + uint64_t imm0_15, fpm_t fpm); +``` + +#### FMLALLBB (vectors) + +8-bit floating-point multiply-add long long to single-precision (bottom bottom). +``` c + // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) || __ARM_FEATURE_SSVE_FP8FMA + svfloat32_t svmlallbb[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm); + svfloat32_t svmlallbb[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, mfloat8_t zm, fpm_t fpm); +``` + +#### FMLALLBB (indexed) + +8-bit floating-point multiply-add long long to single-precision (bottom bottom, indexed). +``` c + // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) || __ARM_FEATURE_SSVE_FP8FMA + svfloat32_t svmlallbb_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, + uint64_t imm0_15, fpm_t fpm); +``` + +#### FMLALLBT (vectors) + +8-bit floating-point multiply-add long long to single-precision (bottom top). +``` c + // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) || __ARM_FEATURE_SSVE_FP8FMA + svfloat32_t svmlallbt[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm); + svfloat32_t svmlallbt[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, mfloat8_t zm, fpm_t fpm); +``` + +#### FMLALLBT (indexed) + +8-bit floating-point multiply-add long long to single-precision (bottom top, indexed). +``` c + // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) || __ARM_FEATURE_SSVE_FP8FMA + svfloat32_t svmlallbt_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, + uint64_t imm0_15, fpm_t fpm); +``` + +#### FMLALLTB (vectors) + +8-bit floating-point multiply-add long long to single-precision (top bottom). 
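The lane selection implied by the two-letter suffix can be sketched with plain floats standing in for decoded FP8 values. The sketch assumes each 32-bit accumulator lane pairs with a group of four 8-bit source elements and that `TB` names byte 2 of the group (the bottom byte of the top 16-bit half); it illustrates the naming scheme rather than giving a formal definition:

``` c
#include <stddef.h>

/* Sketch: accumulator lane i adds the product of the selected 8-bit
   element from each source (here byte index 2 of each 4-byte group,
   for "TB"). Decoded FP8 values are modeled as plain floats. */
static void mlalltb_sketch(float *acc, const float *zn, const float *zm,
                           size_t n_acc)
{
    const size_t sel = 2; /* TB -> bottom byte of the top 16-bit half */
    for (size_t i = 0; i < n_acc; i++)
        acc[i] += zn[4 * i + sel] * zm[4 * i + sel];
}
```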
+``` c + // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) || __ARM_FEATURE_SSVE_FP8FMA + svfloat32_t svmlalltb[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm); + svfloat32_t svmlalltb[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, mfloat8_t zm, fpm_t fpm); +``` + +#### FMLALLTB (indexed) + +8-bit floating-point multiply-add long long to single-precision (top bottom, indexed). +``` c + // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) || __ARM_FEATURE_SSVE_FP8FMA + svfloat32_t svmlalltb_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, + uint64_t imm0_15, fpm_t fpm); +``` + +#### FMLALLTT (vectors) + +8-bit floating-point multiply-add long long to single-precision (top top). +``` c + // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) || __ARM_FEATURE_SSVE_FP8FMA + svfloat32_t svmlalltt[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm); + svfloat32_t svmlalltt[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, mfloat8_t zm, fpm_t fpm); +``` + +#### FMLALLTT (indexed) + +8-bit floating-point multiply-add long long to single-precision (top top, indexed). +``` c + // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) || __ARM_FEATURE_SSVE_FP8FMA + svfloat32_t svmlalltt_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, + uint64_t imm0_15, fpm_t fpm); +``` + +#### FMLALT (vectors, FP8 to FP16) + +8-bit floating-point multiply-add long to half-precision (top). +```c + // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) || __ARM_FEATURE_SSVE_FP8FMA + svfloat16_t svmlalt[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm); + svfloat16_t svmlalt[_n_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, mfloat8_t zm, fpm_t fpm); +``` + +#### FMLALT (indexed, FP8 to FP16) + +8-bit floating-point multiply-add long to half-precision (top, indexed). 
+``` c + // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) || __ARM_FEATURE_SSVE_FP8FMA + svfloat16_t svmlalt_lane[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, svmfloat8_t zm, + uint64_t imm0_15, fpm_t fpm); +``` + +### SME2 modal 8-bit floating-point intrinsics + +The intrinsics in this section are defined by the header file +[`<arm_sme.h>`](#arm_sme.h) when `__ARM_FEATURE_SME2` and +`__ARM_FEATURE_FP8` are defined. Individual intrinsics may have +additional target feature requirements. + +#### BF1CVT, BF2CVT, F1CVT, F2CVT + +8-bit floating-point convert to half-precision or BFloat16. +``` c + // Variant is also available for: _bf16[_mf8]_x2 + svfloat16x2_t svcvt1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming; + svfloat16x2_t svcvt2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming; +``` + +#### BF1CVTL, BF2CVTL, F1CVTL, F2CVTL + +8-bit floating-point convert to deinterleaved half-precision or BFloat16. +``` c + // Variant is also available for: _bf16[_mf8]_x2 + svfloat16x2_t svcvtl1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming; + svfloat16x2_t svcvtl2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming; +``` + +#### BFCVT, FCVT + +Convert to packed 8-bit floating-point format. +``` c + // Variants are also available for: _mf8[_bf16_x2] and _mf8[_f32_x4] + svmfloat8_t svcvt_mf8[_f16_x2]_fpm(svfloat16x2_t zn, fpm_t fpm) __arm_streaming; +``` + +#### FCVTN + +Convert to interleaved 8-bit floating-point format.
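For reference, the OFP8 E4M3 encoding that such conversions can produce decodes in a few lines of portable C. This is a sketch assuming the OFP8 interpretation of E4M3 (exponent bias 7, no infinities, a single NaN encoding per sign); the encode-side rounding and saturation behavior of the conversion instructions is not shown:

``` c
#include <stdint.h>

/* Decode an OFP8 E4M3 byte (1 sign bit, 4 exponent bits with bias 7,
   3 mantissa bits) to float. E4M3 has no infinities; S.1111.111 is
   the only NaN encoding. */
static float e4m3_to_float(uint8_t b)
{
    int sign = (b >> 7) & 1;
    int exp  = (b >> 3) & 0xF;
    int mant = b & 0x7;

    if (exp == 15 && mant == 7) /* NaN */
        return 0.0f / 0.0f;

    /* Subnormals have significand 0.mant and exponent 2^-6;
       normals have significand 1.mant and exponent 2^(exp-7). */
    float sig = (exp == 0) ? mant / 8.0f : 1.0f + mant / 8.0f;
    int e = (exp == 0) ? -6 : exp - 7;

    float scale = 1.0f;
    for (; e > 0; e--) scale *= 2.0f;
    for (; e < 0; e++) scale *= 0.5f;

    return (sign ? -sig : sig) * scale;
}
```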
+``` c + svmfloat8_t svcvtn_mf8[_f32_x4]_fpm(svfloat32x4_t zn, fpm_t fpm) __arm_streaming; +``` + +#### FSCALE + +Multi-vector floating-point adjust exponent by vector. +``` c + // Variants are also available for: + // [_single_f32_x2], [_single_f64_x2], + // [_single_f16_x4], [_single_f32_x4] and [_single_f64_x4] + svfloat16x2_t svscale[_single_f16_x2](svfloat16x2_t zd, svint16_t zm) __arm_streaming; + + // Variants are also available for: + // [_f32_x2], [_f64_x2], + // [_f16_x4], [_f32_x4] and [_f64_x4] + svfloat16x2_t svscale[_f16_x2](svfloat16x2_t zd, svint16x2_t zm) __arm_streaming; +``` + +#### FDOT + +Multi-vector 8-bit floating-point dot-product. +``` c + // Available variants are: _za16 if __ARM_FEATURE_SME_F8F16 != 0 + // _za32 if __ARM_FEATURE_SME_F8F32 != 0 + void svdot_lane_za16[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn, + svmfloat8_t zm, uint64_t imm_idx, + fpm_t fpm) __arm_streaming __arm_inout("za"); + + void svdot_lane_za16[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn, + svmfloat8_t zm, uint64_t imm_idx, + fpm_t fpm) __arm_streaming __arm_inout("za"); + + void svdot[_single]_za16[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn, + svmfloat8_t zm, fpm_t fpm) + __arm_streaming __arm_inout("za"); + + void svdot[_single]_za16[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn, + svmfloat8_t zm, fpm_t fpm) + __arm_streaming __arm_inout("za"); + + void svdot_za16[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm, + fpm_t fpm) __arm_streaming __arm_inout("za"); + + void svdot_za16[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm, + fpm_t fpm) __arm_streaming __arm_inout("za"); +``` + +#### FVDOT + +Multi-vector 8-bit floating-point vertical dot-product by indexed element to +half-precision.
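The "vertical" pairing can be sketched with plain floats standing in for decoded FP8 values: the two operands of each 2-way dot product are drawn from the same element index of the two input vectors, with `m0` and `m1` standing in for the pair selected from `zm` by the immediate index. This is a sketch of the pairing only, not of the ZA slice addressing:

``` c
#include <stddef.h>

/* Sketch of the vertical 2-way dot product: accumulator lane i takes
   element i of each of the two input vectors (zn0, zn1), rather than
   adjacent elements of a single vector, and multiplies them by the
   indexed pair (m0, m1). */
static void vdot_sketch(float *acc, const float *zn0, const float *zn1,
                        float m0, float m1, size_t n)
{
    for (size_t i = 0; i < n; i++)
        acc[i] += zn0[i] * m0 + zn1[i] * m1;
}
```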
+``` c + // Only if __ARM_FEATURE_SME_F8F16 != 0 + void svvdot_lane_za16[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn, + svmfloat8_t zm, uint64_t imm_idx, + fpm_t fpm) __arm_streaming __arm_inout("za"); +``` + +#### FVDOTB, FVDOTT + +Multi-vector 8-bit floating-point vertical dot-product. +``` c + // Only if __ARM_FEATURE_SME_F8F32 != 0 + void svvdott_lane_za32[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x2_t zn, + svmfloat8_t zm, uint64_t imm_idx, + fpm_t fpm) __arm_streaming __arm_inout("za"); + + void svvdotb_lane_za32[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x2_t zn, + svmfloat8_t zm, uint64_t imm_idx, + fpm_t fpm) __arm_streaming __arm_inout("za"); +``` + +#### FMLAL + +Multi-vector 8-bit floating-point multiply-add long. +``` c + // Only if __ARM_FEATURE_SME_F8F16 != 0 + void svmla_lane_za16[_mf8]_vg2x1_fpm(uint32_t slice, svmfloat8_t zn, + svmfloat8_t zm, uint64_t imm_idx, + fpm_t fpm) __arm_streaming __arm_inout("za"); + + void svmla_lane_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn, + svmfloat8_t zm, uint64_t imm_idx, + fpm_t fpm) __arm_streaming __arm_inout("za"); + + void svmla_lane_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn, + svmfloat8_t zm, uint64_t imm_idx, + fpm_t fpm) __arm_streaming __arm_inout("za"); + + void svmla[_single]_za16[_mf8]_vg2x1_fpm(uint32_t slice, svmfloat8_t zn, + svmfloat8_t zm, fpm_t fpm) + __arm_streaming __arm_inout("za"); + + void svmla[_single]_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn, + svmfloat8_t zm, fpm_t fpm) + __arm_streaming __arm_inout("za"); + + void svmla[_single]_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn, + svmfloat8_t zm, fpm_t fpm) + __arm_streaming __arm_inout("za"); + + void svmla_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm, + fpm_t fpm) __arm_streaming __arm_inout("za"); + + void svmla_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm, + fpm_t fpm) __arm_streaming __arm_inout("za"); +``` + +#### FMLALL + +Multi-vector
8-bit floating-point multiply-add long long. +``` c + // Only if __ARM_FEATURE_SME_F8F32 != 0 + void svmla_lane_za32[_mf8]_vg4x1_fpm(uint32_t slice, svmfloat8_t zn, + svmfloat8_t zm, uint64_t imm_idx, + fpm_t fpm) __arm_streaming __arm_inout("za"); + + void svmla_lane_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn, + svmfloat8_t zm, uint64_t imm_idx, + fpm_t fpm) __arm_streaming __arm_inout("za"); + + void svmla_lane_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn, + svmfloat8_t zm, uint64_t imm_idx, + fpm_t fpm) __arm_streaming __arm_inout("za"); + + void svmla[_single]_za32[_mf8]_vg4x1_fpm(uint32_t slice, svmfloat8_t zn, + svmfloat8_t zm, fpm_t fpm) + __arm_streaming __arm_inout("za"); + + void svmla[_single]_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn, + svmfloat8_t zm, fpm_t fpm) + __arm_streaming __arm_inout("za"); + + void svmla[_single]_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn, + svmfloat8_t zm, fpm_t fpm) + __arm_streaming __arm_inout("za"); + + void svmla_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm, + fpm_t fpm) __arm_streaming __arm_inout("za"); + + void svmla_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm, + fpm_t fpm) __arm_streaming __arm_inout("za"); +``` + +#### FMOPA + +8-bit floating-point sum of outer products and accumulate. +``` c + // Only if __ARM_FEATURE_SME_F8F16 != 0 + void svmopa_za16[_mf8]_m_fpm(uint64_t tile, svbool_t pn, svbool_t pm, + svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm) + __arm_streaming __arm_inout("za"); + + // Only if __ARM_FEATURE_SME_F8F32 != 0 + void svmopa_za32[_mf8]_m_fpm(uint64_t tile, svbool_t pn, svbool_t pm, + svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm) + __arm_streaming __arm_inout("za"); +``` + # M-profile Vector Extension (MVE) intrinsics The M-profile Vector Extension (MVE) [[MVE-spec]](#MVE-spec) instructions provide packed Single @@ -13340,6 +13916,7 @@ additional instructions.
| `svfloat32_t svset_neonq[_f32](svfloat32_t vec, float32x4_t subvec)` | | `svfloat64_t svset_neonq[_f64](svfloat64_t vec, float64x2_t subvec)` | | `svbfloat16_t svset_neonq[_bf16](svbfloat16_t vec, bfloat16x8_t subvec)` | +| `svmfloat8_t svset_neonq[_mf8](svmfloat8_t vec, mfloat8x16_t subvec)` | ### `svget_neonq` @@ -13360,6 +13937,7 @@ NEON vector. | `float32x4_t svget_neonq[_f32](svfloat32_t vec)` | | `float64x2_t svget_neonq[_f64](svfloat64_t vec)` | | `bfloat16x8_t svget_neonq[_bf16](svbfloat16_t vec)` | +| `mfloat8x16_t svget_neonq[_mf8](svmfloat8_t vec)` | ### `svdup_neonq` @@ -13380,6 +13958,7 @@ duplicated NEON vector `vec`. | `svfloat32_t svdup_neonq[_f32](float32x4_t vec)` | | `svfloat64_t svdup_neonq[_f64](float64x2_t vec)` | | `svbfloat16_t svdup_neonq[_bf16](bfloat16x8_t vec)` | +| `svmfloat8_t svdup_neonq[_mf8](mfloat8x16_t vec)` | # Future directions diff --git a/neon_intrinsics/advsimd.md b/neon_intrinsics/advsimd.md index c8056afa..392df44d 100644 --- a/neon_intrinsics/advsimd.md +++ b/neon_intrinsics/advsimd.md @@ -2133,394 +2133,452 @@ The intrinsics in this section are guarded by the macro ``__ARM_NEON``. 
#### Reinterpret casts -| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | -|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------|-----------------------|--------------------|---------------------------| -| int16x4_t vreinterpret_s16_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| int32x2_t vreinterpret_s32_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| float32x2_t vreinterpret_f32_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| uint8x8_t vreinterpret_u8_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| uint16x4_t vreinterpret_u16_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| uint32x2_t vreinterpret_u32_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| poly8x8_t vreinterpret_p8_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| poly16x4_t vreinterpret_p16_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| uint64x1_t vreinterpret_u64_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | -| int64x1_t vreinterpret_s64_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | -| float64x1_t vreinterpret_f64_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `A64` | -| poly64x1_t vreinterpret_p64_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `A32/A64` | -| float16x4_t vreinterpret_f16_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| int8x8_t vreinterpret_s8_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| int32x2_t vreinterpret_s32_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| float32x2_t 
vreinterpret_f32_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| uint8x8_t vreinterpret_u8_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| uint16x4_t vreinterpret_u16_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| uint32x2_t vreinterpret_u32_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| poly8x8_t vreinterpret_p8_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| poly16x4_t vreinterpret_p16_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| uint64x1_t vreinterpret_u64_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | -| int64x1_t vreinterpret_s64_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | -| float64x1_t vreinterpret_f64_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A64` | -| poly64x1_t vreinterpret_p64_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A32/A64` | -| float16x4_t vreinterpret_f16_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| int8x8_t vreinterpret_s8_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| int16x4_t vreinterpret_s16_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| float32x2_t vreinterpret_f32_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| uint8x8_t vreinterpret_u8_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| uint16x4_t vreinterpret_u16_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| uint32x2_t vreinterpret_u32_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| poly8x8_t vreinterpret_p8_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| poly16x4_t vreinterpret_p16_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> 
result` | `v7/A32/A64` | -| uint64x1_t vreinterpret_u64_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | -| int64x1_t vreinterpret_s64_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | -| float64x1_t vreinterpret_f64_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `A64` | -| poly64x1_t vreinterpret_p64_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `A32/A64` | -| float16x4_t vreinterpret_f16_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| int8x8_t vreinterpret_s8_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| int16x4_t vreinterpret_s16_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| int32x2_t vreinterpret_s32_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| uint8x8_t vreinterpret_u8_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| uint16x4_t vreinterpret_u16_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| uint32x2_t vreinterpret_u32_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| poly8x8_t vreinterpret_p8_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| poly16x4_t vreinterpret_p16_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| uint64x1_t vreinterpret_u64_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | -| int64x1_t vreinterpret_s64_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | -| float64x1_t vreinterpret_f64_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `A64` | -| poly64x1_t vreinterpret_p64_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `A32/A64` | -| poly64x1_t vreinterpret_p64_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `A64` | -| float16x4_t 
vreinterpret_f16_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| int8x8_t vreinterpret_s8_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| int16x4_t vreinterpret_s16_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| int32x2_t vreinterpret_s32_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| float32x2_t vreinterpret_f32_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| uint16x4_t vreinterpret_u16_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| uint32x2_t vreinterpret_u32_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| poly8x8_t vreinterpret_p8_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| poly16x4_t vreinterpret_p16_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| uint64x1_t vreinterpret_u64_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | -| int64x1_t vreinterpret_s64_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | -| float64x1_t vreinterpret_f64_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `A64` | -| poly64x1_t vreinterpret_p64_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `A32/A64` | -| float16x4_t vreinterpret_f16_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| int8x8_t vreinterpret_s8_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| int16x4_t vreinterpret_s16_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| int32x2_t vreinterpret_s32_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| float32x2_t vreinterpret_f32_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| uint8x8_t vreinterpret_u8_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | 
`v7/A32/A64` | -| uint32x2_t vreinterpret_u32_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| poly8x8_t vreinterpret_p8_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| poly16x4_t vreinterpret_p16_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| uint64x1_t vreinterpret_u64_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | -| int64x1_t vreinterpret_s64_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | -| float64x1_t vreinterpret_f64_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A64` | -| poly64x1_t vreinterpret_p64_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A32/A64` | -| float16x4_t vreinterpret_f16_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| int8x8_t vreinterpret_s8_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| int16x4_t vreinterpret_s16_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| int32x2_t vreinterpret_s32_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| float32x2_t vreinterpret_f32_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| uint8x8_t vreinterpret_u8_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| uint16x4_t vreinterpret_u16_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| poly8x8_t vreinterpret_p8_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| poly16x4_t vreinterpret_p16_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| uint64x1_t vreinterpret_u64_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | -| int64x1_t vreinterpret_s64_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | -| float64x1_t 
vreinterpret_f64_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `A64` | -| poly64x1_t vreinterpret_p64_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `A32/A64` | -| float16x4_t vreinterpret_f16_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| int8x8_t vreinterpret_s8_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| int16x4_t vreinterpret_s16_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| int32x2_t vreinterpret_s32_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| float32x2_t vreinterpret_f32_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| uint8x8_t vreinterpret_u8_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| uint16x4_t vreinterpret_u16_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| uint32x2_t vreinterpret_u32_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | -| poly16x4_t vreinterpret_p16_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| uint64x1_t vreinterpret_u64_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | -| int64x1_t vreinterpret_s64_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | -| float64x1_t vreinterpret_f64_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `A64` | -| poly64x1_t vreinterpret_p64_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `A32/A64` | -| float16x4_t vreinterpret_f16_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| int8x8_t vreinterpret_s8_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | -| int16x4_t vreinterpret_s16_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | -| int32x2_t vreinterpret_s32_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` 
|
-| float32x2_t vreinterpret_f32_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
-| uint8x8_t vreinterpret_u8_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
-| uint16x4_t vreinterpret_u16_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
-| uint32x2_t vreinterpret_u32_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
-| poly8x8_t vreinterpret_p8_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
-| uint64x1_t vreinterpret_u64_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
-| int64x1_t vreinterpret_s64_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
-| float64x1_t vreinterpret_f64_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A64` |
-| poly64x1_t vreinterpret_p64_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A32/A64` |
-| float16x4_t vreinterpret_f16_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
-| int8x8_t vreinterpret_s8_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
-| int16x4_t vreinterpret_s16_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
-| int32x2_t vreinterpret_s32_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
-| float32x2_t vreinterpret_f32_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
-| uint8x8_t vreinterpret_u8_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
-| uint16x4_t vreinterpret_u16_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
-| uint32x2_t vreinterpret_u32_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
-| poly8x8_t vreinterpret_p8_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
-| poly16x4_t vreinterpret_p16_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
-| int64x1_t vreinterpret_s64_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
-| float64x1_t vreinterpret_f64_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `A64` |
-| poly64x1_t vreinterpret_p64_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `A32/A64` |
-| float16x4_t vreinterpret_f16_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
-| int8x8_t vreinterpret_s8_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
-| int16x4_t vreinterpret_s16_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
-| int32x2_t vreinterpret_s32_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
-| float32x2_t vreinterpret_f32_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
-| uint8x8_t vreinterpret_u8_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
-| uint16x4_t vreinterpret_u16_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
-| uint32x2_t vreinterpret_u32_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
-| poly8x8_t vreinterpret_p8_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
-| poly16x4_t vreinterpret_p16_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
-| uint64x1_t vreinterpret_u64_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
-| float64x1_t vreinterpret_f64_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `A64` |
-| uint64x1_t vreinterpret_u64_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `A32/A64` |
-| float16x4_t vreinterpret_f16_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
-| int8x8_t vreinterpret_s8_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
-| int16x4_t vreinterpret_s16_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
-| int32x2_t vreinterpret_s32_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
-| float32x2_t vreinterpret_f32_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
-| uint8x8_t vreinterpret_u8_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
-| uint16x4_t vreinterpret_u16_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
-| uint32x2_t vreinterpret_u32_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
-| poly8x8_t vreinterpret_p8_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
-| poly16x4_t vreinterpret_p16_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
-| uint64x1_t vreinterpret_u64_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
-| int64x1_t vreinterpret_s64_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
-| float64x1_t vreinterpret_f64_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A64` |
-| poly64x1_t vreinterpret_p64_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A32/A64` |
-| int16x8_t vreinterpretq_s16_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int32x4_t vreinterpretq_s32_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| float32x4_t vreinterpretq_f32_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| uint8x16_t vreinterpretq_u8_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| uint16x8_t vreinterpretq_u16_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint32x4_t vreinterpretq_u32_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| poly8x16_t vreinterpretq_p8_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| poly16x8_t vreinterpretq_p16_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint64x2_t vreinterpretq_u64_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| int64x2_t vreinterpretq_s64_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| float64x2_t vreinterpretq_f64_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `A64` |
-| poly64x2_t vreinterpretq_p64_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `A32/A64` |
-| poly128_t vreinterpretq_p128_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.1Q -> result` | `A32/A64` |
-| float16x8_t vreinterpretq_f16_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int8x16_t vreinterpretq_s8_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| int32x4_t vreinterpretq_s32_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| float32x4_t vreinterpretq_f32_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| uint8x16_t vreinterpretq_u8_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| uint16x8_t vreinterpretq_u16_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint32x4_t vreinterpretq_u32_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| poly8x16_t vreinterpretq_p8_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| poly16x8_t vreinterpretq_p16_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint64x2_t vreinterpretq_u64_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| int64x2_t vreinterpretq_s64_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| float64x2_t vreinterpretq_f64_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A64` |
-| poly64x2_t vreinterpretq_p64_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A32/A64` |
-| poly128_t vreinterpretq_p128_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.1Q -> result` | `A32/A64` |
-| float16x8_t vreinterpretq_f16_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int8x16_t vreinterpretq_s8_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| int16x8_t vreinterpretq_s16_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| float32x4_t vreinterpretq_f32_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| uint8x16_t vreinterpretq_u8_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| uint16x8_t vreinterpretq_u16_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint32x4_t vreinterpretq_u32_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| poly8x16_t vreinterpretq_p8_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| poly16x8_t vreinterpretq_p16_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint64x2_t vreinterpretq_u64_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| int64x2_t vreinterpretq_s64_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| float64x2_t vreinterpretq_f64_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `A64` |
-| poly64x2_t vreinterpretq_p64_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `A32/A64` |
-| poly128_t vreinterpretq_p128_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.1Q -> result` | `A32/A64` |
-| float16x8_t vreinterpretq_f16_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int8x16_t vreinterpretq_s8_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| int16x8_t vreinterpretq_s16_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int32x4_t vreinterpretq_s32_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| uint8x16_t vreinterpretq_u8_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| uint16x8_t vreinterpretq_u16_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint32x4_t vreinterpretq_u32_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| poly8x16_t vreinterpretq_p8_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| poly16x8_t vreinterpretq_p16_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint64x2_t vreinterpretq_u64_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| int64x2_t vreinterpretq_s64_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| float64x2_t vreinterpretq_f64_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `A64` |
-| poly64x2_t vreinterpretq_p64_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `A32/A64` |
-| poly128_t vreinterpretq_p128_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.1Q -> result` | `A32/A64` |
-| poly64x2_t vreinterpretq_p64_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `A64` |
-| poly128_t vreinterpretq_p128_f64(float64x2_t a) | `a -> Vd.1Q` | `NOP` | `Vd.2D -> result` | `A64` |
-| float16x8_t vreinterpretq_f16_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int8x16_t vreinterpretq_s8_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| int16x8_t vreinterpretq_s16_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int32x4_t vreinterpretq_s32_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| float32x4_t vreinterpretq_f32_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| uint16x8_t vreinterpretq_u16_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint32x4_t vreinterpretq_u32_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| poly8x16_t vreinterpretq_p8_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| poly16x8_t vreinterpretq_p16_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint64x2_t vreinterpretq_u64_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| int64x2_t vreinterpretq_s64_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| float64x2_t vreinterpretq_f64_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `A64` |
-| poly64x2_t vreinterpretq_p64_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `A32/A64` |
-| poly128_t vreinterpretq_p128_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.1Q -> result` | `A32/A64` |
-| float16x8_t vreinterpretq_f16_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int8x16_t vreinterpretq_s8_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| int16x8_t vreinterpretq_s16_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int32x4_t vreinterpretq_s32_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| float32x4_t vreinterpretq_f32_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| uint8x16_t vreinterpretq_u8_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| uint32x4_t vreinterpretq_u32_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| poly8x16_t vreinterpretq_p8_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| poly16x8_t vreinterpretq_p16_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint64x2_t vreinterpretq_u64_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| int64x2_t vreinterpretq_s64_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| float64x2_t vreinterpretq_f64_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A64` |
-| poly64x2_t vreinterpretq_p64_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A32/A64` |
-| poly128_t vreinterpretq_p128_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.1Q -> result` | `A32/A64` |
-| float16x8_t vreinterpretq_f16_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int8x16_t vreinterpretq_s8_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| int16x8_t vreinterpretq_s16_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int32x4_t vreinterpretq_s32_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| float32x4_t vreinterpretq_f32_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| uint8x16_t vreinterpretq_u8_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| uint16x8_t vreinterpretq_u16_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| poly8x16_t vreinterpretq_p8_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| poly16x8_t vreinterpretq_p16_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint64x2_t vreinterpretq_u64_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| int64x2_t vreinterpretq_s64_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| float64x2_t vreinterpretq_f64_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `A64` |
-| poly64x2_t vreinterpretq_p64_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `A32/A64` |
-| poly128_t vreinterpretq_p128_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.1Q -> result` | `A32/A64` |
-| float16x8_t vreinterpretq_f16_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int8x16_t vreinterpretq_s8_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| int16x8_t vreinterpretq_s16_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int32x4_t vreinterpretq_s32_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| float32x4_t vreinterpretq_f32_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| uint8x16_t vreinterpretq_u8_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| uint16x8_t vreinterpretq_u16_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint32x4_t vreinterpretq_u32_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| poly16x8_t vreinterpretq_p16_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint64x2_t vreinterpretq_u64_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| int64x2_t vreinterpretq_s64_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| float64x2_t vreinterpretq_f64_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `A64` |
-| poly64x2_t vreinterpretq_p64_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `A32/A64` |
-| poly128_t vreinterpretq_p128_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.1Q -> result` | `A32/A64` |
-| float16x8_t vreinterpretq_f16_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int8x16_t vreinterpretq_s8_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| int16x8_t vreinterpretq_s16_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int32x4_t vreinterpretq_s32_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| float32x4_t vreinterpretq_f32_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| uint8x16_t vreinterpretq_u8_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| uint16x8_t vreinterpretq_u16_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint32x4_t vreinterpretq_u32_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| poly8x16_t vreinterpretq_p8_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| uint64x2_t vreinterpretq_u64_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| int64x2_t vreinterpretq_s64_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| float64x2_t vreinterpretq_f64_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A64` |
-| poly64x2_t vreinterpretq_p64_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A32/A64` |
-| poly128_t vreinterpretq_p128_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.1Q -> result` | `A32/A64` |
-| float16x8_t vreinterpretq_f16_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int8x16_t vreinterpretq_s8_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| int16x8_t vreinterpretq_s16_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int32x4_t vreinterpretq_s32_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| float32x4_t vreinterpretq_f32_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| uint8x16_t vreinterpretq_u8_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| uint16x8_t vreinterpretq_u16_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint32x4_t vreinterpretq_u32_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| poly8x16_t vreinterpretq_p8_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| poly16x8_t vreinterpretq_p16_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int64x2_t vreinterpretq_s64_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| float64x2_t vreinterpretq_f64_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| float64x2_t vreinterpretq_f64_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `A64` |
-| poly64x2_t vreinterpretq_p64_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `A32/A64` |
-| poly128_t vreinterpretq_p128_s64(int64x2_t a) | `a -> Vd.1Q` | `NOP` | `Vd.2D -> result` | `A32/A64` |
-| poly64x2_t vreinterpretq_p64_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `A32/A64` |
-| poly128_t vreinterpretq_p128_u64(uint64x2_t a) | `a -> Vd.1Q` | `NOP` | `Vd.2D -> result` | `A32/A64` |
-| float16x8_t vreinterpretq_f16_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int8x16_t vreinterpretq_s8_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| int16x8_t vreinterpretq_s16_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int32x4_t vreinterpretq_s32_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| float32x4_t vreinterpretq_f32_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| uint8x16_t vreinterpretq_u8_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| uint16x8_t vreinterpretq_u16_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint32x4_t vreinterpretq_u32_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| poly8x16_t vreinterpretq_p8_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| poly16x8_t vreinterpretq_p16_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint64x2_t vreinterpretq_u64_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| uint64x2_t vreinterpretq_u64_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `A32/A64` |
-| float16x8_t vreinterpretq_f16_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int8x16_t vreinterpretq_s8_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| int16x8_t vreinterpretq_s16_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| int32x4_t vreinterpretq_s32_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| float32x4_t vreinterpretq_f32_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| uint8x16_t vreinterpretq_u8_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| uint16x8_t vreinterpretq_u16_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint32x4_t vreinterpretq_u32_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` |
-| poly8x16_t vreinterpretq_p8_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` |
-| poly16x8_t vreinterpretq_p16_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` |
-| uint64x2_t vreinterpretq_u64_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| int64x2_t vreinterpretq_s64_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` |
-| float64x2_t vreinterpretq_f64_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A64` |
-| poly64x2_t vreinterpretq_p64_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A32/A64` |
-| poly128_t vreinterpretq_p128_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.1Q -> result` | `A32/A64` |
-| int8x8_t vreinterpret_s8_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `A64` |
-| int16x4_t vreinterpret_s16_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A64` |
-| int32x2_t vreinterpret_s32_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `A64` |
-| uint8x8_t vreinterpret_u8_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `A64` |
-| uint16x4_t vreinterpret_u16_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A64` |
-| uint32x2_t vreinterpret_u32_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `A64` |
-| poly8x8_t vreinterpret_p8_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `A64` |
-| poly16x4_t vreinterpret_p16_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A64` |
-| uint64x1_t vreinterpret_u64_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `A64` |
-| int64x1_t vreinterpret_s64_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `A64` |
-| float16x4_t vreinterpret_f16_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A64` |
-| float32x2_t vreinterpret_f32_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `A64` |
-| int8x16_t vreinterpretq_s8_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `A64` |
-| int16x8_t vreinterpretq_s16_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A64` |
-| int32x4_t vreinterpretq_s32_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `A64` |
-| uint8x16_t vreinterpretq_u8_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `A64` |
-| uint16x8_t vreinterpretq_u16_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A64` |
-| uint32x4_t vreinterpretq_u32_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `A64` |
-| poly8x16_t vreinterpretq_p8_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `A64` |
-| poly16x8_t vreinterpretq_p16_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A64` |
-| uint64x2_t vreinterpretq_u64_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `A64` |
-| int64x2_t vreinterpretq_s64_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `A64` |
-| float16x8_t vreinterpretq_f16_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A64` |
-| float32x4_t vreinterpretq_f32_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `A64` |
-| int8x8_t vreinterpret_s8_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `A32/A64` |
-| int16x4_t vreinterpret_s16_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A32/A64` |
-| int32x2_t vreinterpret_s32_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `A32/A64` |
-| uint8x8_t vreinterpret_u8_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `A32/A64` |
-| uint16x4_t vreinterpret_u16_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A32/A64` |
-| uint32x2_t vreinterpret_u32_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `A32/A64` |
-| poly8x8_t vreinterpret_p8_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `A32/A64` |
-| poly16x4_t vreinterpret_p16_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A32/A64` |
-| int64x1_t vreinterpret_s64_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `A32/A64` |
-| float64x1_t vreinterpret_f64_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `A64` |
-| float16x4_t vreinterpret_f16_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A32/A64` |
-| int8x16_t vreinterpretq_s8_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `A32/A64` |
-| int16x8_t vreinterpretq_s16_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A32/A64` |
-| int32x4_t vreinterpretq_s32_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `A32/A64` |
-| uint8x16_t vreinterpretq_u8_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `A32/A64` |
-| uint16x8_t vreinterpretq_u16_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A32/A64` |
-| uint32x4_t vreinterpretq_u32_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `A32/A64` |
-| poly8x16_t vreinterpretq_p8_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `A32/A64` |
-| poly16x8_t vreinterpretq_p16_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A32/A64` |
-| int64x2_t vreinterpretq_s64_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `A32/A64` |
-| float64x2_t vreinterpretq_f64_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `A64` |
-| float16x8_t vreinterpretq_f16_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A32/A64` |
-| int8x16_t vreinterpretq_s8_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.16B -> result` | `A32/A64` |
-| int16x8_t vreinterpretq_s16_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.8H -> result` | `A32/A64` |
-| int32x4_t vreinterpretq_s32_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.4S -> result` | `A32/A64` |
-| uint8x16_t vreinterpretq_u8_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.16B -> result` | `A32/A64` |
-| uint16x8_t vreinterpretq_u16_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.8H -> result` | `A32/A64` |
-| uint32x4_t vreinterpretq_u32_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.4S -> result` | `A32/A64` |
-| poly8x16_t vreinterpretq_p8_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.16B -> result` | `A32/A64` |
-| poly16x8_t vreinterpretq_p16_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.8H -> result` | `A32/A64` |
-| uint64x2_t vreinterpretq_u64_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.2D -> result` | `A32/A64` |
-| int64x2_t vreinterpretq_s64_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.2D -> result` | `A32/A64` |
-| float64x2_t vreinterpretq_f64_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.2D -> result` | `A64` |
-| float16x8_t vreinterpretq_f16_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.8H -> result` | `A32/A64` |
+| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures |
+|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------|-----------------------|--------------------|---------------------------|
+| int16x4_t vreinterpret_s16_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| int32x2_t vreinterpret_s32_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| float32x2_t vreinterpret_f32_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| uint8x8_t vreinterpret_u8_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| uint16x4_t vreinterpret_u16_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| uint32x2_t vreinterpret_u32_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| poly8x8_t vreinterpret_p8_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| poly16x4_t vreinterpret_p16_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| mfloat8x8_t vreinterpret_mf8_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.8B -> result` | `A64` |
+| uint64x1_t vreinterpret_u64_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
+| int64x1_t vreinterpret_s64_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
+| float64x1_t vreinterpret_f64_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `A64` |
+| poly64x1_t vreinterpret_p64_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `A32/A64` |
+| float16x4_t vreinterpret_f16_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| int8x8_t vreinterpret_s8_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| int32x2_t vreinterpret_s32_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| float32x2_t vreinterpret_f32_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| uint8x8_t vreinterpret_u8_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| uint16x4_t vreinterpret_u16_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| uint32x2_t vreinterpret_u32_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| poly8x8_t vreinterpret_p8_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| poly16x4_t vreinterpret_p16_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| mfloat8x8_t vreinterpret_mf8_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `A64` |
+| uint64x1_t vreinterpret_u64_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
+| int64x1_t vreinterpret_s64_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
+| float64x1_t vreinterpret_f64_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A64` |
+| poly64x1_t vreinterpret_p64_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A32/A64` |
+| float16x4_t vreinterpret_f16_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| int8x8_t vreinterpret_s8_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| int16x4_t vreinterpret_s16_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| float32x2_t vreinterpret_f32_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| uint8x8_t vreinterpret_u8_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| uint16x4_t vreinterpret_u16_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| uint32x2_t vreinterpret_u32_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| poly8x8_t vreinterpret_p8_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| poly16x4_t vreinterpret_p16_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| mfloat8x8_t vreinterpret_mf8_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `A64` |
+| uint64x1_t vreinterpret_u64_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
+| int64x1_t vreinterpret_s64_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
+| float64x1_t vreinterpret_f64_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `A64` |
+| poly64x1_t vreinterpret_p64_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `A32/A64` |
+| float16x4_t vreinterpret_f16_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| int8x8_t vreinterpret_s8_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| int16x4_t vreinterpret_s16_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| int32x2_t vreinterpret_s32_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| uint8x8_t vreinterpret_u8_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| uint16x4_t vreinterpret_u16_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| uint32x2_t vreinterpret_u32_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| poly8x8_t vreinterpret_p8_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| poly16x4_t vreinterpret_p16_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| mfloat8x8_t vreinterpret_mf8_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `A64` |
+| uint64x1_t vreinterpret_u64_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
+| int64x1_t vreinterpret_s64_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
+| float64x1_t vreinterpret_f64_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `A64` |
+| poly64x1_t vreinterpret_p64_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `A32/A64` |
+| poly64x1_t vreinterpret_p64_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `A64` |
+| float16x4_t vreinterpret_f16_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| int8x8_t vreinterpret_s8_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| int16x4_t vreinterpret_s16_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| int32x2_t vreinterpret_s32_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| float32x2_t vreinterpret_f32_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| uint16x4_t vreinterpret_u16_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| uint32x2_t vreinterpret_u32_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| poly8x8_t vreinterpret_p8_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| poly16x4_t vreinterpret_p16_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| mfloat8x8_t vreinterpret_mf8_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.8B -> result` | `A64` |
+| uint64x1_t vreinterpret_u64_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
+| int64x1_t vreinterpret_s64_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
+| float64x1_t vreinterpret_f64_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `A64` |
+| poly64x1_t vreinterpret_p64_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `A32/A64` |
+| float16x4_t vreinterpret_f16_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| int8x8_t vreinterpret_s8_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| int16x4_t vreinterpret_s16_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| int32x2_t vreinterpret_s32_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| float32x2_t vreinterpret_f32_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| uint8x8_t vreinterpret_u8_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| uint32x2_t vreinterpret_u32_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| poly8x8_t vreinterpret_p8_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| poly16x4_t vreinterpret_p16_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| mfloat8x8_t vreinterpret_mf8_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `A64` |
+| uint64x1_t vreinterpret_u64_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
+| int64x1_t vreinterpret_s64_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
+| float64x1_t vreinterpret_f64_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A64` |
+| poly64x1_t vreinterpret_p64_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A32/A64` |
+| float16x4_t vreinterpret_f16_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| int8x8_t vreinterpret_s8_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| int16x4_t vreinterpret_s16_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| int32x2_t vreinterpret_s32_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| float32x2_t vreinterpret_f32_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| uint8x8_t vreinterpret_u8_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| uint16x4_t vreinterpret_u16_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| poly8x8_t vreinterpret_p8_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| poly16x4_t vreinterpret_p16_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| mfloat8x8_t vreinterpret_mf8_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.8B -> result` | `A64` |
+| uint64x1_t vreinterpret_u64_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
+| int64x1_t vreinterpret_s64_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
+| float64x1_t vreinterpret_f64_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `A64` |
+| poly64x1_t vreinterpret_p64_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.1D -> result` | `A32/A64` |
+| float16x4_t vreinterpret_f16_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| int8x8_t vreinterpret_s8_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| int16x4_t vreinterpret_s16_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| int32x2_t vreinterpret_s32_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| float32x2_t vreinterpret_f32_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| uint8x8_t vreinterpret_u8_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` |
+| uint16x4_t vreinterpret_u16_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| uint32x2_t vreinterpret_u32_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` |
+| poly16x4_t vreinterpret_p16_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` |
+| uint64x1_t vreinterpret_u64_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` |
+| int64x1_t vreinterpret_s64_p8(poly8x8_t
a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | +| float64x1_t vreinterpret_f64_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `A64` | +| poly64x1_t vreinterpret_p64_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `A32/A64` | +| float16x4_t vreinterpret_f16_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | +| int8x8_t vreinterpret_s8_mf8(mfloat8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.8B -> result` | `A64` | +| int16x4_t vreinterpret_s16_mf8(mfloat8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `A64` | +| int32x2_t vreinterpret_s32_mf8(mfloat8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `A64` | +| float32x2_t vreinterpret_f32_mf8(mfloat8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `A64` | +| uint8x8_t vreinterpret_u8_mf8(mfloat8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.8B -> result` | `A64` | +| uint16x4_t vreinterpret_u16_mf8(mfloat8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `A64` | +| uint32x2_t vreinterpret_u32_mf8(mfloat8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.2S -> result` | `A64` | +| poly16x4_t vreinterpret_p16_mf8(mfloat8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `A64` | +| uint64x1_t vreinterpret_u64_mf8(mfloat8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `A64` | +| int64x1_t vreinterpret_s64_mf8(mfloat8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `A64` | +| float64x1_t vreinterpret_f64_mf8(mfloat8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `A64` | +| poly64x1_t vreinterpret_p64_mf8(mfloat8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.1D -> result` | `A64` | +| float16x4_t vreinterpret_f16_mf8(mfloat8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `A64` | +| int8x8_t vreinterpret_s8_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | +| int16x4_t vreinterpret_s16_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | +| int32x2_t vreinterpret_s32_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | 
`Vd.2S -> result` | `v7/A32/A64` | +| float32x2_t vreinterpret_f32_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | +| uint8x8_t vreinterpret_u8_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | +| uint16x4_t vreinterpret_u16_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | +| uint32x2_t vreinterpret_u32_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | +| poly8x8_t vreinterpret_p8_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | +| mfloat8x8_t vreinterpret_mf8_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `A64` | +| uint64x1_t vreinterpret_u64_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | +| int64x1_t vreinterpret_s64_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | +| float64x1_t vreinterpret_f64_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A64` | +| poly64x1_t vreinterpret_p64_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A32/A64` | +| float16x4_t vreinterpret_f16_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | +| int8x8_t vreinterpret_s8_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | +| int16x4_t vreinterpret_s16_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | +| int32x2_t vreinterpret_s32_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | +| float32x2_t vreinterpret_f32_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | +| uint8x8_t vreinterpret_u8_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | +| uint16x4_t vreinterpret_u16_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | +| uint32x2_t vreinterpret_u32_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | +| poly8x8_t 
vreinterpret_p8_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | +| poly16x4_t vreinterpret_p16_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | +| mfloat8x8_t vreinterpret_mf8_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `A64` | +| int64x1_t vreinterpret_s64_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | +| float64x1_t vreinterpret_f64_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `A64` | +| poly64x1_t vreinterpret_p64_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `A32/A64` | +| float16x4_t vreinterpret_f16_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | +| int8x8_t vreinterpret_s8_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | +| int16x4_t vreinterpret_s16_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | +| int32x2_t vreinterpret_s32_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | +| float32x2_t vreinterpret_f32_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | +| uint8x8_t vreinterpret_u8_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | +| uint16x4_t vreinterpret_u16_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | +| uint32x2_t vreinterpret_u32_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | +| poly8x8_t vreinterpret_p8_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | +| poly16x4_t vreinterpret_p16_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | +| mfloat8x8_t vreinterpret_mf8_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `A64` | +| uint64x1_t vreinterpret_u64_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | +| float64x1_t vreinterpret_f64_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` 
| `A64` | +| uint64x1_t vreinterpret_u64_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `A32/A64` | +| float16x4_t vreinterpret_f16_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | +| int8x8_t vreinterpret_s8_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | +| int16x4_t vreinterpret_s16_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | +| int32x2_t vreinterpret_s32_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | +| float32x2_t vreinterpret_f32_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | +| uint8x8_t vreinterpret_u8_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | +| uint16x4_t vreinterpret_u16_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | +| uint32x2_t vreinterpret_u32_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `v7/A32/A64` | +| poly8x8_t vreinterpret_p8_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `v7/A32/A64` | +| poly16x4_t vreinterpret_p16_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `v7/A32/A64` | +| mfloat8x8_t vreinterpret_mf8_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `A64` | +| uint64x1_t vreinterpret_u64_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | +| int64x1_t vreinterpret_s64_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `v7/A32/A64` | +| float64x1_t vreinterpret_f64_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A64` | +| poly64x1_t vreinterpret_p64_f16(float16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A32/A64` | +| int16x8_t vreinterpretq_s16_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int32x4_t vreinterpretq_s32_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| float32x4_t 
vreinterpretq_f32_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| uint8x16_t vreinterpretq_u8_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| uint16x8_t vreinterpretq_u16_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| uint32x4_t vreinterpretq_u32_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| poly8x16_t vreinterpretq_p8_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| poly16x8_t vreinterpretq_p16_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| mfloat8x16_t vreinterpretq_mf8_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.16B -> result` | `A64` | +| uint64x2_t vreinterpretq_u64_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| int64x2_t vreinterpretq_s64_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| float64x2_t vreinterpretq_f64_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `A64` | +| poly64x2_t vreinterpretq_p64_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| poly128_t vreinterpretq_p128_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.1Q -> result` | `A32/A64` | +| float16x8_t vreinterpretq_f16_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int8x16_t vreinterpretq_s8_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| int32x4_t vreinterpretq_s32_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| float32x4_t vreinterpretq_f32_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| uint8x16_t vreinterpretq_u8_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| uint16x8_t vreinterpretq_u16_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| uint32x4_t vreinterpretq_u32_s16(int16x8_t a) | `a -> Vd.8H` | 
`NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| poly8x16_t vreinterpretq_p8_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| poly16x8_t vreinterpretq_p16_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| mfloat8x16_t vreinterpretq_mf8_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `A64` | +| uint64x2_t vreinterpretq_u64_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| int64x2_t vreinterpretq_s64_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| float64x2_t vreinterpretq_f64_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A64` | +| poly64x2_t vreinterpretq_p64_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| poly128_t vreinterpretq_p128_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.1Q -> result` | `A32/A64` | +| float16x8_t vreinterpretq_f16_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int8x16_t vreinterpretq_s8_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| int16x8_t vreinterpretq_s16_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| float32x4_t vreinterpretq_f32_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| uint8x16_t vreinterpretq_u8_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| uint16x8_t vreinterpretq_u16_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| uint32x4_t vreinterpretq_u32_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| poly8x16_t vreinterpretq_p8_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| poly16x8_t vreinterpretq_p16_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| mfloat8x16_t vreinterpretq_mf8_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `A64` | +| uint64x2_t 
vreinterpretq_u64_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| int64x2_t vreinterpretq_s64_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| float64x2_t vreinterpretq_f64_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `A64` | +| poly64x2_t vreinterpretq_p64_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| poly128_t vreinterpretq_p128_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.1Q -> result` | `A32/A64` | +| float16x8_t vreinterpretq_f16_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int8x16_t vreinterpretq_s8_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| int16x8_t vreinterpretq_s16_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int32x4_t vreinterpretq_s32_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| uint8x16_t vreinterpretq_u8_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| uint16x8_t vreinterpretq_u16_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| uint32x4_t vreinterpretq_u32_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| poly8x16_t vreinterpretq_p8_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| poly16x8_t vreinterpretq_p16_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| mfloat8x16_t vreinterpretq_mf8_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `A64` | +| uint64x2_t vreinterpretq_u64_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| int64x2_t vreinterpretq_s64_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| float64x2_t vreinterpretq_f64_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `A64` | +| poly64x2_t vreinterpretq_p64_f32(float32x4_t a) | 
`a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| poly128_t vreinterpretq_p128_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.1Q -> result` | `A32/A64` | +| poly64x2_t vreinterpretq_p64_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `A64` | +| poly128_t vreinterpretq_p128_f64(float64x2_t a) | `a -> Vd.1Q` | `NOP` | `Vd.2D -> result` | `A64` | +| float16x8_t vreinterpretq_f16_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int8x16_t vreinterpretq_s8_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| int16x8_t vreinterpretq_s16_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int32x4_t vreinterpretq_s32_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| float32x4_t vreinterpretq_f32_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| uint16x8_t vreinterpretq_u16_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| uint32x4_t vreinterpretq_u32_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| poly8x16_t vreinterpretq_p8_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| poly16x8_t vreinterpretq_p16_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| mfloat8x16_t vreinterpretq_mf8_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.16B -> result` | `A64` | +| uint64x2_t vreinterpretq_u64_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| int64x2_t vreinterpretq_s64_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| float64x2_t vreinterpretq_f64_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `A64` | +| poly64x2_t vreinterpretq_p64_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| poly128_t vreinterpretq_p128_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.1Q -> result` | 
`A32/A64` | +| float16x8_t vreinterpretq_f16_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int8x16_t vreinterpretq_s8_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| int16x8_t vreinterpretq_s16_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int32x4_t vreinterpretq_s32_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| float32x4_t vreinterpretq_f32_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| uint8x16_t vreinterpretq_u8_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| uint32x4_t vreinterpretq_u32_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| poly8x16_t vreinterpretq_p8_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| poly16x8_t vreinterpretq_p16_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| mfloat8x16_t vreinterpretq_mf8_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `A64` | +| uint64x2_t vreinterpretq_u64_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| int64x2_t vreinterpretq_s64_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| float64x2_t vreinterpretq_f64_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A64` | +| poly64x2_t vreinterpretq_p64_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| poly128_t vreinterpretq_p128_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.1Q -> result` | `A32/A64` | +| float16x8_t vreinterpretq_f16_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int8x16_t vreinterpretq_s8_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| int16x8_t vreinterpretq_s16_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int32x4_t 
vreinterpretq_s32_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| float32x4_t vreinterpretq_f32_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| uint8x16_t vreinterpretq_u8_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| uint16x8_t vreinterpretq_u16_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| poly8x16_t vreinterpretq_p8_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| poly16x8_t vreinterpretq_p16_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| mfloat8x16_t vreinterpretq_mf8_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.16B -> result` | `A64` | +| uint64x2_t vreinterpretq_u64_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| int64x2_t vreinterpretq_s64_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| float64x2_t vreinterpretq_f64_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `A64` | +| poly64x2_t vreinterpretq_p64_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| poly128_t vreinterpretq_p128_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.1Q -> result` | `A32/A64` | +| float16x8_t vreinterpretq_f16_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int8x16_t vreinterpretq_s8_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| int16x8_t vreinterpretq_s16_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int32x4_t vreinterpretq_s32_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| float32x4_t vreinterpretq_f32_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| uint8x16_t vreinterpretq_u8_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| uint16x8_t vreinterpretq_u16_p8(poly8x16_t a) | 
`a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| uint32x4_t vreinterpretq_u32_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| poly16x8_t vreinterpretq_p16_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| uint64x2_t vreinterpretq_u64_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| int64x2_t vreinterpretq_s64_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| float64x2_t vreinterpretq_f64_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `A64` | +| poly64x2_t vreinterpretq_p64_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| poly128_t vreinterpretq_p128_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.1Q -> result` | `A32/A64` | +| float16x8_t vreinterpretq_f16_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int8x16_t vreinterpretq_s8_mf8(mfloat8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.16B -> result` | `A64` | +| int16x8_t vreinterpretq_s16_mf8(mfloat8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `A64` | +| int32x4_t vreinterpretq_s32_mf8(mfloat8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `A64` | +| float32x4_t vreinterpretq_f32_mf8(mfloat8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `A64` | +| uint8x16_t vreinterpretq_u8_mf8(mfloat8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.16B -> result` | `A64` | +| uint16x8_t vreinterpretq_u16_mf8(mfloat8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `A64` | +| uint32x4_t vreinterpretq_u32_mf8(mfloat8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.4S -> result` | `A64` | +| poly16x8_t vreinterpretq_p16_mf8(mfloat8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `A64` | +| uint64x2_t vreinterpretq_u64_mf8(mfloat8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `A64` | +| int64x2_t vreinterpretq_s64_mf8(mfloat8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `A64` | +| float64x2_t 
vreinterpretq_f64_mf8(mfloat8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `A64` | +| poly64x2_t vreinterpretq_p64_mf8(mfloat8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.2D -> result` | `A64` | +| poly128_t vreinterpretq_p128_mf8(mfloat8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.1Q -> result` | `A64` | +| float16x8_t vreinterpretq_f16_mf8(mfloat8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `A64` | +| int8x16_t vreinterpretq_s8_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| int16x8_t vreinterpretq_s16_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int32x4_t vreinterpretq_s32_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| float32x4_t vreinterpretq_f32_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| uint8x16_t vreinterpretq_u8_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| uint16x8_t vreinterpretq_u16_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| uint32x4_t vreinterpretq_u32_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| poly8x16_t vreinterpretq_p8_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| mfloat8x16_t vreinterpretq_mf8_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `A64` | +| uint64x2_t vreinterpretq_u64_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| int64x2_t vreinterpretq_s64_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| float64x2_t vreinterpretq_f64_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A64` | +| poly64x2_t vreinterpretq_p64_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| poly128_t vreinterpretq_p128_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.1Q -> result` | `A32/A64` | +| float16x8_t vreinterpretq_f16_p16(poly16x8_t a) | `a -> Vd.8H` | 
`NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int8x16_t vreinterpretq_s8_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| int16x8_t vreinterpretq_s16_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int32x4_t vreinterpretq_s32_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| float32x4_t vreinterpretq_f32_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| uint8x16_t vreinterpretq_u8_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| uint16x8_t vreinterpretq_u16_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| uint32x4_t vreinterpretq_u32_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| poly8x16_t vreinterpretq_p8_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| poly16x8_t vreinterpretq_p16_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| mfloat8x16_t vreinterpretq_mf8_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `A64` | +| int64x2_t vreinterpretq_s64_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| float64x2_t vreinterpretq_f64_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| float64x2_t vreinterpretq_f64_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `A64` | +| poly64x2_t vreinterpretq_p64_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| poly128_t vreinterpretq_p128_s64(int64x2_t a) | `a -> Vd.1Q` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| poly64x2_t vreinterpretq_p64_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| poly128_t vreinterpretq_p128_u64(uint64x2_t a) | `a -> Vd.1Q` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| float16x8_t vreinterpretq_f16_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| 
int8x16_t vreinterpretq_s8_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| int16x8_t vreinterpretq_s16_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int32x4_t vreinterpretq_s32_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| float32x4_t vreinterpretq_f32_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| uint8x16_t vreinterpretq_u8_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| uint16x8_t vreinterpretq_u16_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| uint32x4_t vreinterpretq_u32_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| poly8x16_t vreinterpretq_p8_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| poly16x8_t vreinterpretq_p16_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| mfloat8x16_t vreinterpretq_mf8_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `A64` | +| uint64x2_t vreinterpretq_u64_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| uint64x2_t vreinterpretq_u64_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| float16x8_t vreinterpretq_f16_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int8x16_t vreinterpretq_s8_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| int16x8_t vreinterpretq_s16_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| int32x4_t vreinterpretq_s32_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| float32x4_t vreinterpretq_f32_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| uint8x16_t vreinterpretq_u8_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| uint16x8_t 
vreinterpretq_u16_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| uint32x4_t vreinterpretq_u32_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `v7/A32/A64` | +| poly8x16_t vreinterpretq_p8_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `v7/A32/A64` | +| poly16x8_t vreinterpretq_p16_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `v7/A32/A64` | +| mfloat8x16_t vreinterpretq_mf8_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `A64` | +| uint64x2_t vreinterpretq_u64_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| int64x2_t vreinterpretq_s64_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `v7/A32/A64` | +| float64x2_t vreinterpretq_f64_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A64` | +| poly64x2_t vreinterpretq_p64_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| poly128_t vreinterpretq_p128_f16(float16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.1Q -> result` | `A32/A64` | +| int8x8_t vreinterpret_s8_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `A64` | +| int16x4_t vreinterpret_s16_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A64` | +| int32x2_t vreinterpret_s32_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `A64` | +| uint8x8_t vreinterpret_u8_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `A64` | +| uint16x4_t vreinterpret_u16_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A64` | +| uint32x2_t vreinterpret_u32_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `A64` | +| poly8x8_t vreinterpret_p8_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `A64` | +| poly16x4_t vreinterpret_p16_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A64` | +| mfloat8x8_t vreinterpret_mf8_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `A64` | 
+| uint64x1_t vreinterpret_u64_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `A64` | +| int64x1_t vreinterpret_s64_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `A64` | +| float16x4_t vreinterpret_f16_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A64` | +| float32x2_t vreinterpret_f32_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `A64` | +| int8x16_t vreinterpretq_s8_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `A64` | +| int16x8_t vreinterpretq_s16_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A64` | +| int32x4_t vreinterpretq_s32_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `A64` | +| uint8x16_t vreinterpretq_u8_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `A64` | +| uint16x8_t vreinterpretq_u16_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A64` | +| uint32x4_t vreinterpretq_u32_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `A64` | +| poly8x16_t vreinterpretq_p8_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `A64` | +| poly16x8_t vreinterpretq_p16_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A64` | +| mfloat8x16_t vreinterpretq_mf8_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `A64` | +| uint64x2_t vreinterpretq_u64_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `A64` | +| int64x2_t vreinterpretq_s64_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `A64` | +| float16x8_t vreinterpretq_f16_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A64` | +| float32x4_t vreinterpretq_f32_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `A64` | +| int8x8_t vreinterpret_s8_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `A32/A64` | +| int16x4_t vreinterpret_s16_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| int32x2_t 
vreinterpret_s32_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `A32/A64` | +| uint8x8_t vreinterpret_u8_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `A32/A64` | +| uint16x4_t vreinterpret_u16_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| uint32x2_t vreinterpret_u32_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.2S -> result` | `A32/A64` | +| poly8x8_t vreinterpret_p8_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `A32/A64` | +| poly16x4_t vreinterpret_p16_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| mfloat8x8_t vreinterpret_mf8_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.8B -> result` | `A64` | +| int64x1_t vreinterpret_s64_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `A32/A64` | +| float64x1_t vreinterpret_f64_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.1D -> result` | `A64` | +| float16x4_t vreinterpret_f16_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| int8x16_t vreinterpretq_s8_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `A32/A64` | +| int16x8_t vreinterpretq_s16_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| int32x4_t vreinterpretq_s32_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `A32/A64` | +| uint8x16_t vreinterpretq_u8_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `A32/A64` | +| uint16x8_t vreinterpretq_u16_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| uint32x4_t vreinterpretq_u32_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.4S -> result` | `A32/A64` | +| poly8x16_t vreinterpretq_p8_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `A32/A64` | +| poly16x8_t vreinterpretq_p16_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| mfloat8x16_t vreinterpretq_mf8_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.16B -> result` | `A64` | 
+| int64x2_t vreinterpretq_s64_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| float64x2_t vreinterpretq_f64_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.2D -> result` | `A64` | +| float16x8_t vreinterpretq_f16_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| int8x16_t vreinterpretq_s8_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.16B -> result` | `A32/A64` | +| int16x8_t vreinterpretq_s16_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| int32x4_t vreinterpretq_s32_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.4S -> result` | `A32/A64` | +| uint8x16_t vreinterpretq_u8_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.16B -> result` | `A32/A64` | +| uint16x8_t vreinterpretq_u16_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| uint32x4_t vreinterpretq_u32_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.4S -> result` | `A32/A64` | +| poly8x16_t vreinterpretq_p8_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.16B -> result` | `A32/A64` | +| poly16x8_t vreinterpretq_p16_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| mfloat8x16_t vreinterpretq_mf8_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.16B -> result` | `A64` | +| uint64x2_t vreinterpretq_u64_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| int64x2_t vreinterpretq_s64_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| float64x2_t vreinterpretq_f64_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.2D -> result` | `A64` | +| float16x8_t vreinterpretq_f16_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| mfloat8x8_t vreinterpret_mf8_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vreinterpretq_mf8_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.16B -> result` | `A64` | +| uint8x8_t vreinterpret_u8_mf8(mfloat8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.8B -> result` 
| `A64` | +| uint8x16_t vreinterpretq_u8_mf8(mfloat8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.16B -> result` | `A64` | ### Move @@ -3045,60 +3103,64 @@ The intrinsics in this section are guarded by the macro ``__ARM_NEON``. #### Copy vector lane -| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | -|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------|-------------------------------|--------------------|---------------------------| -| int8x8_t vcopy_lane_s8(
     int8x8_t a,
     const int lane1,
     int8x8_t b,
     const int lane2)
| `a -> Vd.8B`
`0 <= lane1 <= 7`
`b -> Vn.8B`
`0 <= lane2 <= 7` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.8B -> result` | `A64` | -| int8x16_t vcopyq_lane_s8(
     int8x16_t a,
     const int lane1,
     int8x8_t b,
     const int lane2)
| `a -> Vd.16B`
`0 <= lane1 <= 15`
`b -> Vn.8B`
`0 <= lane2 <= 7` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.16B -> result` | `A64` | -| int16x4_t vcopy_lane_s16(
     int16x4_t a,
     const int lane1,
     int16x4_t b,
     const int lane2)
| `a -> Vd.4H`
`0 <= lane1 <= 3`
`b -> Vn.4H`
`0 <= lane2 <= 3` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.4H -> result` | `A64` | -| int16x8_t vcopyq_lane_s16(
     int16x8_t a,
     const int lane1,
     int16x4_t b,
     const int lane2)
| `a -> Vd.8H`
`0 <= lane1 <= 7`
`b -> Vn.4H`
`0 <= lane2 <= 3` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.8H -> result` | `A64` | -| int32x2_t vcopy_lane_s32(
     int32x2_t a,
     const int lane1,
     int32x2_t b,
     const int lane2)
| `a -> Vd.2S`
`0 <= lane1 <= 1`
`b -> Vn.2S`
`0 <= lane2 <= 1` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.2S -> result` | `A64` | -| int32x4_t vcopyq_lane_s32(
     int32x4_t a,
     const int lane1,
     int32x2_t b,
     const int lane2)
| `a -> Vd.4S`
`0 <= lane1 <= 3`
`b -> Vn.2S`
`0 <= lane2 <= 1` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.4S -> result` | `A64` | -| int64x1_t vcopy_lane_s64(
     int64x1_t a,
     const int lane1,
     int64x1_t b,
     const int lane2)
| `a -> UNUSED`
`0 <= lane1 <= 0`
`b -> Vn.1D`
`0 <= lane2 <= 0` | `DUP Dd,Vn.D[lane2]` | `Dd -> result` | `A64` | -| int64x2_t vcopyq_lane_s64(
     int64x2_t a,
     const int lane1,
     int64x1_t b,
     const int lane2)
| `a -> Vd.2D`
`0 <= lane1 <= 1`
`b -> Vn.1D`
`0 <= lane2 <= 0` | `INS Vd.D[lane1],Vn.D[lane2]` | `Vd.2D -> result` | `A64` | -| uint8x8_t vcopy_lane_u8(
     uint8x8_t a,
     const int lane1,
     uint8x8_t b,
     const int lane2)
| `a -> Vd.8B`
`0 <= lane1 <= 7`
`b -> Vn.8B`
`0 <= lane2 <= 7` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.8B -> result` | `A64` | -| uint8x16_t vcopyq_lane_u8(
     uint8x16_t a,
     const int lane1,
     uint8x8_t b,
     const int lane2)
| `a -> Vd.16B`
`0 <= lane1 <= 15`
`b -> Vn.8B`
`0 <= lane2 <= 7` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.16B -> result` | `A64` | -| uint16x4_t vcopy_lane_u16(
     uint16x4_t a,
     const int lane1,
     uint16x4_t b,
     const int lane2)
| `a -> Vd.4H`
`0 <= lane1 <= 3`
`b -> Vn.4H`
`0 <= lane2 <= 3` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.4H -> result` | `A64` | -| uint16x8_t vcopyq_lane_u16(
     uint16x8_t a,
     const int lane1,
     uint16x4_t b,
     const int lane2)
| `a -> Vd.8H`
`0 <= lane1 <= 7`
`b -> Vn.4H`
`0 <= lane2 <= 3` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.8H -> result` | `A64` | -| uint32x2_t vcopy_lane_u32(
     uint32x2_t a,
     const int lane1,
     uint32x2_t b,
     const int lane2)
| `a -> Vd.2S`
`0 <= lane1 <= 1`
`b -> Vn.2S`
`0 <= lane2 <= 1` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.2S -> result` | `A64` | -| uint32x4_t vcopyq_lane_u32(
     uint32x4_t a,
     const int lane1,
     uint32x2_t b,
     const int lane2)
| `a -> Vd.4S`
`0 <= lane1 <= 3`
`b -> Vn.2S`
`0 <= lane2 <= 1` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.4S -> result` | `A64` | -| uint64x1_t vcopy_lane_u64(
     uint64x1_t a,
     const int lane1,
     uint64x1_t b,
     const int lane2)
| `a -> UNUSED`
`0 <= lane1 <= 0`
`b -> Vn.1D`
`0 <= lane2 <= 0` | `DUP Dd,Vn.D[lane2]` | `Dd -> result` | `A64` | -| uint64x2_t vcopyq_lane_u64(
     uint64x2_t a,
     const int lane1,
     uint64x1_t b,
     const int lane2)
| `a -> Vd.2D`
`0 <= lane1 <= 1`
`b -> Vn.1D`
`0 <= lane2 <= 0` | `INS Vd.D[lane1],Vn.D[lane2]` | `Vd.2D -> result` | `A64` | -| poly64x1_t vcopy_lane_p64(
     poly64x1_t a,
     const int lane1,
     poly64x1_t b,
     const int lane2)
| `a -> UNUSED`
`0 <= lane1 <= 0`
`b -> Vn.1D`
`0 <= lane2 <= 0` | `DUP Dd,Vn.D[lane2]` | `Dd -> result` | `A32/A64` | -| poly64x2_t vcopyq_lane_p64(
     poly64x2_t a,
     const int lane1,
     poly64x1_t b,
     const int lane2)
| `a -> Vd.2D`
`0 <= lane1 <= 1`
`b -> Vn.1D`
`0 <= lane2 <= 0` | `INS Vd.D[lane1],Vn.D[lane2]` | `Vd.2D -> result` | `A32/A64` | -| float32x2_t vcopy_lane_f32(
     float32x2_t a,
     const int lane1,
     float32x2_t b,
     const int lane2)
| `a -> Vd.2S`
`0 <= lane1 <= 1`
`b -> Vn.2S`
`0 <= lane2 <= 1` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.2S -> result` | `A64` | -| float32x4_t vcopyq_lane_f32(
     float32x4_t a,
     const int lane1,
     float32x2_t b,
     const int lane2)
| `a -> Vd.4S`
`0 <= lane1 <= 3`
`b -> Vn.2S`
`0 <= lane2 <= 1` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.4S -> result` | `A64` | -| float64x1_t vcopy_lane_f64(
     float64x1_t a,
     const int lane1,
     float64x1_t b,
     const int lane2)
| `a -> UNUSED`
`0 <= lane1 <= 0`
`b -> Vn.1D`
`0 <= lane2 <= 0` | `DUP Dd,Vn.D[lane2]` | `Dd -> result` | `A64` | -| float64x2_t vcopyq_lane_f64(
     float64x2_t a,
     const int lane1,
     float64x1_t b,
     const int lane2)
| `a -> Vd.2D`
`0 <= lane1 <= 1`
`b -> Vn.1D`
`0 <= lane2 <= 0` | `INS Vd.D[lane1],Vn.D[lane2]` | `Vd.2D -> result` | `A64` | -| poly8x8_t vcopy_lane_p8(
     poly8x8_t a,
     const int lane1,
     poly8x8_t b,
     const int lane2)
| `a -> Vd.8B`
`0 <= lane1 <= 7`
`b -> Vn.8B`
`0 <= lane2 <= 7` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.8B -> result` | `A64` | -| poly8x16_t vcopyq_lane_p8(
     poly8x16_t a,
     const int lane1,
     poly8x8_t b,
     const int lane2)
| `a -> Vd.16B`
`0 <= lane1 <= 15`
`b -> Vn.8B`
`0 <= lane2 <= 7` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.16B -> result` | `A64` | -| poly16x4_t vcopy_lane_p16(
     poly16x4_t a,
     const int lane1,
     poly16x4_t b,
     const int lane2)
| `a -> Vd.4H`
`0 <= lane1 <= 3`
`b -> Vn.4H`
`0 <= lane2 <= 3` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.4H -> result` | `A64` | -| poly16x8_t vcopyq_lane_p16(
     poly16x8_t a,
     const int lane1,
     poly16x4_t b,
     const int lane2)
| `a -> Vd.8H`
`0 <= lane1 <= 7`
`b -> Vn.4H`
`0 <= lane2 <= 3` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.8H -> result` | `A64` | -| int8x8_t vcopy_laneq_s8(
     int8x8_t a,
     const int lane1,
     int8x16_t b,
     const int lane2)
| `a -> Vd.8B`
`0 <= lane1 <= 7`
`b -> Vn.16B`
`0 <= lane2 <= 15` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.8B -> result` | `A64` | -| int8x16_t vcopyq_laneq_s8(
     int8x16_t a,
     const int lane1,
     int8x16_t b,
     const int lane2)
| `a -> Vd.16B`
`0 <= lane1 <= 15`
`b -> Vn.16B`
`0 <= lane2 <= 15` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.16B -> result` | `A64` | -| int16x4_t vcopy_laneq_s16(
     int16x4_t a,
     const int lane1,
     int16x8_t b,
     const int lane2)
| `a -> Vd.4H`
`0 <= lane1 <= 3`
`b -> Vn.8H`
`0 <= lane2 <= 7` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.4H -> result` | `A64` | -| int16x8_t vcopyq_laneq_s16(
     int16x8_t a,
     const int lane1,
     int16x8_t b,
     const int lane2)
| `a -> Vd.8H`
`0 <= lane1 <= 7`
`b -> Vn.8H`
`0 <= lane2 <= 7` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.8H -> result` | `A64` | -| int32x2_t vcopy_laneq_s32(
     int32x2_t a,
     const int lane1,
     int32x4_t b,
     const int lane2)
| `a -> Vd.2S`
`0 <= lane1 <= 1`
`b -> Vn.4S`
`0 <= lane2 <= 3` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.2S -> result` | `A64` | -| int32x4_t vcopyq_laneq_s32(
     int32x4_t a,
     const int lane1,
     int32x4_t b,
     const int lane2)
| `a -> Vd.4S`
`0 <= lane1 <= 3`
`b -> Vn.4S`
`0 <= lane2 <= 3` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.4S -> result` | `A64` | -| int64x1_t vcopy_laneq_s64(
     int64x1_t a,
     const int lane1,
     int64x2_t b,
     const int lane2)
| `a -> UNUSED`
`0 <= lane1 <= 0`
`b -> Vn.2D`
`0 <= lane2 <= 1` | `DUP Dd,Vn.D[lane2]` | `Dd -> result` | `A64` | -| int64x2_t vcopyq_laneq_s64(
     int64x2_t a,
     const int lane1,
     int64x2_t b,
     const int lane2)
| `a -> Vd.2D`
`0 <= lane1 <= 1`
`b -> Vn.2D`
`0 <= lane2 <= 1` | `INS Vd.D[lane1],Vn.D[lane2]` | `Vd.2D -> result` | `A64` | -| uint8x8_t vcopy_laneq_u8(
     uint8x8_t a,
     const int lane1,
     uint8x16_t b,
     const int lane2)
| `a -> Vd.8B`
`0 <= lane1 <= 7`
`b -> Vn.16B`
`0 <= lane2 <= 15` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.8B -> result` | `A64` | -| uint8x16_t vcopyq_laneq_u8(
     uint8x16_t a,
     const int lane1,
     uint8x16_t b,
     const int lane2)
| `a -> Vd.16B`
`0 <= lane1 <= 15`
`b -> Vn.16B`
`0 <= lane2 <= 15` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.16B -> result` | `A64` | -| uint16x4_t vcopy_laneq_u16(
     uint16x4_t a,
     const int lane1,
     uint16x8_t b,
     const int lane2)
| `a -> Vd.4H`
`0 <= lane1 <= 3`
`b -> Vn.8H`
`0 <= lane2 <= 7` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.4H -> result` | `A64` | -| uint16x8_t vcopyq_laneq_u16(
     uint16x8_t a,
     const int lane1,
     uint16x8_t b,
     const int lane2)
| `a -> Vd.8H`
`0 <= lane1 <= 7`
`b -> Vn.8H`
`0 <= lane2 <= 7` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.8H -> result` | `A64` | -| uint32x2_t vcopy_laneq_u32(
     uint32x2_t a,
     const int lane1,
     uint32x4_t b,
     const int lane2)
| `a -> Vd.2S`
`0 <= lane1 <= 1`
`b -> Vn.4S`
`0 <= lane2 <= 3` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.2S -> result` | `A64` | -| uint32x4_t vcopyq_laneq_u32(
     uint32x4_t a,
     const int lane1,
     uint32x4_t b,
     const int lane2)
| `a -> Vd.4S`
`0 <= lane1 <= 3`
`b -> Vn.4S`
`0 <= lane2 <= 3` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.4S -> result` | `A64` | -| uint64x1_t vcopy_laneq_u64(
     uint64x1_t a,
     const int lane1,
     uint64x2_t b,
     const int lane2)
| `a -> UNUSED`
`0 <= lane1 <= 0`
`b -> Vn.2D`
`0 <= lane2 <= 1` | `DUP Dd,Vn.D[lane2]` | `Dd -> result` | `A64` | -| uint64x2_t vcopyq_laneq_u64(
     uint64x2_t a,
     const int lane1,
     uint64x2_t b,
     const int lane2)
| `a -> Vd.2D`
`0 <= lane1 <= 1`
`b -> Vn.2D`
`0 <= lane2 <= 1` | `INS Vd.D[lane1],Vn.D[lane2]` | `Vd.2D -> result` | `A64` | -| poly64x1_t vcopy_laneq_p64(
     poly64x1_t a,
     const int lane1,
     poly64x2_t b,
     const int lane2)
| `a -> UNUSED`
`0 <= lane1 <= 0`
`b -> Vn.2D`
`0 <= lane2 <= 1` | `DUP Dd,Vn.D[lane2]` | `Dd -> result` | `A32/A64` | -| poly64x2_t vcopyq_laneq_p64(
     poly64x2_t a,
     const int lane1,
     poly64x2_t b,
     const int lane2)
| `a -> Vd.2D`
`0 <= lane1 <= 1`
`b -> Vn.2D`
`0 <= lane2 <= 1` | `INS Vd.D[lane1],Vn.D[lane2]` | `Vd.2D -> result` | `A32/A64` | -| float32x2_t vcopy_laneq_f32(
     float32x2_t a,
     const int lane1,
     float32x4_t b,
     const int lane2)
| `a -> Vd.2S`
`0 <= lane1 <= 1`
`b -> Vn.4S`
`0 <= lane2 <= 3` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.2S -> result` | `A64` | -| float32x4_t vcopyq_laneq_f32(
     float32x4_t a,
     const int lane1,
     float32x4_t b,
     const int lane2)
| `a -> Vd.4S`
`0 <= lane1 <= 3`
`b -> Vn.4S`
`0 <= lane2 <= 3` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.4S -> result` | `A64` | -| float64x1_t vcopy_laneq_f64(
     float64x1_t a,
     const int lane1,
     float64x2_t b,
     const int lane2)
| `a -> UNUSED`
`0 <= lane1 <= 0`
`b -> Vn.2D`
`0 <= lane2 <= 1` | `DUP Dd,Vn.D[lane2]` | `Dd -> result` | `A64` | -| float64x2_t vcopyq_laneq_f64(
     float64x2_t a,
     const int lane1,
     float64x2_t b,
     const int lane2)
| `a -> Vd.2D`
`0 <= lane1 <= 1`
`b -> Vn.2D`
`0 <= lane2 <= 1` | `INS Vd.D[lane1],Vn.D[lane2]` | `Vd.2D -> result` | `A64` | -| poly8x8_t vcopy_laneq_p8(
     poly8x8_t a,
     const int lane1,
     poly8x16_t b,
     const int lane2)
| `a -> Vd.8B`
`0 <= lane1 <= 7`
`b -> Vn.16B`
`0 <= lane2 <= 15` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.8B -> result` | `A64` | -| poly8x16_t vcopyq_laneq_p8(
     poly8x16_t a,
     const int lane1,
     poly8x16_t b,
     const int lane2)
| `a -> Vd.16B`
`0 <= lane1 <= 15`
`b -> Vn.16B`
`0 <= lane2 <= 15` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.16B -> result` | `A64` | -| poly16x4_t vcopy_laneq_p16(
     poly16x4_t a,
     const int lane1,
     poly16x8_t b,
     const int lane2)
| `a -> Vd.4H`
`0 <= lane1 <= 3`
`b -> Vn.8H`
`0 <= lane2 <= 7` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.4H -> result` | `A64` | -| poly16x8_t vcopyq_laneq_p16(
     poly16x8_t a,
     const int lane1,
     poly16x8_t b,
     const int lane2)
| `a -> Vd.8H`
`0 <= lane1 <= 7`
`b -> Vn.8H`
`0 <= lane2 <= 7` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.8H -> result` | `A64` | +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------|-------------------------------|--------------------|---------------------------| +| int8x8_t vcopy_lane_s8(
     int8x8_t a,
     const int lane1,
     int8x8_t b,
     const int lane2)
| `a -> Vd.8B`
`0 <= lane1 <= 7`
`b -> Vn.8B`
`0 <= lane2 <= 7` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.8B -> result` | `A64` | +| int8x16_t vcopyq_lane_s8(
     int8x16_t a,
     const int lane1,
     int8x8_t b,
     const int lane2)
| `a -> Vd.16B`
`0 <= lane1 <= 15`
`b -> Vn.8B`
`0 <= lane2 <= 7` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.16B -> result` | `A64` | +| int16x4_t vcopy_lane_s16(
     int16x4_t a,
     const int lane1,
     int16x4_t b,
     const int lane2)
| `a -> Vd.4H`
`0 <= lane1 <= 3`
`b -> Vn.4H`
`0 <= lane2 <= 3` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.4H -> result` | `A64` | +| int16x8_t vcopyq_lane_s16(
     int16x8_t a,
     const int lane1,
     int16x4_t b,
     const int lane2)
| `a -> Vd.8H`
`0 <= lane1 <= 7`
`b -> Vn.4H`
`0 <= lane2 <= 3` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.8H -> result` | `A64` | +| int32x2_t vcopy_lane_s32(
     int32x2_t a,
     const int lane1,
     int32x2_t b,
     const int lane2)
| `a -> Vd.2S`
`0 <= lane1 <= 1`
`b -> Vn.2S`
`0 <= lane2 <= 1` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.2S -> result` | `A64` | +| int32x4_t vcopyq_lane_s32(
     int32x4_t a,
     const int lane1,
     int32x2_t b,
     const int lane2)
| `a -> Vd.4S`
`0 <= lane1 <= 3`
`b -> Vn.2S`
`0 <= lane2 <= 1` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.4S -> result` | `A64` | +| int64x1_t vcopy_lane_s64(
     int64x1_t a,
     const int lane1,
     int64x1_t b,
     const int lane2)
| `a -> UNUSED`
`0 <= lane1 <= 0`
`b -> Vn.1D`
`0 <= lane2 <= 0` | `DUP Dd,Vn.D[lane2]` | `Dd -> result` | `A64` | +| int64x2_t vcopyq_lane_s64(
     int64x2_t a,
     const int lane1,
     int64x1_t b,
     const int lane2)
| `a -> Vd.2D`
`0 <= lane1 <= 1`
`b -> Vn.1D`
`0 <= lane2 <= 0` | `INS Vd.D[lane1],Vn.D[lane2]` | `Vd.2D -> result` | `A64` | +| uint8x8_t vcopy_lane_u8(
     uint8x8_t a,
     const int lane1,
     uint8x8_t b,
     const int lane2)
| `a -> Vd.8B`
`0 <= lane1 <= 7`
`b -> Vn.8B`
`0 <= lane2 <= 7` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.8B -> result` | `A64` | +| uint8x16_t vcopyq_lane_u8(
     uint8x16_t a,
     const int lane1,
     uint8x8_t b,
     const int lane2)
| `a -> Vd.16B`
`0 <= lane1 <= 15`
`b -> Vn.8B`
`0 <= lane2 <= 7` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.16B -> result` | `A64` | +| uint16x4_t vcopy_lane_u16(
     uint16x4_t a,
     const int lane1,
     uint16x4_t b,
     const int lane2)
| `a -> Vd.4H`
`0 <= lane1 <= 3`
`b -> Vn.4H`
`0 <= lane2 <= 3` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.4H -> result` | `A64` | +| uint16x8_t vcopyq_lane_u16(
     uint16x8_t a,
     const int lane1,
     uint16x4_t b,
     const int lane2)
| `a -> Vd.8H`
`0 <= lane1 <= 7`
`b -> Vn.4H`
`0 <= lane2 <= 3` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.8H -> result` | `A64` | +| uint32x2_t vcopy_lane_u32(
     uint32x2_t a,
     const int lane1,
     uint32x2_t b,
     const int lane2)
| `a -> Vd.2S`
`0 <= lane1 <= 1`
`b -> Vn.2S`
`0 <= lane2 <= 1` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.2S -> result` | `A64` | +| uint32x4_t vcopyq_lane_u32(
     uint32x4_t a,
     const int lane1,
     uint32x2_t b,
     const int lane2)
| `a -> Vd.4S`
`0 <= lane1 <= 3`
`b -> Vn.2S`
`0 <= lane2 <= 1` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.4S -> result` | `A64` | +| uint64x1_t vcopy_lane_u64(
     uint64x1_t a,
     const int lane1,
     uint64x1_t b,
     const int lane2)
| `a -> UNUSED`
`0 <= lane1 <= 0`
`b -> Vn.1D`
`0 <= lane2 <= 0` | `DUP Dd,Vn.D[lane2]` | `Dd -> result` | `A64` | +| uint64x2_t vcopyq_lane_u64(
     uint64x2_t a,
     const int lane1,
     uint64x1_t b,
     const int lane2)
| `a -> Vd.2D`
`0 <= lane1 <= 1`
`b -> Vn.1D`
`0 <= lane2 <= 0` | `INS Vd.D[lane1],Vn.D[lane2]` | `Vd.2D -> result` | `A64` | +| poly64x1_t vcopy_lane_p64(
     poly64x1_t a,
     const int lane1,
     poly64x1_t b,
     const int lane2)
| `a -> UNUSED`
`0 <= lane1 <= 0`
`b -> Vn.1D`
`0 <= lane2 <= 0` | `DUP Dd,Vn.D[lane2]` | `Dd -> result` | `A32/A64` | +| poly64x2_t vcopyq_lane_p64(
     poly64x2_t a,
     const int lane1,
     poly64x1_t b,
     const int lane2)
| `a -> Vd.2D`
`0 <= lane1 <= 1`
`b -> Vn.1D`
`0 <= lane2 <= 0` | `INS Vd.D[lane1],Vn.D[lane2]` | `Vd.2D -> result` | `A32/A64` | +| float32x2_t vcopy_lane_f32(
     float32x2_t a,
     const int lane1,
     float32x2_t b,
     const int lane2)
| `a -> Vd.2S`
`0 <= lane1 <= 1`
`b -> Vn.2S`
`0 <= lane2 <= 1` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.2S -> result` | `A64` | +| float32x4_t vcopyq_lane_f32(
     float32x4_t a,
     const int lane1,
     float32x2_t b,
     const int lane2)
| `a -> Vd.4S`
`0 <= lane1 <= 3`
`b -> Vn.2S`
`0 <= lane2 <= 1` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.4S -> result` | `A64` | +| float64x1_t vcopy_lane_f64(
     float64x1_t a,
     const int lane1,
     float64x1_t b,
     const int lane2)
| `a -> UNUSED`
`0 <= lane1 <= 0`
`b -> Vn.1D`
`0 <= lane2 <= 0` | `DUP Dd,Vn.D[lane2]` | `Dd -> result` | `A64` | +| float64x2_t vcopyq_lane_f64(
     float64x2_t a,
     const int lane1,
     float64x1_t b,
     const int lane2)
| `a -> Vd.2D`
`0 <= lane1 <= 1`
`b -> Vn.1D`
`0 <= lane2 <= 0` | `INS Vd.D[lane1],Vn.D[lane2]` | `Vd.2D -> result` | `A64` | +| poly8x8_t vcopy_lane_p8(
     poly8x8_t a,
     const int lane1,
     poly8x8_t b,
     const int lane2)
| `a -> Vd.8B`
`0 <= lane1 <= 7`
`b -> Vn.8B`
`0 <= lane2 <= 7` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.8B -> result` | `A64` | +| poly8x16_t vcopyq_lane_p8(
     poly8x16_t a,
     const int lane1,
     poly8x8_t b,
     const int lane2)
| `a -> Vd.16B`
`0 <= lane1 <= 15`
`b -> Vn.8B`
`0 <= lane2 <= 7` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.16B -> result` | `A64` | +| poly16x4_t vcopy_lane_p16(
     poly16x4_t a,
     const int lane1,
     poly16x4_t b,
     const int lane2)
| `a -> Vd.4H`
`0 <= lane1 <= 3`
`b -> Vn.4H`
`0 <= lane2 <= 3` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.4H -> result` | `A64` | +| poly16x8_t vcopyq_lane_p16(
     poly16x8_t a,
     const int lane1,
     poly16x4_t b,
     const int lane2)
| `a -> Vd.8H`
`0 <= lane1 <= 7`
`b -> Vn.4H`
`0 <= lane2 <= 3` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.8H -> result` | `A64` | +| mfloat8x8_t vcopy_lane_mf8(
     mfloat8x8_t a,
     const int lane1,
     mfloat8x8_t b,
     const int lane2)
| `a -> Vd.8B`
`0 <= lane1 <= 7`
`b -> Vn.8B`
`0 <= lane2 <= 7` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vcopyq_lane_mf8(
     mfloat8x16_t a,
     const int lane1,
     mfloat8x8_t b,
     const int lane2)
| `a -> Vd.16B`
`0 <= lane1 <= 15`
`b -> Vn.8B`
`0 <= lane2 <= 7` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.16B -> result` | `A64` | +| int8x8_t vcopy_laneq_s8(
     int8x8_t a,
     const int lane1,
     int8x16_t b,
     const int lane2)
| `a -> Vd.8B`
`0 <= lane1 <= 7`
`b -> Vn.16B`
`0 <= lane2 <= 15` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.8B -> result` | `A64` | +| int8x16_t vcopyq_laneq_s8(
     int8x16_t a,
     const int lane1,
     int8x16_t b,
     const int lane2)
| `a -> Vd.16B`
`0 <= lane1 <= 15`
`b -> Vn.16B`
`0 <= lane2 <= 15` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.16B -> result` | `A64` | +| int16x4_t vcopy_laneq_s16(
     int16x4_t a,
     const int lane1,
     int16x8_t b,
     const int lane2)
| `a -> Vd.4H`
`0 <= lane1 <= 3`
`b -> Vn.8H`
`0 <= lane2 <= 7` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.4H -> result` | `A64` | +| int16x8_t vcopyq_laneq_s16(
     int16x8_t a,
     const int lane1,
     int16x8_t b,
     const int lane2)
| `a -> Vd.8H`
`0 <= lane1 <= 7`
`b -> Vn.8H`
`0 <= lane2 <= 7` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.8H -> result` | `A64` | +| int32x2_t vcopy_laneq_s32(
     int32x2_t a,
     const int lane1,
     int32x4_t b,
     const int lane2)
| `a -> Vd.2S`
`0 <= lane1 <= 1`
`b -> Vn.4S`
`0 <= lane2 <= 3` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.2S -> result` | `A64` | +| int32x4_t vcopyq_laneq_s32(
     int32x4_t a,
     const int lane1,
     int32x4_t b,
     const int lane2)
| `a -> Vd.4S`
`0 <= lane1 <= 3`
`b -> Vn.4S`
`0 <= lane2 <= 3` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.4S -> result` | `A64` | +| int64x1_t vcopy_laneq_s64(
     int64x1_t a,
     const int lane1,
     int64x2_t b,
     const int lane2)
| `a -> UNUSED`
`0 <= lane1 <= 0`
`b -> Vn.2D`
`0 <= lane2 <= 1` | `DUP Dd,Vn.D[lane2]` | `Dd -> result` | `A64` | +| int64x2_t vcopyq_laneq_s64(
     int64x2_t a,
     const int lane1,
     int64x2_t b,
     const int lane2)
| `a -> Vd.2D`
`0 <= lane1 <= 1`
`b -> Vn.2D`
`0 <= lane2 <= 1` | `INS Vd.D[lane1],Vn.D[lane2]` | `Vd.2D -> result` | `A64` | +| uint8x8_t vcopy_laneq_u8(
     uint8x8_t a,
     const int lane1,
     uint8x16_t b,
     const int lane2)
| `a -> Vd.8B`
`0 <= lane1 <= 7`
`b -> Vn.16B`
`0 <= lane2 <= 15` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.8B -> result` | `A64` | +| uint8x16_t vcopyq_laneq_u8(
     uint8x16_t a,
     const int lane1,
     uint8x16_t b,
     const int lane2)
| `a -> Vd.16B`
`0 <= lane1 <= 15`
`b -> Vn.16B`
`0 <= lane2 <= 15` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.16B -> result` | `A64` | +| uint16x4_t vcopy_laneq_u16(
     uint16x4_t a,
     const int lane1,
     uint16x8_t b,
     const int lane2)
| `a -> Vd.4H`
`0 <= lane1 <= 3`
`b -> Vn.8H`
`0 <= lane2 <= 7` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.4H -> result` | `A64` | +| uint16x8_t vcopyq_laneq_u16(
     uint16x8_t a,
     const int lane1,
     uint16x8_t b,
     const int lane2)
| `a -> Vd.8H`
`0 <= lane1 <= 7`
`b -> Vn.8H`
`0 <= lane2 <= 7` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.8H -> result` | `A64` | +| uint32x2_t vcopy_laneq_u32(
     uint32x2_t a,
     const int lane1,
     uint32x4_t b,
     const int lane2)
| `a -> Vd.2S`
`0 <= lane1 <= 1`
`b -> Vn.4S`
`0 <= lane2 <= 3` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.2S -> result` | `A64` | +| uint32x4_t vcopyq_laneq_u32(
     uint32x4_t a,
     const int lane1,
     uint32x4_t b,
     const int lane2)
| `a -> Vd.4S`
`0 <= lane1 <= 3`
`b -> Vn.4S`
`0 <= lane2 <= 3` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.4S -> result` | `A64` | +| uint64x1_t vcopy_laneq_u64(
     uint64x1_t a,
     const int lane1,
     uint64x2_t b,
     const int lane2)
| `a -> UNUSED`
`0 <= lane1 <= 0`
`b -> Vn.2D`
`0 <= lane2 <= 1` | `DUP Dd,Vn.D[lane2]` | `Dd -> result` | `A64` | +| uint64x2_t vcopyq_laneq_u64(
     uint64x2_t a,
     const int lane1,
     uint64x2_t b,
     const int lane2)
| `a -> Vd.2D`
`0 <= lane1 <= 1`
`b -> Vn.2D`
`0 <= lane2 <= 1` | `INS Vd.D[lane1],Vn.D[lane2]` | `Vd.2D -> result` | `A64` | +| poly64x1_t vcopy_laneq_p64(
     poly64x1_t a,
     const int lane1,
     poly64x2_t b,
     const int lane2)
| `a -> UNUSED`
`0 <= lane1 <= 0`
`b -> Vn.2D`
`0 <= lane2 <= 1` | `DUP Dd,Vn.D[lane2]` | `Dd -> result` | `A32/A64` | +| poly64x2_t vcopyq_laneq_p64(
     poly64x2_t a,
     const int lane1,
     poly64x2_t b,
     const int lane2)
| `a -> Vd.2D`
`0 <= lane1 <= 1`
`b -> Vn.2D`
`0 <= lane2 <= 1` | `INS Vd.D[lane1],Vn.D[lane2]` | `Vd.2D -> result` | `A32/A64` | +| float32x2_t vcopy_laneq_f32(
     float32x2_t a,
     const int lane1,
     float32x4_t b,
     const int lane2)
| `a -> Vd.2S`
`0 <= lane1 <= 1`
`b -> Vn.4S`
`0 <= lane2 <= 3` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.2S -> result` | `A64` | +| float32x4_t vcopyq_laneq_f32(
     float32x4_t a,
     const int lane1,
     float32x4_t b,
     const int lane2)
| `a -> Vd.4S`
`0 <= lane1 <= 3`
`b -> Vn.4S`
`0 <= lane2 <= 3` | `INS Vd.S[lane1],Vn.S[lane2]` | `Vd.4S -> result` | `A64` | +| float64x1_t vcopy_laneq_f64(
     float64x1_t a,
     const int lane1,
     float64x2_t b,
     const int lane2)
| `a -> UNUSED`
`0 <= lane1 <= 0`
`b -> Vn.2D`
`0 <= lane2 <= 1` | `DUP Dd,Vn.D[lane2]` | `Dd -> result` | `A64` | +| float64x2_t vcopyq_laneq_f64(
     float64x2_t a,
     const int lane1,
     float64x2_t b,
     const int lane2)
| `a -> Vd.2D`
`0 <= lane1 <= 1`
`b -> Vn.2D`
`0 <= lane2 <= 1` | `INS Vd.D[lane1],Vn.D[lane2]` | `Vd.2D -> result` | `A64` | +| poly8x8_t vcopy_laneq_p8(
     poly8x8_t a,
     const int lane1,
     poly8x16_t b,
     const int lane2)
| `a -> Vd.8B`
`0 <= lane1 <= 7`
`b -> Vn.16B`
`0 <= lane2 <= 15` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.8B -> result` | `A64` | +| poly8x16_t vcopyq_laneq_p8(
     poly8x16_t a,
     const int lane1,
     poly8x16_t b,
     const int lane2)
| `a -> Vd.16B`
`0 <= lane1 <= 15`
`b -> Vn.16B`
`0 <= lane2 <= 15` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.16B -> result` | `A64` | +| poly16x4_t vcopy_laneq_p16(
     poly16x4_t a,
     const int lane1,
     poly16x8_t b,
     const int lane2)
| `a -> Vd.4H`
`0 <= lane1 <= 3`
`b -> Vn.8H`
`0 <= lane2 <= 7` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.4H -> result` | `A64` | +| poly16x8_t vcopyq_laneq_p16(
     poly16x8_t a,
     const int lane1,
     poly16x8_t b,
     const int lane2)
| `a -> Vd.8H`
`0 <= lane1 <= 7`
`b -> Vn.8H`
`0 <= lane2 <= 7` | `INS Vd.H[lane1],Vn.H[lane2]` | `Vd.8H -> result` | `A64` | +| mfloat8x8_t vcopy_laneq_mf8(
     mfloat8x8_t a,
     const int lane1,
     mfloat8x16_t b,
     const int lane2)
| `a -> Vd.8B`
`0 <= lane1 <= 7`
`b -> Vn.16B`
`0 <= lane2 <= 15` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vcopyq_laneq_mf8(
     mfloat8x16_t a,
     const int lane1,
     mfloat8x16_t b,
     const int lane2)
| `a -> Vd.16B`
`0 <= lane1 <= 15`
`b -> Vn.16B`
`0 <= lane2 <= 15` | `INS Vd.B[lane1],Vn.B[lane2]` | `Vd.16B -> result` | `A64` | #### Reverse bits within elements @@ -3129,971 +3191,1048 @@ The intrinsics in this section are guarded by the macro ``__ARM_NEON``. | poly8x8_t vcreate_p8(uint64_t a) | `a -> Xn` | `INS Vd.D[0],Xn` | `Vd.8B -> result` | `v7/A32/A64` | | poly16x4_t vcreate_p16(uint64_t a) | `a -> Xn` | `INS Vd.D[0],Xn` | `Vd.4H -> result` | `v7/A32/A64` | | float64x1_t vcreate_f64(uint64_t a) | `a -> Xn` | `INS Vd.D[0],Xn` | `Vd.1D -> result` | `A64` | +| mfloat8x8_t vcreate_mf8(uint64_t a) | `a -> Xn` | `INS Vd.D[0],Xn` | `Vd.8B -> result` | `A64` | #### Set all lanes to the same value -| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | -|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------|-------------------------|--------------------|---------------------------| -| int8x8_t vdup_n_s8(int8_t value) | `value -> rn` | `DUP Vd.8B,rn` | `Vd.8B -> result` | `v7/A32/A64` | -| int8x16_t vdupq_n_s8(int8_t value) | `value -> rn` | `DUP Vd.16B,rn` | `Vd.16B -> result` | `v7/A32/A64` | -| int16x4_t vdup_n_s16(int16_t value) | `value -> rn` | `DUP Vd.4H,rn` | `Vd.4H -> result` | `v7/A32/A64` | -| int16x8_t vdupq_n_s16(int16_t value) | `value -> rn` | `DUP Vd.8H,rn` | `Vd.8H -> result` | `v7/A32/A64` | -| int32x2_t vdup_n_s32(int32_t value) | `value -> rn` | `DUP Vd.2S,rn` | `Vd.2S -> result` | `v7/A32/A64` | -| int32x4_t vdupq_n_s32(int32_t value) | `value -> rn` | `DUP Vd.4S,rn` | `Vd.4S -> result` | `v7/A32/A64` | -| int64x1_t vdup_n_s64(int64_t value) | `value -> rn` | `INS Dd.D[0],xn` | `Vd.1D -> result` | `v7/A32/A64` | -| int64x2_t vdupq_n_s64(int64_t value) | `value -> rn` | `DUP Vd.2D,rn` | `Vd.2D -> result` | `v7/A32/A64` | -| 
uint8x8_t vdup_n_u8(uint8_t value) | `value -> rn` | `DUP Vd.8B,rn` | `Vd.8B -> result` | `v7/A32/A64` | -| uint8x16_t vdupq_n_u8(uint8_t value) | `value -> rn` | `DUP Vd.16B,rn` | `Vd.16B -> result` | `v7/A32/A64` | -| uint16x4_t vdup_n_u16(uint16_t value) | `value -> rn` | `DUP Vd.4H,rn` | `Vd.4H -> result` | `v7/A32/A64` | -| uint16x8_t vdupq_n_u16(uint16_t value) | `value -> rn` | `DUP Vd.8H,rn` | `Vd.8H -> result` | `v7/A32/A64` | -| uint32x2_t vdup_n_u32(uint32_t value) | `value -> rn` | `DUP Vd.2S,rn` | `Vd.2S -> result` | `v7/A32/A64` | -| uint32x4_t vdupq_n_u32(uint32_t value) | `value -> rn` | `DUP Vd.4S,rn` | `Vd.4S -> result` | `v7/A32/A64` | -| uint64x1_t vdup_n_u64(uint64_t value) | `value -> rn` | `INS Dd.D[0],xn` | `Vd.1D -> result` | `v7/A32/A64` | -| uint64x2_t vdupq_n_u64(uint64_t value) | `value -> rn` | `DUP Vd.2D,rn` | `Vd.2D -> result` | `v7/A32/A64` | -| poly64x1_t vdup_n_p64(poly64_t value) | `value -> rn` | `INS Dd.D[0],xn` | `Vd.1D -> result` | `A32/A64` | -| poly64x2_t vdupq_n_p64(poly64_t value) | `value -> rn` | `DUP Vd.2D,rn` | `Vd.2D -> result` | `A32/A64` | -| float32x2_t vdup_n_f32(float32_t value) | `value -> rn` | `DUP Vd.2S,rn` | `Vd.2S -> result` | `v7/A32/A64` | -| float32x4_t vdupq_n_f32(float32_t value) | `value -> rn` | `DUP Vd.4S,rn` | `Vd.4S -> result` | `v7/A32/A64` | -| poly8x8_t vdup_n_p8(poly8_t value) | `value -> rn` | `DUP Vd.8B,rn` | `Vd.8B -> result` | `v7/A32/A64` | -| poly8x16_t vdupq_n_p8(poly8_t value) | `value -> rn` | `DUP Vd.16B,rn` | `Vd.16B -> result` | `v7/A32/A64` | -| poly16x4_t vdup_n_p16(poly16_t value) | `value -> rn` | `DUP Vd.4H,rn` | `Vd.4H -> result` | `v7/A32/A64` | -| poly16x8_t vdupq_n_p16(poly16_t value) | `value -> rn` | `DUP Vd.8H,rn` | `Vd.8H -> result` | `v7/A32/A64` | -| float64x1_t vdup_n_f64(float64_t value) | `value -> rn` | `INS Dd.D[0],xn` | `Vd.1D -> result` | `A64` | -| float64x2_t vdupq_n_f64(float64_t value) | `value -> rn` | `DUP Vd.2D,rn` | `Vd.2D -> result` | `A64` | -| 
int8x8_t vmov_n_s8(int8_t value) | `value -> rn` | `DUP Vd.8B,rn` | `Vd.8B -> result` | `v7/A32/A64` | -| int8x16_t vmovq_n_s8(int8_t value) | `value -> rn` | `DUP Vd.16B,rn` | `Vd.16B -> result` | `v7/A32/A64` | -| int16x4_t vmov_n_s16(int16_t value) | `value -> rn` | `DUP Vd.4H,rn` | `Vd.4H -> result` | `v7/A32/A64` | -| int16x8_t vmovq_n_s16(int16_t value) | `value -> rn` | `DUP Vd.8H,rn` | `Vd.8H -> result` | `v7/A32/A64` | -| int32x2_t vmov_n_s32(int32_t value) | `value -> rn` | `DUP Vd.2S,rn` | `Vd.2S -> result` | `v7/A32/A64` | -| int32x4_t vmovq_n_s32(int32_t value) | `value -> rn` | `DUP Vd.4S,rn` | `Vd.4S -> result` | `v7/A32/A64` | -| int64x1_t vmov_n_s64(int64_t value) | `value -> rn` | `DUP Vd.1D,rn` | `Vd.1D -> result` | `v7/A32/A64` | -| int64x2_t vmovq_n_s64(int64_t value) | `value -> rn` | `DUP Vd.2D,rn` | `Vd.2D -> result` | `v7/A32/A64` | -| uint8x8_t vmov_n_u8(uint8_t value) | `value -> rn` | `DUP Vd.8B,rn` | `Vd.8B -> result` | `v7/A32/A64` | -| uint8x16_t vmovq_n_u8(uint8_t value) | `value -> rn` | `DUP Vd.16B,rn` | `Vd.16B -> result` | `v7/A32/A64` | -| uint16x4_t vmov_n_u16(uint16_t value) | `value -> rn` | `DUP Vd.4H,rn` | `Vd.4H -> result` | `v7/A32/A64` | -| uint16x8_t vmovq_n_u16(uint16_t value) | `value -> rn` | `DUP Vd.8H,rn` | `Vd.8H -> result` | `v7/A32/A64` | -| uint32x2_t vmov_n_u32(uint32_t value) | `value -> rn` | `DUP Vd.2S,rn` | `Vd.2S -> result` | `v7/A32/A64` | -| uint32x4_t vmovq_n_u32(uint32_t value) | `value -> rn` | `DUP Vd.4S,rn` | `Vd.4S -> result` | `v7/A32/A64` | -| uint64x1_t vmov_n_u64(uint64_t value) | `value -> rn` | `DUP Vd.1D,rn` | `Vd.1D -> result` | `v7/A32/A64` | -| uint64x2_t vmovq_n_u64(uint64_t value) | `value -> rn` | `DUP Vd.2D,rn` | `Vd.2D -> result` | `v7/A32/A64` | -| float32x2_t vmov_n_f32(float32_t value) | `value -> rn` | `DUP Vd.2S,rn` | `Vd.2S -> result` | `v7/A32/A64` | -| float32x4_t vmovq_n_f32(float32_t value) | `value -> rn` | `DUP Vd.4S,rn` | `Vd.4S -> result` | `v7/A32/A64` | -| poly8x8_t 
vmov_n_p8(poly8_t value) | `value -> rn` | `DUP Vd.8B,rn` | `Vd.8B -> result` | `v7/A32/A64` | -| poly8x16_t vmovq_n_p8(poly8_t value) | `value -> rn` | `DUP Vd.16B,rn` | `Vd.16B -> result` | `v7/A32/A64` | -| poly16x4_t vmov_n_p16(poly16_t value) | `value -> rn` | `DUP Vd.4H,rn` | `Vd.4H -> result` | `v7/A32/A64` | -| poly16x8_t vmovq_n_p16(poly16_t value) | `value -> rn` | `DUP Vd.8H,rn` | `Vd.8H -> result` | `v7/A32/A64` | -| float64x1_t vmov_n_f64(float64_t value) | `value -> rn` | `DUP Vd.1D,rn` | `Vd.1D -> result` | `A64` | -| float64x2_t vmovq_n_f64(float64_t value) | `value -> rn` | `DUP Vd.2D,rn` | `Vd.2D -> result` | `A64` | -| int8x8_t vdup_lane_s8(
     int8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Vd.8B,Vn.B[lane]` | `Vd.8B -> result` | `v7/A32/A64` | -| int8x16_t vdupq_lane_s8(
     int8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Vd.16B,Vn.B[lane]` | `Vd.16B -> result` | `v7/A32/A64` | -| int16x4_t vdup_lane_s16(
     int16x4_t vec,
     const int lane)
| `vec -> Vn.4H`
`0 <= lane <= 3` | `DUP Vd.4H,Vn.H[lane]` | `Vd.4H -> result` | `v7/A32/A64` | -| int16x8_t vdupq_lane_s16(
     int16x4_t vec,
     const int lane)
| `vec -> Vn.4H`
`0 <= lane <= 3` | `DUP Vd.8H,Vn.H[lane]` | `Vd.8H -> result` | `v7/A32/A64` | -| int32x2_t vdup_lane_s32(
     int32x2_t vec,
     const int lane)
| `vec -> Vn.2S`
`0 <= lane <= 1` | `DUP Vd.2S,Vn.S[lane]` | `Vd.2S -> result` | `v7/A32/A64` | -| int32x4_t vdupq_lane_s32(
     int32x2_t vec,
     const int lane)
| `vec -> Vn.2S`
`0 <= lane <= 1` | `DUP Vd.4S,Vn.S[lane]` | `Vd.4S -> result` | `v7/A32/A64` | -| int64x1_t vdup_lane_s64(
     int64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `v7/A32/A64` | -| int64x2_t vdupq_lane_s64(
     int64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Vd.2D,Vn.D[lane]` | `Vd.2D -> result` | `v7/A32/A64` | -| uint8x8_t vdup_lane_u8(
     uint8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Vd.8B,Vn.B[lane]` | `Vd.8B -> result` | `v7/A32/A64` | -| uint8x16_t vdupq_lane_u8(
     uint8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Vd.16B,Vn.B[lane]` | `Vd.16B -> result` | `v7/A32/A64` | -| uint16x4_t vdup_lane_u16(
     uint16x4_t vec,
     const int lane)
| `vec -> Vn.4H`
`0 <= lane <= 3` | `DUP Vd.4H,Vn.H[lane]` | `Vd.4H -> result` | `v7/A32/A64` | -| uint16x8_t vdupq_lane_u16(
     uint16x4_t vec,
     const int lane)
| `vec -> Vn.4H`
`0 <= lane <= 3` | `DUP Vd.8H,Vn.H[lane]` | `Vd.8H -> result` | `v7/A32/A64` | -| uint32x2_t vdup_lane_u32(
     uint32x2_t vec,
     const int lane)
| `vec -> Vn.2S`
`0 <= lane <= 1` | `DUP Vd.2S,Vn.S[lane]` | `Vd.2S -> result` | `v7/A32/A64` | -| uint32x4_t vdupq_lane_u32(
     uint32x2_t vec,
     const int lane)
| `vec -> Vn.2S`
`0 <= lane <= 1` | `DUP Vd.4S,Vn.S[lane]` | `Vd.4S -> result` | `v7/A32/A64` | -| uint64x1_t vdup_lane_u64(
     uint64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `v7/A32/A64` | -| uint64x2_t vdupq_lane_u64(
     uint64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Vd.2D,Vn.D[lane]` | `Vd.2D -> result` | `v7/A32/A64` | -| poly64x1_t vdup_lane_p64(
     poly64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A32/A64` | -| poly64x2_t vdupq_lane_p64(
     poly64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Vd.2D,Vn.D[lane]` | `Vd.2D -> result` | `A32/A64` | -| float32x2_t vdup_lane_f32(
     float32x2_t vec,
     const int lane)
| `vec -> Vn.2S`
`0 <= lane <= 1` | `DUP Vd.2S,Vn.S[lane]` | `Vd.2S -> result` | `v7/A32/A64` | -| float32x4_t vdupq_lane_f32(
     float32x2_t vec,
     const int lane)
| `vec -> Vn.2S`
`0 <= lane <= 1` | `DUP Vd.4S,Vn.S[lane]` | `Vd.4S -> result` | `v7/A32/A64` | -| poly8x8_t vdup_lane_p8(
     poly8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Vd.8B,Vn.B[lane]` | `Vd.8B -> result` | `v7/A32/A64` | -| poly8x16_t vdupq_lane_p8(
     poly8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Vd.16B,Vn.B[lane]` | `Vd.16B -> result` | `v7/A32/A64` | -| poly16x4_t vdup_lane_p16(
     poly16x4_t vec,
     const int lane)
| `vec -> Vn.4H`
`0 <= lane <= 3` | `DUP Vd.4H,Vn.H[lane]` | `Vd.4H -> result` | `v7/A32/A64` | -| poly16x8_t vdupq_lane_p16(
     poly16x4_t vec,
     const int lane)
| `vec -> Vn.4H`
`0 <= lane <= 3` | `DUP Vd.8H,Vn.H[lane]` | `Vd.8H -> result` | `v7/A32/A64` | -| float64x1_t vdup_lane_f64(
     float64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | -| float64x2_t vdupq_lane_f64(
     float64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Vd.2D,Vn.D[lane]` | `Vd.2D -> result` | `A64` | -| int8x8_t vdup_laneq_s8(
     int8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Vd.8B,Vn.B[lane]` | `Vd.8B -> result` | `A64` | -| int8x16_t vdupq_laneq_s8(
     int8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Vd.16B,Vn.B[lane]` | `Vd.16B -> result` | `A64` | -| int16x4_t vdup_laneq_s16(
     int16x8_t vec,
     const int lane)
| `vec -> Vn.8H`
`0 <= lane <= 7` | `DUP Vd.4H,Vn.H[lane]` | `Vd.4H -> result` | `A64` | -| int16x8_t vdupq_laneq_s16(
     int16x8_t vec,
     const int lane)
| `vec -> Vn.8H`
`0 <= lane <= 7` | `DUP Vd.8H,Vn.H[lane]` | `Vd.8H -> result` | `A64` | -| int32x2_t vdup_laneq_s32(
     int32x4_t vec,
     const int lane)
| `vec -> Vn.4S`
`0 <= lane <= 3` | `DUP Vd.2S,Vn.S[lane]` | `Vd.2S -> result` | `A64` | -| int32x4_t vdupq_laneq_s32(
     int32x4_t vec,
     const int lane)
| `vec -> Vn.4S`
`0 <= lane <= 3` | `DUP Vd.4S,Vn.S[lane]` | `Vd.4S -> result` | `A64` | -| int64x1_t vdup_laneq_s64(
     int64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | -| int64x2_t vdupq_laneq_s64(
     int64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Vd.2D,Vn.D[lane]` | `Vd.2D -> result` | `A64` | -| uint8x8_t vdup_laneq_u8(
     uint8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Vd.8B,Vn.B[lane]` | `Vd.8B -> result` | `A64` | -| uint8x16_t vdupq_laneq_u8(
     uint8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Vd.16B,Vn.B[lane]` | `Vd.16B -> result` | `A64` | -| uint16x4_t vdup_laneq_u16(
     uint16x8_t vec,
     const int lane)
| `vec -> Vn.8H`
`0 <= lane <= 7` | `DUP Vd.4H,Vn.H[lane]` | `Vd.4H -> result` | `A64` | -| uint16x8_t vdupq_laneq_u16(
     uint16x8_t vec,
     const int lane)
| `vec -> Vn.8H`
`0 <= lane <= 7` | `DUP Vd.8H,Vn.H[lane]` | `Vd.8H -> result` | `A64` | -| uint32x2_t vdup_laneq_u32(
     uint32x4_t vec,
     const int lane)
| `vec -> Vn.4S`
`0 <= lane <= 3` | `DUP Vd.2S,Vn.S[lane]` | `Vd.2S -> result` | `A64` | -| uint32x4_t vdupq_laneq_u32(
     uint32x4_t vec,
     const int lane)
| `vec -> Vn.4S`
`0 <= lane <= 3` | `DUP Vd.4S,Vn.S[lane]` | `Vd.4S -> result` | `A64` | -| uint64x1_t vdup_laneq_u64(
     uint64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | -| uint64x2_t vdupq_laneq_u64(
     uint64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Vd.2D,Vn.D[lane]` | `Vd.2D -> result` | `A64` | -| poly64x1_t vdup_laneq_p64(
     poly64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | -| poly64x2_t vdupq_laneq_p64(
     poly64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Vd.2D,Vn.D[lane]` | `Vd.2D -> result` | `A64` | -| float32x2_t vdup_laneq_f32(
     float32x4_t vec,
     const int lane)
| `vec -> Vn.4S`
`0 <= lane <= 3` | `DUP Vd.2S,Vn.S[lane]` | `Vd.2S -> result` | `A64` | -| float32x4_t vdupq_laneq_f32(
     float32x4_t vec,
     const int lane)
| `vec -> Vn.4S`
`0 <= lane <= 3` | `DUP Vd.4S,Vn.S[lane]` | `Vd.4S -> result` | `A64` | -| poly8x8_t vdup_laneq_p8(
     poly8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Vd.8B,Vn.B[lane]` | `Vd.8B -> result` | `A64` | -| poly8x16_t vdupq_laneq_p8(
     poly8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Vd.16B,Vn.B[lane]` | `Vd.16B -> result` | `A64` | -| poly16x4_t vdup_laneq_p16(
     poly16x8_t vec,
     const int lane)
| `vec -> Vn.8H`
`0 <= lane <= 7` | `DUP Vd.4H,Vn.H[lane]` | `Vd.4H -> result` | `A64` | -| poly16x8_t vdupq_laneq_p16(
     poly16x8_t vec,
     const int lane)
| `vec -> Vn.8H`
`0 <= lane <= 7` | `DUP Vd.8H,Vn.H[lane]` | `Vd.8H -> result` | `A64` | -| float64x1_t vdup_laneq_f64(
     float64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | -| float64x2_t vdupq_laneq_f64(
     float64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Vd.2D,Vn.D[lane]` | `Vd.2D -> result` | `A64` | +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------|-------------------------|--------------------|---------------------------| +| int8x8_t vdup_n_s8(int8_t value) | `value -> rn` | `DUP Vd.8B,rn` | `Vd.8B -> result` | `v7/A32/A64` | +| int8x16_t vdupq_n_s8(int8_t value) | `value -> rn` | `DUP Vd.16B,rn` | `Vd.16B -> result` | `v7/A32/A64` | +| int16x4_t vdup_n_s16(int16_t value) | `value -> rn` | `DUP Vd.4H,rn` | `Vd.4H -> result` | `v7/A32/A64` | +| int16x8_t vdupq_n_s16(int16_t value) | `value -> rn` | `DUP Vd.8H,rn` | `Vd.8H -> result` | `v7/A32/A64` | +| int32x2_t vdup_n_s32(int32_t value) | `value -> rn` | `DUP Vd.2S,rn` | `Vd.2S -> result` | `v7/A32/A64` | +| int32x4_t vdupq_n_s32(int32_t value) | `value -> rn` | `DUP Vd.4S,rn` | `Vd.4S -> result` | `v7/A32/A64` | +| int64x1_t vdup_n_s64(int64_t value) | `value -> rn` | `INS Dd.D[0],xn` | `Vd.1D -> result` | `v7/A32/A64` | +| int64x2_t vdupq_n_s64(int64_t value) | `value -> rn` | `DUP Vd.2D,rn` | `Vd.2D -> result` | `v7/A32/A64` | +| uint8x8_t vdup_n_u8(uint8_t value) | `value -> rn` | `DUP Vd.8B,rn` | `Vd.8B -> result` | `v7/A32/A64` | +| uint8x16_t vdupq_n_u8(uint8_t value) | `value -> rn` | `DUP Vd.16B,rn` | `Vd.16B -> result` | `v7/A32/A64` | +| uint16x4_t vdup_n_u16(uint16_t value) | `value -> rn` | `DUP Vd.4H,rn` | `Vd.4H -> result` | `v7/A32/A64` | +| uint16x8_t vdupq_n_u16(uint16_t value) | `value -> rn` | `DUP Vd.8H,rn` | `Vd.8H -> result` | `v7/A32/A64` | +| uint32x2_t vdup_n_u32(uint32_t value) | `value -> rn` | `DUP Vd.2S,rn` | `Vd.2S -> result` | `v7/A32/A64` | +| uint32x4_t vdupq_n_u32(uint32_t 
value) | `value -> rn` | `DUP Vd.4S,rn` | `Vd.4S -> result` | `v7/A32/A64` | +| uint64x1_t vdup_n_u64(uint64_t value) | `value -> rn` | `INS Dd.D[0],xn` | `Vd.1D -> result` | `v7/A32/A64` | +| uint64x2_t vdupq_n_u64(uint64_t value) | `value -> rn` | `DUP Vd.2D,rn` | `Vd.2D -> result` | `v7/A32/A64` | +| poly64x1_t vdup_n_p64(poly64_t value) | `value -> rn` | `INS Dd.D[0],xn` | `Vd.1D -> result` | `A32/A64` | +| poly64x2_t vdupq_n_p64(poly64_t value) | `value -> rn` | `DUP Vd.2D,rn` | `Vd.2D -> result` | `A32/A64` | +| float32x2_t vdup_n_f32(float32_t value) | `value -> rn` | `DUP Vd.2S,rn` | `Vd.2S -> result` | `v7/A32/A64` | +| float32x4_t vdupq_n_f32(float32_t value) | `value -> rn` | `DUP Vd.4S,rn` | `Vd.4S -> result` | `v7/A32/A64` | +| poly8x8_t vdup_n_p8(poly8_t value) | `value -> rn` | `DUP Vd.8B,rn` | `Vd.8B -> result` | `v7/A32/A64` | +| poly8x16_t vdupq_n_p8(poly8_t value) | `value -> rn` | `DUP Vd.16B,rn` | `Vd.16B -> result` | `v7/A32/A64` | +| poly16x4_t vdup_n_p16(poly16_t value) | `value -> rn` | `DUP Vd.4H,rn` | `Vd.4H -> result` | `v7/A32/A64` | +| poly16x8_t vdupq_n_p16(poly16_t value) | `value -> rn` | `DUP Vd.8H,rn` | `Vd.8H -> result` | `v7/A32/A64` | +| float64x1_t vdup_n_f64(float64_t value) | `value -> rn` | `INS Dd.D[0],xn` | `Vd.1D -> result` | `A64` | +| float64x2_t vdupq_n_f64(float64_t value) | `value -> rn` | `DUP Vd.2D,rn` | `Vd.2D -> result` | `A64` | +| mfloat8x8_t vdup_n_mf8(mfloat8_t value) | `value -> rn` | `DUP Vd.8B,rn` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vdupq_n_mf8(mfloat8_t value) | `value -> rn` | `DUP Vd.16B,rn` | `Vd.16B -> result` | `A64` | +| int8x8_t vmov_n_s8(int8_t value) | `value -> rn` | `DUP Vd.8B,rn` | `Vd.8B -> result` | `v7/A32/A64` | +| int8x16_t vmovq_n_s8(int8_t value) | `value -> rn` | `DUP Vd.16B,rn` | `Vd.16B -> result` | `v7/A32/A64` | +| int16x4_t vmov_n_s16(int16_t value) | `value -> rn` | `DUP Vd.4H,rn` | `Vd.4H -> result` | `v7/A32/A64` | +| int16x8_t vmovq_n_s16(int16_t value) | `value -> 
rn` | `DUP Vd.8H,rn` | `Vd.8H -> result` | `v7/A32/A64` | +| int32x2_t vmov_n_s32(int32_t value) | `value -> rn` | `DUP Vd.2S,rn` | `Vd.2S -> result` | `v7/A32/A64` | +| int32x4_t vmovq_n_s32(int32_t value) | `value -> rn` | `DUP Vd.4S,rn` | `Vd.4S -> result` | `v7/A32/A64` | +| int64x1_t vmov_n_s64(int64_t value) | `value -> rn` | `DUP Vd.1D,rn` | `Vd.1D -> result` | `v7/A32/A64` | +| int64x2_t vmovq_n_s64(int64_t value) | `value -> rn` | `DUP Vd.2D,rn` | `Vd.2D -> result` | `v7/A32/A64` | +| uint8x8_t vmov_n_u8(uint8_t value) | `value -> rn` | `DUP Vd.8B,rn` | `Vd.8B -> result` | `v7/A32/A64` | +| uint8x16_t vmovq_n_u8(uint8_t value) | `value -> rn` | `DUP Vd.16B,rn` | `Vd.16B -> result` | `v7/A32/A64` | +| uint16x4_t vmov_n_u16(uint16_t value) | `value -> rn` | `DUP Vd.4H,rn` | `Vd.4H -> result` | `v7/A32/A64` | +| uint16x8_t vmovq_n_u16(uint16_t value) | `value -> rn` | `DUP Vd.8H,rn` | `Vd.8H -> result` | `v7/A32/A64` | +| uint32x2_t vmov_n_u32(uint32_t value) | `value -> rn` | `DUP Vd.2S,rn` | `Vd.2S -> result` | `v7/A32/A64` | +| uint32x4_t vmovq_n_u32(uint32_t value) | `value -> rn` | `DUP Vd.4S,rn` | `Vd.4S -> result` | `v7/A32/A64` | +| uint64x1_t vmov_n_u64(uint64_t value) | `value -> rn` | `DUP Vd.1D,rn` | `Vd.1D -> result` | `v7/A32/A64` | +| uint64x2_t vmovq_n_u64(uint64_t value) | `value -> rn` | `DUP Vd.2D,rn` | `Vd.2D -> result` | `v7/A32/A64` | +| float32x2_t vmov_n_f32(float32_t value) | `value -> rn` | `DUP Vd.2S,rn` | `Vd.2S -> result` | `v7/A32/A64` | +| float32x4_t vmovq_n_f32(float32_t value) | `value -> rn` | `DUP Vd.4S,rn` | `Vd.4S -> result` | `v7/A32/A64` | +| poly8x8_t vmov_n_p8(poly8_t value) | `value -> rn` | `DUP Vd.8B,rn` | `Vd.8B -> result` | `v7/A32/A64` | +| poly8x16_t vmovq_n_p8(poly8_t value) | `value -> rn` | `DUP Vd.16B,rn` | `Vd.16B -> result` | `v7/A32/A64` | +| poly16x4_t vmov_n_p16(poly16_t value) | `value -> rn` | `DUP Vd.4H,rn` | `Vd.4H -> result` | `v7/A32/A64` | +| poly16x8_t vmovq_n_p16(poly16_t value) | `value -> 
rn` | `DUP Vd.8H,rn` | `Vd.8H -> result` | `v7/A32/A64` | +| float64x1_t vmov_n_f64(float64_t value) | `value -> rn` | `DUP Vd.1D,rn` | `Vd.1D -> result` | `A64` | +| float64x2_t vmovq_n_f64(float64_t value) | `value -> rn` | `DUP Vd.2D,rn` | `Vd.2D -> result` | `A64` | +| mfloat8x8_t vmov_n_mf8(mfloat8_t value) | `value -> rn` | `DUP Vd.8B,rn` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vmovq_n_mf8(mfloat8_t value) | `value -> rn` | `DUP Vd.16B,rn` | `Vd.16B -> result` | `A64` | +| int8x8_t vdup_lane_s8(
     int8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Vd.8B,Vn.B[lane]` | `Vd.8B -> result` | `v7/A32/A64` | +| int8x16_t vdupq_lane_s8(
     int8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Vd.16B,Vn.B[lane]` | `Vd.16B -> result` | `v7/A32/A64` | +| int16x4_t vdup_lane_s16(
     int16x4_t vec,
     const int lane)
| `vec -> Vn.4H`
`0 <= lane <= 3` | `DUP Vd.4H,Vn.H[lane]` | `Vd.4H -> result` | `v7/A32/A64` | +| int16x8_t vdupq_lane_s16(
     int16x4_t vec,
     const int lane)
| `vec -> Vn.4H`
`0 <= lane <= 3` | `DUP Vd.8H,Vn.H[lane]` | `Vd.8H -> result` | `v7/A32/A64` | +| int32x2_t vdup_lane_s32(
     int32x2_t vec,
     const int lane)
| `vec -> Vn.2S`
`0 <= lane <= 1` | `DUP Vd.2S,Vn.S[lane]` | `Vd.2S -> result` | `v7/A32/A64` | +| int32x4_t vdupq_lane_s32(
     int32x2_t vec,
     const int lane)
| `vec -> Vn.2S`
`0 <= lane <= 1` | `DUP Vd.4S,Vn.S[lane]` | `Vd.4S -> result` | `v7/A32/A64` | +| int64x1_t vdup_lane_s64(
     int64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `v7/A32/A64` | +| int64x2_t vdupq_lane_s64(
     int64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Vd.2D,Vn.D[lane]` | `Vd.2D -> result` | `v7/A32/A64` | +| uint8x8_t vdup_lane_u8(
     uint8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Vd.8B,Vn.B[lane]` | `Vd.8B -> result` | `v7/A32/A64` | +| uint8x16_t vdupq_lane_u8(
     uint8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Vd.16B,Vn.B[lane]` | `Vd.16B -> result` | `v7/A32/A64` | +| uint16x4_t vdup_lane_u16(
     uint16x4_t vec,
     const int lane)
| `vec -> Vn.4H`
`0 <= lane <= 3` | `DUP Vd.4H,Vn.H[lane]` | `Vd.4H -> result` | `v7/A32/A64` | +| uint16x8_t vdupq_lane_u16(
     uint16x4_t vec,
     const int lane)
| `vec -> Vn.4H`
`0 <= lane <= 3` | `DUP Vd.8H,Vn.H[lane]` | `Vd.8H -> result` | `v7/A32/A64` | +| uint32x2_t vdup_lane_u32(
     uint32x2_t vec,
     const int lane)
| `vec -> Vn.2S`
`0 <= lane <= 1` | `DUP Vd.2S,Vn.S[lane]` | `Vd.2S -> result` | `v7/A32/A64` | +| uint32x4_t vdupq_lane_u32(
     uint32x2_t vec,
     const int lane)
| `vec -> Vn.2S`
`0 <= lane <= 1` | `DUP Vd.4S,Vn.S[lane]` | `Vd.4S -> result` | `v7/A32/A64` | +| uint64x1_t vdup_lane_u64(
     uint64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `v7/A32/A64` | +| uint64x2_t vdupq_lane_u64(
     uint64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Vd.2D,Vn.D[lane]` | `Vd.2D -> result` | `v7/A32/A64` | +| poly64x1_t vdup_lane_p64(
     poly64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A32/A64` | +| poly64x2_t vdupq_lane_p64(
     poly64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Vd.2D,Vn.D[lane]` | `Vd.2D -> result` | `A32/A64` | +| float32x2_t vdup_lane_f32(
     float32x2_t vec,
     const int lane)
| `vec -> Vn.2S`
`0 <= lane <= 1` | `DUP Vd.2S,Vn.S[lane]` | `Vd.2S -> result` | `v7/A32/A64` | +| float32x4_t vdupq_lane_f32(
     float32x2_t vec,
     const int lane)
| `vec -> Vn.2S`
`0 <= lane <= 1` | `DUP Vd.4S,Vn.S[lane]` | `Vd.4S -> result` | `v7/A32/A64` | +| poly8x8_t vdup_lane_p8(
     poly8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Vd.8B,Vn.B[lane]` | `Vd.8B -> result` | `v7/A32/A64` | +| poly8x16_t vdupq_lane_p8(
     poly8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Vd.16B,Vn.B[lane]` | `Vd.16B -> result` | `v7/A32/A64` | +| poly16x4_t vdup_lane_p16(
     poly16x4_t vec,
     const int lane)
| `vec -> Vn.4H`
`0 <= lane <= 3` | `DUP Vd.4H,Vn.H[lane]` | `Vd.4H -> result` | `v7/A32/A64` | +| poly16x8_t vdupq_lane_p16(
     poly16x4_t vec,
     const int lane)
| `vec -> Vn.4H`
`0 <= lane <= 3` | `DUP Vd.8H,Vn.H[lane]` | `Vd.8H -> result` | `v7/A32/A64` | +| float64x1_t vdup_lane_f64(
     float64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | +| float64x2_t vdupq_lane_f64(
     float64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Vd.2D,Vn.D[lane]` | `Vd.2D -> result` | `A64` | +| mfloat8x8_t vdup_lane_mf8(
     mfloat8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Vd.8B,Vn.B[lane]` | `Vd.8B -> result` | `A64` |
+| mfloat8x16_t vdupq_lane_mf8(
     mfloat8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Vd.16B,Vn.B[lane]` | `Vd.16B -> result` | `A64` | +| int8x8_t vdup_laneq_s8(
     int8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Vd.8B,Vn.B[lane]` | `Vd.8B -> result` | `A64` | +| int8x16_t vdupq_laneq_s8(
     int8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Vd.16B,Vn.B[lane]` | `Vd.16B -> result` | `A64` | +| int16x4_t vdup_laneq_s16(
     int16x8_t vec,
     const int lane)
| `vec -> Vn.8H`
`0 <= lane <= 7` | `DUP Vd.4H,Vn.H[lane]` | `Vd.4H -> result` | `A64` | +| int16x8_t vdupq_laneq_s16(
     int16x8_t vec,
     const int lane)
| `vec -> Vn.8H`
`0 <= lane <= 7` | `DUP Vd.8H,Vn.H[lane]` | `Vd.8H -> result` | `A64` | +| int32x2_t vdup_laneq_s32(
     int32x4_t vec,
     const int lane)
| `vec -> Vn.4S`
`0 <= lane <= 3` | `DUP Vd.2S,Vn.S[lane]` | `Vd.2S -> result` | `A64` | +| int32x4_t vdupq_laneq_s32(
     int32x4_t vec,
     const int lane)
| `vec -> Vn.4S`
`0 <= lane <= 3` | `DUP Vd.4S,Vn.S[lane]` | `Vd.4S -> result` | `A64` | +| int64x1_t vdup_laneq_s64(
     int64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | +| int64x2_t vdupq_laneq_s64(
     int64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Vd.2D,Vn.D[lane]` | `Vd.2D -> result` | `A64` | +| uint8x8_t vdup_laneq_u8(
     uint8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Vd.8B,Vn.B[lane]` | `Vd.8B -> result` | `A64` | +| uint8x16_t vdupq_laneq_u8(
     uint8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Vd.16B,Vn.B[lane]` | `Vd.16B -> result` | `A64` | +| uint16x4_t vdup_laneq_u16(
     uint16x8_t vec,
     const int lane)
| `vec -> Vn.8H`
`0 <= lane <= 7` | `DUP Vd.4H,Vn.H[lane]` | `Vd.4H -> result` | `A64` | +| uint16x8_t vdupq_laneq_u16(
     uint16x8_t vec,
     const int lane)
| `vec -> Vn.8H`
`0 <= lane <= 7` | `DUP Vd.8H,Vn.H[lane]` | `Vd.8H -> result` | `A64` | +| uint32x2_t vdup_laneq_u32(
     uint32x4_t vec,
     const int lane)
| `vec -> Vn.4S`
`0 <= lane <= 3` | `DUP Vd.2S,Vn.S[lane]` | `Vd.2S -> result` | `A64` | +| uint32x4_t vdupq_laneq_u32(
     uint32x4_t vec,
     const int lane)
| `vec -> Vn.4S`
`0 <= lane <= 3` | `DUP Vd.4S,Vn.S[lane]` | `Vd.4S -> result` | `A64` | +| uint64x1_t vdup_laneq_u64(
     uint64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | +| uint64x2_t vdupq_laneq_u64(
     uint64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Vd.2D,Vn.D[lane]` | `Vd.2D -> result` | `A64` | +| poly64x1_t vdup_laneq_p64(
     poly64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | +| poly64x2_t vdupq_laneq_p64(
     poly64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Vd.2D,Vn.D[lane]` | `Vd.2D -> result` | `A64` | +| float32x2_t vdup_laneq_f32(
     float32x4_t vec,
     const int lane)
| `vec -> Vn.4S`
`0 <= lane <= 3` | `DUP Vd.2S,Vn.S[lane]` | `Vd.2S -> result` | `A64` | +| float32x4_t vdupq_laneq_f32(
     float32x4_t vec,
     const int lane)
| `vec -> Vn.4S`
`0 <= lane <= 3` | `DUP Vd.4S,Vn.S[lane]` | `Vd.4S -> result` | `A64` | +| poly8x8_t vdup_laneq_p8(
     poly8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Vd.8B,Vn.B[lane]` | `Vd.8B -> result` | `A64` | +| poly8x16_t vdupq_laneq_p8(
     poly8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Vd.16B,Vn.B[lane]` | `Vd.16B -> result` | `A64` | +| poly16x4_t vdup_laneq_p16(
     poly16x8_t vec,
     const int lane)
| `vec -> Vn.8H`
`0 <= lane <= 7` | `DUP Vd.4H,Vn.H[lane]` | `Vd.4H -> result` | `A64` | +| poly16x8_t vdupq_laneq_p16(
     poly16x8_t vec,
     const int lane)
| `vec -> Vn.8H`
`0 <= lane <= 7` | `DUP Vd.8H,Vn.H[lane]` | `Vd.8H -> result` | `A64` | +| float64x1_t vdup_laneq_f64(
     float64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | +| float64x2_t vdupq_laneq_f64(
     float64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Vd.2D,Vn.D[lane]` | `Vd.2D -> result` | `A64` | +| mfloat8x8_t vdup_laneq_mf8(
     mfloat8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Vd.8B,Vn.B[lane]` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vdupq_laneq_mf8(
     mfloat8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Vd.16B,Vn.B[lane]` | `Vd.16B -> result` | `A64` | #### Combine vectors -| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | -|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------|----------------------------------------------|--------------------|---------------------------| -| int8x16_t vcombine_s8(
     int8x8_t low,
     int8x8_t high)
| `low -> Vn.8B`
`high -> Vm.8B` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.16B -> result` | `v7/A32/A64` | -| int16x8_t vcombine_s16(
     int16x4_t low,
     int16x4_t high)
| `low -> Vn.4H`
`high -> Vm.4H` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.8H -> result` | `v7/A32/A64` | -| int32x4_t vcombine_s32(
     int32x2_t low,
     int32x2_t high)
| `low -> Vn.2S`
`high -> Vm.2S` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.4S -> result` | `v7/A32/A64` | -| int64x2_t vcombine_s64(
     int64x1_t low,
     int64x1_t high)
| `low -> Vn.1D`
`high -> Vm.1D` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.2D -> result` | `v7/A32/A64` | -| uint8x16_t vcombine_u8(
     uint8x8_t low,
     uint8x8_t high)
| `low -> Vn.8B`
`high -> Vm.8B` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.16B -> result` | `v7/A32/A64` | -| uint16x8_t vcombine_u16(
     uint16x4_t low,
     uint16x4_t high)
| `low -> Vn.4H`
`high -> Vm.4H` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.8H -> result` | `v7/A32/A64` | -| uint32x4_t vcombine_u32(
     uint32x2_t low,
     uint32x2_t high)
| `low -> Vn.2S`
`high -> Vm.2S` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.4S -> result` | `v7/A32/A64` | -| uint64x2_t vcombine_u64(
     uint64x1_t low,
     uint64x1_t high)
| `low -> Vn.1D`
`high -> Vm.1D` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.2D -> result` | `v7/A32/A64` | -| poly64x2_t vcombine_p64(
     poly64x1_t low,
     poly64x1_t high)
| `low -> Vn.1D`
`high -> Vm.1D` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.2D -> result` | `A32/A64` | -| float16x8_t vcombine_f16(
     float16x4_t low,
     float16x4_t high)
| `low -> Vn.4H`
`high -> Vm.4H` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.8H -> result` | `v7/A32/A64` | -| float32x4_t vcombine_f32(
     float32x2_t low,
     float32x2_t high)
| `low -> Vn.2S`
`high -> Vm.2S` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.4S -> result` | `v7/A32/A64` | -| poly8x16_t vcombine_p8(
     poly8x8_t low,
     poly8x8_t high)
| `low -> Vn.8B`
`high -> Vm.8B` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.16B -> result` | `v7/A32/A64` | -| poly16x8_t vcombine_p16(
     poly16x4_t low,
     poly16x4_t high)
| `low -> Vn.4H`
`high -> Vm.4H` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.8H -> result` | `v7/A32/A64` | -| float64x2_t vcombine_f64(
     float64x1_t low,
     float64x1_t high)
| `low -> Vn.1D`
`high -> Vm.1D` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.2D -> result` | `A64` | +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------|----------------------------------------------|--------------------|---------------------------| +| int8x16_t vcombine_s8(
     int8x8_t low,
     int8x8_t high)
| `low -> Vn.8B`
`high -> Vm.8B` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.16B -> result` | `v7/A32/A64` | +| int16x8_t vcombine_s16(
     int16x4_t low,
     int16x4_t high)
| `low -> Vn.4H`
`high -> Vm.4H` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.8H -> result` | `v7/A32/A64` | +| int32x4_t vcombine_s32(
     int32x2_t low,
     int32x2_t high)
| `low -> Vn.2S`
`high -> Vm.2S` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.4S -> result` | `v7/A32/A64` | +| int64x2_t vcombine_s64(
     int64x1_t low,
     int64x1_t high)
| `low -> Vn.1D`
`high -> Vm.1D` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.2D -> result` | `v7/A32/A64` | +| uint8x16_t vcombine_u8(
     uint8x8_t low,
     uint8x8_t high)
| `low -> Vn.8B`
`high -> Vm.8B` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.16B -> result` | `v7/A32/A64` | +| uint16x8_t vcombine_u16(
     uint16x4_t low,
     uint16x4_t high)
| `low -> Vn.4H`
`high -> Vm.4H` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.8H -> result` | `v7/A32/A64` | +| uint32x4_t vcombine_u32(
     uint32x2_t low,
     uint32x2_t high)
| `low -> Vn.2S`
`high -> Vm.2S` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.4S -> result` | `v7/A32/A64` | +| uint64x2_t vcombine_u64(
     uint64x1_t low,
     uint64x1_t high)
| `low -> Vn.1D`
`high -> Vm.1D` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.2D -> result` | `v7/A32/A64` | +| poly64x2_t vcombine_p64(
     poly64x1_t low,
     poly64x1_t high)
| `low -> Vn.1D`
`high -> Vm.1D` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.2D -> result` | `A32/A64` | +| float16x8_t vcombine_f16(
     float16x4_t low,
     float16x4_t high)
| `low -> Vn.4H`
`high -> Vm.4H` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.8H -> result` | `v7/A32/A64` | +| float32x4_t vcombine_f32(
     float32x2_t low,
     float32x2_t high)
| `low -> Vn.2S`
`high -> Vm.2S` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.4S -> result` | `v7/A32/A64` | +| poly8x16_t vcombine_p8(
     poly8x8_t low,
     poly8x8_t high)
| `low -> Vn.8B`
`high -> Vm.8B` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.16B -> result` | `v7/A32/A64` | +| poly16x8_t vcombine_p16(
     poly16x4_t low,
     poly16x4_t high)
| `low -> Vn.4H`
`high -> Vm.4H` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.8H -> result` | `v7/A32/A64` | +| float64x2_t vcombine_f64(
     float64x1_t low,
     float64x1_t high)
| `low -> Vn.1D`
`high -> Vm.1D` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.2D -> result` | `A64` | +| mfloat8x16_t vcombine_mf8(
     mfloat8x8_t low,
     mfloat8x8_t high)
| `low -> Vn.8B`
`high -> Vm.8B` | `DUP Vd.1D,Vn.D[0]`
`INS Vd.D[1],Vm.D[0]` | `Vd.16B -> result` | `A64` | #### Split vectors -| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | -|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------|-----------------------|-------------------|---------------------------| -| int8x8_t vget_high_s8(int8x16_t a) | `a -> Vn.16B` | `DUP Vd.1D,Vn.D[1]` | `Vd.8B -> result` | `v7/A32/A64` | -| int16x4_t vget_high_s16(int16x8_t a) | `a -> Vn.8H` | `DUP Vd.1D,Vn.D[1]` | `Vd.4H -> result` | `v7/A32/A64` | -| int32x2_t vget_high_s32(int32x4_t a) | `a -> Vn.4S` | `DUP Vd.1D,Vn.D[1]` | `Vd.2S -> result` | `v7/A32/A64` | -| int64x1_t vget_high_s64(int64x2_t a) | `a -> Vn.2D` | `DUP Vd.1D,Vn.D[1]` | `Vd.1D -> result` | `v7/A32/A64` | -| uint8x8_t vget_high_u8(uint8x16_t a) | `a -> Vn.16B` | `DUP Vd.1D,Vn.D[1]` | `Vd.8B -> result` | `v7/A32/A64` | -| uint16x4_t vget_high_u16(uint16x8_t a) | `a -> Vn.8H` | `DUP Vd.1D,Vn.D[1]` | `Vd.4H -> result` | `v7/A32/A64` | -| uint32x2_t vget_high_u32(uint32x4_t a) | `a -> Vn.4S` | `DUP Vd.1D,Vn.D[1]` | `Vd.2S -> result` | `v7/A32/A64` | -| uint64x1_t vget_high_u64(uint64x2_t a) | `a -> Vn.2D` | `DUP Vd.1D,Vn.D[1]` | `Vd.1D -> result` | `v7/A32/A64` | -| poly64x1_t vget_high_p64(poly64x2_t a) | `a -> Vn.2D` | `DUP Vd.1D,Vn.D[1]` | `Vd.1D -> result` | `A32/A64` | -| float16x4_t vget_high_f16(float16x8_t a) | `a -> Vn.8H` | `DUP Vd.1D,Vn.D[1]` | `Vd.4H -> result` | `v7/A32/A64` | -| float32x2_t vget_high_f32(float32x4_t a) | `a -> Vn.4S` | `DUP Vd.1D,Vn.D[1]` | `Vd.2S -> result` | `v7/A32/A64` | -| poly8x8_t vget_high_p8(poly8x16_t a) | `a -> Vn.16B` | `DUP Vd.1D,Vn.D[1]` | `Vd.8B -> result` | `v7/A32/A64` | -| poly16x4_t vget_high_p16(poly16x8_t a) | `a -> Vn.8H` | `DUP Vd.1D,Vn.D[1]` | `Vd.4H -> result` | `v7/A32/A64` | -| float64x1_t vget_high_f64(float64x2_t a) | `a -> Vn.2D` | `DUP Vd.1D,Vn.D[1]` | `Vd.1D -> result` | `A64` | -| int8x8_t vget_low_s8(int8x16_t a) | `a -> Vn.16B` | `DUP Vd.1D,Vn.D[0]` | `Vd.8B -> result` | `v7/A32/A64` | -| int16x4_t vget_low_s16(int16x8_t a) | `a -> Vn.8H` | `DUP Vd.1D,Vn.D[0]` | `Vd.4H -> result` | `v7/A32/A64` | -| int32x2_t vget_low_s32(int32x4_t a) | `a -> Vn.4S` | `DUP Vd.1D,Vn.D[0]` | `Vd.2S -> result` | `v7/A32/A64` | -| int64x1_t vget_low_s64(int64x2_t a) | `a -> Vn.2D` | `DUP Vd.1D,Vn.D[0]` | `Vd.1D -> result` | `v7/A32/A64` | -| uint8x8_t vget_low_u8(uint8x16_t a) | `a -> Vn.16B` | `DUP Vd.1D,Vn.D[0]` | `Vd.8B -> result` | `v7/A32/A64` | -| uint16x4_t vget_low_u16(uint16x8_t a) | `a -> Vn.8H` | `DUP Vd.1D,Vn.D[0]` | `Vd.4H -> result` | `v7/A32/A64` | -| uint32x2_t vget_low_u32(uint32x4_t a) | `a -> Vn.4S` | `DUP Vd.1D,Vn.D[0]` | `Vd.2S -> result` | `v7/A32/A64` | -| uint64x1_t vget_low_u64(uint64x2_t a) | `a -> Vn.2D` | `DUP Vd.1D,Vn.D[0]` | `Vd.1D -> result` | `v7/A32/A64` | -| poly64x1_t vget_low_p64(poly64x2_t a) | `a -> Vn.2D` | `DUP Vd.1D,Vn.D[0]` | `Vd.1D -> result` | `A32/A64` | -| float16x4_t vget_low_f16(float16x8_t a) | `a -> Vn.8H` | `DUP Vd.1D,Vn.D[0]` | `Vd.4H -> result` | `v7/A32/A64` | -| float32x2_t vget_low_f32(float32x4_t a) | `a -> Vn.4S` | `DUP Vd.1D,Vn.D[0]` | `Vd.2S -> result` | `v7/A32/A64` | -| poly8x8_t vget_low_p8(poly8x16_t a) | `a -> Vn.16B` | `DUP Vd.1D,Vn.D[0]` | `Vd.8B -> result` | `v7/A32/A64` | -| poly16x4_t vget_low_p16(poly16x8_t a) | `a -> Vn.8H` | `DUP Vd.1D,Vn.D[0]` | `Vd.4H -> result` | `v7/A32/A64` | -| float64x1_t vget_low_f64(float64x2_t a) | `a -> Vn.2D` | `DUP Vd.1D,Vn.D[0]` | `Vd.1D -> result` | `A64` | +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures |
+|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------|-----------------------|-------------------|---------------------------| +| int8x8_t vget_high_s8(int8x16_t a) | `a -> Vn.16B` | `DUP Vd.1D,Vn.D[1]` | `Vd.8B -> result` | `v7/A32/A64` | +| int16x4_t vget_high_s16(int16x8_t a) | `a -> Vn.8H` | `DUP Vd.1D,Vn.D[1]` | `Vd.4H -> result` | `v7/A32/A64` | +| int32x2_t vget_high_s32(int32x4_t a) | `a -> Vn.4S` | `DUP Vd.1D,Vn.D[1]` | `Vd.2S -> result` | `v7/A32/A64` | +| int64x1_t vget_high_s64(int64x2_t a) | `a -> Vn.2D` | `DUP Vd.1D,Vn.D[1]` | `Vd.1D -> result` | `v7/A32/A64` | +| uint8x8_t vget_high_u8(uint8x16_t a) | `a -> Vn.16B` | `DUP Vd.1D,Vn.D[1]` | `Vd.8B -> result` | `v7/A32/A64` | +| uint16x4_t vget_high_u16(uint16x8_t a) | `a -> Vn.8H` | `DUP Vd.1D,Vn.D[1]` | `Vd.4H -> result` | `v7/A32/A64` | +| uint32x2_t vget_high_u32(uint32x4_t a) | `a -> Vn.4S` | `DUP Vd.1D,Vn.D[1]` | `Vd.2S -> result` | `v7/A32/A64` | +| uint64x1_t vget_high_u64(uint64x2_t a) | `a -> Vn.2D` | `DUP Vd.1D,Vn.D[1]` | `Vd.1D -> result` | `v7/A32/A64` | +| poly64x1_t vget_high_p64(poly64x2_t a) | `a -> Vn.2D` | `DUP Vd.1D,Vn.D[1]` | `Vd.1D -> result` | `A32/A64` | +| float16x4_t vget_high_f16(float16x8_t a) | `a -> Vn.8H` | `DUP Vd.1D,Vn.D[1]` | `Vd.4H -> result` | `v7/A32/A64` | +| float32x2_t vget_high_f32(float32x4_t a) | `a -> Vn.4S` | `DUP Vd.1D,Vn.D[1]` | `Vd.2S -> result` | `v7/A32/A64` | +| poly8x8_t vget_high_p8(poly8x16_t a) | `a -> Vn.16B` | `DUP Vd.1D,Vn.D[1]` | `Vd.8B -> result` | `v7/A32/A64` | +| poly16x4_t vget_high_p16(poly16x8_t a) | `a -> Vn.8H` | `DUP Vd.1D,Vn.D[1]` | `Vd.4H -> result` | `v7/A32/A64` | +| float64x1_t vget_high_f64(float64x2_t a) | `a -> Vn.2D` | `DUP Vd.1D,Vn.D[1]` | `Vd.1D -> result` | `A64` | +| mfloat8x8_t vget_high_mf8(mfloat8x16_t a) | `a -> Vn.16B` | `DUP Vd.1D,Vn.D[1]` | `Vd.8B -> result` | `A64` | +| int8x8_t vget_low_s8(int8x16_t a) | `a -> Vn.16B` | `DUP Vd.1D,Vn.D[0]` | `Vd.8B -> result` | `v7/A32/A64` | +| int16x4_t vget_low_s16(int16x8_t a) | `a -> Vn.8H` | `DUP Vd.1D,Vn.D[0]` | `Vd.4H -> result` | `v7/A32/A64` | +| int32x2_t vget_low_s32(int32x4_t a) | `a -> Vn.4S` | `DUP Vd.1D,Vn.D[0]` | `Vd.2S -> result` | `v7/A32/A64` | +| int64x1_t vget_low_s64(int64x2_t a) | `a -> Vn.2D` | `DUP Vd.1D,Vn.D[0]` | `Vd.1D -> result` | `v7/A32/A64` | +| uint8x8_t vget_low_u8(uint8x16_t a) | `a -> Vn.16B` | `DUP Vd.1D,Vn.D[0]` | `Vd.8B -> result` | `v7/A32/A64` | +| uint16x4_t vget_low_u16(uint16x8_t a) | `a -> Vn.8H` | `DUP Vd.1D,Vn.D[0]` | `Vd.4H -> result` | `v7/A32/A64` | +| uint32x2_t vget_low_u32(uint32x4_t a) | `a -> Vn.4S` | `DUP Vd.1D,Vn.D[0]` | `Vd.2S -> result` | `v7/A32/A64` | +| uint64x1_t vget_low_u64(uint64x2_t a) | `a -> Vn.2D` | `DUP Vd.1D,Vn.D[0]` | `Vd.1D -> result` | `v7/A32/A64` | +| poly64x1_t vget_low_p64(poly64x2_t a) | `a -> Vn.2D` | `DUP Vd.1D,Vn.D[0]` | `Vd.1D -> result` | `A32/A64` | +| float16x4_t vget_low_f16(float16x8_t a) | `a -> Vn.8H` | `DUP Vd.1D,Vn.D[0]` | `Vd.4H -> result` | `v7/A32/A64` | +| float32x2_t vget_low_f32(float32x4_t a) | `a -> Vn.4S` | `DUP Vd.1D,Vn.D[0]` | `Vd.2S -> result` | `v7/A32/A64` | +| poly8x8_t vget_low_p8(poly8x16_t a) | `a -> Vn.16B` | `DUP Vd.1D,Vn.D[0]` | `Vd.8B -> result` | `v7/A32/A64` | +| poly16x4_t vget_low_p16(poly16x8_t a) | `a -> Vn.8H` | `DUP Vd.1D,Vn.D[0]` | `Vd.4H -> result` | `v7/A32/A64` | +| float64x1_t vget_low_f64(float64x2_t a) | `a -> Vn.2D` | `DUP Vd.1D,Vn.D[0]` | `Vd.1D -> result` | `A64` | +| mfloat8x8_t vget_low_mf8(mfloat8x16_t a) | `a -> Vn.16B` | `DUP Vd.1D,Vn.D[0]` | `Vd.8B -> result` | `A64` | #### Extract one element from vector -| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures |
-|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------|-----------------------|----------------|---------------------------| -| int8_t vdupb_lane_s8(
     int8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Bd,Vn.B[lane]` | `Bd -> result` | `A64` | -| int16_t vduph_lane_s16(
     int16x4_t vec,
     const int lane)
| `vec -> Vn.4H`
`0 <= lane <= 3` | `DUP Hd,Vn.H[lane]` | `Hd -> result` | `A64` | -| int32_t vdups_lane_s32(
     int32x2_t vec,
     const int lane)
| `vec -> Vn.2S`
`0 <= lane <= 1` | `DUP Sd,Vn.S[lane]` | `Sd -> result` | `A64` | -| int64_t vdupd_lane_s64(
     int64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | -| uint8_t vdupb_lane_u8(
     uint8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Bd,Vn.B[lane]` | `Bd -> result` | `A64` | -| uint16_t vduph_lane_u16(
     uint16x4_t vec,
     const int lane)
| `vec -> Vn.4H`
`0 <= lane <= 3` | `DUP Hd,Vn.H[lane]` | `Hd -> result` | `A64` | -| uint32_t vdups_lane_u32(
     uint32x2_t vec,
     const int lane)
| `vec -> Vn.2S`
`0 <= lane <= 1` | `DUP Sd,Vn.S[lane]` | `Sd -> result` | `A64` | -| uint64_t vdupd_lane_u64(
     uint64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | -| float32_t vdups_lane_f32(
     float32x2_t vec,
     const int lane)
| `vec -> Vn.2S`
`0 <= lane <= 1` | `DUP Sd,Vn.S[lane]` | `Sd -> result` | `A64` | -| float64_t vdupd_lane_f64(
     float64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | -| poly8_t vdupb_lane_p8(
     poly8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Bd,Vn.B[lane]` | `Bd -> result` | `A64` | -| poly16_t vduph_lane_p16(
     poly16x4_t vec,
     const int lane)
| `vec -> Vn.4H`
`0 <= lane <= 3` | `DUP Hd,Vn.H[lane]` | `Hd -> result` | `A64` | -| int8_t vdupb_laneq_s8(
     int8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Bd,Vn.B[lane]` | `Bd -> result` | `A64` | -| int16_t vduph_laneq_s16(
     int16x8_t vec,
     const int lane)
| `vec -> Vn.8H`
`0 <= lane <= 7` | `DUP Hd,Vn.H[lane]` | `Hd -> result` | `A64` | -| int32_t vdups_laneq_s32(
     int32x4_t vec,
     const int lane)
| `vec -> Vn.4S`
`0 <= lane <= 3` | `DUP Sd,Vn.S[lane]` | `Sd -> result` | `A64` | -| int64_t vdupd_laneq_s64(
     int64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | -| uint8_t vdupb_laneq_u8(
     uint8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Bd,Vn.B[lane]` | `Bd -> result` | `A64` | -| uint16_t vduph_laneq_u16(
     uint16x8_t vec,
     const int lane)
| `vec -> Vn.8H`
`0 <= lane <= 7` | `DUP Hd,Vn.H[lane]` | `Hd -> result` | `A64` | -| uint32_t vdups_laneq_u32(
     uint32x4_t vec,
     const int lane)
| `vec -> Vn.4S`
`0 <= lane <= 3` | `DUP Sd,Vn.S[lane]` | `Sd -> result` | `A64` | -| uint64_t vdupd_laneq_u64(
     uint64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | -| float32_t vdups_laneq_f32(
     float32x4_t vec,
     const int lane)
| `vec -> Vn.4S`
`0 <= lane <= 3` | `DUP Sd,Vn.S[lane]` | `Sd -> result` | `A64` | -| float64_t vdupd_laneq_f64(
     float64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | -| poly8_t vdupb_laneq_p8(
     poly8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Bd,Vn.B[lane]` | `Bd -> result` | `A64` | -| poly16_t vduph_laneq_p16(
     poly16x8_t vec,
     const int lane)
| `vec -> Vn.8H`
`0 <= lane <= 7` | `DUP Hd,Vn.H[lane]` | `Hd -> result` | `A64` | -| uint8_t vget_lane_u8(
     uint8x8_t v,
     const int lane)
| `0<=lane<=7`
`v -> Vn.8B` | `UMOV Rd,Vn.B[lane]` | `Rd -> result` | `v7/A32/A64` | -| uint16_t vget_lane_u16(
     uint16x4_t v,
     const int lane)
| `0<=lane<=3`
`v -> Vn.4H` | `UMOV Rd,Vn.H[lane]` | `Rd -> result` | `v7/A32/A64` | -| uint32_t vget_lane_u32(
     uint32x2_t v,
     const int lane)
| `0<=lane<=1`
`v -> Vn.2S` | `UMOV Rd,Vn.S[lane]` | `Rd -> result` | `v7/A32/A64` | -| uint64_t vget_lane_u64(
     uint64x1_t v,
     const int lane)
| `lane==0`
`v -> Vn.1D` | `UMOV Rd,Vn.D[lane]` | `Rd -> result` | `v7/A32/A64` | -| poly64_t vget_lane_p64(
     poly64x1_t v,
     const int lane)
| `lane==0`
`v -> Vn.1D` | `UMOV Rd,Vn.D[lane]` | `Rd -> result` | `A32/A64` | -| int8_t vget_lane_s8(
     int8x8_t v,
     const int lane)
| `0<=lane<=7`
`v -> Vn.8B` | `SMOV Rd,Vn.B[lane]` | `Rd -> result` | `v7/A32/A64` | -| int16_t vget_lane_s16(
     int16x4_t v,
     const int lane)
| `0<=lane<=3`
`v -> Vn.4H` | `SMOV Rd,Vn.H[lane]` | `Rd -> result` | `v7/A32/A64` | -| int32_t vget_lane_s32(
     int32x2_t v,
     const int lane)
| `0<=lane<=1`
`v -> Vn.2S` | `SMOV Rd,Vn.S[lane]` | `Rd -> result` | `v7/A32/A64` | -| int64_t vget_lane_s64(
     int64x1_t v,
     const int lane)
| `lane==0`
`v -> Vn.1D` | `UMOV Rd,Vn.D[lane]` | `Rd -> result` | `v7/A32/A64` | -| poly8_t vget_lane_p8(
     poly8x8_t v,
     const int lane)
| `0<=lane<=7`
`v -> Vn.8B` | `UMOV Rd,Vn.B[lane]` | `Rd -> result` | `v7/A32/A64` | -| poly16_t vget_lane_p16(
     poly16x4_t v,
     const int lane)
| `0<=lane<=3`
`v -> Vn.4H` | `UMOV Rd,Vn.H[lane]` | `Rd -> result` | `v7/A32/A64` | -| float32_t vget_lane_f32(
     float32x2_t v,
     const int lane)
| `0<=lane<=1`
`v -> Vn.2S` | `DUP Sd,Vn.S[lane]` | `Sd -> result` | `v7/A32/A64` | -| float64_t vget_lane_f64(
     float64x1_t v,
     const int lane)
| `lane==0`
`v -> Vn.1D` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | -| uint8_t vgetq_lane_u8(
     uint8x16_t v,
     const int lane)
| `0<=lane<=15`
`v -> Vn.16B` | `UMOV Rd,Vn.B[lane]` | `Rd -> result` | `v7/A32/A64` | -| uint16_t vgetq_lane_u16(
     uint16x8_t v,
     const int lane)
| `0<=lane<=7`
`v -> Vn.8H` | `UMOV Rd,Vn.H[lane]` | `Rd -> result` | `v7/A32/A64` | -| uint32_t vgetq_lane_u32(
     uint32x4_t v,
     const int lane)
| `0<=lane<=3`
`v -> Vn.4S` | `UMOV Rd,Vn.S[lane]` | `Rd -> result` | `v7/A32/A64` | -| uint64_t vgetq_lane_u64(
     uint64x2_t v,
     const int lane)
| `0<=lane<=1`
`v -> Vn.2D` | `UMOV Rd,Vn.D[lane]` | `Rd -> result` | `v7/A32/A64` | -| poly64_t vgetq_lane_p64(
     poly64x2_t v,
     const int lane)
| `0<=lane<=1`
`v -> Vn.2D` | `UMOV Rd,Vn.D[lane]` | `Rd -> result` | `A32/A64` | -| int8_t vgetq_lane_s8(
     int8x16_t v,
     const int lane)
| `0<=lane<=15`
`v -> Vn.16B` | `SMOV Rd,Vn.B[lane]` | `Rd -> result` | `v7/A32/A64` | -| int16_t vgetq_lane_s16(
     int16x8_t v,
     const int lane)
| `0<=lane<=7`
`v -> Vn.8H` | `SMOV Rd,Vn.H[lane]` | `Rd -> result` | `v7/A32/A64` | -| int32_t vgetq_lane_s32(
     int32x4_t v,
     const int lane)
| `0<=lane<=3`
`v -> Vn.4S` | `SMOV Rd,Vn.S[lane]` | `Rd -> result` | `v7/A32/A64` | -| int64_t vgetq_lane_s64(
     int64x2_t v,
     const int lane)
| `0<=lane<=1`
`v -> Vn.2D` | `UMOV Rd,Vn.D[lane]` | `Rd -> result` | `v7/A32/A64` | -| poly8_t vgetq_lane_p8(
     poly8x16_t v,
     const int lane)
| `0<=lane<=15`
`v -> Vn.16B` | `UMOV Rd,Vn.B[lane]` | `Rd -> result` | `v7/A32/A64` | -| poly16_t vgetq_lane_p16(
     poly16x8_t v,
     const int lane)
| `0<=lane<=7`
`v -> Vn.8H` | `UMOV Rd,Vn.H[lane]` | `Rd -> result` | `v7/A32/A64` | -| float16_t vget_lane_f16(
     float16x4_t v,
     const int lane)
| `0<=lane<=3`
`v -> Vn.4H` | `DUP Hd,Vn.H[lane]` | `Hd -> result` | `v7/A32/A64` | -| float16_t vgetq_lane_f16(
     float16x8_t v,
     const int lane)
| `0<=lane<=7`
`v -> Vn.8H` | `DUP Hd,Vn.H[lane]` | `Hd -> result` | `v7/A32/A64` | -| float32_t vgetq_lane_f32(
     float32x4_t v,
     const int lane)
| `0<=lane<=3`
`v -> Vn.4S` | `DUP Sd,Vn.S[lane]` | `Sd -> result` | `v7/A32/A64` | -| float64_t vgetq_lane_f64(
     float64x2_t v,
     const int lane)
| `0<=lane<=1`
`v -> Vn.2D` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------|-----------------------|----------------|---------------------------| +| int8_t vdupb_lane_s8(
     int8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Bd,Vn.B[lane]` | `Bd -> result` | `A64` | +| int16_t vduph_lane_s16(
     int16x4_t vec,
     const int lane)
| `vec -> Vn.4H`
`0 <= lane <= 3` | `DUP Hd,Vn.H[lane]` | `Hd -> result` | `A64` | +| int32_t vdups_lane_s32(
     int32x2_t vec,
     const int lane)
| `vec -> Vn.2S`
`0 <= lane <= 1` | `DUP Sd,Vn.S[lane]` | `Sd -> result` | `A64` | +| int64_t vdupd_lane_s64(
     int64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | +| uint8_t vdupb_lane_u8(
     uint8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Bd,Vn.B[lane]` | `Bd -> result` | `A64` | +| uint16_t vduph_lane_u16(
     uint16x4_t vec,
     const int lane)
| `vec -> Vn.4H`
`0 <= lane <= 3` | `DUP Hd,Vn.H[lane]` | `Hd -> result` | `A64` | +| uint32_t vdups_lane_u32(
     uint32x2_t vec,
     const int lane)
| `vec -> Vn.2S`
`0 <= lane <= 1` | `DUP Sd,Vn.S[lane]` | `Sd -> result` | `A64` | +| uint64_t vdupd_lane_u64(
     uint64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | +| float32_t vdups_lane_f32(
     float32x2_t vec,
     const int lane)
| `vec -> Vn.2S`
`0 <= lane <= 1` | `DUP Sd,Vn.S[lane]` | `Sd -> result` | `A64` | +| float64_t vdupd_lane_f64(
     float64x1_t vec,
     const int lane)
| `vec -> Vn.1D`
`0 <= lane <= 0` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | +| poly8_t vdupb_lane_p8(
     poly8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Bd,Vn.B[lane]` | `Bd -> result` | `A64` | +| poly16_t vduph_lane_p16(
     poly16x4_t vec,
     const int lane)
| `vec -> Vn.4H`
`0 <= lane <= 3` | `DUP Hd,Vn.H[lane]` | `Hd -> result` | `A64` | +| mfloat8_t vdupb_lane_mf8(
     mfloat8x8_t vec,
     const int lane)
| `vec -> Vn.8B`
`0 <= lane <= 7` | `DUP Bd,Vn.B[lane]` | `Bd -> result` | `A64` | +| int8_t vdupb_laneq_s8(
     int8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Bd,Vn.B[lane]` | `Bd -> result` | `A64` | +| int16_t vduph_laneq_s16(
     int16x8_t vec,
     const int lane)
| `vec -> Vn.8H`
`0 <= lane <= 7` | `DUP Hd,Vn.H[lane]` | `Hd -> result` | `A64` | +| int32_t vdups_laneq_s32(
     int32x4_t vec,
     const int lane)
| `vec -> Vn.4S`
`0 <= lane <= 3` | `DUP Sd,Vn.S[lane]` | `Sd -> result` | `A64` | +| int64_t vdupd_laneq_s64(
     int64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | +| uint8_t vdupb_laneq_u8(
     uint8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Bd,Vn.B[lane]` | `Bd -> result` | `A64` | +| uint16_t vduph_laneq_u16(
     uint16x8_t vec,
     const int lane)
| `vec -> Vn.8H`
`0 <= lane <= 7` | `DUP Hd,Vn.H[lane]` | `Hd -> result` | `A64` | +| uint32_t vdups_laneq_u32(
     uint32x4_t vec,
     const int lane)
| `vec -> Vn.4S`
`0 <= lane <= 3` | `DUP Sd,Vn.S[lane]` | `Sd -> result` | `A64` | +| uint64_t vdupd_laneq_u64(
     uint64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | +| float32_t vdups_laneq_f32(
     float32x4_t vec,
     const int lane)
| `vec -> Vn.4S`
`0 <= lane <= 3` | `DUP Sd,Vn.S[lane]` | `Sd -> result` | `A64` | +| float64_t vdupd_laneq_f64(
     float64x2_t vec,
     const int lane)
| `vec -> Vn.2D`
`0 <= lane <= 1` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | +| poly8_t vdupb_laneq_p8(
     poly8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Bd,Vn.B[lane]` | `Bd -> result` | `A64` | +| poly16_t vduph_laneq_p16(
     poly16x8_t vec,
     const int lane)
| `vec -> Vn.8H`
`0 <= lane <= 7` | `DUP Hd,Vn.H[lane]` | `Hd -> result` | `A64` | +| mfloat8_t vdupb_laneq_mf8(
     mfloat8x16_t vec,
     const int lane)
| `vec -> Vn.16B`
`0 <= lane <= 15` | `DUP Bd,Vn.B[lane]` | `Bd -> result` | `A64` | +| uint8_t vget_lane_u8(
     uint8x8_t v,
     const int lane)
| `0<=lane<=7`
`v -> Vn.8B` | `UMOV Rd,Vn.B[lane]` | `Rd -> result` | `v7/A32/A64` | +| uint16_t vget_lane_u16(
     uint16x4_t v,
     const int lane)
| `0<=lane<=3`
`v -> Vn.4H` | `UMOV Rd,Vn.H[lane]` | `Rd -> result` | `v7/A32/A64` | +| uint32_t vget_lane_u32(
     uint32x2_t v,
     const int lane)
| `0<=lane<=1`
`v -> Vn.2S` | `UMOV Rd,Vn.S[lane]` | `Rd -> result` | `v7/A32/A64` | +| uint64_t vget_lane_u64(
     uint64x1_t v,
     const int lane)
| `lane==0`
`v -> Vn.1D` | `UMOV Rd,Vn.D[lane]` | `Rd -> result` | `v7/A32/A64` | +| poly64_t vget_lane_p64(
     poly64x1_t v,
     const int lane)
| `lane==0`
`v -> Vn.1D` | `UMOV Rd,Vn.D[lane]` | `Rd -> result` | `A32/A64` | +| int8_t vget_lane_s8(
     int8x8_t v,
     const int lane)
| `0<=lane<=7`
`v -> Vn.8B` | `SMOV Rd,Vn.B[lane]` | `Rd -> result` | `v7/A32/A64` | +| int16_t vget_lane_s16(
     int16x4_t v,
     const int lane)
| `0<=lane<=3`
`v -> Vn.4H` | `SMOV Rd,Vn.H[lane]` | `Rd -> result` | `v7/A32/A64` | +| int32_t vget_lane_s32(
     int32x2_t v,
     const int lane)
| `0<=lane<=1`
`v -> Vn.2S` | `SMOV Rd,Vn.S[lane]` | `Rd -> result` | `v7/A32/A64` | +| int64_t vget_lane_s64(
     int64x1_t v,
     const int lane)
| `lane==0`
`v -> Vn.1D` | `UMOV Rd,Vn.D[lane]` | `Rd -> result` | `v7/A32/A64` | +| poly8_t vget_lane_p8(
     poly8x8_t v,
     const int lane)
| `0<=lane<=7`
`v -> Vn.8B` | `UMOV Rd,Vn.B[lane]` | `Rd -> result` | `v7/A32/A64` | +| poly16_t vget_lane_p16(
     poly16x4_t v,
     const int lane)
| `0<=lane<=3`
`v -> Vn.4H` | `UMOV Rd,Vn.H[lane]` | `Rd -> result` | `v7/A32/A64` | +| float32_t vget_lane_f32(
     float32x2_t v,
     const int lane)
| `0<=lane<=1`
`v -> Vn.2S` | `DUP Sd,Vn.S[lane]` | `Sd -> result` | `v7/A32/A64` | +| float64_t vget_lane_f64(
     float64x1_t v,
     const int lane)
| `lane==0`
`v -> Vn.1D` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | +| uint8_t vgetq_lane_u8(
     uint8x16_t v,
     const int lane)
| `0<=lane<=15`
`v -> Vn.16B` | `UMOV Rd,Vn.B[lane]` | `Rd -> result` | `v7/A32/A64` | +| uint16_t vgetq_lane_u16(
     uint16x8_t v,
     const int lane)
| `0<=lane<=7`
`v -> Vn.8H` | `UMOV Rd,Vn.H[lane]` | `Rd -> result` | `v7/A32/A64` | +| uint32_t vgetq_lane_u32(
     uint32x4_t v,
     const int lane)
| `0<=lane<=3`
`v -> Vn.4S` | `UMOV Rd,Vn.S[lane]` | `Rd -> result` | `v7/A32/A64` | +| uint64_t vgetq_lane_u64(
     uint64x2_t v,
     const int lane)
| `0<=lane<=1`
`v -> Vn.2D` | `UMOV Rd,Vn.D[lane]` | `Rd -> result` | `v7/A32/A64` | +| poly64_t vgetq_lane_p64(
     poly64x2_t v,
     const int lane)
| `0<=lane<=1`
`v -> Vn.2D` | `UMOV Rd,Vn.D[lane]` | `Rd -> result` | `A32/A64` | +| int8_t vgetq_lane_s8(
     int8x16_t v,
     const int lane)
| `0<=lane<=15`
`v -> Vn.16B` | `SMOV Rd,Vn.B[lane]` | `Rd -> result` | `v7/A32/A64` | +| int16_t vgetq_lane_s16(
     int16x8_t v,
     const int lane)
| `0<=lane<=7`
`v -> Vn.8H` | `SMOV Rd,Vn.H[lane]` | `Rd -> result` | `v7/A32/A64` | +| int32_t vgetq_lane_s32(
     int32x4_t v,
     const int lane)
| `0<=lane<=3`
`v -> Vn.4S` | `SMOV Rd,Vn.S[lane]` | `Rd -> result` | `v7/A32/A64` | +| int64_t vgetq_lane_s64(
     int64x2_t v,
     const int lane)
| `0<=lane<=1`
`v -> Vn.2D` | `UMOV Rd,Vn.D[lane]` | `Rd -> result` | `v7/A32/A64` | +| poly8_t vgetq_lane_p8(
     poly8x16_t v,
     const int lane)
| `0<=lane<=15`
`v -> Vn.16B` | `UMOV Rd,Vn.B[lane]` | `Rd -> result` | `v7/A32/A64` | +| poly16_t vgetq_lane_p16(
     poly16x8_t v,
     const int lane)
| `0<=lane<=7`
`v -> Vn.8H` | `UMOV Rd,Vn.H[lane]` | `Rd -> result` | `v7/A32/A64` | +| float16_t vget_lane_f16(
     float16x4_t v,
     const int lane)
| `0<=lane<=3`
`v -> Vn.4H` | `DUP Hd,Vn.H[lane]` | `Hd -> result` | `v7/A32/A64` | +| float16_t vgetq_lane_f16(
     float16x8_t v,
     const int lane)
| `0<=lane<=7`
`v -> Vn.8H` | `DUP Hd,Vn.H[lane]` | `Hd -> result` | `v7/A32/A64` | +| float32_t vgetq_lane_f32(
     float32x4_t v,
     const int lane)
| `0<=lane<=3`
`v -> Vn.4S` | `DUP Sd,Vn.S[lane]` | `Sd -> result` | `v7/A32/A64` | +| float64_t vgetq_lane_f64(
     float64x2_t v,
     const int lane)
| `0<=lane<=1`
`v -> Vn.2D` | `DUP Dd,Vn.D[lane]` | `Dd -> result` | `A64` | #### Extract vector from a pair of vectors -| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | -|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------|------------------------------------|--------------------|---------------------------| -| int8x8_t vext_s8(
     int8x8_t a,
     int8x8_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 7` | `EXT Vd.8B,Vn.8B,Vm.8B,#n` | `Vd.8B -> result` | `v7/A32/A64` | -| int8x16_t vextq_s8(
     int8x16_t a,
     int8x16_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 15` | `EXT Vd.16B,Vn.16B,Vm.16B,#n` | `Vd.16B -> result` | `v7/A32/A64` | -| int16x4_t vext_s16(
     int16x4_t a,
     int16x4_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 3` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<1)` | `Vd.8B -> result` | `v7/A32/A64` | -| int16x8_t vextq_s16(
     int16x8_t a,
     int16x8_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 7` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<1)` | `Vd.16B -> result` | `v7/A32/A64` | -| int32x2_t vext_s32(
     int32x2_t a,
     int32x2_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 1` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<2)` | `Vd.8B -> result` | `v7/A32/A64` | -| int32x4_t vextq_s32(
     int32x4_t a,
     int32x4_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 3` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<2)` | `Vd.16B -> result` | `v7/A32/A64` | -| int64x1_t vext_s64(
     int64x1_t a,
     int64x1_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`n == 0` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<3)` | `Vd.8B -> result` | `v7/A32/A64` | -| int64x2_t vextq_s64(
     int64x2_t a,
     int64x2_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 1` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<3)` | `Vd.16B -> result` | `v7/A32/A64` | -| uint8x8_t vext_u8(
     uint8x8_t a,
     uint8x8_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 7` | `EXT Vd.8B,Vn.8B,Vm.8B,#n` | `Vd.8B -> result` | `v7/A32/A64` | -| uint8x16_t vextq_u8(
     uint8x16_t a,
     uint8x16_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 15` | `EXT Vd.16B,Vn.16B,Vm.16B,#n` | `Vd.16B -> result` | `v7/A32/A64` | -| uint16x4_t vext_u16(
     uint16x4_t a,
     uint16x4_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 3` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<1)` | `Vd.8B -> result` | `v7/A32/A64` | -| uint16x8_t vextq_u16(
     uint16x8_t a,
     uint16x8_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 7` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<1)` | `Vd.16B -> result` | `v7/A32/A64` | -| uint32x2_t vext_u32(
     uint32x2_t a,
     uint32x2_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 1` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<2)` | `Vd.8B -> result` | `v7/A32/A64` | -| uint32x4_t vextq_u32(
     uint32x4_t a,
     uint32x4_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 3` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<2)` | `Vd.16B -> result` | `v7/A32/A64` | -| uint64x1_t vext_u64(
     uint64x1_t a,
     uint64x1_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`n == 0` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<3)` | `Vd.8B -> result` | `v7/A32/A64` | -| uint64x2_t vextq_u64(
     uint64x2_t a,
     uint64x2_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 1` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<3)` | `Vd.16B -> result` | `v7/A32/A64` | -| poly64x1_t vext_p64(
     poly64x1_t a,
     poly64x1_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`n == 0` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<3)` | `Vd.8B -> result` | `A32/A64` | -| poly64x2_t vextq_p64(
     poly64x2_t a,
     poly64x2_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 1` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<3)` | `Vd.16B -> result` | `A32/A64` | -| float32x2_t vext_f32(
     float32x2_t a,
     float32x2_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 1` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<2)` | `Vd.8B -> result` | `v7/A32/A64` | -| float32x4_t vextq_f32(
     float32x4_t a,
     float32x4_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 3` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<2)` | `Vd.16B -> result` | `v7/A32/A64` | -| float64x1_t vext_f64(
     float64x1_t a,
     float64x1_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`n == 0` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<3)` | `Vd.8B -> result` | `A64` | -| float64x2_t vextq_f64(
     float64x2_t a,
     float64x2_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 1` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<3)` | `Vd.16B -> result` | `A64` | -| poly8x8_t vext_p8(
     poly8x8_t a,
     poly8x8_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 7` | `EXT Vd.8B,Vn.8B,Vm.8B,#n` | `Vd.8B -> result` | `v7/A32/A64` | -| poly8x16_t vextq_p8(
     poly8x16_t a,
     poly8x16_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 15` | `EXT Vd.16B,Vn.16B,Vm.16B,#n` | `Vd.16B -> result` | `v7/A32/A64` | -| poly16x4_t vext_p16(
     poly16x4_t a,
     poly16x4_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 3` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<1)` | `Vd.8B -> result` | `v7/A32/A64` | -| poly16x8_t vextq_p16(
     poly16x8_t a,
     poly16x8_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 7` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<1)` | `Vd.16B -> result` | `v7/A32/A64` | +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------|------------------------------------|--------------------|---------------------------| +| int8x8_t vext_s8(
     int8x8_t a,
     int8x8_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 7` | `EXT Vd.8B,Vn.8B,Vm.8B,#n` | `Vd.8B -> result` | `v7/A32/A64` | +| int8x16_t vextq_s8(
     int8x16_t a,
     int8x16_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 15` | `EXT Vd.16B,Vn.16B,Vm.16B,#n` | `Vd.16B -> result` | `v7/A32/A64` | +| int16x4_t vext_s16(
     int16x4_t a,
     int16x4_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 3` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<1)` | `Vd.8B -> result` | `v7/A32/A64` | +| int16x8_t vextq_s16(
     int16x8_t a,
     int16x8_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 7` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<1)` | `Vd.16B -> result` | `v7/A32/A64` | +| int32x2_t vext_s32(
     int32x2_t a,
     int32x2_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 1` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<2)` | `Vd.8B -> result` | `v7/A32/A64` | +| int32x4_t vextq_s32(
     int32x4_t a,
     int32x4_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 3` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<2)` | `Vd.16B -> result` | `v7/A32/A64` | +| int64x1_t vext_s64(
     int64x1_t a,
     int64x1_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`n == 0` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<3)` | `Vd.8B -> result` | `v7/A32/A64` | +| int64x2_t vextq_s64(
     int64x2_t a,
     int64x2_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 1` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<3)` | `Vd.16B -> result` | `v7/A32/A64` | +| uint8x8_t vext_u8(
     uint8x8_t a,
     uint8x8_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 7` | `EXT Vd.8B,Vn.8B,Vm.8B,#n` | `Vd.8B -> result` | `v7/A32/A64` | +| uint8x16_t vextq_u8(
     uint8x16_t a,
     uint8x16_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 15` | `EXT Vd.16B,Vn.16B,Vm.16B,#n` | `Vd.16B -> result` | `v7/A32/A64` | +| uint16x4_t vext_u16(
     uint16x4_t a,
     uint16x4_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 3` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<1)` | `Vd.8B -> result` | `v7/A32/A64` | +| uint16x8_t vextq_u16(
     uint16x8_t a,
     uint16x8_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 7` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<1)` | `Vd.16B -> result` | `v7/A32/A64` | +| uint32x2_t vext_u32(
     uint32x2_t a,
     uint32x2_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 1` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<2)` | `Vd.8B -> result` | `v7/A32/A64` | +| uint32x4_t vextq_u32(
     uint32x4_t a,
     uint32x4_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 3` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<2)` | `Vd.16B -> result` | `v7/A32/A64` | +| uint64x1_t vext_u64(
     uint64x1_t a,
     uint64x1_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`n == 0` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<3)` | `Vd.8B -> result` | `v7/A32/A64` | +| uint64x2_t vextq_u64(
     uint64x2_t a,
     uint64x2_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 1` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<3)` | `Vd.16B -> result` | `v7/A32/A64` | +| poly64x1_t vext_p64(
     poly64x1_t a,
     poly64x1_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`n == 0` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<3)` | `Vd.8B -> result` | `A32/A64` | +| poly64x2_t vextq_p64(
     poly64x2_t a,
     poly64x2_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 1` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<3)` | `Vd.16B -> result` | `A32/A64` | +| float32x2_t vext_f32(
     float32x2_t a,
     float32x2_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 1` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<2)` | `Vd.8B -> result` | `v7/A32/A64` | +| float32x4_t vextq_f32(
     float32x4_t a,
     float32x4_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 3` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<2)` | `Vd.16B -> result` | `v7/A32/A64` | +| float64x1_t vext_f64(
     float64x1_t a,
     float64x1_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`n == 0` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<3)` | `Vd.8B -> result` | `A64` | +| float64x2_t vextq_f64(
     float64x2_t a,
     float64x2_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 1` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<3)` | `Vd.16B -> result` | `A64` | +| poly8x8_t vext_p8(
     poly8x8_t a,
     poly8x8_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 7` | `EXT Vd.8B,Vn.8B,Vm.8B,#n` | `Vd.8B -> result` | `v7/A32/A64` | +| poly8x16_t vextq_p8(
     poly8x16_t a,
     poly8x16_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 15` | `EXT Vd.16B,Vn.16B,Vm.16B,#n` | `Vd.16B -> result` | `v7/A32/A64` | +| poly16x4_t vext_p16(
     poly16x4_t a,
     poly16x4_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 3` | `EXT Vd.8B,Vn.8B,Vm.8B,#(n<<1)` | `Vd.8B -> result` | `v7/A32/A64` | +| poly16x8_t vextq_p16(
     poly16x8_t a,
     poly16x8_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 7` | `EXT Vd.16B,Vn.16B,Vm.16B,#(n<<1)` | `Vd.16B -> result` | `v7/A32/A64` | +| mfloat8x8_t vext_mf8(
     mfloat8x8_t a,
     mfloat8x8_t b,
     const int n)
| `a -> Vn.8B`
`b -> Vm.8B`
`0 <= n <= 7` | `EXT Vd.8B,Vn.8B,Vm.8B,#n` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vextq_mf8(
     mfloat8x16_t a,
     mfloat8x16_t b,
     const int n)
| `a -> Vn.16B`
`b -> Vm.16B`
`0 <= n <= 15` | `EXT Vd.16B,Vn.16B,Vm.16B,#n` | `Vd.16B -> result` | `A64` | #### Reverse elements -| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | -|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------|-----------------------|--------------------|---------------------------| -| int8x8_t vrev64_s8(int8x8_t vec) | `vec -> Vn.8B` | `REV64 Vd.8B,Vn.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| int8x16_t vrev64q_s8(int8x16_t vec) | `vec -> Vn.16B` | `REV64 Vd.16B,Vn.16B` | `Vd.16B -> result` | `v7/A32/A64` | -| int16x4_t vrev64_s16(int16x4_t vec) | `vec -> Vn.4H` | `REV64 Vd.4H,Vn.4H` | `Vd.4H -> result` | `v7/A32/A64` | -| int16x8_t vrev64q_s16(int16x8_t vec) | `vec -> Vn.8H` | `REV64 Vd.8H,Vn.8H` | `Vd.8H -> result` | `v7/A32/A64` | -| int32x2_t vrev64_s32(int32x2_t vec) | `vec -> Vn.2S` | `REV64 Vd.2S,Vn.2S` | `Vd.2S -> result` | `v7/A32/A64` | -| int32x4_t vrev64q_s32(int32x4_t vec) | `vec -> Vn.4S` | `REV64 Vd.4S,Vn.4S` | `Vd.4S -> result` | `v7/A32/A64` | -| uint8x8_t vrev64_u8(uint8x8_t vec) | `vec -> Vn.8B` | `REV64 Vd.8B,Vn.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| uint8x16_t vrev64q_u8(uint8x16_t vec) | `vec -> Vn.16B` | `REV64 Vd.16B,Vn.16B` | `Vd.16B -> result` | `v7/A32/A64` | -| uint16x4_t vrev64_u16(uint16x4_t vec) | `vec -> Vn.4H` | `REV64 Vd.4H,Vn.4H` | `Vd.4H -> result` | `v7/A32/A64` | -| uint16x8_t vrev64q_u16(uint16x8_t vec) | `vec -> Vn.8H` | `REV64 Vd.8H,Vn.8H` | `Vd.8H -> result` | `v7/A32/A64` | -| uint32x2_t vrev64_u32(uint32x2_t vec) | `vec -> Vn.2S` | `REV64 Vd.2S,Vn.2S` | `Vd.2S -> result` | `v7/A32/A64` | -| uint32x4_t vrev64q_u32(uint32x4_t vec) | `vec -> Vn.4S` | `REV64 Vd.4S,Vn.4S` | `Vd.4S -> result` | `v7/A32/A64` | -| float32x2_t vrev64_f32(float32x2_t vec) | `vec -> Vn.2S` | `REV64 Vd.2S,Vn.2S` | `Vd.2S -> result` | `v7/A32/A64` | -| float32x4_t 
vrev64q_f32(float32x4_t vec) | `vec -> Vn.4S` | `REV64 Vd.4S,Vn.4S` | `Vd.4S -> result` | `v7/A32/A64` | -| poly8x8_t vrev64_p8(poly8x8_t vec) | `vec -> Vn.8B` | `REV64 Vd.8B,Vn.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| poly8x16_t vrev64q_p8(poly8x16_t vec) | `vec -> Vn.16B` | `REV64 Vd.16B,Vn.16B` | `Vd.16B -> result` | `v7/A32/A64` | -| poly16x4_t vrev64_p16(poly16x4_t vec) | `vec -> Vn.4H` | `REV64 Vd.4H,Vn.4H` | `Vd.4H -> result` | `v7/A32/A64` | -| poly16x8_t vrev64q_p16(poly16x8_t vec) | `vec -> Vn.8H` | `REV64 Vd.8H,Vn.8H` | `Vd.8H -> result` | `v7/A32/A64` | -| int8x8_t vrev32_s8(int8x8_t vec) | `vec -> Vn.8B` | `REV32 Vd.8B,Vn.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| int8x16_t vrev32q_s8(int8x16_t vec) | `vec -> Vn.16B` | `REV32 Vd.16B,Vn.16B` | `Vd.16B -> result` | `v7/A32/A64` | -| int16x4_t vrev32_s16(int16x4_t vec) | `vec -> Vn.4H` | `REV32 Vd.4H,Vn.4H` | `Vd.4H -> result` | `v7/A32/A64` | -| int16x8_t vrev32q_s16(int16x8_t vec) | `vec -> Vn.8H` | `REV32 Vd.8H,Vn.8H` | `Vd.8H -> result` | `v7/A32/A64` | -| uint8x8_t vrev32_u8(uint8x8_t vec) | `vec -> Vn.8B` | `REV32 Vd.8B,Vn.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| uint8x16_t vrev32q_u8(uint8x16_t vec) | `vec -> Vn.16B` | `REV32 Vd.16B,Vn.16B` | `Vd.16B -> result` | `v7/A32/A64` | -| uint16x4_t vrev32_u16(uint16x4_t vec) | `vec -> Vn.4H` | `REV32 Vd.4H,Vn.4H` | `Vd.4H -> result` | `v7/A32/A64` | -| uint16x8_t vrev32q_u16(uint16x8_t vec) | `vec -> Vn.8H` | `REV32 Vd.8H,Vn.8H` | `Vd.8H -> result` | `v7/A32/A64` | -| poly8x8_t vrev32_p8(poly8x8_t vec) | `vec -> Vn.8B` | `REV32 Vd.8B,Vn.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| poly8x16_t vrev32q_p8(poly8x16_t vec) | `vec -> Vn.16B` | `REV32 Vd.16B,Vn.16B` | `Vd.16B -> result` | `v7/A32/A64` | -| poly16x4_t vrev32_p16(poly16x4_t vec) | `vec -> Vn.4H` | `REV32 Vd.4H,Vn.4H` | `Vd.4H -> result` | `v7/A32/A64` | -| poly16x8_t vrev32q_p16(poly16x8_t vec) | `vec -> Vn.8H` | `REV32 Vd.8H,Vn.8H` | `Vd.8H -> result` | `v7/A32/A64` | -| int8x8_t 
vrev16_s8(int8x8_t vec) | `vec -> Vn.8B` | `REV16 Vd.8B,Vn.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| int8x16_t vrev16q_s8(int8x16_t vec) | `vec -> Vn.16B` | `REV16 Vd.16B,Vn.16B` | `Vd.16B -> result` | `v7/A32/A64` | -| uint8x8_t vrev16_u8(uint8x8_t vec) | `vec -> Vn.8B` | `REV16 Vd.8B,Vn.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| uint8x16_t vrev16q_u8(uint8x16_t vec) | `vec -> Vn.16B` | `REV16 Vd.16B,Vn.16B` | `Vd.16B -> result` | `v7/A32/A64` | -| poly8x8_t vrev16_p8(poly8x8_t vec) | `vec -> Vn.8B` | `REV16 Vd.8B,Vn.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| poly8x16_t vrev16q_p8(poly8x16_t vec) | `vec -> Vn.16B` | `REV16 Vd.16B,Vn.16B` | `Vd.16B -> result` | `v7/A32/A64` | +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------|-----------------------|--------------------|---------------------------| +| int8x8_t vrev64_s8(int8x8_t vec) | `vec -> Vn.8B` | `REV64 Vd.8B,Vn.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| int8x16_t vrev64q_s8(int8x16_t vec) | `vec -> Vn.16B` | `REV64 Vd.16B,Vn.16B` | `Vd.16B -> result` | `v7/A32/A64` | +| int16x4_t vrev64_s16(int16x4_t vec) | `vec -> Vn.4H` | `REV64 Vd.4H,Vn.4H` | `Vd.4H -> result` | `v7/A32/A64` | +| int16x8_t vrev64q_s16(int16x8_t vec) | `vec -> Vn.8H` | `REV64 Vd.8H,Vn.8H` | `Vd.8H -> result` | `v7/A32/A64` | +| int32x2_t vrev64_s32(int32x2_t vec) | `vec -> Vn.2S` | `REV64 Vd.2S,Vn.2S` | `Vd.2S -> result` | `v7/A32/A64` | +| int32x4_t vrev64q_s32(int32x4_t vec) | `vec -> Vn.4S` | `REV64 Vd.4S,Vn.4S` | `Vd.4S -> result` | `v7/A32/A64` | +| uint8x8_t vrev64_u8(uint8x8_t vec) | `vec -> Vn.8B` | `REV64 Vd.8B,Vn.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| uint8x16_t vrev64q_u8(uint8x16_t vec) | `vec -> Vn.16B` | `REV64 Vd.16B,Vn.16B` | `Vd.16B -> result` | `v7/A32/A64` | +| uint16x4_t 
vrev64_u16(uint16x4_t vec) | `vec -> Vn.4H` | `REV64 Vd.4H,Vn.4H` | `Vd.4H -> result` | `v7/A32/A64` | +| uint16x8_t vrev64q_u16(uint16x8_t vec) | `vec -> Vn.8H` | `REV64 Vd.8H,Vn.8H` | `Vd.8H -> result` | `v7/A32/A64` | +| uint32x2_t vrev64_u32(uint32x2_t vec) | `vec -> Vn.2S` | `REV64 Vd.2S,Vn.2S` | `Vd.2S -> result` | `v7/A32/A64` | +| uint32x4_t vrev64q_u32(uint32x4_t vec) | `vec -> Vn.4S` | `REV64 Vd.4S,Vn.4S` | `Vd.4S -> result` | `v7/A32/A64` | +| float32x2_t vrev64_f32(float32x2_t vec) | `vec -> Vn.2S` | `REV64 Vd.2S,Vn.2S` | `Vd.2S -> result` | `v7/A32/A64` | +| float32x4_t vrev64q_f32(float32x4_t vec) | `vec -> Vn.4S` | `REV64 Vd.4S,Vn.4S` | `Vd.4S -> result` | `v7/A32/A64` | +| poly8x8_t vrev64_p8(poly8x8_t vec) | `vec -> Vn.8B` | `REV64 Vd.8B,Vn.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| poly8x16_t vrev64q_p8(poly8x16_t vec) | `vec -> Vn.16B` | `REV64 Vd.16B,Vn.16B` | `Vd.16B -> result` | `v7/A32/A64` | +| poly16x4_t vrev64_p16(poly16x4_t vec) | `vec -> Vn.4H` | `REV64 Vd.4H,Vn.4H` | `Vd.4H -> result` | `v7/A32/A64` | +| poly16x8_t vrev64q_p16(poly16x8_t vec) | `vec -> Vn.8H` | `REV64 Vd.8H,Vn.8H` | `Vd.8H -> result` | `v7/A32/A64` | +| mfloat8x8_t vrev64_mf8(mfloat8x8_t vec) | `vec -> Vn.8B` | `REV64 Vd.8B,Vn.8B` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vrev64q_mf8(mfloat8x16_t vec) | `vec -> Vn.16B` | `REV64 Vd.16B,Vn.16B` | `Vd.16B -> result` | `A64` | +| int8x8_t vrev32_s8(int8x8_t vec) | `vec -> Vn.8B` | `REV32 Vd.8B,Vn.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| int8x16_t vrev32q_s8(int8x16_t vec) | `vec -> Vn.16B` | `REV32 Vd.16B,Vn.16B` | `Vd.16B -> result` | `v7/A32/A64` | +| int16x4_t vrev32_s16(int16x4_t vec) | `vec -> Vn.4H` | `REV32 Vd.4H,Vn.4H` | `Vd.4H -> result` | `v7/A32/A64` | +| int16x8_t vrev32q_s16(int16x8_t vec) | `vec -> Vn.8H` | `REV32 Vd.8H,Vn.8H` | `Vd.8H -> result` | `v7/A32/A64` | +| uint8x8_t vrev32_u8(uint8x8_t vec) | `vec -> Vn.8B` | `REV32 Vd.8B,Vn.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| uint8x16_t 
vrev32q_u8(uint8x16_t vec) | `vec -> Vn.16B` | `REV32 Vd.16B,Vn.16B` | `Vd.16B -> result` | `v7/A32/A64` | +| uint16x4_t vrev32_u16(uint16x4_t vec) | `vec -> Vn.4H` | `REV32 Vd.4H,Vn.4H` | `Vd.4H -> result` | `v7/A32/A64` | +| uint16x8_t vrev32q_u16(uint16x8_t vec) | `vec -> Vn.8H` | `REV32 Vd.8H,Vn.8H` | `Vd.8H -> result` | `v7/A32/A64` | +| poly8x8_t vrev32_p8(poly8x8_t vec) | `vec -> Vn.8B` | `REV32 Vd.8B,Vn.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| poly8x16_t vrev32q_p8(poly8x16_t vec) | `vec -> Vn.16B` | `REV32 Vd.16B,Vn.16B` | `Vd.16B -> result` | `v7/A32/A64` | +| poly16x4_t vrev32_p16(poly16x4_t vec) | `vec -> Vn.4H` | `REV32 Vd.4H,Vn.4H` | `Vd.4H -> result` | `v7/A32/A64` | +| poly16x8_t vrev32q_p16(poly16x8_t vec) | `vec -> Vn.8H` | `REV32 Vd.8H,Vn.8H` | `Vd.8H -> result` | `v7/A32/A64` | +| mfloat8x8_t vrev32_mf8(mfloat8x8_t vec) | `vec -> Vn.8B` | `REV32 Vd.8B,Vn.8B` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vrev32q_mf8(mfloat8x16_t vec) | `vec -> Vn.16B` | `REV32 Vd.16B,Vn.16B` | `Vd.16B -> result` | `A64` | +| int8x8_t vrev16_s8(int8x8_t vec) | `vec -> Vn.8B` | `REV16 Vd.8B,Vn.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| int8x16_t vrev16q_s8(int8x16_t vec) | `vec -> Vn.16B` | `REV16 Vd.16B,Vn.16B` | `Vd.16B -> result` | `v7/A32/A64` | +| uint8x8_t vrev16_u8(uint8x8_t vec) | `vec -> Vn.8B` | `REV16 Vd.8B,Vn.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| uint8x16_t vrev16q_u8(uint8x16_t vec) | `vec -> Vn.16B` | `REV16 Vd.16B,Vn.16B` | `Vd.16B -> result` | `v7/A32/A64` | +| poly8x8_t vrev16_p8(poly8x8_t vec) | `vec -> Vn.8B` | `REV16 Vd.8B,Vn.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| poly8x16_t vrev16q_p8(poly8x16_t vec) | `vec -> Vn.16B` | `REV16 Vd.16B,Vn.16B` | `Vd.16B -> result` | `v7/A32/A64` | +| mfloat8x8_t vrev16_mf8(mfloat8x8_t vec) | `vec -> Vn.8B` | `REV16 Vd.8B,Vn.8B` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vrev16q_mf8(mfloat8x16_t vec) | `vec -> Vn.16B` | `REV16 Vd.16B,Vn.16B` | `Vd.16B -> result` | `A64` | #### Zip elements -| 
Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | -|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------|--------------------------------------------------------------|----------------------------------------------------------|---------------------------| -| int8x8_t vzip1_s8(
     int8x8_t a,
     int8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | -| int8x16_t vzip1q_s8(
     int8x16_t a,
     int8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | -| int16x4_t vzip1_s16(
     int16x4_t a,
     int16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `ZIP1 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | -| int16x8_t vzip1q_s16(
     int16x8_t a,
     int16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `ZIP1 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | -| int32x2_t vzip1_s32(
     int32x2_t a,
     int32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `ZIP1 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | -| int32x4_t vzip1q_s32(
     int32x4_t a,
     int32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `ZIP1 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | -| int64x2_t vzip1q_s64(
     int64x2_t a,
     int64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `ZIP1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| uint8x8_t vzip1_u8(
     uint8x8_t a,
     uint8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | -| uint8x16_t vzip1q_u8(
     uint8x16_t a,
     uint8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | -| uint16x4_t vzip1_u16(
     uint16x4_t a,
     uint16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `ZIP1 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | -| uint16x8_t vzip1q_u16(
     uint16x8_t a,
     uint16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `ZIP1 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | -| uint32x2_t vzip1_u32(
     uint32x2_t a,
     uint32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `ZIP1 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | -| uint32x4_t vzip1q_u32(
     uint32x4_t a,
     uint32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `ZIP1 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | -| uint64x2_t vzip1q_u64(
     uint64x2_t a,
     uint64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `ZIP1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| poly64x2_t vzip1q_p64(
     poly64x2_t a,
     poly64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `ZIP1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| float32x2_t vzip1_f32(
     float32x2_t a,
     float32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `ZIP1 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | -| float32x4_t vzip1q_f32(
     float32x4_t a,
     float32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `ZIP1 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | -| float64x2_t vzip1q_f64(
     float64x2_t a,
     float64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `ZIP1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| poly8x8_t vzip1_p8(
     poly8x8_t a,
     poly8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | -| poly8x16_t vzip1q_p8(
     poly8x16_t a,
     poly8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | -| poly16x4_t vzip1_p16(
     poly16x4_t a,
     poly16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `ZIP1 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | -| poly16x8_t vzip1q_p16(
     poly16x8_t a,
     poly16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `ZIP1 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | -| int8x8_t vzip2_s8(
     int8x8_t a,
     int8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | -| int8x16_t vzip2q_s8(
     int8x16_t a,
     int8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | -| int16x4_t vzip2_s16(
     int16x4_t a,
     int16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `ZIP2 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | -| int16x8_t vzip2q_s16(
     int16x8_t a,
     int16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `ZIP2 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | -| int32x2_t vzip2_s32(
     int32x2_t a,
     int32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `ZIP2 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | -| int32x4_t vzip2q_s32(
     int32x4_t a,
     int32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `ZIP2 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | -| int64x2_t vzip2q_s64(
     int64x2_t a,
     int64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `ZIP2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| uint8x8_t vzip2_u8(
     uint8x8_t a,
     uint8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | -| uint8x16_t vzip2q_u8(
     uint8x16_t a,
     uint8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | -| uint16x4_t vzip2_u16(
     uint16x4_t a,
     uint16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `ZIP2 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | -| uint16x8_t vzip2q_u16(
     uint16x8_t a,
     uint16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `ZIP2 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | -| uint32x2_t vzip2_u32(
     uint32x2_t a,
     uint32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `ZIP2 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | -| uint32x4_t vzip2q_u32(
     uint32x4_t a,
     uint32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `ZIP2 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | -| uint64x2_t vzip2q_u64(
     uint64x2_t a,
     uint64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `ZIP2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| poly64x2_t vzip2q_p64(
     poly64x2_t a,
     poly64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `ZIP2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| float32x2_t vzip2_f32(
     float32x2_t a,
     float32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `ZIP2 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | -| float32x4_t vzip2q_f32(
     float32x4_t a,
     float32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `ZIP2 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | -| float64x2_t vzip2q_f64(
     float64x2_t a,
     float64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `ZIP2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| poly8x8_t vzip2_p8(
     poly8x8_t a,
     poly8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | -| poly8x16_t vzip2q_p8(
     poly8x16_t a,
     poly8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | -| poly16x4_t vzip2_p16(
     poly16x4_t a,
     poly16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `ZIP2 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | -| poly16x8_t vzip2q_p16(
     poly16x8_t a,
     poly16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `ZIP2 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | -| int8x8x2_t vzip_s8(
     int8x8_t a,
     int8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP1 Vd1.8B,Vn.8B,Vm.8B`
`ZIP2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `v7/A32/A64` | -| int16x4x2_t vzip_s16(
     int16x4_t a,
     int16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `ZIP1 Vd1.4H,Vn.4H,Vm.4H`
`ZIP2 Vd2.4H,Vn.4H,Vm.4H` | `Vd1.4H -> result.val[0]`
`Vd2.4H -> result.val[1]` | `v7/A32/A64` | -| uint8x8x2_t vzip_u8(
     uint8x8_t a,
     uint8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP1 Vd1.8B,Vn.8B,Vm.8B`
`ZIP2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `v7/A32/A64` | -| uint16x4x2_t vzip_u16(
     uint16x4_t a,
     uint16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `ZIP1 Vd1.4H,Vn.4H,Vm.4H`
`ZIP2 Vd2.4H,Vn.4H,Vm.4H` | `Vd1.4H -> result.val[0]`
`Vd2.4H -> result.val[1]` | `v7/A32/A64` | -| poly8x8x2_t vzip_p8(
     poly8x8_t a,
     poly8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP1 Vd1.8B,Vn.8B,Vm.8B`
`ZIP2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `v7/A32/A64` | -| poly16x4x2_t vzip_p16(
     poly16x4_t a,
     poly16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `ZIP1 Vd1.4H,Vn.4H,Vm.4H`
`ZIP2 Vd2.4H,Vn.4H,Vm.4H` | `Vd1.4H -> result.val[0]`
`Vd2.4H -> result.val[1]` | `v7/A32/A64` | -| int32x2x2_t vzip_s32(
     int32x2_t a,
     int32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `ZIP1 Vd1.2S,Vn.2S,Vm.2S`
`ZIP2 Vd2.2S,Vn.2S,Vm.2S` | `Vd1.2S -> result.val[0]`
`Vd2.2S -> result.val[1]` | `v7/A32/A64` | -| float32x2x2_t vzip_f32(
     float32x2_t a,
     float32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `ZIP1 Vd1.2S,Vn.2S,Vm.2S`
`ZIP2 Vd2.2S,Vn.2S,Vm.2S` | `Vd1.2S -> result.val[0]`
`Vd2.2S -> result.val[1]` | `v7/A32/A64` | -| uint32x2x2_t vzip_u32(
     uint32x2_t a,
     uint32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `ZIP1 Vd1.2S,Vn.2S,Vm.2S`
`ZIP2 Vd2.2S,Vn.2S,Vm.2S` | `Vd1.2S -> result.val[0]`
`Vd2.2S -> result.val[1]` | `v7/A32/A64` | -| int8x16x2_t vzipq_s8(
     int8x16_t a,
     int8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP1 Vd1.16B,Vn.16B,Vm.16B`
`ZIP2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `v7/A32/A64` | -| int16x8x2_t vzipq_s16(
     int16x8_t a,
     int16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `ZIP1 Vd1.8H,Vn.8H,Vm.8H`
`ZIP2 Vd2.8H,Vn.8H,Vm.8H` | `Vd1.8H -> result.val[0]`
`Vd2.8H -> result.val[1]` | `v7/A32/A64` | -| int32x4x2_t vzipq_s32(
     int32x4_t a,
     int32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `ZIP1 Vd1.4S,Vn.4S,Vm.4S`
`ZIP2 Vd2.4S,Vn.4S,Vm.4S` | `Vd1.4S -> result.val[0]`
`Vd2.4S -> result.val[1]` | `v7/A32/A64` | -| float32x4x2_t vzipq_f32(
     float32x4_t a,
     float32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `ZIP1 Vd1.4S,Vn.4S,Vm.4S`
`ZIP2 Vd2.4S,Vn.4S,Vm.4S` | `Vd1.4S -> result.val[0]`
`Vd2.4S -> result.val[1]` | `v7/A32/A64` | -| uint8x16x2_t vzipq_u8(
     uint8x16_t a,
     uint8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP1 Vd1.16B,Vn.16B,Vm.16B`
`ZIP2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `v7/A32/A64` | -| uint16x8x2_t vzipq_u16(
     uint16x8_t a,
     uint16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `ZIP1 Vd1.8H,Vn.8H,Vm.8H`
`ZIP2 Vd2.8H,Vn.8H,Vm.8H` | `Vd1.8H -> result.val[0]`
`Vd2.8H -> result.val[1]` | `v7/A32/A64` | -| uint32x4x2_t vzipq_u32(
     uint32x4_t a,
     uint32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `ZIP1 Vd1.4S,Vn.4S,Vm.4S`
`ZIP2 Vd2.4S,Vn.4S,Vm.4S` | `Vd1.4S -> result.val[0]`
`Vd2.4S -> result.val[1]` | `v7/A32/A64` | -| poly8x16x2_t vzipq_p8(
     poly8x16_t a,
     poly8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP1 Vd1.16B,Vn.16B,Vm.16B`
`ZIP2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `v7/A32/A64` | -| poly16x8x2_t vzipq_p16(
     poly16x8_t a,
     poly16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `ZIP1 Vd1.8H,Vn.8H,Vm.8H`
`ZIP2 Vd2.8H,Vn.8H,Vm.8H` | `Vd1.8H -> result.val[0]`
`Vd2.8H -> result.val[1]` | `v7/A32/A64` | +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------|--------------------------------------------------------------|----------------------------------------------------------|---------------------------| +| int8x8_t vzip1_s8(
     int8x8_t a,
     int8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x16_t vzip1q_s8(
     int8x16_t a,
     int8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| int16x4_t vzip1_s16(
     int16x4_t a,
     int16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `ZIP1 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | +| int16x8_t vzip1q_s16(
     int16x8_t a,
     int16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `ZIP1 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | +| int32x2_t vzip1_s32(
     int32x2_t a,
     int32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `ZIP1 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | +| int32x4_t vzip1q_s32(
     int32x4_t a,
     int32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `ZIP1 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | +| int64x2_t vzip1q_s64(
     int64x2_t a,
     int64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `ZIP1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| uint8x8_t vzip1_u8(
     uint8x8_t a,
     uint8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| uint8x16_t vzip1q_u8(
     uint8x16_t a,
     uint8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| uint16x4_t vzip1_u16(
     uint16x4_t a,
     uint16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `ZIP1 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | +| uint16x8_t vzip1q_u16(
     uint16x8_t a,
     uint16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `ZIP1 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | +| uint32x2_t vzip1_u32(
     uint32x2_t a,
     uint32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `ZIP1 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | +| uint32x4_t vzip1q_u32(
     uint32x4_t a,
     uint32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `ZIP1 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | +| uint64x2_t vzip1q_u64(
     uint64x2_t a,
     uint64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `ZIP1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| poly64x2_t vzip1q_p64(
     poly64x2_t a,
     poly64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `ZIP1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| float32x2_t vzip1_f32(
     float32x2_t a,
     float32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `ZIP1 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | +| float32x4_t vzip1q_f32(
     float32x4_t a,
     float32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `ZIP1 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | +| float64x2_t vzip1q_f64(
     float64x2_t a,
     float64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `ZIP1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| poly8x8_t vzip1_p8(
     poly8x8_t a,
     poly8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| poly8x16_t vzip1q_p8(
     poly8x16_t a,
     poly8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| poly16x4_t vzip1_p16(
     poly16x4_t a,
     poly16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `ZIP1 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | +| poly16x8_t vzip1q_p16(
     poly16x8_t a,
     poly16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `ZIP1 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | +| mfloat8x8_t vzip1_mf8(
     mfloat8x8_t a,
     mfloat8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vzip1q_mf8(
     mfloat8x16_t a,
     mfloat8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| int8x8_t vzip2_s8(
     int8x8_t a,
     int8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x16_t vzip2q_s8(
     int8x16_t a,
     int8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| int16x4_t vzip2_s16(
     int16x4_t a,
     int16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `ZIP2 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | +| int16x8_t vzip2q_s16(
     int16x8_t a,
     int16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `ZIP2 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | +| int32x2_t vzip2_s32(
     int32x2_t a,
     int32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `ZIP2 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | +| int32x4_t vzip2q_s32(
     int32x4_t a,
     int32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `ZIP2 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | +| int64x2_t vzip2q_s64(
     int64x2_t a,
     int64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `ZIP2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| uint8x8_t vzip2_u8(
     uint8x8_t a,
     uint8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| uint8x16_t vzip2q_u8(
     uint8x16_t a,
     uint8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| uint16x4_t vzip2_u16(
     uint16x4_t a,
     uint16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `ZIP2 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | +| uint16x8_t vzip2q_u16(
     uint16x8_t a,
     uint16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `ZIP2 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | +| uint32x2_t vzip2_u32(
     uint32x2_t a,
     uint32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `ZIP2 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | +| uint32x4_t vzip2q_u32(
     uint32x4_t a,
     uint32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `ZIP2 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | +| uint64x2_t vzip2q_u64(
     uint64x2_t a,
     uint64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `ZIP2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| poly64x2_t vzip2q_p64(
     poly64x2_t a,
     poly64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `ZIP2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| float32x2_t vzip2_f32(
     float32x2_t a,
     float32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `ZIP2 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | +| float32x4_t vzip2q_f32(
     float32x4_t a,
     float32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `ZIP2 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | +| float64x2_t vzip2q_f64(
     float64x2_t a,
     float64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `ZIP2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| poly8x8_t vzip2_p8(
     poly8x8_t a,
     poly8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| poly8x16_t vzip2q_p8(
     poly8x16_t a,
     poly8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| poly16x4_t vzip2_p16(
     poly16x4_t a,
     poly16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `ZIP2 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | +| poly16x8_t vzip2q_p16(
     poly16x8_t a,
     poly16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `ZIP2 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | +| mfloat8x8_t vzip2_mf8(
     mfloat8x8_t a,
     mfloat8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vzip2q_mf8(
     mfloat8x16_t a,
     mfloat8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| int8x8x2_t vzip_s8(
     int8x8_t a,
     int8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP1 Vd1.8B,Vn.8B,Vm.8B`
`ZIP2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `v7/A32/A64` | +| int16x4x2_t vzip_s16(
     int16x4_t a,
     int16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `ZIP1 Vd1.4H,Vn.4H,Vm.4H`
`ZIP2 Vd2.4H,Vn.4H,Vm.4H` | `Vd1.4H -> result.val[0]`
`Vd2.4H -> result.val[1]` | `v7/A32/A64` | +| uint8x8x2_t vzip_u8(
     uint8x8_t a,
     uint8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP1 Vd1.8B,Vn.8B,Vm.8B`
`ZIP2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `v7/A32/A64` | +| uint16x4x2_t vzip_u16(
     uint16x4_t a,
     uint16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `ZIP1 Vd1.4H,Vn.4H,Vm.4H`
`ZIP2 Vd2.4H,Vn.4H,Vm.4H` | `Vd1.4H -> result.val[0]`
`Vd2.4H -> result.val[1]` | `v7/A32/A64` | +| poly8x8x2_t vzip_p8(
     poly8x8_t a,
     poly8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP1 Vd1.8B,Vn.8B,Vm.8B`
`ZIP2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `v7/A32/A64` | +| poly16x4x2_t vzip_p16(
     poly16x4_t a,
     poly16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `ZIP1 Vd1.4H,Vn.4H,Vm.4H`
`ZIP2 Vd2.4H,Vn.4H,Vm.4H` | `Vd1.4H -> result.val[0]`
`Vd2.4H -> result.val[1]` | `v7/A32/A64` | +| mfloat8x8x2_t vzip_mf8(
     mfloat8x8_t a,
     mfloat8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `ZIP1 Vd1.8B,Vn.8B,Vm.8B`
`ZIP2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `A64` | +| int32x2x2_t vzip_s32(
     int32x2_t a,
     int32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `ZIP1 Vd1.2S,Vn.2S,Vm.2S`
`ZIP2 Vd2.2S,Vn.2S,Vm.2S` | `Vd1.2S -> result.val[0]`
`Vd2.2S -> result.val[1]` | `v7/A32/A64` | +| float32x2x2_t vzip_f32(
     float32x2_t a,
     float32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `ZIP1 Vd1.2S,Vn.2S,Vm.2S`
`ZIP2 Vd2.2S,Vn.2S,Vm.2S` | `Vd1.2S -> result.val[0]`
`Vd2.2S -> result.val[1]` | `v7/A32/A64` | +| uint32x2x2_t vzip_u32(
     uint32x2_t a,
     uint32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `ZIP1 Vd1.2S,Vn.2S,Vm.2S`
`ZIP2 Vd2.2S,Vn.2S,Vm.2S` | `Vd1.2S -> result.val[0]`
`Vd2.2S -> result.val[1]` | `v7/A32/A64` | +| int8x16x2_t vzipq_s8(
     int8x16_t a,
     int8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP1 Vd1.16B,Vn.16B,Vm.16B`
`ZIP2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `v7/A32/A64` | +| int16x8x2_t vzipq_s16(
     int16x8_t a,
     int16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `ZIP1 Vd1.8H,Vn.8H,Vm.8H`
`ZIP2 Vd2.8H,Vn.8H,Vm.8H` | `Vd1.8H -> result.val[0]`
`Vd2.8H -> result.val[1]` | `v7/A32/A64` | +| int32x4x2_t vzipq_s32(
     int32x4_t a,
     int32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `ZIP1 Vd1.4S,Vn.4S,Vm.4S`
`ZIP2 Vd2.4S,Vn.4S,Vm.4S` | `Vd1.4S -> result.val[0]`
`Vd2.4S -> result.val[1]` | `v7/A32/A64` | +| float32x4x2_t vzipq_f32(
     float32x4_t a,
     float32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `ZIP1 Vd1.4S,Vn.4S,Vm.4S`
`ZIP2 Vd2.4S,Vn.4S,Vm.4S` | `Vd1.4S -> result.val[0]`
`Vd2.4S -> result.val[1]` | `v7/A32/A64` | +| uint8x16x2_t vzipq_u8(
     uint8x16_t a,
     uint8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP1 Vd1.16B,Vn.16B,Vm.16B`
`ZIP2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `v7/A32/A64` | +| uint16x8x2_t vzipq_u16(
     uint16x8_t a,
     uint16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `ZIP1 Vd1.8H,Vn.8H,Vm.8H`
`ZIP2 Vd2.8H,Vn.8H,Vm.8H` | `Vd1.8H -> result.val[0]`
`Vd2.8H -> result.val[1]` | `v7/A32/A64` | +| uint32x4x2_t vzipq_u32(
     uint32x4_t a,
     uint32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `ZIP1 Vd1.4S,Vn.4S,Vm.4S`
`ZIP2 Vd2.4S,Vn.4S,Vm.4S` | `Vd1.4S -> result.val[0]`
`Vd2.4S -> result.val[1]` | `v7/A32/A64` | +| poly8x16x2_t vzipq_p8(
     poly8x16_t a,
     poly8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP1 Vd1.16B,Vn.16B,Vm.16B`
`ZIP2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `v7/A32/A64` | +| poly16x8x2_t vzipq_p16(
     poly16x8_t a,
     poly16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `ZIP1 Vd1.8H,Vn.8H,Vm.8H`
`ZIP2 Vd2.8H,Vn.8H,Vm.8H` | `Vd1.8H -> result.val[0]`
`Vd2.8H -> result.val[1]` | `v7/A32/A64` | +| mfloat8x16x2_t vzipq_mf8(
     mfloat8x16_t a,
     mfloat8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `ZIP1 Vd1.16B,Vn.16B,Vm.16B`
`ZIP2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `A64` | #### Unzip elements -| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | -|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------|--------------------------------------------------------------|----------------------------------------------------------|---------------------------| -| int8x8_t vuzp1_s8(
     int8x8_t a,
     int8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | -| int8x16_t vuzp1q_s8(
     int8x16_t a,
     int8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | -| int16x4_t vuzp1_s16(
     int16x4_t a,
     int16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `UZP1 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | -| int16x8_t vuzp1q_s16(
     int16x8_t a,
     int16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `UZP1 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | -| int32x2_t vuzp1_s32(
     int32x2_t a,
     int32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `UZP1 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | -| int32x4_t vuzp1q_s32(
     int32x4_t a,
     int32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `UZP1 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | -| int64x2_t vuzp1q_s64(
     int64x2_t a,
     int64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `UZP1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| uint8x8_t vuzp1_u8(
     uint8x8_t a,
     uint8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | -| uint8x16_t vuzp1q_u8(
     uint8x16_t a,
     uint8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | -| uint16x4_t vuzp1_u16(
     uint16x4_t a,
     uint16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `UZP1 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | -| uint16x8_t vuzp1q_u16(
     uint16x8_t a,
     uint16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `UZP1 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | -| uint32x2_t vuzp1_u32(
     uint32x2_t a,
     uint32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `UZP1 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | -| uint32x4_t vuzp1q_u32(
     uint32x4_t a,
     uint32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `UZP1 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | -| uint64x2_t vuzp1q_u64(
     uint64x2_t a,
     uint64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `UZP1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| poly64x2_t vuzp1q_p64(
     poly64x2_t a,
     poly64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `UZP1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| float32x2_t vuzp1_f32(
     float32x2_t a,
     float32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `UZP1 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | -| float32x4_t vuzp1q_f32(
     float32x4_t a,
     float32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `UZP1 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | -| float64x2_t vuzp1q_f64(
     float64x2_t a,
     float64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `UZP1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| poly8x8_t vuzp1_p8(
     poly8x8_t a,
     poly8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | -| poly8x16_t vuzp1q_p8(
     poly8x16_t a,
     poly8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | -| poly16x4_t vuzp1_p16(
     poly16x4_t a,
     poly16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `UZP1 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | -| poly16x8_t vuzp1q_p16(
     poly16x8_t a,
     poly16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `UZP1 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | -| int8x8_t vuzp2_s8(
     int8x8_t a,
     int8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | -| int8x16_t vuzp2q_s8(
     int8x16_t a,
     int8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | -| int16x4_t vuzp2_s16(
     int16x4_t a,
     int16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `UZP2 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | -| int16x8_t vuzp2q_s16(
     int16x8_t a,
     int16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `UZP2 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | -| int32x2_t vuzp2_s32(
     int32x2_t a,
     int32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `UZP2 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | -| int32x4_t vuzp2q_s32(
     int32x4_t a,
     int32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `UZP2 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | -| int64x2_t vuzp2q_s64(
     int64x2_t a,
     int64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `UZP2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| uint8x8_t vuzp2_u8(
     uint8x8_t a,
     uint8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | -| uint8x16_t vuzp2q_u8(
     uint8x16_t a,
     uint8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | -| uint16x4_t vuzp2_u16(
     uint16x4_t a,
     uint16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `UZP2 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | -| uint16x8_t vuzp2q_u16(
     uint16x8_t a,
     uint16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `UZP2 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | -| uint32x2_t vuzp2_u32(
     uint32x2_t a,
     uint32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `UZP2 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | -| uint32x4_t vuzp2q_u32(
     uint32x4_t a,
     uint32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `UZP2 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | -| uint64x2_t vuzp2q_u64(
     uint64x2_t a,
     uint64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `UZP2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| poly64x2_t vuzp2q_p64(
     poly64x2_t a,
     poly64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `UZP2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| float32x2_t vuzp2_f32(
     float32x2_t a,
     float32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `UZP2 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | -| float32x4_t vuzp2q_f32(
     float32x4_t a,
     float32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `UZP2 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | -| float64x2_t vuzp2q_f64(
     float64x2_t a,
     float64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `UZP2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| poly8x8_t vuzp2_p8(
     poly8x8_t a,
     poly8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | -| poly8x16_t vuzp2q_p8(
     poly8x16_t a,
     poly8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | -| poly16x4_t vuzp2_p16(
     poly16x4_t a,
     poly16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `UZP2 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | -| poly16x8_t vuzp2q_p16(
     poly16x8_t a,
     poly16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `UZP2 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | -| int8x8x2_t vuzp_s8(
     int8x8_t a,
     int8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP1 Vd1.8B,Vn.8B,Vm.8B`
`UZP2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `v7/A32/A64` | -| int16x4x2_t vuzp_s16(
     int16x4_t a,
     int16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `UZP1 Vd1.4H,Vn.4H,Vm.4H`
`UZP2 Vd2.4H,Vn.4H,Vm.4H` | `Vd1.4H -> result.val[0]`
`Vd2.4H -> result.val[1]` | `v7/A32/A64` | -| int32x2x2_t vuzp_s32(
     int32x2_t a,
     int32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `UZP1 Vd1.2S,Vn.2S,Vm.2S`
`UZP2 Vd2.2S,Vn.2S,Vm.2S` | `Vd1.2S -> result.val[0]`
`Vd2.2S -> result.val[1]` | `v7/A32/A64` | -| float32x2x2_t vuzp_f32(
     float32x2_t a,
     float32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `UZP1 Vd1.2S,Vn.2S,Vm.2S`
`UZP2 Vd2.2S,Vn.2S,Vm.2S` | `Vd1.2S -> result.val[0]`
`Vd2.2S -> result.val[1]` | `v7/A32/A64` | -| uint8x8x2_t vuzp_u8(
     uint8x8_t a,
     uint8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP1 Vd1.8B,Vn.8B,Vm.8B`
`UZP2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `v7/A32/A64` | -| uint16x4x2_t vuzp_u16(
     uint16x4_t a,
     uint16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `UZP1 Vd1.4H,Vn.4H,Vm.4H`
`UZP2 Vd2.4H,Vn.4H,Vm.4H` | `Vd1.4H -> result.val[0]`
`Vd2.4H -> result.val[1]` | `v7/A32/A64` | -| uint32x2x2_t vuzp_u32(
     uint32x2_t a,
     uint32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `UZP1 Vd1.2S,Vn.2S,Vm.2S`
`UZP2 Vd2.2S,Vn.2S,Vm.2S` | `Vd1.2S -> result.val[0]`
`Vd2.2S -> result.val[1]` | `v7/A32/A64` | -| poly8x8x2_t vuzp_p8(
     poly8x8_t a,
     poly8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP1 Vd1.8B,Vn.8B,Vm.8B`
`UZP2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `v7/A32/A64` | -| poly16x4x2_t vuzp_p16(
     poly16x4_t a,
     poly16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `UZP1 Vd1.4H,Vn.4H,Vm.4H`
`UZP2 Vd2.4H,Vn.4H,Vm.4H` | `Vd1.4H -> result.val[0]`
`Vd2.4H -> result.val[1]` | `v7/A32/A64` | -| int8x16x2_t vuzpq_s8(
     int8x16_t a,
     int8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP1 Vd1.16B,Vn.16B,Vm.16B`
`UZP2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `v7/A32/A64` | -| int16x8x2_t vuzpq_s16(
     int16x8_t a,
     int16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `UZP1 Vd1.8H,Vn.8H,Vm.8H`
`UZP2 Vd2.8H,Vn.8H,Vm.8H` | `Vd1.8H -> result.val[0]`
`Vd2.8H -> result.val[1]` | `v7/A32/A64` | -| int32x4x2_t vuzpq_s32(
     int32x4_t a,
     int32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `UZP1 Vd1.4S,Vn.4S,Vm.4S`
`UZP2 Vd2.4S,Vn.4S,Vm.4S` | `Vd1.4S -> result.val[0]`
`Vd2.4S -> result.val[1]` | `v7/A32/A64` | -| float32x4x2_t vuzpq_f32(
     float32x4_t a,
     float32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `UZP1 Vd1.4S,Vn.4S,Vm.4S`
`UZP2 Vd2.4S,Vn.4S,Vm.4S` | `Vd1.4S -> result.val[0]`
`Vd2.4S -> result.val[1]` | `v7/A32/A64` | -| uint8x16x2_t vuzpq_u8(
     uint8x16_t a,
     uint8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP1 Vd1.16B,Vn.16B,Vm.16B`
`UZP2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `v7/A32/A64` | -| uint16x8x2_t vuzpq_u16(
     uint16x8_t a,
     uint16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `UZP1 Vd1.8H,Vn.8H,Vm.8H`
`UZP2 Vd2.8H,Vn.8H,Vm.8H` | `Vd1.8H -> result.val[0]`
`Vd2.8H -> result.val[1]` | `v7/A32/A64` | -| uint32x4x2_t vuzpq_u32(
     uint32x4_t a,
     uint32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `UZP1 Vd1.4S,Vn.4S,Vm.4S`
`UZP2 Vd2.4S,Vn.4S,Vm.4S` | `Vd1.4S -> result.val[0]`
`Vd2.4S -> result.val[1]` | `v7/A32/A64` | -| poly8x16x2_t vuzpq_p8(
     poly8x16_t a,
     poly8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP1 Vd1.16B,Vn.16B,Vm.16B`
`UZP2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `v7/A32/A64` | -| poly16x8x2_t vuzpq_p16(
     poly16x8_t a,
     poly16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `UZP1 Vd1.8H,Vn.8H,Vm.8H`
`UZP2 Vd2.8H,Vn.8H,Vm.8H` | `Vd1.8H -> result.val[0]`
`Vd2.8H -> result.val[1]` | `v7/A32/A64` | +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------|--------------------------------------------------------------|----------------------------------------------------------|---------------------------| +| int8x8_t vuzp1_s8(
     int8x8_t a,
     int8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x16_t vuzp1q_s8(
     int8x16_t a,
     int8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| int16x4_t vuzp1_s16(
     int16x4_t a,
     int16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `UZP1 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | +| int16x8_t vuzp1q_s16(
     int16x8_t a,
     int16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `UZP1 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | +| int32x2_t vuzp1_s32(
     int32x2_t a,
     int32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `UZP1 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | +| int32x4_t vuzp1q_s32(
     int32x4_t a,
     int32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `UZP1 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | +| int64x2_t vuzp1q_s64(
     int64x2_t a,
     int64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `UZP1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| uint8x8_t vuzp1_u8(
     uint8x8_t a,
     uint8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| uint8x16_t vuzp1q_u8(
     uint8x16_t a,
     uint8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| uint16x4_t vuzp1_u16(
     uint16x4_t a,
     uint16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `UZP1 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | +| uint16x8_t vuzp1q_u16(
     uint16x8_t a,
     uint16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `UZP1 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | +| uint32x2_t vuzp1_u32(
     uint32x2_t a,
     uint32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `UZP1 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | +| uint32x4_t vuzp1q_u32(
     uint32x4_t a,
     uint32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `UZP1 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | +| uint64x2_t vuzp1q_u64(
     uint64x2_t a,
     uint64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `UZP1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| poly64x2_t vuzp1q_p64(
     poly64x2_t a,
     poly64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `UZP1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| float32x2_t vuzp1_f32(
     float32x2_t a,
     float32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `UZP1 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | +| float32x4_t vuzp1q_f32(
     float32x4_t a,
     float32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `UZP1 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | +| float64x2_t vuzp1q_f64(
     float64x2_t a,
     float64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `UZP1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| poly8x8_t vuzp1_p8(
     poly8x8_t a,
     poly8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| poly8x16_t vuzp1q_p8(
     poly8x16_t a,
     poly8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| poly16x4_t vuzp1_p16(
     poly16x4_t a,
     poly16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `UZP1 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | +| poly16x8_t vuzp1q_p16(
     poly16x8_t a,
     poly16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `UZP1 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | +| mfloat8x8_t vuzp1_mf8(
     mfloat8x8_t a,
     mfloat8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vuzp1q_mf8(
     mfloat8x16_t a,
     mfloat8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| int8x8_t vuzp2_s8(
     int8x8_t a,
     int8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x16_t vuzp2q_s8(
     int8x16_t a,
     int8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| int16x4_t vuzp2_s16(
     int16x4_t a,
     int16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `UZP2 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | +| int16x8_t vuzp2q_s16(
     int16x8_t a,
     int16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `UZP2 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | +| int32x2_t vuzp2_s32(
     int32x2_t a,
     int32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `UZP2 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | +| int32x4_t vuzp2q_s32(
     int32x4_t a,
     int32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `UZP2 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | +| int64x2_t vuzp2q_s64(
     int64x2_t a,
     int64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `UZP2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| uint8x8_t vuzp2_u8(
     uint8x8_t a,
     uint8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| uint8x16_t vuzp2q_u8(
     uint8x16_t a,
     uint8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| uint16x4_t vuzp2_u16(
     uint16x4_t a,
     uint16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `UZP2 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | +| uint16x8_t vuzp2q_u16(
     uint16x8_t a,
     uint16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `UZP2 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | +| uint32x2_t vuzp2_u32(
     uint32x2_t a,
     uint32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `UZP2 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | +| uint32x4_t vuzp2q_u32(
     uint32x4_t a,
     uint32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `UZP2 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | +| uint64x2_t vuzp2q_u64(
     uint64x2_t a,
     uint64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `UZP2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| poly64x2_t vuzp2q_p64(
     poly64x2_t a,
     poly64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `UZP2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| float32x2_t vuzp2_f32(
     float32x2_t a,
     float32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `UZP2 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | +| float32x4_t vuzp2q_f32(
     float32x4_t a,
     float32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `UZP2 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | +| float64x2_t vuzp2q_f64(
     float64x2_t a,
     float64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `UZP2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| poly8x8_t vuzp2_p8(
     poly8x8_t a,
     poly8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| poly8x16_t vuzp2q_p8(
     poly8x16_t a,
     poly8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| poly16x4_t vuzp2_p16(
     poly16x4_t a,
     poly16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `UZP2 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | +| poly16x8_t vuzp2q_p16(
     poly16x8_t a,
     poly16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `UZP2 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | +| mfloat8x8_t vuzp2_mf8(
     mfloat8x8_t a,
     mfloat8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vuzp2q_mf8(
     mfloat8x16_t a,
     mfloat8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| int8x8x2_t vuzp_s8(
     int8x8_t a,
     int8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP1 Vd1.8B,Vn.8B,Vm.8B`
`UZP2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `v7/A32/A64` | +| int16x4x2_t vuzp_s16(
     int16x4_t a,
     int16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `UZP1 Vd1.4H,Vn.4H,Vm.4H`
`UZP2 Vd2.4H,Vn.4H,Vm.4H` | `Vd1.4H -> result.val[0]`
`Vd2.4H -> result.val[1]` | `v7/A32/A64` | +| int32x2x2_t vuzp_s32(
     int32x2_t a,
     int32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `UZP1 Vd1.2S,Vn.2S,Vm.2S`
`UZP2 Vd2.2S,Vn.2S,Vm.2S` | `Vd1.2S -> result.val[0]`
`Vd2.2S -> result.val[1]` | `v7/A32/A64` | +| float32x2x2_t vuzp_f32(
     float32x2_t a,
     float32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `UZP1 Vd1.2S,Vn.2S,Vm.2S`
`UZP2 Vd2.2S,Vn.2S,Vm.2S` | `Vd1.2S -> result.val[0]`
`Vd2.2S -> result.val[1]` | `v7/A32/A64` | +| uint8x8x2_t vuzp_u8(
     uint8x8_t a,
     uint8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP1 Vd1.8B,Vn.8B,Vm.8B`
`UZP2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `v7/A32/A64` | +| uint16x4x2_t vuzp_u16(
     uint16x4_t a,
     uint16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `UZP1 Vd1.4H,Vn.4H,Vm.4H`
`UZP2 Vd2.4H,Vn.4H,Vm.4H` | `Vd1.4H -> result.val[0]`
`Vd2.4H -> result.val[1]` | `v7/A32/A64` | +| uint32x2x2_t vuzp_u32(
     uint32x2_t a,
     uint32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `UZP1 Vd1.2S,Vn.2S,Vm.2S`
`UZP2 Vd2.2S,Vn.2S,Vm.2S` | `Vd1.2S -> result.val[0]`
`Vd2.2S -> result.val[1]` | `v7/A32/A64` | +| poly8x8x2_t vuzp_p8(
     poly8x8_t a,
     poly8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP1 Vd1.8B,Vn.8B,Vm.8B`
`UZP2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `v7/A32/A64` | +| poly16x4x2_t vuzp_p16(
     poly16x4_t a,
     poly16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `UZP1 Vd1.4H,Vn.4H,Vm.4H`
`UZP2 Vd2.4H,Vn.4H,Vm.4H` | `Vd1.4H -> result.val[0]`
`Vd2.4H -> result.val[1]` | `v7/A32/A64` | +| mfloat8x8x2_t vuzp_mf8(
     mfloat8x8_t a,
     mfloat8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `UZP1 Vd1.8B,Vn.8B,Vm.8B`
`UZP2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `A64` | +| int8x16x2_t vuzpq_s8(
     int8x16_t a,
     int8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP1 Vd1.16B,Vn.16B,Vm.16B`
`UZP2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `v7/A32/A64` | +| int16x8x2_t vuzpq_s16(
     int16x8_t a,
     int16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `UZP1 Vd1.8H,Vn.8H,Vm.8H`
`UZP2 Vd2.8H,Vn.8H,Vm.8H` | `Vd1.8H -> result.val[0]`
`Vd2.8H -> result.val[1]` | `v7/A32/A64` | +| int32x4x2_t vuzpq_s32(
     int32x4_t a,
     int32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `UZP1 Vd1.4S,Vn.4S,Vm.4S`
`UZP2 Vd2.4S,Vn.4S,Vm.4S` | `Vd1.4S -> result.val[0]`
`Vd2.4S -> result.val[1]` | `v7/A32/A64` | +| float32x4x2_t vuzpq_f32(
     float32x4_t a,
     float32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `UZP1 Vd1.4S,Vn.4S,Vm.4S`
`UZP2 Vd2.4S,Vn.4S,Vm.4S` | `Vd1.4S -> result.val[0]`
`Vd2.4S -> result.val[1]` | `v7/A32/A64` | +| uint8x16x2_t vuzpq_u8(
     uint8x16_t a,
     uint8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP1 Vd1.16B,Vn.16B,Vm.16B`
`UZP2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `v7/A32/A64` | +| uint32x4x2_t vuzpq_u32(
     uint32x4_t a,
     uint32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `UZP1 Vd1.4S,Vn.4S,Vm.4S`
`UZP2 Vd2.4S,Vn.4S,Vm.4S` | `Vd1.4S -> result.val[0]`
`Vd2.4S -> result.val[1]` | `v7/A32/A64` | +| poly8x16x2_t vuzpq_p8(
     poly8x16_t a,
     poly8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP1 Vd1.16B,Vn.16B,Vm.16B`
`UZP2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `v7/A32/A64` | +| poly16x8x2_t vuzpq_p16(
     poly16x8_t a,
     poly16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `UZP1 Vd1.8H,Vn.8H,Vm.8H`
`UZP2 Vd2.8H,Vn.8H,Vm.8H` | `Vd1.8H -> result.val[0]`
`Vd2.8H -> result.val[1]` | `v7/A32/A64` | +| mfloat8x16x2_t vuzpq_mf8(
     mfloat8x16_t a,
     mfloat8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `UZP1 Vd1.16B,Vn.16B,Vm.16B`
`UZP2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `A64` | #### Transpose elements -| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | -|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------|--------------------------------------------------------------|----------------------------------------------------------|---------------------------| -| int8x8_t vtrn1_s8(
     int8x8_t a,
     int8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | -| int8x16_t vtrn1q_s8(
     int8x16_t a,
     int8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | -| int16x4_t vtrn1_s16(
     int16x4_t a,
     int16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `TRN1 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | -| int16x8_t vtrn1q_s16(
     int16x8_t a,
     int16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `TRN1 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | -| int32x2_t vtrn1_s32(
     int32x2_t a,
     int32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `TRN1 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | -| int32x4_t vtrn1q_s32(
     int32x4_t a,
     int32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `TRN1 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | -| int64x2_t vtrn1q_s64(
     int64x2_t a,
     int64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `TRN1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| uint8x8_t vtrn1_u8(
     uint8x8_t a,
     uint8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | -| uint8x16_t vtrn1q_u8(
     uint8x16_t a,
     uint8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | -| uint16x4_t vtrn1_u16(
     uint16x4_t a,
     uint16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `TRN1 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | -| uint16x8_t vtrn1q_u16(
     uint16x8_t a,
     uint16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `TRN1 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | -| uint32x2_t vtrn1_u32(
     uint32x2_t a,
     uint32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `TRN1 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | -| uint32x4_t vtrn1q_u32(
     uint32x4_t a,
     uint32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `TRN1 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | -| uint64x2_t vtrn1q_u64(
     uint64x2_t a,
     uint64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `TRN1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| poly64x2_t vtrn1q_p64(
     poly64x2_t a,
     poly64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `TRN1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| float32x2_t vtrn1_f32(
     float32x2_t a,
     float32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `TRN1 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | -| float32x4_t vtrn1q_f32(
     float32x4_t a,
     float32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `TRN1 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | -| float64x2_t vtrn1q_f64(
     float64x2_t a,
     float64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `TRN1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| poly8x8_t vtrn1_p8(
     poly8x8_t a,
     poly8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | -| poly8x16_t vtrn1q_p8(
     poly8x16_t a,
     poly8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | -| poly16x4_t vtrn1_p16(
     poly16x4_t a,
     poly16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `TRN1 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | -| poly16x8_t vtrn1q_p16(
     poly16x8_t a,
     poly16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `TRN1 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | -| int8x8_t vtrn2_s8(
     int8x8_t a,
     int8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | -| int8x16_t vtrn2q_s8(
     int8x16_t a,
     int8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | -| int16x4_t vtrn2_s16(
     int16x4_t a,
     int16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `TRN2 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | -| int16x8_t vtrn2q_s16(
     int16x8_t a,
     int16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `TRN2 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | -| int32x2_t vtrn2_s32(
     int32x2_t a,
     int32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `TRN2 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | -| int32x4_t vtrn2q_s32(
     int32x4_t a,
     int32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `TRN2 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | -| int64x2_t vtrn2q_s64(
     int64x2_t a,
     int64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `TRN2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| uint8x8_t vtrn2_u8(
     uint8x8_t a,
     uint8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | -| uint8x16_t vtrn2q_u8(
     uint8x16_t a,
     uint8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | -| uint16x4_t vtrn2_u16(
     uint16x4_t a,
     uint16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `TRN2 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | -| uint16x8_t vtrn2q_u16(
     uint16x8_t a,
     uint16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `TRN2 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | -| uint32x2_t vtrn2_u32(
     uint32x2_t a,
     uint32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `TRN2 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | -| uint32x4_t vtrn2q_u32(
     uint32x4_t a,
     uint32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `TRN2 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | -| uint64x2_t vtrn2q_u64(
     uint64x2_t a,
     uint64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `TRN2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| poly64x2_t vtrn2q_p64(
     poly64x2_t a,
     poly64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `TRN2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| float32x2_t vtrn2_f32(
     float32x2_t a,
     float32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `TRN2 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | -| float32x4_t vtrn2q_f32(
     float32x4_t a,
     float32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `TRN2 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | -| float64x2_t vtrn2q_f64(
     float64x2_t a,
     float64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `TRN2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | -| poly8x8_t vtrn2_p8(
     poly8x8_t a,
     poly8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | -| poly8x16_t vtrn2q_p8(
     poly8x16_t a,
     poly8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | -| poly16x4_t vtrn2_p16(
     poly16x4_t a,
     poly16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `TRN2 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | -| poly16x8_t vtrn2q_p16(
     poly16x8_t a,
     poly16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `TRN2 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | -| int8x8x2_t vtrn_s8(
     int8x8_t a,
     int8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN1 Vd1.8B,Vn.8B,Vm.8B`
`TRN2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `v7/A32/A64` | -| int16x4x2_t vtrn_s16(
     int16x4_t a,
     int16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `TRN1 Vd1.4H,Vn.4H,Vm.4H`
`TRN2 Vd2.4H,Vn.4H,Vm.4H` | `Vd1.4H -> result.val[0]`
`Vd2.4H -> result.val[1]` | `v7/A32/A64` | -| uint8x8x2_t vtrn_u8(
     uint8x8_t a,
     uint8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN1 Vd1.8B,Vn.8B,Vm.8B`
`TRN2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `v7/A32/A64` | -| uint16x4x2_t vtrn_u16(
     uint16x4_t a,
     uint16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `TRN1 Vd1.4H,Vn.4H,Vm.4H`
`TRN2 Vd2.4H,Vn.4H,Vm.4H` | `Vd1.4H -> result.val[0]`
`Vd2.4H -> result.val[1]` | `v7/A32/A64` | -| poly8x8x2_t vtrn_p8(
     poly8x8_t a,
     poly8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN1 Vd1.8B,Vn.8B,Vm.8B`
`TRN2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `v7/A32/A64` | -| poly16x4x2_t vtrn_p16(
     poly16x4_t a,
     poly16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `TRN1 Vd1.4H,Vn.4H,Vm.4H`
`TRN2 Vd2.4H,Vn.4H,Vm.4H` | `Vd1.4H -> result.val[0]`
`Vd2.4H -> result.val[1]` | `v7/A32/A64` | -| int32x2x2_t vtrn_s32(
     int32x2_t a,
     int32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `TRN1 Vd1.2S,Vn.2S,Vm.2S`
`TRN2 Vd2.2S,Vn.2S,Vm.2S` | `Vd1.2S -> result.val[0]`
`Vd2.2S -> result.val[1]` | `v7/A32/A64` | -| float32x2x2_t vtrn_f32(
     float32x2_t a,
     float32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `TRN1 Vd1.2S,Vn.2S,Vm.2S`
`TRN2 Vd2.2S,Vn.2S,Vm.2S` | `Vd1.2S -> result.val[0]`
`Vd2.2S -> result.val[1]` | `v7/A32/A64` | -| uint32x2x2_t vtrn_u32(
     uint32x2_t a,
     uint32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `TRN1 Vd1.2S,Vn.2S,Vm.2S`
`TRN2 Vd2.2S,Vn.2S,Vm.2S` | `Vd1.2S -> result.val[0]`
`Vd2.2S -> result.val[1]` | `v7/A32/A64` | -| int8x16x2_t vtrnq_s8(
     int8x16_t a,
     int8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN1 Vd1.16B,Vn.16B,Vm.16B`
`TRN2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `v7/A32/A64` | -| int16x8x2_t vtrnq_s16(
     int16x8_t a,
     int16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `TRN1 Vd1.8H,Vn.8H,Vm.8H`
`TRN2 Vd2.8H,Vn.8H,Vm.8H` | `Vd1.8H -> result.val[0]`
`Vd2.8H -> result.val[1]` | `v7/A32/A64` | -| int32x4x2_t vtrnq_s32(
     int32x4_t a,
     int32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `TRN1 Vd1.4S,Vn.4S,Vm.4S`
`TRN2 Vd2.4S,Vn.4S,Vm.4S` | `Vd1.4S -> result.val[0]`
`Vd2.4S -> result.val[1]` | `v7/A32/A64` | -| float32x4x2_t vtrnq_f32(
     float32x4_t a,
     float32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `TRN1 Vd1.4S,Vn.4S,Vm.4S`
`TRN2 Vd2.4S,Vn.4S,Vm.4S` | `Vd1.4S -> result.val[0]`
`Vd2.4S -> result.val[1]` | `v7/A32/A64` | -| uint8x16x2_t vtrnq_u8(
     uint8x16_t a,
     uint8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN1 Vd1.16B,Vn.16B,Vm.16B`
`TRN2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `v7/A32/A64` | -| uint16x8x2_t vtrnq_u16(
     uint16x8_t a,
     uint16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `TRN1 Vd1.8H,Vn.8H,Vm.8H`
`TRN2 Vd2.8H,Vn.8H,Vm.8H` | `Vd1.8H -> result.val[0]`
`Vd2.8H -> result.val[1]` | `v7/A32/A64` | -| uint32x4x2_t vtrnq_u32(
     uint32x4_t a,
     uint32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `TRN1 Vd1.4S,Vn.4S,Vm.4S`
`TRN2 Vd2.4S,Vn.4S,Vm.4S` | `Vd1.4S -> result.val[0]`
`Vd2.4S -> result.val[1]` | `v7/A32/A64` | -| poly8x16x2_t vtrnq_p8(
     poly8x16_t a,
     poly8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN1 Vd1.16B,Vn.16B,Vm.16B`
`TRN2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `v7/A32/A64` | -| poly16x8x2_t vtrnq_p16(
     poly16x8_t a,
     poly16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `TRN1 Vd1.8H,Vn.8H,Vm.8H`
`TRN2 Vd2.8H,Vn.8H,Vm.8H` | `Vd1.8H -> result.val[0]`
`Vd2.8H -> result.val[1]` | `v7/A32/A64` | +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------|--------------------------------------------------------------|----------------------------------------------------------|---------------------------| +| int8x8_t vtrn1_s8(
     int8x8_t a,
     int8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x16_t vtrn1q_s8(
     int8x16_t a,
     int8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| int16x4_t vtrn1_s16(
     int16x4_t a,
     int16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `TRN1 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | +| int16x8_t vtrn1q_s16(
     int16x8_t a,
     int16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `TRN1 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | +| int32x2_t vtrn1_s32(
     int32x2_t a,
     int32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `TRN1 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | +| int32x4_t vtrn1q_s32(
     int32x4_t a,
     int32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `TRN1 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | +| int64x2_t vtrn1q_s64(
     int64x2_t a,
     int64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `TRN1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| uint8x8_t vtrn1_u8(
     uint8x8_t a,
     uint8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| uint8x16_t vtrn1q_u8(
     uint8x16_t a,
     uint8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| uint16x4_t vtrn1_u16(
     uint16x4_t a,
     uint16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `TRN1 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | +| uint16x8_t vtrn1q_u16(
     uint16x8_t a,
     uint16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `TRN1 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | +| uint32x2_t vtrn1_u32(
     uint32x2_t a,
     uint32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `TRN1 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | +| uint32x4_t vtrn1q_u32(
     uint32x4_t a,
     uint32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `TRN1 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | +| uint64x2_t vtrn1q_u64(
     uint64x2_t a,
     uint64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `TRN1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| poly64x2_t vtrn1q_p64(
     poly64x2_t a,
     poly64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `TRN1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| float32x2_t vtrn1_f32(
     float32x2_t a,
     float32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `TRN1 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | +| float32x4_t vtrn1q_f32(
     float32x4_t a,
     float32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `TRN1 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | +| float64x2_t vtrn1q_f64(
     float64x2_t a,
     float64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `TRN1 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| poly8x8_t vtrn1_p8(
     poly8x8_t a,
     poly8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| poly8x16_t vtrn1q_p8(
     poly8x16_t a,
     poly8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| poly16x4_t vtrn1_p16(
     poly16x4_t a,
     poly16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `TRN1 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | +| poly16x8_t vtrn1q_p16(
     poly16x8_t a,
     poly16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `TRN1 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | +| mfloat8x8_t vtrn1_mf8(
     mfloat8x8_t a,
     mfloat8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN1 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vtrn1q_mf8(
     mfloat8x16_t a,
     mfloat8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN1 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| int8x8_t vtrn2_s8(
     int8x8_t a,
     int8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x16_t vtrn2q_s8(
     int8x16_t a,
     int8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| int16x4_t vtrn2_s16(
     int16x4_t a,
     int16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `TRN2 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | +| int16x8_t vtrn2q_s16(
     int16x8_t a,
     int16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `TRN2 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | +| int32x2_t vtrn2_s32(
     int32x2_t a,
     int32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `TRN2 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | +| int32x4_t vtrn2q_s32(
     int32x4_t a,
     int32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `TRN2 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | +| int64x2_t vtrn2q_s64(
     int64x2_t a,
     int64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `TRN2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| uint8x8_t vtrn2_u8(
     uint8x8_t a,
     uint8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| uint8x16_t vtrn2q_u8(
     uint8x16_t a,
     uint8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| uint16x4_t vtrn2_u16(
     uint16x4_t a,
     uint16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `TRN2 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | +| uint16x8_t vtrn2q_u16(
     uint16x8_t a,
     uint16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `TRN2 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | +| uint32x2_t vtrn2_u32(
     uint32x2_t a,
     uint32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `TRN2 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | +| uint32x4_t vtrn2q_u32(
     uint32x4_t a,
     uint32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `TRN2 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | +| uint64x2_t vtrn2q_u64(
     uint64x2_t a,
     uint64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `TRN2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| poly64x2_t vtrn2q_p64(
     poly64x2_t a,
     poly64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `TRN2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| float32x2_t vtrn2_f32(
     float32x2_t a,
     float32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `TRN2 Vd.2S,Vn.2S,Vm.2S` | `Vd.2S -> result` | `A64` | +| float32x4_t vtrn2q_f32(
     float32x4_t a,
     float32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `TRN2 Vd.4S,Vn.4S,Vm.4S` | `Vd.4S -> result` | `A64` | +| float64x2_t vtrn2q_f64(
     float64x2_t a,
     float64x2_t b)
| `a -> Vn.2D`
`b -> Vm.2D` | `TRN2 Vd.2D,Vn.2D,Vm.2D` | `Vd.2D -> result` | `A64` | +| poly8x8_t vtrn2_p8(
     poly8x8_t a,
     poly8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| poly8x16_t vtrn2q_p8(
     poly8x16_t a,
     poly8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| poly16x4_t vtrn2_p16(
     poly16x4_t a,
     poly16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `TRN2 Vd.4H,Vn.4H,Vm.4H` | `Vd.4H -> result` | `A64` | +| poly16x8_t vtrn2q_p16(
     poly16x8_t a,
     poly16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `TRN2 Vd.8H,Vn.8H,Vm.8H` | `Vd.8H -> result` | `A64` | +| mfloat8x8_t vtrn2_mf8(
     mfloat8x8_t a,
     mfloat8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN2 Vd.8B,Vn.8B,Vm.8B` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vtrn2q_mf8(
     mfloat8x16_t a,
     mfloat8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN2 Vd.16B,Vn.16B,Vm.16B` | `Vd.16B -> result` | `A64` | +| int8x8x2_t vtrn_s8(
     int8x8_t a,
     int8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN1 Vd1.8B,Vn.8B,Vm.8B`
`TRN2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `v7/A32/A64` | +| int16x4x2_t vtrn_s16(
     int16x4_t a,
     int16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `TRN1 Vd1.4H,Vn.4H,Vm.4H`
`TRN2 Vd2.4H,Vn.4H,Vm.4H` | `Vd1.4H -> result.val[0]`
`Vd2.4H -> result.val[1]` | `v7/A32/A64` | +| uint8x8x2_t vtrn_u8(
     uint8x8_t a,
     uint8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN1 Vd1.8B,Vn.8B,Vm.8B`
`TRN2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `v7/A32/A64` | +| uint16x4x2_t vtrn_u16(
     uint16x4_t a,
     uint16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `TRN1 Vd1.4H,Vn.4H,Vm.4H`
`TRN2 Vd2.4H,Vn.4H,Vm.4H` | `Vd1.4H -> result.val[0]`
`Vd2.4H -> result.val[1]` | `v7/A32/A64` | +| poly8x8x2_t vtrn_p8(
     poly8x8_t a,
     poly8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN1 Vd1.8B,Vn.8B,Vm.8B`
`TRN2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `v7/A32/A64` | +| poly16x4x2_t vtrn_p16(
     poly16x4_t a,
     poly16x4_t b)
| `a -> Vn.4H`
`b -> Vm.4H` | `TRN1 Vd1.4H,Vn.4H,Vm.4H`
`TRN2 Vd2.4H,Vn.4H,Vm.4H` | `Vd1.4H -> result.val[0]`
`Vd2.4H -> result.val[1]` | `v7/A32/A64` | +| int32x2x2_t vtrn_s32(
     int32x2_t a,
     int32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `TRN1 Vd1.2S,Vn.2S,Vm.2S`
`TRN2 Vd2.2S,Vn.2S,Vm.2S` | `Vd1.2S -> result.val[0]`
`Vd2.2S -> result.val[1]` | `v7/A32/A64` | +| float32x2x2_t vtrn_f32(
     float32x2_t a,
     float32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `TRN1 Vd1.2S,Vn.2S,Vm.2S`
`TRN2 Vd2.2S,Vn.2S,Vm.2S` | `Vd1.2S -> result.val[0]`
`Vd2.2S -> result.val[1]` | `v7/A32/A64` | +| uint32x2x2_t vtrn_u32(
     uint32x2_t a,
     uint32x2_t b)
| `a -> Vn.2S`
`b -> Vm.2S` | `TRN1 Vd1.2S,Vn.2S,Vm.2S`
`TRN2 Vd2.2S,Vn.2S,Vm.2S` | `Vd1.2S -> result.val[0]`
`Vd2.2S -> result.val[1]` | `v7/A32/A64` | +| mfloat8x8x2_t vtrn_mf8(
     mfloat8x8_t a,
     mfloat8x8_t b)
| `a -> Vn.8B`
`b -> Vm.8B` | `TRN1 Vd1.8B,Vn.8B,Vm.8B`
`TRN2 Vd2.8B,Vn.8B,Vm.8B` | `Vd1.8B -> result.val[0]`
`Vd2.8B -> result.val[1]` | `A64` | +| int8x16x2_t vtrnq_s8(
     int8x16_t a,
     int8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN1 Vd1.16B,Vn.16B,Vm.16B`
`TRN2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `v7/A32/A64` | +| int16x8x2_t vtrnq_s16(
     int16x8_t a,
     int16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `TRN1 Vd1.8H,Vn.8H,Vm.8H`
`TRN2 Vd2.8H,Vn.8H,Vm.8H` | `Vd1.8H -> result.val[0]`
`Vd2.8H -> result.val[1]` | `v7/A32/A64` | +| int32x4x2_t vtrnq_s32(
     int32x4_t a,
     int32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `TRN1 Vd1.4S,Vn.4S,Vm.4S`
`TRN2 Vd2.4S,Vn.4S,Vm.4S` | `Vd1.4S -> result.val[0]`
`Vd2.4S -> result.val[1]` | `v7/A32/A64` | +| float32x4x2_t vtrnq_f32(
     float32x4_t a,
     float32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `TRN1 Vd1.4S,Vn.4S,Vm.4S`
`TRN2 Vd2.4S,Vn.4S,Vm.4S` | `Vd1.4S -> result.val[0]`
`Vd2.4S -> result.val[1]` | `v7/A32/A64` | +| uint8x16x2_t vtrnq_u8(
     uint8x16_t a,
     uint8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN1 Vd1.16B,Vn.16B,Vm.16B`
`TRN2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `v7/A32/A64` | +| uint16x8x2_t vtrnq_u16(
     uint16x8_t a,
     uint16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `TRN1 Vd1.8H,Vn.8H,Vm.8H`
`TRN2 Vd2.8H,Vn.8H,Vm.8H` | `Vd1.8H -> result.val[0]`
`Vd2.8H -> result.val[1]` | `v7/A32/A64` | +| uint32x4x2_t vtrnq_u32(
     uint32x4_t a,
     uint32x4_t b)
| `a -> Vn.4S`
`b -> Vm.4S` | `TRN1 Vd1.4S,Vn.4S,Vm.4S`
`TRN2 Vd2.4S,Vn.4S,Vm.4S` | `Vd1.4S -> result.val[0]`
`Vd2.4S -> result.val[1]` | `v7/A32/A64` | +| poly8x16x2_t vtrnq_p8(
     poly8x16_t a,
     poly8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN1 Vd1.16B,Vn.16B,Vm.16B`
`TRN2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `v7/A32/A64` | +| poly16x8x2_t vtrnq_p16(
     poly16x8_t a,
     poly16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `TRN1 Vd1.8H,Vn.8H,Vm.8H`
`TRN2 Vd2.8H,Vn.8H,Vm.8H` | `Vd1.8H -> result.val[0]`
`Vd2.8H -> result.val[1]` | `v7/A32/A64` | +| mfloat8x16x2_t vtrnq_mf8(
     mfloat8x16_t a,
     mfloat8x16_t b)
| `a -> Vn.16B`
`b -> Vm.16B` | `TRN1 Vd1.16B,Vn.16B,Vm.16B`
`TRN2 Vd2.16B,Vn.16B,Vm.16B` | `Vd1.16B -> result.val[0]`
`Vd2.16B -> result.val[1]` | `A64` | #### Set vector lane -| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | -|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------|--------------------------|--------------------|---------------------------| -| uint8x8_t vset_lane_u8(
     uint8_t a,
     uint8x8_t v,
     const int lane)
| `0<=lane<=7`
`a -> Rn`
`v -> Vd.8B` | `MOV Vd.B[lane],Rn` | `Vd.8B -> result` | `v7/A32/A64` | -| uint16x4_t vset_lane_u16(
     uint16_t a,
     uint16x4_t v,
     const int lane)
| `0<=lane<=3`
`a -> Rn`
`v -> Vd.4H` | `MOV Vd.H[lane],Rn` | `Vd.4H -> result` | `v7/A32/A64` | -| uint32x2_t vset_lane_u32(
     uint32_t a,
     uint32x2_t v,
     const int lane)
| `0<=lane<=1`
`a -> Rn`
`v -> Vd.2S` | `MOV Vd.S[lane],Rn` | `Vd.2S -> result` | `v7/A32/A64` | -| uint64x1_t vset_lane_u64(
     uint64_t a,
     uint64x1_t v,
     const int lane)
| `lane==0`
`a -> Rn`
`v -> Vd.1D` | `MOV Vd.D[lane],Rn` | `Vd.1D -> result` | `v7/A32/A64` | -| poly64x1_t vset_lane_p64(
     poly64_t a,
     poly64x1_t v,
     const int lane)
| `lane==0`
`a -> Rn`
`v -> Vd.1D` | `MOV Vd.D[lane],Rn` | `Vd.1D -> result` | `A32/A64` | -| int8x8_t vset_lane_s8(
     int8_t a,
     int8x8_t v,
     const int lane)
| `0<=lane<=7`
`a -> Rn`
`v -> Vd.8B` | `MOV Vd.B[lane],Rn` | `Vd.8B -> result` | `v7/A32/A64` | -| int16x4_t vset_lane_s16(
     int16_t a,
     int16x4_t v,
     const int lane)
| `0<=lane<=3`
`a -> Rn`
`v -> Vd.4H` | `MOV Vd.H[lane],Rn` | `Vd.4H -> result` | `v7/A32/A64` | -| int32x2_t vset_lane_s32(
     int32_t a,
     int32x2_t v,
     const int lane)
| `0<=lane<=1`
`a -> Rn`
`v -> Vd.2S` | `MOV Vd.S[lane],Rn` | `Vd.2S -> result` | `v7/A32/A64` | -| int64x1_t vset_lane_s64(
     int64_t a,
     int64x1_t v,
     const int lane)
| `lane==0`
`a -> Rn`
`v -> Vd.1D` | `MOV Vd.D[lane],Rn` | `Vd.1D -> result` | `v7/A32/A64` | -| poly8x8_t vset_lane_p8(
     poly8_t a,
     poly8x8_t v,
     const int lane)
| `0<=lane<=7`
`a -> Rn`
`v -> Vd.8B` | `MOV Vd.B[lane],Rn` | `Vd.8B -> result` | `v7/A32/A64` | -| poly16x4_t vset_lane_p16(
     poly16_t a,
     poly16x4_t v,
     const int lane)
| `0<=lane<=3`
`a -> Rn`
`v -> Vd.4H` | `MOV Vd.H[lane],Rn` | `Vd.4H -> result` | `v7/A32/A64` | -| float16x4_t vset_lane_f16(
     float16_t a,
     float16x4_t v,
     const int lane)
| `0<=lane<=3`
`a -> VnH`
`v -> Vd.4H` | `MOV Vd.H[lane],Vn.H[0]` | `Vd.4H -> result` | `v7/A32/A64` | -| float16x8_t vsetq_lane_f16(
     float16_t a,
     float16x8_t v,
     const int lane)
| `0<=lane<=7`
`a -> VnH`
`v -> Vd.8H` | `MOV Vd.H[lane],Vn.H[0]` | `Vd.8H -> result` | `v7/A32/A64` | -| float32x2_t vset_lane_f32(
     float32_t a,
     float32x2_t v,
     const int lane)
| `0<=lane<=1`
`a -> Rn`
`v -> Vd.2S` | `MOV Vd.S[lane],Rn` | `Vd.2S -> result` | `v7/A32/A64` | -| float64x1_t vset_lane_f64(
     float64_t a,
     float64x1_t v,
     const int lane)
| `lane==0`
`a -> Rn`
`v -> Vd.1D` | `MOV Vd.D[lane],Rn` | `Vd.1D -> result` | `A64` | -| uint8x16_t vsetq_lane_u8(
     uint8_t a,
     uint8x16_t v,
     const int lane)
| `0<=lane<=15`
`a -> Rn`
`v -> Vd.16B` | `MOV Vd.B[lane],Rn` | `Vd.16B -> result` | `v7/A32/A64` | -| uint16x8_t vsetq_lane_u16(
     uint16_t a,
     uint16x8_t v,
     const int lane)
| `0<=lane<=7`
`a -> Rn`
`v -> Vd.8H` | `MOV Vd.H[lane],Rn` | `Vd.8H -> result` | `v7/A32/A64` | -| uint32x4_t vsetq_lane_u32(
     uint32_t a,
     uint32x4_t v,
     const int lane)
| `0<=lane<=3`
`a -> Rn`
`v -> Vd.4S` | `MOV Vd.S[lane],Rn` | `Vd.4S -> result` | `v7/A32/A64` | -| uint64x2_t vsetq_lane_u64(
     uint64_t a,
     uint64x2_t v,
     const int lane)
| `0<=lane<=1`
`a -> Rn`
`v -> Vd.2D` | `MOV Vd.D[lane],Rn` | `Vd.2D -> result` | `v7/A32/A64` | -| poly64x2_t vsetq_lane_p64(
     poly64_t a,
     poly64x2_t v,
     const int lane)
| `0<=lane<=1`
`a -> Rn`
`v -> Vd.2D` | `MOV Vd.D[lane],Rn` | `Vd.2D -> result` | `A32/A64` | -| int8x16_t vsetq_lane_s8(
     int8_t a,
     int8x16_t v,
     const int lane)
| `0<=lane<=15`
`a -> Rn`
`v -> Vd.16B` | `MOV Vd.B[lane],Rn` | `Vd.16B -> result` | `v7/A32/A64` | -| int16x8_t vsetq_lane_s16(
     int16_t a,
     int16x8_t v,
     const int lane)
| `0<=lane<=7`
`a -> Rn`
`v -> Vd.8H` | `MOV Vd.H[lane],Rn` | `Vd.8H -> result` | `v7/A32/A64` | -| int32x4_t vsetq_lane_s32(
     int32_t a,
     int32x4_t v,
     const int lane)
| `0<=lane<=3`
`a -> Rn`
`v -> Vd.4S` | `MOV Vd.S[lane],Rn` | `Vd.4S -> result` | `v7/A32/A64` | -| int64x2_t vsetq_lane_s64(
     int64_t a,
     int64x2_t v,
     const int lane)
| `0<=lane<=1`
`a -> Rn`
`v -> Vd.2D` | `MOV Vd.D[lane],Rn` | `Vd.2D -> result` | `v7/A32/A64` | -| poly8x16_t vsetq_lane_p8(
     poly8_t a,
     poly8x16_t v,
     const int lane)
| `0<=lane<=15`
`a -> Rn`
`v -> Vd.16B` | `MOV Vd.B[lane],Rn` | `Vd.16B -> result` | `v7/A32/A64` | -| poly16x8_t vsetq_lane_p16(
     poly16_t a,
     poly16x8_t v,
     const int lane)
| `0<=lane<=7`
`a -> Rn`
`v -> Vd.8H` | `MOV Vd.H[lane],Rn` | `Vd.8H -> result` | `v7/A32/A64` | -| float32x4_t vsetq_lane_f32(
     float32_t a,
     float32x4_t v,
     const int lane)
| `0<=lane<=3`
`a -> Rn`
`v -> Vd.4S` | `MOV Vd.S[lane],Rn` | `Vd.4S -> result` | `v7/A32/A64` | -| float64x2_t vsetq_lane_f64(
     float64_t a,
     float64x2_t v,
     const int lane)
| `0<=lane<=1`
`a -> Rn`
`v -> Vd.2D` | `MOV Vd.D[lane],Rn` | `Vd.2D -> result` | `A64` | +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------|--------------------------|--------------------|---------------------------| +| uint8x8_t vset_lane_u8(
     uint8_t a,
     uint8x8_t v,
     const int lane)
| `0<=lane<=7`
`a -> Rn`
`v -> Vd.8B` | `MOV Vd.B[lane],Rn` | `Vd.8B -> result` | `v7/A32/A64` | +| uint16x4_t vset_lane_u16(
     uint16_t a,
     uint16x4_t v,
     const int lane)
| `0<=lane<=3`
`a -> Rn`
`v -> Vd.4H` | `MOV Vd.H[lane],Rn` | `Vd.4H -> result` | `v7/A32/A64` | +| uint32x2_t vset_lane_u32(
     uint32_t a,
     uint32x2_t v,
     const int lane)
| `0<=lane<=1`
`a -> Rn`
`v -> Vd.2S` | `MOV Vd.S[lane],Rn` | `Vd.2S -> result` | `v7/A32/A64` | +| uint64x1_t vset_lane_u64(
     uint64_t a,
     uint64x1_t v,
     const int lane)
| `lane==0`
`a -> Rn`
`v -> Vd.1D` | `MOV Vd.D[lane],Rn` | `Vd.1D -> result` | `v7/A32/A64` | +| poly64x1_t vset_lane_p64(
     poly64_t a,
     poly64x1_t v,
     const int lane)
| `lane==0`
`a -> Rn`
`v -> Vd.1D` | `MOV Vd.D[lane],Rn` | `Vd.1D -> result` | `A32/A64` | +| int8x8_t vset_lane_s8(
     int8_t a,
     int8x8_t v,
     const int lane)
| `0<=lane<=7`
`a -> Rn`
`v -> Vd.8B` | `MOV Vd.B[lane],Rn` | `Vd.8B -> result` | `v7/A32/A64` | +| int16x4_t vset_lane_s16(
     int16_t a,
     int16x4_t v,
     const int lane)
| `0<=lane<=3`
`a -> Rn`
`v -> Vd.4H` | `MOV Vd.H[lane],Rn` | `Vd.4H -> result` | `v7/A32/A64` | +| int32x2_t vset_lane_s32(
     int32_t a,
     int32x2_t v,
     const int lane)
| `0<=lane<=1`
`a -> Rn`
`v -> Vd.2S` | `MOV Vd.S[lane],Rn` | `Vd.2S -> result` | `v7/A32/A64` | +| int64x1_t vset_lane_s64(
     int64_t a,
     int64x1_t v,
     const int lane)
| `lane==0`
`a -> Rn`
`v -> Vd.1D` | `MOV Vd.D[lane],Rn` | `Vd.1D -> result` | `v7/A32/A64` | +| poly8x8_t vset_lane_p8(
     poly8_t a,
     poly8x8_t v,
     const int lane)
| `0<=lane<=7`
`a -> Rn`
`v -> Vd.8B` | `MOV Vd.B[lane],Rn` | `Vd.8B -> result` | `v7/A32/A64` | +| poly16x4_t vset_lane_p16(
     poly16_t a,
     poly16x4_t v,
     const int lane)
| `0<=lane<=3`
`a -> Rn`
`v -> Vd.4H` | `MOV Vd.H[lane],Rn` | `Vd.4H -> result` | `v7/A32/A64` | +| float16x4_t vset_lane_f16(
     float16_t a,
     float16x4_t v,
     const int lane)
| `0<=lane<=3`
`a -> VnH`
`v -> Vd.4H` | `MOV Vd.H[lane],Vn.H[0]` | `Vd.4H -> result` | `v7/A32/A64` | +| float16x8_t vsetq_lane_f16(
     float16_t a,
     float16x8_t v,
     const int lane)
| `0<=lane<=7`
`a -> VnH`
`v -> Vd.8H` | `MOV Vd.H[lane],Vn.H[0]` | `Vd.8H -> result` | `v7/A32/A64` | +| float32x2_t vset_lane_f32(
     float32_t a,
     float32x2_t v,
     const int lane)
| `0<=lane<=1`
`a -> Rn`
`v -> Vd.2S` | `MOV Vd.S[lane],Rn` | `Vd.2S -> result` | `v7/A32/A64` | +| float64x1_t vset_lane_f64(
     float64_t a,
     float64x1_t v,
     const int lane)
| `lane==0`
`a -> Rn`
`v -> Vd.1D` | `MOV Vd.D[lane],Rn` | `Vd.1D -> result` | `A64` | +| mfloat8x8_t vset_lane_mf8(
     mfloat8_t a,
     mfloat8x8_t v,
     const int lane)
| `0<=lane<=7`
`a -> Rn`
`v -> Vd.8B` | `MOV Vd.B[lane],Rn` | `Vd.8B -> result` | `A64` | +| uint8x16_t vsetq_lane_u8(
     uint8_t a,
     uint8x16_t v,
     const int lane)
| `0<=lane<=15`
`a -> Rn`
`v -> Vd.16B` | `MOV Vd.B[lane],Rn` | `Vd.16B -> result` | `v7/A32/A64` | +| uint16x8_t vsetq_lane_u16(
     uint16_t a,
     uint16x8_t v,
     const int lane)
| `0<=lane<=7`
`a -> Rn`
`v -> Vd.8H` | `MOV Vd.H[lane],Rn` | `Vd.8H -> result` | `v7/A32/A64` | +| uint32x4_t vsetq_lane_u32(
     uint32_t a,
     uint32x4_t v,
     const int lane)
| `0<=lane<=3`
`a -> Rn`
`v -> Vd.4S` | `MOV Vd.S[lane],Rn` | `Vd.4S -> result` | `v7/A32/A64` | +| uint64x2_t vsetq_lane_u64(
     uint64_t a,
     uint64x2_t v,
     const int lane)
| `0<=lane<=1`
`a -> Rn`
`v -> Vd.2D` | `MOV Vd.D[lane],Rn` | `Vd.2D -> result` | `v7/A32/A64` | +| poly64x2_t vsetq_lane_p64(
     poly64_t a,
     poly64x2_t v,
     const int lane)
| `0<=lane<=1`
`a -> Rn`
`v -> Vd.2D` | `MOV Vd.D[lane],Rn` | `Vd.2D -> result` | `A32/A64` | +| int8x16_t vsetq_lane_s8(
     int8_t a,
     int8x16_t v,
     const int lane)
| `0<=lane<=15`
`a -> Rn`
`v -> Vd.16B` | `MOV Vd.B[lane],Rn` | `Vd.16B -> result` | `v7/A32/A64` | +| int16x8_t vsetq_lane_s16(
     int16_t a,
     int16x8_t v,
     const int lane)
| `0<=lane<=7`
`a -> Rn`
`v -> Vd.8H` | `MOV Vd.H[lane],Rn` | `Vd.8H -> result` | `v7/A32/A64` | +| int32x4_t vsetq_lane_s32(
     int32_t a,
     int32x4_t v,
     const int lane)
| `0<=lane<=3`
`a -> Rn`
`v -> Vd.4S` | `MOV Vd.S[lane],Rn` | `Vd.4S -> result` | `v7/A32/A64` | +| int64x2_t vsetq_lane_s64(
     int64_t a,
     int64x2_t v,
     const int lane)
| `0<=lane<=1`
`a -> Rn`
`v -> Vd.2D` | `MOV Vd.D[lane],Rn` | `Vd.2D -> result` | `v7/A32/A64` | +| poly8x16_t vsetq_lane_p8(
     poly8_t a,
     poly8x16_t v,
     const int lane)
| `0<=lane<=15`
`a -> Rn`
`v -> Vd.16B` | `MOV Vd.B[lane],Rn` | `Vd.16B -> result` | `v7/A32/A64` | +| poly16x8_t vsetq_lane_p16(
     poly16_t a,
     poly16x8_t v,
     const int lane)
| `0<=lane<=7`
`a -> Rn`
`v -> Vd.8H` | `MOV Vd.H[lane],Rn` | `Vd.8H -> result` | `v7/A32/A64` | +| float32x4_t vsetq_lane_f32(
     float32_t a,
     float32x4_t v,
     const int lane)
| `0<=lane<=3`
`a -> Rn`
`v -> Vd.4S` | `MOV Vd.S[lane],Rn` | `Vd.4S -> result` | `v7/A32/A64` | +| float64x2_t vsetq_lane_f64(
     float64_t a,
     float64x2_t v,
     const int lane)
| `0<=lane<=1`
`a -> Rn`
`v -> Vd.2D` | `MOV Vd.D[lane],Rn` | `Vd.2D -> result` | `A64` | +| mfloat8x16_t vsetq_lane_mf8(
     mfloat8_t a,
     mfloat8x16_t v,
     const int lane)
| `0<=lane<=15`
`a -> Rn`
`v -> Vd.16B` | `MOV Vd.B[lane],Rn` | `Vd.16B -> result` | `A64` | + +#### Unzip elements + +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------|--------------------------------------------------------|--------------------------------------------------------|---------------------------| +| uint16x8x2_t vuzpq_u16(
     uint16x8_t a,
     uint16x8_t b)
| `a -> Vn.8H`
`b -> Vm.8H` | `UZP1 Vd1.8H,Vn.8H,Vm.8H`
`UZP2 Vd2.8H,Vn.8H,Vm.8H` | `Vd1.8H -> result.val[0]`
`Vd2.8H -> result.val[1]` | `v7/A32/A64` | ### Load #### Stride -| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | -|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------|---------------------------------------------------------------------------------------------------------------------|---------------------------| -| int8x8_t vld1_s8(int8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B},[Xn]` | `Vt.8B -> result` | `v7/A32/A64` | -| int8x16_t vld1q_s8(int8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B},[Xn]` | `Vt.16B -> result` | `v7/A32/A64` | -| int16x4_t vld1_s16(int16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H},[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | -| int16x8_t vld1q_s16(int16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H},[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | -| int32x2_t vld1_s32(int32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S},[Xn]` | `Vt.2S -> result` | `v7/A32/A64` | -| int32x4_t vld1q_s32(int32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S},[Xn]` | `Vt.4S -> result` | `v7/A32/A64` | -| int64x1_t vld1_s64(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D},[Xn]` | `Vt.1D -> result` | `v7/A32/A64` | -| int64x2_t vld1q_s64(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D},[Xn]` | `Vt.2D -> result` | `v7/A32/A64` | -| uint8x8_t vld1_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B},[Xn]` | `Vt.8B -> result` | `v7/A32/A64` | -| uint8x16_t vld1q_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B},[Xn]` | `Vt.16B -> result` | `v7/A32/A64` | -| uint16x4_t vld1_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD1 
{Vt.4H},[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | -| uint16x8_t vld1q_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H},[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | -| uint32x2_t vld1_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S},[Xn]` | `Vt.2S -> result` | `v7/A32/A64` | -| uint32x4_t vld1q_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S},[Xn]` | `Vt.4S -> result` | `v7/A32/A64` | -| uint64x1_t vld1_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D},[Xn]` | `Vt.1D -> result` | `v7/A32/A64` | -| uint64x2_t vld1q_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D},[Xn]` | `Vt.2D -> result` | `v7/A32/A64` | -| poly64x1_t vld1_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D},[Xn]` | `Vt.1D -> result` | `A32/A64` | -| poly64x2_t vld1q_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D},[Xn]` | `Vt.2D -> result` | `A32/A64` | -| float16x4_t vld1_f16(float16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H},[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | -| float16x8_t vld1q_f16(float16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H},[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | -| float32x2_t vld1_f32(float32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S},[Xn]` | `Vt.2S -> result` | `v7/A32/A64` | -| float32x4_t vld1q_f32(float32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S},[Xn]` | `Vt.4S -> result` | `v7/A32/A64` | -| poly8x8_t vld1_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B},[Xn]` | `Vt.8B -> result` | `v7/A32/A64` | -| poly8x16_t vld1q_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B},[Xn]` | `Vt.16B -> result` | `v7/A32/A64` | -| poly16x4_t vld1_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H},[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | -| poly16x8_t vld1q_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H},[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | -| float64x1_t vld1_f64(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D},[Xn]` | `Vt.1D -> result` | `A64` | -| float64x2_t vld1q_f64(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D},[Xn]` | `Vt.2D 
-> result` | `A64` | -| int8x8_t vld1_lane_s8(
     int8_t const *ptr,
     int8x8_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.8B`
`0 <= lane <= 7` | `LD1 {Vt.b}[lane],[Xn]` | `Vt.8B -> result` | `v7/A32/A64` | -| int8x16_t vld1q_lane_s8(
     int8_t const *ptr,
     int8x16_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.16B`
`0 <= lane <= 15` | `LD1 {Vt.b}[lane],[Xn]` | `Vt.16B -> result` | `v7/A32/A64` | -| int16x4_t vld1_lane_s16(
     int16_t const *ptr,
     int16x4_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.4H`
`0 <= lane <= 3` | `LD1 {Vt.H}[lane],[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | -| int16x8_t vld1q_lane_s16(
     int16_t const *ptr,
     int16x8_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.8H`
`0 <= lane <= 7` | `LD1 {Vt.H}[lane],[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | -| int32x2_t vld1_lane_s32(
     int32_t const *ptr,
     int32x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2S`
`0 <= lane <= 1` | `LD1 {Vt.S}[lane],[Xn]` | `Vt.2S -> result` | `v7/A32/A64` | -| int32x4_t vld1q_lane_s32(
     int32_t const *ptr,
     int32x4_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.4S`
`0 <= lane <= 3` | `LD1 {Vt.S}[lane],[Xn]` | `Vt.4S -> result` | `v7/A32/A64` | -| int64x1_t vld1_lane_s64(
     int64_t const *ptr,
     int64x1_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.1D`
`0 <= lane <= 0` | `LD1 {Vt.D}[lane],[Xn]` | `Vt.1D -> result` | `v7/A32/A64` | -| int64x2_t vld1q_lane_s64(
     int64_t const *ptr,
     int64x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2D`
`0 <= lane <= 1` | `LD1 {Vt.D}[lane],[Xn]` | `Vt.2D -> result` | `v7/A32/A64` | -| uint8x8_t vld1_lane_u8(
     uint8_t const *ptr,
     uint8x8_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.8B`
`0 <= lane <= 7` | `LD1 {Vt.B}[lane],[Xn]` | `Vt.8B -> result` | `v7/A32/A64` | -| uint8x16_t vld1q_lane_u8(
     uint8_t const *ptr,
     uint8x16_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.16B`
`0 <= lane <= 15` | `LD1 {Vt.B}[lane],[Xn]` | `Vt.16B -> result` | `v7/A32/A64` | -| uint16x4_t vld1_lane_u16(
     uint16_t const *ptr,
     uint16x4_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.4H`
`0 <= lane <= 3` | `LD1 {Vt.H}[lane],[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | -| uint16x8_t vld1q_lane_u16(
     uint16_t const *ptr,
     uint16x8_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.8H`
`0 <= lane <= 7` | `LD1 {Vt.H}[lane],[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | -| uint32x2_t vld1_lane_u32(
     uint32_t const *ptr,
     uint32x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2S`
`0 <= lane <= 1` | `LD1 {Vt.S}[lane],[Xn]` | `Vt.2S -> result` | `v7/A32/A64` | -| uint32x4_t vld1q_lane_u32(
     uint32_t const *ptr,
     uint32x4_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.4S`
`0 <= lane <= 3` | `LD1 {Vt.S}[lane],[Xn]` | `Vt.4S -> result` | `v7/A32/A64` | -| uint64x1_t vld1_lane_u64(
     uint64_t const *ptr,
     uint64x1_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.1D`
`0 <= lane <= 0` | `LD1 {Vt.D}[lane],[Xn]` | `Vt.1D -> result` | `v7/A32/A64` | -| uint64x2_t vld1q_lane_u64(
     uint64_t const *ptr,
     uint64x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2D`
`0 <= lane <= 1` | `LD1 {Vt.D}[lane],[Xn]` | `Vt.2D -> result` | `v7/A32/A64` | -| poly64x1_t vld1_lane_p64(
     poly64_t const *ptr,
     poly64x1_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.1D`
`0 <= lane <= 0` | `LD1 {Vt.D}[lane],[Xn]` | `Vt.1D -> result` | `A32/A64` | -| poly64x2_t vld1q_lane_p64(
     poly64_t const *ptr,
     poly64x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2D`
`0 <= lane <= 1` | `LD1 {Vt.D}[lane],[Xn]` | `Vt.2D -> result` | `A32/A64` | -| float16x4_t vld1_lane_f16(
     float16_t const *ptr,
     float16x4_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.4H`
`0 <= lane <= 3` | `LD1 {Vt.H}[lane],[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | -| float16x8_t vld1q_lane_f16(
     float16_t const *ptr,
     float16x8_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.8H`
`0 <= lane <= 7` | `LD1 {Vt.H}[lane],[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | -| float32x2_t vld1_lane_f32(
     float32_t const *ptr,
     float32x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2S`
`0 <= lane <= 1` | `LD1 {Vt.S}[lane],[Xn]` | `Vt.2S -> result` | `v7/A32/A64` | -| float32x4_t vld1q_lane_f32(
     float32_t const *ptr,
     float32x4_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.4S`
`0 <= lane <= 3` | `LD1 {Vt.S}[lane],[Xn]` | `Vt.4S -> result` | `v7/A32/A64` | -| poly8x8_t vld1_lane_p8(
     poly8_t const *ptr,
     poly8x8_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.8B`
`0 <= lane <= 7` | `LD1 {Vt.B}[lane],[Xn]` | `Vt.8B -> result` | `v7/A32/A64` | -| poly8x16_t vld1q_lane_p8(
     poly8_t const *ptr,
     poly8x16_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.16B`
`0 <= lane <= 15` | `LD1 {Vt.B}[lane],[Xn]` | `Vt.16B -> result` | `v7/A32/A64` | -| poly16x4_t vld1_lane_p16(
     poly16_t const *ptr,
     poly16x4_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.4H`
`0 <= lane <= 3` | `LD1 {Vt.H}[lane],[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | -| poly16x8_t vld1q_lane_p16(
     poly16_t const *ptr,
     poly16x8_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.8H`
`0 <= lane <= 7` | `LD1 {Vt.H}[lane],[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | -| float64x1_t vld1_lane_f64(
     float64_t const *ptr,
     float64x1_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.1D`
`0 <= lane <= 0` | `LD1 {Vt.D}[lane],[Xn]` | `Vt.1D -> result` | `A64` | -| float64x2_t vld1q_lane_f64(
     float64_t const *ptr,
     float64x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2D`
`0 <= lane <= 1` | `LD1 {Vt.D}[lane],[Xn]` | `Vt.2D -> result` | `A64` | -| uint64x1_t vldap1_lane_u64(
     uint64_t const *ptr,
     uint64x1_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.1D`
`0 <= lane <= 0` | `LDAP1 {Vt.D}[lane],[Xn]` | `Vt.1D -> result` | `A64` | -| uint64x2_t vldap1q_lane_u64(
     uint64_t const *ptr,
     uint64x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2D`
`0 <= lane <= 1` | `LDAP1 {Vt.D}[lane],[Xn]` | `Vt.2D -> result` | `A64` | -| int64x1_t vldap1_lane_s64(
     int64_t const *ptr,
     int64x1_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.1D`
`0 <= lane <= 0` | `LDAP1 {Vt.D}[lane],[Xn]` | `Vt.1D -> result` | `A64` | -| int64x2_t vldap1q_lane_s64(
     int64_t const *ptr,
     int64x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2D`
`0 <= lane <= 1` | `LDAP1 {Vt.D}[lane],[Xn]` | `Vt.2D -> result` | `A64` | -| float64x1_t vldap1_lane_f64(
     float64_t const *ptr,
     float64x1_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.1D`
`0 <= lane <= 0` | `LDAP1 {Vt.D}[lane],[Xn]` | `Vt.1D -> result` | `A64` | -| float64x2_t vldap1q_lane_f64(
     float64_t const *ptr,
     float64x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2D`
`0 <= lane <= 1` | `LDAP1 {Vt.D}[lane],[Xn]` | `Vt.2D -> result` | `A64` | -| poly64x1_t vldap1_lane_p64(
     poly64_t const *ptr,
     poly64x1_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.1D`
`0 <= lane <= 0` | `LDAP1 {Vt.D}[lane],[Xn]` | `Vt.1D -> result` | `A64` | -| poly64x2_t vldap1q_lane_p64(
     poly64_t const *ptr,
     poly64x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2D`
`0 <= lane <= 1` | `LDAP1 {Vt.D}[lane],[Xn]` | `Vt.2D -> result` | `A64` | -| int8x8_t vld1_dup_s8(int8_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.8B},[Xn]` | `Vt.8B -> result` | `v7/A32/A64` | -| int8x16_t vld1q_dup_s8(int8_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.16B},[Xn]` | `Vt.16B -> result` | `v7/A32/A64` | -| int16x4_t vld1_dup_s16(int16_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.4H},[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | -| int16x8_t vld1q_dup_s16(int16_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.8H},[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | -| int32x2_t vld1_dup_s32(int32_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.2S},[Xn]` | `Vt.2S -> result` | `v7/A32/A64` | -| int32x4_t vld1q_dup_s32(int32_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.4S},[Xn]` | `Vt.4S -> result` | `v7/A32/A64` | -| int64x1_t vld1_dup_s64(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D},[Xn]` | `Vt.1D -> result` | `v7/A32/A64` | -| int64x2_t vld1q_dup_s64(int64_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.2D},[Xn]` | `Vt.2D -> result` | `v7/A32/A64` | -| uint8x8_t vld1_dup_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.8B},[Xn]` | `Vt.8B -> result` | `v7/A32/A64` | -| uint8x16_t vld1q_dup_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.16B},[Xn]` | `Vt.16B -> result` | `v7/A32/A64` | -| uint16x4_t vld1_dup_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.4H},[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | -| uint16x8_t vld1q_dup_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.8H},[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | -| uint32x2_t vld1_dup_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.2S},[Xn]` | `Vt.2S -> result` | `v7/A32/A64` | -| uint32x4_t vld1q_dup_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.4S},[Xn]` | `Vt.4S -> result` | `v7/A32/A64` | -| uint64x1_t vld1_dup_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D},[Xn]` | `Vt.1D -> result` | `v7/A32/A64` | -| uint64x2_t vld1q_dup_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.2D},[Xn]` | `Vt.2D -> result` | `v7/A32/A64` | -| 
poly64x1_t vld1_dup_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D},[Xn]` | `Vt.1D -> result` | `A32/A64` | -| poly64x2_t vld1q_dup_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.2D},[Xn]` | `Vt.2D -> result` | `A32/A64` | -| float16x4_t vld1_dup_f16(float16_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.4H},[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | -| float16x8_t vld1q_dup_f16(float16_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.8H},[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | -| float32x2_t vld1_dup_f32(float32_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.2S},[Xn]` | `Vt.2S -> result` | `v7/A32/A64` | -| float32x4_t vld1q_dup_f32(float32_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.4S},[Xn]` | `Vt.4S -> result` | `v7/A32/A64` | -| poly8x8_t vld1_dup_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.8B},[Xn]` | `Vt.8B -> result` | `v7/A32/A64` | -| poly8x16_t vld1q_dup_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.16B},[Xn]` | `Vt.16B -> result` | `v7/A32/A64` | -| poly16x4_t vld1_dup_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.4H},[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | -| poly16x8_t vld1q_dup_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.8H},[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | -| float64x1_t vld1_dup_f64(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D},[Xn]` | `Vt.1D -> result` | `A64` | -| float64x2_t vld1q_dup_f64(float64_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.2D},[Xn]` | `Vt.2D -> result` | `A64` | -| void vstl1_lane_u64(
     uint64_t *ptr,
     uint64x1_t val,
     const int lane)
| `val -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `STL1 {Vt.d}[lane],[Xn]` | | `A64` | -| void vstl1q_lane_u64(
     uint64_t *ptr,
     uint64x2_t val,
     const int lane)
| `val -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `STL1 {Vt.d}[lane],[Xn]` | | `A64` | -| void vstl1_lane_s64(
     int64_t *ptr,
     int64x1_t val,
     const int lane)
| `val -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `STL1 {Vt.d}[lane],[Xn]` | | `A64` | -| void vstl1q_lane_s64(
     int64_t *ptr,
     int64x2_t val,
     const int lane)
| `val -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `STL1 {Vt.d}[lane],[Xn]` | | `A64` | -| void vstl1_lane_f64(
     float64_t *ptr,
     float64x1_t val,
     const int lane)
| `val -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `STL1 {Vt.d}[lane],[Xn]` | | `A64` | -| void vstl1q_lane_f64(
     float64_t *ptr,
     float64x2_t val,
     const int lane)
| `val -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `STL1 {Vt.d}[lane],[Xn]` | | `A64` | -| void vstl1_lane_p64(
     poly64_t *ptr,
     poly64x1_t val,
     const int lane)
| `val -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `STL1 {Vt.d}[lane],[Xn]` | | `A64` | -| void vstl1q_lane_p64(
     poly64_t *ptr,
     poly64x2_t val,
     const int lane)
| `val -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `STL1 {Vt.d}[lane],[Xn]` | | `A64` | -| int8x8x2_t vld2_s8(int8_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| int8x16x2_t vld2q_s8(int8_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| int16x4x2_t vld2_s16(int16_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| int16x8x2_t vld2q_s16(int16_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int32x2x2_t vld2_s32(int32_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.2S - Vt2.2S},[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| int32x4x2_t vld2q_s32(int32_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.4S - Vt2.4S},[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| uint8x8x2_t vld2_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| uint8x16x2_t vld2q_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| uint16x4x2_t vld2_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| uint16x8x2_t vld2q_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| uint32x2x2_t vld2_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.2S - Vt2.2S},[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| uint32x4x2_t vld2q_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.4S - Vt2.4S},[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| float16x4x2_t vld2_f16(float16_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| float16x8x2_t vld2q_f16(float16_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| float32x2x2_t vld2_f32(float32_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.2S - Vt2.2S},[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| float32x4x2_t vld2q_f32(float32_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.4S - Vt2.4S},[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| poly8x8x2_t vld2_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| poly8x16x2_t vld2q_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| poly16x4x2_t vld2_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| poly16x8x2_t vld2q_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int64x1x2_t vld2_s64(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | -| uint64x1x2_t vld2_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | -| poly64x1x2_t vld2_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A32/A64` | -| int64x2x2_t vld2q_s64(int64_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| uint64x2x2_t vld2q_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| poly64x2x2_t vld2q_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| float64x1x2_t vld2_f64(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| float64x2x2_t vld2q_f64(float64_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| int8x8x3_t vld3_s8(int8_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| int8x16x3_t vld3q_s8(int8_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| int16x4x3_t vld3_s16(int16_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| int16x8x3_t vld3q_s16(int16_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int32x2x3_t vld3_s32(int32_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.2S - Vt3.2S},[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| int32x4x3_t vld3q_s32(int32_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.4S - Vt3.4S},[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| uint8x8x3_t vld3_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| uint8x16x3_t vld3q_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| uint16x4x3_t vld3_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| uint16x8x3_t vld3q_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| uint32x2x3_t vld3_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.2S - Vt3.2S},[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| uint32x4x3_t vld3q_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.4S - Vt3.4S},[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| float16x4x3_t vld3_f16(float16_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| float16x8x3_t vld3q_f16(float16_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| float32x2x3_t vld3_f32(float32_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.2S - Vt3.2S},[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| float32x4x3_t vld3q_f32(float32_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.4S - Vt3.4S},[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| poly8x8x3_t vld3_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| poly8x16x3_t vld3q_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| poly16x4x3_t vld3_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| poly16x8x3_t vld3q_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int64x1x3_t vld3_s64(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | -| uint64x1x3_t vld3_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | -| poly64x1x3_t vld3_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A32/A64` | -| int64x2x3_t vld3q_s64(int64_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| uint64x2x3_t vld3q_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| poly64x2x3_t vld3q_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| float64x1x3_t vld3_f64(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| float64x2x3_t vld3q_f64(float64_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| int8x8x4_t vld4_s8(int8_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| int8x16x4_t vld4q_s8(int8_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| int16x4x4_t vld4_s16(int16_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| int16x8x4_t vld4q_s16(int16_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int32x2x4_t vld4_s32(int32_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.2S - Vt4.2S},[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| int32x4x4_t vld4q_s32(int32_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.4S - Vt4.4S},[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| uint8x8x4_t vld4_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| uint8x16x4_t vld4q_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| uint16x4x4_t vld4_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| uint16x8x4_t vld4q_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| uint32x2x4_t vld4_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.2S - Vt4.2S},[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| uint32x4x4_t vld4q_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.4S - Vt4.4S},[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| float16x4x4_t vld4_f16(float16_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| float16x8x4_t vld4q_f16(float16_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| float32x2x4_t vld4_f32(float32_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.2S - Vt4.2S},[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| float32x4x4_t vld4q_f32(float32_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.4S - Vt4.4S},[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| poly8x8x4_t vld4_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| poly8x16x4_t vld4q_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| poly16x4x4_t vld4_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| poly16x8x4_t vld4q_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int64x1x4_t vld4_s64(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | -| uint64x1x4_t vld4_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | -| poly64x1x4_t vld4_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A32/A64` | -| int64x2x4_t vld4q_s64(int64_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| uint64x2x4_t vld4q_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| poly64x2x4_t vld4q_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| float64x1x4_t vld4_f64(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| float64x2x4_t vld4q_f64(float64_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| int8x8x2_t vld2_dup_s8(int8_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| int8x16x2_t vld2q_dup_s8(int8_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| int16x4x2_t vld2_dup_s16(int16_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| int16x8x2_t vld2q_dup_s16(int16_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int32x2x2_t vld2_dup_s32(int32_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.2S - Vt2.2S},[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| int32x4x2_t vld2q_dup_s32(int32_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.4S - Vt2.4S},[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| uint8x8x2_t vld2_dup_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| uint8x16x2_t vld2q_dup_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| uint16x4x2_t vld2_dup_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| uint16x8x2_t vld2q_dup_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| uint32x2x2_t vld2_dup_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.2S - Vt2.2S},[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| uint32x4x2_t vld2q_dup_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.4S - Vt2.4S},[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| float16x4x2_t vld2_dup_f16(float16_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| float16x8x2_t vld2q_dup_f16(float16_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| float32x2x2_t vld2_dup_f32(float32_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.2S - Vt2.2S},[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| float32x4x2_t vld2q_dup_f32(float32_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.4S - Vt2.4S},[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| poly8x8x2_t vld2_dup_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| poly8x16x2_t vld2q_dup_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| poly16x4x2_t vld2_dup_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| poly16x8x2_t vld2q_dup_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int64x1x2_t vld2_dup_s64(int64_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | -| uint64x1x2_t vld2_dup_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | -| poly64x1x2_t vld2_dup_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A32/A64` | -| int64x2x2_t vld2q_dup_s64(int64_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| uint64x2x2_t vld2q_dup_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| poly64x2x2_t vld2q_dup_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| float64x1x2_t vld2_dup_f64(float64_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| float64x2x2_t vld2q_dup_f64(float64_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| int8x8x3_t vld3_dup_s8(int8_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| int8x16x3_t vld3q_dup_s8(int8_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| int16x4x3_t vld3_dup_s16(int16_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| int16x8x3_t vld3q_dup_s16(int16_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int32x2x3_t vld3_dup_s32(int32_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.2S - Vt3.2S},[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| int32x4x3_t vld3q_dup_s32(int32_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.4S - Vt3.4S},[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| uint8x8x3_t vld3_dup_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| uint8x16x3_t vld3q_dup_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| uint16x4x3_t vld3_dup_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| uint16x8x3_t vld3q_dup_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| uint32x2x3_t vld3_dup_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.2S - Vt3.2S},[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| uint32x4x3_t vld3q_dup_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.4S - Vt3.4S},[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| float16x4x3_t vld3_dup_f16(float16_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| float16x8x3_t vld3q_dup_f16(float16_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| float32x2x3_t vld3_dup_f32(float32_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.2S - Vt3.2S},[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| float32x4x3_t vld3q_dup_f32(float32_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.4S - Vt3.4S},[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| poly8x8x3_t vld3_dup_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| poly8x16x3_t vld3q_dup_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| poly16x4x3_t vld3_dup_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| poly16x8x3_t vld3q_dup_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int64x1x3_t vld3_dup_s64(int64_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | -| uint64x1x3_t vld3_dup_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | -| poly64x1x3_t vld3_dup_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A32/A64` | -| int64x2x3_t vld3q_dup_s64(int64_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| uint64x2x3_t vld3q_dup_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| poly64x2x3_t vld3q_dup_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| float64x1x3_t vld3_dup_f64(float64_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| float64x2x3_t vld3q_dup_f64(float64_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| int8x8x4_t vld4_dup_s8(int8_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| int8x16x4_t vld4q_dup_s8(int8_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| int16x4x4_t vld4_dup_s16(int16_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| int16x8x4_t vld4q_dup_s16(int16_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int32x2x4_t vld4_dup_s32(int32_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.2S - Vt4.2S},[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| int32x4x4_t vld4q_dup_s32(int32_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.4S - Vt4.4S},[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| uint8x8x4_t vld4_dup_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| uint8x16x4_t vld4q_dup_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| uint16x4x4_t vld4_dup_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| uint16x8x4_t vld4q_dup_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| uint32x2x4_t vld4_dup_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.2S - Vt4.2S},[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| uint32x4x4_t vld4q_dup_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.4S - Vt4.4S},[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| float16x4x4_t vld4_dup_f16(float16_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| float16x8x4_t vld4q_dup_f16(float16_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| float32x2x4_t vld4_dup_f32(float32_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.2S - Vt4.2S},[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| float32x4x4_t vld4q_dup_f32(float32_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.4S - Vt4.4S},[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| poly8x8x4_t vld4_dup_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| poly8x16x4_t vld4q_dup_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| poly16x4x4_t vld4_dup_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| poly16x8x4_t vld4q_dup_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int64x1x4_t vld4_dup_s64(int64_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | -| uint64x1x4_t vld4_dup_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | -| poly64x1x4_t vld4_dup_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A32/A64` | -| int64x2x4_t vld4q_dup_s64(int64_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| uint64x2x4_t vld4q_dup_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| poly64x2x4_t vld4q_dup_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| float64x1x4_t vld4_dup_f64(float64_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| float64x2x4_t vld4q_dup_f64(float64_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| int16x4x2_t vld2_lane_s16(
     int16_t const *ptr,
     int16x4x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD2 {Vt.h - Vt2.h}[lane],[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| int16x8x2_t vld2q_lane_s16(
     int16_t const *ptr,
     int16x8x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD2 {Vt.h - Vt2.h}[lane],[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int32x2x2_t vld2_lane_s32(
     int32_t const *ptr,
     int32x2x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.2S`
`src.val[0] -> Vt.2S`
`0 <= lane <= 1` | `LD2 {Vt.s - Vt2.s}[lane],[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| int32x4x2_t vld2q_lane_s32(
     int32_t const *ptr,
     int32x4x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.4S`
`src.val[0] -> Vt.4S`
`0 <= lane <= 3` | `LD2 {Vt.s - Vt2.s}[lane],[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| uint16x4x2_t vld2_lane_u16(
     uint16_t const *ptr,
     uint16x4x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD2 {Vt.h - Vt2.h}[lane],[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| uint16x8x2_t vld2q_lane_u16(
     uint16_t const *ptr,
     uint16x8x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD2 {Vt.h - Vt2.h}[lane],[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| uint32x2x2_t vld2_lane_u32(
     uint32_t const *ptr,
     uint32x2x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.2S`
`src.val[0] -> Vt.2S`
`0 <= lane <= 1` | `LD2 {Vt.s - Vt2.s}[lane],[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| uint32x4x2_t vld2q_lane_u32(
     uint32_t const *ptr,
     uint32x4x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.4S`
`src.val[0] -> Vt.4S`
`0 <= lane <= 3` | `LD2 {Vt.s - Vt2.s}[lane],[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| float16x4x2_t vld2_lane_f16(
     float16_t const *ptr,
     float16x4x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD2 {Vt.h - Vt2.h}[lane],[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| float16x8x2_t vld2q_lane_f16(
     float16_t const *ptr,
     float16x8x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD2 {Vt.h - Vt2.h}[lane],[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| float32x2x2_t vld2_lane_f32(
     float32_t const *ptr,
     float32x2x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.2S`
`src.val[0] -> Vt.2S`
`0 <= lane <= 1` | `LD2 {Vt.s - Vt2.s}[lane],[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| float32x4x2_t vld2q_lane_f32(
     float32_t const *ptr,
     float32x4x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.4S`
`src.val[0] -> Vt.4S`
`0 <= lane <= 3` | `LD2 {Vt.s - Vt2.s}[lane],[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| poly16x4x2_t vld2_lane_p16(
     poly16_t const *ptr,
     poly16x4x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD2 {Vt.h - Vt2.h}[lane],[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| poly16x8x2_t vld2q_lane_p16(
     poly16_t const *ptr,
     poly16x8x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD2 {Vt.h - Vt2.h}[lane],[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int8x8x2_t vld2_lane_s8(
     int8_t const *ptr,
     int8x8x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD2 {Vt.b - Vt2.b}[lane],[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| uint8x8x2_t vld2_lane_u8(
     uint8_t const *ptr,
     uint8x8x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD2 {Vt.b - Vt2.b}[lane],[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| poly8x8x2_t vld2_lane_p8(
     poly8_t const *ptr,
     poly8x8x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD2 {Vt.b - Vt2.b}[lane],[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| int8x16x2_t vld2q_lane_s8(
     int8_t const *ptr,
     int8x16x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD2 {Vt.b - Vt2.b}[lane],[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | -| uint8x16x2_t vld2q_lane_u8(
     uint8_t const *ptr,
     uint8x16x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD2 {Vt.b - Vt2.b}[lane],[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | -| poly8x16x2_t vld2q_lane_p8(
     poly8_t const *ptr,
     poly8x16x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD2 {Vt.b - Vt2.b}[lane],[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | -| int64x1x2_t vld2_lane_s64(
     int64_t const *ptr,
     int64x1x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD2 {Vt.d - Vt2.d}[lane],[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| int64x2x2_t vld2q_lane_s64(
     int64_t const *ptr,
     int64x2x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD2 {Vt.d - Vt2.d}[lane],[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| uint64x1x2_t vld2_lane_u64(
     uint64_t const *ptr,
     uint64x1x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD2 {Vt.d - Vt2.d}[lane],[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| uint64x2x2_t vld2q_lane_u64(
     uint64_t const *ptr,
     uint64x2x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD2 {Vt.d - Vt2.d}[lane],[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| poly64x1x2_t vld2_lane_p64(
     poly64_t const *ptr,
     poly64x1x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD2 {Vt.d - Vt2.d}[lane],[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| poly64x2x2_t vld2q_lane_p64(
     poly64_t const *ptr,
     poly64x2x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD2 {Vt.d - Vt2.d}[lane],[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| float64x1x2_t vld2_lane_f64(
     float64_t const *ptr,
     float64x1x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD2 {Vt.d - Vt2.d}[lane],[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| float64x2x2_t vld2q_lane_f64(
     float64_t const *ptr,
     float64x2x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD2 {Vt.d - Vt2.d}[lane],[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| int16x4x3_t vld3_lane_s16(
     int16_t const *ptr,
     int16x4x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.4H`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD3 {Vt.h - Vt3.h}[lane],[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| int16x8x3_t vld3q_lane_s16(
     int16_t const *ptr,
     int16x8x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.8H`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD3 {Vt.h - Vt3.h}[lane],[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int32x2x3_t vld3_lane_s32(
     int32_t const *ptr,
     int32x2x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.2S`
`src.val[1] -> Vt2.2S`
`src.val[0] -> Vt.2S`
`0 <= lane <= 1` | `LD3 {Vt.s - Vt3.s}[lane],[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| int32x4x3_t vld3q_lane_s32(
     int32_t const *ptr,
     int32x4x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.4S`
`src.val[1] -> Vt2.4S`
`src.val[0] -> Vt.4S`
`0 <= lane <= 3` | `LD3 {Vt.s - Vt3.s}[lane],[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| uint16x4x3_t vld3_lane_u16(
     uint16_t const *ptr,
     uint16x4x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.4H`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD3 {Vt.h - Vt3.h}[lane],[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| uint16x8x3_t vld3q_lane_u16(
     uint16_t const *ptr,
     uint16x8x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.8H`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD3 {Vt.h - Vt3.h}[lane],[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| uint32x2x3_t vld3_lane_u32(
     uint32_t const *ptr,
     uint32x2x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.2S`
`src.val[1] -> Vt2.2S`
`src.val[0] -> Vt.2S`
`0 <= lane <= 1` | `LD3 {Vt.s - Vt3.s}[lane],[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| uint32x4x3_t vld3q_lane_u32(
     uint32_t const *ptr,
     uint32x4x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.4S`
`src.val[1] -> Vt2.4S`
`src.val[0] -> Vt.4S`
`0 <= lane <= 3` | `LD3 {Vt.s - Vt3.s}[lane],[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| float16x4x3_t vld3_lane_f16(
     float16_t const *ptr,
     float16x4x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.4H`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD3 {Vt.h - Vt3.h}[lane],[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| float16x8x3_t vld3q_lane_f16(
     float16_t const *ptr,
     float16x8x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.8H`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD3 {Vt.h - Vt3.h}[lane],[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| float32x2x3_t vld3_lane_f32(
     float32_t const *ptr,
     float32x2x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.2S`
`src.val[1] -> Vt2.2S`
`src.val[0] -> Vt.2S`
`0 <= lane <= 1` | `LD3 {Vt.s - Vt3.s}[lane],[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| float32x4x3_t vld3q_lane_f32(
     float32_t const *ptr,
     float32x4x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.4S`
`src.val[1] -> Vt2.4S`
`src.val[0] -> Vt.4S`
`0 <= lane <= 3` | `LD3 {Vt.s - Vt3.s}[lane],[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| poly16x4x3_t vld3_lane_p16(
     poly16_t const *ptr,
     poly16x4x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.4H`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD3 {Vt.h - Vt3.h}[lane],[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| poly16x8x3_t vld3q_lane_p16(
     poly16_t const *ptr,
     poly16x8x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.8H`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD3 {Vt.h - Vt3.h}[lane],[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int8x8x3_t vld3_lane_s8(
     int8_t const *ptr,
     int8x8x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.8B`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD3 {Vt.b - Vt3.b}[lane],[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| uint8x8x3_t vld3_lane_u8(
     uint8_t const *ptr,
     uint8x8x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.8B`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD3 {Vt.b - Vt3.b}[lane],[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| poly8x8x3_t vld3_lane_p8(
     poly8_t const *ptr,
     poly8x8x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.8B`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD3 {Vt.b - Vt3.b}[lane],[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| int8x16x3_t vld3q_lane_s8(
     int8_t const *ptr,
     int8x16x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.16B`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD3 {Vt.b - Vt3.b}[lane],[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | -| uint8x16x3_t vld3q_lane_u8(
     uint8_t const *ptr,
     uint8x16x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.16B`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD3 {Vt.b - Vt3.b}[lane],[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | -| poly8x16x3_t vld3q_lane_p8(
     poly8_t const *ptr,
     poly8x16x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.16B`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD3 {Vt.b - Vt3.b}[lane],[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | -| int64x1x3_t vld3_lane_s64(
     int64_t const *ptr,
     int64x1x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.1D`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD3 {Vt.d - Vt3.d}[lane],[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| int64x2x3_t vld3q_lane_s64(
     int64_t const *ptr,
     int64x2x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.2D`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD3 {Vt.d - Vt3.d}[lane],[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| uint64x1x3_t vld3_lane_u64(
     uint64_t const *ptr,
     uint64x1x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.1D`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD3 {Vt.d - Vt3.d}[lane],[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| uint64x2x3_t vld3q_lane_u64(
     uint64_t const *ptr,
     uint64x2x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.2D`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD3 {Vt.d - Vt3.d}[lane],[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| poly64x1x3_t vld3_lane_p64(
     poly64_t const *ptr,
     poly64x1x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.1D`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD3 {Vt.d - Vt3.d}[lane],[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| poly64x2x3_t vld3q_lane_p64(
     poly64_t const *ptr,
     poly64x2x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.2D`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD3 {Vt.d - Vt3.d}[lane],[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| float64x1x3_t vld3_lane_f64(
     float64_t const *ptr,
     float64x1x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.1D`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD3 {Vt.d - Vt3.d}[lane],[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| float64x2x3_t vld3q_lane_f64(
     float64_t const *ptr,
     float64x2x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.2D`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD3 {Vt.d - Vt3.d}[lane],[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| int16x4x4_t vld4_lane_s16(
     int16_t const *ptr,
     int16x4x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.4H`
`src.val[2] -> Vt3.4H`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD4 {Vt.h - Vt4.h}[lane],[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| int16x8x4_t vld4q_lane_s16(
     int16_t const *ptr,
     int16x8x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.8H`
`src.val[2] -> Vt3.8H`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD4 {Vt.h - Vt4.h}[lane],[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int32x2x4_t vld4_lane_s32(
     int32_t const *ptr,
     int32x2x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.2S`
`src.val[2] -> Vt3.2S`
`src.val[1] -> Vt2.2S`
`src.val[0] -> Vt.2S`
`0 <= lane <= 1` | `LD4 {Vt.s - Vt4.s}[lane],[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| int32x4x4_t vld4q_lane_s32(
     int32_t const *ptr,
     int32x4x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.4S`
`src.val[2] -> Vt3.4S`
`src.val[1] -> Vt2.4S`
`src.val[0] -> Vt.4S`
`0 <= lane <= 3` | `LD4 {Vt.s - Vt4.s}[lane],[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| uint16x4x4_t vld4_lane_u16(
     uint16_t const *ptr,
     uint16x4x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.4H`
`src.val[2] -> Vt3.4H`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD4 {Vt.h - Vt4.h}[lane],[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| uint16x8x4_t vld4q_lane_u16(
     uint16_t const *ptr,
     uint16x8x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.8H`
`src.val[2] -> Vt3.8H`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD4 {Vt.h - Vt4.h}[lane],[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| uint32x2x4_t vld4_lane_u32(
     uint32_t const *ptr,
     uint32x2x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.2S`
`src.val[2] -> Vt3.2S`
`src.val[1] -> Vt2.2S`
`src.val[0] -> Vt.2S`
`0 <= lane <= 1` | `LD4 {Vt.s - Vt4.s}[lane],[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| uint32x4x4_t vld4q_lane_u32(
     uint32_t const *ptr,
     uint32x4x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.4S`
`src.val[2] -> Vt3.4S`
`src.val[1] -> Vt2.4S`
`src.val[0] -> Vt.4S`
`0 <= lane <= 3` | `LD4 {Vt.s - Vt4.s}[lane],[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| float16x4x4_t vld4_lane_f16(
     float16_t const *ptr,
     float16x4x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.4H`
`src.val[2] -> Vt3.4H`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD4 {Vt.h - Vt4.h}[lane],[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| float16x8x4_t vld4q_lane_f16(
     float16_t const *ptr,
     float16x8x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.8H`
`src.val[2] -> Vt3.8H`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD4 {Vt.h - Vt4.h}[lane],[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| float32x2x4_t vld4_lane_f32(
     float32_t const *ptr,
     float32x2x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.2S`
`src.val[2] -> Vt3.2S`
`src.val[1] -> Vt2.2S`
`src.val[0] -> Vt.2S`
`0 <= lane <= 1` | `LD4 {Vt.s - Vt4.s}[lane],[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| float32x4x4_t vld4q_lane_f32(
     float32_t const *ptr,
     float32x4x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.4S`
`src.val[2] -> Vt3.4S`
`src.val[1] -> Vt2.4S`
`src.val[0] -> Vt.4S`
`0 <= lane <= 3` | `LD4 {Vt.s - Vt4.s}[lane],[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| poly16x4x4_t vld4_lane_p16(
     poly16_t const *ptr,
     poly16x4x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.4H`
`src.val[2] -> Vt3.4H`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD4 {Vt.h - Vt4.h}[lane],[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| poly16x8x4_t vld4q_lane_p16(
     poly16_t const *ptr,
     poly16x8x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.8H`
`src.val[2] -> Vt3.8H`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD4 {Vt.h - Vt4.h}[lane],[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int8x8x4_t vld4_lane_s8(
     int8_t const *ptr,
     int8x8x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.8B`
`src.val[2] -> Vt3.8B`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD4 {Vt.b - Vt4.b}[lane],[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| uint8x8x4_t vld4_lane_u8(
     uint8_t const *ptr,
     uint8x8x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.8B`
`src.val[2] -> Vt3.8B`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD4 {Vt.b - Vt4.b}[lane],[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| poly8x8x4_t vld4_lane_p8(
     poly8_t const *ptr,
     poly8x8x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.8B`
`src.val[2] -> Vt3.8B`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD4 {Vt.b - Vt4.b}[lane],[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| int8x16x4_t vld4q_lane_s8(
     int8_t const *ptr,
     int8x16x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.16B`
`src.val[2] -> Vt3.16B`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD4 {Vt.b - Vt4.b}[lane],[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | -| uint8x16x4_t vld4q_lane_u8(
     uint8_t const *ptr,
     uint8x16x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.16B`
`src.val[2] -> Vt3.16B`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD4 {Vt.b - Vt4.b}[lane],[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | -| poly8x16x4_t vld4q_lane_p8(
     poly8_t const *ptr,
     poly8x16x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.16B`
`src.val[2] -> Vt3.16B`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD4 {Vt.b - Vt4.b}[lane],[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | -| int64x1x4_t vld4_lane_s64(
     int64_t const *ptr,
     int64x1x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.1D`
`src.val[2] -> Vt3.1D`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD4 {Vt.d - Vt4.d}[lane],[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| int64x2x4_t vld4q_lane_s64(
     int64_t const *ptr,
     int64x2x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.2D`
`src.val[2] -> Vt3.2D`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD4 {Vt.d - Vt4.d}[lane],[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| uint64x1x4_t vld4_lane_u64(
     uint64_t const *ptr,
     uint64x1x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.1D`
`src.val[2] -> Vt3.1D`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD4 {Vt.d - Vt4.d}[lane],[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| uint64x2x4_t vld4q_lane_u64(
     uint64_t const *ptr,
     uint64x2x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.2D`
`src.val[2] -> Vt3.2D`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD4 {Vt.d - Vt4.d}[lane],[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| poly64x1x4_t vld4_lane_p64(
     poly64_t const *ptr,
     poly64x1x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.1D`
`src.val[2] -> Vt3.1D`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD4 {Vt.d - Vt4.d}[lane],[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| poly64x2x4_t vld4q_lane_p64(
     poly64_t const *ptr,
     poly64x2x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.2D`
`src.val[2] -> Vt3.2D`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD4 {Vt.d - Vt4.d}[lane],[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| float64x1x4_t vld4_lane_f64(
     float64_t const *ptr,
     float64x1x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.1D`
`src.val[2] -> Vt3.1D`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD4 {Vt.d - Vt4.d}[lane],[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| float64x2x4_t vld4q_lane_f64(
     float64_t const *ptr,
     float64x2x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.2D`
`src.val[2] -> Vt3.2D`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD4 {Vt.d - Vt4.d}[lane],[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| int8x8x2_t vld1_s8_x2(int8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| int8x16x2_t vld1q_s8_x2(int8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| int16x4x2_t vld1_s16_x2(int16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| int16x8x2_t vld1q_s16_x2(int16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int32x2x2_t vld1_s32_x2(int32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S - Vt2.2S},[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| int32x4x2_t vld1q_s32_x2(int32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S - Vt2.4S},[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| uint8x8x2_t vld1_u8_x2(uint8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| uint8x16x2_t vld1q_u8_x2(uint8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| uint16x4x2_t vld1_u16_x2(uint16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| uint16x8x2_t vld1q_u16_x2(uint16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| uint32x2x2_t vld1_u32_x2(uint32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S - Vt2.2S},[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| uint32x4x2_t vld1q_u32_x2(uint32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S - Vt2.4S},[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| float16x4x2_t vld1_f16_x2(float16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| float16x8x2_t vld1q_f16_x2(float16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| float32x2x2_t vld1_f32_x2(float32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S - Vt2.2S},[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| float32x4x2_t vld1q_f32_x2(float32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S - Vt2.4S},[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| poly8x8x2_t vld1_p8_x2(poly8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| poly8x16x2_t vld1q_p8_x2(poly8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| poly16x4x2_t vld1_p16_x2(poly16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| poly16x8x2_t vld1q_p16_x2(poly16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int64x1x2_t vld1_s64_x2(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | -| uint64x1x2_t vld1_u64_x2(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | -| poly64x1x2_t vld1_p64_x2(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A32/A64` | -| int64x2x2_t vld1q_s64_x2(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `v7/A32/A64` | -| uint64x2x2_t vld1q_u64_x2(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `v7/A32/A64` | -| poly64x2x2_t vld1q_p64_x2(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A32/A64` | -| float64x1x2_t vld1_f64_x2(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| float64x2x2_t vld1q_f64_x2(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| int8x8x3_t vld1_s8_x3(int8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| int8x16x3_t vld1q_s8_x3(int8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| int16x4x3_t vld1_s16_x3(int16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| int16x8x3_t vld1q_s16_x3(int16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int32x2x3_t vld1_s32_x3(int32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S - Vt3.2S},[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| int32x4x3_t vld1q_s32_x3(int32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S - Vt3.4S},[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| uint8x8x3_t vld1_u8_x3(uint8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| uint8x16x3_t vld1q_u8_x3(uint8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| uint16x4x3_t vld1_u16_x3(uint16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| uint16x8x3_t vld1q_u16_x3(uint16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| uint32x2x3_t vld1_u32_x3(uint32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S - Vt3.2S},[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| uint32x4x3_t vld1q_u32_x3(uint32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S - Vt3.4S},[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| float16x4x3_t vld1_f16_x3(float16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| float16x8x3_t vld1q_f16_x3(float16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| float32x2x3_t vld1_f32_x3(float32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S - Vt3.2S},[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| float32x4x3_t vld1q_f32_x3(float32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S - Vt3.4S},[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| poly8x8x3_t vld1_p8_x3(poly8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| poly8x16x3_t vld1q_p8_x3(poly8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| poly16x4x3_t vld1_p16_x3(poly16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| poly16x8x3_t vld1q_p16_x3(poly16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int64x1x3_t vld1_s64_x3(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | -| uint64x1x3_t vld1_u64_x3(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | -| poly64x1x3_t vld1_p64_x3(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A32/A64` | -| int64x2x3_t vld1q_s64_x3(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `v7/A32/A64` | -| uint64x2x3_t vld1q_u64_x3(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `v7/A32/A64` | -| poly64x2x3_t vld1q_p64_x3(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A32/A64` | -| float64x1x3_t vld1_f64_x3(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| float64x2x3_t vld1q_f64_x3(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | -| int8x8x4_t vld1_s8_x4(int8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| int8x16x4_t vld1q_s8_x4(int8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| int16x4x4_t vld1_s16_x4(int16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| int16x8x4_t vld1q_s16_x4(int16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int32x2x4_t vld1_s32_x4(int32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S - Vt4.2S},[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| int32x4x4_t vld1q_s32_x4(int32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S - Vt4.4S},[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| uint8x8x4_t vld1_u8_x4(uint8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| uint8x16x4_t vld1q_u8_x4(uint8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| uint16x4x4_t vld1_u16_x4(uint16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| uint16x8x4_t vld1q_u16_x4(uint16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| uint32x2x4_t vld1_u32_x4(uint32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S - Vt4.2S},[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| uint32x4x4_t vld1q_u32_x4(uint32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S - Vt4.4S},[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| float16x4x4_t vld1_f16_x4(float16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| float16x8x4_t vld1q_f16_x4(float16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| float32x2x4_t vld1_f32_x4(float32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S - Vt4.2S},[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | -| float32x4x4_t vld1q_f32_x4(float32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S - Vt4.4S},[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | -| poly8x8x4_t vld1_p8_x4(poly8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | -| poly8x16x4_t vld1q_p8_x4(poly8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | -| poly16x4x4_t vld1_p16_x4(poly16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | -| poly16x8x4_t vld1q_p16_x4(poly16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | -| int64x1x4_t vld1_s64_x4(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | -| uint64x1x4_t vld1_u64_x4(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | -| poly64x1x4_t vld1_p64_x4(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A32/A64` | -| int64x2x4_t vld1q_s64_x4(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `v7/A32/A64` | -| uint64x2x4_t vld1q_u64_x4(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `v7/A32/A64` | -| poly64x2x4_t vld1q_p64_x4(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A32/A64` | -| float64x1x4_t vld1_f64_x4(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | -| float64x2x4_t vld1q_f64_x4(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` |
+| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures |
+|---------------------------------------------|----------------------|---------------------|-------------------|-------------------------|
+| int8x8_t vld1_s8(int8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B},[Xn]` | `Vt.8B -> result` | `v7/A32/A64` |
+| int8x16_t vld1q_s8(int8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B},[Xn]` | `Vt.16B -> result` | `v7/A32/A64` |
+| int16x4_t vld1_s16(int16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H},[Xn]` | `Vt.4H -> result` | `v7/A32/A64` |
+| int16x8_t vld1q_s16(int16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H},[Xn]` | `Vt.8H -> result` | `v7/A32/A64` |
+| int32x2_t vld1_s32(int32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S},[Xn]` | `Vt.2S -> result` | `v7/A32/A64` |
+| int32x4_t vld1q_s32(int32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S},[Xn]` | `Vt.4S -> result` | `v7/A32/A64` |
+| int64x1_t vld1_s64(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D},[Xn]` | `Vt.1D -> result` | `v7/A32/A64` |
+| int64x2_t vld1q_s64(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D},[Xn]` | `Vt.2D -> result` | `v7/A32/A64` |
+| uint8x8_t vld1_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B},[Xn]` | `Vt.8B -> result` | `v7/A32/A64` |
+| uint8x16_t vld1q_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B},[Xn]` | `Vt.16B -> result` | `v7/A32/A64` |
+| uint16x4_t vld1_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H},[Xn]` | `Vt.4H -> result` | `v7/A32/A64` |
+| uint16x8_t vld1q_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H},[Xn]` | `Vt.8H -> result` | `v7/A32/A64` |
+| uint32x2_t vld1_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S},[Xn]` | `Vt.2S -> result` | `v7/A32/A64` |
+| uint32x4_t vld1q_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S},[Xn]` | `Vt.4S -> result` | `v7/A32/A64` |
+| uint64x1_t vld1_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D},[Xn]` | `Vt.1D -> result` | `v7/A32/A64` |
+| uint64x2_t vld1q_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D},[Xn]` | `Vt.2D -> result` | `v7/A32/A64` |
+| poly64x1_t vld1_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D},[Xn]` | `Vt.1D -> result` | `A32/A64` |
+| poly64x2_t vld1q_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D},[Xn]` | `Vt.2D -> result` | `A32/A64` |
+| float16x4_t vld1_f16(float16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H},[Xn]` | `Vt.4H -> result` | `v7/A32/A64` |
+| float16x8_t vld1q_f16(float16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H},[Xn]` | `Vt.8H -> result` | `v7/A32/A64` |
+| float32x2_t vld1_f32(float32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S},[Xn]` | `Vt.2S -> result` | `v7/A32/A64` |
+| float32x4_t vld1q_f32(float32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S},[Xn]` | `Vt.4S -> result` | `v7/A32/A64` |
+| poly8x8_t vld1_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B},[Xn]` | `Vt.8B -> result` | `v7/A32/A64` |
+| poly8x16_t vld1q_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B},[Xn]` | `Vt.16B -> result` | `v7/A32/A64` |
+| poly16x4_t vld1_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H},[Xn]` | `Vt.4H -> result` | `v7/A32/A64` |
+| poly16x8_t vld1q_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H},[Xn]` | `Vt.8H -> result` | `v7/A32/A64` |
+| float64x1_t vld1_f64(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D},[Xn]` | `Vt.1D -> result` | `A64` |
+| float64x2_t vld1q_f64(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D},[Xn]` | `Vt.2D -> result` | `A64` |
+| mfloat8x8_t 
vld1_mf8(mfloat8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B},[Xn]` | `Vt.8B -> result` | `A64` | +| mfloat8x16_t vld1q_mf8(mfloat8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B},[Xn]` | `Vt.16B -> result` | `A64` | +| int8x8_t vld1_lane_s8(
     int8_t const *ptr,
     int8x8_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.8B`
`0 <= lane <= 7` | `LD1 {Vt.B}[lane],[Xn]` | `Vt.8B -> result` | `v7/A32/A64` | +| int8x16_t vld1q_lane_s8(
     int8_t const *ptr,
     int8x16_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.16B`
`0 <= lane <= 15` | `LD1 {Vt.B}[lane],[Xn]` | `Vt.16B -> result` | `v7/A32/A64` | +| int16x4_t vld1_lane_s16(
     int16_t const *ptr,
     int16x4_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.4H`
`0 <= lane <= 3` | `LD1 {Vt.H}[lane],[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | +| int16x8_t vld1q_lane_s16(
     int16_t const *ptr,
     int16x8_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.8H`
`0 <= lane <= 7` | `LD1 {Vt.H}[lane],[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | +| int32x2_t vld1_lane_s32(
     int32_t const *ptr,
     int32x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2S`
`0 <= lane <= 1` | `LD1 {Vt.S}[lane],[Xn]` | `Vt.2S -> result` | `v7/A32/A64` | +| int32x4_t vld1q_lane_s32(
     int32_t const *ptr,
     int32x4_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.4S`
`0 <= lane <= 3` | `LD1 {Vt.S}[lane],[Xn]` | `Vt.4S -> result` | `v7/A32/A64` | +| int64x1_t vld1_lane_s64(
     int64_t const *ptr,
     int64x1_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.1D`
`0 <= lane <= 0` | `LD1 {Vt.D}[lane],[Xn]` | `Vt.1D -> result` | `v7/A32/A64` | +| int64x2_t vld1q_lane_s64(
     int64_t const *ptr,
     int64x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2D`
`0 <= lane <= 1` | `LD1 {Vt.D}[lane],[Xn]` | `Vt.2D -> result` | `v7/A32/A64` | +| uint8x8_t vld1_lane_u8(
     uint8_t const *ptr,
     uint8x8_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.8B`
`0 <= lane <= 7` | `LD1 {Vt.B}[lane],[Xn]` | `Vt.8B -> result` | `v7/A32/A64` | +| uint8x16_t vld1q_lane_u8(
     uint8_t const *ptr,
     uint8x16_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.16B`
`0 <= lane <= 15` | `LD1 {Vt.B}[lane],[Xn]` | `Vt.16B -> result` | `v7/A32/A64` | +| uint16x4_t vld1_lane_u16(
     uint16_t const *ptr,
     uint16x4_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.4H`
`0 <= lane <= 3` | `LD1 {Vt.H}[lane],[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | +| uint16x8_t vld1q_lane_u16(
     uint16_t const *ptr,
     uint16x8_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.8H`
`0 <= lane <= 7` | `LD1 {Vt.H}[lane],[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | +| uint32x2_t vld1_lane_u32(
     uint32_t const *ptr,
     uint32x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2S`
`0 <= lane <= 1` | `LD1 {Vt.S}[lane],[Xn]` | `Vt.2S -> result` | `v7/A32/A64` | +| uint32x4_t vld1q_lane_u32(
     uint32_t const *ptr,
     uint32x4_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.4S`
`0 <= lane <= 3` | `LD1 {Vt.S}[lane],[Xn]` | `Vt.4S -> result` | `v7/A32/A64` | +| uint64x1_t vld1_lane_u64(
     uint64_t const *ptr,
     uint64x1_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.1D`
`0 <= lane <= 0` | `LD1 {Vt.D}[lane],[Xn]` | `Vt.1D -> result` | `v7/A32/A64` | +| uint64x2_t vld1q_lane_u64(
     uint64_t const *ptr,
     uint64x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2D`
`0 <= lane <= 1` | `LD1 {Vt.D}[lane],[Xn]` | `Vt.2D -> result` | `v7/A32/A64` | +| poly64x1_t vld1_lane_p64(
     poly64_t const *ptr,
     poly64x1_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.1D`
`0 <= lane <= 0` | `LD1 {Vt.D}[lane],[Xn]` | `Vt.1D -> result` | `A32/A64` | +| poly64x2_t vld1q_lane_p64(
     poly64_t const *ptr,
     poly64x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2D`
`0 <= lane <= 1` | `LD1 {Vt.D}[lane],[Xn]` | `Vt.2D -> result` | `A32/A64` | +| float16x4_t vld1_lane_f16(
     float16_t const *ptr,
     float16x4_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.4H`
`0 <= lane <= 3` | `LD1 {Vt.H}[lane],[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | +| float16x8_t vld1q_lane_f16(
     float16_t const *ptr,
     float16x8_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.8H`
`0 <= lane <= 7` | `LD1 {Vt.H}[lane],[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | +| float32x2_t vld1_lane_f32(
     float32_t const *ptr,
     float32x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2S`
`0 <= lane <= 1` | `LD1 {Vt.S}[lane],[Xn]` | `Vt.2S -> result` | `v7/A32/A64` | +| float32x4_t vld1q_lane_f32(
     float32_t const *ptr,
     float32x4_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.4S`
`0 <= lane <= 3` | `LD1 {Vt.S}[lane],[Xn]` | `Vt.4S -> result` | `v7/A32/A64` | +| poly8x8_t vld1_lane_p8(
     poly8_t const *ptr,
     poly8x8_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.8B`
`0 <= lane <= 7` | `LD1 {Vt.B}[lane],[Xn]` | `Vt.8B -> result` | `v7/A32/A64` | +| poly8x16_t vld1q_lane_p8(
     poly8_t const *ptr,
     poly8x16_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.16B`
`0 <= lane <= 15` | `LD1 {Vt.B}[lane],[Xn]` | `Vt.16B -> result` | `v7/A32/A64` | +| poly16x4_t vld1_lane_p16(
     poly16_t const *ptr,
     poly16x4_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.4H`
`0 <= lane <= 3` | `LD1 {Vt.H}[lane],[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | +| poly16x8_t vld1q_lane_p16(
     poly16_t const *ptr,
     poly16x8_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.8H`
`0 <= lane <= 7` | `LD1 {Vt.H}[lane],[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | +| float64x1_t vld1_lane_f64(
     float64_t const *ptr,
     float64x1_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.1D`
`0 <= lane <= 0` | `LD1 {Vt.D}[lane],[Xn]` | `Vt.1D -> result` | `A64` | +| float64x2_t vld1q_lane_f64(
     float64_t const *ptr,
     float64x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2D`
`0 <= lane <= 1` | `LD1 {Vt.D}[lane],[Xn]` | `Vt.2D -> result` | `A64` | +| mfloat8x8_t vld1_lane_mf8(
     mfloat8_t const *ptr,
     mfloat8x8_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.8B`
`0 <= lane <= 7` | `LD1 {Vt.B}[lane],[Xn]` | `Vt.8B -> result` | `A64` | +| mfloat8x16_t vld1q_lane_mf8(
     mfloat8_t const *ptr,
     mfloat8x16_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.16B`
`0 <= lane <= 15` | `LD1 {Vt.B}[lane],[Xn]` | `Vt.16B -> result` | `A64` | +| uint64x1_t vldap1_lane_u64(
     uint64_t const *ptr,
     uint64x1_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.1D`
`0 <= lane <= 0` | `LDAP1 {Vt.D}[lane],[Xn]` | `Vt.1D -> result` | `A64` | +| uint64x2_t vldap1q_lane_u64(
     uint64_t const *ptr,
     uint64x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2D`
`0 <= lane <= 1` | `LDAP1 {Vt.D}[lane],[Xn]` | `Vt.2D -> result` | `A64` | +| int64x1_t vldap1_lane_s64(
     int64_t const *ptr,
     int64x1_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.1D`
`0 <= lane <= 0` | `LDAP1 {Vt.D}[lane],[Xn]` | `Vt.1D -> result` | `A64` | +| int64x2_t vldap1q_lane_s64(
     int64_t const *ptr,
     int64x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2D`
`0 <= lane <= 1` | `LDAP1 {Vt.D}[lane],[Xn]` | `Vt.2D -> result` | `A64` | +| float64x1_t vldap1_lane_f64(
     float64_t const *ptr,
     float64x1_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.1D`
`0 <= lane <= 0` | `LDAP1 {Vt.D}[lane],[Xn]` | `Vt.1D -> result` | `A64` | +| float64x2_t vldap1q_lane_f64(
     float64_t const *ptr,
     float64x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2D`
`0 <= lane <= 1` | `LDAP1 {Vt.D}[lane],[Xn]` | `Vt.2D -> result` | `A64` | +| poly64x1_t vldap1_lane_p64(
     poly64_t const *ptr,
     poly64x1_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.1D`
`0 <= lane <= 0` | `LDAP1 {Vt.D}[lane],[Xn]` | `Vt.1D -> result` | `A64` | +| poly64x2_t vldap1q_lane_p64(
     poly64_t const *ptr,
     poly64x2_t src,
     const int lane)
| `ptr -> Xn`
`src -> Vt.2D`
`0 <= lane <= 1` | `LDAP1 {Vt.D}[lane],[Xn]` | `Vt.2D -> result` | `A64` | +| int8x8_t vld1_dup_s8(int8_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.8B},[Xn]` | `Vt.8B -> result` | `v7/A32/A64` | +| int8x16_t vld1q_dup_s8(int8_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.16B},[Xn]` | `Vt.16B -> result` | `v7/A32/A64` | +| int16x4_t vld1_dup_s16(int16_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.4H},[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | +| int16x8_t vld1q_dup_s16(int16_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.8H},[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | +| int32x2_t vld1_dup_s32(int32_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.2S},[Xn]` | `Vt.2S -> result` | `v7/A32/A64` | +| int32x4_t vld1q_dup_s32(int32_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.4S},[Xn]` | `Vt.4S -> result` | `v7/A32/A64` | +| int64x1_t vld1_dup_s64(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D},[Xn]` | `Vt.1D -> result` | `v7/A32/A64` | +| int64x2_t vld1q_dup_s64(int64_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.2D},[Xn]` | `Vt.2D -> result` | `v7/A32/A64` | +| uint8x8_t vld1_dup_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.8B},[Xn]` | `Vt.8B -> result` | `v7/A32/A64` | +| uint8x16_t vld1q_dup_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.16B},[Xn]` | `Vt.16B -> result` | `v7/A32/A64` | +| uint16x4_t vld1_dup_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.4H},[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | +| uint16x8_t vld1q_dup_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.8H},[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | +| uint32x2_t vld1_dup_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.2S},[Xn]` | `Vt.2S -> result` | `v7/A32/A64` | +| uint32x4_t vld1q_dup_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.4S},[Xn]` | `Vt.4S -> result` | `v7/A32/A64` | +| uint64x1_t vld1_dup_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D},[Xn]` | `Vt.1D -> result` | `v7/A32/A64` | +| uint64x2_t vld1q_dup_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.2D},[Xn]` | `Vt.2D -> result` | `v7/A32/A64` | +| 
poly64x1_t vld1_dup_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D},[Xn]` | `Vt.1D -> result` | `A32/A64` | +| poly64x2_t vld1q_dup_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.2D},[Xn]` | `Vt.2D -> result` | `A32/A64` | +| float16x4_t vld1_dup_f16(float16_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.4H},[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | +| float16x8_t vld1q_dup_f16(float16_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.8H},[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | +| float32x2_t vld1_dup_f32(float32_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.2S},[Xn]` | `Vt.2S -> result` | `v7/A32/A64` | +| float32x4_t vld1q_dup_f32(float32_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.4S},[Xn]` | `Vt.4S -> result` | `v7/A32/A64` | +| poly8x8_t vld1_dup_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.8B},[Xn]` | `Vt.8B -> result` | `v7/A32/A64` | +| poly8x16_t vld1q_dup_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.16B},[Xn]` | `Vt.16B -> result` | `v7/A32/A64` | +| poly16x4_t vld1_dup_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.4H},[Xn]` | `Vt.4H -> result` | `v7/A32/A64` | +| poly16x8_t vld1q_dup_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.8H},[Xn]` | `Vt.8H -> result` | `v7/A32/A64` | +| float64x1_t vld1_dup_f64(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D},[Xn]` | `Vt.1D -> result` | `A64` | +| float64x2_t vld1q_dup_f64(float64_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.2D},[Xn]` | `Vt.2D -> result` | `A64` | +| mfloat8x8_t vld1_dup_mf8(mfloat8_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.8B},[Xn]` | `Vt.8B -> result` | `A64` | +| mfloat8x16_t vld1q_dup_mf8(mfloat8_t const *ptr) | `ptr -> Xn` | `LD1R {Vt.16B},[Xn]` | `Vt.16B -> result` | `A64` | +| void vstl1_lane_u64(
     uint64_t *ptr,
     uint64x1_t val,
     const int lane)
| `val -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `STL1 {Vt.D}[lane],[Xn]` | | `A64` | +| void vstl1q_lane_u64(
     uint64_t *ptr,
     uint64x2_t val,
     const int lane)
| `val -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `STL1 {Vt.D}[lane],[Xn]` | | `A64` | +| void vstl1_lane_s64(
     int64_t *ptr,
     int64x1_t val,
     const int lane)
| `val -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `STL1 {Vt.D}[lane],[Xn]` | | `A64` | +| void vstl1q_lane_s64(
     int64_t *ptr,
     int64x2_t val,
     const int lane)
| `val -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `STL1 {Vt.D}[lane],[Xn]` | | `A64` | +| void vstl1_lane_f64(
     float64_t *ptr,
     float64x1_t val,
     const int lane)
| `val -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `STL1 {Vt.D}[lane],[Xn]` | | `A64` | +| void vstl1q_lane_f64(
     float64_t *ptr,
     float64x2_t val,
     const int lane)
| `val -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `STL1 {Vt.D}[lane],[Xn]` | | `A64` | +| void vstl1_lane_p64(
     poly64_t *ptr,
     poly64x1_t val,
     const int lane)
| `val -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `STL1 {Vt.D}[lane],[Xn]` | | `A64` | +| void vstl1q_lane_p64(
     poly64_t *ptr,
     poly64x2_t val,
     const int lane)
| `val -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `STL1 {Vt.D}[lane],[Xn]` | | `A64` | +| int8x8x2_t vld2_s8(int8_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| int8x16x2_t vld2q_s8(int8_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| int16x4x2_t vld2_s16(int16_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| int16x8x2_t vld2q_s16(int16_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int32x2x2_t vld2_s32(int32_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.2S - Vt2.2S},[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| int32x4x2_t vld2q_s32(int32_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.4S - Vt2.4S},[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| uint8x8x2_t vld2_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| uint8x16x2_t vld2q_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| uint16x4x2_t vld2_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| uint16x8x2_t vld2q_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| uint32x2x2_t vld2_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.2S - Vt2.2S},[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| uint32x4x2_t vld2q_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.4S - Vt2.4S},[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| float16x4x2_t vld2_f16(float16_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| float16x8x2_t vld2q_f16(float16_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| float32x2x2_t vld2_f32(float32_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.2S - Vt2.2S},[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| float32x4x2_t vld2q_f32(float32_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.4S - Vt2.4S},[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| poly8x8x2_t vld2_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| poly8x16x2_t vld2q_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| poly16x4x2_t vld2_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| poly16x8x2_t vld2q_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int64x1x2_t vld2_s64(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | +| uint64x1x2_t vld2_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | +| poly64x1x2_t vld2_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A32/A64` | +| int64x2x2_t vld2q_s64(int64_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| uint64x2x2_t vld2q_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| poly64x2x2_t vld2q_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| float64x1x2_t vld2_f64(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| float64x2x2_t vld2q_f64(float64_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| mfloat8x8x2_t vld2_mf8(mfloat8_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `A64` | +| mfloat8x16x2_t vld2q_mf8(mfloat8_t const *ptr) | `ptr -> Xn` | `LD2 {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| int8x8x3_t vld3_s8(int8_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| int8x16x3_t vld3q_s8(int8_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| int16x4x3_t vld3_s16(int16_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| int16x8x3_t vld3q_s16(int16_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int32x2x3_t vld3_s32(int32_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.2S - Vt3.2S},[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| int32x4x3_t vld3q_s32(int32_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.4S - Vt3.4S},[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| uint8x8x3_t vld3_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| uint8x16x3_t vld3q_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| uint16x4x3_t vld3_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| uint16x8x3_t vld3q_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| uint32x2x3_t vld3_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.2S - Vt3.2S},[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| uint32x4x3_t vld3q_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.4S - Vt3.4S},[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| float16x4x3_t vld3_f16(float16_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| float16x8x3_t vld3q_f16(float16_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| float32x2x3_t vld3_f32(float32_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.2S - Vt3.2S},[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| float32x4x3_t vld3q_f32(float32_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.4S - Vt3.4S},[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| poly8x8x3_t vld3_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| poly8x16x3_t vld3q_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| poly16x4x3_t vld3_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| poly16x8x3_t vld3q_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int64x1x3_t vld3_s64(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | +| uint64x1x3_t vld3_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | +| poly64x1x3_t vld3_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A32/A64` | +| int64x2x3_t vld3q_s64(int64_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| uint64x2x3_t vld3q_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| poly64x2x3_t vld3q_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| float64x1x3_t vld3_f64(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| float64x2x3_t vld3q_f64(float64_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| mfloat8x8x3_t vld3_mf8(mfloat8_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `A64` | +| mfloat8x16x3_t vld3q_mf8(mfloat8_t const *ptr) | `ptr -> Xn` | `LD3 {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| int8x8x4_t vld4_s8(int8_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| int8x16x4_t vld4q_s8(int8_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| int16x4x4_t vld4_s16(int16_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| int16x8x4_t vld4q_s16(int16_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int32x2x4_t vld4_s32(int32_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.2S - Vt4.2S},[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| int32x4x4_t vld4q_s32(int32_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.4S - Vt4.4S},[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| uint8x8x4_t vld4_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| uint8x16x4_t vld4q_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| uint16x4x4_t vld4_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| uint16x8x4_t vld4q_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| uint32x2x4_t vld4_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.2S - Vt4.2S},[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| uint32x4x4_t vld4q_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.4S - Vt4.4S},[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| float16x4x4_t vld4_f16(float16_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| float16x8x4_t vld4q_f16(float16_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| float32x2x4_t vld4_f32(float32_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.2S - Vt4.2S},[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| float32x4x4_t vld4q_f32(float32_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.4S - Vt4.4S},[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| poly8x8x4_t vld4_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| poly8x16x4_t vld4q_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| poly16x4x4_t vld4_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| poly16x8x4_t vld4q_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int64x1x4_t vld4_s64(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | +| uint64x1x4_t vld4_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | +| poly64x1x4_t vld4_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A32/A64` | +| int64x2x4_t vld4q_s64(int64_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| uint64x2x4_t vld4q_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| poly64x2x4_t vld4q_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| float64x1x4_t vld4_f64(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| float64x2x4_t vld4q_f64(float64_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| mfloat8x8x4_t vld4_mf8(mfloat8_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `A64` | +| mfloat8x16x4_t vld4q_mf8(mfloat8_t const *ptr) | `ptr -> Xn` | `LD4 {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| int8x8x2_t vld2_dup_s8(int8_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| int8x16x2_t vld2q_dup_s8(int8_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| int16x4x2_t vld2_dup_s16(int16_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| int16x8x2_t vld2q_dup_s16(int16_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int32x2x2_t vld2_dup_s32(int32_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.2S - Vt2.2S},[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| int32x4x2_t vld2q_dup_s32(int32_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.4S - Vt2.4S},[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| uint8x8x2_t vld2_dup_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| uint8x16x2_t vld2q_dup_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| uint16x4x2_t vld2_dup_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| uint16x8x2_t vld2q_dup_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| uint32x2x2_t vld2_dup_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.2S - Vt2.2S},[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| uint32x4x2_t vld2q_dup_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.4S - Vt2.4S},[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| float16x4x2_t vld2_dup_f16(float16_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| float16x8x2_t vld2q_dup_f16(float16_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| float32x2x2_t vld2_dup_f32(float32_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.2S - Vt2.2S},[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| float32x4x2_t vld2q_dup_f32(float32_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.4S - Vt2.4S},[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| poly8x8x2_t vld2_dup_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| poly8x16x2_t vld2q_dup_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| poly16x4x2_t vld2_dup_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| poly16x8x2_t vld2q_dup_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int64x1x2_t vld2_dup_s64(int64_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | +| uint64x1x2_t vld2_dup_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | +| poly64x1x2_t vld2_dup_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A32/A64` | +| int64x2x2_t vld2q_dup_s64(int64_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| uint64x2x2_t vld2q_dup_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| poly64x2x2_t vld2q_dup_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| float64x1x2_t vld2_dup_f64(float64_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| float64x2x2_t vld2q_dup_f64(float64_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| mfloat8x8x2_t vld2_dup_mf8(mfloat8_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `A64` | +| mfloat8x16x2_t vld2q_dup_mf8(mfloat8_t const *ptr) | `ptr -> Xn` | `LD2R {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| int8x8x3_t vld3_dup_s8(int8_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| int8x16x3_t vld3q_dup_s8(int8_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| int16x4x3_t vld3_dup_s16(int16_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| int16x8x3_t vld3q_dup_s16(int16_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int32x2x3_t vld3_dup_s32(int32_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.2S - Vt3.2S},[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| int32x4x3_t vld3q_dup_s32(int32_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.4S - Vt3.4S},[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| uint8x8x3_t vld3_dup_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| uint8x16x3_t vld3q_dup_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| uint16x4x3_t vld3_dup_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| uint16x8x3_t vld3q_dup_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| uint32x2x3_t vld3_dup_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.2S - Vt3.2S},[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| uint32x4x3_t vld3q_dup_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.4S - Vt3.4S},[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| float16x4x3_t vld3_dup_f16(float16_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| float16x8x3_t vld3q_dup_f16(float16_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| float32x2x3_t vld3_dup_f32(float32_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.2S - Vt3.2S},[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| float32x4x3_t vld3q_dup_f32(float32_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.4S - Vt3.4S},[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| poly8x8x3_t vld3_dup_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| poly8x16x3_t vld3q_dup_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| poly16x4x3_t vld3_dup_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| poly16x8x3_t vld3q_dup_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int64x1x3_t vld3_dup_s64(int64_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | +| uint64x1x3_t vld3_dup_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | +| poly64x1x3_t vld3_dup_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A32/A64` | +| int64x2x3_t vld3q_dup_s64(int64_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| uint64x2x3_t vld3q_dup_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| poly64x2x3_t vld3q_dup_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| float64x1x3_t vld3_dup_f64(float64_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| float64x2x3_t vld3q_dup_f64(float64_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| mfloat8x8x3_t vld3_dup_mf8(mfloat8_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `A64` | +| mfloat8x16x3_t vld3q_dup_mf8(mfloat8_t const *ptr) | `ptr -> Xn` | `LD3R {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| int8x8x4_t vld4_dup_s8(int8_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| int8x16x4_t vld4q_dup_s8(int8_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| int16x4x4_t vld4_dup_s16(int16_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| int16x8x4_t vld4q_dup_s16(int16_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int32x2x4_t vld4_dup_s32(int32_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.2S - Vt4.2S},[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| int32x4x4_t vld4q_dup_s32(int32_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.4S - Vt4.4S},[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| uint8x8x4_t vld4_dup_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| uint8x16x4_t vld4q_dup_u8(uint8_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| uint16x4x4_t vld4_dup_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| uint16x8x4_t vld4q_dup_u16(uint16_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| uint32x2x4_t vld4_dup_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.2S - Vt4.2S},[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| uint32x4x4_t vld4q_dup_u32(uint32_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.4S - Vt4.4S},[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| float16x4x4_t vld4_dup_f16(float16_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| float16x8x4_t vld4q_dup_f16(float16_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| float32x2x4_t vld4_dup_f32(float32_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.2S - Vt4.2S},[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| float32x4x4_t vld4q_dup_f32(float32_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.4S - Vt4.4S},[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| poly8x8x4_t vld4_dup_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| poly8x16x4_t vld4q_dup_p8(poly8_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| poly16x4x4_t vld4_dup_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| poly16x8x4_t vld4q_dup_p16(poly16_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int64x1x4_t vld4_dup_s64(int64_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | +| uint64x1x4_t vld4_dup_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | +| poly64x1x4_t vld4_dup_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A32/A64` | +| int64x2x4_t vld4q_dup_s64(int64_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| uint64x2x4_t vld4q_dup_u64(uint64_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| poly64x2x4_t vld4q_dup_p64(poly64_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| float64x1x4_t vld4_dup_f64(float64_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| float64x2x4_t vld4q_dup_f64(float64_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| mfloat8x8x4_t vld4_dup_mf8(mfloat8_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `A64` | +| mfloat8x16x4_t vld4q_dup_mf8(mfloat8_t const *ptr) | `ptr -> Xn` | `LD4R {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| int16x4x2_t vld2_lane_s16(
     int16_t const *ptr,
     int16x4x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD2 {Vt.h - Vt2.h}[lane],[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| int16x8x2_t vld2q_lane_s16(
     int16_t const *ptr,
     int16x8x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD2 {Vt.h - Vt2.h}[lane],[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int32x2x2_t vld2_lane_s32(
     int32_t const *ptr,
     int32x2x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.2S`
`src.val[0] -> Vt.2S`
`0 <= lane <= 1` | `LD2 {Vt.s - Vt2.s}[lane],[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| int32x4x2_t vld2q_lane_s32(
     int32_t const *ptr,
     int32x4x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.4S`
`src.val[0] -> Vt.4S`
`0 <= lane <= 3` | `LD2 {Vt.s - Vt2.s}[lane],[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| uint16x4x2_t vld2_lane_u16(
     uint16_t const *ptr,
     uint16x4x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD2 {Vt.h - Vt2.h}[lane],[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| uint16x8x2_t vld2q_lane_u16(
     uint16_t const *ptr,
     uint16x8x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD2 {Vt.h - Vt2.h}[lane],[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| uint32x2x2_t vld2_lane_u32(
     uint32_t const *ptr,
     uint32x2x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.2S`
`src.val[0] -> Vt.2S`
`0 <= lane <= 1` | `LD2 {Vt.s - Vt2.s}[lane],[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| uint32x4x2_t vld2q_lane_u32(
     uint32_t const *ptr,
     uint32x4x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.4S`
`src.val[0] -> Vt.4S`
`0 <= lane <= 3` | `LD2 {Vt.s - Vt2.s}[lane],[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| float16x4x2_t vld2_lane_f16(
     float16_t const *ptr,
     float16x4x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD2 {Vt.h - Vt2.h}[lane],[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| float16x8x2_t vld2q_lane_f16(
     float16_t const *ptr,
     float16x8x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD2 {Vt.h - Vt2.h}[lane],[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| float32x2x2_t vld2_lane_f32(
     float32_t const *ptr,
     float32x2x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.2S`
`src.val[0] -> Vt.2S`
`0 <= lane <= 1` | `LD2 {Vt.s - Vt2.s}[lane],[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| float32x4x2_t vld2q_lane_f32(
     float32_t const *ptr,
     float32x4x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.4S`
`src.val[0] -> Vt.4S`
`0 <= lane <= 3` | `LD2 {Vt.s - Vt2.s}[lane],[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| poly16x4x2_t vld2_lane_p16(
     poly16_t const *ptr,
     poly16x4x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD2 {Vt.h - Vt2.h}[lane],[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| poly16x8x2_t vld2q_lane_p16(
     poly16_t const *ptr,
     poly16x8x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD2 {Vt.h - Vt2.h}[lane],[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int8x8x2_t vld2_lane_s8(
     int8_t const *ptr,
     int8x8x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD2 {Vt.b - Vt2.b}[lane],[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| uint8x8x2_t vld2_lane_u8(
     uint8_t const *ptr,
     uint8x8x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD2 {Vt.b - Vt2.b}[lane],[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| poly8x8x2_t vld2_lane_p8(
     poly8_t const *ptr,
     poly8x8x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD2 {Vt.b - Vt2.b}[lane],[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| int8x16x2_t vld2q_lane_s8(
     int8_t const *ptr,
     int8x16x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD2 {Vt.b - Vt2.b}[lane],[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| uint8x16x2_t vld2q_lane_u8(
     uint8_t const *ptr,
     uint8x16x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD2 {Vt.b - Vt2.b}[lane],[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| poly8x16x2_t vld2q_lane_p8(
     poly8_t const *ptr,
     poly8x16x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD2 {Vt.b - Vt2.b}[lane],[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| int64x1x2_t vld2_lane_s64(
     int64_t const *ptr,
     int64x1x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD2 {Vt.d - Vt2.d}[lane],[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| int64x2x2_t vld2q_lane_s64(
     int64_t const *ptr,
     int64x2x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD2 {Vt.d - Vt2.d}[lane],[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| uint64x1x2_t vld2_lane_u64(
     uint64_t const *ptr,
     uint64x1x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD2 {Vt.d - Vt2.d}[lane],[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| uint64x2x2_t vld2q_lane_u64(
     uint64_t const *ptr,
     uint64x2x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD2 {Vt.d - Vt2.d}[lane],[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| poly64x1x2_t vld2_lane_p64(
     poly64_t const *ptr,
     poly64x1x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD2 {Vt.d - Vt2.d}[lane],[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| poly64x2x2_t vld2q_lane_p64(
     poly64_t const *ptr,
     poly64x2x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD2 {Vt.d - Vt2.d}[lane],[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| float64x1x2_t vld2_lane_f64(
     float64_t const *ptr,
     float64x1x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD2 {Vt.d - Vt2.d}[lane],[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| float64x2x2_t vld2q_lane_f64(
     float64_t const *ptr,
     float64x2x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD2 {Vt.d - Vt2.d}[lane],[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| mfloat8x8x2_t vld2_lane_mf8(
     mfloat8_t const *ptr,
     mfloat8x8x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD2 {Vt.b - Vt2.b}[lane],[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `A64` | +| mfloat8x16x2_t vld2q_lane_mf8(
     mfloat8_t const *ptr,
     mfloat8x16x2_t src,
     const int lane)
| `ptr -> Xn`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD2 {Vt.b - Vt2.b}[lane],[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| int16x4x3_t vld3_lane_s16(
     int16_t const *ptr,
     int16x4x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.4H`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD3 {Vt.h - Vt3.h}[lane],[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| int16x8x3_t vld3q_lane_s16(
     int16_t const *ptr,
     int16x8x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.8H`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD3 {Vt.h - Vt3.h}[lane],[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int32x2x3_t vld3_lane_s32(
     int32_t const *ptr,
     int32x2x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.2S`
`src.val[1] -> Vt2.2S`
`src.val[0] -> Vt.2S`
`0 <= lane <= 1` | `LD3 {Vt.s - Vt3.s}[lane],[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| int32x4x3_t vld3q_lane_s32(
     int32_t const *ptr,
     int32x4x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.4S`
`src.val[1] -> Vt2.4S`
`src.val[0] -> Vt.4S`
`0 <= lane <= 3` | `LD3 {Vt.s - Vt3.s}[lane],[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| uint16x4x3_t vld3_lane_u16(
     uint16_t const *ptr,
     uint16x4x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.4H`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD3 {Vt.h - Vt3.h}[lane],[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| uint16x8x3_t vld3q_lane_u16(
     uint16_t const *ptr,
     uint16x8x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.8H`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD3 {Vt.h - Vt3.h}[lane],[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| uint32x2x3_t vld3_lane_u32(
     uint32_t const *ptr,
     uint32x2x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.2S`
`src.val[1] -> Vt2.2S`
`src.val[0] -> Vt.2S`
`0 <= lane <= 1` | `LD3 {Vt.s - Vt3.s}[lane],[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| uint32x4x3_t vld3q_lane_u32(
     uint32_t const *ptr,
     uint32x4x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.4S`
`src.val[1] -> Vt2.4S`
`src.val[0] -> Vt.4S`
`0 <= lane <= 3` | `LD3 {Vt.s - Vt3.s}[lane],[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| float16x4x3_t vld3_lane_f16(
     float16_t const *ptr,
     float16x4x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.4H`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD3 {Vt.h - Vt3.h}[lane],[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| float16x8x3_t vld3q_lane_f16(
     float16_t const *ptr,
     float16x8x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.8H`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD3 {Vt.h - Vt3.h}[lane],[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| float32x2x3_t vld3_lane_f32(
     float32_t const *ptr,
     float32x2x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.2S`
`src.val[1] -> Vt2.2S`
`src.val[0] -> Vt.2S`
`0 <= lane <= 1` | `LD3 {Vt.s - Vt3.s}[lane],[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| float32x4x3_t vld3q_lane_f32(
     float32_t const *ptr,
     float32x4x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.4S`
`src.val[1] -> Vt2.4S`
`src.val[0] -> Vt.4S`
`0 <= lane <= 3` | `LD3 {Vt.s - Vt3.s}[lane],[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| poly16x4x3_t vld3_lane_p16(
     poly16_t const *ptr,
     poly16x4x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.4H`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD3 {Vt.h - Vt3.h}[lane],[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| poly16x8x3_t vld3q_lane_p16(
     poly16_t const *ptr,
     poly16x8x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.8H`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD3 {Vt.h - Vt3.h}[lane],[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int8x8x3_t vld3_lane_s8(
     int8_t const *ptr,
     int8x8x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.8B`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD3 {Vt.b - Vt3.b}[lane],[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| uint8x8x3_t vld3_lane_u8(
     uint8_t const *ptr,
     uint8x8x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.8B`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD3 {Vt.b - Vt3.b}[lane],[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| poly8x8x3_t vld3_lane_p8(
     poly8_t const *ptr,
     poly8x8x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.8B`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD3 {Vt.b - Vt3.b}[lane],[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| int8x16x3_t vld3q_lane_s8(
     int8_t const *ptr,
     int8x16x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.16B`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD3 {Vt.b - Vt3.b}[lane],[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| uint8x16x3_t vld3q_lane_u8(
     uint8_t const *ptr,
     uint8x16x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.16B`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD3 {Vt.b - Vt3.b}[lane],[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| poly8x16x3_t vld3q_lane_p8(
     poly8_t const *ptr,
     poly8x16x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.16B`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD3 {Vt.b - Vt3.b}[lane],[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| int64x1x3_t vld3_lane_s64(
     int64_t const *ptr,
     int64x1x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.1D`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD3 {Vt.d - Vt3.d}[lane],[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| int64x2x3_t vld3q_lane_s64(
     int64_t const *ptr,
     int64x2x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.2D`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD3 {Vt.d - Vt3.d}[lane],[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| uint64x1x3_t vld3_lane_u64(
     uint64_t const *ptr,
     uint64x1x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.1D`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD3 {Vt.d - Vt3.d}[lane],[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| uint64x2x3_t vld3q_lane_u64(
     uint64_t const *ptr,
     uint64x2x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.2D`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD3 {Vt.d - Vt3.d}[lane],[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| poly64x1x3_t vld3_lane_p64(
     poly64_t const *ptr,
     poly64x1x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.1D`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD3 {Vt.d - Vt3.d}[lane],[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| poly64x2x3_t vld3q_lane_p64(
     poly64_t const *ptr,
     poly64x2x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.2D`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD3 {Vt.d - Vt3.d}[lane],[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| float64x1x3_t vld3_lane_f64(
     float64_t const *ptr,
     float64x1x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.1D`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD3 {Vt.d - Vt3.d}[lane],[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| float64x2x3_t vld3q_lane_f64(
     float64_t const *ptr,
     float64x2x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.2D`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD3 {Vt.d - Vt3.d}[lane],[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| mfloat8x8x3_t vld3_lane_mf8(
     mfloat8_t const *ptr,
     mfloat8x8x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.8B`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD3 {Vt.b - Vt3.b}[lane],[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `A64` | +| mfloat8x16x3_t vld3q_lane_mf8(
     mfloat8_t const *ptr,
     mfloat8x16x3_t src,
     const int lane)
| `ptr -> Xn`
`src.val[2] -> Vt3.16B`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD3 {Vt.b - Vt3.b}[lane],[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| int16x4x4_t vld4_lane_s16(
     int16_t const *ptr,
     int16x4x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.4H`
`src.val[2] -> Vt3.4H`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD4 {Vt.h - Vt4.h}[lane],[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| int16x8x4_t vld4q_lane_s16(
     int16_t const *ptr,
     int16x8x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.8H`
`src.val[2] -> Vt3.8H`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD4 {Vt.h - Vt4.h}[lane],[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int32x2x4_t vld4_lane_s32(
     int32_t const *ptr,
     int32x2x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.2S`
`src.val[2] -> Vt3.2S`
`src.val[1] -> Vt2.2S`
`src.val[0] -> Vt.2S`
`0 <= lane <= 1` | `LD4 {Vt.s - Vt4.s}[lane],[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| int32x4x4_t vld4q_lane_s32(
     int32_t const *ptr,
     int32x4x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.4S`
`src.val[2] -> Vt3.4S`
`src.val[1] -> Vt2.4S`
`src.val[0] -> Vt.4S`
`0 <= lane <= 3` | `LD4 {Vt.s - Vt4.s}[lane],[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| uint16x4x4_t vld4_lane_u16(
     uint16_t const *ptr,
     uint16x4x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.4H`
`src.val[2] -> Vt3.4H`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD4 {Vt.h - Vt4.h}[lane],[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| uint16x8x4_t vld4q_lane_u16(
     uint16_t const *ptr,
     uint16x8x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.8H`
`src.val[2] -> Vt3.8H`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD4 {Vt.h - Vt4.h}[lane],[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| uint32x2x4_t vld4_lane_u32(
     uint32_t const *ptr,
     uint32x2x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.2S`
`src.val[2] -> Vt3.2S`
`src.val[1] -> Vt2.2S`
`src.val[0] -> Vt.2S`
`0 <= lane <= 1` | `LD4 {Vt.s - Vt4.s}[lane],[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| uint32x4x4_t vld4q_lane_u32(
     uint32_t const *ptr,
     uint32x4x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.4S`
`src.val[2] -> Vt3.4S`
`src.val[1] -> Vt2.4S`
`src.val[0] -> Vt.4S`
`0 <= lane <= 3` | `LD4 {Vt.s - Vt4.s}[lane],[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| float16x4x4_t vld4_lane_f16(
     float16_t const *ptr,
     float16x4x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.4H`
`src.val[2] -> Vt3.4H`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD4 {Vt.h - Vt4.h}[lane],[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| float16x8x4_t vld4q_lane_f16(
     float16_t const *ptr,
     float16x8x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.8H`
`src.val[2] -> Vt3.8H`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD4 {Vt.h - Vt4.h}[lane],[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| float32x2x4_t vld4_lane_f32(
     float32_t const *ptr,
     float32x2x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.2S`
`src.val[2] -> Vt3.2S`
`src.val[1] -> Vt2.2S`
`src.val[0] -> Vt.2S`
`0 <= lane <= 1` | `LD4 {Vt.s - Vt4.s}[lane],[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| float32x4x4_t vld4q_lane_f32(
     float32_t const *ptr,
     float32x4x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.4S`
`src.val[2] -> Vt3.4S`
`src.val[1] -> Vt2.4S`
`src.val[0] -> Vt.4S`
`0 <= lane <= 3` | `LD4 {Vt.s - Vt4.s}[lane],[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| poly16x4x4_t vld4_lane_p16(
     poly16_t const *ptr,
     poly16x4x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.4H`
`src.val[2] -> Vt3.4H`
`src.val[1] -> Vt2.4H`
`src.val[0] -> Vt.4H`
`0 <= lane <= 3` | `LD4 {Vt.h - Vt4.h}[lane],[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| poly16x8x4_t vld4q_lane_p16(
     poly16_t const *ptr,
     poly16x8x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.8H`
`src.val[2] -> Vt3.8H`
`src.val[1] -> Vt2.8H`
`src.val[0] -> Vt.8H`
`0 <= lane <= 7` | `LD4 {Vt.h - Vt4.h}[lane],[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int8x8x4_t vld4_lane_s8(
     int8_t const *ptr,
     int8x8x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.8B`
`src.val[2] -> Vt3.8B`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD4 {Vt.b - Vt4.b}[lane],[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| uint8x8x4_t vld4_lane_u8(
     uint8_t const *ptr,
     uint8x8x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.8B`
`src.val[2] -> Vt3.8B`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD4 {Vt.b - Vt4.b}[lane],[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| poly8x8x4_t vld4_lane_p8(
     poly8_t const *ptr,
     poly8x8x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.8B`
`src.val[2] -> Vt3.8B`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD4 {Vt.b - Vt4.b}[lane],[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| int8x16x4_t vld4q_lane_s8(
     int8_t const *ptr,
     int8x16x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.16B`
`src.val[2] -> Vt3.16B`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD4 {Vt.b - Vt4.b}[lane],[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| uint8x16x4_t vld4q_lane_u8(
     uint8_t const *ptr,
     uint8x16x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.16B`
`src.val[2] -> Vt3.16B`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD4 {Vt.b - Vt4.b}[lane],[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| poly8x16x4_t vld4q_lane_p8(
     poly8_t const *ptr,
     poly8x16x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.16B`
`src.val[2] -> Vt3.16B`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD4 {Vt.b - Vt4.b}[lane],[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| int64x1x4_t vld4_lane_s64(
     int64_t const *ptr,
     int64x1x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.1D`
`src.val[2] -> Vt3.1D`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD4 {Vt.d - Vt4.d}[lane],[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| int64x2x4_t vld4q_lane_s64(
     int64_t const *ptr,
     int64x2x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.2D`
`src.val[2] -> Vt3.2D`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD4 {Vt.d - Vt4.d}[lane],[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| uint64x1x4_t vld4_lane_u64(
     uint64_t const *ptr,
     uint64x1x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.1D`
`src.val[2] -> Vt3.1D`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD4 {Vt.d - Vt4.d}[lane],[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| uint64x2x4_t vld4q_lane_u64(
     uint64_t const *ptr,
     uint64x2x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.2D`
`src.val[2] -> Vt3.2D`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD4 {Vt.d - Vt4.d}[lane],[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| poly64x1x4_t vld4_lane_p64(
     poly64_t const *ptr,
     poly64x1x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.1D`
`src.val[2] -> Vt3.1D`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD4 {Vt.d - Vt4.d}[lane],[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| poly64x2x4_t vld4q_lane_p64(
     poly64_t const *ptr,
     poly64x2x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.2D`
`src.val[2] -> Vt3.2D`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD4 {Vt.d - Vt4.d}[lane],[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| float64x1x4_t vld4_lane_f64(
     float64_t const *ptr,
     float64x1x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.1D`
`src.val[2] -> Vt3.1D`
`src.val[1] -> Vt2.1D`
`src.val[0] -> Vt.1D`
`0 <= lane <= 0` | `LD4 {Vt.d - Vt4.d}[lane],[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| float64x2x4_t vld4q_lane_f64(
     float64_t const *ptr,
     float64x2x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.2D`
`src.val[2] -> Vt3.2D`
`src.val[1] -> Vt2.2D`
`src.val[0] -> Vt.2D`
`0 <= lane <= 1` | `LD4 {Vt.d - Vt4.d}[lane],[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| mfloat8x8x4_t vld4_lane_mf8(
     mfloat8_t const *ptr,
     mfloat8x8x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.8B`
`src.val[2] -> Vt3.8B`
`src.val[1] -> Vt2.8B`
`src.val[0] -> Vt.8B`
`0 <= lane <= 7` | `LD4 {Vt.b - Vt4.b}[lane],[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `A64` | +| mfloat8x16x4_t vld4q_lane_mf8(
     mfloat8_t const *ptr,
     mfloat8x16x4_t src,
     const int lane)
| `ptr -> Xn`
`src.val[3] -> Vt4.16B`
`src.val[2] -> Vt3.16B`
`src.val[1] -> Vt2.16B`
`src.val[0] -> Vt.16B`
`0 <= lane <= 15` | `LD4 {Vt.b - Vt4.b}[lane],[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| int8x8x2_t vld1_s8_x2(int8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| int8x16x2_t vld1q_s8_x2(int8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| int16x4x2_t vld1_s16_x2(int16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| int16x8x2_t vld1q_s16_x2(int16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int32x2x2_t vld1_s32_x2(int32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S - Vt2.2S},[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| int32x4x2_t vld1q_s32_x2(int32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S - Vt2.4S},[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| uint8x8x2_t vld1_u8_x2(uint8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| uint8x16x2_t vld1q_u8_x2(uint8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| uint16x4x2_t vld1_u16_x2(uint16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| uint16x8x2_t vld1q_u16_x2(uint16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| uint32x2x2_t vld1_u32_x2(uint32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S - Vt2.2S},[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| uint32x4x2_t vld1q_u32_x2(uint32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S - Vt2.4S},[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| float16x4x2_t vld1_f16_x2(float16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| float16x8x2_t vld1q_f16_x2(float16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| float32x2x2_t vld1_f32_x2(float32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S - Vt2.2S},[Xn]` | `Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| float32x4x2_t vld1q_f32_x2(float32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S - Vt2.4S},[Xn]` | `Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| poly8x8x2_t vld1_p8_x2(poly8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| poly8x16x2_t vld1q_p8_x2(poly8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| poly16x4x2_t vld1_p16_x2(poly16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt2.4H},[Xn]` | `Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| poly16x8x2_t vld1q_p16_x2(poly16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt2.8H},[Xn]` | `Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int64x1x2_t vld1_s64_x2(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | +| uint64x1x2_t vld1_u64_x2(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | +| poly64x1x2_t vld1_p64_x2(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A32/A64` | +| int64x2x2_t vld1q_s64_x2(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `v7/A32/A64` | +| uint64x2x2_t vld1q_u64_x2(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `v7/A32/A64` | +| poly64x2x2_t vld1q_p64_x2(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A32/A64` | +| float64x1x2_t vld1_f64_x2(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt2.1D},[Xn]` | `Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| float64x2x2_t vld1q_f64_x2(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt2.2D},[Xn]` | `Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| mfloat8x8x2_t vld1_mf8_x2(mfloat8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt2.8B},[Xn]` | `Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `A64` | +| mfloat8x16x2_t vld1q_mf8_x2(mfloat8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt2.16B},[Xn]` | `Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| int8x8x3_t vld1_s8_x3(int8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| int8x16x3_t vld1q_s8_x3(int8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| int16x4x3_t vld1_s16_x3(int16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| int16x8x3_t vld1q_s16_x3(int16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int32x2x3_t vld1_s32_x3(int32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S - Vt3.2S},[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| int32x4x3_t vld1q_s32_x3(int32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S - Vt3.4S},[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| uint8x8x3_t vld1_u8_x3(uint8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| uint8x16x3_t vld1q_u8_x3(uint8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| uint16x4x3_t vld1_u16_x3(uint16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| uint16x8x3_t vld1q_u16_x3(uint16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| uint32x2x3_t vld1_u32_x3(uint32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S - Vt3.2S},[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| uint32x4x3_t vld1q_u32_x3(uint32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S - Vt3.4S},[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| float16x4x3_t vld1_f16_x3(float16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| float16x8x3_t vld1q_f16_x3(float16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| float32x2x3_t vld1_f32_x3(float32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S - Vt3.2S},[Xn]` | `Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| float32x4x3_t vld1q_f32_x3(float32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S - Vt3.4S},[Xn]` | `Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| poly8x8x3_t vld1_p8_x3(poly8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| poly8x16x3_t vld1q_p8_x3(poly8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| poly16x4x3_t vld1_p16_x3(poly16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt3.4H},[Xn]` | `Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| poly16x8x3_t vld1q_p16_x3(poly16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt3.8H},[Xn]` | `Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int64x1x3_t vld1_s64_x3(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | +| uint64x1x3_t vld1_u64_x3(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | +| poly64x1x3_t vld1_p64_x3(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A32/A64` | +| int64x2x3_t vld1q_s64_x3(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `v7/A32/A64` | +| uint64x2x3_t vld1q_u64_x3(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `v7/A32/A64` | +| poly64x2x3_t vld1q_p64_x3(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A32/A64` | +| float64x1x3_t vld1_f64_x3(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt3.1D},[Xn]` | `Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| float64x2x3_t vld1q_f64_x3(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt3.2D},[Xn]` | `Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| mfloat8x8x3_t vld1_mf8_x3(mfloat8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt3.8B},[Xn]` | `Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `A64` | +| mfloat8x16x3_t vld1q_mf8_x3(mfloat8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt3.16B},[Xn]` | `Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | +| int8x8x4_t vld1_s8_x4(int8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| int8x16x4_t vld1q_s8_x4(int8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| int16x4x4_t vld1_s16_x4(int16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| int16x8x4_t vld1q_s16_x4(int16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int32x2x4_t vld1_s32_x4(int32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S - Vt4.2S},[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| int32x4x4_t vld1q_s32_x4(int32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S - Vt4.4S},[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| uint8x8x4_t vld1_u8_x4(uint8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| uint8x16x4_t vld1q_u8_x4(uint8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| uint16x4x4_t vld1_u16_x4(uint16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| uint16x8x4_t vld1q_u16_x4(uint16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| uint32x2x4_t vld1_u32_x4(uint32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S - Vt4.2S},[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| uint32x4x4_t vld1q_u32_x4(uint32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S - Vt4.4S},[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| float16x4x4_t vld1_f16_x4(float16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| float16x8x4_t vld1q_f16_x4(float16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| float32x2x4_t vld1_f32_x4(float32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2S - Vt4.2S},[Xn]` | `Vt4.2S -> result.val[3]`
`Vt3.2S -> result.val[2]`
`Vt2.2S -> result.val[1]`
`Vt.2S -> result.val[0]` | `v7/A32/A64` | +| float32x4x4_t vld1q_f32_x4(float32_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4S - Vt4.4S},[Xn]` | `Vt4.4S -> result.val[3]`
`Vt3.4S -> result.val[2]`
`Vt2.4S -> result.val[1]`
`Vt.4S -> result.val[0]` | `v7/A32/A64` | +| poly8x8x4_t vld1_p8_x4(poly8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `v7/A32/A64` | +| poly8x16x4_t vld1q_p8_x4(poly8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `v7/A32/A64` | +| poly16x4x4_t vld1_p16_x4(poly16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.4H - Vt4.4H},[Xn]` | `Vt4.4H -> result.val[3]`
`Vt3.4H -> result.val[2]`
`Vt2.4H -> result.val[1]`
`Vt.4H -> result.val[0]` | `v7/A32/A64` | +| poly16x8x4_t vld1q_p16_x4(poly16_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8H - Vt4.8H},[Xn]` | `Vt4.8H -> result.val[3]`
`Vt3.8H -> result.val[2]`
`Vt2.8H -> result.val[1]`
`Vt.8H -> result.val[0]` | `v7/A32/A64` | +| int64x1x4_t vld1_s64_x4(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | +| uint64x1x4_t vld1_u64_x4(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `v7/A32/A64` | +| poly64x1x4_t vld1_p64_x4(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A32/A64` | +| int64x2x4_t vld1q_s64_x4(int64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `v7/A32/A64` | +| uint64x2x4_t vld1q_u64_x4(uint64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `v7/A32/A64` | +| poly64x2x4_t vld1q_p64_x4(poly64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A32/A64` | +| float64x1x4_t vld1_f64_x4(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.1D - Vt4.1D},[Xn]` | `Vt4.1D -> result.val[3]`
`Vt3.1D -> result.val[2]`
`Vt2.1D -> result.val[1]`
`Vt.1D -> result.val[0]` | `A64` | +| float64x2x4_t vld1q_f64_x4(float64_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.2D - Vt4.2D},[Xn]` | `Vt4.2D -> result.val[3]`
`Vt3.2D -> result.val[2]`
`Vt2.2D -> result.val[1]`
`Vt.2D -> result.val[0]` | `A64` | +| mfloat8x8x4_t vld1_mf8_x4(mfloat8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.8B - Vt4.8B},[Xn]` | `Vt4.8B -> result.val[3]`
`Vt3.8B -> result.val[2]`
`Vt2.8B -> result.val[1]`
`Vt.8B -> result.val[0]` | `A64` | +| mfloat8x16x4_t vld1q_mf8_x4(mfloat8_t const *ptr) | `ptr -> Xn` | `LD1 {Vt.16B - Vt4.16B},[Xn]` | `Vt4.16B -> result.val[3]`
`Vt3.16B -> result.val[2]`
`Vt2.16B -> result.val[1]`
`Vt.16B -> result.val[0]` | `A64` | #### Load @@ -4105,316 +4244,338 @@ The intrinsics in this section are guarded by the macro ``__ARM_NEON``. #### Stride -| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | -|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------|----------|---------------------------| -| void vst1_s8(
     int8_t *ptr,
     int8x8_t val)
| `val -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B},[Xn]` | | `v7/A32/A64` | -| void vst1q_s8(
     int8_t *ptr,
     int8x16_t val)
| `val -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B},[Xn]` | | `v7/A32/A64` | -| void vst1_s16(
     int16_t *ptr,
     int16x4_t val)
| `val -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H},[Xn]` | | `v7/A32/A64` | -| void vst1q_s16(
     int16_t *ptr,
     int16x8_t val)
| `val -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H},[Xn]` | | `v7/A32/A64` | -| void vst1_s32(
     int32_t *ptr,
     int32x2_t val)
| `val -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S},[Xn]` | | `v7/A32/A64` | -| void vst1q_s32(
     int32_t *ptr,
     int32x4_t val)
| `val -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S},[Xn]` | | `v7/A32/A64` | -| void vst1_s64(
     int64_t *ptr,
     int64x1_t val)
| `val -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D},[Xn]` | | `v7/A32/A64` | -| void vst1q_s64(
     int64_t *ptr,
     int64x2_t val)
| `val -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D},[Xn]` | | `v7/A32/A64` | -| void vst1_u8(
     uint8_t *ptr,
     uint8x8_t val)
| `val -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B},[Xn]` | | `v7/A32/A64` | -| void vst1q_u8(
     uint8_t *ptr,
     uint8x16_t val)
| `val -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B},[Xn]` | | `v7/A32/A64` | -| void vst1_u16(
     uint16_t *ptr,
     uint16x4_t val)
| `val -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H},[Xn]` | | `v7/A32/A64` | -| void vst1q_u16(
     uint16_t *ptr,
     uint16x8_t val)
| `val -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H},[Xn]` | | `v7/A32/A64` | -| void vst1_u32(
     uint32_t *ptr,
     uint32x2_t val)
| `val -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S},[Xn]` | | `v7/A32/A64` | -| void vst1q_u32(
     uint32_t *ptr,
     uint32x4_t val)
| `val -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S},[Xn]` | | `v7/A32/A64` | -| void vst1_u64(
     uint64_t *ptr,
     uint64x1_t val)
| `val -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D},[Xn]` | | `v7/A32/A64` | -| void vst1q_u64(
     uint64_t *ptr,
     uint64x2_t val)
| `val -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D},[Xn]` | | `v7/A32/A64` | -| void vst1_p64(
     poly64_t *ptr,
     poly64x1_t val)
| `val -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D},[Xn]` | | `A32/A64` | -| void vst1q_p64(
     poly64_t *ptr,
     poly64x2_t val)
| `val -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D},[Xn]` | | `A32/A64` | -| void vst1_f16(
     float16_t *ptr,
     float16x4_t val)
| `val -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H},[Xn]` | | `v7/A32/A64` | -| void vst1q_f16(
     float16_t *ptr,
     float16x8_t val)
| `val -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H},[Xn]` | | `v7/A32/A64` | -| void vst1_f32(
     float32_t *ptr,
     float32x2_t val)
| `val -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S},[Xn]` | | `v7/A32/A64` | -| void vst1q_f32(
     float32_t *ptr,
     float32x4_t val)
| `val -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S},[Xn]` | | `v7/A32/A64` | -| void vst1_p8(
     poly8_t *ptr,
     poly8x8_t val)
| `val -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B},[Xn]` | | `v7/A32/A64` | -| void vst1q_p8(
     poly8_t *ptr,
     poly8x16_t val)
| `val -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B},[Xn]` | | `v7/A32/A64` | -| void vst1_p16(
     poly16_t *ptr,
     poly16x4_t val)
| `val -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H},[Xn]` | | `v7/A32/A64` | -| void vst1q_p16(
     poly16_t *ptr,
     poly16x8_t val)
| `val -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H},[Xn]` | | `v7/A32/A64` | -| void vst1_f64(
     float64_t *ptr,
     float64x1_t val)
| `val -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D},[Xn]` | | `A64` | -| void vst1q_f64(
     float64_t *ptr,
     float64x2_t val)
| `val -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D},[Xn]` | | `A64` | -| void vst1_lane_s8(
     int8_t *ptr,
     int8x8_t val,
     const int lane)
| `val -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST1 {Vt.b}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1q_lane_s8(
     int8_t *ptr,
     int8x16_t val,
     const int lane)
| `val -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST1 {Vt.b}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1_lane_s16(
     int16_t *ptr,
     int16x4_t val,
     const int lane)
| `val -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST1 {Vt.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1q_lane_s16(
     int16_t *ptr,
     int16x8_t val,
     const int lane)
| `val -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST1 {Vt.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1_lane_s32(
     int32_t *ptr,
     int32x2_t val,
     const int lane)
| `val -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST1 {Vt.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1q_lane_s32(
     int32_t *ptr,
     int32x4_t val,
     const int lane)
| `val -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST1 {Vt.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1_lane_s64(
     int64_t *ptr,
     int64x1_t val,
     const int lane)
| `val -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST1 {Vt.d}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1q_lane_s64(
     int64_t *ptr,
     int64x2_t val,
     const int lane)
| `val -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST1 {Vt.d}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1_lane_u8(
     uint8_t *ptr,
     uint8x8_t val,
     const int lane)
| `val -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST1 {Vt.b}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1q_lane_u8(
     uint8_t *ptr,
     uint8x16_t val,
     const int lane)
| `val -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST1 {Vt.b}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1_lane_u16(
     uint16_t *ptr,
     uint16x4_t val,
     const int lane)
| `val -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST1 {Vt.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1q_lane_u16(
     uint16_t *ptr,
     uint16x8_t val,
     const int lane)
| `val -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST1 {Vt.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1_lane_u32(
     uint32_t *ptr,
     uint32x2_t val,
     const int lane)
| `val -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST1 {Vt.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1q_lane_u32(
     uint32_t *ptr,
     uint32x4_t val,
     const int lane)
| `val -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST1 {Vt.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1_lane_u64(
     uint64_t *ptr,
     uint64x1_t val,
     const int lane)
| `val -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST1 {Vt.d}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1q_lane_u64(
     uint64_t *ptr,
     uint64x2_t val,
     const int lane)
| `val -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST1 {Vt.d}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1_lane_p64(
     poly64_t *ptr,
     poly64x1_t val,
     const int lane)
| `val -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST1 {Vt.d}[lane],[Xn]` | | `A32/A64` | -| void vst1q_lane_p64(
     poly64_t *ptr,
     poly64x2_t val,
     const int lane)
| `val -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST1 {Vt.d}[lane],[Xn]` | | `A32/A64` | -| void vst1_lane_f16(
     float16_t *ptr,
     float16x4_t val,
     const int lane)
| `val -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST1 {Vt.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1q_lane_f16(
     float16_t *ptr,
     float16x8_t val,
     const int lane)
| `val -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST1 {Vt.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1_lane_f32(
     float32_t *ptr,
     float32x2_t val,
     const int lane)
| `val -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST1 {Vt.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1q_lane_f32(
     float32_t *ptr,
     float32x4_t val,
     const int lane)
| `val -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST1 {Vt.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1_lane_p8(
     poly8_t *ptr,
     poly8x8_t val,
     const int lane)
| `val -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST1 {Vt.b}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1q_lane_p8(
     poly8_t *ptr,
     poly8x16_t val,
     const int lane)
| `val -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST1 {Vt.b}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1_lane_p16(
     poly16_t *ptr,
     poly16x4_t val,
     const int lane)
| `val -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST1 {Vt.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1q_lane_p16(
     poly16_t *ptr,
     poly16x8_t val,
     const int lane)
| `val -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST1 {Vt.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst1_lane_f64(
     float64_t *ptr,
     float64x1_t val,
     const int lane)
| `val -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST1 {Vt.d}[lane],[Xn]` | | `A64` | -| void vst1q_lane_f64(
     float64_t *ptr,
     float64x2_t val,
     const int lane)
| `val -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST1 {Vt.d}[lane],[Xn]` | | `A64` | -| void vst2_s8(
     int8_t *ptr,
     int8x8x2_t val)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST2 {Vt.8B - Vt2.8B},[Xn]` | | `v7/A32/A64` | -| void vst2q_s8(
     int8_t *ptr,
     int8x16x2_t val)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST2 {Vt.16B - Vt2.16B},[Xn]` | | `v7/A32/A64` | -| void vst2_s16(
     int16_t *ptr,
     int16x4x2_t val)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST2 {Vt.4H - Vt2.4H},[Xn]` | | `v7/A32/A64` | -| void vst2q_s16(
     int16_t *ptr,
     int16x8x2_t val)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST2 {Vt.8H - Vt2.8H},[Xn]` | | `v7/A32/A64` | -| void vst2_s32(
     int32_t *ptr,
     int32x2x2_t val)
| `val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST2 {Vt.2S - Vt2.2S},[Xn]` | | `v7/A32/A64` | -| void vst2q_s32(
     int32_t *ptr,
     int32x4x2_t val)
| `val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST2 {Vt.4S - Vt2.4S},[Xn]` | | `v7/A32/A64` | -| void vst2_u8(
     uint8_t *ptr,
     uint8x8x2_t val)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST2 {Vt.8B - Vt2.8B},[Xn]` | | `v7/A32/A64` | -| void vst2q_u8(
     uint8_t *ptr,
     uint8x16x2_t val)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST2 {Vt.16B - Vt2.16B},[Xn]` | | `v7/A32/A64` | -| void vst2_u16(
     uint16_t *ptr,
     uint16x4x2_t val)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST2 {Vt.4H - Vt2.4H},[Xn]` | | `v7/A32/A64` | -| void vst2q_u16(
     uint16_t *ptr,
     uint16x8x2_t val)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST2 {Vt.8H - Vt2.8H},[Xn]` | | `v7/A32/A64` | -| void vst2_u32(
     uint32_t *ptr,
     uint32x2x2_t val)
| `val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST2 {Vt.2S - Vt2.2S},[Xn]` | | `v7/A32/A64` | -| void vst2q_u32(
     uint32_t *ptr,
     uint32x4x2_t val)
| `val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST2 {Vt.4S - Vt2.4S},[Xn]` | | `v7/A32/A64` | -| void vst2_f16(
     float16_t *ptr,
     float16x4x2_t val)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST2 {Vt.4H - Vt2.4H},[Xn]` | | `v7/A32/A64` | -| void vst2q_f16(
     float16_t *ptr,
     float16x8x2_t val)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST2 {Vt.8H - Vt2.8H},[Xn]` | | `v7/A32/A64` | -| void vst2_f32(
     float32_t *ptr,
     float32x2x2_t val)
| `val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST2 {Vt.2S - Vt2.2S},[Xn]` | | `v7/A32/A64` | -| void vst2q_f32(
     float32_t *ptr,
     float32x4x2_t val)
| `val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST2 {Vt.4S - Vt2.4S},[Xn]` | | `v7/A32/A64` | -| void vst2_p8(
     poly8_t *ptr,
     poly8x8x2_t val)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST2 {Vt.8B - Vt2.8B},[Xn]` | | `v7/A32/A64` | -| void vst2q_p8(
     poly8_t *ptr,
     poly8x16x2_t val)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST2 {Vt.16B - Vt2.16B},[Xn]` | | `v7/A32/A64` | -| void vst2_p16(
     poly16_t *ptr,
     poly16x4x2_t val)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST2 {Vt.4H - Vt2.4H},[Xn]` | | `v7/A32/A64` | -| void vst2q_p16(
     poly16_t *ptr,
     poly16x8x2_t val)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST2 {Vt.8H - Vt2.8H},[Xn]` | | `v7/A32/A64` | -| void vst2_s64(
     int64_t *ptr,
     int64x1x2_t val)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt2.1D},[Xn]` | | `v7/A32/A64` | -| void vst2_u64(
     uint64_t *ptr,
     uint64x1x2_t val)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt2.1D},[Xn]` | | `v7/A32/A64` | -| void vst2_p64(
     poly64_t *ptr,
     poly64x1x2_t val)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt2.1D},[Xn]` | | `A32/A64` | -| void vst2q_s64(
     int64_t *ptr,
     int64x2x2_t val)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST2 {Vt.2D - Vt2.2D},[Xn]` | | `A64` | -| void vst2q_u64(
     uint64_t *ptr,
     uint64x2x2_t val)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST2 {Vt.2D - Vt2.2D},[Xn]` | | `A64` | -| void vst2q_p64(
     poly64_t *ptr,
     poly64x2x2_t val)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST2 {Vt.2D - Vt2.2D},[Xn]` | | `A64` | -| void vst2_f64(
     float64_t *ptr,
     float64x1x2_t val)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt2.1D},[Xn]` | | `A64` | -| void vst2q_f64(
     float64_t *ptr,
     float64x2x2_t val)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST2 {Vt.2D - Vt2.2D},[Xn]` | | `A64` | -| void vst3_s8(
     int8_t *ptr,
     int8x8x3_t val)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST3 {Vt.8B - Vt3.8B},[Xn]` | | `v7/A32/A64` | -| void vst3q_s8(
     int8_t *ptr,
     int8x16x3_t val)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST3 {Vt.16B - Vt3.16B},[Xn]` | | `v7/A32/A64` | -| void vst3_s16(
     int16_t *ptr,
     int16x4x3_t val)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST3 {Vt.4H - Vt3.4H},[Xn]` | | `v7/A32/A64` | -| void vst3q_s16(
     int16_t *ptr,
     int16x8x3_t val)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST3 {Vt.8H - Vt3.8H},[Xn]` | | `v7/A32/A64` | -| void vst3_s32(
     int32_t *ptr,
     int32x2x3_t val)
| `val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST3 {Vt.2S - Vt3.2S},[Xn]` | | `v7/A32/A64` | -| void vst3q_s32(
     int32_t *ptr,
     int32x4x3_t val)
| `val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST3 {Vt.4S - Vt3.4S},[Xn]` | | `v7/A32/A64` | -| void vst3_u8(
     uint8_t *ptr,
     uint8x8x3_t val)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST3 {Vt.8B - Vt3.8B},[Xn]` | | `v7/A32/A64` | -| void vst3q_u8(
     uint8_t *ptr,
     uint8x16x3_t val)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST3 {Vt.16B - Vt3.16B},[Xn]` | | `v7/A32/A64` | -| void vst3_u16(
     uint16_t *ptr,
     uint16x4x3_t val)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST3 {Vt.4H - Vt3.4H},[Xn]` | | `v7/A32/A64` | -| void vst3q_u16(
     uint16_t *ptr,
     uint16x8x3_t val)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST3 {Vt.8H - Vt3.8H},[Xn]` | | `v7/A32/A64` | -| void vst3_u32(
     uint32_t *ptr,
     uint32x2x3_t val)
| `val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST3 {Vt.2S - Vt3.2S},[Xn]` | | `v7/A32/A64` | -| void vst3q_u32(
     uint32_t *ptr,
     uint32x4x3_t val)
| `val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST3 {Vt.4S - Vt3.4S},[Xn]` | | `v7/A32/A64` | -| void vst3_f16(
     float16_t *ptr,
     float16x4x3_t val)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST3 {Vt.4H - Vt3.4H},[Xn]` | | `v7/A32/A64` | -| void vst3q_f16(
     float16_t *ptr,
     float16x8x3_t val)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST3 {Vt.8H - Vt3.8H},[Xn]` | | `v7/A32/A64` | -| void vst3_f32(
     float32_t *ptr,
     float32x2x3_t val)
| `val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST3 {Vt.2S - Vt3.2S},[Xn]` | | `v7/A32/A64` | -| void vst3q_f32(
     float32_t *ptr,
     float32x4x3_t val)
| `val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST3 {Vt.4S - Vt3.4S},[Xn]` | | `v7/A32/A64` | -| void vst3_p8(
     poly8_t *ptr,
     poly8x8x3_t val)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST3 {Vt.8B - Vt3.8B},[Xn]` | | `v7/A32/A64` | -| void vst3q_p8(
     poly8_t *ptr,
     poly8x16x3_t val)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST3 {Vt.16B - Vt3.16B},[Xn]` | | `v7/A32/A64` | -| void vst3_p16(
     poly16_t *ptr,
     poly16x4x3_t val)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST3 {Vt.4H - Vt3.4H},[Xn]` | | `v7/A32/A64` | -| void vst3q_p16(
     poly16_t *ptr,
     poly16x8x3_t val)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST3 {Vt.8H - Vt3.8H},[Xn]` | | `v7/A32/A64` | -| void vst3_s64(
     int64_t *ptr,
     int64x1x3_t val)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt3.1D},[Xn]` | | `v7/A32/A64` | -| void vst3_u64(
     uint64_t *ptr,
     uint64x1x3_t val)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt3.1D},[Xn]` | | `v7/A32/A64` | -| void vst3_p64(
     poly64_t *ptr,
     poly64x1x3_t val)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt3.1D},[Xn]` | | `A32/A64` | -| void vst3q_s64(
     int64_t *ptr,
     int64x2x3_t val)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST3 {Vt.2D - Vt3.2D},[Xn]` | | `A64` | -| void vst3q_u64(
     uint64_t *ptr,
     uint64x2x3_t val)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST3 {Vt.2D - Vt3.2D},[Xn]` | | `A64` | -| void vst3q_p64(
     poly64_t *ptr,
     poly64x2x3_t val)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST3 {Vt.2D - Vt3.2D},[Xn]` | | `A64` | -| void vst3_f64(
     float64_t *ptr,
     float64x1x3_t val)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt3.1D},[Xn]` | | `A64` | -| void vst3q_f64(
     float64_t *ptr,
     float64x2x3_t val)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST3 {Vt.2D - Vt3.2D},[Xn]` | | `A64` | -| void vst4_s8(
     int8_t *ptr,
     int8x8x4_t val)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST4 {Vt.8B - Vt4.8B},[Xn]` | | `v7/A32/A64` | -| void vst4q_s8(
     int8_t *ptr,
     int8x16x4_t val)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST4 {Vt.16B - Vt4.16B},[Xn]` | | `v7/A32/A64` | -| void vst4_s16(
     int16_t *ptr,
     int16x4x4_t val)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST4 {Vt.4H - Vt4.4H},[Xn]` | | `v7/A32/A64` | -| void vst4q_s16(
     int16_t *ptr,
     int16x8x4_t val)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST4 {Vt.8H - Vt4.8H},[Xn]` | | `v7/A32/A64` | -| void vst4_s32(
     int32_t *ptr,
     int32x2x4_t val)
| `val.val[3] -> Vt4.2S`
`val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST4 {Vt.2S - Vt4.2S},[Xn]` | | `v7/A32/A64` | -| void vst4q_s32(
     int32_t *ptr,
     int32x4x4_t val)
| `val.val[3] -> Vt4.4S`
`val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST4 {Vt.4S - Vt4.4S},[Xn]` | | `v7/A32/A64` | -| void vst4_u8(
     uint8_t *ptr,
     uint8x8x4_t val)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST4 {Vt.8B - Vt4.8B},[Xn]` | | `v7/A32/A64` | -| void vst4q_u8(
     uint8_t *ptr,
     uint8x16x4_t val)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST4 {Vt.16B - Vt4.16B},[Xn]` | | `v7/A32/A64` | -| void vst4_u16(
     uint16_t *ptr,
     uint16x4x4_t val)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST4 {Vt.4H - Vt4.4H},[Xn]` | | `v7/A32/A64` | -| void vst4q_u16(
     uint16_t *ptr,
     uint16x8x4_t val)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST4 {Vt.8H - Vt4.8H},[Xn]` | | `v7/A32/A64` | -| void vst4_u32(
     uint32_t *ptr,
     uint32x2x4_t val)
| `val.val[3] -> Vt4.2S`
`val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST4 {Vt.2S - Vt4.2S},[Xn]` | | `v7/A32/A64` | -| void vst4q_u32(
     uint32_t *ptr,
     uint32x4x4_t val)
| `val.val[3] -> Vt4.4S`
`val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST4 {Vt.4S - Vt4.4S},[Xn]` | | `v7/A32/A64` | -| void vst4_f16(
     float16_t *ptr,
     float16x4x4_t val)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST4 {Vt.4H - Vt4.4H},[Xn]` | | `v7/A32/A64` | -| void vst4q_f16(
     float16_t *ptr,
     float16x8x4_t val)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST4 {Vt.8H - Vt4.8H},[Xn]` | | `v7/A32/A64` | -| void vst4_f32(
     float32_t *ptr,
     float32x2x4_t val)
| `val.val[3] -> Vt4.2S`
`val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST4 {Vt.2S - Vt4.2S},[Xn]` | | `v7/A32/A64` | -| void vst4q_f32(
     float32_t *ptr,
     float32x4x4_t val)
| `val.val[3] -> Vt4.4S`
`val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST4 {Vt.4S - Vt4.4S},[Xn]` | | `v7/A32/A64` | -| void vst4_p8(
     poly8_t *ptr,
     poly8x8x4_t val)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST4 {Vt.8B - Vt4.8B},[Xn]` | | `v7/A32/A64` | -| void vst4q_p8(
     poly8_t *ptr,
     poly8x16x4_t val)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST4 {Vt.16B - Vt4.16B},[Xn]` | | `v7/A32/A64` | -| void vst4_p16(
     poly16_t *ptr,
     poly16x4x4_t val)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST4 {Vt.4H - Vt4.4H},[Xn]` | | `v7/A32/A64` | -| void vst4q_p16(
     poly16_t *ptr,
     poly16x8x4_t val)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST4 {Vt.8H - Vt4.8H},[Xn]` | | `v7/A32/A64` | -| void vst4_s64(
     int64_t *ptr,
     int64x1x4_t val)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt4.1D},[Xn]` | | `v7/A32/A64` | -| void vst4_u64(
     uint64_t *ptr,
     uint64x1x4_t val)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt4.1D},[Xn]` | | `v7/A32/A64` | -| void vst4_p64(
     poly64_t *ptr,
     poly64x1x4_t val)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt4.1D},[Xn]` | | `A32/A64` | -| void vst4q_s64(
     int64_t *ptr,
     int64x2x4_t val)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST4 {Vt.2D - Vt4.2D},[Xn]` | | `A64` | -| void vst4q_u64(
     uint64_t *ptr,
     uint64x2x4_t val)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST4 {Vt.2D - Vt4.2D},[Xn]` | | `A64` | -| void vst4q_p64(
     poly64_t *ptr,
     poly64x2x4_t val)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST4 {Vt.2D - Vt4.2D},[Xn]` | | `A64` | -| void vst4_f64(
     float64_t *ptr,
     float64x1x4_t val)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt4.1D},[Xn]` | | `A64` | -| void vst4q_f64(
     float64_t *ptr,
     float64x2x4_t val)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST4 {Vt.2D - Vt4.2D},[Xn]` | | `A64` | -| void vst2_lane_s8(
     int8_t *ptr,
     int8x8x2_t val,
     const int lane)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST2 {Vt.b - Vt2.b}[lane],[Xn]` | | `v7/A32/A64` | -| void vst2_lane_u8(
     uint8_t *ptr,
     uint8x8x2_t val,
     const int lane)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST2 {Vt.b - Vt2.b}[lane],[Xn]` | | `v7/A32/A64` | -| void vst2_lane_p8(
     poly8_t *ptr,
     poly8x8x2_t val,
     const int lane)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST2 {Vt.b - Vt2.b}[lane],[Xn]` | | `v7/A32/A64` | -| void vst3_lane_s8(
     int8_t *ptr,
     int8x8x3_t val,
     const int lane)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST3 {Vt.b - Vt3.b}[lane],[Xn]` | | `v7/A32/A64` | -| void vst3_lane_u8(
     uint8_t *ptr,
     uint8x8x3_t val,
     const int lane)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST3 {Vt.b - Vt3.b}[lane],[Xn]` | | `v7/A32/A64` | -| void vst3_lane_p8(
     poly8_t *ptr,
     poly8x8x3_t val,
     const int lane)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST3 {Vt.b - Vt3.b}[lane],[Xn]` | | `v7/A32/A64` | -| void vst4_lane_s8(
     int8_t *ptr,
     int8x8x4_t val,
     const int lane)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST4 {Vt.b - Vt4.b}[lane],[Xn]` | | `v7/A32/A64` | -| void vst4_lane_u8(
     uint8_t *ptr,
     uint8x8x4_t val,
     const int lane)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST4 {Vt.b - Vt4.b}[lane],[Xn]` | | `v7/A32/A64` | -| void vst4_lane_p8(
     poly8_t *ptr,
     poly8x8x4_t val,
     const int lane)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST4 {Vt.b - Vt4.b}[lane],[Xn]` | | `v7/A32/A64` | -| void vst2_lane_s16(
     int16_t *ptr,
     int16x4x2_t val,
     const int lane)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST2 {Vt.h - Vt2.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst2q_lane_s16(
     int16_t *ptr,
     int16x8x2_t val,
     const int lane)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST2 {Vt.h - Vt2.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst2_lane_s32(
     int32_t *ptr,
     int32x2x2_t val,
     const int lane)
| `val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST2 {Vt.s - Vt2.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst2q_lane_s32(
     int32_t *ptr,
     int32x4x2_t val,
     const int lane)
| `val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST2 {Vt.s - Vt2.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst2_lane_u16(
     uint16_t *ptr,
     uint16x4x2_t val,
     const int lane)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST2 {Vt.h - Vt2.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst2q_lane_u16(
     uint16_t *ptr,
     uint16x8x2_t val,
     const int lane)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST2 {Vt.h - Vt2.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst2_lane_u32(
     uint32_t *ptr,
     uint32x2x2_t val,
     const int lane)
| `val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST2 {Vt.s - Vt2.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst2q_lane_u32(
     uint32_t *ptr,
     uint32x4x2_t val,
     const int lane)
| `val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST2 {Vt.s - Vt2.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst2_lane_f16(
     float16_t *ptr,
     float16x4x2_t val,
     const int lane)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST2 {Vt.h - Vt2.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst2q_lane_f16(
     float16_t *ptr,
     float16x8x2_t val,
     const int lane)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST2 {Vt.h - Vt2.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst2_lane_f32(
     float32_t *ptr,
     float32x2x2_t val,
     const int lane)
| `val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST2 {Vt.s - Vt2.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst2q_lane_f32(
     float32_t *ptr,
     float32x4x2_t val,
     const int lane)
| `val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST2 {Vt.s - Vt2.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst2_lane_p16(
     poly16_t *ptr,
     poly16x4x2_t val,
     const int lane)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST2 {Vt.h - Vt2.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst2q_lane_p16(
     poly16_t *ptr,
     poly16x8x2_t val,
     const int lane)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST2 {Vt.h - Vt2.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst2q_lane_s8(
     int8_t *ptr,
     int8x16x2_t val,
     const int lane)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST2 {Vt.b - Vt2.b}[lane],[Xn]` | | `A64` | -| void vst2q_lane_u8(
     uint8_t *ptr,
     uint8x16x2_t val,
     const int lane)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST2 {Vt.b - Vt2.b}[lane],[Xn]` | | `A64` | -| void vst2q_lane_p8(
     poly8_t *ptr,
     poly8x16x2_t val,
     const int lane)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST2 {Vt.b - Vt2.b}[lane],[Xn]` | | `A64` | -| void vst2_lane_s64(
     int64_t *ptr,
     int64x1x2_t val,
     const int lane)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST2 {Vt.d - Vt2.d}[lane],[Xn]` | | `A64` | -| void vst2q_lane_s64(
     int64_t *ptr,
     int64x2x2_t val,
     const int lane)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST2 {Vt.d - Vt2.d}[lane],[Xn]` | | `A64` | -| void vst2_lane_u64(
     uint64_t *ptr,
     uint64x1x2_t val,
     const int lane)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST2 {Vt.d - Vt2.d}[lane],[Xn]` | | `A64` | -| void vst2q_lane_u64(
     uint64_t *ptr,
     uint64x2x2_t val,
     const int lane)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST2 {Vt.d - Vt2.d}[lane],[Xn]` | | `A64` | -| void vst2_lane_p64(
     poly64_t *ptr,
     poly64x1x2_t val,
     const int lane)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST2 {Vt.d - Vt2.d}[lane],[Xn]` | | `A64` | -| void vst2q_lane_p64(
     poly64_t *ptr,
     poly64x2x2_t val,
     const int lane)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST2 {Vt.d - Vt2.d}[lane],[Xn]` | | `A64` | -| void vst2_lane_f64(
     float64_t *ptr,
     float64x1x2_t val,
     const int lane)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST2 {Vt.d - Vt2.d}[lane],[Xn]` | | `A64` | -| void vst2q_lane_f64(
     float64_t *ptr,
     float64x2x2_t val,
     const int lane)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST2 {Vt.d - Vt2.d}[lane],[Xn]` | | `A64` | -| void vst3_lane_s16(
     int16_t *ptr,
     int16x4x3_t val,
     const int lane)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST3 {Vt.h - Vt3.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst3q_lane_s16(
     int16_t *ptr,
     int16x8x3_t val,
     const int lane)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST3 {Vt.h - Vt3.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst3_lane_s32(
     int32_t *ptr,
     int32x2x3_t val,
     const int lane)
| `val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST3 {Vt.s - Vt3.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst3q_lane_s32(
     int32_t *ptr,
     int32x4x3_t val,
     const int lane)
| `val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST3 {Vt.s - Vt3.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst3_lane_u16(
     uint16_t *ptr,
     uint16x4x3_t val,
     const int lane)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST3 {Vt.h - Vt3.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst3q_lane_u16(
     uint16_t *ptr,
     uint16x8x3_t val,
     const int lane)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST3 {Vt.h - Vt3.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst3_lane_u32(
     uint32_t *ptr,
     uint32x2x3_t val,
     const int lane)
| `val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST3 {Vt.s - Vt3.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst3q_lane_u32(
     uint32_t *ptr,
     uint32x4x3_t val,
     const int lane)
| `val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST3 {Vt.s - Vt3.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst3_lane_f16(
     float16_t *ptr,
     float16x4x3_t val,
     const int lane)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST3 {Vt.h - Vt3.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst3q_lane_f16(
     float16_t *ptr,
     float16x8x3_t val,
     const int lane)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST3 {Vt.h - Vt3.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst3_lane_f32(
     float32_t *ptr,
     float32x2x3_t val,
     const int lane)
| `val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST3 {Vt.s - Vt3.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst3q_lane_f32(
     float32_t *ptr,
     float32x4x3_t val,
     const int lane)
| `val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST3 {Vt.s - Vt3.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst3_lane_p16(
     poly16_t *ptr,
     poly16x4x3_t val,
     const int lane)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST3 {Vt.h - Vt3.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst3q_lane_p16(
     poly16_t *ptr,
     poly16x8x3_t val,
     const int lane)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST3 {Vt.h - Vt3.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst3q_lane_s8(
     int8_t *ptr,
     int8x16x3_t val,
     const int lane)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST3 {Vt.b - Vt3.b}[lane],[Xn]` | | `A64` | -| void vst3q_lane_u8(
     uint8_t *ptr,
     uint8x16x3_t val,
     const int lane)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST3 {Vt.b - Vt3.b}[lane],[Xn]` | | `A64` | -| void vst3q_lane_p8(
     poly8_t *ptr,
     poly8x16x3_t val,
     const int lane)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST3 {Vt.b - Vt3.b}[lane],[Xn]` | | `A64` | -| void vst3_lane_s64(
     int64_t *ptr,
     int64x1x3_t val,
     const int lane)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST3 {Vt.d - Vt3.d}[lane],[Xn]` | | `A64` | -| void vst3q_lane_s64(
     int64_t *ptr,
     int64x2x3_t val,
     const int lane)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST3 {Vt.d - Vt3.d}[lane],[Xn]` | | `A64` | -| void vst3_lane_u64(
     uint64_t *ptr,
     uint64x1x3_t val,
     const int lane)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST3 {Vt.d - Vt3.d}[lane],[Xn]` | | `A64` | -| void vst3q_lane_u64(
     uint64_t *ptr,
     uint64x2x3_t val,
     const int lane)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST3 {Vt.d - Vt3.d}[lane],[Xn]` | | `A64` | -| void vst3_lane_p64(
     poly64_t *ptr,
     poly64x1x3_t val,
     const int lane)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST3 {Vt.d - Vt3.d}[lane],[Xn]` | | `A64` | -| void vst3q_lane_p64(
     poly64_t *ptr,
     poly64x2x3_t val,
     const int lane)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST3 {Vt.d - Vt3.d}[lane],[Xn]` | | `A64` | -| void vst3_lane_f64(
     float64_t *ptr,
     float64x1x3_t val,
     const int lane)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST3 {Vt.d - Vt3.d}[lane],[Xn]` | | `A64` | -| void vst3q_lane_f64(
     float64_t *ptr,
     float64x2x3_t val,
     const int lane)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST3 {Vt.d - Vt3.d}[lane],[Xn]` | | `A64` | -| void vst4_lane_s16(
     int16_t *ptr,
     int16x4x4_t val,
     const int lane)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST4 {Vt.h - Vt4.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst4q_lane_s16(
     int16_t *ptr,
     int16x8x4_t val,
     const int lane)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST4 {Vt.h - Vt4.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst4_lane_s32(
     int32_t *ptr,
     int32x2x4_t val,
     const int lane)
| `val.val[3] -> Vt4.2S`
`val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST4 {Vt.s - Vt4.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst4q_lane_s32(
     int32_t *ptr,
     int32x4x4_t val,
     const int lane)
| `val.val[3] -> Vt4.4S`
`val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST4 {Vt.s - Vt4.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst4_lane_u16(
     uint16_t *ptr,
     uint16x4x4_t val,
     const int lane)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST4 {Vt.h - Vt4.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst4q_lane_u16(
     uint16_t *ptr,
     uint16x8x4_t val,
     const int lane)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST4 {Vt.h - Vt4.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst4_lane_u32(
     uint32_t *ptr,
     uint32x2x4_t val,
     const int lane)
| `val.val[3] -> Vt4.2S`
`val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST4 {Vt.s - Vt4.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst4q_lane_u32(
     uint32_t *ptr,
     uint32x4x4_t val,
     const int lane)
| `val.val[3] -> Vt4.4S`
`val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST4 {Vt.s - Vt4.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst4_lane_f16(
     float16_t *ptr,
     float16x4x4_t val,
     const int lane)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST4 {Vt.h - Vt4.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst4q_lane_f16(
     float16_t *ptr,
     float16x8x4_t val,
     const int lane)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST4 {Vt.h - Vt4.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst4_lane_f32(
     float32_t *ptr,
     float32x2x4_t val,
     const int lane)
| `val.val[3] -> Vt4.2S`
`val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST4 {Vt.s - Vt4.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst4q_lane_f32(
     float32_t *ptr,
     float32x4x4_t val,
     const int lane)
| `val.val[3] -> Vt4.4S`
`val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST4 {Vt.s - Vt4.s}[lane],[Xn]` | | `v7/A32/A64` | -| void vst4_lane_p16(
     poly16_t *ptr,
     poly16x4x4_t val,
     const int lane)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST4 {Vt.h - Vt4.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst4q_lane_p16(
     poly16_t *ptr,
     poly16x8x4_t val,
     const int lane)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST4 {Vt.h - Vt4.h}[lane],[Xn]` | | `v7/A32/A64` | -| void vst4q_lane_s8(
     int8_t *ptr,
     int8x16x4_t val,
     const int lane)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST4 {Vt.b - Vt4.b}[lane],[Xn]` | | `A64` | -| void vst4q_lane_u8(
     uint8_t *ptr,
     uint8x16x4_t val,
     const int lane)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST4 {Vt.b - Vt4.b}[lane],[Xn]` | | `A64` | -| void vst4q_lane_p8(
     poly8_t *ptr,
     poly8x16x4_t val,
     const int lane)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST4 {Vt.b - Vt4.b}[lane],[Xn]` | | `A64` | -| void vst4_lane_s64(
     int64_t *ptr,
     int64x1x4_t val,
     const int lane)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST4 {Vt.d - Vt4.d}[lane],[Xn]` | | `A64` | -| void vst4q_lane_s64(
     int64_t *ptr,
     int64x2x4_t val,
     const int lane)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST4 {Vt.d - Vt4.d}[lane],[Xn]` | | `A64` | -| void vst4_lane_u64(
     uint64_t *ptr,
     uint64x1x4_t val,
     const int lane)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST4 {Vt.d - Vt4.d}[lane],[Xn]` | | `A64` | -| void vst4q_lane_u64(
     uint64_t *ptr,
     uint64x2x4_t val,
     const int lane)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST4 {Vt.d - Vt4.d}[lane],[Xn]` | | `A64` | -| void vst4_lane_p64(
     poly64_t *ptr,
     poly64x1x4_t val,
     const int lane)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST4 {Vt.d - Vt4.d}[lane],[Xn]` | | `A64` | -| void vst4q_lane_p64(
     poly64_t *ptr,
     poly64x2x4_t val,
     const int lane)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST4 {Vt.d - Vt4.d}[lane],[Xn]` | | `A64` | -| void vst4_lane_f64(
     float64_t *ptr,
     float64x1x4_t val,
     const int lane)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST4 {Vt.d - Vt4.d}[lane],[Xn]` | | `A64` | -| void vst4q_lane_f64(
     float64_t *ptr,
     float64x2x4_t val,
     const int lane)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST4 {Vt.d - Vt4.d}[lane],[Xn]` | | `A64` | -| void vst1_s8_x2(
     int8_t *ptr,
     int8x8x2_t val)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt2.8B},[Xn]` | | `v7/A32/A64` | -| void vst1q_s8_x2(
     int8_t *ptr,
     int8x16x2_t val)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt2.16B},[Xn]` | | `v7/A32/A64` | -| void vst1_s16_x2(
     int16_t *ptr,
     int16x4x2_t val)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt2.4H},[Xn]` | | `v7/A32/A64` | -| void vst1q_s16_x2(
     int16_t *ptr,
     int16x8x2_t val)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt2.8H},[Xn]` | | `v7/A32/A64` | -| void vst1_s32_x2(
     int32_t *ptr,
     int32x2x2_t val)
| `val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S - Vt2.2S},[Xn]` | | `v7/A32/A64` | -| void vst1q_s32_x2(
     int32_t *ptr,
     int32x4x2_t val)
| `val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S - Vt2.4S},[Xn]` | | `v7/A32/A64` | -| void vst1_u8_x2(
     uint8_t *ptr,
     uint8x8x2_t val)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt2.8B},[Xn]` | | `v7/A32/A64` | -| void vst1q_u8_x2(
     uint8_t *ptr,
     uint8x16x2_t val)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt2.16B},[Xn]` | | `v7/A32/A64` | -| void vst1_u16_x2(
     uint16_t *ptr,
     uint16x4x2_t val)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt2.4H},[Xn]` | | `v7/A32/A64` | -| void vst1q_u16_x2(
     uint16_t *ptr,
     uint16x8x2_t val)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt2.8H},[Xn]` | | `v7/A32/A64` | -| void vst1_u32_x2(
     uint32_t *ptr,
     uint32x2x2_t val)
| `val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S - Vt2.2S},[Xn]` | | `v7/A32/A64` | -| void vst1q_u32_x2(
     uint32_t *ptr,
     uint32x4x2_t val)
| `val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S - Vt2.4S},[Xn]` | | `v7/A32/A64` | -| void vst1_f16_x2(
     float16_t *ptr,
     float16x4x2_t val)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt2.4H},[Xn]` | | `v7/A32/A64` | -| void vst1q_f16_x2(
     float16_t *ptr,
     float16x8x2_t val)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt2.8H},[Xn]` | | `v7/A32/A64` | -| void vst1_f32_x2(
     float32_t *ptr,
     float32x2x2_t val)
| `val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S - Vt2.2S},[Xn]` | | `v7/A32/A64` | -| void vst1q_f32_x2(
     float32_t *ptr,
     float32x4x2_t val)
| `val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S - Vt2.4S},[Xn]` | | `v7/A32/A64` | -| void vst1_p8_x2(
     poly8_t *ptr,
     poly8x8x2_t val)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt2.8B},[Xn]` | | `v7/A32/A64` | -| void vst1q_p8_x2(
     poly8_t *ptr,
     poly8x16x2_t val)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt2.16B},[Xn]` | | `v7/A32/A64` | -| void vst1_p16_x2(
     poly16_t *ptr,
     poly16x4x2_t val)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt2.4H},[Xn]` | | `v7/A32/A64` | -| void vst1q_p16_x2(
     poly16_t *ptr,
     poly16x8x2_t val)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt2.8H},[Xn]` | | `v7/A32/A64` | -| void vst1_s64_x2(
     int64_t *ptr,
     int64x1x2_t val)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt2.1D},[Xn]` | | `v7/A32/A64` | -| void vst1_u64_x2(
     uint64_t *ptr,
     uint64x1x2_t val)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt2.1D},[Xn]` | | `v7/A32/A64` | -| void vst1_p64_x2(
     poly64_t *ptr,
     poly64x1x2_t val)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt2.1D},[Xn]` | | `A32/A64` | -| void vst1q_s64_x2(
     int64_t *ptr,
     int64x2x2_t val)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt2.2D},[Xn]` | | `v7/A32/A64` | -| void vst1q_u64_x2(
     uint64_t *ptr,
     uint64x2x2_t val)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt2.2D},[Xn]` | | `v7/A32/A64` | -| void vst1q_p64_x2(
     poly64_t *ptr,
     poly64x2x2_t val)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt2.2D},[Xn]` | | `A32/A64` | -| void vst1_f64_x2(
     float64_t *ptr,
     float64x1x2_t val)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt2.1D},[Xn]` | | `A64` | -| void vst1q_f64_x2(
     float64_t *ptr,
     float64x2x2_t val)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt2.2D},[Xn]` | | `A64` | -| void vst1_s8_x3(
     int8_t *ptr,
     int8x8x3_t val)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt3.8B},[Xn]` | | `v7/A32/A64` | -| void vst1q_s8_x3(
     int8_t *ptr,
     int8x16x3_t val)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt3.16B},[Xn]` | | `v7/A32/A64` | -| void vst1_s16_x3(
     int16_t *ptr,
     int16x4x3_t val)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt3.4H},[Xn]` | | `v7/A32/A64` | -| void vst1q_s16_x3(
     int16_t *ptr,
     int16x8x3_t val)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt3.8H},[Xn]` | | `v7/A32/A64` | -| void vst1_s32_x3(
     int32_t *ptr,
     int32x2x3_t val)
| `val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S - Vt3.2S},[Xn]` | | `v7/A32/A64` | -| void vst1q_s32_x3(
     int32_t *ptr,
     int32x4x3_t val)
| `val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S - Vt3.4S},[Xn]` | | `v7/A32/A64` | -| void vst1_u8_x3(
     uint8_t *ptr,
     uint8x8x3_t val)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt3.8B},[Xn]` | | `v7/A32/A64` | -| void vst1q_u8_x3(
     uint8_t *ptr,
     uint8x16x3_t val)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt3.16B},[Xn]` | | `v7/A32/A64` | -| void vst1_u16_x3(
     uint16_t *ptr,
     uint16x4x3_t val)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt3.4H},[Xn]` | | `v7/A32/A64` | -| void vst1q_u16_x3(
     uint16_t *ptr,
     uint16x8x3_t val)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt3.8H},[Xn]` | | `v7/A32/A64` | -| void vst1_u32_x3(
     uint32_t *ptr,
     uint32x2x3_t val)
| `val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S - Vt3.2S},[Xn]` | | `v7/A32/A64` | -| void vst1q_u32_x3(
     uint32_t *ptr,
     uint32x4x3_t val)
| `val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S - Vt3.4S},[Xn]` | | `v7/A32/A64` | -| void vst1_f16_x3(
     float16_t *ptr,
     float16x4x3_t val)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt3.4H},[Xn]` | | `v7/A32/A64` | -| void vst1q_f16_x3(
     float16_t *ptr,
     float16x8x3_t val)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt3.8H},[Xn]` | | `v7/A32/A64` | -| void vst1_f32_x3(
     float32_t *ptr,
     float32x2x3_t val)
| `val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S - Vt3.2S},[Xn]` | | `v7/A32/A64` | -| void vst1q_f32_x3(
     float32_t *ptr,
     float32x4x3_t val)
| `val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S - Vt3.4S},[Xn]` | | `v7/A32/A64` | -| void vst1_p8_x3(
     poly8_t *ptr,
     poly8x8x3_t val)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt3.8B},[Xn]` | | `v7/A32/A64` | -| void vst1q_p8_x3(
     poly8_t *ptr,
     poly8x16x3_t val)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt3.16B},[Xn]` | | `v7/A32/A64` | -| void vst1_p16_x3(
     poly16_t *ptr,
     poly16x4x3_t val)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt3.4H},[Xn]` | | `v7/A32/A64` | -| void vst1q_p16_x3(
     poly16_t *ptr,
     poly16x8x3_t val)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt3.8H},[Xn]` | | `v7/A32/A64` | -| void vst1_s64_x3(
     int64_t *ptr,
     int64x1x3_t val)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt3.1D},[Xn]` | | `v7/A32/A64` | -| void vst1_u64_x3(
     uint64_t *ptr,
     uint64x1x3_t val)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt3.1D},[Xn]` | | `v7/A32/A64` | -| void vst1_p64_x3(
     poly64_t *ptr,
     poly64x1x3_t val)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt3.1D},[Xn]` | | `A32/A64` | -| void vst1q_s64_x3(
     int64_t *ptr,
     int64x2x3_t val)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt3.2D},[Xn]` | | `v7/A32/A64` | -| void vst1q_u64_x3(
     uint64_t *ptr,
     uint64x2x3_t val)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt3.2D},[Xn]` | | `v7/A32/A64` | -| void vst1q_p64_x3(
     poly64_t *ptr,
     poly64x2x3_t val)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt3.2D},[Xn]` | | `A32/A64` | -| void vst1_f64_x3(
     float64_t *ptr,
     float64x1x3_t val)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt3.1D},[Xn]` | | `A64` | -| void vst1q_f64_x3(
     float64_t *ptr,
     float64x2x3_t val)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt3.2D},[Xn]` | | `A64` | -| void vst1_s8_x4(
     int8_t *ptr,
     int8x8x4_t val)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt4.8B},[Xn]` | | `v7/A32/A64` | -| void vst1q_s8_x4(
     int8_t *ptr,
     int8x16x4_t val)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt4.16B},[Xn]` | | `v7/A32/A64` | -| void vst1_s16_x4(
     int16_t *ptr,
     int16x4x4_t val)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt4.4H},[Xn]` | | `v7/A32/A64` | -| void vst1q_s16_x4(
     int16_t *ptr,
     int16x8x4_t val)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt4.8H},[Xn]` | | `v7/A32/A64` | -| void vst1_s32_x4(
     int32_t *ptr,
     int32x2x4_t val)
| `val.val[3] -> Vt4.2S`
`val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S - Vt4.2S},[Xn]` | | `v7/A32/A64` | -| void vst1q_s32_x4(
     int32_t *ptr,
     int32x4x4_t val)
| `val.val[3] -> Vt4.4S`
`val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S - Vt4.4S},[Xn]` | | `v7/A32/A64` | -| void vst1_u8_x4(
     uint8_t *ptr,
     uint8x8x4_t val)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt4.8B},[Xn]` | | `v7/A32/A64` | -| void vst1q_u8_x4(
     uint8_t *ptr,
     uint8x16x4_t val)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt4.16B},[Xn]` | | `v7/A32/A64` | -| void vst1_u16_x4(
     uint16_t *ptr,
     uint16x4x4_t val)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt4.4H},[Xn]` | | `v7/A32/A64` | -| void vst1q_u16_x4(
     uint16_t *ptr,
     uint16x8x4_t val)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt4.8H},[Xn]` | | `v7/A32/A64` | -| void vst1_u32_x4(
     uint32_t *ptr,
     uint32x2x4_t val)
| `val.val[3] -> Vt4.2S`
`val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S - Vt4.2S},[Xn]` | | `v7/A32/A64` | -| void vst1q_u32_x4(
     uint32_t *ptr,
     uint32x4x4_t val)
| `val.val[3] -> Vt4.4S`
`val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S - Vt4.4S},[Xn]` | | `v7/A32/A64` | -| void vst1_f16_x4(
     float16_t *ptr,
     float16x4x4_t val)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt4.4H},[Xn]` | | `v7/A32/A64` | -| void vst1q_f16_x4(
     float16_t *ptr,
     float16x8x4_t val)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt4.8H},[Xn]` | | `v7/A32/A64` | -| void vst1_f32_x4(
     float32_t *ptr,
     float32x2x4_t val)
| `val.val[3] -> Vt4.2S`
`val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S - Vt4.2S},[Xn]` | | `v7/A32/A64` | -| void vst1q_f32_x4(
     float32_t *ptr,
     float32x4x4_t val)
| `val.val[3] -> Vt4.4S`
`val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S - Vt4.4S},[Xn]` | | `v7/A32/A64` | -| void vst1_p8_x4(
     poly8_t *ptr,
     poly8x8x4_t val)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt4.8B},[Xn]` | | `v7/A32/A64` | -| void vst1q_p8_x4(
     poly8_t *ptr,
     poly8x16x4_t val)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt4.16B},[Xn]` | | `v7/A32/A64` | -| void vst1_p16_x4(
     poly16_t *ptr,
     poly16x4x4_t val)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt4.4H},[Xn]` | | `v7/A32/A64` | -| void vst1q_p16_x4(
     poly16_t *ptr,
     poly16x8x4_t val)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt4.8H},[Xn]` | | `v7/A32/A64` | -| void vst1_s64_x4(
     int64_t *ptr,
     int64x1x4_t val)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt4.1D},[Xn]` | | `v7/A32/A64` | -| void vst1_u64_x4(
     uint64_t *ptr,
     uint64x1x4_t val)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt4.1D},[Xn]` | | `v7/A32/A64` | -| void vst1_p64_x4(
     poly64_t *ptr,
     poly64x1x4_t val)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt4.1D},[Xn]` | | `A32/A64` | -| void vst1q_s64_x4(
     int64_t *ptr,
     int64x2x4_t val)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt4.2D},[Xn]` | | `v7/A32/A64` | -| void vst1q_u64_x4(
     uint64_t *ptr,
     uint64x2x4_t val)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt4.2D},[Xn]` | | `v7/A32/A64` | -| void vst1q_p64_x4(
     poly64_t *ptr,
     poly64x2x4_t val)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt4.2D},[Xn]` | | `A32/A64` | -| void vst1_f64_x4(
     float64_t *ptr,
     float64x1x4_t val)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt4.1D},[Xn]` | | `A64` | -| void vst1q_f64_x4(
     float64_t *ptr,
     float64x2x4_t val)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt4.2D},[Xn]` | | `A64` | +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------|----------|---------------------------| +| void vst1_s8(
     int8_t *ptr,
     int8x8_t val)
| `val -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B},[Xn]` | | `v7/A32/A64` | +| void vst1q_s8(
     int8_t *ptr,
     int8x16_t val)
| `val -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B},[Xn]` | | `v7/A32/A64` | +| void vst1_s16(
     int16_t *ptr,
     int16x4_t val)
| `val -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H},[Xn]` | | `v7/A32/A64` | +| void vst1q_s16(
     int16_t *ptr,
     int16x8_t val)
| `val -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H},[Xn]` | | `v7/A32/A64` | +| void vst1_s32(
     int32_t *ptr,
     int32x2_t val)
| `val -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S},[Xn]` | | `v7/A32/A64` | +| void vst1q_s32(
     int32_t *ptr,
     int32x4_t val)
| `val -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S},[Xn]` | | `v7/A32/A64` | +| void vst1_s64(
     int64_t *ptr,
     int64x1_t val)
| `val -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D},[Xn]` | | `v7/A32/A64` | +| void vst1q_s64(
     int64_t *ptr,
     int64x2_t val)
| `val -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D},[Xn]` | | `v7/A32/A64` | +| void vst1_u8(
     uint8_t *ptr,
     uint8x8_t val)
| `val -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B},[Xn]` | | `v7/A32/A64` | +| void vst1q_u8(
     uint8_t *ptr,
     uint8x16_t val)
| `val -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B},[Xn]` | | `v7/A32/A64` | +| void vst1_u16(
     uint16_t *ptr,
     uint16x4_t val)
| `val -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H},[Xn]` | | `v7/A32/A64` | +| void vst1q_u16(
     uint16_t *ptr,
     uint16x8_t val)
| `val -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H},[Xn]` | | `v7/A32/A64` | +| void vst1_u32(
     uint32_t *ptr,
     uint32x2_t val)
| `val -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S},[Xn]` | | `v7/A32/A64` | +| void vst1q_u32(
     uint32_t *ptr,
     uint32x4_t val)
| `val -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S},[Xn]` | | `v7/A32/A64` | +| void vst1_u64(
     uint64_t *ptr,
     uint64x1_t val)
| `val -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D},[Xn]` | | `v7/A32/A64` | +| void vst1q_u64(
     uint64_t *ptr,
     uint64x2_t val)
| `val -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D},[Xn]` | | `v7/A32/A64` | +| void vst1_p64(
     poly64_t *ptr,
     poly64x1_t val)
| `val -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D},[Xn]` | | `A32/A64` | +| void vst1q_p64(
     poly64_t *ptr,
     poly64x2_t val)
| `val -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D},[Xn]` | | `A32/A64` | +| void vst1_f16(
     float16_t *ptr,
     float16x4_t val)
| `val -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H},[Xn]` | | `v7/A32/A64` | +| void vst1q_f16(
     float16_t *ptr,
     float16x8_t val)
| `val -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H},[Xn]` | | `v7/A32/A64` | +| void vst1_f32(
     float32_t *ptr,
     float32x2_t val)
| `val -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S},[Xn]` | | `v7/A32/A64` | +| void vst1q_f32(
     float32_t *ptr,
     float32x4_t val)
| `val -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S},[Xn]` | | `v7/A32/A64` | +| void vst1_p8(
     poly8_t *ptr,
     poly8x8_t val)
| `val -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B},[Xn]` | | `v7/A32/A64` | +| void vst1q_p8(
     poly8_t *ptr,
     poly8x16_t val)
| `val -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B},[Xn]` | | `v7/A32/A64` | +| void vst1_p16(
     poly16_t *ptr,
     poly16x4_t val)
| `val -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H},[Xn]` | | `v7/A32/A64` | +| void vst1q_p16(
     poly16_t *ptr,
     poly16x8_t val)
| `val -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H},[Xn]` | | `v7/A32/A64` | +| void vst1_f64(
     float64_t *ptr,
     float64x1_t val)
| `val -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D},[Xn]` | | `A64` | +| void vst1q_f64(
     float64_t *ptr,
     float64x2_t val)
| `val -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D},[Xn]` | | `A64` | +| void vst1_mf8(
     mfloat8_t *ptr,
     mfloat8x8_t val)
| `val -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B},[Xn]` | | `A64` | +| void vst1q_mf8(
     mfloat8_t *ptr,
     mfloat8x16_t val)
| `val -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B},[Xn]` | | `A64` | +| void vst1_lane_s8(
     int8_t *ptr,
     int8x8_t val,
     const int lane)
| `val -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST1 {Vt.b}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1q_lane_s8(
     int8_t *ptr,
     int8x16_t val,
     const int lane)
| `val -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST1 {Vt.b}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1_lane_s16(
     int16_t *ptr,
     int16x4_t val,
     const int lane)
| `val -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST1 {Vt.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1q_lane_s16(
     int16_t *ptr,
     int16x8_t val,
     const int lane)
| `val -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST1 {Vt.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1_lane_s32(
     int32_t *ptr,
     int32x2_t val,
     const int lane)
| `val -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST1 {Vt.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1q_lane_s32(
     int32_t *ptr,
     int32x4_t val,
     const int lane)
| `val -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST1 {Vt.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1_lane_s64(
     int64_t *ptr,
     int64x1_t val,
     const int lane)
| `val -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST1 {Vt.d}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1q_lane_s64(
     int64_t *ptr,
     int64x2_t val,
     const int lane)
| `val -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST1 {Vt.d}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1_lane_u8(
     uint8_t *ptr,
     uint8x8_t val,
     const int lane)
| `val -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST1 {Vt.b}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1q_lane_u8(
     uint8_t *ptr,
     uint8x16_t val,
     const int lane)
| `val -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST1 {Vt.b}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1_lane_u16(
     uint16_t *ptr,
     uint16x4_t val,
     const int lane)
| `val -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST1 {Vt.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1q_lane_u16(
     uint16_t *ptr,
     uint16x8_t val,
     const int lane)
| `val -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST1 {Vt.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1_lane_u32(
     uint32_t *ptr,
     uint32x2_t val,
     const int lane)
| `val -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST1 {Vt.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1q_lane_u32(
     uint32_t *ptr,
     uint32x4_t val,
     const int lane)
| `val -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST1 {Vt.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1_lane_u64(
     uint64_t *ptr,
     uint64x1_t val,
     const int lane)
| `val -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST1 {Vt.d}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1q_lane_u64(
     uint64_t *ptr,
     uint64x2_t val,
     const int lane)
| `val -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST1 {Vt.d}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1_lane_p64(
     poly64_t *ptr,
     poly64x1_t val,
     const int lane)
| `val -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST1 {Vt.d}[lane],[Xn]` | | `A32/A64` | +| void vst1q_lane_p64(
     poly64_t *ptr,
     poly64x2_t val,
     const int lane)
| `val -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST1 {Vt.d}[lane],[Xn]` | | `A32/A64` | +| void vst1_lane_f16(
     float16_t *ptr,
     float16x4_t val,
     const int lane)
| `val -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST1 {Vt.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1q_lane_f16(
     float16_t *ptr,
     float16x8_t val,
     const int lane)
| `val -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST1 {Vt.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1_lane_f32(
     float32_t *ptr,
     float32x2_t val,
     const int lane)
| `val -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST1 {Vt.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1q_lane_f32(
     float32_t *ptr,
     float32x4_t val,
     const int lane)
| `val -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST1 {Vt.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1_lane_p8(
     poly8_t *ptr,
     poly8x8_t val,
     const int lane)
| `val -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST1 {Vt.b}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1q_lane_p8(
     poly8_t *ptr,
     poly8x16_t val,
     const int lane)
| `val -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST1 {Vt.b}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1_lane_p16(
     poly16_t *ptr,
     poly16x4_t val,
     const int lane)
| `val -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST1 {Vt.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1q_lane_p16(
     poly16_t *ptr,
     poly16x8_t val,
     const int lane)
| `val -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST1 {Vt.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst1_lane_f64(
     float64_t *ptr,
     float64x1_t val,
     const int lane)
| `val -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST1 {Vt.d}[lane],[Xn]` | | `A64` | +| void vst1q_lane_f64(
     float64_t *ptr,
     float64x2_t val,
     const int lane)
| `val -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST1 {Vt.d}[lane],[Xn]` | | `A64` | +| void vst1_lane_mf8(
     mfloat8_t *ptr,
     mfloat8x8_t val,
     const int lane)
| `val -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST1 {Vt.b}[lane],[Xn]` | | `A64` | +| void vst1q_lane_mf8(
     mfloat8_t *ptr,
     mfloat8x16_t val,
     const int lane)
| `val -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST1 {Vt.b}[lane],[Xn]` | | `A64` | +| void vst2_s8(
     int8_t *ptr,
     int8x8x2_t val)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST2 {Vt.8B - Vt2.8B},[Xn]` | | `v7/A32/A64` | +| void vst2q_s8(
     int8_t *ptr,
     int8x16x2_t val)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST2 {Vt.16B - Vt2.16B},[Xn]` | | `v7/A32/A64` | +| void vst2_s16(
     int16_t *ptr,
     int16x4x2_t val)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST2 {Vt.4H - Vt2.4H},[Xn]` | | `v7/A32/A64` | +| void vst2q_s16(
     int16_t *ptr,
     int16x8x2_t val)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST2 {Vt.8H - Vt2.8H},[Xn]` | | `v7/A32/A64` | +| void vst2_s32(
     int32_t *ptr,
     int32x2x2_t val)
| `val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST2 {Vt.2S - Vt2.2S},[Xn]` | | `v7/A32/A64` | +| void vst2q_s32(
     int32_t *ptr,
     int32x4x2_t val)
| `val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST2 {Vt.4S - Vt2.4S},[Xn]` | | `v7/A32/A64` | +| void vst2_u8(
     uint8_t *ptr,
     uint8x8x2_t val)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST2 {Vt.8B - Vt2.8B},[Xn]` | | `v7/A32/A64` | +| void vst2q_u8(
     uint8_t *ptr,
     uint8x16x2_t val)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST2 {Vt.16B - Vt2.16B},[Xn]` | | `v7/A32/A64` | +| void vst2_u16(
     uint16_t *ptr,
     uint16x4x2_t val)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST2 {Vt.4H - Vt2.4H},[Xn]` | | `v7/A32/A64` | +| void vst2q_u16(
     uint16_t *ptr,
     uint16x8x2_t val)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST2 {Vt.8H - Vt2.8H},[Xn]` | | `v7/A32/A64` | +| void vst2_u32(
     uint32_t *ptr,
     uint32x2x2_t val)
| `val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST2 {Vt.2S - Vt2.2S},[Xn]` | | `v7/A32/A64` | +| void vst2q_u32(
     uint32_t *ptr,
     uint32x4x2_t val)
| `val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST2 {Vt.4S - Vt2.4S},[Xn]` | | `v7/A32/A64` | +| void vst2_f16(
     float16_t *ptr,
     float16x4x2_t val)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST2 {Vt.4H - Vt2.4H},[Xn]` | | `v7/A32/A64` | +| void vst2q_f16(
     float16_t *ptr,
     float16x8x2_t val)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST2 {Vt.8H - Vt2.8H},[Xn]` | | `v7/A32/A64` | +| void vst2_f32(
     float32_t *ptr,
     float32x2x2_t val)
| `val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST2 {Vt.2S - Vt2.2S},[Xn]` | | `v7/A32/A64` | +| void vst2q_f32(
     float32_t *ptr,
     float32x4x2_t val)
| `val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST2 {Vt.4S - Vt2.4S},[Xn]` | | `v7/A32/A64` | +| void vst2_p8(
     poly8_t *ptr,
     poly8x8x2_t val)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST2 {Vt.8B - Vt2.8B},[Xn]` | | `v7/A32/A64` | +| void vst2q_p8(
     poly8_t *ptr,
     poly8x16x2_t val)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST2 {Vt.16B - Vt2.16B},[Xn]` | | `v7/A32/A64` | +| void vst2_p16(
     poly16_t *ptr,
     poly16x4x2_t val)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST2 {Vt.4H - Vt2.4H},[Xn]` | | `v7/A32/A64` | +| void vst2q_p16(
     poly16_t *ptr,
     poly16x8x2_t val)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST2 {Vt.8H - Vt2.8H},[Xn]` | | `v7/A32/A64` | +| void vst2_s64(
     int64_t *ptr,
     int64x1x2_t val)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt2.1D},[Xn]` | | `v7/A32/A64` | +| void vst2_u64(
     uint64_t *ptr,
     uint64x1x2_t val)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt2.1D},[Xn]` | | `v7/A32/A64` | +| void vst2_p64(
     poly64_t *ptr,
     poly64x1x2_t val)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt2.1D},[Xn]` | | `A32/A64` | +| void vst2q_s64(
     int64_t *ptr,
     int64x2x2_t val)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST2 {Vt.2D - Vt2.2D},[Xn]` | | `A64` | +| void vst2q_u64(
     uint64_t *ptr,
     uint64x2x2_t val)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST2 {Vt.2D - Vt2.2D},[Xn]` | | `A64` | +| void vst2q_p64(
     poly64_t *ptr,
     poly64x2x2_t val)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST2 {Vt.2D - Vt2.2D},[Xn]` | | `A64` | +| void vst2_f64(
     float64_t *ptr,
     float64x1x2_t val)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt2.1D},[Xn]` | | `A64` | +| void vst2q_f64(
     float64_t *ptr,
     float64x2x2_t val)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST2 {Vt.2D - Vt2.2D},[Xn]` | | `A64` | +| void vst2_mf8(
     mfloat8_t *ptr,
     mfloat8x8x2_t val)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST2 {Vt.8B - Vt2.8B},[Xn]` | | `A64` | +| void vst2q_mf8(
     mfloat8_t *ptr,
     mfloat8x16x2_t val)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST2 {Vt.16B - Vt2.16B},[Xn]` | | `A64` | +| void vst3_s8(
     int8_t *ptr,
     int8x8x3_t val)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST3 {Vt.8B - Vt3.8B},[Xn]` | | `v7/A32/A64` | +| void vst3q_s8(
     int8_t *ptr,
     int8x16x3_t val)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST3 {Vt.16B - Vt3.16B},[Xn]` | | `v7/A32/A64` | +| void vst3_s16(
     int16_t *ptr,
     int16x4x3_t val)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST3 {Vt.4H - Vt3.4H},[Xn]` | | `v7/A32/A64` | +| void vst3q_s16(
     int16_t *ptr,
     int16x8x3_t val)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST3 {Vt.8H - Vt3.8H},[Xn]` | | `v7/A32/A64` | +| void vst3_s32(
     int32_t *ptr,
     int32x2x3_t val)
| `val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST3 {Vt.2S - Vt3.2S},[Xn]` | | `v7/A32/A64` | +| void vst3q_s32(
     int32_t *ptr,
     int32x4x3_t val)
| `val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST3 {Vt.4S - Vt3.4S},[Xn]` | | `v7/A32/A64` | +| void vst3_u8(
     uint8_t *ptr,
     uint8x8x3_t val)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST3 {Vt.8B - Vt3.8B},[Xn]` | | `v7/A32/A64` | +| void vst3q_u8(
     uint8_t *ptr,
     uint8x16x3_t val)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST3 {Vt.16B - Vt3.16B},[Xn]` | | `v7/A32/A64` | +| void vst3_u16(
     uint16_t *ptr,
     uint16x4x3_t val)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST3 {Vt.4H - Vt3.4H},[Xn]` | | `v7/A32/A64` | +| void vst3q_u16(
     uint16_t *ptr,
     uint16x8x3_t val)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST3 {Vt.8H - Vt3.8H},[Xn]` | | `v7/A32/A64` | +| void vst3_u32(
     uint32_t *ptr,
     uint32x2x3_t val)
| `val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST3 {Vt.2S - Vt3.2S},[Xn]` | | `v7/A32/A64` | +| void vst3q_u32(
     uint32_t *ptr,
     uint32x4x3_t val)
| `val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST3 {Vt.4S - Vt3.4S},[Xn]` | | `v7/A32/A64` | +| void vst3_f16(
     float16_t *ptr,
     float16x4x3_t val)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST3 {Vt.4H - Vt3.4H},[Xn]` | | `v7/A32/A64` | +| void vst3q_f16(
     float16_t *ptr,
     float16x8x3_t val)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST3 {Vt.8H - Vt3.8H},[Xn]` | | `v7/A32/A64` | +| void vst3_f32(
     float32_t *ptr,
     float32x2x3_t val)
| `val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST3 {Vt.2S - Vt3.2S},[Xn]` | | `v7/A32/A64` | +| void vst3q_f32(
     float32_t *ptr,
     float32x4x3_t val)
| `val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST3 {Vt.4S - Vt3.4S},[Xn]` | | `v7/A32/A64` | +| void vst3_p8(
     poly8_t *ptr,
     poly8x8x3_t val)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST3 {Vt.8B - Vt3.8B},[Xn]` | | `v7/A32/A64` | +| void vst3q_p8(
     poly8_t *ptr,
     poly8x16x3_t val)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST3 {Vt.16B - Vt3.16B},[Xn]` | | `v7/A32/A64` | +| void vst3_p16(
     poly16_t *ptr,
     poly16x4x3_t val)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST3 {Vt.4H - Vt3.4H},[Xn]` | | `v7/A32/A64` | +| void vst3q_p16(
     poly16_t *ptr,
     poly16x8x3_t val)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST3 {Vt.8H - Vt3.8H},[Xn]` | | `v7/A32/A64` | +| void vst3_s64(
     int64_t *ptr,
     int64x1x3_t val)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt3.1D},[Xn]` | | `v7/A32/A64` | +| void vst3_u64(
     uint64_t *ptr,
     uint64x1x3_t val)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt3.1D},[Xn]` | | `v7/A32/A64` | +| void vst3_p64(
     poly64_t *ptr,
     poly64x1x3_t val)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt3.1D},[Xn]` | | `A32/A64` | +| void vst3q_s64(
     int64_t *ptr,
     int64x2x3_t val)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST3 {Vt.2D - Vt3.2D},[Xn]` | | `A64` | +| void vst3q_u64(
     uint64_t *ptr,
     uint64x2x3_t val)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST3 {Vt.2D - Vt3.2D},[Xn]` | | `A64` | +| void vst3q_p64(
     poly64_t *ptr,
     poly64x2x3_t val)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST3 {Vt.2D - Vt3.2D},[Xn]` | | `A64` | +| void vst3_f64(
     float64_t *ptr,
     float64x1x3_t val)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt3.1D},[Xn]` | | `A64` | +| void vst3q_f64(
     float64_t *ptr,
     float64x2x3_t val)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST3 {Vt.2D - Vt3.2D},[Xn]` | | `A64` | +| void vst3_mf8(
     mfloat8_t *ptr,
     mfloat8x8x3_t val)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST3 {Vt.8B - Vt3.8B},[Xn]` | | `A64` | +| void vst3q_mf8(
     mfloat8_t *ptr,
     mfloat8x16x3_t val)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST3 {Vt.16B - Vt3.16B},[Xn]` | | `A64` | +| void vst4_s8(
     int8_t *ptr,
     int8x8x4_t val)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST4 {Vt.8B - Vt4.8B},[Xn]` | | `v7/A32/A64` | +| void vst4q_s8(
     int8_t *ptr,
     int8x16x4_t val)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST4 {Vt.16B - Vt4.16B},[Xn]` | | `v7/A32/A64` | +| void vst4_s16(
     int16_t *ptr,
     int16x4x4_t val)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST4 {Vt.4H - Vt4.4H},[Xn]` | | `v7/A32/A64` | +| void vst4q_s16(
     int16_t *ptr,
     int16x8x4_t val)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST4 {Vt.8H - Vt4.8H},[Xn]` | | `v7/A32/A64` | +| void vst4_s32(
     int32_t *ptr,
     int32x2x4_t val)
| `val.val[3] -> Vt4.2S`
`val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST4 {Vt.2S - Vt4.2S},[Xn]` | | `v7/A32/A64` | +| void vst4q_s32(
     int32_t *ptr,
     int32x4x4_t val)
| `val.val[3] -> Vt4.4S`
`val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST4 {Vt.4S - Vt4.4S},[Xn]` | | `v7/A32/A64` | +| void vst4_u8(
     uint8_t *ptr,
     uint8x8x4_t val)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST4 {Vt.8B - Vt4.8B},[Xn]` | | `v7/A32/A64` | +| void vst4q_u8(
     uint8_t *ptr,
     uint8x16x4_t val)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST4 {Vt.16B - Vt4.16B},[Xn]` | | `v7/A32/A64` | +| void vst4_u16(
     uint16_t *ptr,
     uint16x4x4_t val)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST4 {Vt.4H - Vt4.4H},[Xn]` | | `v7/A32/A64` | +| void vst4q_u16(
     uint16_t *ptr,
     uint16x8x4_t val)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST4 {Vt.8H - Vt4.8H},[Xn]` | | `v7/A32/A64` | +| void vst4_u32(
     uint32_t *ptr,
     uint32x2x4_t val)
| `val.val[3] -> Vt4.2S`
`val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST4 {Vt.2S - Vt4.2S},[Xn]` | | `v7/A32/A64` | +| void vst4q_u32(
     uint32_t *ptr,
     uint32x4x4_t val)
| `val.val[3] -> Vt4.4S`
`val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST4 {Vt.4S - Vt4.4S},[Xn]` | | `v7/A32/A64` | +| void vst4_f16(
     float16_t *ptr,
     float16x4x4_t val)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST4 {Vt.4H - Vt4.4H},[Xn]` | | `v7/A32/A64` | +| void vst4q_f16(
     float16_t *ptr,
     float16x8x4_t val)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST4 {Vt.8H - Vt4.8H},[Xn]` | | `v7/A32/A64` | +| void vst4_f32(
     float32_t *ptr,
     float32x2x4_t val)
| `val.val[3] -> Vt4.2S`
`val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST4 {Vt.2S - Vt4.2S},[Xn]` | | `v7/A32/A64` | +| void vst4q_f32(
     float32_t *ptr,
     float32x4x4_t val)
| `val.val[3] -> Vt4.4S`
`val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST4 {Vt.4S - Vt4.4S},[Xn]` | | `v7/A32/A64` | +| void vst4_p8(
     poly8_t *ptr,
     poly8x8x4_t val)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST4 {Vt.8B - Vt4.8B},[Xn]` | | `v7/A32/A64` | +| void vst4q_p8(
     poly8_t *ptr,
     poly8x16x4_t val)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST4 {Vt.16B - Vt4.16B},[Xn]` | | `v7/A32/A64` | +| void vst4_p16(
     poly16_t *ptr,
     poly16x4x4_t val)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST4 {Vt.4H - Vt4.4H},[Xn]` | | `v7/A32/A64` | +| void vst4q_p16(
     poly16_t *ptr,
     poly16x8x4_t val)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST4 {Vt.8H - Vt4.8H},[Xn]` | | `v7/A32/A64` | +| void vst4_s64(
     int64_t *ptr,
     int64x1x4_t val)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt4.1D},[Xn]` | | `v7/A32/A64` | +| void vst4_u64(
     uint64_t *ptr,
     uint64x1x4_t val)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt4.1D},[Xn]` | | `v7/A32/A64` | +| void vst4_p64(
     poly64_t *ptr,
     poly64x1x4_t val)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt4.1D},[Xn]` | | `A32/A64` | +| void vst4q_s64(
     int64_t *ptr,
     int64x2x4_t val)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST4 {Vt.2D - Vt4.2D},[Xn]` | | `A64` | +| void vst4q_u64(
     uint64_t *ptr,
     uint64x2x4_t val)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST4 {Vt.2D - Vt4.2D},[Xn]` | | `A64` | +| void vst4q_p64(
     poly64_t *ptr,
     poly64x2x4_t val)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST4 {Vt.2D - Vt4.2D},[Xn]` | | `A64` | +| void vst4_f64(
     float64_t *ptr,
     float64x1x4_t val)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt4.1D},[Xn]` | | `A64` | +| void vst4q_f64(
     float64_t *ptr,
     float64x2x4_t val)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST4 {Vt.2D - Vt4.2D},[Xn]` | | `A64` | +| void vst4_mf8(
     mfloat8_t *ptr,
     mfloat8x8x4_t val)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST4 {Vt.8B - Vt4.8B},[Xn]` | | `A64` | +| void vst4q_mf8(
     mfloat8_t *ptr,
     mfloat8x16x4_t val)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST4 {Vt.16B - Vt4.16B},[Xn]` | | `A64` | +| void vst2_lane_s8(
     int8_t *ptr,
     int8x8x2_t val,
     const int lane)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST2 {Vt.b - Vt2.b}[lane],[Xn]` | | `v7/A32/A64` | +| void vst2_lane_u8(
     uint8_t *ptr,
     uint8x8x2_t val,
     const int lane)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST2 {Vt.b - Vt2.b}[lane],[Xn]` | | `v7/A32/A64` | +| void vst2_lane_p8(
     poly8_t *ptr,
     poly8x8x2_t val,
     const int lane)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST2 {Vt.b - Vt2.b}[lane],[Xn]` | | `v7/A32/A64` | +| void vst2_lane_mf8(
     mfloat8_t *ptr,
     mfloat8x8x2_t val,
     const int lane)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST2 {Vt.b - Vt2.b}[lane],[Xn]` | | `A64` | +| void vst3_lane_s8(
     int8_t *ptr,
     int8x8x3_t val,
     const int lane)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST3 {Vt.b - Vt3.b}[lane],[Xn]` | | `v7/A32/A64` | +| void vst3_lane_u8(
     uint8_t *ptr,
     uint8x8x3_t val,
     const int lane)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST3 {Vt.b - Vt3.b}[lane],[Xn]` | | `v7/A32/A64` | +| void vst3_lane_p8(
     poly8_t *ptr,
     poly8x8x3_t val,
     const int lane)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST3 {Vt.b - Vt3.b}[lane],[Xn]` | | `v7/A32/A64` | +| void vst3_lane_mf8(
     mfloat8_t *ptr,
     mfloat8x8x3_t val,
     const int lane)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST3 {Vt.b - Vt3.b}[lane],[Xn]` | | `A64` | +| void vst4_lane_s8(
     int8_t *ptr,
     int8x8x4_t val,
     const int lane)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST4 {Vt.b - Vt4.b}[lane],[Xn]` | | `v7/A32/A64` | +| void vst4_lane_u8(
     uint8_t *ptr,
     uint8x8x4_t val,
     const int lane)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST4 {Vt.b - Vt4.b}[lane],[Xn]` | | `v7/A32/A64` | +| void vst4_lane_p8(
     poly8_t *ptr,
     poly8x8x4_t val,
     const int lane)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST4 {Vt.b - Vt4.b}[lane],[Xn]` | | `v7/A32/A64` | +| void vst4_lane_mf8(
     mfloat8_t *ptr,
     mfloat8x8x4_t val,
     const int lane)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn`
`0 <= lane <= 7` | `ST4 {Vt.b - Vt4.b}[lane],[Xn]` | | `A64` | +| void vst2_lane_s16(
     int16_t *ptr,
     int16x4x2_t val,
     const int lane)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST2 {Vt.h - Vt2.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst2q_lane_s16(
     int16_t *ptr,
     int16x8x2_t val,
     const int lane)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST2 {Vt.h - Vt2.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst2_lane_s32(
     int32_t *ptr,
     int32x2x2_t val,
     const int lane)
| `val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST2 {Vt.s - Vt2.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst2q_lane_s32(
     int32_t *ptr,
     int32x4x2_t val,
     const int lane)
| `val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST2 {Vt.s - Vt2.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst2_lane_u16(
     uint16_t *ptr,
     uint16x4x2_t val,
     const int lane)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST2 {Vt.h - Vt2.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst2q_lane_u16(
     uint16_t *ptr,
     uint16x8x2_t val,
     const int lane)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST2 {Vt.h - Vt2.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst2_lane_u32(
     uint32_t *ptr,
     uint32x2x2_t val,
     const int lane)
| `val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST2 {Vt.s - Vt2.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst2q_lane_u32(
     uint32_t *ptr,
     uint32x4x2_t val,
     const int lane)
| `val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST2 {Vt.s - Vt2.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst2_lane_f16(
     float16_t *ptr,
     float16x4x2_t val,
     const int lane)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST2 {Vt.h - Vt2.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst2q_lane_f16(
     float16_t *ptr,
     float16x8x2_t val,
     const int lane)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST2 {Vt.h - Vt2.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst2_lane_f32(
     float32_t *ptr,
     float32x2x2_t val,
     const int lane)
| `val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST2 {Vt.s - Vt2.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst2q_lane_f32(
     float32_t *ptr,
     float32x4x2_t val,
     const int lane)
| `val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST2 {Vt.s - Vt2.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst2_lane_p16(
     poly16_t *ptr,
     poly16x4x2_t val,
     const int lane)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST2 {Vt.h - Vt2.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst2q_lane_p16(
     poly16_t *ptr,
     poly16x8x2_t val,
     const int lane)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST2 {Vt.h - Vt2.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst2q_lane_s8(
     int8_t *ptr,
     int8x16x2_t val,
     const int lane)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST2 {Vt.b - Vt2.b}[lane],[Xn]` | | `A64` | +| void vst2q_lane_u8(
     uint8_t *ptr,
     uint8x16x2_t val,
     const int lane)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST2 {Vt.b - Vt2.b}[lane],[Xn]` | | `A64` | +| void vst2q_lane_p8(
     poly8_t *ptr,
     poly8x16x2_t val,
     const int lane)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST2 {Vt.b - Vt2.b}[lane],[Xn]` | | `A64` | +| void vst2q_lane_mf8(
     mfloat8_t *ptr,
     mfloat8x16x2_t val,
     const int lane)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST2 {Vt.b - Vt2.b}[lane],[Xn]` | | `A64` | +| void vst2_lane_s64(
     int64_t *ptr,
     int64x1x2_t val,
     const int lane)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST2 {Vt.d - Vt2.d}[lane],[Xn]` | | `A64` | +| void vst2q_lane_s64(
     int64_t *ptr,
     int64x2x2_t val,
     const int lane)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST2 {Vt.d - Vt2.d}[lane],[Xn]` | | `A64` | +| void vst2_lane_u64(
     uint64_t *ptr,
     uint64x1x2_t val,
     const int lane)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST2 {Vt.d - Vt2.d}[lane],[Xn]` | | `A64` | +| void vst2q_lane_u64(
     uint64_t *ptr,
     uint64x2x2_t val,
     const int lane)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST2 {Vt.d - Vt2.d}[lane],[Xn]` | | `A64` | +| void vst2_lane_p64(
     poly64_t *ptr,
     poly64x1x2_t val,
     const int lane)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST2 {Vt.d - Vt2.d}[lane],[Xn]` | | `A64` | +| void vst2q_lane_p64(
     poly64_t *ptr,
     poly64x2x2_t val,
     const int lane)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST2 {Vt.d - Vt2.d}[lane],[Xn]` | | `A64` | +| void vst2_lane_f64(
     float64_t *ptr,
     float64x1x2_t val,
     const int lane)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST2 {Vt.d - Vt2.d}[lane],[Xn]` | | `A64` | +| void vst2q_lane_f64(
     float64_t *ptr,
     float64x2x2_t val,
     const int lane)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST2 {Vt.d - Vt2.d}[lane],[Xn]` | | `A64` | +| void vst3_lane_s16(
     int16_t *ptr,
     int16x4x3_t val,
     const int lane)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST3 {Vt.h - Vt3.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst3q_lane_s16(
     int16_t *ptr,
     int16x8x3_t val,
     const int lane)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST3 {Vt.h - Vt3.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst3_lane_s32(
     int32_t *ptr,
     int32x2x3_t val,
     const int lane)
| `val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST3 {Vt.s - Vt3.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst3q_lane_s32(
     int32_t *ptr,
     int32x4x3_t val,
     const int lane)
| `val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST3 {Vt.s - Vt3.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst3_lane_u16(
     uint16_t *ptr,
     uint16x4x3_t val,
     const int lane)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST3 {Vt.h - Vt3.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst3q_lane_u16(
     uint16_t *ptr,
     uint16x8x3_t val,
     const int lane)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST3 {Vt.h - Vt3.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst3_lane_u32(
     uint32_t *ptr,
     uint32x2x3_t val,
     const int lane)
| `val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST3 {Vt.s - Vt3.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst3q_lane_u32(
     uint32_t *ptr,
     uint32x4x3_t val,
     const int lane)
| `val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST3 {Vt.s - Vt3.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst3_lane_f16(
     float16_t *ptr,
     float16x4x3_t val,
     const int lane)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST3 {Vt.h - Vt3.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst3q_lane_f16(
     float16_t *ptr,
     float16x8x3_t val,
     const int lane)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST3 {Vt.h - Vt3.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst3_lane_f32(
     float32_t *ptr,
     float32x2x3_t val,
     const int lane)
| `val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST3 {Vt.s - Vt3.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst3q_lane_f32(
     float32_t *ptr,
     float32x4x3_t val,
     const int lane)
| `val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST3 {Vt.s - Vt3.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst3_lane_p16(
     poly16_t *ptr,
     poly16x4x3_t val,
     const int lane)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST3 {Vt.h - Vt3.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst3q_lane_p16(
     poly16_t *ptr,
     poly16x8x3_t val,
     const int lane)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST3 {Vt.h - Vt3.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst3q_lane_s8(
     int8_t *ptr,
     int8x16x3_t val,
     const int lane)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST3 {Vt.b - Vt3.b}[lane],[Xn]` | | `A64` | +| void vst3q_lane_u8(
     uint8_t *ptr,
     uint8x16x3_t val,
     const int lane)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST3 {Vt.b - Vt3.b}[lane],[Xn]` | | `A64` | +| void vst3q_lane_p8(
     poly8_t *ptr,
     poly8x16x3_t val,
     const int lane)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST3 {Vt.b - Vt3.b}[lane],[Xn]` | | `A64` | +| void vst3_lane_s64(
     int64_t *ptr,
     int64x1x3_t val,
     const int lane)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST3 {Vt.d - Vt3.d}[lane],[Xn]` | | `A64` | +| void vst3q_lane_s64(
     int64_t *ptr,
     int64x2x3_t val,
     const int lane)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST3 {Vt.d - Vt3.d}[lane],[Xn]` | | `A64` | +| void vst3_lane_u64(
     uint64_t *ptr,
     uint64x1x3_t val,
     const int lane)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST3 {Vt.d - Vt3.d}[lane],[Xn]` | | `A64` | +| void vst3q_lane_u64(
     uint64_t *ptr,
     uint64x2x3_t val,
     const int lane)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST3 {Vt.d - Vt3.d}[lane],[Xn]` | | `A64` | +| void vst3_lane_p64(
     poly64_t *ptr,
     poly64x1x3_t val,
     const int lane)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST3 {Vt.d - Vt3.d}[lane],[Xn]` | | `A64` | +| void vst3q_lane_p64(
     poly64_t *ptr,
     poly64x2x3_t val,
     const int lane)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST3 {Vt.d - Vt3.d}[lane],[Xn]` | | `A64` | +| void vst3_lane_f64(
     float64_t *ptr,
     float64x1x3_t val,
     const int lane)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST3 {Vt.d - Vt3.d}[lane],[Xn]` | | `A64` | +| void vst3q_lane_f64(
     float64_t *ptr,
     float64x2x3_t val,
     const int lane)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST3 {Vt.d - Vt3.d}[lane],[Xn]` | | `A64` | +| void vst3q_lane_mf8(
     mfloat8_t *ptr,
     mfloat8x16x3_t val,
     const int lane)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST3 {Vt.b - Vt3.b}[lane],[Xn]` | | `A64` | +| void vst4_lane_s16(
     int16_t *ptr,
     int16x4x4_t val,
     const int lane)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST4 {Vt.h - Vt4.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst4q_lane_s16(
     int16_t *ptr,
     int16x8x4_t val,
     const int lane)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST4 {Vt.h - Vt4.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst4_lane_s32(
     int32_t *ptr,
     int32x2x4_t val,
     const int lane)
| `val.val[3] -> Vt4.2S`
`val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST4 {Vt.s - Vt4.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst4q_lane_s32(
     int32_t *ptr,
     int32x4x4_t val,
     const int lane)
| `val.val[3] -> Vt4.4S`
`val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST4 {Vt.s - Vt4.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst4_lane_u16(
     uint16_t *ptr,
     uint16x4x4_t val,
     const int lane)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST4 {Vt.h - Vt4.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst4q_lane_u16(
     uint16_t *ptr,
     uint16x8x4_t val,
     const int lane)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST4 {Vt.h - Vt4.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst4_lane_u32(
     uint32_t *ptr,
     uint32x2x4_t val,
     const int lane)
| `val.val[3] -> Vt4.2S`
`val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST4 {Vt.s - Vt4.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst4q_lane_u32(
     uint32_t *ptr,
     uint32x4x4_t val,
     const int lane)
| `val.val[3] -> Vt4.4S`
`val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST4 {Vt.s - Vt4.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst4_lane_f16(
     float16_t *ptr,
     float16x4x4_t val,
     const int lane)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST4 {Vt.h - Vt4.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst4q_lane_f16(
     float16_t *ptr,
     float16x8x4_t val,
     const int lane)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST4 {Vt.h - Vt4.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst4_lane_f32(
     float32_t *ptr,
     float32x2x4_t val,
     const int lane)
| `val.val[3] -> Vt4.2S`
`val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn`
`0 <= lane <= 1` | `ST4 {Vt.s - Vt4.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst4q_lane_f32(
     float32_t *ptr,
     float32x4x4_t val,
     const int lane)
| `val.val[3] -> Vt4.4S`
`val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn`
`0 <= lane <= 3` | `ST4 {Vt.s - Vt4.s}[lane],[Xn]` | | `v7/A32/A64` | +| void vst4_lane_p16(
     poly16_t *ptr,
     poly16x4x4_t val,
     const int lane)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn`
`0 <= lane <= 3` | `ST4 {Vt.h - Vt4.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst4q_lane_p16(
     poly16_t *ptr,
     poly16x8x4_t val,
     const int lane)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn`
`0 <= lane <= 7` | `ST4 {Vt.h - Vt4.h}[lane],[Xn]` | | `v7/A32/A64` | +| void vst4q_lane_s8(
     int8_t *ptr,
     int8x16x4_t val,
     const int lane)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST4 {Vt.b - Vt4.b}[lane],[Xn]` | | `A64` | +| void vst4q_lane_u8(
     uint8_t *ptr,
     uint8x16x4_t val,
     const int lane)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST4 {Vt.b - Vt4.b}[lane],[Xn]` | | `A64` | +| void vst4q_lane_p8(
     poly8_t *ptr,
     poly8x16x4_t val,
     const int lane)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST4 {Vt.b - Vt4.b}[lane],[Xn]` | | `A64` | +| void vst4_lane_s64(
     int64_t *ptr,
     int64x1x4_t val,
     const int lane)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST4 {Vt.d - Vt4.d}[lane],[Xn]` | | `A64` | +| void vst4q_lane_s64(
     int64_t *ptr,
     int64x2x4_t val,
     const int lane)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST4 {Vt.d - Vt4.d}[lane],[Xn]` | | `A64` | +| void vst4_lane_u64(
     uint64_t *ptr,
     uint64x1x4_t val,
     const int lane)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST4 {Vt.d - Vt4.d}[lane],[Xn]` | | `A64` | +| void vst4q_lane_u64(
     uint64_t *ptr,
     uint64x2x4_t val,
     const int lane)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST4 {Vt.d - Vt4.d}[lane],[Xn]` | | `A64` | +| void vst4_lane_p64(
     poly64_t *ptr,
     poly64x1x4_t val,
     const int lane)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST4 {Vt.d - Vt4.d}[lane],[Xn]` | | `A64` | +| void vst4q_lane_p64(
     poly64_t *ptr,
     poly64x2x4_t val,
     const int lane)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST4 {Vt.d - Vt4.d}[lane],[Xn]` | | `A64` | +| void vst4_lane_f64(
     float64_t *ptr,
     float64x1x4_t val,
     const int lane)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn`
`0 <= lane <= 0` | `ST4 {Vt.d - Vt4.d}[lane],[Xn]` | | `A64` | +| void vst4q_lane_f64(
     float64_t *ptr,
     float64x2x4_t val,
     const int lane)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn`
`0 <= lane <= 1` | `ST4 {Vt.d - Vt4.d}[lane],[Xn]` | | `A64` | +| void vst4q_lane_mf8(
     mfloat8_t *ptr,
     mfloat8x16x4_t val,
     const int lane)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn`
`0 <= lane <= 15` | `ST4 {Vt.b - Vt4.b}[lane],[Xn]` | | `A64` | +| void vst1_s8_x2(
     int8_t *ptr,
     int8x8x2_t val)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt2.8B},[Xn]` | | `v7/A32/A64` | +| void vst1q_s8_x2(
     int8_t *ptr,
     int8x16x2_t val)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt2.16B},[Xn]` | | `v7/A32/A64` | +| void vst1_s16_x2(
     int16_t *ptr,
     int16x4x2_t val)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt2.4H},[Xn]` | | `v7/A32/A64` | +| void vst1q_s16_x2(
     int16_t *ptr,
     int16x8x2_t val)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt2.8H},[Xn]` | | `v7/A32/A64` | +| void vst1_s32_x2(
     int32_t *ptr,
     int32x2x2_t val)
| `val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S - Vt2.2S},[Xn]` | | `v7/A32/A64` | +| void vst1q_s32_x2(
     int32_t *ptr,
     int32x4x2_t val)
| `val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S - Vt2.4S},[Xn]` | | `v7/A32/A64` | +| void vst1_u8_x2(
     uint8_t *ptr,
     uint8x8x2_t val)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt2.8B},[Xn]` | | `v7/A32/A64` | +| void vst1q_u8_x2(
     uint8_t *ptr,
     uint8x16x2_t val)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt2.16B},[Xn]` | | `v7/A32/A64` | +| void vst1_u16_x2(
     uint16_t *ptr,
     uint16x4x2_t val)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt2.4H},[Xn]` | | `v7/A32/A64` | +| void vst1q_u16_x2(
     uint16_t *ptr,
     uint16x8x2_t val)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt2.8H},[Xn]` | | `v7/A32/A64` | +| void vst1_u32_x2(
     uint32_t *ptr,
     uint32x2x2_t val)
| `val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S - Vt2.2S},[Xn]` | | `v7/A32/A64` | +| void vst1q_u32_x2(
     uint32_t *ptr,
     uint32x4x2_t val)
| `val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S - Vt2.4S},[Xn]` | | `v7/A32/A64` | +| void vst1_f16_x2(
     float16_t *ptr,
     float16x4x2_t val)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt2.4H},[Xn]` | | `v7/A32/A64` | +| void vst1q_f16_x2(
     float16_t *ptr,
     float16x8x2_t val)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt2.8H},[Xn]` | | `v7/A32/A64` | +| void vst1_f32_x2(
     float32_t *ptr,
     float32x2x2_t val)
| `val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S - Vt2.2S},[Xn]` | | `v7/A32/A64` | +| void vst1q_f32_x2(
     float32_t *ptr,
     float32x4x2_t val)
| `val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S - Vt2.4S},[Xn]` | | `v7/A32/A64` | +| void vst1_p8_x2(
     poly8_t *ptr,
     poly8x8x2_t val)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt2.8B},[Xn]` | | `v7/A32/A64` | +| void vst1q_p8_x2(
     poly8_t *ptr,
     poly8x16x2_t val)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt2.16B},[Xn]` | | `v7/A32/A64` | +| void vst1_p16_x2(
     poly16_t *ptr,
     poly16x4x2_t val)
| `val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt2.4H},[Xn]` | | `v7/A32/A64` | +| void vst1q_p16_x2(
     poly16_t *ptr,
     poly16x8x2_t val)
| `val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt2.8H},[Xn]` | | `v7/A32/A64` | +| void vst1_s64_x2(
     int64_t *ptr,
     int64x1x2_t val)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt2.1D},[Xn]` | | `v7/A32/A64` | +| void vst1_u64_x2(
     uint64_t *ptr,
     uint64x1x2_t val)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt2.1D},[Xn]` | | `v7/A32/A64` | +| void vst1_p64_x2(
     poly64_t *ptr,
     poly64x1x2_t val)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt2.1D},[Xn]` | | `A32/A64` | +| void vst1q_s64_x2(
     int64_t *ptr,
     int64x2x2_t val)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt2.2D},[Xn]` | | `v7/A32/A64` | +| void vst1q_u64_x2(
     uint64_t *ptr,
     uint64x2x2_t val)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt2.2D},[Xn]` | | `v7/A32/A64` | +| void vst1q_p64_x2(
     poly64_t *ptr,
     poly64x2x2_t val)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt2.2D},[Xn]` | | `A32/A64` | +| void vst1_f64_x2(
     float64_t *ptr,
     float64x1x2_t val)
| `val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt2.1D},[Xn]` | | `A64` | +| void vst1q_f64_x2(
     float64_t *ptr,
     float64x2x2_t val)
| `val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt2.2D},[Xn]` | | `A64` | +| void vst1_mf8_x2(
     mfloat8_t *ptr,
     mfloat8x8x2_t val)
| `val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt2.8B},[Xn]` | | `A64` | +| void vst1q_mf8_x2(
     mfloat8_t *ptr,
     mfloat8x16x2_t val)
| `val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt2.16B},[Xn]` | | `A64` | +| void vst1_s8_x3(
     int8_t *ptr,
     int8x8x3_t val)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt3.8B},[Xn]` | | `v7/A32/A64` | +| void vst1q_s8_x3(
     int8_t *ptr,
     int8x16x3_t val)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt3.16B},[Xn]` | | `v7/A32/A64` | +| void vst1_s16_x3(
     int16_t *ptr,
     int16x4x3_t val)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt3.4H},[Xn]` | | `v7/A32/A64` | +| void vst1q_s16_x3(
     int16_t *ptr,
     int16x8x3_t val)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt3.8H},[Xn]` | | `v7/A32/A64` | +| void vst1_s32_x3(
     int32_t *ptr,
     int32x2x3_t val)
| `val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S - Vt3.2S},[Xn]` | | `v7/A32/A64` | +| void vst1q_s32_x3(
     int32_t *ptr,
     int32x4x3_t val)
| `val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S - Vt3.4S},[Xn]` | | `v7/A32/A64` | +| void vst1_u8_x3(
     uint8_t *ptr,
     uint8x8x3_t val)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt3.8B},[Xn]` | | `v7/A32/A64` | +| void vst1q_u8_x3(
     uint8_t *ptr,
     uint8x16x3_t val)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt3.16B},[Xn]` | | `v7/A32/A64` | +| void vst1_u16_x3(
     uint16_t *ptr,
     uint16x4x3_t val)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt3.4H},[Xn]` | | `v7/A32/A64` | +| void vst1q_u16_x3(
     uint16_t *ptr,
     uint16x8x3_t val)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt3.8H},[Xn]` | | `v7/A32/A64` | +| void vst1_u32_x3(
     uint32_t *ptr,
     uint32x2x3_t val)
| `val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S - Vt3.2S},[Xn]` | | `v7/A32/A64` | +| void vst1q_u32_x3(
     uint32_t *ptr,
     uint32x4x3_t val)
| `val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S - Vt3.4S},[Xn]` | | `v7/A32/A64` | +| void vst1_f16_x3(
     float16_t *ptr,
     float16x4x3_t val)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt3.4H},[Xn]` | | `v7/A32/A64` | +| void vst1q_f16_x3(
     float16_t *ptr,
     float16x8x3_t val)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt3.8H},[Xn]` | | `v7/A32/A64` | +| void vst1_f32_x3(
     float32_t *ptr,
     float32x2x3_t val)
| `val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S - Vt3.2S},[Xn]` | | `v7/A32/A64` | +| void vst1q_f32_x3(
     float32_t *ptr,
     float32x4x3_t val)
| `val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S - Vt3.4S},[Xn]` | | `v7/A32/A64` | +| void vst1_p8_x3(
     poly8_t *ptr,
     poly8x8x3_t val)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt3.8B},[Xn]` | | `v7/A32/A64` | +| void vst1q_p8_x3(
     poly8_t *ptr,
     poly8x16x3_t val)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt3.16B},[Xn]` | | `v7/A32/A64` | +| void vst1_p16_x3(
     poly16_t *ptr,
     poly16x4x3_t val)
| `val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt3.4H},[Xn]` | | `v7/A32/A64` | +| void vst1q_p16_x3(
     poly16_t *ptr,
     poly16x8x3_t val)
| `val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt3.8H},[Xn]` | | `v7/A32/A64` | +| void vst1_s64_x3(
     int64_t *ptr,
     int64x1x3_t val)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt3.1D},[Xn]` | | `v7/A32/A64` | +| void vst1_u64_x3(
     uint64_t *ptr,
     uint64x1x3_t val)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt3.1D},[Xn]` | | `v7/A32/A64` | +| void vst1_p64_x3(
     poly64_t *ptr,
     poly64x1x3_t val)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt3.1D},[Xn]` | | `A32/A64` | +| void vst1q_s64_x3(
     int64_t *ptr,
     int64x2x3_t val)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt3.2D},[Xn]` | | `v7/A32/A64` | +| void vst1q_u64_x3(
     uint64_t *ptr,
     uint64x2x3_t val)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt3.2D},[Xn]` | | `v7/A32/A64` | +| void vst1q_p64_x3(
     poly64_t *ptr,
     poly64x2x3_t val)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt3.2D},[Xn]` | | `A32/A64` | +| void vst1_f64_x3(
     float64_t *ptr,
     float64x1x3_t val)
| `val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt3.1D},[Xn]` | | `A64` | +| void vst1q_f64_x3(
     float64_t *ptr,
     float64x2x3_t val)
| `val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt3.2D},[Xn]` | | `A64` | +| void vst1_mf8_x3(
     mfloat8_t *ptr,
     mfloat8x8x3_t val)
| `val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt3.8B},[Xn]` | | `A64` | +| void vst1q_mf8_x3(
     mfloat8_t *ptr,
     mfloat8x16x3_t val)
| `val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt3.16B},[Xn]` | | `A64` | +| void vst1_s8_x4(
     int8_t *ptr,
     int8x8x4_t val)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt4.8B},[Xn]` | | `v7/A32/A64` | +| void vst1q_s8_x4(
     int8_t *ptr,
     int8x16x4_t val)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt4.16B},[Xn]` | | `v7/A32/A64` | +| void vst1_s16_x4(
     int16_t *ptr,
     int16x4x4_t val)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt4.4H},[Xn]` | | `v7/A32/A64` | +| void vst1q_s16_x4(
     int16_t *ptr,
     int16x8x4_t val)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt4.8H},[Xn]` | | `v7/A32/A64` | +| void vst1_s32_x4(
     int32_t *ptr,
     int32x2x4_t val)
| `val.val[3] -> Vt4.2S`
`val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S - Vt4.2S},[Xn]` | | `v7/A32/A64` | +| void vst1q_s32_x4(
     int32_t *ptr,
     int32x4x4_t val)
| `val.val[3] -> Vt4.4S`
`val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S - Vt4.4S},[Xn]` | | `v7/A32/A64` | +| void vst1_u8_x4(
     uint8_t *ptr,
     uint8x8x4_t val)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt4.8B},[Xn]` | | `v7/A32/A64` | +| void vst1q_u8_x4(
     uint8_t *ptr,
     uint8x16x4_t val)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt4.16B},[Xn]` | | `v7/A32/A64` | +| void vst1_u16_x4(
     uint16_t *ptr,
     uint16x4x4_t val)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt4.4H},[Xn]` | | `v7/A32/A64` | +| void vst1q_u16_x4(
     uint16_t *ptr,
     uint16x8x4_t val)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt4.8H},[Xn]` | | `v7/A32/A64` | +| void vst1_u32_x4(
     uint32_t *ptr,
     uint32x2x4_t val)
| `val.val[3] -> Vt4.2S`
`val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S - Vt4.2S},[Xn]` | | `v7/A32/A64` | +| void vst1q_u32_x4(
     uint32_t *ptr,
     uint32x4x4_t val)
| `val.val[3] -> Vt4.4S`
`val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S - Vt4.4S},[Xn]` | | `v7/A32/A64` | +| void vst1_f16_x4(
     float16_t *ptr,
     float16x4x4_t val)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt4.4H},[Xn]` | | `v7/A32/A64` | +| void vst1q_f16_x4(
     float16_t *ptr,
     float16x8x4_t val)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt4.8H},[Xn]` | | `v7/A32/A64` | +| void vst1_f32_x4(
     float32_t *ptr,
     float32x2x4_t val)
| `val.val[3] -> Vt4.2S`
`val.val[2] -> Vt3.2S`
`val.val[1] -> Vt2.2S`
`val.val[0] -> Vt.2S`
`ptr -> Xn` | `ST1 {Vt.2S - Vt4.2S},[Xn]` | | `v7/A32/A64` | +| void vst1q_f32_x4(
     float32_t *ptr,
     float32x4x4_t val)
| `val.val[3] -> Vt4.4S`
`val.val[2] -> Vt3.4S`
`val.val[1] -> Vt2.4S`
`val.val[0] -> Vt.4S`
`ptr -> Xn` | `ST1 {Vt.4S - Vt4.4S},[Xn]` | | `v7/A32/A64` | +| void vst1_p8_x4(
     poly8_t *ptr,
     poly8x8x4_t val)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt4.8B},[Xn]` | | `v7/A32/A64` | +| void vst1q_p8_x4(
     poly8_t *ptr,
     poly8x16x4_t val)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt4.16B},[Xn]` | | `v7/A32/A64` | +| void vst1_p16_x4(
     poly16_t *ptr,
     poly16x4x4_t val)
| `val.val[3] -> Vt4.4H`
`val.val[2] -> Vt3.4H`
`val.val[1] -> Vt2.4H`
`val.val[0] -> Vt.4H`
`ptr -> Xn` | `ST1 {Vt.4H - Vt4.4H},[Xn]` | | `v7/A32/A64` | +| void vst1q_p16_x4(
     poly16_t *ptr,
     poly16x8x4_t val)
| `val.val[3] -> Vt4.8H`
`val.val[2] -> Vt3.8H`
`val.val[1] -> Vt2.8H`
`val.val[0] -> Vt.8H`
`ptr -> Xn` | `ST1 {Vt.8H - Vt4.8H},[Xn]` | | `v7/A32/A64` | +| void vst1_s64_x4(
     int64_t *ptr,
     int64x1x4_t val)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt4.1D},[Xn]` | | `v7/A32/A64` | +| void vst1_u64_x4(
     uint64_t *ptr,
     uint64x1x4_t val)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt4.1D},[Xn]` | | `v7/A32/A64` | +| void vst1_p64_x4(
     poly64_t *ptr,
     poly64x1x4_t val)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt4.1D},[Xn]` | | `A32/A64` | +| void vst1q_s64_x4(
     int64_t *ptr,
     int64x2x4_t val)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt4.2D},[Xn]` | | `v7/A32/A64` | +| void vst1q_u64_x4(
     uint64_t *ptr,
     uint64x2x4_t val)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt4.2D},[Xn]` | | `v7/A32/A64` | +| void vst1q_p64_x4(
     poly64_t *ptr,
     poly64x2x4_t val)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt4.2D},[Xn]` | | `A32/A64` | +| void vst1_f64_x4(
     float64_t *ptr,
     float64x1x4_t val)
| `val.val[3] -> Vt4.1D`
`val.val[2] -> Vt3.1D`
`val.val[1] -> Vt2.1D`
`val.val[0] -> Vt.1D`
`ptr -> Xn` | `ST1 {Vt.1D - Vt4.1D},[Xn]` | | `A64` | +| void vst1q_f64_x4(
     float64_t *ptr,
     float64x2x4_t val)
| `val.val[3] -> Vt4.2D`
`val.val[2] -> Vt3.2D`
`val.val[1] -> Vt2.2D`
`val.val[0] -> Vt.2D`
`ptr -> Xn` | `ST1 {Vt.2D - Vt4.2D},[Xn]` | | `A64` | +| void vst1_mf8_x4(
     mfloat8_t *ptr,
     mfloat8x8x4_t val)
| `val.val[3] -> Vt4.8B`
`val.val[2] -> Vt3.8B`
`val.val[1] -> Vt2.8B`
`val.val[0] -> Vt.8B`
`ptr -> Xn` | `ST1 {Vt.8B - Vt4.8B},[Xn]` | | `A64` | +| void vst1q_mf8_x4(
     mfloat8_t *ptr,
     mfloat8x16x4_t val)
| `val.val[3] -> Vt4.16B`
`val.val[2] -> Vt3.16B`
`val.val[1] -> Vt2.16B`
`val.val[0] -> Vt.16B`
`ptr -> Xn` | `ST1 {Vt.16B - Vt4.16B},[Xn]` | | `A64` | #### Store @@ -4426,85 +4587,109 @@ The intrinsics in this section are guarded by the macro ``__ARM_NEON``. #### Table lookup -| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | -|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|--------------------|---------------------------| -| int8x8_t vtbl1_s8(
     int8x8_t a,
     int8x8_t idx)
| `Zeros(64):a -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| uint8x8_t vtbl1_u8(
     uint8x8_t a,
     uint8x8_t idx)
| `Zeros(64):a -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| poly8x8_t vtbl1_p8(
     poly8x8_t a,
     uint8x8_t idx)
| `Zeros(64):a -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| int8x8_t vtbx1_s8(
     int8x8_t a,
     int8x8_t b,
     int8x8_t idx)
| `Zeros(64):b -> Vn.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `MOVI Vtmp.8B,#8`
`CMHS Vtmp.8B,Vm.8B,Vtmp.8B`
`TBL Vtmp1.8B,{Vn.16B},Vm.8B`
`BIF Vd.8B,Vtmp1.8B,Vtmp.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| uint8x8_t vtbx1_u8(
     uint8x8_t a,
     uint8x8_t b,
     uint8x8_t idx)
| `Zeros(64):b -> Vn.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `MOVI Vtmp.8B,#8`
`CMHS Vtmp.8B,Vm.8B,Vtmp.8B`
`TBL Vtmp1.8B,{Vn.16B},Vm.8B`
`BIF Vd.8B,Vtmp1.8B,Vtmp.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| poly8x8_t vtbx1_p8(
     poly8x8_t a,
     poly8x8_t b,
     uint8x8_t idx)
| `Zeros(64):b -> Vn.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `MOVI Vtmp.8B,#8`
`CMHS Vtmp.8B,Vm.8B,Vtmp.8B`
`TBL Vtmp1.8B,{Vn.16B},Vm.8B`
`BIF Vd.8B,Vtmp1.8B, Vtmp.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| int8x8_t vtbl2_s8(
     int8x8x2_t a,
     int8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| uint8x8_t vtbl2_u8(
     uint8x8x2_t a,
     uint8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| poly8x8_t vtbl2_p8(
     poly8x8x2_t a,
     uint8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| int8x8_t vtbl3_s8(
     int8x8x3_t a,
     int8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`Zeros(64):a.val[2] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| uint8x8_t vtbl3_u8(
     uint8x8x3_t a,
     uint8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`Zeros(64):a.val[2] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| poly8x8_t vtbl3_p8(
     poly8x8x3_t a,
     uint8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`Zeros(64):a.val[2] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| int8x8_t vtbl4_s8(
     int8x8x4_t a,
     int8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`a.val[3]:a.val[2] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| uint8x8_t vtbl4_u8(
     uint8x8x4_t a,
     uint8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`a.val[3]:a.val[2] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| poly8x8_t vtbl4_p8(
     poly8x8x4_t a,
     uint8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`a.val[3]:a.val[2] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| int8x8_t vqtbl1_s8(
     int8x16_t t,
     uint8x8_t idx)
| `t -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| int8x16_t vqtbl1q_s8(
     int8x16_t t,
     uint8x16_t idx)
| `t -> Vn.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| uint8x8_t vqtbl1_u8(
     uint8x16_t t,
     uint8x8_t idx)
| `t -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| uint8x16_t vqtbl1q_u8(
     uint8x16_t t,
     uint8x16_t idx)
| `t -> Vn.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| poly8x8_t vqtbl1_p8(
     poly8x16_t t,
     uint8x8_t idx)
| `t -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| poly8x16_t vqtbl1q_p8(
     poly8x16_t t,
     uint8x16_t idx)
| `t -> Vn.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| int8x8_t vqtbl2_s8(
     int8x16x2_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| int8x16_t vqtbl2q_s8(
     int8x16x2_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| uint8x8_t vqtbl2_u8(
     uint8x16x2_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| uint8x16_t vqtbl2q_u8(
     uint8x16x2_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| poly8x8_t vqtbl2_p8(
     poly8x16x2_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| poly8x16_t vqtbl2q_p8(
     poly8x16x2_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| int8x8_t vqtbl3_s8(
     int8x16x3_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| int8x16_t vqtbl3q_s8(
     int8x16x3_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| uint8x8_t vqtbl3_u8(
     uint8x16x3_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| uint8x16_t vqtbl3q_u8(
     uint8x16x3_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| poly8x8_t vqtbl3_p8(
     poly8x16x3_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| poly8x16_t vqtbl3q_p8(
     poly8x16x3_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| int8x8_t vqtbl4_s8(
     int8x16x4_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| int8x16_t vqtbl4q_s8(
     int8x16x4_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| uint8x8_t vqtbl4_u8(
     uint8x16x4_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| uint8x16_t vqtbl4q_u8(
     uint8x16x4_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| poly8x8_t vqtbl4_p8(
     poly8x16x4_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| poly8x16_t vqtbl4q_p8(
     poly8x16x4_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|--------------------|---------------------------| +| int8x8_t vtbl1_s8(
     int8x8_t a,
     int8x8_t idx)
| `Zeros(64):a -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| uint8x8_t vtbl1_u8(
     uint8x8_t a,
     uint8x8_t idx)
| `Zeros(64):a -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| poly8x8_t vtbl1_p8(
     poly8x8_t a,
     uint8x8_t idx)
| `Zeros(64):a -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| mfloat8x8_t vtbl1_mf8(
     mfloat8x8_t a,
     uint8x8_t idx)
| `Zeros(64):a -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x8_t vtbx1_s8(
     int8x8_t a,
     int8x8_t b,
     int8x8_t idx)
| `Zeros(64):b -> Vn.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `MOVI Vtmp.8B,#8`
`CMHS Vtmp.8B,Vm.8B,Vtmp.8B`
`TBL Vtmp1.8B,{Vn.16B},Vm.8B`
`BIF Vd.8B,Vtmp1.8B,Vtmp.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| uint8x8_t vtbx1_u8(
     uint8x8_t a,
     uint8x8_t b,
     uint8x8_t idx)
| `Zeros(64):b -> Vn.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `MOVI Vtmp.8B,#8`
`CMHS Vtmp.8B,Vm.8B,Vtmp.8B`
`TBL Vtmp1.8B,{Vn.16B},Vm.8B`
`BIF Vd.8B,Vtmp1.8B,Vtmp.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| poly8x8_t vtbx1_p8(
     poly8x8_t a,
     poly8x8_t b,
     uint8x8_t idx)
| `Zeros(64):b -> Vn.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `MOVI Vtmp.8B,#8`
`CMHS Vtmp.8B,Vm.8B,Vtmp.8B`
`TBL Vtmp1.8B,{Vn.16B},Vm.8B`
`BIF Vd.8B,Vtmp1.8B,Vtmp.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| mfloat8x8_t vtbx1_mf8(<br>
     mfloat8x8_t a,
     mfloat8x8_t b,
     uint8x8_t idx)
| `Zeros(64):b -> Vn.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `MOVI Vtmp.8B,#8`
`CMHS Vtmp.8B,Vm.8B,Vtmp.8B`
`TBL Vtmp1.8B,{Vn.16B},Vm.8B`
`BIF Vd.8B,Vtmp1.8B,Vtmp.8B` | `Vd.8B -> result` | `A64` | +| int8x8_t vtbl2_s8(<br>
     int8x8x2_t a,
     int8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| uint8x8_t vtbl2_u8(
     uint8x8x2_t a,
     uint8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| poly8x8_t vtbl2_p8(
     poly8x8x2_t a,
     uint8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| mfloat8x8_t vtbl2_mf8(
     mfloat8x8x2_t a,
     uint8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x8_t vtbl3_s8(
     int8x8x3_t a,
     int8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`Zeros(64):a.val[2] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| uint8x8_t vtbl3_u8(
     uint8x8x3_t a,
     uint8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`Zeros(64):a.val[2] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| poly8x8_t vtbl3_p8(
     poly8x8x3_t a,
     uint8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`Zeros(64):a.val[2] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| mfloat8x8_t vtbl3_mf8(
     mfloat8x8x3_t a,
     uint8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`Zeros(64):a.val[2] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x8_t vtbl4_s8(
     int8x8x4_t a,
     int8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`a.val[3]:a.val[2] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| uint8x8_t vtbl4_u8(
     uint8x8x4_t a,
     uint8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`a.val[3]:a.val[2] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| poly8x8_t vtbl4_p8(
     poly8x8x4_t a,
     uint8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`a.val[3]:a.val[2] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| mfloat8x8_t vtbl4_mf8(
     mfloat8x8x4_t a,
     uint8x8_t idx)
| `a.val[1]:a.val[0] -> Vn.16B`
`a.val[3]:a.val[2] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x8_t vqtbl1_s8(
     int8x16_t t,
     uint8x8_t idx)
| `t -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x16_t vqtbl1q_s8(
     int8x16_t t,
     uint8x16_t idx)
| `t -> Vn.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| uint8x8_t vqtbl1_u8(
     uint8x16_t t,
     uint8x8_t idx)
| `t -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| uint8x16_t vqtbl1q_u8(
     uint8x16_t t,
     uint8x16_t idx)
| `t -> Vn.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| poly8x8_t vqtbl1_p8(
     poly8x16_t t,
     uint8x8_t idx)
| `t -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| poly8x16_t vqtbl1q_p8(
     poly8x16_t t,
     uint8x16_t idx)
| `t -> Vn.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| mfloat8x8_t vqtbl1_mf8(
     mfloat8x16_t t,
     uint8x8_t idx)
| `t -> Vn.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vqtbl1q_mf8(
     mfloat8x16_t t,
     uint8x16_t idx)
| `t -> Vn.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| int8x8_t vqtbl2_s8(
     int8x16x2_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x16_t vqtbl2q_s8(
     int8x16x2_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| uint8x8_t vqtbl2_u8(
     uint8x16x2_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| uint8x16_t vqtbl2q_u8(
     uint8x16x2_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| poly8x8_t vqtbl2_p8(
     poly8x16x2_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| poly8x16_t vqtbl2q_p8(
     poly8x16x2_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| mfloat8x8_t vqtbl2_mf8(
     mfloat8x16x2_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vqtbl2q_mf8(
     mfloat8x16x2_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| int8x8_t vqtbl3_s8(
     int8x16x3_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x16_t vqtbl3q_s8(
     int8x16x3_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| uint8x8_t vqtbl3_u8(
     uint8x16x3_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| uint8x16_t vqtbl3q_u8(
     uint8x16x3_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| poly8x8_t vqtbl3_p8(
     poly8x16x3_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| poly8x16_t vqtbl3q_p8(
     poly8x16x3_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| mfloat8x8_t vqtbl3_mf8(
     mfloat8x16x3_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vqtbl3q_mf8(
     mfloat8x16x3_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| int8x8_t vqtbl4_s8(
     int8x16x4_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x16_t vqtbl4q_s8(
     int8x16x4_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| uint8x8_t vqtbl4_u8(
     uint8x16x4_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| uint8x16_t vqtbl4q_u8(
     uint8x16x4_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| poly8x8_t vqtbl4_p8(
     poly8x16x4_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| poly8x16_t vqtbl4q_p8(
     poly8x16x4_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| mfloat8x8_t vqtbl4_mf8(
     mfloat8x16x4_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.8B` | `TBL Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vqtbl4q_mf8(
     mfloat8x16x4_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.16B` | `TBL Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B` | `Vd.16B -> result` | `A64` | #### Extended table lookup -| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | -|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|--------------------|---------------------------| -| int8x8_t vtbx2_s8(
     int8x8_t a,
     int8x8x2_t b,
     int8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `TBX Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| uint8x8_t vtbx2_u8(
     uint8x8_t a,
     uint8x8x2_t b,
     uint8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `TBX Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| poly8x8_t vtbx2_p8(
     poly8x8_t a,
     poly8x8x2_t b,
     uint8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `TBX Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| int8x8_t vtbx3_s8(
     int8x8_t a,
     int8x8x3_t b,
     int8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`Zeros(64):b.val[2] -> Vn+1.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `MOVI Vtmp.8B,#24`
`CMHS Vtmp.8B,Vm.8B,Vtmp.8B`
`TBL Vtmp1.8B,{Vn.16B,Vn+1.16B},Vm.8`
`BIF Vd.8B,Vtmp1.8B,Vtmp.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| uint8x8_t vtbx3_u8(
     uint8x8_t a,
     uint8x8x3_t b,
     uint8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`Zeros(64):b.val[2] -> Vn+1.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `MOVI Vtmp.8B,#24`
`CMHS Vtmp.8B,Vm.8B,Vtmp.8B`
`TBL Vtmp1.8B,{Vn.16B,Vn+1.16B},Vm.8B`
`BIF Vd.8B,Vtmp1.8B,Vtmp.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| poly8x8_t vtbx3_p8(
     poly8x8_t a,
     poly8x8x3_t b,
     uint8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`Zeros(64):b.val[2] -> Vn+1.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `MOVI Vtmp.8B,#24`
`CMHS Vtmp.8B,Vm.8B,Vtmp.8B`
`TBL Vtmp1.8B,{Vn.16B,Vn+1.16B},Vm.8B`
`BIF Vd.8B,Vtmp1.8B,Vtmp.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| int8x8_t vtbx4_s8(
     int8x8_t a,
     int8x8x4_t b,
     int8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`b.val[3]:b.val[2] -> Vn+1.16B`
`a -> Vd.8B`
`c-> Vm.8B` | `TBX Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| uint8x8_t vtbx4_u8(
     uint8x8_t a,
     uint8x8x4_t b,
     uint8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`b.val[3]:b.val[2] -> Vn+1.16B`
`a -> Vd.8B`
`c-> Vm.8B` | `TBX Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| poly8x8_t vtbx4_p8(
     poly8x8_t a,
     poly8x8x4_t b,
     uint8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`b.val[3]:b.val[2] -> Vn+1.16B`
`a -> Vd.8B`
`c-> Vm.8B` | `TBX Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | -| int8x8_t vqtbx1_s8(
     int8x8_t a,
     int8x16_t t,
     uint8x8_t idx)
| `a -> Vd.8B`
`t -> Vn.16B`
`idx -> Vm.8B` | `TBX Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| int8x16_t vqtbx1q_s8(
     int8x16_t a,
     int8x16_t t,
     uint8x16_t idx)
| `a -> Vd.16B`
`t -> Vn.16B`
`idx -> Vm.16B` | `TBX Vd.16B,{Vn.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| uint8x8_t vqtbx1_u8(
     uint8x8_t a,
     uint8x16_t t,
     uint8x8_t idx)
| `a -> Vd.8B`
`t -> Vn.16B`
`idx -> Vm.8B` | `TBX Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| uint8x16_t vqtbx1q_u8(
     uint8x16_t a,
     uint8x16_t t,
     uint8x16_t idx)
| `a -> Vd.16B`
`t -> Vn.16B`
`idx -> Vm.16B` | `TBX Vd.16B,{Vn.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| poly8x8_t vqtbx1_p8(
     poly8x8_t a,
     poly8x16_t t,
     uint8x8_t idx)
| `a -> Vd.8B`
`t -> Vn.16B`
`idx -> Vm.8B` | `TBX Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| poly8x16_t vqtbx1q_p8(
     poly8x16_t a,
     poly8x16_t t,
     uint8x16_t idx)
| `a -> Vd.16B`
`t -> Vn.16B`
`idx -> Vm.16B` | `TBX Vd.16B,{Vn.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| int8x8_t vqtbx2_s8(
     int8x8_t a,
     int8x16x2_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| int8x16_t vqtbx2q_s8(
     int8x16_t a,
     int8x16x2_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| uint8x8_t vqtbx2_u8(
     uint8x8_t a,
     uint8x16x2_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| uint8x16_t vqtbx2q_u8(
     uint8x16_t a,
     uint8x16x2_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| poly8x8_t vqtbx2_p8(
     poly8x8_t a,
     poly8x16x2_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| poly8x16_t vqtbx2q_p8(
     poly8x16_t a,
     poly8x16x2_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| int8x8_t vqtbx3_s8(
     int8x8_t a,
     int8x16x3_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| int8x16_t vqtbx3q_s8(
     int8x16_t a,
     int8x16x3_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| uint8x8_t vqtbx3_u8(
     uint8x8_t a,
     uint8x16x3_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| uint8x16_t vqtbx3q_u8(
     uint8x16_t a,
     uint8x16x3_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| poly8x8_t vqtbx3_p8(
     poly8x8_t a,
     poly8x16x3_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| poly8x16_t vqtbx3q_p8(
     poly8x16_t a,
     poly8x16x3_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| int8x8_t vqtbx4_s8(
     int8x8_t a,
     int8x16x4_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| int8x16_t vqtbx4q_s8(
     int8x16_t a,
     int8x16x4_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| uint8x8_t vqtbx4_u8(
     uint8x8_t a,
     uint8x16x4_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| uint8x16_t vqtbx4q_u8(
     uint8x16_t a,
     uint8x16x4_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B` | `Vd.16B -> result` | `A64` | -| poly8x8_t vqtbx4_p8(
     poly8x8_t a,
     poly8x16x4_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B` | `Vd.8B -> result` | `A64` | -| poly8x16_t vqtbx4q_p8(
     poly8x16_t a,
     poly8x16x4_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|--------------------|---------------------------| +| int8x8_t vtbx2_s8(
     int8x8_t a,
     int8x8x2_t b,
     int8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `TBX Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| uint8x8_t vtbx2_u8(
     uint8x8_t a,
     uint8x8x2_t b,
     uint8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `TBX Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| poly8x8_t vtbx2_p8(
     poly8x8_t a,
     poly8x8x2_t b,
     uint8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `TBX Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| mfloat8x8_t vtbx2_mf8(
     mfloat8x8_t a,
     mfloat8x8x2_t b,
     uint8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `TBX Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x8_t vtbx3_s8(
     int8x8_t a,
     int8x8x3_t b,
     int8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`Zeros(64):b.val[2] -> Vn+1.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `MOVI Vtmp.8B,#24`
`CMHS Vtmp.8B,Vm.8B,Vtmp.8B`
`TBL Vtmp1.8B,{Vn.16B,Vn+1.16B},Vm.8B`<br>
`BIF Vd.8B,Vtmp1.8B,Vtmp.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| uint8x8_t vtbx3_u8(
     uint8x8_t a,
     uint8x8x3_t b,
     uint8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`Zeros(64):b.val[2] -> Vn+1.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `MOVI Vtmp.8B,#24`
`CMHS Vtmp.8B,Vm.8B,Vtmp.8B`
`TBL Vtmp1.8B,{Vn.16B,Vn+1.16B},Vm.8B`
`BIF Vd.8B,Vtmp1.8B,Vtmp.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| poly8x8_t vtbx3_p8(
     poly8x8_t a,
     poly8x8x3_t b,
     uint8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`Zeros(64):b.val[2] -> Vn+1.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `MOVI Vtmp.8B,#24`
`CMHS Vtmp.8B,Vm.8B,Vtmp.8B`
`TBL Vtmp1.8B,{Vn.16B,Vn+1.16B},Vm.8B`
`BIF Vd.8B,Vtmp1.8B,Vtmp.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| mfloat8x8_t vtbx3_mf8(
     mfloat8x8_t a,
     mfloat8x8x3_t b,
     uint8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`Zeros(64):b.val[2] -> Vn+1.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `MOVI Vtmp.8B,#24`
`CMHS Vtmp.8B,Vm.8B,Vtmp.8B`
`TBL Vtmp1.8B,{Vn.16B,Vn+1.16B},Vm.8B`
`BIF Vd.8B,Vtmp1.8B,Vtmp.8B` | `Vd.8B -> result` | `A64` | +| int8x8_t vtbx4_s8(
     int8x8_t a,
     int8x8x4_t b,
     int8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`b.val[3]:b.val[2] -> Vn+1.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `TBX Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| uint8x8_t vtbx4_u8(<br>
     uint8x8_t a,
     uint8x8x4_t b,
     uint8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`b.val[3]:b.val[2] -> Vn+1.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `TBX Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| poly8x8_t vtbx4_p8(<br>
     poly8x8_t a,
     poly8x8x4_t b,
     uint8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`b.val[3]:b.val[2] -> Vn+1.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `TBX Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `v7/A32/A64` | +| mfloat8x8_t vtbx4_mf8(<br>
     mfloat8x8_t a,
     mfloat8x8x4_t b,
     uint8x8_t idx)
| `b.val[1]:b.val[0] -> Vn.16B`
`b.val[3]:b.val[2] -> Vn+1.16B`
`a -> Vd.8B`
`idx -> Vm.8B` | `TBX Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x8_t vqtbx1_s8(<br>
     int8x8_t a,
     int8x16_t t,
     uint8x8_t idx)
| `a -> Vd.8B`
`t -> Vn.16B`
`idx -> Vm.8B` | `TBX Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x16_t vqtbx1q_s8(
     int8x16_t a,
     int8x16_t t,
     uint8x16_t idx)
| `a -> Vd.16B`
`t -> Vn.16B`
`idx -> Vm.16B` | `TBX Vd.16B,{Vn.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| uint8x8_t vqtbx1_u8(
     uint8x8_t a,
     uint8x16_t t,
     uint8x8_t idx)
| `a -> Vd.8B`
`t -> Vn.16B`
`idx -> Vm.8B` | `TBX Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| uint8x16_t vqtbx1q_u8(
     uint8x16_t a,
     uint8x16_t t,
     uint8x16_t idx)
| `a -> Vd.16B`
`t -> Vn.16B`
`idx -> Vm.16B` | `TBX Vd.16B,{Vn.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| poly8x8_t vqtbx1_p8(
     poly8x8_t a,
     poly8x16_t t,
     uint8x8_t idx)
| `a -> Vd.8B`
`t -> Vn.16B`
`idx -> Vm.8B` | `TBX Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| poly8x16_t vqtbx1q_p8(
     poly8x16_t a,
     poly8x16_t t,
     uint8x16_t idx)
| `a -> Vd.16B`
`t -> Vn.16B`
`idx -> Vm.16B` | `TBX Vd.16B,{Vn.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| mfloat8x8_t vqtbx1_mf8(
     mfloat8x8_t a,
     mfloat8x16_t t,
     uint8x8_t idx)
| `a -> Vd.8B`
`t -> Vn.16B`
`idx -> Vm.8B` | `TBX Vd.8B,{Vn.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vqtbx1q_mf8(
     mfloat8x16_t a,
     mfloat8x16_t t,
     uint8x16_t idx)
| `a -> Vd.16B`
`t -> Vn.16B`
`idx -> Vm.16B` | `TBX Vd.16B,{Vn.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| int8x8_t vqtbx2_s8(
     int8x8_t a,
     int8x16x2_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x16_t vqtbx2q_s8(
     int8x16_t a,
     int8x16x2_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| uint8x8_t vqtbx2_u8(
     uint8x8_t a,
     uint8x16x2_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| uint8x16_t vqtbx2q_u8(
     uint8x16_t a,
     uint8x16x2_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| poly8x8_t vqtbx2_p8(
     poly8x8_t a,
     poly8x16x2_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| poly8x16_t vqtbx2q_p8(
     poly8x16_t a,
     poly8x16x2_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| mfloat8x8_t vqtbx2_mf8(
     mfloat8x8_t a,
     mfloat8x16x2_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vqtbx2q_mf8(
     mfloat8x16_t a,
     mfloat8x16x2_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| int8x8_t vqtbx3_s8(
     int8x8_t a,
     int8x16x3_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x16_t vqtbx3q_s8(
     int8x16_t a,
     int8x16x3_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| uint8x8_t vqtbx3_u8(
     uint8x8_t a,
     uint8x16x3_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| uint8x16_t vqtbx3q_u8(
     uint8x16_t a,
     uint8x16x3_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| poly8x8_t vqtbx3_p8(
     poly8x8_t a,
     poly8x16x3_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| poly8x16_t vqtbx3q_p8(
     poly8x16_t a,
     poly8x16x3_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| mfloat8x8_t vqtbx3_mf8(
     mfloat8x8_t a,
     mfloat8x16x3_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vqtbx3q_mf8(
     mfloat8x16_t a,
     mfloat8x16x3_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| int8x8_t vqtbx4_s8(
     int8x8_t a,
     int8x16x4_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| int8x16_t vqtbx4q_s8(
     int8x16_t a,
     int8x16x4_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| uint8x8_t vqtbx4_u8(
     uint8x8_t a,
     uint8x16x4_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| uint8x16_t vqtbx4q_u8(
     uint8x16_t a,
     uint8x16x4_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| poly8x8_t vqtbx4_p8(
     poly8x8_t a,
     poly8x16x4_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| poly8x16_t vqtbx4q_p8(
     poly8x16_t a,
     poly8x16x4_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B` | `Vd.16B -> result` | `A64` | +| mfloat8x8_t vqtbx4_mf8(
     mfloat8x8_t a,
     mfloat8x16x4_t t,
     uint8x8_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.8B`
`a -> Vd.8B` | `TBX Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vqtbx4q_mf8(
     mfloat8x16_t a,
     mfloat8x16x4_t t,
     uint8x16_t idx)
| `t.val[0] -> Vn.16B`
`t.val[1] -> Vn+1.16B`
`t.val[2] -> Vn+2.16B`
`t.val[3] -> Vn+3.16B`
`idx -> Vm.16B`
`a -> Vd.16B` | `TBX Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B` | `Vd.16B -> result` | `A64` | #### Lookup table read with 2-bit indices @@ -5778,62 +5963,66 @@ The intrinsics in this section are guarded by the macro ``__ARM_NEON``. #### Reinterpret casts -| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | -|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------|-----------------------|--------------------|---------------------------| -| bfloat16x4_t vreinterpret_bf16_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `A32/A64` | -| bfloat16x4_t vreinterpret_bf16_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `A32/A64` | -| bfloat16x4_t vreinterpret_bf16_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `A32/A64` | -| bfloat16x4_t vreinterpret_bf16_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `A32/A64` | -| bfloat16x4_t vreinterpret_bf16_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `A32/A64` | -| bfloat16x4_t vreinterpret_bf16_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `A32/A64` | -| bfloat16x4_t vreinterpret_bf16_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `A32/A64` | -| bfloat16x4_t vreinterpret_bf16_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `A32/A64` | -| bfloat16x4_t vreinterpret_bf16_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `A32/A64` | -| bfloat16x4_t vreinterpret_bf16_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A32/A64` | -| bfloat16x4_t vreinterpret_bf16_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A32/A64` | -| bfloat16x8_t vreinterpretq_bf16_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `A32/A64` | -| bfloat16x8_t vreinterpretq_bf16_s16(int16x8_t a) | `a -> Vd.8H` | 
`NOP` | `Vd.8H -> result` | `A32/A64` | -| bfloat16x8_t vreinterpretq_bf16_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `A32/A64` | -| bfloat16x8_t vreinterpretq_bf16_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `A32/A64` | -| bfloat16x8_t vreinterpretq_bf16_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `A32/A64` | -| bfloat16x8_t vreinterpretq_bf16_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `A32/A64` | -| bfloat16x8_t vreinterpretq_bf16_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `A32/A64` | -| bfloat16x8_t vreinterpretq_bf16_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `A32/A64` | -| bfloat16x8_t vreinterpretq_bf16_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `A32/A64` | -| bfloat16x8_t vreinterpretq_bf16_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A32/A64` | -| bfloat16x8_t vreinterpretq_bf16_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A32/A64` | -| bfloat16x4_t vreinterpret_bf16_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A64` | -| bfloat16x8_t vreinterpretq_bf16_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A64` | -| bfloat16x4_t vreinterpret_bf16_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A32/A64` | -| bfloat16x8_t vreinterpretq_bf16_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A32/A64` | -| bfloat16x8_t vreinterpretq_bf16_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.8H -> result` | `A32/A64` | -| int8x8_t vreinterpret_s8_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `A32/A64` | -| int16x4_t vreinterpret_s16_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `A32/A64` | -| int32x2_t vreinterpret_s32_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `A32/A64` | -| float32x2_t vreinterpret_f32_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | 
`A32/A64` | -| uint8x8_t vreinterpret_u8_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `A32/A64` | -| uint16x4_t vreinterpret_u16_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `A32/A64` | -| uint32x2_t vreinterpret_u32_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `A32/A64` | -| poly8x8_t vreinterpret_p8_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `A32/A64` | -| poly16x4_t vreinterpret_p16_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `A32/A64` | -| uint64x1_t vreinterpret_u64_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A32/A64` | -| int64x1_t vreinterpret_s64_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A32/A64` | -| float64x1_t vreinterpret_f64_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A64` | -| poly64x1_t vreinterpret_p64_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A32/A64` | -| int8x16_t vreinterpretq_s8_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `A32/A64` | -| int16x8_t vreinterpretq_s16_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `A32/A64` | -| int32x4_t vreinterpretq_s32_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `A32/A64` | -| float32x4_t vreinterpretq_f32_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `A32/A64` | -| uint8x16_t vreinterpretq_u8_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `A32/A64` | -| uint16x8_t vreinterpretq_u16_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `A32/A64` | -| uint32x4_t vreinterpretq_u32_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `A32/A64` | -| poly8x16_t vreinterpretq_p8_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `A32/A64` | -| poly16x8_t vreinterpretq_p16_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `A32/A64` | -| uint64x2_t 
vreinterpretq_u64_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A32/A64` | -| int64x2_t vreinterpretq_s64_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A32/A64` | -| float64x2_t vreinterpretq_f64_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A64` | -| poly64x2_t vreinterpretq_p64_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A32/A64` | -| poly128_t vreinterpretq_p128_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.1Q -> result` | `A32/A64` | +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------|-----------------------|--------------------|---------------------------| +| bfloat16x4_t vreinterpret_bf16_s8(int8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| bfloat16x4_t vreinterpret_bf16_s16(int16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| bfloat16x4_t vreinterpret_bf16_s32(int32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| bfloat16x4_t vreinterpret_bf16_f32(float32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| bfloat16x4_t vreinterpret_bf16_u8(uint8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| bfloat16x4_t vreinterpret_bf16_u16(uint16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| bfloat16x4_t vreinterpret_bf16_u32(uint32x2_t a) | `a -> Vd.2S` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| bfloat16x4_t vreinterpret_bf16_p8(poly8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| bfloat16x4_t vreinterpret_bf16_p16(poly16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| bfloat16x4_t vreinterpret_bf16_mf8(mfloat8x8_t a) | `a -> Vd.8B` | `NOP` | `Vd.4H -> result` | `A64` | +| 
bfloat16x4_t vreinterpret_bf16_u64(uint64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| bfloat16x4_t vreinterpret_bf16_s64(int64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| bfloat16x8_t vreinterpretq_bf16_s8(int8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| bfloat16x8_t vreinterpretq_bf16_s16(int16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| bfloat16x8_t vreinterpretq_bf16_s32(int32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| bfloat16x8_t vreinterpretq_bf16_f32(float32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| bfloat16x8_t vreinterpretq_bf16_u8(uint8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| bfloat16x8_t vreinterpretq_bf16_u16(uint16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| bfloat16x8_t vreinterpretq_bf16_u32(uint32x4_t a) | `a -> Vd.4S` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| bfloat16x8_t vreinterpretq_bf16_p8(poly8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| bfloat16x8_t vreinterpretq_bf16_p16(poly16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| bfloat16x8_t vreinterpretq_bf16_mf8(mfloat8x16_t a) | `a -> Vd.16B` | `NOP` | `Vd.8H -> result` | `A64` | +| bfloat16x8_t vreinterpretq_bf16_u64(uint64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| bfloat16x8_t vreinterpretq_bf16_s64(int64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| bfloat16x4_t vreinterpret_bf16_f64(float64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A64` | +| bfloat16x8_t vreinterpretq_bf16_f64(float64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A64` | +| bfloat16x4_t vreinterpret_bf16_p64(poly64x1_t a) | `a -> Vd.1D` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| bfloat16x8_t vreinterpretq_bf16_p64(poly64x2_t a) | `a -> Vd.2D` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| bfloat16x8_t 
vreinterpretq_bf16_p128(poly128_t a) | `a -> Vd.1Q` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| int8x8_t vreinterpret_s8_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `A32/A64` | +| int16x4_t vreinterpret_s16_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| int32x2_t vreinterpret_s32_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `A32/A64` | +| float32x2_t vreinterpret_f32_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `A32/A64` | +| uint8x8_t vreinterpret_u8_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `A32/A64` | +| uint16x4_t vreinterpret_u16_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| uint32x2_t vreinterpret_u32_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.2S -> result` | `A32/A64` | +| poly8x8_t vreinterpret_p8_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `A32/A64` | +| poly16x4_t vreinterpret_p16_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.4H -> result` | `A32/A64` | +| mfloat8x8_t vreinterpret_mf8_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.8B -> result` | `A64` | +| uint64x1_t vreinterpret_u64_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A32/A64` | +| int64x1_t vreinterpret_s64_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A32/A64` | +| float64x1_t vreinterpret_f64_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A64` | +| poly64x1_t vreinterpret_p64_bf16(bfloat16x4_t a) | `a -> Vd.4H` | `NOP` | `Vd.1D -> result` | `A32/A64` | +| int8x16_t vreinterpretq_s8_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `A32/A64` | +| int16x8_t vreinterpretq_s16_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| int32x4_t vreinterpretq_s32_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `A32/A64` | +| float32x4_t vreinterpretq_f32_bf16(bfloat16x8_t a) | `a -> 
Vd.8H` | `NOP` | `Vd.4S -> result` | `A32/A64` | +| uint8x16_t vreinterpretq_u8_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `A32/A64` | +| uint16x8_t vreinterpretq_u16_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| uint32x4_t vreinterpretq_u32_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.4S -> result` | `A32/A64` | +| poly8x16_t vreinterpretq_p8_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `A32/A64` | +| poly16x8_t vreinterpretq_p16_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.8H -> result` | `A32/A64` | +| mfloat8x16_t vreinterpretq_mf8_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.16B -> result` | `A64` | +| uint64x2_t vreinterpretq_u64_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| int64x2_t vreinterpretq_s64_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| float64x2_t vreinterpretq_f64_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A64` | +| poly64x2_t vreinterpretq_p64_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.2D -> result` | `A32/A64` | +| poly128_t vreinterpretq_p128_bf16(bfloat16x8_t a) | `a -> Vd.8H` | `NOP` | `Vd.1Q -> result` | `A32/A64` | #### Conversions @@ -5886,3 +6075,82 @@ The intrinsics in this section are guarded by the macro ``__ARM_NEON``. | float32x4_t vbfmlalbq_laneq_f32(
     float32x4_t r,
     bfloat16x8_t a,
     bfloat16x8_t b,
     const int lane)
| `r -> Vd.4S`
`a -> Vn.8H`
`b -> Vm.8H`
`0 <= lane <= 7` | `BFMLALB Vd.4S,Vn.8H,Vm.H[lane]` | `Vd.4S -> result` | `A32/A64` | | float32x4_t vbfmlaltq_lane_f32(
     float32x4_t r,
     bfloat16x8_t a,
     bfloat16x4_t b,
     const int lane)
| `r -> Vd.4S`
`a -> Vn.8H`
`b -> Vm.4H`
`0 <= lane <= 3` | `BFMLALT Vd.4S,Vn.8H,Vm.H[lane]` | `Vd.4S -> result` | `A32/A64` | | float32x4_t vbfmlaltq_laneq_f32(
     float32x4_t r,
     bfloat16x8_t a,
     bfloat16x8_t b,
     const int lane)
| `r -> Vd.4S`
`a -> Vn.8H`
`b -> Vm.8H`
`0 <= lane <= 7` | `BFMLALT Vd.4S,Vn.8H,Vm.H[lane]` | `Vd.4S -> result` | `A32/A64` | + +## Modal 8-bit floating-point intrinsics + +### Data type conversion + +#### Conversions + +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------|-------------------------------|--------------------|---------------------------| +| bfloat16x8_t vcvt1_bf16_mf8_fpm(
     mfloat8x8_t vn,
     fpm_t fpm)
| `vn -> Vn.8B` | `BF1CVTL Vd.8H,Vn.8B` | `Vd.8H -> result` | `A64` | +| bfloat16x8_t vcvt1_low_bf16_mf8_fpm(
     mfloat8x16_t vn,
     fpm_t fpm)
| `vn -> Vn.8B` | `BF1CVTL Vd.8H,Vn.8B` | `Vd.8H -> result` | `A64` | +| bfloat16x8_t vcvt2_bf16_mf8_fpm(
     mfloat8x8_t vn,
     fpm_t fpm)
| `vn -> Vn.8B` | `BF2CVTL Vd.8H,Vn.8B` | `Vd.8H -> result` | `A64` | +| bfloat16x8_t vcvt2_low_bf16_mf8_fpm(
     mfloat8x16_t vn,
     fpm_t fpm)
| `vn -> Vn.8B` | `BF2CVTL Vd.8H,Vn.8B` | `Vd.8H -> result` | `A64` | +| bfloat16x8_t vcvt1_high_bf16_mf8_fpm(
     mfloat8x16_t vn,
     fpm_t fpm)
| `vn -> Vn.16B` | `BF1CVTL2 Vd.8H,Vn.16B` | `Vd.8H -> result` | `A64` | +| bfloat16x8_t vcvt2_high_bf16_mf8_fpm(
     mfloat8x16_t vn,
     fpm_t fpm)
| `vn -> Vn.16B` | `BF2CVTL2 Vd.8H,Vn.16B` | `Vd.8H -> result` | `A64` | +| float16x8_t vcvt1_f16_mf8_fpm(
     mfloat8x8_t vn,
     fpm_t fpm)
| `vn -> Vn.8B` | `F1CVTL Vd.8H,Vn.8B` | `Vd.8H -> result` | `A64` | +| float16x8_t vcvt1_low_f16_mf8_fpm(
     mfloat8x16_t vn,
     fpm_t fpm)
| `vn -> Vn.8B` | `F1CVTL Vd.8H,Vn.8B` | `Vd.8H -> result` | `A64` | +| float16x8_t vcvt2_f16_mf8_fpm(
     mfloat8x8_t vn,
     fpm_t fpm)
| `vn -> Vn.8B` | `F2CVTL Vd.8H,Vn.8B` | `Vd.8H -> result` | `A64` | +| float16x8_t vcvt2_low_f16_mf8_fpm(
     mfloat8x16_t vn,
     fpm_t fpm)
| `vn -> Vn.8B` | `F2CVTL Vd.8H,Vn.8B` | `Vd.8H -> result` | `A64` | +| float16x8_t vcvt1_high_f16_mf8_fpm(
     mfloat8x16_t vn,
     fpm_t fpm)
| `vn -> Vn.16B` | `F1CVTL2 Vd.8H,Vn.16B` | `Vd.8H -> result` | `A64` | +| float16x8_t vcvt2_high_f16_mf8_fpm(
     mfloat8x16_t vn,
     fpm_t fpm)
| `vn -> Vn.16B` | `F2CVTL2 Vd.8H,Vn.16B` | `Vd.8H -> result` | `A64` | +| mfloat8x8_t vcvt_mf8_f32_fpm(
     float32x4_t vn,
     float32x4_t vm,
     fpm_t fpm)
| `vn -> Vn.4S`
`vm -> Vm.4S` | `FCVTN Vd.8B, Vn.4S, Vm.4S` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vcvt_high_mf8_f32_fpm(
     mfloat8x8_t vd,
     float32x4_t vn,
     float32x4_t vm,
     fpm_t fpm)
| `vn -> Vn.4S`
`vm -> Vm.4S` | `FCVTN2 Vd.16B, Vn.4S, Vm.4S` | `Vd.16B -> result` | `A64` | +| mfloat8x8_t vcvt_mf8_f16_fpm(
     float16x4_t vn,
     float16x4_t vm,
     fpm_t fpm)
| `vn -> Vn.4H`
`vm -> Vm.4H` | `FCVTN Vd.8B, Vn.4H, Vm.4H` | `Vd.8B -> result` | `A64` | +| mfloat8x16_t vcvtq_mf8_f16_fpm(
     float16x8_t vn,
     float16x8_t vm,
     fpm_t fpm)
| `vn -> Vn.8H`
`vm -> Vm.8H` | `FCVTN Vd.16B, Vn.8H, Vm.8H` | `Vd.16B -> result` | `A64` | + +### Vector arithmetic + +#### Exponent + +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------|------------------------------|-------------------|---------------------------| +| float16x4_t vscale_f16(
     float16x4_t vn,
     int16x4_t vm)
| `vn -> Vn.4H`
`vm -> Vm.4H` | `FSCALE Vd.4H, Vn.4H, Vm.4H` | `Vd.4H -> result` | `A64` | +| float16x8_t vscaleq_f16(
     float16x8_t vn,
     int16x8_t vm)
| `vn -> Vn.8H`
`vm -> Vm.8H` | `FSCALE Vd.8H, Vn.8H, Vm.8H` | `Vd.8H -> result` | `A64` | +| float32x2_t vscale_f32(
     float32x2_t vn,
     int32x2_t vm)
| `vn -> Vn.2S`
`vm -> Vm.2S` | `FSCALE Vd.2S, Vn.2S, Vm.2S` | `Vd.2S -> result` | `A64` | +| float32x4_t vscaleq_f32(
     float32x4_t vn,
     int32x4_t vm)
| `vn -> Vn.4S`
`vm -> Vm.4S` | `FSCALE Vd.4S, Vn.4S, Vm.4S` | `Vd.4S -> result` | `A64` | +| float64x2_t vscaleq_f64(
     float64x2_t vn,
     int64x2_t vm)
| `vn -> Vn.2D`
`vm -> Vm.2D` | `FSCALE Vd.2D, Vn.2D, Vm.2D` | `Vd.2D -> result` | `A64` | + +#### Dot product + +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------|-----------------------------------|--------------------|---------------------------| +| float32x2_t vdot_f32_mf8_fpm(
     float32x2_t vd,
     mfloat8x8_t vn,
     mfloat8x8_t vm,
     fpm_t fpm)
| `vd -> Vd.2S`
`vn -> Vn.8B`
`vm -> Vm.8B` | `FDOT Vd.2S, Vn.8B, Vm.8B` | `Vd.2S -> result` | `A64` | +| float32x4_t vdotq_f32_mf8_fpm(
     float32x4_t vd,
     mfloat8x16_t vn,
     mfloat8x16_t vm,
     fpm_t fpm)
| `vd -> Vd.4S`
`vn -> Vn.16B`
`vm -> Vm.16B` | `FDOT Vd.4S, Vn.16B, Vm.16B` | `Vd.4S -> result` | `A64` | +| float32x2_t vdot_lane_f32_mf8_fpm(
     float32x2_t vd,
     mfloat8x8_t vn,
     mfloat8x8_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.2S`
`vn -> Vn.8B`
`vm -> Vm.4B`
`0 <= lane <= 1` | `FDOT Vd.2S, Vn.8B, Vm.4B[lane]` | `Vd.2S -> result` | `A64` | +| float32x2_t vdot_laneq_f32_mf8_fpm(
     float32x2_t vd,
     mfloat8x8_t vn,
     mfloat8x16_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.2S`
`vn -> Vn.8B`
`vm -> Vm.4B`
`0 <= lane <= 3` | `FDOT Vd.2S, Vn.8B, Vm.4B[lane]` | `Vd.2S -> result` | `A64` | +| float32x4_t vdotq_lane_f32_mf8_fpm(
     float32x4_t vd,
     mfloat8x16_t vn,
     mfloat8x8_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.4S`
`vn -> Vn.16B`
`vm -> Vm.4B`
`0 <= lane <= 1` | `FDOT Vd.4S, Vn.16B, Vm.4B[lane]` | `Vd.4S -> result` | `A64` | +| float32x4_t vdotq_laneq_f32_mf8_fpm(
     float32x4_t vd,
     mfloat8x16_t vn,
     mfloat8x16_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.4S`
`vn -> Vn.16B`
`vm -> Vm.4B`
`0 <= lane <= 3` | `FDOT Vd.4S, Vn.16B, Vm.4B[lane]` | `Vd.4S -> result` | `A64` | +| float16x4_t vdot_f16_mf8_fpm(
     float16x4_t vd,
     mfloat8x8_t vn,
     mfloat8x8_t vm,
     fpm_t fpm)
| `vd -> Vd.4H`
`vn -> Vn.8B`
`vm -> Vm.8B` | `FDOT Vd.4H, Vn.8B, Vm.8B` | `Vd.4H -> result` | `A64` | +| float16x8_t vdotq_f16_mf8_fpm(
     float16x8_t vd,
     mfloat8x16_t vn,
     mfloat8x16_t vm,
     fpm_t fpm)
| `vd -> Vd.8H`
`vn -> Vn.16B`
`vm -> Vm.16B` | `FDOT Vd.8H, Vn.16B, Vm.16B` | `Vd.8H -> result` | `A64` | +| float16x4_t vdot_lane_f16_mf8_fpm(
     float16x4_t vd,
     mfloat8x8_t vn,
     mfloat8x8_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.4H`
`vn -> Vn.8B`
`vm -> Vm.2B`
`0 <= lane <= 3` | `FDOT Vd.4H, Vn.8B, Vm.2B[lane]` | `Vd.4H -> result` | `A64` | +| float16x4_t vdot_laneq_f16_mf8_fpm(
     float16x4_t vd,
     mfloat8x8_t vn,
     mfloat8x16_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.4H`
`vn -> Vn.8B`
`vm -> Vm.2B`
`0 <= lane <= 7` | `FDOT Vd.4H, Vn.8B, Vm.2B[lane]` | `Vd.4H -> result` | `A64` | +| float16x8_t vdotq_lane_f16_mf8_fpm(
     float16x8_t vd,
     mfloat8x16_t vn,
     mfloat8x8_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.8H`
`vn -> Vn.16B`
`vm -> Vm.2B`
`0 <= lane <= 3` | `FDOT Vd.8H, Vn.16B, Vm.2B[lane]` | `Vd.8H -> result` | `A64` | +| float16x8_t vdotq_laneq_f16_mf8_fpm(
     float16x8_t vd,
     mfloat8x16_t vn,
     mfloat8x16_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.8H`
`vn -> Vn.16B`
`vm -> Vm.2B`
`0 <= lane <= 7` | `FDOT Vd.8H, Vn.16B, Vm.2B[lane]` | `Vd.8H -> result` | `A64` | + +#### Multiply + +##### Multiply-accumulate and widen + +| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures | +|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------|--------------------------------------|-------------------|---------------------------| +| float16x8_t vmlalbq_f16_mf8_fpm(
     float16x8_t vd,
     mfloat8x16_t vn,
     mfloat8x16_t vm,
     fpm_t fpm)
| `vd -> Vd.8H`
`vn -> Vn.16B`
`vm -> Vm.16B` | `FMLALB Vd.8H, Vn.16B, Vm.16B` | `Vd.8H -> result` | `A64` | +| float16x8_t vmlaltq_f16_mf8_fpm(
     float16x8_t vd,
     mfloat8x16_t vn,
     mfloat8x16_t vm,
     fpm_t fpm)
| `vd -> Vd.8H`
`vn -> Vn.16B`
`vm -> Vm.16B` | `FMLALT Vd.8H, Vn.16B, Vm.16B` | `Vd.8H -> result` | `A64` | +| float16x8_t vmlalbq_lane_f16_mf8_fpm(
     float16x8_t vd,
     mfloat8x16_t vn,
     mfloat8x8_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.8H`
`vn -> Vn.16B`
`vm -> Vm.B`
`0 <= lane <= 7` | `FMLALB Vd.8H, Vn.16B, Vm.B[lane]` | `Vd.8H -> result` | `A64` | +| float16x8_t vmlalbq_laneq_f16_mf8_fpm(
     float16x8_t vd,
     mfloat8x16_t vn,
     mfloat8x16_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.8H`
`vn -> Vn.16B`
`vm -> Vm.B`
`0 <= lane <= 15` | `FMLALB Vd.8H, Vn.16B, Vm.B[lane]` | `Vd.8H -> result` | `A64` | +| float16x8_t vmlaltq_lane_f16_mf8_fpm(
     float16x8_t vd,
     mfloat8x16_t vn,
     mfloat8x8_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.8H`
`vn -> Vn.16B`
`vm -> Vm.B`
`0 <= lane <= 7` | `FMLALT Vd.8H, Vn.16B, Vm.B[lane]` | `Vd.8H -> result` | `A64` | +| float16x8_t vmlaltq_laneq_f16_mf8_fpm(
     float16x8_t vd,
     mfloat8x16_t vn,
     mfloat8x16_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.8H`
`vn -> Vn.16B`
`vm -> Vm.B`
`0 <= lane <= 15` | `FMLALT Vd.8H, Vn.16B, Vm.B[lane]` | `Vd.8H -> result` | `A64` | +| float32x4_t vmlallbbq_f32_mf8_fpm(
     float32x4_t vd,
     mfloat8x16_t vn,
     mfloat8x16_t vm,
     fpm_t fpm)
| `vd -> Vd.4S`
`vn -> Vn.16B`
`vm -> Vm.16B` | `FMLALLBB Vd.4S, Vn.16B, Vm.16B` | `Vd.4S -> result` | `A64` | +| float32x4_t vmlallbtq_f32_mf8_fpm(
     float32x4_t vd,
     mfloat8x16_t vn,
     mfloat8x16_t vm,
     fpm_t fpm)
| `vd -> Vd.4S`
`vn -> Vn.16B`
`vm -> Vm.16B` | `FMLALLBT Vd.4S, Vn.16B, Vm.16B` | `Vd.4S -> result` | `A64` | +| float32x4_t vmlalltbq_f32_mf8_fpm(
     float32x4_t vd,
     mfloat8x16_t vn,
     mfloat8x16_t vm,
     fpm_t fpm)
| `vd -> Vd.4S`
`vn -> Vn.16B`
`vm -> Vm.16B` | `FMLALLTB Vd.4S, Vn.16B, Vm.16B` | `Vd.4S -> result` | `A64` | +| float32x4_t vmlallttq_f32_mf8_fpm(
     float32x4_t vd,
     mfloat8x16_t vn,
     mfloat8x16_t vm,
     fpm_t fpm)
| `vd -> Vd.4S`
`vn -> Vn.16B`
`vm -> Vm.16B` | `FMLALLTT Vd.4S, Vn.16B, Vm.16B` | `Vd.4S -> result` | `A64` | +| float32x4_t vmlallbbq_lane_f32_mf8_fpm(
     float32x4_t vd,
     mfloat8x16_t vn,
     mfloat8x8_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.4S`
`vn -> Vn.16B`
`vm -> Vm.B`
`0 <= lane <= 7` | `FMLALLBB Vd.4S, Vn.16B, Vm.B[lane]` | `Vd.4S -> result` | `A64` | +| float32x4_t vmlallbbq_laneq_f32_mf8_fpm(
     float32x4_t vd,
     mfloat8x16_t vn,
     mfloat8x16_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.4S`
`vn -> Vn.16B`
`vm -> Vm.B`
`0 <= lane <= 15` | `FMLALLBB Vd.4S, Vn.16B, Vm.B[lane]` | `Vd.4S -> result` | `A64` | +| float32x4_t vmlallbtq_lane_f32_mf8_fpm(
     float32x4_t vd,
     mfloat8x16_t vn,
     mfloat8x8_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.4S`
`vn -> Vn.16B`
`vm -> Vm.B`
`0 <= lane <= 7` | `FMLALLBT Vd.4S, Vn.16B, Vm.B[lane]` | `Vd.4S -> result` | `A64` | +| float32x4_t vmlallbtq_laneq_f32_mf8_fpm(
     float32x4_t vd,
     mfloat8x16_t vn,
     mfloat8x16_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.4S`
`vn -> Vn.16B`
`vm -> Vm.B`
`0 <= lane <= 15` | `FMLALLBT Vd.4S, Vn.16B, Vm.B[lane]` | `Vd.4S -> result` | `A64` | +| float32x4_t vmlalltbq_lane_f32_mf8_fpm(
     float32x4_t vd,
     mfloat8x16_t vn,
     mfloat8x8_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.4S`
`vn -> Vn.16B`
`vm -> Vm.B`
`0 <= lane <= 7` | `FMLALLTB Vd.4S, Vn.16B, Vm.B[lane]` | `Vd.4S -> result` | `A64` | +| float32x4_t vmlalltbq_laneq_f32_mf8_fpm(
     float32x4_t vd,
     mfloat8x16_t vn,
     mfloat8x16_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.4S`
`vn -> Vn.16B`
`vm -> Vm.B`
`0 <= lane <= 15` | `FMLALLTB Vd.4S, Vn.16B, Vm.B[lane]` | `Vd.4S -> result` | `A64` | +| float32x4_t vmlallttq_lane_f32_mf8_fpm(
     float32x4_t vd,
     mfloat8x16_t vn,
     mfloat8x8_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.4S`
`vn -> Vn.16B`
`vm -> Vm.B`
`0 <= lane <= 7` | `FMLALLTT Vd.4S, Vn.16B, Vm.B[lane]` | `Vd.4S -> result` | `A64` | +| float32x4_t vmlallttq_laneq_f32_mf8_fpm(
     float32x4_t vd,
     mfloat8x16_t vn,
     mfloat8x16_t vm,
     const int lane,
     fpm_t fpm)
| `vd -> Vd.4S`
`vn -> Vn.16B`
`vm -> Vm.B`
`0 <= lane <= 15` | `FMLALLTT Vd.4S, Vn.16B, Vm.B[lane]` | `Vd.4S -> result` | `A64` | diff --git a/tools/intrinsic_db/advsimd.csv b/tools/intrinsic_db/advsimd.csv index 56e04f85..ec88903a 100644 --- a/tools/intrinsic_db/advsimd.csv +++ b/tools/intrinsic_db/advsimd.csv @@ -1844,6 +1844,8 @@ poly8x8_t vcopy_lane_p8(poly8x8_t a, __builtin_constant_p(lane1), poly8x8_t b, _ poly8x16_t vcopyq_lane_p8(poly8x16_t a, __builtin_constant_p(lane1), poly8x8_t b, __builtin_constant_p(lane2)) a -> Vd.16B;0 <= lane1 <= 15;b -> Vn.8B;0 <= lane2 <= 7 INS Vd.B[lane1],Vn.B[lane2] Vd.16B -> result A64 poly16x4_t vcopy_lane_p16(poly16x4_t a, __builtin_constant_p(lane1), poly16x4_t b, __builtin_constant_p(lane2)) a -> Vd.4H;0 <= lane1 <= 3;b -> Vn.4H;0 <= lane2 <= 3 INS Vd.H[lane1],Vn.H[lane2] Vd.4H -> result A64 poly16x8_t vcopyq_lane_p16(poly16x8_t a, __builtin_constant_p(lane1), poly16x4_t b, __builtin_constant_p(lane2)) a -> Vd.8H;0 <= lane1 <= 7;b -> Vn.4H;0 <= lane2 <= 3 INS Vd.H[lane1],Vn.H[lane2] Vd.8H -> result A64 +mfloat8x8_t vcopy_lane_mf8(mfloat8x8_t a, __builtin_constant_p(lane1), mfloat8x8_t b, __builtin_constant_p(lane2)) a -> Vd.8B;0 <= lane1 <= 7;b -> Vn.8B;0 <= lane2 <= 7 INS Vd.B[lane1],Vn.B[lane2] Vd.8B -> result A64 +mfloat8x16_t vcopyq_lane_mf8(mfloat8x16_t a, __builtin_constant_p(lane1), mfloat8x8_t b, __builtin_constant_p(lane2)) a -> Vd.16B;0 <= lane1 <= 15;b -> Vn.8B;0 <= lane2 <= 7 INS Vd.B[lane1],Vn.B[lane2] Vd.16B -> result A64 int8x8_t vcopy_laneq_s8(int8x8_t a, __builtin_constant_p(lane1), int8x16_t b, __builtin_constant_p(lane2)) a -> Vd.8B;0 <= lane1 <= 7;b -> Vn.16B;0 <= lane2 <= 15 INS Vd.B[lane1],Vn.B[lane2] Vd.8B -> result A64 int8x16_t vcopyq_laneq_s8(int8x16_t a, __builtin_constant_p(lane1), int8x16_t b, __builtin_constant_p(lane2)) a -> Vd.16B;0 <= lane1 <= 15;b -> Vn.16B;0 <= lane2 <= 15 INS Vd.B[lane1],Vn.B[lane2] Vd.16B -> result A64 int16x4_t vcopy_laneq_s16(int16x4_t a, __builtin_constant_p(lane1), int16x8_t b, __builtin_constant_p(lane2)) a
-> Vd.4H;0 <= lane1 <= 3;b -> Vn.8H;0 <= lane2 <= 7 INS Vd.H[lane1],Vn.H[lane2] Vd.4H -> result A64 @@ -1870,6 +1872,8 @@ poly8x8_t vcopy_laneq_p8(poly8x8_t a, __builtin_constant_p(lane1), poly8x16_t b, poly8x16_t vcopyq_laneq_p8(poly8x16_t a, __builtin_constant_p(lane1), poly8x16_t b, __builtin_constant_p(lane2)) a -> Vd.16B;0 <= lane1 <= 15;b -> Vn.16B;0 <= lane2 <= 15 INS Vd.B[lane1],Vn.B[lane2] Vd.16B -> result A64 poly16x4_t vcopy_laneq_p16(poly16x4_t a, __builtin_constant_p(lane1), poly16x8_t b, __builtin_constant_p(lane2)) a -> Vd.4H;0 <= lane1 <= 3;b -> Vn.8H;0 <= lane2 <= 7 INS Vd.H[lane1],Vn.H[lane2] Vd.4H -> result A64 poly16x8_t vcopyq_laneq_p16(poly16x8_t a, __builtin_constant_p(lane1), poly16x8_t b, __builtin_constant_p(lane2)) a -> Vd.8H;0 <= lane1 <= 7;b -> Vn.8H;0 <= lane2 <= 7 INS Vd.H[lane1],Vn.H[lane2] Vd.8H -> result A64 +mfloat8x8_t vcopy_laneq_mf8(mfloat8x8_t a, __builtin_constant_p(lane1), mfloat8x16_t b, __builtin_constant_p(lane2)) a -> Vd.8B;0 <= lane1 <= 7;b -> Vn.16B;0 <= lane2 <= 15 INS Vd.B[lane1],Vn.B[lane2] Vd.8B -> result A64 +mfloat8x16_t vcopyq_laneq_mf8(mfloat8x16_t a, __builtin_constant_p(lane1), mfloat8x16_t b, __builtin_constant_p(lane2)) a -> Vd.16B;0 <= lane1 <= 15;b -> Vn.16B;0 <= lane2 <= 15 INS Vd.B[lane1],Vn.B[lane2] Vd.16B -> result A64 int8x8_t vrbit_s8(int8x8_t a) a -> Vn.8B RBIT Vd.8B,Vn.8B Vd.8B -> result A64 int8x16_t vrbitq_s8(int8x16_t a) a -> Vn.16B RBIT Vd.16B,Vn.16B Vd.16B -> result A64 uint8x8_t vrbit_u8(uint8x8_t a) a -> Vn.8B RBIT Vd.8B,Vn.8B Vd.8B -> result A64 @@ -1890,6 +1894,7 @@ float32x2_t vcreate_f32(uint64_t a) a -> Xn INS Vd.D[0],Xn Vd.2S -> result v7/A3 poly8x8_t vcreate_p8(uint64_t a) a -> Xn INS Vd.D[0],Xn Vd.8B -> result v7/A32/A64 poly16x4_t vcreate_p16(uint64_t a) a -> Xn INS Vd.D[0],Xn Vd.4H -> result v7/A32/A64 float64x1_t vcreate_f64(uint64_t a) a -> Xn INS Vd.D[0],Xn Vd.1D -> result A64 +mfloat8x8_t vcreate_mf8(uint64_t a) a -> Xn INS Vd.D[0],Xn Vd.8B -> result A64 int8x8_t 
vdup_n_s8(int8_t value) value -> rn DUP Vd.8B,rn Vd.8B -> result v7/A32/A64 int8x16_t vdupq_n_s8(int8_t value) value -> rn DUP Vd.16B,rn Vd.16B -> result v7/A32/A64 int16x4_t vdup_n_s16(int16_t value) value -> rn DUP Vd.4H,rn Vd.4H -> result v7/A32/A64 @@ -1916,6 +1921,8 @@ poly16x4_t vdup_n_p16(poly16_t value) value -> rn DUP Vd.4H,rn Vd.4H -> result v poly16x8_t vdupq_n_p16(poly16_t value) value -> rn DUP Vd.8H,rn Vd.8H -> result v7/A32/A64 float64x1_t vdup_n_f64(float64_t value) value -> rn INS Dd.D[0],xn Vd.1D -> result A64 float64x2_t vdupq_n_f64(float64_t value) value -> rn DUP Vd.2D,rn Vd.2D -> result A64 +mfloat8x8_t vdup_n_mf8(mfloat8_t value) value -> rn DUP Vd.8B,rn Vd.8B -> result A64 +mfloat8x16_t vdupq_n_mf8(mfloat8_t value) value -> rn DUP Vd.16B,rn Vd.16B -> result A64 int8x8_t vmov_n_s8(int8_t value) value -> rn DUP Vd.8B,rn Vd.8B -> result v7/A32/A64 int8x16_t vmovq_n_s8(int8_t value) value -> rn DUP Vd.16B,rn Vd.16B -> result v7/A32/A64 int16x4_t vmov_n_s16(int16_t value) value -> rn DUP Vd.4H,rn Vd.4H -> result v7/A32/A64 @@ -1940,6 +1947,8 @@ poly16x4_t vmov_n_p16(poly16_t value) value -> rn DUP Vd.4H,rn Vd.4H -> result v poly16x8_t vmovq_n_p16(poly16_t value) value -> rn DUP Vd.8H,rn Vd.8H -> result v7/A32/A64 float64x1_t vmov_n_f64(float64_t value) value -> rn DUP Vd.1D,rn Vd.1D -> result A64 float64x2_t vmovq_n_f64(float64_t value) value -> rn DUP Vd.2D,rn Vd.2D -> result A64 +mfloat8x8_t vmov_n_mf8(mfloat8_t value) value -> rn DUP Vd.8B,rn Vd.8B -> result A64 +mfloat8x16_t vmovq_n_mf8(mfloat8_t value) value -> rn DUP Vd.16B,rn Vd.16B -> result A64 int8x8_t vdup_lane_s8(int8x8_t vec, __builtin_constant_p(lane)) vec -> Vn.8B;0 <= lane <= 7 DUP Vd.8B,Vn.B[lane] Vd.8B -> result v7/A32/A64 int8x16_t vdupq_lane_s8(int8x8_t vec, __builtin_constant_p(lane)) vec -> Vn.8B;0 <= lane <= 7 DUP Vd.16B,Vn.B[lane] Vd.16B -> result v7/A32/A64 int16x4_t vdup_lane_s16(int16x4_t vec, __builtin_constant_p(lane)) vec -> Vn.4H;0 <= lane <= 3 DUP Vd.4H,Vn.H[lane] 
Vd.4H -> result v7/A32/A64 @@ -1966,6 +1975,8 @@ poly16x4_t vdup_lane_p16(poly16x4_t vec, __builtin_constant_p(lane)) vec -> Vn.4 poly16x8_t vdupq_lane_p16(poly16x4_t vec, __builtin_constant_p(lane)) vec -> Vn.4H;0 <= lane <= 3 DUP Vd.8H,Vn.H[lane] Vd.8H -> result v7/A32/A64 float64x1_t vdup_lane_f64(float64x1_t vec, __builtin_constant_p(lane)) vec -> Vn.1D;0 <= lane <= 0 DUP Dd,Vn.D[lane] Dd -> result A64 float64x2_t vdupq_lane_f64(float64x1_t vec, __builtin_constant_p(lane)) vec -> Vn.1D;0 <= lane <= 0 DUP Vd.2D,Vn.D[lane] Vd.2D -> result A64 +mfloat8x8_t vdup_lane_mf8(mfloat8x8_t vec, __builtin_constant_p(lane)) vec -> Vn.8B;0 <= lane <= 7 DUP Vd.8B,Vn.B[lane] Vd.8B -> result A64 +mfloat8x16_t vdupq_lane_mf8(mfloat8x8_t vec, __builtin_constant_p(lane)) vec -> Vn.8B;0 <= lane <= 7 DUP Vd.16B,Vn.B[lane] Vd.16B -> result A64 int8x8_t vdup_laneq_s8(int8x16_t vec, __builtin_constant_p(lane)) vec -> Vn.16B;0 <= lane <= 15 DUP Vd.8B,Vn.B[lane] Vd.8B -> result A64 int8x16_t vdupq_laneq_s8(int8x16_t vec, __builtin_constant_p(lane)) vec -> Vn.16B;0 <= lane <= 15 DUP Vd.16B,Vn.B[lane] Vd.16B -> result A64 int16x4_t vdup_laneq_s16(int16x8_t vec, __builtin_constant_p(lane)) vec -> Vn.8H;0 <= lane <= 7 DUP Vd.4H,Vn.H[lane] Vd.4H -> result A64 @@ -1992,6 +2003,8 @@ poly16x4_t vdup_laneq_p16(poly16x8_t vec, __builtin_constant_p(lane)) vec -> Vn.
poly16x8_t vdupq_laneq_p16(poly16x8_t vec, __builtin_constant_p(lane)) vec -> Vn.8H;0 <= lane <= 7 DUP Vd.8H,Vn.H[lane] Vd.8H -> result A64 float64x1_t vdup_laneq_f64(float64x2_t vec, __builtin_constant_p(lane)) vec -> Vn.2D;0 <= lane <= 1 DUP Dd,Vn.D[lane] Dd -> result A64 float64x2_t vdupq_laneq_f64(float64x2_t vec, __builtin_constant_p(lane)) vec -> Vn.2D;0 <= lane <= 1 DUP Vd.2D,Vn.D[lane] Vd.2D -> result A64 +mfloat8x8_t vdup_laneq_mf8(mfloat8x16_t vec, __builtin_constant_p(lane)) vec -> Vn.16B;0 <= lane <= 15 DUP Vd.8B,Vn.B[lane] Vd.8B -> result A64 +mfloat8x16_t vdupq_laneq_mf8(mfloat8x16_t vec, __builtin_constant_p(lane)) vec -> Vn.16B;0 <= lane <= 15 DUP Vd.16B,Vn.B[lane] Vd.16B -> result A64 int8x16_t vcombine_s8(int8x8_t low, int8x8_t high) low -> Vn.8B;high -> Vm.8B DUP Vd.1D,Vn.D[0];INS Vd.D[1],Vm.D[0] Vd.16B -> result v7/A32/A64 int16x8_t vcombine_s16(int16x4_t low, int16x4_t high) low -> Vn.4H;high -> Vm.4H DUP Vd.1D,Vn.D[0];INS Vd.D[1],Vm.D[0] Vd.8H -> result v7/A32/A64 int32x4_t vcombine_s32(int32x2_t low, int32x2_t high) low -> Vn.2S;high -> Vm.2S DUP Vd.1D,Vn.D[0];INS Vd.D[1],Vm.D[0] Vd.4S -> result v7/A32/A64 @@ -2006,6 +2019,7 @@ float32x4_t vcombine_f32(float32x2_t low, float32x2_t high) low -> Vn.2S;high -> poly8x16_t vcombine_p8(poly8x8_t low, poly8x8_t high) low -> Vn.8B;high -> Vm.8B DUP Vd.1D,Vn.D[0];INS Vd.D[1],Vm.D[0] Vd.16B -> result v7/A32/A64 poly16x8_t vcombine_p16(poly16x4_t low, poly16x4_t high) low -> Vn.4H;high -> Vm.4H DUP Vd.1D,Vn.D[0];INS Vd.D[1],Vm.D[0] Vd.8H -> result v7/A32/A64 float64x2_t vcombine_f64(float64x1_t low, float64x1_t high) low -> Vn.1D;high -> Vm.1D DUP Vd.1D,Vn.D[0];INS Vd.D[1],Vm.D[0] Vd.2D -> result A64 +mfloat8x16_t vcombine_mf8(mfloat8x8_t low, mfloat8x8_t high) low -> Vn.8B;high -> Vm.8B DUP Vd.1D,Vn.D[0];INS Vd.D[1],Vm.D[0] Vd.16B -> result A64 int8x8_t vget_high_s8(int8x16_t a) a -> Vn.16B DUP Vd.1D,Vn.D[1] Vd.8B -> result v7/A32/A64 int16x4_t vget_high_s16(int16x8_t a) a -> Vn.8H DUP Vd.1D,Vn.D[1] 
Vd.4H -> result v7/A32/A64 int32x2_t vget_high_s32(int32x4_t a) a -> Vn.4S DUP Vd.1D,Vn.D[1] Vd.2S -> result v7/A32/A64 @@ -2020,6 +2034,7 @@ float32x2_t vget_high_f32(float32x4_t a) a -> Vn.4S DUP Vd.1D,Vn.D[1] Vd.2S -> r poly8x8_t vget_high_p8(poly8x16_t a) a -> Vn.16B DUP Vd.1D,Vn.D[1] Vd.8B -> result v7/A32/A64 poly16x4_t vget_high_p16(poly16x8_t a) a -> Vn.8H DUP Vd.1D,Vn.D[1] Vd.4H -> result v7/A32/A64 float64x1_t vget_high_f64(float64x2_t a) a -> Vn.2D DUP Vd.1D,Vn.D[1] Vd.1D -> result A64 +mfloat8x8_t vget_high_mf8(mfloat8x16_t a) a -> Vn.16B DUP Vd.1D,Vn.D[1] Vd.8B -> result A64 int8x8_t vget_low_s8(int8x16_t a) a -> Vn.16B DUP Vd.1D,Vn.D[0] Vd.8B -> result v7/A32/A64 int16x4_t vget_low_s16(int16x8_t a) a -> Vn.8H DUP Vd.1D,Vn.D[0] Vd.4H -> result v7/A32/A64 int32x2_t vget_low_s32(int32x4_t a) a -> Vn.4S DUP Vd.1D,Vn.D[0] Vd.2S -> result v7/A32/A64 @@ -2034,6 +2049,7 @@ float32x2_t vget_low_f32(float32x4_t a) a -> Vn.4S DUP Vd.1D,Vn.D[0] Vd.2S -> re poly8x8_t vget_low_p8(poly8x16_t a) a -> Vn.16B DUP Vd.1D,Vn.D[0] Vd.8B -> result v7/A32/A64 poly16x4_t vget_low_p16(poly16x8_t a) a -> Vn.8H DUP Vd.1D,Vn.D[0] Vd.4H -> result v7/A32/A64 float64x1_t vget_low_f64(float64x2_t a) a -> Vn.2D DUP Vd.1D,Vn.D[0] Vd.1D -> result A64 +mfloat8x8_t vget_low_mf8(mfloat8x16_t a) a -> Vn.16B DUP Vd.1D,Vn.D[0] Vd.8B -> result A64 int8_t vdupb_lane_s8(int8x8_t vec, __builtin_constant_p(lane)) vec -> Vn.8B;0 <= lane <= 7 DUP Bd,Vn.B[lane] Bd -> result A64 int16_t vduph_lane_s16(int16x4_t vec, __builtin_constant_p(lane)) vec -> Vn.4H;0 <= lane <= 3 DUP Hd,Vn.H[lane] Hd -> result A64 int32_t vdups_lane_s32(int32x2_t vec, __builtin_constant_p(lane)) vec -> Vn.2S;0 <= lane <= 1 DUP Sd,Vn.S[lane] Sd -> result A64 @@ -2046,6 +2062,7 @@ float32_t vdups_lane_f32(float32x2_t vec, __builtin_constant_p(lane)) vec -> Vn. 
float64_t vdupd_lane_f64(float64x1_t vec, __builtin_constant_p(lane)) vec -> Vn.1D;0 <= lane <= 0 DUP Dd,Vn.D[lane] Dd -> result A64 poly8_t vdupb_lane_p8(poly8x8_t vec, __builtin_constant_p(lane)) vec -> Vn.8B;0 <= lane <= 7 DUP Bd,Vn.B[lane] Bd -> result A64 poly16_t vduph_lane_p16(poly16x4_t vec, __builtin_constant_p(lane)) vec -> Vn.4H;0 <= lane <= 3 DUP Hd,Vn.H[lane] Hd -> result A64 +mfloat8_t vdupb_lane_mf8(mfloat8x8_t vec, __builtin_constant_p(lane)) vec -> Vn.8B;0 <= lane <= 7 DUP Bd,Vn.B[lane] Bd -> result A64 int8_t vdupb_laneq_s8(int8x16_t vec, __builtin_constant_p(lane)) vec -> Vn.16B;0 <= lane <= 15 DUP Bd,Vn.B[lane] Bd -> result A64 int16_t vduph_laneq_s16(int16x8_t vec, __builtin_constant_p(lane)) vec -> Vn.8H;0 <= lane <= 7 DUP Hd,Vn.H[lane] Hd -> result A64 int32_t vdups_laneq_s32(int32x4_t vec, __builtin_constant_p(lane)) vec -> Vn.4S;0 <= lane <= 3 DUP Sd,Vn.S[lane] Sd -> result A64 @@ -2058,6 +2075,7 @@ float32_t vdups_laneq_f32(float32x4_t vec, __builtin_constant_p(lane)) vec -> Vn float64_t vdupd_laneq_f64(float64x2_t vec, __builtin_constant_p(lane)) vec -> Vn.2D;0 <= lane <= 1 DUP Dd,Vn.D[lane] Dd -> result A64 poly8_t vdupb_laneq_p8(poly8x16_t vec, __builtin_constant_p(lane)) vec -> Vn.16B;0 <= lane <= 15 DUP Bd,Vn.B[lane] Bd -> result A64 poly16_t vduph_laneq_p16(poly16x8_t vec, __builtin_constant_p(lane)) vec -> Vn.8H;0 <= lane <= 7 DUP Hd,Vn.H[lane] Hd -> result A64 +mfloat8_t vdupb_laneq_mf8(mfloat8x16_t vec, __builtin_constant_p(lane)) vec -> Vn.16B;0 <= lane <= 15 DUP Bd,Vn.B[lane] Bd -> result A64 int8x8_t vld1_s8(int8_t const *ptr) ptr -> Xn LD1 {Vt.8B},[Xn] Vt.8B -> result v7/A32/A64 int8x16_t vld1q_s8(int8_t const *ptr) ptr -> Xn LD1 {Vt.16B},[Xn] Vt.16B -> result v7/A32/A64 int16x4_t vld1_s16(int16_t const *ptr) ptr -> Xn LD1 {Vt.4H},[Xn] Vt.4H -> result v7/A32/A64 @@ -2086,6 +2104,8 @@ poly16x4_t vld1_p16(poly16_t const *ptr) ptr -> Xn LD1 {Vt.4H},[Xn] Vt.4H -> res poly16x8_t vld1q_p16(poly16_t const *ptr) ptr -> Xn LD1 
{Vt.8H},[Xn] Vt.8H -> result v7/A32/A64 float64x1_t vld1_f64(float64_t const *ptr) ptr -> Xn LD1 {Vt.1D},[Xn] Vt.1D -> result A64 float64x2_t vld1q_f64(float64_t const *ptr) ptr -> Xn LD1 {Vt.2D},[Xn] Vt.2D -> result A64 +mfloat8x8_t vld1_mf8(mfloat8_t const *ptr) ptr -> Xn LD1 {Vt.8B},[Xn] Vt.8B -> result A64 +mfloat8x16_t vld1q_mf8(mfloat8_t const *ptr) ptr -> Xn LD1 {Vt.16B},[Xn] Vt.16B -> result A64 int8x8_t vld1_lane_s8(int8_t const *ptr, int8x8_t src, __builtin_constant_p(lane)) ptr -> Xn;src -> Vt.8B;0 <= lane <= 7 LD1 {Vt.b}[lane],[Xn] Vt.8B -> result v7/A32/A64 int8x16_t vld1q_lane_s8(int8_t const *ptr, int8x16_t src, __builtin_constant_p(lane)) ptr -> Xn;src -> Vt.16B;0 <= lane <= 15 LD1 {Vt.b}[lane],[Xn] Vt.16B -> result v7/A32/A64 int16x4_t vld1_lane_s16(int16_t const *ptr, int16x4_t src, __builtin_constant_p(lane)) ptr -> Xn;src -> Vt.4H;0 <= lane <= 3 LD1 {Vt.H}[lane],[Xn] Vt.4H -> result v7/A32/A64 @@ -2114,6 +2134,8 @@ poly16x4_t vld1_lane_p16(poly16_t const *ptr, poly16x4_t src, __builtin_constant poly16x8_t vld1q_lane_p16(poly16_t const *ptr, poly16x8_t src, __builtin_constant_p(lane)) ptr -> Xn;src -> Vt.8H;0 <= lane <= 7 LD1 {Vt.H}[lane],[Xn] Vt.8H -> result v7/A32/A64 float64x1_t vld1_lane_f64(float64_t const *ptr, float64x1_t src, __builtin_constant_p(lane)) ptr -> Xn;src -> Vt.1D;0 <= lane <= 0 LD1 {Vt.D}[lane],[Xn] Vt.1D -> result A64 float64x2_t vld1q_lane_f64(float64_t const *ptr, float64x2_t src, __builtin_constant_p(lane)) ptr -> Xn;src -> Vt.2D;0 <= lane <= 1 LD1 {Vt.D}[lane],[Xn] Vt.2D -> result A64 +mfloat8x8_t vld1_lane_mf8(mfloat8_t const *ptr, mfloat8x8_t src, __builtin_constant_p(lane)) ptr -> Xn;src -> Vt.8B;0 <= lane <= 7 LD1 {Vt.b}[lane],[Xn] Vt.8B -> result A64 +mfloat8x16_t vld1q_lane_mf8(mfloat8_t const *ptr, mfloat8x16_t src, __builtin_constant_p(lane)) ptr -> Xn;src -> Vt.16B;0 <= lane <= 15 LD1 {Vt.b}[lane],[Xn] Vt.16B -> result A64 uint64x1_t vldap1_lane_u64(uint64_t const *ptr, uint64x1_t src, 
__builtin_constant_p(lane)) ptr -> Xn;src -> Vt.1D;0 <= lane <= 0 LDAP1 {Vt.D}[lane],[Xn] Vt.1D -> result A64 uint64x2_t vldap1q_lane_u64(uint64_t const *ptr, uint64x2_t src, __builtin_constant_p(lane)) ptr -> Xn;src -> Vt.2D;0 <= lane <= 1 LDAP1 {Vt.D}[lane],[Xn] Vt.2D -> result A64 int64x1_t vldap1_lane_s64(int64_t const *ptr, int64x1_t src, __builtin_constant_p(lane)) ptr -> Xn;src -> Vt.1D;0 <= lane <= 0 LDAP1 {Vt.D}[lane],[Xn] Vt.1D -> result A64 @@ -2150,6 +2172,8 @@ poly16x4_t vld1_dup_p16(poly16_t const *ptr) ptr -> Xn LD1R {Vt.4H},[Xn] Vt.4H - poly16x8_t vld1q_dup_p16(poly16_t const *ptr) ptr -> Xn LD1R {Vt.8H},[Xn] Vt.8H -> result v7/A32/A64 float64x1_t vld1_dup_f64(float64_t const *ptr) ptr -> Xn LD1 {Vt.1D},[Xn] Vt.1D -> result A64 float64x2_t vld1q_dup_f64(float64_t const *ptr) ptr -> Xn LD1R {Vt.2D},[Xn] Vt.2D -> result A64 +mfloat8x8_t vld1_dup_mf8(mfloat8_t const *ptr) ptr -> Xn LD1R {Vt.8B},[Xn] Vt.8B -> result A64 +mfloat8x16_t vld1q_dup_mf8(mfloat8_t const *ptr) ptr -> Xn LD1R {Vt.16B},[Xn] Vt.16B -> result A64 void vst1_s8(int8_t *ptr, int8x8_t val) val -> Vt.8B;ptr -> Xn ST1 {Vt.8B},[Xn] v7/A32/A64 void vst1q_s8(int8_t *ptr, int8x16_t val) val -> Vt.16B;ptr -> Xn ST1 {Vt.16B},[Xn] v7/A32/A64 void vst1_s16(int16_t *ptr, int16x4_t val) val -> Vt.4H;ptr -> Xn ST1 {Vt.4H},[Xn] v7/A32/A64 @@ -2178,6 +2202,8 @@ void vst1_p16(poly16_t *ptr, poly16x4_t val) val -> Vt.4H;ptr -> Xn ST1 {Vt.4H}, void vst1q_p16(poly16_t *ptr, poly16x8_t val) val -> Vt.8H;ptr -> Xn ST1 {Vt.8H},[Xn] v7/A32/A64 void vst1_f64(float64_t *ptr, float64x1_t val) val -> Vt.1D;ptr -> Xn ST1 {Vt.1D},[Xn] A64 void vst1q_f64(float64_t *ptr, float64x2_t val) val -> Vt.2D;ptr -> Xn ST1 {Vt.2D},[Xn] A64 +void vst1_mf8(mfloat8_t *ptr, mfloat8x8_t val) val -> Vt.8B;ptr -> Xn ST1 {Vt.8B},[Xn] A64 +void vst1q_mf8(mfloat8_t *ptr, mfloat8x16_t val) val -> Vt.16B;ptr -> Xn ST1 {Vt.16B},[Xn] A64 void vst1_lane_s8(int8_t *ptr, int8x8_t val, __builtin_constant_p(lane)) val -> Vt.8B;ptr -> Xn;0 <= 
lane <= 7 ST1 {Vt.b}[lane],[Xn] v7/A32/A64 void vst1q_lane_s8(int8_t *ptr, int8x16_t val, __builtin_constant_p(lane)) val -> Vt.16B;ptr -> Xn;0 <= lane <= 15 ST1 {Vt.b}[lane],[Xn] v7/A32/A64 void vst1_lane_s16(int16_t *ptr, int16x4_t val, __builtin_constant_p(lane)) val -> Vt.4H;ptr -> Xn;0 <= lane <= 3 ST1 {Vt.h}[lane],[Xn] v7/A32/A64 @@ -2214,6 +2240,8 @@ void vstl1_lane_f64(float64_t *ptr, float64x1_t val, __builtin_constant_p(lane)) void vstl1q_lane_f64(float64_t *ptr, float64x2_t val, __builtin_constant_p(lane)) val -> Vt.2D;ptr -> Xn;0 <= lane <= 1 STL1 {Vt.d}[lane],[Xn] A64 void vstl1_lane_p64(poly64_t *ptr, poly64x1_t val, __builtin_constant_p(lane)) val -> Vt.1D;ptr -> Xn;0 <= lane <= 0 STL1 {Vt.d}[lane],[Xn] A64 void vstl1q_lane_p64(poly64_t *ptr, poly64x2_t val, __builtin_constant_p(lane)) val -> Vt.2D;ptr -> Xn;0 <= lane <= 1 STL1 {Vt.d}[lane],[Xn] A64 +void vst1_lane_mf8(mfloat8_t *ptr, mfloat8x8_t val, __builtin_constant_p(lane)) val -> Vt.8B;ptr -> Xn;0 <= lane <= 7 ST1 {Vt.b}[lane],[Xn] A64 +void vst1q_lane_mf8(mfloat8_t *ptr, mfloat8x16_t val, __builtin_constant_p(lane)) val -> Vt.16B;ptr -> Xn;0 <= lane <= 15 ST1 {Vt.b}[lane],[Xn] A64 int8x8x2_t vld2_s8(int8_t const *ptr) ptr -> Xn LD2 {Vt.8B - Vt2.8B},[Xn] Vt2.8B -> result.val[1];Vt.8B -> result.val[0] v7/A32/A64 int8x16x2_t vld2q_s8(int8_t const *ptr) ptr -> Xn LD2 {Vt.16B - Vt2.16B},[Xn] Vt2.16B -> result.val[1];Vt.16B -> result.val[0] v7/A32/A64 int16x4x2_t vld2_s16(int16_t const *ptr) ptr -> Xn LD2 {Vt.4H - Vt2.4H},[Xn] Vt2.4H -> result.val[1];Vt.4H -> result.val[0] v7/A32/A64 @@ -2242,6 +2270,8 @@ uint64x2x2_t vld2q_u64(uint64_t const *ptr) ptr -> Xn LD2 {Vt.2D - Vt2.2D},[Xn] poly64x2x2_t vld2q_p64(poly64_t const *ptr) ptr -> Xn LD2 {Vt.2D - Vt2.2D},[Xn] Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 float64x1x2_t vld2_f64(float64_t const *ptr) ptr -> Xn LD1 {Vt.1D - Vt2.1D},[Xn] Vt2.1D -> result.val[1];Vt.1D -> result.val[0] A64 float64x2x2_t vld2q_f64(float64_t const *ptr) ptr -> Xn 
LD2 {Vt.2D - Vt2.2D},[Xn] Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 +mfloat8x8x2_t vld2_mf8(mfloat8_t const *ptr) ptr -> Xn LD2 {Vt.8B - Vt2.8B},[Xn] Vt2.8B -> result.val[1];Vt.8B -> result.val[0] A64 +mfloat8x16x2_t vld2q_mf8(mfloat8_t const *ptr) ptr -> Xn LD2 {Vt.16B - Vt2.16B},[Xn] Vt2.16B -> result.val[1];Vt.16B -> result.val[0] A64 int8x8x3_t vld3_s8(int8_t const *ptr) ptr -> Xn LD3 {Vt.8B - Vt3.8B},[Xn] Vt3.8B -> result.val[2];Vt2.8B -> result.val[1];Vt.8B -> result.val[0] v7/A32/A64 int8x16x3_t vld3q_s8(int8_t const *ptr) ptr -> Xn LD3 {Vt.16B - Vt3.16B},[Xn] Vt3.16B -> result.val[2];Vt2.16B -> result.val[1];Vt.16B -> result.val[0] v7/A32/A64 int16x4x3_t vld3_s16(int16_t const *ptr) ptr -> Xn LD3 {Vt.4H - Vt3.4H},[Xn] Vt3.4H -> result.val[2];Vt2.4H -> result.val[1];Vt.4H -> result.val[0] v7/A32/A64 @@ -2270,6 +2300,8 @@ uint64x2x3_t vld3q_u64(uint64_t const *ptr) ptr -> Xn LD3 {Vt.2D - Vt3.2D},[Xn] poly64x2x3_t vld3q_p64(poly64_t const *ptr) ptr -> Xn LD3 {Vt.2D - Vt3.2D},[Xn] Vt3.2D -> result.val[2];Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 float64x1x3_t vld3_f64(float64_t const *ptr) ptr -> Xn LD1 {Vt.1D - Vt3.1D},[Xn] Vt3.1D -> result.val[2];Vt2.1D -> result.val[1];Vt.1D -> result.val[0] A64 float64x2x3_t vld3q_f64(float64_t const *ptr) ptr -> Xn LD3 {Vt.2D - Vt3.2D},[Xn] Vt3.2D -> result.val[2];Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 +mfloat8x8x3_t vld3_mf8(mfloat8_t const *ptr) ptr -> Xn LD3 {Vt.8B - Vt3.8B},[Xn] Vt3.8B -> result.val[2];Vt2.8B -> result.val[1];Vt.8B -> result.val[0] A64 +mfloat8x16x3_t vld3q_mf8(mfloat8_t const *ptr) ptr -> Xn LD3 {Vt.16B - Vt3.16B},[Xn] Vt3.16B -> result.val[2];Vt2.16B -> result.val[1];Vt.16B -> result.val[0] A64 int8x8x4_t vld4_s8(int8_t const *ptr) ptr -> Xn LD4 {Vt.8B - Vt4.8B},[Xn] Vt4.8B -> result.val[3];Vt3.8B -> result.val[2];Vt2.8B -> result.val[1];Vt.8B -> result.val[0] v7/A32/A64 int8x16x4_t vld4q_s8(int8_t const *ptr) ptr -> Xn LD4 {Vt.16B - Vt4.16B},[Xn] Vt4.16B ->
result.val[3];Vt3.16B -> result.val[2];Vt2.16B -> result.val[1];Vt.16B -> result.val[0] v7/A32/A64 int16x4x4_t vld4_s16(int16_t const *ptr) ptr -> Xn LD4 {Vt.4H - Vt4.4H},[Xn] Vt4.4H -> result.val[3];Vt3.4H -> result.val[2];Vt2.4H -> result.val[1];Vt.4H -> result.val[0] v7/A32/A64 @@ -2298,6 +2330,8 @@ uint64x2x4_t vld4q_u64(uint64_t const *ptr) ptr -> Xn LD4 {Vt.2D - Vt4.2D},[Xn] poly64x2x4_t vld4q_p64(poly64_t const *ptr) ptr -> Xn LD4 {Vt.2D - Vt4.2D},[Xn] Vt4.2D -> result.val[3];Vt3.2D -> result.val[2];Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 float64x1x4_t vld4_f64(float64_t const *ptr) ptr -> Xn LD1 {Vt.1D - Vt4.1D},[Xn] Vt4.1D -> result.val[3];Vt3.1D -> result.val[2];Vt2.1D -> result.val[1];Vt.1D -> result.val[0] A64 float64x2x4_t vld4q_f64(float64_t const *ptr) ptr -> Xn LD4 {Vt.2D - Vt4.2D},[Xn] Vt4.2D -> result.val[3];Vt3.2D -> result.val[2];Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 +mfloat8x8x4_t vld4_mf8(mfloat8_t const *ptr) ptr -> Xn LD4 {Vt.8B - Vt4.8B},[Xn] Vt4.8B -> result.val[3];Vt3.8B -> result.val[2];Vt2.8B -> result.val[1];Vt.8B -> result.val[0] A64 +mfloat8x16x4_t vld4q_mf8(mfloat8_t const *ptr) ptr -> Xn LD4 {Vt.16B - Vt4.16B},[Xn] Vt4.16B -> result.val[3];Vt3.16B -> result.val[2];Vt2.16B -> result.val[1];Vt.16B -> result.val[0] A64 int8x8x2_t vld2_dup_s8(int8_t const *ptr) ptr -> Xn LD2R {Vt.8B - Vt2.8B},[Xn] Vt2.8B -> result.val[1];Vt.8B -> result.val[0] v7/A32/A64 int8x16x2_t vld2q_dup_s8(int8_t const *ptr) ptr -> Xn LD2R {Vt.16B - Vt2.16B},[Xn] Vt2.16B -> result.val[1];Vt.16B -> result.val[0] v7/A32/A64 int16x4x2_t vld2_dup_s16(int16_t const *ptr) ptr -> Xn LD2R {Vt.4H - Vt2.4H},[Xn] Vt2.4H -> result.val[1];Vt.4H -> result.val[0] v7/A32/A64 @@ -2326,6 +2360,8 @@ uint64x2x2_t vld2q_dup_u64(uint64_t const *ptr) ptr -> Xn LD2R {Vt.2D - Vt2.2D}, poly64x2x2_t vld2q_dup_p64(poly64_t const *ptr) ptr -> Xn LD2R {Vt.2D - Vt2.2D},[Xn] Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 float64x1x2_t vld2_dup_f64(float64_t const 
*ptr) ptr -> Xn LD2R {Vt.1D - Vt2.1D},[Xn] Vt2.1D -> result.val[1];Vt.1D -> result.val[0] A64 float64x2x2_t vld2q_dup_f64(float64_t const *ptr) ptr -> Xn LD2R {Vt.2D - Vt2.2D},[Xn] Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 +mfloat8x8x2_t vld2_dup_mf8(mfloat8_t const *ptr) ptr -> Xn LD2R {Vt.8B - Vt2.8B},[Xn] Vt2.8B -> result.val[1];Vt.8B -> result.val[0] A64 +mfloat8x16x2_t vld2q_dup_mf8(mfloat8_t const *ptr) ptr -> Xn LD2R {Vt.16B - Vt2.16B},[Xn] Vt2.16B -> result.val[1];Vt.16B -> result.val[0] A64 int8x8x3_t vld3_dup_s8(int8_t const *ptr) ptr -> Xn LD3R {Vt.8B - Vt3.8B},[Xn] Vt3.8B -> result.val[2];Vt2.8B -> result.val[1];Vt.8B -> result.val[0] v7/A32/A64 int8x16x3_t vld3q_dup_s8(int8_t const *ptr) ptr -> Xn LD3R {Vt.16B - Vt3.16B},[Xn] Vt3.16B -> result.val[2];Vt2.16B -> result.val[1];Vt.16B -> result.val[0] v7/A32/A64 int16x4x3_t vld3_dup_s16(int16_t const *ptr) ptr -> Xn LD3R {Vt.4H - Vt3.4H},[Xn] Vt3.4H -> result.val[2];Vt2.4H -> result.val[1];Vt.4H -> result.val[0] v7/A32/A64 @@ -2354,6 +2390,8 @@ uint64x2x3_t vld3q_dup_u64(uint64_t const *ptr) ptr -> Xn LD3R {Vt.2D - Vt3.2D}, poly64x2x3_t vld3q_dup_p64(poly64_t const *ptr) ptr -> Xn LD3R {Vt.2D - Vt3.2D},[Xn] Vt3.2D -> result.val[2];Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 float64x1x3_t vld3_dup_f64(float64_t const *ptr) ptr -> Xn LD3R {Vt.1D - Vt3.1D},[Xn] Vt3.1D -> result.val[2];Vt2.1D -> result.val[1];Vt.1D -> result.val[0] A64 float64x2x3_t vld3q_dup_f64(float64_t const *ptr) ptr -> Xn LD3R {Vt.2D - Vt3.2D},[Xn] Vt3.2D -> result.val[2];Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 +mfloat8x8x3_t vld3_dup_mf8(mfloat8_t const *ptr) ptr -> Xn LD3R {Vt.8B - Vt3.8B},[Xn] Vt3.8B -> result.val[2];Vt2.8B -> result.val[1];Vt.8B -> result.val[0] A64 +mfloat8x16x3_t vld3q_dup_mf8(mfloat8_t const *ptr) ptr -> Xn LD3R {Vt.16B - Vt3.16B},[Xn] Vt3.16B -> result.val[2];Vt2.16B -> result.val[1];Vt.16B -> result.val[0] A64 int8x8x4_t vld4_dup_s8(int8_t const *ptr) ptr -> Xn LD4R {Vt.8B - 
Vt4.8B},[Xn] Vt4.8B -> result.val[3];Vt3.8B -> result.val[2];Vt2.8B -> result.val[1];Vt.8B -> result.val[0] v7/A32/A64 int8x16x4_t vld4q_dup_s8(int8_t const *ptr) ptr -> Xn LD4R {Vt.16B - Vt4.16B},[Xn] Vt4.16B -> result.val[3];Vt3.16B -> result.val[2];Vt2.16B -> result.val[1];Vt.16B -> result.val[0] v7/A32/A64 int16x4x4_t vld4_dup_s16(int16_t const *ptr) ptr -> Xn LD4R {Vt.4H - Vt4.4H},[Xn] Vt4.4H -> result.val[3];Vt3.4H -> result.val[2];Vt2.4H -> result.val[1];Vt.4H -> result.val[0] v7/A32/A64 @@ -2382,6 +2420,8 @@ uint64x2x4_t vld4q_dup_u64(uint64_t const *ptr) ptr -> Xn LD4R {Vt.2D - Vt4.2D}, poly64x2x4_t vld4q_dup_p64(poly64_t const *ptr) ptr -> Xn LD4R {Vt.2D - Vt4.2D},[Xn] Vt4.2D -> result.val[3];Vt3.2D -> result.val[2];Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 float64x1x4_t vld4_dup_f64(float64_t const *ptr) ptr -> Xn LD4R {Vt.1D - Vt4.1D},[Xn] Vt4.1D -> result.val[3];Vt3.1D -> result.val[2];Vt2.1D -> result.val[1];Vt.1D -> result.val[0] A64 float64x2x4_t vld4q_dup_f64(float64_t const *ptr) ptr -> Xn LD4R {Vt.2D - Vt4.2D},[Xn] Vt4.2D -> result.val[3];Vt3.2D -> result.val[2];Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 +mfloat8x8x4_t vld4_dup_mf8(mfloat8_t const *ptr) ptr -> Xn LD4R {Vt.8B - Vt4.8B},[Xn] Vt4.8B -> result.val[3];Vt3.8B -> result.val[2];Vt2.8B -> result.val[1];Vt.8B -> result.val[0] A64 +mfloat8x16x4_t vld4q_dup_mf8(mfloat8_t const *ptr) ptr -> Xn LD4R {Vt.16B - Vt4.16B},[Xn] Vt4.16B -> result.val[3];Vt3.16B -> result.val[2];Vt2.16B -> result.val[1];Vt.16B -> result.val[0] A64 void vst2_s8(int8_t *ptr, int8x8x2_t val) val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn ST2 {Vt.8B - Vt2.8B},[Xn] v7/A32/A64 void vst2q_s8(int8_t *ptr, int8x16x2_t val) val.val[1] -> Vt2.16B;val.val[0] -> Vt.16B;ptr -> Xn ST2 {Vt.16B - Vt2.16B},[Xn] v7/A32/A64 void vst2_s16(int16_t *ptr, int16x4x2_t val) val.val[1] -> Vt2.4H;val.val[0] -> Vt.4H;ptr -> Xn ST2 {Vt.4H - Vt2.4H},[Xn] v7/A32/A64 @@ -2410,6 +2450,8 @@ void vst2q_u64(uint64_t *ptr, 
uint64x2x2_t val) val.val[1] -> Vt2.2D;val.val[0] void vst2q_p64(poly64_t *ptr, poly64x2x2_t val) val.val[1] -> Vt2.2D;val.val[0] -> Vt.2D;ptr -> Xn ST2 {Vt.2D - Vt2.2D},[Xn] A64 void vst2_f64(float64_t *ptr, float64x1x2_t val) val.val[1] -> Vt2.1D;val.val[0] -> Vt.1D;ptr -> Xn ST1 {Vt.1D - Vt2.1D},[Xn] A64 void vst2q_f64(float64_t *ptr, float64x2x2_t val) val.val[1] -> Vt2.2D;val.val[0] -> Vt.2D;ptr -> Xn ST2 {Vt.2D - Vt2.2D},[Xn] A64 +void vst2_mf8(mfloat8_t *ptr, mfloat8x8x2_t val) val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn ST2 {Vt.8B - Vt2.8B},[Xn] A64 +void vst2q_mf8(mfloat8_t *ptr, mfloat8x16x2_t val) val.val[1] -> Vt2.16B;val.val[0] -> Vt.16B;ptr -> Xn ST2 {Vt.16B - Vt2.16B},[Xn] A64 void vst3_s8(int8_t *ptr, int8x8x3_t val) val.val[2] -> Vt3.8B;val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn ST3 {Vt.8B - Vt3.8B},[Xn] v7/A32/A64 void vst3q_s8(int8_t *ptr, int8x16x3_t val) val.val[2] -> Vt3.16B;val.val[1] -> Vt2.16B;val.val[0] -> Vt.16B;ptr -> Xn ST3 {Vt.16B - Vt3.16B},[Xn] v7/A32/A64 void vst3_s16(int16_t *ptr, int16x4x3_t val) val.val[2] -> Vt3.4H;val.val[1] -> Vt2.4H;val.val[0] -> Vt.4H;ptr -> Xn ST3 {Vt.4H - Vt3.4H},[Xn] v7/A32/A64 @@ -2438,6 +2480,8 @@ void vst3q_u64(uint64_t *ptr, uint64x2x3_t val) val.val[2] -> Vt3.2D;val.val[1] void vst3q_p64(poly64_t *ptr, poly64x2x3_t val) val.val[2] -> Vt3.2D;val.val[1] -> Vt2.2D;val.val[0] -> Vt.2D;ptr -> Xn ST3 {Vt.2D - Vt3.2D},[Xn] A64 void vst3_f64(float64_t *ptr, float64x1x3_t val) val.val[2] -> Vt3.1D;val.val[1] -> Vt2.1D;val.val[0] -> Vt.1D;ptr -> Xn ST1 {Vt.1D - Vt3.1D},[Xn] A64 void vst3q_f64(float64_t *ptr, float64x2x3_t val) val.val[2] -> Vt3.2D;val.val[1] -> Vt2.2D;val.val[0] -> Vt.2D;ptr -> Xn ST3 {Vt.2D - Vt3.2D},[Xn] A64 +void vst3_mf8(mfloat8_t *ptr, mfloat8x8x3_t val) val.val[2] -> Vt3.8B;val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn ST3 {Vt.8B - Vt3.8B},[Xn] A64 +void vst3q_mf8(mfloat8_t *ptr, mfloat8x16x3_t val) val.val[2] -> Vt3.16B;val.val[1] -> Vt2.16B;val.val[0] -> Vt.16B;ptr 
-> Xn ST3 {Vt.16B - Vt3.16B},[Xn] A64 void vst4_s8(int8_t *ptr, int8x8x4_t val) val.val[3] -> Vt4.8B;val.val[2] -> Vt3.8B;val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn ST4 {Vt.8B - Vt4.8B},[Xn] v7/A32/A64 void vst4q_s8(int8_t *ptr, int8x16x4_t val) val.val[3] -> Vt4.16B;val.val[2] -> Vt3.16B;val.val[1] -> Vt2.16B;val.val[0] -> Vt.16B;ptr -> Xn ST4 {Vt.16B - Vt4.16B},[Xn] v7/A32/A64 void vst4_s16(int16_t *ptr, int16x4x4_t val) val.val[3] -> Vt4.4H;val.val[2] -> Vt3.4H;val.val[1] -> Vt2.4H;val.val[0] -> Vt.4H;ptr -> Xn ST4 {Vt.4H - Vt4.4H},[Xn] v7/A32/A64 @@ -2466,6 +2510,8 @@ void vst4q_u64(uint64_t *ptr, uint64x2x4_t val) val.val[3] -> Vt4.2D;val.val[2] void vst4q_p64(poly64_t *ptr, poly64x2x4_t val) val.val[3] -> Vt4.2D;val.val[2] -> Vt3.2D;val.val[1] -> Vt2.2D;val.val[0] -> Vt.2D;ptr -> Xn ST4 {Vt.2D - Vt4.2D},[Xn] A64 void vst4_f64(float64_t *ptr, float64x1x4_t val) val.val[3] -> Vt4.1D;val.val[2] -> Vt3.1D;val.val[1] -> Vt2.1D;val.val[0] -> Vt.1D;ptr -> Xn ST1 {Vt.1D - Vt4.1D},[Xn] A64 void vst4q_f64(float64_t *ptr, float64x2x4_t val) val.val[3] -> Vt4.2D;val.val[2] -> Vt3.2D;val.val[1] -> Vt2.2D;val.val[0] -> Vt.2D;ptr -> Xn ST4 {Vt.2D - Vt4.2D},[Xn] A64 +void vst4_mf8(mfloat8_t *ptr, mfloat8x8x4_t val) val.val[3] -> Vt4.8B;val.val[2] -> Vt3.8B;val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn ST4 {Vt.8B - Vt4.8B},[Xn] A64 +void vst4q_mf8(mfloat8_t *ptr, mfloat8x16x4_t val) val.val[3] -> Vt4.16B;val.val[2] -> Vt3.16B;val.val[1] -> Vt2.16B;val.val[0] -> Vt.16B;ptr -> Xn ST4 {Vt.16B - Vt4.16B},[Xn] A64 int16x4x2_t vld2_lane_s16(int16_t const *ptr, int16x4x2_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[1] -> Vt2.4H;src.val[0] -> Vt.4H;0 <= lane <= 3 LD2 {Vt.h - Vt2.h}[lane],[Xn] Vt2.4H -> result.val[1];Vt.4H -> result.val[0] v7/A32/A64 int16x8x2_t vld2q_lane_s16(int16_t const *ptr, int16x8x2_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[1] -> Vt2.8H;src.val[0] -> Vt.8H;0 <= lane <= 7 LD2 {Vt.h - Vt2.h}[lane],[Xn] Vt2.8H -> result.val[1];Vt.8H 
-> result.val[0] v7/A32/A64 int32x2x2_t vld2_lane_s32(int32_t const *ptr, int32x2x2_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[1] -> Vt2.2S;src.val[0] -> Vt.2S;0 <= lane <= 1 LD2 {Vt.s - Vt2.s}[lane],[Xn] Vt2.2S -> result.val[1];Vt.2S -> result.val[0] v7/A32/A64 @@ -2494,6 +2540,8 @@ poly64x1x2_t vld2_lane_p64(poly64_t const *ptr, poly64x1x2_t src, __builtin_cons poly64x2x2_t vld2q_lane_p64(poly64_t const *ptr, poly64x2x2_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[1] -> Vt2.2D;src.val[0] -> Vt.2D;0 <= lane <= 1 LD2 {Vt.d - Vt2.d}[lane],[Xn] Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 float64x1x2_t vld2_lane_f64(float64_t const *ptr, float64x1x2_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[1] -> Vt2.1D;src.val[0] -> Vt.1D;0 <= lane <= 0 LD2 {Vt.d - Vt2.d}[lane],[Xn] Vt2.1D -> result.val[1];Vt.1D -> result.val[0] A64 float64x2x2_t vld2q_lane_f64(float64_t const *ptr, float64x2x2_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[1] -> Vt2.2D;src.val[0] -> Vt.2D;0 <= lane <= 1 LD2 {Vt.d - Vt2.d}[lane],[Xn] Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 +mfloat8x8x2_t vld2_lane_mf8(mfloat8_t const *ptr, mfloat8x8x2_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[1] -> Vt2.8B;src.val[0] -> Vt.8B;0 <= lane <= 7 LD2 {Vt.b - Vt2.b}[lane],[Xn] Vt2.8B -> result.val[1];Vt.8B -> result.val[0] A64 +mfloat8x16x2_t vld2q_lane_mf8(mfloat8_t const *ptr, mfloat8x16x2_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[1] -> Vt2.16B;src.val[0] -> Vt.16B;0 <= lane <= 15 LD2 {Vt.b - Vt2.b}[lane],[Xn] Vt2.16B -> result.val[1];Vt.16B -> result.val[0] A64 int16x4x3_t vld3_lane_s16(int16_t const *ptr, int16x4x3_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[2] -> Vt3.4H;src.val[1] -> Vt2.4H;src.val[0] -> Vt.4H;0 <= lane <= 3 LD3 {Vt.h - Vt3.h}[lane],[Xn] Vt3.4H -> result.val[2];Vt2.4H -> result.val[1];Vt.4H -> result.val[0] v7/A32/A64 int16x8x3_t vld3q_lane_s16(int16_t const *ptr, int16x8x3_t src, __builtin_constant_p(lane)) ptr -> 
Xn;src.val[2] -> Vt3.8H;src.val[1] -> Vt2.8H;src.val[0] -> Vt.8H;0 <= lane <= 7 LD3 {Vt.h - Vt3.h}[lane],[Xn] Vt3.8H -> result.val[2];Vt2.8H -> result.val[1];Vt.8H -> result.val[0] v7/A32/A64 int32x2x3_t vld3_lane_s32(int32_t const *ptr, int32x2x3_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[2] -> Vt3.2S;src.val[1] -> Vt2.2S;src.val[0] -> Vt.2S;0 <= lane <= 1 LD3 {Vt.s - Vt3.s}[lane],[Xn] Vt3.2S -> result.val[2];Vt2.2S -> result.val[1];Vt.2S -> result.val[0] v7/A32/A64 @@ -2522,6 +2570,8 @@ poly64x1x3_t vld3_lane_p64(poly64_t const *ptr, poly64x1x3_t src, __builtin_cons poly64x2x3_t vld3q_lane_p64(poly64_t const *ptr, poly64x2x3_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[2] -> Vt3.2D;src.val[1] -> Vt2.2D;src.val[0] -> Vt.2D;0 <= lane <= 1 LD3 {Vt.d - Vt3.d}[lane],[Xn] Vt3.2D -> result.val[2];Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 float64x1x3_t vld3_lane_f64(float64_t const *ptr, float64x1x3_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[2] -> Vt3.1D;src.val[1] -> Vt2.1D;src.val[0] -> Vt.1D;0 <= lane <= 0 LD3 {Vt.d - Vt3.d}[lane],[Xn] Vt3.1D -> result.val[2];Vt2.1D -> result.val[1];Vt.1D -> result.val[0] A64 float64x2x3_t vld3q_lane_f64(float64_t const *ptr, float64x2x3_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[2] -> Vt3.2D;src.val[1] -> Vt2.2D;src.val[0] -> Vt.2D;0 <= lane <= 1 LD3 {Vt.d - Vt3.d}[lane],[Xn] Vt3.2D -> result.val[2];Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 +mfloat8x8x3_t vld3_lane_mf8(mfloat8_t const *ptr, mfloat8x8x3_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[2] -> Vt3.8B;src.val[1] -> Vt2.8B;src.val[0] -> Vt.8B;0 <= lane <= 7 LD3 {Vt.b - Vt3.b}[lane],[Xn] Vt3.8B -> result.val[2];Vt2.8B -> result.val[1];Vt.8B -> result.val[0] A64 +mfloat8x16x3_t vld3q_lane_mf8(mfloat8_t const *ptr, mfloat8x16x3_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[2] -> Vt3.16B;src.val[1] -> Vt2.16B;src.val[0] -> Vt.16B;0 <= lane <= 15 LD3 {Vt.b - Vt3.b}[lane],[Xn] Vt3.16B -> result.val[2];Vt2.16B -> 
result.val[1];Vt.16B -> result.val[0] A64 int16x4x4_t vld4_lane_s16(int16_t const *ptr, int16x4x4_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[3] -> Vt4.4H;src.val[2] -> Vt3.4H;src.val[1] -> Vt2.4H;src.val[0] -> Vt.4H;0 <= lane <= 3 LD4 {Vt.h - Vt4.h}[lane],[Xn] Vt4.4H -> result.val[3];Vt3.4H -> result.val[2];Vt2.4H -> result.val[1];Vt.4H -> result.val[0] v7/A32/A64 int16x8x4_t vld4q_lane_s16(int16_t const *ptr, int16x8x4_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[3] -> Vt4.8H;src.val[2] -> Vt3.8H;src.val[1] -> Vt2.8H;src.val[0] -> Vt.8H;0 <= lane <= 7 LD4 {Vt.h - Vt4.h}[lane],[Xn] Vt4.8H -> result.val[3];Vt3.8H -> result.val[2];Vt2.8H -> result.val[1];Vt.8H -> result.val[0] v7/A32/A64 int32x2x4_t vld4_lane_s32(int32_t const *ptr, int32x2x4_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[3] -> Vt4.2S;src.val[2] -> Vt3.2S;src.val[1] -> Vt2.2S;src.val[0] -> Vt.2S;0 <= lane <= 1 LD4 {Vt.s - Vt4.s}[lane],[Xn] Vt4.2S -> result.val[3];Vt3.2S -> result.val[2];Vt2.2S -> result.val[1];Vt.2S -> result.val[0] v7/A32/A64 @@ -2550,15 +2600,20 @@ poly64x1x4_t vld4_lane_p64(poly64_t const *ptr, poly64x1x4_t src, __builtin_cons poly64x2x4_t vld4q_lane_p64(poly64_t const *ptr, poly64x2x4_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[3] -> Vt4.2D;src.val[2] -> Vt3.2D;src.val[1] -> Vt2.2D;src.val[0] -> Vt.2D;0 <= lane <= 1 LD4 {Vt.d - Vt4.d}[lane],[Xn] Vt4.2D -> result.val[3];Vt3.2D -> result.val[2];Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 float64x1x4_t vld4_lane_f64(float64_t const *ptr, float64x1x4_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[3] -> Vt4.1D;src.val[2] -> Vt3.1D;src.val[1] -> Vt2.1D;src.val[0] -> Vt.1D;0 <= lane <= 0 LD4 {Vt.d - Vt4.d}[lane],[Xn] Vt4.1D -> result.val[3];Vt3.1D -> result.val[2];Vt2.1D -> result.val[1];Vt.1D -> result.val[0] A64 float64x2x4_t vld4q_lane_f64(float64_t const *ptr, float64x2x4_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[3] -> Vt4.2D;src.val[2] -> Vt3.2D;src.val[1] -> 
Vt2.2D;src.val[0] -> Vt.2D;0 <= lane <= 1 LD4 {Vt.d - Vt4.d}[lane],[Xn] Vt4.2D -> result.val[3];Vt3.2D -> result.val[2];Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 +mfloat8x8x4_t vld4_lane_mf8(mfloat8_t const *ptr, mfloat8x8x4_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[3] -> Vt4.8B;src.val[2] -> Vt3.8B;src.val[1] -> Vt2.8B;src.val[0] -> Vt.8B;0 <= lane <= 7 LD4 {Vt.b - Vt4.b}[lane],[Xn] Vt4.8B -> result.val[3];Vt3.8B -> result.val[2];Vt2.8B -> result.val[1];Vt.8B -> result.val[0] A64 +mfloat8x16x4_t vld4q_lane_mf8(mfloat8_t const *ptr, mfloat8x16x4_t src, __builtin_constant_p(lane)) ptr -> Xn;src.val[3] -> Vt4.16B;src.val[2] -> Vt3.16B;src.val[1] -> Vt2.16B;src.val[0] -> Vt.16B;0 <= lane <= 15 LD4 {Vt.b - Vt4.b}[lane],[Xn] Vt4.16B -> result.val[3];Vt3.16B -> result.val[2];Vt2.16B -> result.val[1];Vt.16B -> result.val[0] A64 void vst2_lane_s8(int8_t *ptr, int8x8x2_t val, __builtin_constant_p(lane)) val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn;0 <= lane <= 7 ST2 {Vt.b - Vt2.b}[lane],[Xn] v7/A32/A64 void vst2_lane_u8(uint8_t *ptr, uint8x8x2_t val, __builtin_constant_p(lane)) val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn;0 <= lane <= 7 ST2 {Vt.b - Vt2.b}[lane],[Xn] v7/A32/A64 void vst2_lane_p8(poly8_t *ptr, poly8x8x2_t val, __builtin_constant_p(lane)) val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn;0 <= lane <= 7 ST2 {Vt.b - Vt2.b}[lane],[Xn] v7/A32/A64 +void vst2_lane_mf8(mfloat8_t *ptr, mfloat8x8x2_t val, __builtin_constant_p(lane)) val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn;0 <= lane <= 7 ST2 {Vt.b - Vt2.b}[lane],[Xn] A64 void vst3_lane_s8(int8_t *ptr, int8x8x3_t val, __builtin_constant_p(lane)) val.val[2] -> Vt3.8B;val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn;0 <= lane <= 7 ST3 {Vt.b - Vt3.b}[lane],[Xn] v7/A32/A64 void vst3_lane_u8(uint8_t *ptr, uint8x8x3_t val, __builtin_constant_p(lane)) val.val[2] -> Vt3.8B;val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn;0 <= lane <= 7 ST3 {Vt.b - Vt3.b}[lane],[Xn] v7/A32/A64 void 
vst3_lane_p8(poly8_t *ptr, poly8x8x3_t val, __builtin_constant_p(lane)) val.val[2] -> Vt3.8B;val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn;0 <= lane <= 7 ST3 {Vt.b - Vt3.b}[lane],[Xn] v7/A32/A64 +void vst3_lane_mf8(mfloat8_t *ptr, mfloat8x8x3_t val, __builtin_constant_p(lane)) val.val[2] -> Vt3.8B;val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn;0 <= lane <= 7 ST3 {Vt.b - Vt3.b}[lane],[Xn] A64 void vst4_lane_s8(int8_t *ptr, int8x8x4_t val, __builtin_constant_p(lane)) val.val[3] -> Vt4.8B;val.val[2] -> Vt3.8B;val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn;0 <= lane <= 7 ST4 {Vt.b - Vt4.b}[lane],[Xn] v7/A32/A64 void vst4_lane_u8(uint8_t *ptr, uint8x8x4_t val, __builtin_constant_p(lane)) val.val[3] -> Vt4.8B;val.val[2] -> Vt3.8B;val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn;0 <= lane <= 7 ST4 {Vt.b - Vt4.b}[lane],[Xn] v7/A32/A64 void vst4_lane_p8(poly8_t *ptr, poly8x8x4_t val, __builtin_constant_p(lane)) val.val[3] -> Vt4.8B;val.val[2] -> Vt3.8B;val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn;0 <= lane <= 7 ST4 {Vt.b - Vt4.b}[lane],[Xn] v7/A32/A64 +void vst4_lane_mf8(mfloat8_t *ptr, mfloat8x8x4_t val, __builtin_constant_p(lane)) val.val[3] -> Vt4.8B;val.val[2] -> Vt3.8B;val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn;0 <= lane <= 7 ST4 {Vt.b - Vt4.b}[lane],[Xn] A64 void vst2_lane_s16(int16_t *ptr, int16x4x2_t val, __builtin_constant_p(lane)) val.val[1] -> Vt2.4H;val.val[0] -> Vt.4H;ptr -> Xn;0 <= lane <= 3 ST2 {Vt.h - Vt2.h}[lane],[Xn] v7/A32/A64 void vst2q_lane_s16(int16_t *ptr, int16x8x2_t val, __builtin_constant_p(lane)) val.val[1] -> Vt2.8H;val.val[0] -> Vt.8H;ptr -> Xn;0 <= lane <= 7 ST2 {Vt.h - Vt2.h}[lane],[Xn] v7/A32/A64 void vst2_lane_s32(int32_t *ptr, int32x2x2_t val, __builtin_constant_p(lane)) val.val[1] -> Vt2.2S;val.val[0] -> Vt.2S;ptr -> Xn;0 <= lane <= 1 ST2 {Vt.s - Vt2.s}[lane],[Xn] v7/A32/A64 @@ -2576,6 +2631,7 @@ void vst2q_lane_p16(poly16_t *ptr, poly16x8x2_t val, __builtin_constant_p(lane)) void vst2q_lane_s8(int8_t *ptr, 
int8x16x2_t val, __builtin_constant_p(lane)) val.val[1] -> Vt2.16B;val.val[0] -> Vt.16B;ptr -> Xn;0 <= lane <= 15 ST2 {Vt.b - Vt2.b}[lane],[Xn] A64 void vst2q_lane_u8(uint8_t *ptr, uint8x16x2_t val, __builtin_constant_p(lane)) val.val[1] -> Vt2.16B;val.val[0] -> Vt.16B;ptr -> Xn;0 <= lane <= 15 ST2 {Vt.b - Vt2.b}[lane],[Xn] A64 void vst2q_lane_p8(poly8_t *ptr, poly8x16x2_t val, __builtin_constant_p(lane)) val.val[1] -> Vt2.16B;val.val[0] -> Vt.16B;ptr -> Xn;0 <= lane <= 15 ST2 {Vt.b - Vt2.b}[lane],[Xn] A64 +void vst2q_lane_mf8(mfloat8_t *ptr, mfloat8x16x2_t val, __builtin_constant_p(lane)) val.val[1] -> Vt2.16B;val.val[0] -> Vt.16B;ptr -> Xn;0 <= lane <= 15 ST2 {Vt.b - Vt2.b}[lane],[Xn] A64 void vst2_lane_s64(int64_t *ptr, int64x1x2_t val, __builtin_constant_p(lane)) val.val[1] -> Vt2.1D;val.val[0] -> Vt.1D;ptr -> Xn;0 <= lane <= 0 ST2 {Vt.d - Vt2.d}[lane],[Xn] A64 void vst2q_lane_s64(int64_t *ptr, int64x2x2_t val, __builtin_constant_p(lane)) val.val[1] -> Vt2.2D;val.val[0] -> Vt.2D;ptr -> Xn;0 <= lane <= 1 ST2 {Vt.d - Vt2.d}[lane],[Xn] A64 void vst2_lane_u64(uint64_t *ptr, uint64x1x2_t val, __builtin_constant_p(lane)) val.val[1] -> Vt2.1D;val.val[0] -> Vt.1D;ptr -> Xn;0 <= lane <= 0 ST2 {Vt.d - Vt2.d}[lane],[Xn] A64 @@ -2609,6 +2665,7 @@ void vst3_lane_p64(poly64_t *ptr, poly64x1x3_t val, __builtin_constant_p(lane)) void vst3q_lane_p64(poly64_t *ptr, poly64x2x3_t val, __builtin_constant_p(lane)) val.val[2] -> Vt3.2D;val.val[1] -> Vt2.2D;val.val[0] -> Vt.2D;ptr -> Xn;0 <= lane <= 1 ST3 {Vt.d - Vt3.d}[lane],[Xn] A64 void vst3_lane_f64(float64_t *ptr, float64x1x3_t val, __builtin_constant_p(lane)) val.val[2] -> Vt3.1D;val.val[1] -> Vt2.1D;val.val[0] -> Vt.1D;ptr -> Xn;0 <= lane <= 0 ST3 {Vt.d - Vt3.d}[lane],[Xn] A64 void vst3q_lane_f64(float64_t *ptr, float64x2x3_t val, __builtin_constant_p(lane)) val.val[2] -> Vt3.2D;val.val[1] -> Vt2.2D;val.val[0] -> Vt.2D;ptr -> Xn;0 <= lane <= 1 ST3 {Vt.d - Vt3.d}[lane],[Xn] A64 +void vst3q_lane_mf8(mfloat8_t *ptr, mfloat8x16x3_t 
val, __builtin_constant_p(lane)) val.val[2] -> Vt3.16B;val.val[1] -> Vt2.16B;val.val[0] -> Vt.16B;ptr -> Xn;0 <= lane <= 15 ST3 {Vt.b - Vt3.b}[lane],[Xn] A64 void vst4_lane_s16(int16_t *ptr, int16x4x4_t val, __builtin_constant_p(lane)) val.val[3] -> Vt4.4H;val.val[2] -> Vt3.4H;val.val[1] -> Vt2.4H;val.val[0] -> Vt.4H;ptr -> Xn;0 <= lane <= 3 ST4 {Vt.h - Vt4.h}[lane],[Xn] v7/A32/A64 void vst4q_lane_s16(int16_t *ptr, int16x8x4_t val, __builtin_constant_p(lane)) val.val[3] -> Vt4.8H;val.val[2] -> Vt3.8H;val.val[1] -> Vt2.8H;val.val[0] -> Vt.8H;ptr -> Xn;0 <= lane <= 7 ST4 {Vt.h - Vt4.h}[lane],[Xn] v7/A32/A64 void vst4_lane_s32(int32_t *ptr, int32x2x4_t val, __builtin_constant_p(lane)) val.val[3] -> Vt4.2S;val.val[2] -> Vt3.2S;val.val[1] -> Vt2.2S;val.val[0] -> Vt.2S;ptr -> Xn;0 <= lane <= 1 ST4 {Vt.s - Vt4.s}[lane],[Xn] v7/A32/A64 @@ -2634,6 +2691,7 @@ void vst4_lane_p64(poly64_t *ptr, poly64x1x4_t val, __builtin_constant_p(lane)) void vst4q_lane_p64(poly64_t *ptr, poly64x2x4_t val, __builtin_constant_p(lane)) val.val[3] -> Vt4.2D;val.val[2] -> Vt3.2D;val.val[1] -> Vt2.2D;val.val[0] -> Vt.2D;ptr -> Xn;0 <= lane <= 1 ST4 {Vt.d - Vt4.d}[lane],[Xn] A64 void vst4_lane_f64(float64_t *ptr, float64x1x4_t val, __builtin_constant_p(lane)) val.val[3] -> Vt4.1D;val.val[2] -> Vt3.1D;val.val[1] -> Vt2.1D;val.val[0] -> Vt.1D;ptr -> Xn;0 <= lane <= 0 ST4 {Vt.d - Vt4.d}[lane],[Xn] A64 void vst4q_lane_f64(float64_t *ptr, float64x2x4_t val, __builtin_constant_p(lane)) val.val[3] -> Vt4.2D;val.val[2] -> Vt3.2D;val.val[1] -> Vt2.2D;val.val[0] -> Vt.2D;ptr -> Xn;0 <= lane <= 1 ST4 {Vt.d - Vt4.d}[lane],[Xn] A64 +void vst4q_lane_mf8(mfloat8_t *ptr, mfloat8x16x4_t val, __builtin_constant_p(lane)) val.val[3] -> Vt4.16B;val.val[2] -> Vt3.16B;val.val[1] -> Vt2.16B;val.val[0] -> Vt.16B;ptr -> Xn;0 <= lane <= 15 ST4 {Vt.b - Vt4.b}[lane],[Xn] A64 void vst1_s8_x2(int8_t *ptr, int8x8x2_t val) val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn ST1 {Vt.8B - Vt2.8B},[Xn] v7/A32/A64 void 
vst1q_s8_x2(int8_t *ptr, int8x16x2_t val) val.val[1] -> Vt2.16B;val.val[0] -> Vt.16B;ptr -> Xn ST1 {Vt.16B - Vt2.16B},[Xn] v7/A32/A64 void vst1_s16_x2(int16_t *ptr, int16x4x2_t val) val.val[1] -> Vt2.4H;val.val[0] -> Vt.4H;ptr -> Xn ST1 {Vt.4H - Vt2.4H},[Xn] v7/A32/A64 @@ -2662,6 +2720,8 @@ void vst1q_u64_x2(uint64_t *ptr, uint64x2x2_t val) val.val[1] -> Vt2.2D;val.val[ void vst1q_p64_x2(poly64_t *ptr, poly64x2x2_t val) val.val[1] -> Vt2.2D;val.val[0] -> Vt.2D;ptr -> Xn ST1 {Vt.2D - Vt2.2D},[Xn] A32/A64 void vst1_f64_x2(float64_t *ptr, float64x1x2_t val) val.val[1] -> Vt2.1D;val.val[0] -> Vt.1D;ptr -> Xn ST1 {Vt.1D - Vt2.1D},[Xn] A64 void vst1q_f64_x2(float64_t *ptr, float64x2x2_t val) val.val[1] -> Vt2.2D;val.val[0] -> Vt.2D;ptr -> Xn ST1 {Vt.2D - Vt2.2D},[Xn] A64 +void vst1_mf8_x2(mfloat8_t *ptr, mfloat8x8x2_t val) val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn ST1 {Vt.8B - Vt2.8B},[Xn] A64 +void vst1q_mf8_x2(mfloat8_t *ptr, mfloat8x16x2_t val) val.val[1] -> Vt2.16B;val.val[0] -> Vt.16B;ptr -> Xn ST1 {Vt.16B - Vt2.16B},[Xn] A64 void vst1_s8_x3(int8_t *ptr, int8x8x3_t val) val.val[2] -> Vt3.8B;val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn ST1 {Vt.8B - Vt3.8B},[Xn] v7/A32/A64 void vst1q_s8_x3(int8_t *ptr, int8x16x3_t val) val.val[2] -> Vt3.16B;val.val[1] -> Vt2.16B;val.val[0] -> Vt.16B;ptr -> Xn ST1 {Vt.16B - Vt3.16B},[Xn] v7/A32/A64 void vst1_s16_x3(int16_t *ptr, int16x4x3_t val) val.val[2] -> Vt3.4H;val.val[1] -> Vt2.4H;val.val[0] -> Vt.4H;ptr -> Xn ST1 {Vt.4H - Vt3.4H},[Xn] v7/A32/A64 @@ -2690,6 +2750,8 @@ void vst1q_u64_x3(uint64_t *ptr, uint64x2x3_t val) val.val[2] -> Vt3.2D;val.val[ void vst1q_p64_x3(poly64_t *ptr, poly64x2x3_t val) val.val[2] -> Vt3.2D;val.val[1] -> Vt2.2D;val.val[0] -> Vt.2D;ptr -> Xn ST1 {Vt.2D - Vt3.2D},[Xn] A32/A64 void vst1_f64_x3(float64_t *ptr, float64x1x3_t val) val.val[2] -> Vt3.1D;val.val[1] -> Vt2.1D;val.val[0] -> Vt.1D;ptr -> Xn ST1 {Vt.1D - Vt3.1D},[Xn] A64 void vst1q_f64_x3(float64_t *ptr, float64x2x3_t val) 
val.val[2] -> Vt3.2D;val.val[1] -> Vt2.2D;val.val[0] -> Vt.2D;ptr -> Xn ST1 {Vt.2D - Vt3.2D},[Xn] A64 +void vst1_mf8_x3(mfloat8_t *ptr, mfloat8x8x3_t val) val.val[2] -> Vt3.8B;val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn ST1 {Vt.8B - Vt3.8B},[Xn] A64 +void vst1q_mf8_x3(mfloat8_t *ptr, mfloat8x16x3_t val) val.val[2] -> Vt3.16B;val.val[1] -> Vt2.16B;val.val[0] -> Vt.16B;ptr -> Xn ST1 {Vt.16B - Vt3.16B},[Xn] A64 void vst1_s8_x4(int8_t *ptr, int8x8x4_t val) val.val[3] -> Vt4.8B;val.val[2] -> Vt3.8B;val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn ST1 {Vt.8B - Vt4.8B},[Xn] v7/A32/A64 void vst1q_s8_x4(int8_t *ptr, int8x16x4_t val) val.val[3] -> Vt4.16B;val.val[2] -> Vt3.16B;val.val[1] -> Vt2.16B;val.val[0] -> Vt.16B;ptr -> Xn ST1 {Vt.16B - Vt4.16B},[Xn] v7/A32/A64 void vst1_s16_x4(int16_t *ptr, int16x4x4_t val) val.val[3] -> Vt4.4H;val.val[2] -> Vt3.4H;val.val[1] -> Vt2.4H;val.val[0] -> Vt.4H;ptr -> Xn ST1 {Vt.4H - Vt4.4H},[Xn] v7/A32/A64 @@ -2718,6 +2780,8 @@ void vst1q_u64_x4(uint64_t *ptr, uint64x2x4_t val) val.val[3] -> Vt4.2D;val.val[ void vst1q_p64_x4(poly64_t *ptr, poly64x2x4_t val) val.val[3] -> Vt4.2D;val.val[2] -> Vt3.2D;val.val[1] -> Vt2.2D;val.val[0] -> Vt.2D;ptr -> Xn ST1 {Vt.2D - Vt4.2D},[Xn] A32/A64 void vst1_f64_x4(float64_t *ptr, float64x1x4_t val) val.val[3] -> Vt4.1D;val.val[2] -> Vt3.1D;val.val[1] -> Vt2.1D;val.val[0] -> Vt.1D;ptr -> Xn ST1 {Vt.1D - Vt4.1D},[Xn] A64 void vst1q_f64_x4(float64_t *ptr, float64x2x4_t val) val.val[3] -> Vt4.2D;val.val[2] -> Vt3.2D;val.val[1] -> Vt2.2D;val.val[0] -> Vt.2D;ptr -> Xn ST1 {Vt.2D - Vt4.2D},[Xn] A64 +void vst1_mf8_x4(mfloat8_t *ptr, mfloat8x8x4_t val) val.val[3] -> Vt4.8B;val.val[2] -> Vt3.8B;val.val[1] -> Vt2.8B;val.val[0] -> Vt.8B;ptr -> Xn ST1 {Vt.8B - Vt4.8B},[Xn] A64 +void vst1q_mf8_x4(mfloat8_t *ptr, mfloat8x16x4_t val) val.val[3] -> Vt4.16B;val.val[2] -> Vt3.16B;val.val[1] -> Vt2.16B;val.val[0] -> Vt.16B;ptr -> Xn ST1 {Vt.16B - Vt4.16B},[Xn] A64 int8x8x2_t vld1_s8_x2(int8_t const *ptr) ptr -> Xn LD1 {Vt.8B - 
Vt2.8B},[Xn] Vt2.8B -> result.val[1];Vt.8B -> result.val[0] v7/A32/A64 int8x16x2_t vld1q_s8_x2(int8_t const *ptr) ptr -> Xn LD1 {Vt.16B - Vt2.16B},[Xn] Vt2.16B -> result.val[1];Vt.16B -> result.val[0] v7/A32/A64 int16x4x2_t vld1_s16_x2(int16_t const *ptr) ptr -> Xn LD1 {Vt.4H - Vt2.4H},[Xn] Vt2.4H -> result.val[1];Vt.4H -> result.val[0] v7/A32/A64 @@ -2746,6 +2810,8 @@ uint64x2x2_t vld1q_u64_x2(uint64_t const *ptr) ptr -> Xn LD1 {Vt.2D - Vt2.2D},[X poly64x2x2_t vld1q_p64_x2(poly64_t const *ptr) ptr -> Xn LD1 {Vt.2D - Vt2.2D},[Xn] Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A32/A64 float64x1x2_t vld1_f64_x2(float64_t const *ptr) ptr -> Xn LD1 {Vt.1D - Vt2.1D},[Xn] Vt2.1D -> result.val[1];Vt.1D -> result.val[0] A64 float64x2x2_t vld1q_f64_x2(float64_t const *ptr) ptr -> Xn LD1 {Vt.2D - Vt2.2D},[Xn] Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 +mfloat8x8x2_t vld1_mf8_x2(mfloat8_t const *ptr) ptr -> Xn LD1 {Vt.8B - Vt2.8B},[Xn] Vt2.8B -> result.val[1];Vt.8B -> result.val[0] A64 +mfloat8x16x2_t vld1q_mf8_x2(mfloat8_t const *ptr) ptr -> Xn LD1 {Vt.16B - Vt2.16B},[Xn] Vt2.16B -> result.val[1];Vt.16B -> result.val[0] A64 int8x8x3_t vld1_s8_x3(int8_t const *ptr) ptr -> Xn LD1 {Vt.8B - Vt3.8B},[Xn] Vt3.8B -> result.val[2];Vt2.8B -> result.val[1];Vt.8B -> result.val[0] v7/A32/A64 int8x16x3_t vld1q_s8_x3(int8_t const *ptr) ptr -> Xn LD1 {Vt.16B - Vt3.16B},[Xn] Vt3.16B -> result.val[2];Vt2.16B -> result.val[1];Vt.16B -> result.val[0] v7/A32/A64 int16x4x3_t vld1_s16_x3(int16_t const *ptr) ptr -> Xn LD1 {Vt.4H - Vt3.4H},[Xn] Vt3.4H -> result.val[2];Vt2.4H -> result.val[1];Vt.4H -> result.val[0] v7/A32/A64 @@ -2774,6 +2840,8 @@ uint64x2x3_t vld1q_u64_x3(uint64_t const *ptr) ptr -> Xn LD1 {Vt.2D - Vt3.2D},[X poly64x2x3_t vld1q_p64_x3(poly64_t const *ptr) ptr -> Xn LD1 {Vt.2D - Vt3.2D},[Xn] Vt3.2D -> result.val[2];Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A32/A64 float64x1x3_t vld1_f64_x3(float64_t const *ptr) ptr -> Xn LD1 {Vt.1D - Vt3.1D},[Xn] Vt3.1D -> 
result.val[2];Vt2.1D -> result.val[1];Vt.1D -> result.val[0] A64 float64x2x3_t vld1q_f64_x3(float64_t const *ptr) ptr -> Xn LD1 {Vt.2D - Vt3.2D},[Xn] Vt3.2D -> result.val[2];Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 +mfloat8x8x3_t vld1_mf8_x3(mfloat8_t const *ptr) ptr -> Xn LD1 {Vt.8B - Vt3.8B},[Xn] Vt3.8B -> result.val[2];Vt2.8B -> result.val[1];Vt.8B -> result.val[0] A64 +mfloat8x16x3_t vld1q_mf8_x3(mfloat8_t const *ptr) ptr -> Xn LD1 {Vt.16B - Vt3.16B},[Xn] Vt3.16B -> result.val[2];Vt2.16B -> result.val[1];Vt.16B -> result.val[0] A64 int8x8x4_t vld1_s8_x4(int8_t const *ptr) ptr -> Xn LD1 {Vt.8B - Vt4.8B},[Xn] Vt4.8B -> result.val[3];Vt3.8B -> result.val[2];Vt2.8B -> result.val[1];Vt.8B -> result.val[0] v7/A32/A64 int8x16x4_t vld1q_s8_x4(int8_t const *ptr) ptr -> Xn LD1 {Vt.16B - Vt4.16B},[Xn] Vt4.16B -> result.val[3];Vt3.16B -> result.val[2];Vt2.16B -> result.val[1];Vt.16B -> result.val[0] v7/A32/A64 int16x4x4_t vld1_s16_x4(int16_t const *ptr) ptr -> Xn LD1 {Vt.4H - Vt4.4H},[Xn] Vt4.4H -> result.val[3];Vt3.4H -> result.val[2];Vt2.4H -> result.val[1];Vt.4H -> result.val[0] v7/A32/A64 @@ -2802,6 +2870,8 @@ uint64x2x4_t vld1q_u64_x4(uint64_t const *ptr) ptr -> Xn LD1 {Vt.2D - Vt4.2D},[X poly64x2x4_t vld1q_p64_x4(poly64_t const *ptr) ptr -> Xn LD1 {Vt.2D - Vt4.2D},[Xn] Vt4.2D -> result.val[3];Vt3.2D -> result.val[2];Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A32/A64 float64x1x4_t vld1_f64_x4(float64_t const *ptr) ptr -> Xn LD1 {Vt.1D - Vt4.1D},[Xn] Vt4.1D -> result.val[3];Vt3.1D -> result.val[2];Vt2.1D -> result.val[1];Vt.1D -> result.val[0] A64 float64x2x4_t vld1q_f64_x4(float64_t const *ptr) ptr -> Xn LD1 {Vt.2D - Vt4.2D},[Xn] Vt4.2D -> result.val[3];Vt3.2D -> result.val[2];Vt2.2D -> result.val[1];Vt.2D -> result.val[0] A64 +mfloat8x8x4_t vld1_mf8_x4(mfloat8_t const *ptr) ptr -> Xn LD1 {Vt.8B - Vt4.8B},[Xn] Vt4.8B -> result.val[3];Vt3.8B -> result.val[2];Vt2.8B -> result.val[1];Vt.8B -> result.val[0] A64 +mfloat8x16x4_t vld1q_mf8_x4(mfloat8_t const 
*ptr) ptr -> Xn LD1 {Vt.16B - Vt4.16B},[Xn] Vt4.16B -> result.val[3];Vt3.16B -> result.val[2];Vt2.16B -> result.val[1];Vt.16B -> result.val[0] A64 int8x8_t vpadd_s8(int8x8_t a, int8x8_t b) a -> Vn.8B;b -> Vm.8B ADDP Vd.8B,Vn.8B,Vm.8B Vd.8B -> result v7/A32/A64 int16x4_t vpadd_s16(int16x4_t a, int16x4_t b) a -> Vn.4H;b -> Vm.4H ADDP Vd.4H,Vn.4H,Vm.4H Vd.4H -> result v7/A32/A64 int32x2_t vpadd_s32(int32x2_t a, int32x2_t b) a -> Vn.2S;b -> Vm.2S ADDP Vd.2S,Vn.2S,Vm.2S Vd.2S -> result v7/A32/A64 @@ -2982,6 +3052,8 @@ poly8x8_t vext_p8(poly8x8_t a, poly8x8_t b, __builtin_constant_p(n)) a -> Vn.8B; poly8x16_t vextq_p8(poly8x16_t a, poly8x16_t b, __builtin_constant_p(n)) a -> Vn.16B;b -> Vm.16B;0 <= n <= 15 EXT Vd.16B,Vn.16B,Vm.16B,#n Vd.16B -> result v7/A32/A64 poly16x4_t vext_p16(poly16x4_t a, poly16x4_t b, __builtin_constant_p(n)) a -> Vn.8B;b -> Vm.8B;0 <= n <= 3 EXT Vd.8B,Vn.8B,Vm.8B,#(n<<1) Vd.8B -> result v7/A32/A64 poly16x8_t vextq_p16(poly16x8_t a, poly16x8_t b, __builtin_constant_p(n)) a -> Vn.16B;b -> Vm.16B;0 <= n <= 7 EXT Vd.16B,Vn.16B,Vm.16B,#(n<<1) Vd.16B -> result v7/A32/A64 +mfloat8x8_t vext_mf8(mfloat8x8_t a, mfloat8x8_t b, __builtin_constant_p(n)) a -> Vn.8B;b -> Vm.8B;0 <= n <= 7 EXT Vd.8B,Vn.8B,Vm.8B,#n Vd.8B -> result A64 +mfloat8x16_t vextq_mf8(mfloat8x16_t a, mfloat8x16_t b, __builtin_constant_p(n)) a -> Vn.16B;b -> Vm.16B;0 <= n <= 15 EXT Vd.16B,Vn.16B,Vm.16B,#n Vd.16B -> result A64 int8x8_t vrev64_s8(int8x8_t vec) vec -> Vn.8B REV64 Vd.8B,Vn.8B Vd.8B -> result v7/A32/A64 int8x16_t vrev64q_s8(int8x16_t vec) vec -> Vn.16B REV64 Vd.16B,Vn.16B Vd.16B -> result v7/A32/A64 int16x4_t vrev64_s16(int16x4_t vec) vec -> Vn.4H REV64 Vd.4H,Vn.4H Vd.4H -> result v7/A32/A64 @@ -3000,6 +3072,8 @@ poly8x8_t vrev64_p8(poly8x8_t vec) vec -> Vn.8B REV64 Vd.8B,Vn.8B Vd.8B -> resul poly8x16_t vrev64q_p8(poly8x16_t vec) vec -> Vn.16B REV64 Vd.16B,Vn.16B Vd.16B -> result v7/A32/A64 poly16x4_t vrev64_p16(poly16x4_t vec) vec -> Vn.4H REV64 Vd.4H,Vn.4H Vd.4H -> result 
v7/A32/A64 poly16x8_t vrev64q_p16(poly16x8_t vec) vec -> Vn.8H REV64 Vd.8H,Vn.8H Vd.8H -> result v7/A32/A64 +mfloat8x8_t vrev64_mf8(mfloat8x8_t vec) vec -> Vn.8B REV64 Vd.8B,Vn.8B Vd.8B -> result A64 +mfloat8x16_t vrev64q_mf8(mfloat8x16_t vec) vec -> Vn.16B REV64 Vd.16B,Vn.16B Vd.16B -> result A64 int8x8_t vrev32_s8(int8x8_t vec) vec -> Vn.8B REV32 Vd.8B,Vn.8B Vd.8B -> result v7/A32/A64 int8x16_t vrev32q_s8(int8x16_t vec) vec -> Vn.16B REV32 Vd.16B,Vn.16B Vd.16B -> result v7/A32/A64 int16x4_t vrev32_s16(int16x4_t vec) vec -> Vn.4H REV32 Vd.4H,Vn.4H Vd.4H -> result v7/A32/A64 @@ -3012,12 +3086,16 @@ poly8x8_t vrev32_p8(poly8x8_t vec) vec -> Vn.8B REV32 Vd.8B,Vn.8B Vd.8B -> resul poly8x16_t vrev32q_p8(poly8x16_t vec) vec -> Vn.16B REV32 Vd.16B,Vn.16B Vd.16B -> result v7/A32/A64 poly16x4_t vrev32_p16(poly16x4_t vec) vec -> Vn.4H REV32 Vd.4H,Vn.4H Vd.4H -> result v7/A32/A64 poly16x8_t vrev32q_p16(poly16x8_t vec) vec -> Vn.8H REV32 Vd.8H,Vn.8H Vd.8H -> result v7/A32/A64 +mfloat8x8_t vrev32_mf8(mfloat8x8_t vec) vec -> Vn.8B REV32 Vd.8B,Vn.8B Vd.8B -> result A64 +mfloat8x16_t vrev32q_mf8(mfloat8x16_t vec) vec -> Vn.16B REV32 Vd.16B,Vn.16B Vd.16B -> result A64 int8x8_t vrev16_s8(int8x8_t vec) vec -> Vn.8B REV16 Vd.8B,Vn.8B Vd.8B -> result v7/A32/A64 int8x16_t vrev16q_s8(int8x16_t vec) vec -> Vn.16B REV16 Vd.16B,Vn.16B Vd.16B -> result v7/A32/A64 uint8x8_t vrev16_u8(uint8x8_t vec) vec -> Vn.8B REV16 Vd.8B,Vn.8B Vd.8B -> result v7/A32/A64 uint8x16_t vrev16q_u8(uint8x16_t vec) vec -> Vn.16B REV16 Vd.16B,Vn.16B Vd.16B -> result v7/A32/A64 poly8x8_t vrev16_p8(poly8x8_t vec) vec -> Vn.8B REV16 Vd.8B,Vn.8B Vd.8B -> result v7/A32/A64 poly8x16_t vrev16q_p8(poly8x16_t vec) vec -> Vn.16B REV16 Vd.16B,Vn.16B Vd.16B -> result v7/A32/A64 +mfloat8x8_t vrev16_mf8(mfloat8x8_t vec) vec -> Vn.8B REV16 Vd.8B,Vn.8B Vd.8B -> result A64 +mfloat8x16_t vrev16q_mf8(mfloat8x16_t vec) vec -> Vn.16B REV16 Vd.16B,Vn.16B Vd.16B -> result A64 int8x8_t vzip1_s8(int8x8_t a, int8x8_t b) a -> Vn.8B;b -> 
Vm.8B ZIP1 Vd.8B,Vn.8B,Vm.8B Vd.8B -> result A64 int8x16_t vzip1q_s8(int8x16_t a, int8x16_t b) a -> Vn.16B;b -> Vm.16B ZIP1 Vd.16B,Vn.16B,Vm.16B Vd.16B -> result A64 int16x4_t vzip1_s16(int16x4_t a, int16x4_t b) a -> Vn.4H;b -> Vm.4H ZIP1 Vd.4H,Vn.4H,Vm.4H Vd.4H -> result A64 @@ -3040,6 +3118,8 @@ poly8x8_t vzip1_p8(poly8x8_t a, poly8x8_t b) a -> Vn.8B;b -> Vm.8B ZIP1 Vd.8B,Vn poly8x16_t vzip1q_p8(poly8x16_t a, poly8x16_t b) a -> Vn.16B;b -> Vm.16B ZIP1 Vd.16B,Vn.16B,Vm.16B Vd.16B -> result A64 poly16x4_t vzip1_p16(poly16x4_t a, poly16x4_t b) a -> Vn.4H;b -> Vm.4H ZIP1 Vd.4H,Vn.4H,Vm.4H Vd.4H -> result A64 poly16x8_t vzip1q_p16(poly16x8_t a, poly16x8_t b) a -> Vn.8H;b -> Vm.8H ZIP1 Vd.8H,Vn.8H,Vm.8H Vd.8H -> result A64 +mfloat8x8_t vzip1_mf8(mfloat8x8_t a, mfloat8x8_t b) a -> Vn.8B;b -> Vm.8B ZIP1 Vd.8B,Vn.8B,Vm.8B Vd.8B -> result A64 +mfloat8x16_t vzip1q_mf8(mfloat8x16_t a, mfloat8x16_t b) a -> Vn.16B;b -> Vm.16B ZIP1 Vd.16B,Vn.16B,Vm.16B Vd.16B -> result A64 int8x8_t vzip2_s8(int8x8_t a, int8x8_t b) a -> Vn.8B;b -> Vm.8B ZIP2 Vd.8B,Vn.8B,Vm.8B Vd.8B -> result A64 int8x16_t vzip2q_s8(int8x16_t a, int8x16_t b) a -> Vn.16B;b -> Vm.16B ZIP2 Vd.16B,Vn.16B,Vm.16B Vd.16B -> result A64 int16x4_t vzip2_s16(int16x4_t a, int16x4_t b) a -> Vn.4H;b -> Vm.4H ZIP2 Vd.4H,Vn.4H,Vm.4H Vd.4H -> result A64 @@ -3062,6 +3142,8 @@ poly8x8_t vzip2_p8(poly8x8_t a, poly8x8_t b) a -> Vn.8B;b -> Vm.8B ZIP2 Vd.8B,Vn poly8x16_t vzip2q_p8(poly8x16_t a, poly8x16_t b) a -> Vn.16B;b -> Vm.16B ZIP2 Vd.16B,Vn.16B,Vm.16B Vd.16B -> result A64 poly16x4_t vzip2_p16(poly16x4_t a, poly16x4_t b) a -> Vn.4H;b -> Vm.4H ZIP2 Vd.4H,Vn.4H,Vm.4H Vd.4H -> result A64 poly16x8_t vzip2q_p16(poly16x8_t a, poly16x8_t b) a -> Vn.8H;b -> Vm.8H ZIP2 Vd.8H,Vn.8H,Vm.8H Vd.8H -> result A64 +mfloat8x8_t vzip2_mf8(mfloat8x8_t a, mfloat8x8_t b) a -> Vn.8B;b -> Vm.8B ZIP2 Vd.8B,Vn.8B,Vm.8B Vd.8B -> result A64 +mfloat8x16_t vzip2q_mf8(mfloat8x16_t a, mfloat8x16_t b) a -> Vn.16B;b -> Vm.16B ZIP2 Vd.16B,Vn.16B,Vm.16B Vd.16B -> 
result A64 int8x8_t vuzp1_s8(int8x8_t a, int8x8_t b) a -> Vn.8B;b -> Vm.8B UZP1 Vd.8B,Vn.8B,Vm.8B Vd.8B -> result A64 int8x16_t vuzp1q_s8(int8x16_t a, int8x16_t b) a -> Vn.16B;b -> Vm.16B UZP1 Vd.16B,Vn.16B,Vm.16B Vd.16B -> result A64 int16x4_t vuzp1_s16(int16x4_t a, int16x4_t b) a -> Vn.4H;b -> Vm.4H UZP1 Vd.4H,Vn.4H,Vm.4H Vd.4H -> result A64 @@ -3084,6 +3166,8 @@ poly8x8_t vuzp1_p8(poly8x8_t a, poly8x8_t b) a -> Vn.8B;b -> Vm.8B UZP1 Vd.8B,Vn poly8x16_t vuzp1q_p8(poly8x16_t a, poly8x16_t b) a -> Vn.16B;b -> Vm.16B UZP1 Vd.16B,Vn.16B,Vm.16B Vd.16B -> result A64 poly16x4_t vuzp1_p16(poly16x4_t a, poly16x4_t b) a -> Vn.4H;b -> Vm.4H UZP1 Vd.4H,Vn.4H,Vm.4H Vd.4H -> result A64 poly16x8_t vuzp1q_p16(poly16x8_t a, poly16x8_t b) a -> Vn.8H;b -> Vm.8H UZP1 Vd.8H,Vn.8H,Vm.8H Vd.8H -> result A64 +mfloat8x8_t vuzp1_mf8(mfloat8x8_t a, mfloat8x8_t b) a -> Vn.8B;b -> Vm.8B UZP1 Vd.8B,Vn.8B,Vm.8B Vd.8B -> result A64 +mfloat8x16_t vuzp1q_mf8(mfloat8x16_t a, mfloat8x16_t b) a -> Vn.16B;b -> Vm.16B UZP1 Vd.16B,Vn.16B,Vm.16B Vd.16B -> result A64 int8x8_t vuzp2_s8(int8x8_t a, int8x8_t b) a -> Vn.8B;b -> Vm.8B UZP2 Vd.8B,Vn.8B,Vm.8B Vd.8B -> result A64 int8x16_t vuzp2q_s8(int8x16_t a, int8x16_t b) a -> Vn.16B;b -> Vm.16B UZP2 Vd.16B,Vn.16B,Vm.16B Vd.16B -> result A64 int16x4_t vuzp2_s16(int16x4_t a, int16x4_t b) a -> Vn.4H;b -> Vm.4H UZP2 Vd.4H,Vn.4H,Vm.4H Vd.4H -> result A64 @@ -3106,6 +3190,8 @@ poly8x8_t vuzp2_p8(poly8x8_t a, poly8x8_t b) a -> Vn.8B;b -> Vm.8B UZP2 Vd.8B,Vn poly8x16_t vuzp2q_p8(poly8x16_t a, poly8x16_t b) a -> Vn.16B;b -> Vm.16B UZP2 Vd.16B,Vn.16B,Vm.16B Vd.16B -> result A64 poly16x4_t vuzp2_p16(poly16x4_t a, poly16x4_t b) a -> Vn.4H;b -> Vm.4H UZP2 Vd.4H,Vn.4H,Vm.4H Vd.4H -> result A64 poly16x8_t vuzp2q_p16(poly16x8_t a, poly16x8_t b) a -> Vn.8H;b -> Vm.8H UZP2 Vd.8H,Vn.8H,Vm.8H Vd.8H -> result A64 +mfloat8x8_t vuzp2_mf8(mfloat8x8_t a, mfloat8x8_t b) a -> Vn.8B;b -> Vm.8B UZP2 Vd.8B,Vn.8B,Vm.8B Vd.8B -> result A64 +mfloat8x16_t vuzp2q_mf8(mfloat8x16_t a, 
mfloat8x16_t b) a -> Vn.16B;b -> Vm.16B UZP2 Vd.16B,Vn.16B,Vm.16B Vd.16B -> result A64 int8x8_t vtrn1_s8(int8x8_t a, int8x8_t b) a -> Vn.8B;b -> Vm.8B TRN1 Vd.8B,Vn.8B,Vm.8B Vd.8B -> result A64 int8x16_t vtrn1q_s8(int8x16_t a, int8x16_t b) a -> Vn.16B;b -> Vm.16B TRN1 Vd.16B,Vn.16B,Vm.16B Vd.16B -> result A64 int16x4_t vtrn1_s16(int16x4_t a, int16x4_t b) a -> Vn.4H;b -> Vm.4H TRN1 Vd.4H,Vn.4H,Vm.4H Vd.4H -> result A64 @@ -3128,6 +3214,8 @@ poly8x8_t vtrn1_p8(poly8x8_t a, poly8x8_t b) a -> Vn.8B;b -> Vm.8B TRN1 Vd.8B,Vn poly8x16_t vtrn1q_p8(poly8x16_t a, poly8x16_t b) a -> Vn.16B;b -> Vm.16B TRN1 Vd.16B,Vn.16B,Vm.16B Vd.16B -> result A64 poly16x4_t vtrn1_p16(poly16x4_t a, poly16x4_t b) a -> Vn.4H;b -> Vm.4H TRN1 Vd.4H,Vn.4H,Vm.4H Vd.4H -> result A64 poly16x8_t vtrn1q_p16(poly16x8_t a, poly16x8_t b) a -> Vn.8H;b -> Vm.8H TRN1 Vd.8H,Vn.8H,Vm.8H Vd.8H -> result A64 +mfloat8x8_t vtrn1_mf8(mfloat8x8_t a, mfloat8x8_t b) a -> Vn.8B;b -> Vm.8B TRN1 Vd.8B,Vn.8B,Vm.8B Vd.8B -> result A64 +mfloat8x16_t vtrn1q_mf8(mfloat8x16_t a, mfloat8x16_t b) a -> Vn.16B;b -> Vm.16B TRN1 Vd.16B,Vn.16B,Vm.16B Vd.16B -> result A64 int8x8_t vtrn2_s8(int8x8_t a, int8x8_t b) a -> Vn.8B;b -> Vm.8B TRN2 Vd.8B,Vn.8B,Vm.8B Vd.8B -> result A64 int8x16_t vtrn2q_s8(int8x16_t a, int8x16_t b) a -> Vn.16B;b -> Vm.16B TRN2 Vd.16B,Vn.16B,Vm.16B Vd.16B -> result A64 int16x4_t vtrn2_s16(int16x4_t a, int16x4_t b) a -> Vn.4H;b -> Vm.4H TRN2 Vd.4H,Vn.4H,Vm.4H Vd.4H -> result A64 @@ -3150,78 +3238,104 @@ poly8x8_t vtrn2_p8(poly8x8_t a, poly8x8_t b) a -> Vn.8B;b -> Vm.8B TRN2 Vd.8B,Vn poly8x16_t vtrn2q_p8(poly8x16_t a, poly8x16_t b) a -> Vn.16B;b -> Vm.16B TRN2 Vd.16B,Vn.16B,Vm.16B Vd.16B -> result A64 poly16x4_t vtrn2_p16(poly16x4_t a, poly16x4_t b) a -> Vn.4H;b -> Vm.4H TRN2 Vd.4H,Vn.4H,Vm.4H Vd.4H -> result A64 poly16x8_t vtrn2q_p16(poly16x8_t a, poly16x8_t b) a -> Vn.8H;b -> Vm.8H TRN2 Vd.8H,Vn.8H,Vm.8H Vd.8H -> result A64 +mfloat8x8_t vtrn2_mf8(mfloat8x8_t a, mfloat8x8_t b) a -> Vn.8B;b -> Vm.8B TRN2 
Vd.8B,Vn.8B,Vm.8B Vd.8B -> result A64 +mfloat8x16_t vtrn2q_mf8(mfloat8x16_t a, mfloat8x16_t b) a -> Vn.16B;b -> Vm.16B TRN2 Vd.16B,Vn.16B,Vm.16B Vd.16B -> result A64 int8x8_t vtbl1_s8(int8x8_t a, int8x8_t idx) Zeros(64):a -> Vn.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result v7/A32/A64 uint8x8_t vtbl1_u8(uint8x8_t a, uint8x8_t idx) Zeros(64):a -> Vn.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result v7/A32/A64 poly8x8_t vtbl1_p8(poly8x8_t a, uint8x8_t idx) Zeros(64):a -> Vn.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result v7/A32/A64 +mfloat8x8_t vtbl1_mf8(mfloat8x8_t a, uint8x8_t idx) Zeros(64):a -> Vn.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result A64 int8x8_t vtbx1_s8(int8x8_t a, int8x8_t b, int8x8_t idx) Zeros(64):b -> Vn.16B;a -> Vd.8B;idx -> Vm.8B MOVI Vtmp.8B,#8;CMHS Vtmp.8B,Vm.8B,Vtmp.8B;TBL Vtmp1.8B,{Vn.16B},Vm.8B;BIF Vd.8B,Vtmp1.8B,Vtmp.8B Vd.8B -> result v7/A32/A64 uint8x8_t vtbx1_u8(uint8x8_t a, uint8x8_t b, uint8x8_t idx) Zeros(64):b -> Vn.16B;a -> Vd.8B;idx -> Vm.8B MOVI Vtmp.8B,#8;CMHS Vtmp.8B,Vm.8B,Vtmp.8B;TBL Vtmp1.8B,{Vn.16B},Vm.8B;BIF Vd.8B,Vtmp1.8B,Vtmp.8B Vd.8B -> result v7/A32/A64 poly8x8_t vtbx1_p8(poly8x8_t a, poly8x8_t b, uint8x8_t idx) Zeros(64):b -> Vn.16B;a -> Vd.8B;idx -> Vm.8B MOVI Vtmp.8B,#8;CMHS Vtmp.8B,Vm.8B,Vtmp.8B;TBL Vtmp1.8B,{Vn.16B},Vm.8B;BIF Vd.8B,Vtmp1.8B, Vtmp.8B Vd.8B -> result v7/A32/A64 +mfloat8x8_t vtbx1_mf8(mfloat8x8_t a, mfloat8x8_t b, uint8x8_t idx) Zeros(64):b -> Vn.16B;a -> Vd.8B;idx -> Vm.8B MOVI Vtmp.8B,#8;CMHS Vtmp.8B,Vm.8B,Vtmp.8B;TBL Vtmp1.8B,{Vn.16B},Vm.8B;BIF Vd.8B,Vtmp1.8B, Vtmp.8B Vd.8B -> result A64 int8x8_t vtbl2_s8(int8x8x2_t a, int8x8_t idx) a.val[1]:a.val[0] -> Vn.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result v7/A32/A64 uint8x8_t vtbl2_u8(uint8x8x2_t a, uint8x8_t idx) a.val[1]:a.val[0] -> Vn.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result v7/A32/A64 poly8x8_t vtbl2_p8(poly8x8x2_t a, uint8x8_t idx) a.val[1]:a.val[0] -> Vn.16B;idx -> Vm.8B TBL 
Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result v7/A32/A64 +mfloat8x8_t vtbl2_mf8(mfloat8x8x2_t a, uint8x8_t idx) a.val[1]:a.val[0] -> Vn.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result A64 int8x8_t vtbl3_s8(int8x8x3_t a, int8x8_t idx) a.val[1]:a.val[0] -> Vn.16B;Zeros(64):a.val[2] -> Vn+1.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B Vd.8B -> result v7/A32/A64 uint8x8_t vtbl3_u8(uint8x8x3_t a, uint8x8_t idx) a.val[1]:a.val[0] -> Vn.16B;Zeros(64):a.val[2] -> Vn+1.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B Vd.8B -> result v7/A32/A64 poly8x8_t vtbl3_p8(poly8x8x3_t a, uint8x8_t idx) a.val[1]:a.val[0] -> Vn.16B;Zeros(64):a.val[2] -> Vn+1.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B Vd.8B -> result v7/A32/A64 +mfloat8x8_t vtbl3_mf8(mfloat8x8x3_t a, uint8x8_t idx) a.val[1]:a.val[0] -> Vn.16B;Zeros(64):a.val[2] -> Vn+1.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B Vd.8B -> result A64 int8x8_t vtbl4_s8(int8x8x4_t a, int8x8_t idx) a.val[1]:a.val[0] -> Vn.16B;a.val[3]:a.val[2] -> Vn+1.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B Vd.8B -> result v7/A32/A64 uint8x8_t vtbl4_u8(uint8x8x4_t a, uint8x8_t idx) a.val[1]:a.val[0] -> Vn.16B;a.val[3]:a.val[2] -> Vn+1.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B Vd.8B -> result v7/A32/A64 poly8x8_t vtbl4_p8(poly8x8x4_t a, uint8x8_t idx) a.val[1]:a.val[0] -> Vn.16B;a.val[3]:a.val[2] -> Vn+1.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B Vd.8B -> result v7/A32/A64 +mfloat8x8_t vtbl4_mf8(mfloat8x8x4_t a, uint8x8_t idx) a.val[1]:a.val[0] -> Vn.16B;a.val[3]:a.val[2] -> Vn+1.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B Vd.8B -> result A64 int8x8_t vtbx2_s8(int8x8_t a, int8x8x2_t b, int8x8_t idx) b.val[1]:b.val[0] -> Vn.16B;a -> Vd.8B;idx -> Vm.8B TBX Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result v7/A32/A64 uint8x8_t vtbx2_u8(uint8x8_t a, uint8x8x2_t b, uint8x8_t idx) b.val[1]:b.val[0] -> Vn.16B;a -> Vd.8B;idx -> Vm.8B TBX Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result v7/A32/A64 poly8x8_t vtbx2_p8(poly8x8_t a, 
poly8x8x2_t b, uint8x8_t idx) b.val[1]:b.val[0] -> Vn.16B;a -> Vd.8B;idx -> Vm.8B TBX Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result v7/A32/A64 +mfloat8x8_t vtbx2_mf8(mfloat8x8_t a, mfloat8x8x2_t b, uint8x8_t idx) b.val[1]:b.val[0] -> Vn.16B;a -> Vd.8B;idx -> Vm.8B TBX Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result A64 int8x8_t vtbx3_s8(int8x8_t a, int8x8x3_t b, int8x8_t idx) b.val[1]:b.val[0] -> Vn.16B;Zeros(64):b.val[2] -> Vn+1.16B;a -> Vd.8B;idx -> Vm.8B MOVI Vtmp.8B,#24;CMHS Vtmp.8B,Vm.8B,Vtmp.8B;TBL Vtmp1.8B,{Vn.16B,Vn+1.16B},Vm.8B;BIF Vd.8B,Vtmp1.8B,Vtmp.8B Vd.8B -> result v7/A32/A64 uint8x8_t vtbx3_u8(uint8x8_t a, uint8x8x3_t b, uint8x8_t idx) b.val[1]:b.val[0] -> Vn.16B;Zeros(64):b.val[2] -> Vn+1.16B;a -> Vd.8B;idx -> Vm.8B MOVI Vtmp.8B,#24;CMHS Vtmp.8B,Vm.8B,Vtmp.8B;TBL Vtmp1.8B,{Vn.16B,Vn+1.16B},Vm.8B;BIF Vd.8B,Vtmp1.8B,Vtmp.8B Vd.8B -> result v7/A32/A64 poly8x8_t vtbx3_p8(poly8x8_t a, poly8x8x3_t b, uint8x8_t idx) b.val[1]:b.val[0] -> Vn.16B;Zeros(64):b.val[2] -> Vn+1.16B;a -> Vd.8B;idx -> Vm.8B MOVI Vtmp.8B,#24;CMHS Vtmp.8B,Vm.8B,Vtmp.8B;TBL Vtmp1.8B,{Vn.16B,Vn+1.16B},Vm.8B;BIF Vd.8B,Vtmp1.8B,Vtmp.8B Vd.8B -> result v7/A32/A64 +mfloat8x8_t vtbx3_mf8(mfloat8x8_t a, mfloat8x8x3_t b, uint8x8_t idx) b.val[1]:b.val[0] -> Vn.16B;Zeros(64):b.val[2] -> Vn+1.16B;a -> Vd.8B;idx -> Vm.8B MOVI Vtmp.8B,#24;CMHS Vtmp.8B,Vm.8B,Vtmp.8B;TBL Vtmp1.8B,{Vn.16B,Vn+1.16B},Vm.8B;BIF Vd.8B,Vtmp1.8B,Vtmp.8B Vd.8B -> result A64 int8x8_t vtbx4_s8(int8x8_t a, int8x8x4_t b, int8x8_t idx) b.val[1]:b.val[0] -> Vn.16B;b.val[3]:b.val[2] -> Vn+1.16B;a -> Vd.8B;idx -> Vm.8B TBX Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B Vd.8B -> result v7/A32/A64 uint8x8_t vtbx4_u8(uint8x8_t a, uint8x8x4_t b, uint8x8_t idx) b.val[1]:b.val[0] -> Vn.16B;b.val[3]:b.val[2] -> Vn+1.16B;a -> Vd.8B;idx -> Vm.8B TBX Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B Vd.8B -> result v7/A32/A64 poly8x8_t vtbx4_p8(poly8x8_t a, poly8x8x4_t b, uint8x8_t idx) b.val[1]:b.val[0] -> Vn.16B;b.val[3]:b.val[2] -> Vn+1.16B;a -> Vd.8B;idx -> Vm.8B TBX Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B
Vd.8B -> result v7/A32/A64 +mfloat8x8_t vtbx4_mf8(mfloat8x8_t a, mfloat8x8x4_t b, uint8x8_t idx) b.val[1]:b.val[0] -> Vn.16B;b.val[3]:b.val[2] -> Vn+1.16B;a -> Vd.8B;idx -> Vm.8B TBX Vd.8B,{Vn.16B,Vn+1.16B},Vm.8B Vd.8B -> result A64 int8x8_t vqtbl1_s8(int8x16_t t, uint8x8_t idx) t -> Vn.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result A64 int8x16_t vqtbl1q_s8(int8x16_t t, uint8x16_t idx) t -> Vn.16B;idx -> Vm.16B TBL Vd.16B,{Vn.16B},Vm.16B Vd.16B -> result A64 uint8x8_t vqtbl1_u8(uint8x16_t t, uint8x8_t idx) t -> Vn.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result A64 uint8x16_t vqtbl1q_u8(uint8x16_t t, uint8x16_t idx) t -> Vn.16B;idx -> Vm.16B TBL Vd.16B,{Vn.16B},Vm.16B Vd.16B -> result A64 poly8x8_t vqtbl1_p8(poly8x16_t t, uint8x8_t idx) t -> Vn.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result A64 poly8x16_t vqtbl1q_p8(poly8x16_t t, uint8x16_t idx) t -> Vn.16B;idx -> Vm.16B TBL Vd.16B,{Vn.16B},Vm.16B Vd.16B -> result A64 +mfloat8x8_t vqtbl1_mf8(mfloat8x16_t t, uint8x8_t idx) t -> Vn.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result A64 +mfloat8x16_t vqtbl1q_mf8(mfloat8x16_t t, uint8x16_t idx) t -> Vn.16B;idx -> Vm.16B TBL Vd.16B,{Vn.16B},Vm.16B Vd.16B -> result A64 int8x8_t vqtbx1_s8(int8x8_t a, int8x16_t t, uint8x8_t idx) a -> Vd.8B;t -> Vn.16B;idx -> Vm.8B TBX Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result A64 int8x16_t vqtbx1q_s8(int8x16_t a, int8x16_t t, uint8x16_t idx) a -> Vd.16B;t -> Vn.16B;idx -> Vm.16B TBX Vd.16B,{Vn.16B},Vm.16B Vd.16B -> result A64 uint8x8_t vqtbx1_u8(uint8x8_t a, uint8x16_t t, uint8x8_t idx) a -> Vd.8B;t -> Vn.16B;idx -> Vm.8B TBX Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result A64 uint8x16_t vqtbx1q_u8(uint8x16_t a, uint8x16_t t, uint8x16_t idx) a -> Vd.16B;t -> Vn.16B;idx -> Vm.16B TBX Vd.16B,{Vn.16B},Vm.16B Vd.16B -> result A64 poly8x8_t vqtbx1_p8(poly8x8_t a, poly8x16_t t, uint8x8_t idx) a -> Vd.8B;t -> Vn.16B;idx -> Vm.8B TBX Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result A64 poly8x16_t vqtbx1q_p8(poly8x16_t a, poly8x16_t t, 
uint8x16_t idx) a -> Vd.16B;t -> Vn.16B;idx -> Vm.16B TBX Vd.16B,{Vn.16B},Vm.16B Vd.16B -> result A64 +mfloat8x8_t vqtbx1_mf8(mfloat8x8_t a, mfloat8x16_t t, uint8x8_t idx) a -> Vd.8B;t -> Vn.16B;idx -> Vm.8B TBX Vd.8B,{Vn.16B},Vm.8B Vd.8B -> result A64 +mfloat8x16_t vqtbx1q_mf8(mfloat8x16_t a, mfloat8x16_t t, uint8x16_t idx) a -> Vd.16B;t -> Vn.16B;idx -> Vm.16B TBX Vd.16B,{Vn.16B},Vm.16B Vd.16B -> result A64 int8x8_t vqtbl2_s8(int8x16x2_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B Vd.8B -> result A64 int8x16_t vqtbl2q_s8(int8x16x2_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;idx -> Vm.16B TBL Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B Vd.16B -> result A64 uint8x8_t vqtbl2_u8(uint8x16x2_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B Vd.8B -> result A64 uint8x16_t vqtbl2q_u8(uint8x16x2_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;idx -> Vm.16B TBL Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B Vd.16B -> result A64 poly8x8_t vqtbl2_p8(poly8x16x2_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B Vd.8B -> result A64 poly8x16_t vqtbl2q_p8(poly8x16x2_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;idx -> Vm.16B TBL Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B Vd.16B -> result A64 +mfloat8x8_t vqtbl2_mf8(mfloat8x16x2_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B Vd.8B -> result A64 +mfloat8x16_t vqtbl2q_mf8(mfloat8x16x2_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;idx -> Vm.16B TBL Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B Vd.16B -> result A64 int8x8_t vqtbl3_s8(int8x16x3_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B Vd.8B -> result A64 int8x16_t vqtbl3q_s8(int8x16x3_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> 
Vn+1.16B;t.val[2] -> Vn+2.16B;idx -> Vm.16B TBL Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B Vd.16B -> result A64 uint8x8_t vqtbl3_u8(uint8x16x3_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B Vd.8B -> result A64 uint8x16_t vqtbl3q_u8(uint8x16x3_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;idx -> Vm.16B TBL Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B Vd.16B -> result A64 poly8x8_t vqtbl3_p8(poly8x16x3_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B Vd.8B -> result A64 poly8x16_t vqtbl3q_p8(poly8x16x3_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;idx -> Vm.16B TBL Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B Vd.16B -> result A64 +mfloat8x8_t vqtbl3_mf8(mfloat8x16x3_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B Vd.8B -> result A64 +mfloat8x16_t vqtbl3q_mf8(mfloat8x16x3_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;idx -> Vm.16B TBL Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B Vd.16B -> result A64 int8x8_t vqtbl4_s8(int8x16x4_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;t.val[3] -> Vn+3.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B Vd.8B -> result A64 int8x16_t vqtbl4q_s8(int8x16x4_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;t.val[3] -> Vn+3.16B;idx -> Vm.16B TBL Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B Vd.16B -> result A64 uint8x8_t vqtbl4_u8(uint8x16x4_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;t.val[3] -> Vn+3.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B Vd.8B -> result A64 uint8x16_t vqtbl4q_u8(uint8x16x4_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;t.val[3] -> Vn+3.16B;idx -> Vm.16B TBL Vd.16B,{Vn.16B 
- Vn+3.16B},Vm.16B Vd.16B -> result A64 poly8x8_t vqtbl4_p8(poly8x16x4_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;t.val[3] -> Vn+3.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B Vd.8B -> result A64 poly8x16_t vqtbl4q_p8(poly8x16x4_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;t.val[3] -> Vn+3.16B;idx -> Vm.16B TBL Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B Vd.16B -> result A64 +mfloat8x8_t vqtbl4_mf8(mfloat8x16x4_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;t.val[3] -> Vn+3.16B;idx -> Vm.8B TBL Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B Vd.8B -> result A64 +mfloat8x16_t vqtbl4q_mf8(mfloat8x16x4_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;t.val[3] -> Vn+3.16B;idx -> Vm.16B TBL Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B Vd.16B -> result A64 int8x8_t vqtbx2_s8(int8x8_t a, int8x16x2_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;idx -> Vm.8B;a -> Vd.8B TBX Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B Vd.8B -> result A64 int8x16_t vqtbx2q_s8(int8x16_t a, int8x16x2_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;idx -> Vm.16B;a -> Vd.16B TBX Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B Vd.16B -> result A64 uint8x8_t vqtbx2_u8(uint8x8_t a, uint8x16x2_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;idx -> Vm.8B;a -> Vd.8B TBX Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B Vd.8B -> result A64 uint8x16_t vqtbx2q_u8(uint8x16_t a, uint8x16x2_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;idx -> Vm.16B;a -> Vd.16B TBX Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B Vd.16B -> result A64 poly8x8_t vqtbx2_p8(poly8x8_t a, poly8x16x2_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;idx -> Vm.8B;a -> Vd.8B TBX Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B Vd.8B -> result A64 poly8x16_t vqtbx2q_p8(poly8x16_t a, poly8x16x2_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;idx -> Vm.16B;a -> Vd.16B TBX Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B Vd.16B -> result A64 
+mfloat8x8_t vqtbx2_mf8(mfloat8x8_t a, mfloat8x16x2_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;idx -> Vm.8B;a -> Vd.8B TBX Vd.8B,{Vn.16B - Vn+1.16B},Vm.8B Vd.8B -> result A64 +mfloat8x16_t vqtbx2q_mf8(mfloat8x16_t a, mfloat8x16x2_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;idx -> Vm.16B;a -> Vd.16B TBX Vd.16B,{Vn.16B - Vn+1.16B},Vm.16B Vd.16B -> result A64 int8x8_t vqtbx3_s8(int8x8_t a, int8x16x3_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;idx -> Vm.8B;a -> Vd.8B TBX Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B Vd.8B -> result A64 int8x16_t vqtbx3q_s8(int8x16_t a, int8x16x3_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;idx -> Vm.16B;a -> Vd.16B TBX Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B Vd.16B -> result A64 uint8x8_t vqtbx3_u8(uint8x8_t a, uint8x16x3_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;idx -> Vm.8B;a -> Vd.8B TBX Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B Vd.8B -> result A64 uint8x16_t vqtbx3q_u8(uint8x16_t a, uint8x16x3_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;idx -> Vm.16B;a -> Vd.16B TBX Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B Vd.16B -> result A64 poly8x8_t vqtbx3_p8(poly8x8_t a, poly8x16x3_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;idx -> Vm.8B;a -> Vd.8B TBX Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B Vd.8B -> result A64 poly8x16_t vqtbx3q_p8(poly8x16_t a, poly8x16x3_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;idx -> Vm.16B;a -> Vd.16B TBX Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B Vd.16B -> result A64 +mfloat8x8_t vqtbx3_mf8(mfloat8x8_t a, mfloat8x16x3_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;idx -> Vm.8B;a -> Vd.8B TBX Vd.8B,{Vn.16B - Vn+2.16B},Vm.8B Vd.8B -> result A64 +mfloat8x16_t vqtbx3q_mf8(mfloat8x16_t a, mfloat8x16x3_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;idx 
-> Vm.16B;a -> Vd.16B TBX Vd.16B,{Vn.16B - Vn+2.16B},Vm.16B Vd.16B -> result A64 int8x8_t vqtbx4_s8(int8x8_t a, int8x16x4_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;t.val[3] -> Vn+3.16B;idx -> Vm.8B;a -> Vd.8B TBX Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B Vd.8B -> result A64 int8x16_t vqtbx4q_s8(int8x16_t a, int8x16x4_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;t.val[3] -> Vn+3.16B;idx -> Vm.16B;a -> Vd.16B TBX Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B Vd.16B -> result A64 uint8x8_t vqtbx4_u8(uint8x8_t a, uint8x16x4_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;t.val[3] -> Vn+3.16B;idx -> Vm.8B;a -> Vd.8B TBX Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B Vd.8B -> result A64 uint8x16_t vqtbx4q_u8(uint8x16_t a, uint8x16x4_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;t.val[3] -> Vn+3.16B;idx -> Vm.16B;a -> Vd.16B TBX Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B Vd.16B -> result A64 poly8x8_t vqtbx4_p8(poly8x8_t a, poly8x16x4_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;t.val[3] -> Vn+3.16B;idx -> Vm.8B;a -> Vd.8B TBX Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B Vd.8B -> result A64 poly8x16_t vqtbx4q_p8(poly8x16_t a, poly8x16x4_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;t.val[3] -> Vn+3.16B;idx -> Vm.16B;a -> Vd.16B TBX Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B Vd.16B -> result A64 +mfloat8x8_t vqtbx4_mf8(mfloat8x8_t a, mfloat8x16x4_t t, uint8x8_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;t.val[3] -> Vn+3.16B;idx -> Vm.8B;a -> Vd.8B TBX Vd.8B,{Vn.16B - Vn+3.16B},Vm.8B Vd.8B -> result A64 +mfloat8x16_t vqtbx4q_mf8(mfloat8x16_t a, mfloat8x16x4_t t, uint8x16_t idx) t.val[0] -> Vn.16B;t.val[1] -> Vn+1.16B;t.val[2] -> Vn+2.16B;t.val[3] -> Vn+3.16B;idx -> Vm.16B;a -> Vd.16B TBX Vd.16B,{Vn.16B - Vn+3.16B},Vm.16B Vd.16B -> result A64 uint8_t vget_lane_u8(uint8x8_t v, __builtin_constant_p(lane)) 
0<=lane<=7;v -> Vn.8B UMOV Rd,Vn.B[lane] Rd -> result v7/A32/A64 uint16_t vget_lane_u16(uint16x4_t v, __builtin_constant_p(lane)) 0<=lane<=3;v -> Vn.4H UMOV Rd,Vn.H[lane] Rd -> result v7/A32/A64 uint32_t vget_lane_u32(uint32x2_t v, __builtin_constant_p(lane)) 0<=lane<=1;v -> Vn.2S UMOV Rd,Vn.S[lane] Rd -> result v7/A32/A64 @@ -3265,6 +3379,7 @@ float16x4_t vset_lane_f16(float16_t a, float16x4_t v, __builtin_constant_p(lane) float16x8_t vsetq_lane_f16(float16_t a, float16x8_t v, __builtin_constant_p(lane)) 0<=lane<=7;a -> VnH;v -> Vd.8H MOV Vd.H[lane],Vn.H[0] Vd.8H -> result v7/A32/A64 float32x2_t vset_lane_f32(float32_t a, float32x2_t v, __builtin_constant_p(lane)) 0<=lane<=1;a -> Rn;v -> Vd.2S MOV Vd.S[lane],Rn Vd.2S -> result v7/A32/A64 float64x1_t vset_lane_f64(float64_t a, float64x1_t v, __builtin_constant_p(lane)) lane==0;a -> Rn;v -> Vd.1D MOV Vd.D[lane],Rn Vd.1D -> result A64 +mfloat8x8_t vset_lane_mf8(mfloat8_t a, mfloat8x8_t v, __builtin_constant_p(lane)) 0<=lane<=7;a -> Rn;v -> Vd.8B MOV Vd.B[lane],Rn Vd.8B -> result A64 uint8x16_t vsetq_lane_u8(uint8_t a, uint8x16_t v, __builtin_constant_p(lane)) 0<=lane<=15;a -> Rn;v -> Vd.16B MOV Vd.B[lane],Rn Vd.16B -> result v7/A32/A64 uint16x8_t vsetq_lane_u16(uint16_t a, uint16x8_t v, __builtin_constant_p(lane)) 0<=lane<=7;a -> Rn;v -> Vd.8H MOV Vd.H[lane],Rn Vd.8H -> result v7/A32/A64 uint32x4_t vsetq_lane_u32(uint32_t a, uint32x4_t v, __builtin_constant_p(lane)) 0<=lane<=3;a -> Rn;v -> Vd.4S MOV Vd.S[lane],Rn Vd.4S -> result v7/A32/A64 @@ -3278,6 +3393,7 @@ poly8x16_t vsetq_lane_p8(poly8_t a, poly8x16_t v, __builtin_constant_p(lane)) 0< poly16x8_t vsetq_lane_p16(poly16_t a, poly16x8_t v, __builtin_constant_p(lane)) 0<=lane<=7;a -> Rn;v -> Vd.8H MOV Vd.H[lane],Rn Vd.8H -> result v7/A32/A64 float32x4_t vsetq_lane_f32(float32_t a, float32x4_t v, __builtin_constant_p(lane)) 0<=lane<=3;a -> Rn;v -> Vd.4S MOV Vd.S[lane],Rn Vd.4S -> result v7/A32/A64 float64x2_t vsetq_lane_f64(float64_t a, float64x2_t v, 
__builtin_constant_p(lane)) 0<=lane<=1;a -> Rn;v -> Vd.2D MOV Vd.D[lane],Rn Vd.2D -> result A64 +mfloat8x16_t vsetq_lane_mf8(mfloat8_t a, mfloat8x16_t v, __builtin_constant_p(lane)) 0<=lane<=15;a -> Rn;v -> Vd.16B MOV Vd.B[lane],Rn Vd.16B -> result A64 float32_t vrecpxs_f32(float32_t a) a -> Sn FRECPX Sd,Sn Sd -> result A64 float64_t vrecpxd_f64(float64_t a) a -> Dn FRECPX Dd,Dn Dd -> result A64 float32x2_t vfma_n_f32(float32x2_t a, float32x2_t b, float32_t n) n -> Vm.S[0];b -> Vn.2S;a -> Vd.2S FMLA Vd.2S,Vn.2S,Vm.S[0] Vd.2S -> result v7/A32/A64 @@ -3297,6 +3413,7 @@ poly16x4x2_t vtrn_p16(poly16x4_t a, poly16x4_t b) a -> Vn.4H;b -> Vm.4H TRN1 Vd1 int32x2x2_t vtrn_s32(int32x2_t a, int32x2_t b) a -> Vn.2S;b -> Vm.2S TRN1 Vd1.2S,Vn.2S,Vm.2S;TRN2 Vd2.2S,Vn.2S,Vm.2S Vd1.2S -> result.val[0];Vd2.2S -> result.val[1] v7/A32/A64 float32x2x2_t vtrn_f32(float32x2_t a, float32x2_t b) a -> Vn.2S;b -> Vm.2S TRN1 Vd1.2S,Vn.2S,Vm.2S;TRN2 Vd2.2S,Vn.2S,Vm.2S Vd1.2S -> result.val[0];Vd2.2S -> result.val[1] v7/A32/A64 uint32x2x2_t vtrn_u32(uint32x2_t a, uint32x2_t b) a -> Vn.2S;b -> Vm.2S TRN1 Vd1.2S,Vn.2S,Vm.2S;TRN2 Vd2.2S,Vn.2S,Vm.2S Vd1.2S -> result.val[0];Vd2.2S -> result.val[1] v7/A32/A64 +mfloat8x8x2_t vtrn_mf8(mfloat8x8_t a, mfloat8x8_t b) a -> Vn.8B;b -> Vm.8B TRN1 Vd1.8B,Vn.8B,Vm.8B;TRN2 Vd2.8B,Vn.8B,Vm.8B Vd1.8B -> result.val[0];Vd2.8B -> result.val[1] A64 int8x16x2_t vtrnq_s8(int8x16_t a, int8x16_t b) a -> Vn.16B;b -> Vm.16B TRN1 Vd1.16B,Vn.16B,Vm.16B;TRN2 Vd2.16B,Vn.16B,Vm.16B Vd1.16B -> result.val[0];Vd2.16B -> result.val[1] v7/A32/A64 int16x8x2_t vtrnq_s16(int16x8_t a, int16x8_t b) a -> Vn.8H;b -> Vm.8H TRN1 Vd1.8H,Vn.8H,Vm.8H;TRN2 Vd2.8H,Vn.8H,Vm.8H Vd1.8H -> result.val[0];Vd2.8H -> result.val[1] v7/A32/A64 int32x4x2_t vtrnq_s32(int32x4_t a, int32x4_t b) a -> Vn.4S;b -> Vm.4S TRN1 Vd1.4S,Vn.4S,Vm.4S;TRN2 Vd2.4S,Vn.4S,Vm.4S Vd1.4S -> result.val[0];Vd2.4S -> result.val[1] v7/A32/A64 @@ -3306,12 +3423,14 @@ uint16x8x2_t vtrnq_u16(uint16x8_t a, uint16x8_t b) a -> Vn.8H;b -> 
Vm.8H TRN1 Vd uint32x4x2_t vtrnq_u32(uint32x4_t a, uint32x4_t b) a -> Vn.4S;b -> Vm.4S TRN1 Vd1.4S,Vn.4S,Vm.4S;TRN2 Vd2.4S,Vn.4S,Vm.4S Vd1.4S -> result.val[0];Vd2.4S -> result.val[1] v7/A32/A64 poly8x16x2_t vtrnq_p8(poly8x16_t a, poly8x16_t b) a -> Vn.16B;b -> Vm.16B TRN1 Vd1.16B,Vn.16B,Vm.16B;TRN2 Vd2.16B,Vn.16B,Vm.16B Vd1.16B -> result.val[0];Vd2.16B -> result.val[1] v7/A32/A64 poly16x8x2_t vtrnq_p16(poly16x8_t a, poly16x8_t b) a -> Vn.8H;b -> Vm.8H TRN1 Vd1.8H,Vn.8H,Vm.8H;TRN2 Vd2.8H,Vn.8H,Vm.8H Vd1.8H -> result.val[0];Vd2.8H -> result.val[1] v7/A32/A64 +mfloat8x16x2_t vtrnq_mf8(mfloat8x16_t a, mfloat8x16_t b) a -> Vn.16B;b -> Vm.16B TRN1 Vd1.16B,Vn.16B,Vm.16B;TRN2 Vd2.16B,Vn.16B,Vm.16B Vd1.16B -> result.val[0];Vd2.16B -> result.val[1] A64 int8x8x2_t vzip_s8(int8x8_t a, int8x8_t b) a -> Vn.8B;b -> Vm.8B ZIP1 Vd1.8B,Vn.8B,Vm.8B;ZIP2 Vd2.8B,Vn.8B,Vm.8B Vd1.8B -> result.val[0];Vd2.8B -> result.val[1] v7/A32/A64 int16x4x2_t vzip_s16(int16x4_t a, int16x4_t b) a -> Vn.4H;b -> Vm.4H ZIP1 Vd1.4H,Vn.4H,Vm.4H;ZIP2 Vd2.4H,Vn.4H,Vm.4H Vd1.4H -> result.val[0];Vd2.4H -> result.val[1] v7/A32/A64 uint8x8x2_t vzip_u8(uint8x8_t a, uint8x8_t b) a -> Vn.8B;b -> Vm.8B ZIP1 Vd1.8B,Vn.8B,Vm.8B;ZIP2 Vd2.8B,Vn.8B,Vm.8B Vd1.8B -> result.val[0];Vd2.8B -> result.val[1] v7/A32/A64 uint16x4x2_t vzip_u16(uint16x4_t a, uint16x4_t b) a -> Vn.4H;b -> Vm.4H ZIP1 Vd1.4H,Vn.4H,Vm.4H;ZIP2 Vd2.4H,Vn.4H,Vm.4H Vd1.4H -> result.val[0];Vd2.4H -> result.val[1] v7/A32/A64 poly8x8x2_t vzip_p8(poly8x8_t a, poly8x8_t b) a -> Vn.8B;b -> Vm.8B ZIP1 Vd1.8B,Vn.8B,Vm.8B;ZIP2 Vd2.8B,Vn.8B,Vm.8B Vd1.8B -> result.val[0];Vd2.8B -> result.val[1] v7/A32/A64 poly16x4x2_t vzip_p16(poly16x4_t a, poly16x4_t b) a -> Vn.4H;b -> Vm.4H ZIP1 Vd1.4H,Vn.4H,Vm.4H;ZIP2 Vd2.4H,Vn.4H,Vm.4H Vd1.4H -> result.val[0];Vd2.4H -> result.val[1] v7/A32/A64 +mfloat8x8x2_t vzip_mf8(mfloat8x8_t a, mfloat8x8_t b) a -> Vn.8B;b -> Vm.8B ZIP1 Vd1.8B,Vn.8B,Vm.8B;ZIP2 Vd2.8B,Vn.8B,Vm.8B Vd1.8B -> result.val[0];Vd2.8B -> result.val[1] A64 int32x2x2_t 
vzip_s32(int32x2_t a, int32x2_t b) a -> Vn.2S;b -> Vm.2S ZIP1 Vd1.2S,Vn.2S,Vm.2S;ZIP2 Vd2.2S,Vn.2S,Vm.2S Vd1.2S -> result.val[0];Vd2.2S -> result.val[1] v7/A32/A64 float32x2x2_t vzip_f32(float32x2_t a, float32x2_t b) a -> Vn.2S;b -> Vm.2S ZIP1 Vd1.2S,Vn.2S,Vm.2S;ZIP2 Vd2.2S,Vn.2S,Vm.2S Vd1.2S -> result.val[0];Vd2.2S -> result.val[1] v7/A32/A64 uint32x2x2_t vzip_u32(uint32x2_t a, uint32x2_t b) a -> Vn.2S;b -> Vm.2S ZIP1 Vd1.2S,Vn.2S,Vm.2S;ZIP2 Vd2.2S,Vn.2S,Vm.2S Vd1.2S -> result.val[0];Vd2.2S -> result.val[1] v7/A32/A64 @@ -3324,6 +3443,7 @@ uint16x8x2_t vzipq_u16(uint16x8_t a, uint16x8_t b) a -> Vn.8H;b -> Vm.8H ZIP1 Vd uint32x4x2_t vzipq_u32(uint32x4_t a, uint32x4_t b) a -> Vn.4S;b -> Vm.4S ZIP1 Vd1.4S,Vn.4S,Vm.4S;ZIP2 Vd2.4S,Vn.4S,Vm.4S Vd1.4S -> result.val[0];Vd2.4S -> result.val[1] v7/A32/A64 poly8x16x2_t vzipq_p8(poly8x16_t a, poly8x16_t b) a -> Vn.16B;b -> Vm.16B ZIP1 Vd1.16B,Vn.16B,Vm.16B;ZIP2 Vd2.16B,Vn.16B,Vm.16B Vd1.16B -> result.val[0];Vd2.16B -> result.val[1] v7/A32/A64 poly16x8x2_t vzipq_p16(poly16x8_t a, poly16x8_t b) a -> Vn.8H;b -> Vm.8H ZIP1 Vd1.8H,Vn.8H,Vm.8H;ZIP2 Vd2.8H,Vn.8H,Vm.8H Vd1.8H -> result.val[0];Vd2.8H -> result.val[1] v7/A32/A64 +mfloat8x16x2_t vzipq_mf8(mfloat8x16_t a, mfloat8x16_t b) a -> Vn.16B;b -> Vm.16B ZIP1 Vd1.16B,Vn.16B,Vm.16B;ZIP2 Vd2.16B,Vn.16B,Vm.16B Vd1.16B -> result.val[0];Vd2.16B -> result.val[1] A64 int8x8x2_t vuzp_s8(int8x8_t a, int8x8_t b) a -> Vn.8B;b -> Vm.8B UZP1 Vd1.8B,Vn.8B,Vm.8B;UZP2 Vd2.8B,Vn.8B,Vm.8B Vd1.8B -> result.val[0];Vd2.8B -> result.val[1] v7/A32/A64 int16x4x2_t vuzp_s16(int16x4_t a, int16x4_t b) a -> Vn.4H;b -> Vm.4H UZP1 Vd1.4H,Vn.4H,Vm.4H;UZP2 Vd2.4H,Vn.4H,Vm.4H Vd1.4H -> result.val[0];Vd2.4H -> result.val[1] v7/A32/A64 int32x2x2_t vuzp_s32(int32x2_t a, int32x2_t b) a -> Vn.2S;b -> Vm.2S UZP1 Vd1.2S,Vn.2S,Vm.2S;UZP2 Vd2.2S,Vn.2S,Vm.2S Vd1.2S -> result.val[0];Vd2.2S -> result.val[1] v7/A32/A64 @@ -3333,6 +3453,7 @@ uint16x4x2_t vuzp_u16(uint16x4_t a, uint16x4_t b) a -> Vn.4H;b -> Vm.4H UZP1 Vd1 
uint32x2x2_t vuzp_u32(uint32x2_t a, uint32x2_t b) a -> Vn.2S;b -> Vm.2S UZP1 Vd1.2S,Vn.2S,Vm.2S;UZP2 Vd2.2S,Vn.2S,Vm.2S Vd1.2S -> result.val[0];Vd2.2S -> result.val[1] v7/A32/A64 poly8x8x2_t vuzp_p8(poly8x8_t a, poly8x8_t b) a -> Vn.8B;b -> Vm.8B UZP1 Vd1.8B,Vn.8B,Vm.8B;UZP2 Vd2.8B,Vn.8B,Vm.8B Vd1.8B -> result.val[0];Vd2.8B -> result.val[1] v7/A32/A64 poly16x4x2_t vuzp_p16(poly16x4_t a, poly16x4_t b) a -> Vn.4H;b -> Vm.4H UZP1 Vd1.4H,Vn.4H,Vm.4H;UZP2 Vd2.4H,Vn.4H,Vm.4H Vd1.4H -> result.val[0];Vd2.4H -> result.val[1] v7/A32/A64 +mfloat8x8x2_t vuzp_mf8(mfloat8x8_t a, mfloat8x8_t b) a -> Vn.8B;b -> Vm.8B UZP1 Vd1.8B,Vn.8B,Vm.8B;UZP2 Vd2.8B,Vn.8B,Vm.8B Vd1.8B -> result.val[0];Vd2.8B -> result.val[1] A64 int8x16x2_t vuzpq_s8(int8x16_t a, int8x16_t b) a -> Vn.16B;b -> Vm.16B UZP1 Vd1.16B,Vn.16B,Vm.16B;UZP2 Vd2.16B,Vn.16B,Vm.16B Vd1.16B -> result.val[0];Vd2.16B -> result.val[1] v7/A32/A64 int16x8x2_t vuzpq_s16(int16x8_t a, int16x8_t b) a -> Vn.8H;b -> Vm.8H UZP1 Vd1.8H,Vn.8H,Vm.8H;UZP2 Vd2.8H,Vn.8H,Vm.8H Vd1.8H -> result.val[0];Vd2.8H -> result.val[1] v7/A32/A64 int32x4x2_t vuzpq_s32(int32x4_t a, int32x4_t b) a -> Vn.4S;b -> Vm.4S UZP1 Vd1.4S,Vn.4S,Vm.4S;UZP2 Vd2.4S,Vn.4S,Vm.4S Vd1.4S -> result.val[0];Vd2.4S -> result.val[1] v7/A32/A64 @@ -3342,6 +3463,7 @@ uint16x8x2_t vuzpq_u16(uint16x8_t a, uint16x8_t b) a -> Vn.8H;b -> Vm.8H UZP1 Vd uint32x4x2_t vuzpq_u32(uint32x4_t a, uint32x4_t b) a -> Vn.4S;b -> Vm.4S UZP1 Vd1.4S,Vn.4S,Vm.4S;UZP2 Vd2.4S,Vn.4S,Vm.4S Vd1.4S -> result.val[0];Vd2.4S -> result.val[1] v7/A32/A64 poly8x16x2_t vuzpq_p8(poly8x16_t a, poly8x16_t b) a -> Vn.16B;b -> Vm.16B UZP1 Vd1.16B,Vn.16B,Vm.16B;UZP2 Vd2.16B,Vn.16B,Vm.16B Vd1.16B -> result.val[0];Vd2.16B -> result.val[1] v7/A32/A64 poly16x8x2_t vuzpq_p16(poly16x8_t a, poly16x8_t b) a -> Vn.8H;b -> Vm.8H UZP1 Vd1.8H,Vn.8H,Vm.8H;UZP2 Vd2.8H,Vn.8H,Vm.8H Vd1.8H -> result.val[0];Vd2.8H -> result.val[1] v7/A32/A64 +mfloat8x16x2_t vuzpq_mf8(mfloat8x16_t a, mfloat8x16_t b) a -> Vn.16B;b -> Vm.16B UZP1 
Vd1.16B,Vn.16B,Vm.16B;UZP2 Vd2.16B,Vn.16B,Vm.16B Vd1.16B -> result.val[0];Vd2.16B -> result.val[1] A64 int16x4_t vreinterpret_s16_s8(int8x8_t a) a -> Vd.8B NOP Vd.4H -> result v7/A32/A64 int32x2_t vreinterpret_s32_s8(int8x8_t a) a -> Vd.8B NOP Vd.2S -> result v7/A32/A64 float32x2_t vreinterpret_f32_s8(int8x8_t a) a -> Vd.8B NOP Vd.2S -> result v7/A32/A64 @@ -3350,6 +3472,7 @@ uint16x4_t vreinterpret_u16_s8(int8x8_t a) a -> Vd.8B NOP Vd.4H -> result v7/A32 uint32x2_t vreinterpret_u32_s8(int8x8_t a) a -> Vd.8B NOP Vd.2S -> result v7/A32/A64 poly8x8_t vreinterpret_p8_s8(int8x8_t a) a -> Vd.8B NOP Vd.8B -> result v7/A32/A64 poly16x4_t vreinterpret_p16_s8(int8x8_t a) a -> Vd.8B NOP Vd.4H -> result v7/A32/A64 +mfloat8x8_t vreinterpret_mf8_s8(int8x8_t a) a -> Vd.8B NOP Vd.8B -> result A64 uint64x1_t vreinterpret_u64_s8(int8x8_t a) a -> Vd.8B NOP Vd.1D -> result v7/A32/A64 int64x1_t vreinterpret_s64_s8(int8x8_t a) a -> Vd.8B NOP Vd.1D -> result v7/A32/A64 float64x1_t vreinterpret_f64_s8(int8x8_t a) a -> Vd.8B NOP Vd.1D -> result A64 @@ -3363,6 +3486,7 @@ uint16x4_t vreinterpret_u16_s16(int16x4_t a) a -> Vd.4H NOP Vd.4H -> result v7/A uint32x2_t vreinterpret_u32_s16(int16x4_t a) a -> Vd.4H NOP Vd.2S -> result v7/A32/A64 poly8x8_t vreinterpret_p8_s16(int16x4_t a) a -> Vd.4H NOP Vd.8B -> result v7/A32/A64 poly16x4_t vreinterpret_p16_s16(int16x4_t a) a -> Vd.4H NOP Vd.4H -> result v7/A32/A64 +mfloat8x8_t vreinterpret_mf8_s16(int16x4_t a) a -> Vd.4H NOP Vd.8B -> result A64 uint64x1_t vreinterpret_u64_s16(int16x4_t a) a -> Vd.4H NOP Vd.1D -> result v7/A32/A64 int64x1_t vreinterpret_s64_s16(int16x4_t a) a -> Vd.4H NOP Vd.1D -> result v7/A32/A64 float64x1_t vreinterpret_f64_s16(int16x4_t a) a -> Vd.4H NOP Vd.1D -> result A64 @@ -3376,6 +3500,7 @@ uint16x4_t vreinterpret_u16_s32(int32x2_t a) a -> Vd.2S NOP Vd.4H -> result v7/A uint32x2_t vreinterpret_u32_s32(int32x2_t a) a -> Vd.2S NOP Vd.2S -> result v7/A32/A64 poly8x8_t vreinterpret_p8_s32(int32x2_t a) a -> Vd.2S NOP Vd.8B -> 
result v7/A32/A64 poly16x4_t vreinterpret_p16_s32(int32x2_t a) a -> Vd.2S NOP Vd.4H -> result v7/A32/A64 +mfloat8x8_t vreinterpret_mf8_s32(int32x2_t a) a -> Vd.2S NOP Vd.8B -> result A64 uint64x1_t vreinterpret_u64_s32(int32x2_t a) a -> Vd.2S NOP Vd.1D -> result v7/A32/A64 int64x1_t vreinterpret_s64_s32(int32x2_t a) a -> Vd.2S NOP Vd.1D -> result v7/A32/A64 float64x1_t vreinterpret_f64_s32(int32x2_t a) a -> Vd.2S NOP Vd.1D -> result A64 @@ -3389,6 +3514,7 @@ uint16x4_t vreinterpret_u16_f32(float32x2_t a) a -> Vd.2S NOP Vd.4H -> result v7 uint32x2_t vreinterpret_u32_f32(float32x2_t a) a -> Vd.2S NOP Vd.2S -> result v7/A32/A64 poly8x8_t vreinterpret_p8_f32(float32x2_t a) a -> Vd.2S NOP Vd.8B -> result v7/A32/A64 poly16x4_t vreinterpret_p16_f32(float32x2_t a) a -> Vd.2S NOP Vd.4H -> result v7/A32/A64 +mfloat8x8_t vreinterpret_mf8_f32(float32x2_t a) a -> Vd.2S NOP Vd.8B -> result A64 uint64x1_t vreinterpret_u64_f32(float32x2_t a) a -> Vd.2S NOP Vd.1D -> result v7/A32/A64 int64x1_t vreinterpret_s64_f32(float32x2_t a) a -> Vd.2S NOP Vd.1D -> result v7/A32/A64 float64x1_t vreinterpret_f64_f32(float32x2_t a) a -> Vd.2S NOP Vd.1D -> result A64 @@ -3403,6 +3529,7 @@ uint16x4_t vreinterpret_u16_u8(uint8x8_t a) a -> Vd.8B NOP Vd.4H -> result v7/A3 uint32x2_t vreinterpret_u32_u8(uint8x8_t a) a -> Vd.8B NOP Vd.2S -> result v7/A32/A64 poly8x8_t vreinterpret_p8_u8(uint8x8_t a) a -> Vd.8B NOP Vd.8B -> result v7/A32/A64 poly16x4_t vreinterpret_p16_u8(uint8x8_t a) a -> Vd.8B NOP Vd.4H -> result v7/A32/A64 +mfloat8x8_t vreinterpret_mf8_u8(uint8x8_t a) a -> Vd.8B NOP Vd.8B -> result A64 uint64x1_t vreinterpret_u64_u8(uint8x8_t a) a -> Vd.8B NOP Vd.1D -> result v7/A32/A64 int64x1_t vreinterpret_s64_u8(uint8x8_t a) a -> Vd.8B NOP Vd.1D -> result v7/A32/A64 float64x1_t vreinterpret_f64_u8(uint8x8_t a) a -> Vd.8B NOP Vd.1D -> result A64 @@ -3416,6 +3543,7 @@ uint8x8_t vreinterpret_u8_u16(uint16x4_t a) a -> Vd.4H NOP Vd.8B -> result v7/A3 uint32x2_t vreinterpret_u32_u16(uint16x4_t a) a -> 
Vd.4H NOP Vd.2S -> result v7/A32/A64 poly8x8_t vreinterpret_p8_u16(uint16x4_t a) a -> Vd.4H NOP Vd.8B -> result v7/A32/A64 poly16x4_t vreinterpret_p16_u16(uint16x4_t a) a -> Vd.4H NOP Vd.4H -> result v7/A32/A64 +mfloat8x8_t vreinterpret_mf8_u16(uint16x4_t a) a -> Vd.4H NOP Vd.8B -> result A64 uint64x1_t vreinterpret_u64_u16(uint16x4_t a) a -> Vd.4H NOP Vd.1D -> result v7/A32/A64 int64x1_t vreinterpret_s64_u16(uint16x4_t a) a -> Vd.4H NOP Vd.1D -> result v7/A32/A64 float64x1_t vreinterpret_f64_u16(uint16x4_t a) a -> Vd.4H NOP Vd.1D -> result A64 @@ -3429,6 +3557,7 @@ uint8x8_t vreinterpret_u8_u32(uint32x2_t a) a -> Vd.2S NOP Vd.8B -> result v7/A3 uint16x4_t vreinterpret_u16_u32(uint32x2_t a) a -> Vd.2S NOP Vd.4H -> result v7/A32/A64 poly8x8_t vreinterpret_p8_u32(uint32x2_t a) a -> Vd.2S NOP Vd.8B -> result v7/A32/A64 poly16x4_t vreinterpret_p16_u32(uint32x2_t a) a -> Vd.2S NOP Vd.4H -> result v7/A32/A64 +mfloat8x8_t vreinterpret_mf8_u32(uint32x2_t a) a -> Vd.2S NOP Vd.8B -> result A64 uint64x1_t vreinterpret_u64_u32(uint32x2_t a) a -> Vd.2S NOP Vd.1D -> result v7/A32/A64 int64x1_t vreinterpret_s64_u32(uint32x2_t a) a -> Vd.2S NOP Vd.1D -> result v7/A32/A64 float64x1_t vreinterpret_f64_u32(uint32x2_t a) a -> Vd.2S NOP Vd.1D -> result A64 @@ -3447,6 +3576,19 @@ int64x1_t vreinterpret_s64_p8(poly8x8_t a) a -> Vd.8B NOP Vd.1D -> result v7/A32 float64x1_t vreinterpret_f64_p8(poly8x8_t a) a -> Vd.8B NOP Vd.1D -> result A64 poly64x1_t vreinterpret_p64_p8(poly8x8_t a) a -> Vd.8B NOP Vd.1D -> result A32/A64 float16x4_t vreinterpret_f16_p8(poly8x8_t a) a -> Vd.8B NOP Vd.4H -> result v7/A32/A64 +int8x8_t vreinterpret_s8_mf8(mfloat8x8_t a) a -> Vd.8B NOP Vd.8B -> result A64 +int16x4_t vreinterpret_s16_mf8(mfloat8x8_t a) a -> Vd.8B NOP Vd.4H -> result A64 +int32x2_t vreinterpret_s32_mf8(mfloat8x8_t a) a -> Vd.8B NOP Vd.2S -> result A64 +float32x2_t vreinterpret_f32_mf8(mfloat8x8_t a) a -> Vd.8B NOP Vd.2S -> result A64 +uint8x8_t vreinterpret_u8_mf8(mfloat8x8_t a) a -> Vd.8B NOP 
Vd.8B -> result A64 +uint16x4_t vreinterpret_u16_mf8(mfloat8x8_t a) a -> Vd.8B NOP Vd.4H -> result A64 +uint32x2_t vreinterpret_u32_mf8(mfloat8x8_t a) a -> Vd.8B NOP Vd.2S -> result A64 +poly16x4_t vreinterpret_p16_mf8(mfloat8x8_t a) a -> Vd.8B NOP Vd.4H -> result A64 +uint64x1_t vreinterpret_u64_mf8(mfloat8x8_t a) a -> Vd.8B NOP Vd.1D -> result A64 +int64x1_t vreinterpret_s64_mf8(mfloat8x8_t a) a -> Vd.8B NOP Vd.1D -> result A64 +float64x1_t vreinterpret_f64_mf8(mfloat8x8_t a) a -> Vd.8B NOP Vd.1D -> result A64 +poly64x1_t vreinterpret_p64_mf8(mfloat8x8_t a) a -> Vd.8B NOP Vd.1D -> result A64 +float16x4_t vreinterpret_f16_mf8(mfloat8x8_t a) a -> Vd.8B NOP Vd.4H -> result A64 int8x8_t vreinterpret_s8_p16(poly16x4_t a) a -> Vd.4H NOP Vd.8B -> result v7/A32/A64 int16x4_t vreinterpret_s16_p16(poly16x4_t a) a -> Vd.4H NOP Vd.4H -> result v7/A32/A64 int32x2_t vreinterpret_s32_p16(poly16x4_t a) a -> Vd.4H NOP Vd.2S -> result v7/A32/A64 @@ -3455,6 +3597,7 @@ uint8x8_t vreinterpret_u8_p16(poly16x4_t a) a -> Vd.4H NOP Vd.8B -> result v7/A3 uint16x4_t vreinterpret_u16_p16(poly16x4_t a) a -> Vd.4H NOP Vd.4H -> result v7/A32/A64 uint32x2_t vreinterpret_u32_p16(poly16x4_t a) a -> Vd.4H NOP Vd.2S -> result v7/A32/A64 poly8x8_t vreinterpret_p8_p16(poly16x4_t a) a -> Vd.4H NOP Vd.8B -> result v7/A32/A64 +mfloat8x8_t vreinterpret_mf8_p16(poly16x4_t a) a -> Vd.4H NOP Vd.8B -> result A64 uint64x1_t vreinterpret_u64_p16(poly16x4_t a) a -> Vd.4H NOP Vd.1D -> result v7/A32/A64 int64x1_t vreinterpret_s64_p16(poly16x4_t a) a -> Vd.4H NOP Vd.1D -> result v7/A32/A64 float64x1_t vreinterpret_f64_p16(poly16x4_t a) a -> Vd.4H NOP Vd.1D -> result A64 @@ -3469,6 +3612,7 @@ uint16x4_t vreinterpret_u16_u64(uint64x1_t a) a -> Vd.1D NOP Vd.4H -> result v7/ uint32x2_t vreinterpret_u32_u64(uint64x1_t a) a -> Vd.1D NOP Vd.2S -> result v7/A32/A64 poly8x8_t vreinterpret_p8_u64(uint64x1_t a) a -> Vd.1D NOP Vd.8B -> result v7/A32/A64 poly16x4_t vreinterpret_p16_u64(uint64x1_t a) a -> Vd.1D NOP Vd.4H -> 
result v7/A32/A64 +mfloat8x8_t vreinterpret_mf8_u64(uint64x1_t a) a -> Vd.1D NOP Vd.8B -> result A64 int64x1_t vreinterpret_s64_u64(uint64x1_t a) a -> Vd.1D NOP Vd.1D -> result v7/A32/A64 float64x1_t vreinterpret_f64_u64(uint64x1_t a) a -> Vd.1D NOP Vd.1D -> result A64 poly64x1_t vreinterpret_p64_u64(uint64x1_t a) a -> Vd.1D NOP Vd.1D -> result A32/A64 @@ -3482,6 +3626,7 @@ uint16x4_t vreinterpret_u16_s64(int64x1_t a) a -> Vd.1D NOP Vd.4H -> result v7/A uint32x2_t vreinterpret_u32_s64(int64x1_t a) a -> Vd.1D NOP Vd.2S -> result v7/A32/A64 poly8x8_t vreinterpret_p8_s64(int64x1_t a) a -> Vd.1D NOP Vd.8B -> result v7/A32/A64 poly16x4_t vreinterpret_p16_s64(int64x1_t a) a -> Vd.1D NOP Vd.4H -> result v7/A32/A64 +mfloat8x8_t vreinterpret_mf8_s64(int64x1_t a) a -> Vd.1D NOP Vd.8B -> result A64 uint64x1_t vreinterpret_u64_s64(int64x1_t a) a -> Vd.1D NOP Vd.1D -> result v7/A32/A64 float64x1_t vreinterpret_f64_s64(int64x1_t a) a -> Vd.1D NOP Vd.1D -> result A64 uint64x1_t vreinterpret_u64_p64(poly64x1_t a) a -> Vd.1D NOP Vd.1D -> result A32/A64 @@ -3495,6 +3640,7 @@ uint16x4_t vreinterpret_u16_f16(float16x4_t a) a -> Vd.4H NOP Vd.4H -> result v7 uint32x2_t vreinterpret_u32_f16(float16x4_t a) a -> Vd.4H NOP Vd.2S -> result v7/A32/A64 poly8x8_t vreinterpret_p8_f16(float16x4_t a) a -> Vd.4H NOP Vd.8B -> result v7/A32/A64 poly16x4_t vreinterpret_p16_f16(float16x4_t a) a -> Vd.4H NOP Vd.4H -> result v7/A32/A64 +mfloat8x8_t vreinterpret_mf8_f16(float16x4_t a) a -> Vd.4H NOP Vd.8B -> result A64 uint64x1_t vreinterpret_u64_f16(float16x4_t a) a -> Vd.4H NOP Vd.1D -> result v7/A32/A64 int64x1_t vreinterpret_s64_f16(float16x4_t a) a -> Vd.4H NOP Vd.1D -> result v7/A32/A64 float64x1_t vreinterpret_f64_f16(float16x4_t a) a -> Vd.4H NOP Vd.1D -> result A64 @@ -3507,6 +3653,7 @@ uint16x8_t vreinterpretq_u16_s8(int8x16_t a) a -> Vd.16B NOP Vd.8H -> result v7/ uint32x4_t vreinterpretq_u32_s8(int8x16_t a) a -> Vd.16B NOP Vd.4S -> result v7/A32/A64 poly8x16_t vreinterpretq_p8_s8(int8x16_t a) 
a -> Vd.16B NOP Vd.16B -> result v7/A32/A64 poly16x8_t vreinterpretq_p16_s8(int8x16_t a) a -> Vd.16B NOP Vd.8H -> result v7/A32/A64 +mfloat8x16_t vreinterpretq_mf8_s8(int8x16_t a) a -> Vd.16B NOP Vd.16B -> result A64 uint64x2_t vreinterpretq_u64_s8(int8x16_t a) a -> Vd.16B NOP Vd.2D -> result v7/A32/A64 int64x2_t vreinterpretq_s64_s8(int8x16_t a) a -> Vd.16B NOP Vd.2D -> result v7/A32/A64 float64x2_t vreinterpretq_f64_s8(int8x16_t a) a -> Vd.16B NOP Vd.2D -> result A64 @@ -3521,6 +3668,7 @@ uint16x8_t vreinterpretq_u16_s16(int16x8_t a) a -> Vd.8H NOP Vd.8H -> result v7/ uint32x4_t vreinterpretq_u32_s16(int16x8_t a) a -> Vd.8H NOP Vd.4S -> result v7/A32/A64 poly8x16_t vreinterpretq_p8_s16(int16x8_t a) a -> Vd.8H NOP Vd.16B -> result v7/A32/A64 poly16x8_t vreinterpretq_p16_s16(int16x8_t a) a -> Vd.8H NOP Vd.8H -> result v7/A32/A64 +mfloat8x16_t vreinterpretq_mf8_s16(int16x8_t a) a -> Vd.8H NOP Vd.16B -> result A64 uint64x2_t vreinterpretq_u64_s16(int16x8_t a) a -> Vd.8H NOP Vd.2D -> result v7/A32/A64 int64x2_t vreinterpretq_s64_s16(int16x8_t a) a -> Vd.8H NOP Vd.2D -> result v7/A32/A64 float64x2_t vreinterpretq_f64_s16(int16x8_t a) a -> Vd.8H NOP Vd.2D -> result A64 @@ -3535,6 +3683,7 @@ uint16x8_t vreinterpretq_u16_s32(int32x4_t a) a -> Vd.4S NOP Vd.8H -> result v7/ uint32x4_t vreinterpretq_u32_s32(int32x4_t a) a -> Vd.4S NOP Vd.4S -> result v7/A32/A64 poly8x16_t vreinterpretq_p8_s32(int32x4_t a) a -> Vd.4S NOP Vd.16B -> result v7/A32/A64 poly16x8_t vreinterpretq_p16_s32(int32x4_t a) a -> Vd.4S NOP Vd.8H -> result v7/A32/A64 +mfloat8x16_t vreinterpretq_mf8_s32(int32x4_t a) a -> Vd.4S NOP Vd.16B -> result A64 uint64x2_t vreinterpretq_u64_s32(int32x4_t a) a -> Vd.4S NOP Vd.2D -> result v7/A32/A64 int64x2_t vreinterpretq_s64_s32(int32x4_t a) a -> Vd.4S NOP Vd.2D -> result v7/A32/A64 float64x2_t vreinterpretq_f64_s32(int32x4_t a) a -> Vd.4S NOP Vd.2D -> result A64 @@ -3549,6 +3698,7 @@ uint16x8_t vreinterpretq_u16_f32(float32x4_t a) a -> Vd.4S NOP Vd.8H -> result v 
uint32x4_t vreinterpretq_u32_f32(float32x4_t a) a -> Vd.4S NOP Vd.4S -> result v7/A32/A64 poly8x16_t vreinterpretq_p8_f32(float32x4_t a) a -> Vd.4S NOP Vd.16B -> result v7/A32/A64 poly16x8_t vreinterpretq_p16_f32(float32x4_t a) a -> Vd.4S NOP Vd.8H -> result v7/A32/A64 +mfloat8x16_t vreinterpretq_mf8_f32(float32x4_t a) a -> Vd.4S NOP Vd.16B -> result A64 uint64x2_t vreinterpretq_u64_f32(float32x4_t a) a -> Vd.4S NOP Vd.2D -> result v7/A32/A64 int64x2_t vreinterpretq_s64_f32(float32x4_t a) a -> Vd.4S NOP Vd.2D -> result v7/A32/A64 float64x2_t vreinterpretq_f64_f32(float32x4_t a) a -> Vd.4S NOP Vd.2D -> result A64 @@ -3565,6 +3715,7 @@ uint16x8_t vreinterpretq_u16_u8(uint8x16_t a) a -> Vd.16B NOP Vd.8H -> result v7 uint32x4_t vreinterpretq_u32_u8(uint8x16_t a) a -> Vd.16B NOP Vd.4S -> result v7/A32/A64 poly8x16_t vreinterpretq_p8_u8(uint8x16_t a) a -> Vd.16B NOP Vd.16B -> result v7/A32/A64 poly16x8_t vreinterpretq_p16_u8(uint8x16_t a) a -> Vd.16B NOP Vd.8H -> result v7/A32/A64 +mfloat8x16_t vreinterpretq_mf8_u8(uint8x16_t a) a -> Vd.16B NOP Vd.16B -> result A64 uint64x2_t vreinterpretq_u64_u8(uint8x16_t a) a -> Vd.16B NOP Vd.2D -> result v7/A32/A64 int64x2_t vreinterpretq_s64_u8(uint8x16_t a) a -> Vd.16B NOP Vd.2D -> result v7/A32/A64 float64x2_t vreinterpretq_f64_u8(uint8x16_t a) a -> Vd.16B NOP Vd.2D -> result A64 @@ -3579,6 +3730,7 @@ uint8x16_t vreinterpretq_u8_u16(uint16x8_t a) a -> Vd.8H NOP Vd.16B -> result v7 uint32x4_t vreinterpretq_u32_u16(uint16x8_t a) a -> Vd.8H NOP Vd.4S -> result v7/A32/A64 poly8x16_t vreinterpretq_p8_u16(uint16x8_t a) a -> Vd.8H NOP Vd.16B -> result v7/A32/A64 poly16x8_t vreinterpretq_p16_u16(uint16x8_t a) a -> Vd.8H NOP Vd.8H -> result v7/A32/A64 +mfloat8x16_t vreinterpretq_mf8_u16(uint16x8_t a) a -> Vd.8H NOP Vd.16B -> result A64 uint64x2_t vreinterpretq_u64_u16(uint16x8_t a) a -> Vd.8H NOP Vd.2D -> result v7/A32/A64 int64x2_t vreinterpretq_s64_u16(uint16x8_t a) a -> Vd.8H NOP Vd.2D -> result v7/A32/A64 float64x2_t 
vreinterpretq_f64_u16(uint16x8_t a) a -> Vd.8H NOP Vd.2D -> result A64 @@ -3593,6 +3745,7 @@ uint8x16_t vreinterpretq_u8_u32(uint32x4_t a) a -> Vd.4S NOP Vd.16B -> result v7 uint16x8_t vreinterpretq_u16_u32(uint32x4_t a) a -> Vd.4S NOP Vd.8H -> result v7/A32/A64 poly8x16_t vreinterpretq_p8_u32(uint32x4_t a) a -> Vd.4S NOP Vd.16B -> result v7/A32/A64 poly16x8_t vreinterpretq_p16_u32(uint32x4_t a) a -> Vd.4S NOP Vd.8H -> result v7/A32/A64 +mfloat8x16_t vreinterpretq_mf8_u32(uint32x4_t a) a -> Vd.4S NOP Vd.16B -> result A64 uint64x2_t vreinterpretq_u64_u32(uint32x4_t a) a -> Vd.4S NOP Vd.2D -> result v7/A32/A64 int64x2_t vreinterpretq_s64_u32(uint32x4_t a) a -> Vd.4S NOP Vd.2D -> result v7/A32/A64 float64x2_t vreinterpretq_f64_u32(uint32x4_t a) a -> Vd.4S NOP Vd.2D -> result A64 @@ -3613,6 +3766,20 @@ float64x2_t vreinterpretq_f64_p8(poly8x16_t a) a -> Vd.16B NOP Vd.2D -> result A poly64x2_t vreinterpretq_p64_p8(poly8x16_t a) a -> Vd.16B NOP Vd.2D -> result A32/A64 poly128_t vreinterpretq_p128_p8(poly8x16_t a) a -> Vd.16B NOP Vd.1Q -> result A32/A64 float16x8_t vreinterpretq_f16_p8(poly8x16_t a) a -> Vd.16B NOP Vd.8H -> result v7/A32/A64 +int8x16_t vreinterpretq_s8_mf8(mfloat8x16_t a) a -> Vd.16B NOP Vd.16B -> result A64 +int16x8_t vreinterpretq_s16_mf8(mfloat8x16_t a) a -> Vd.16B NOP Vd.8H -> result A64 +int32x4_t vreinterpretq_s32_mf8(mfloat8x16_t a) a -> Vd.16B NOP Vd.4S -> result A64 +float32x4_t vreinterpretq_f32_mf8(mfloat8x16_t a) a -> Vd.16B NOP Vd.4S -> result A64 +uint8x16_t vreinterpretq_u8_mf8(mfloat8x16_t a) a -> Vd.16B NOP Vd.16B -> result A64 +uint16x8_t vreinterpretq_u16_mf8(mfloat8x16_t a) a -> Vd.16B NOP Vd.8H -> result A64 +uint32x4_t vreinterpretq_u32_mf8(mfloat8x16_t a) a -> Vd.16B NOP Vd.4S -> result A64 +poly16x8_t vreinterpretq_p16_mf8(mfloat8x16_t a) a -> Vd.16B NOP Vd.8H -> result A64 +uint64x2_t vreinterpretq_u64_mf8(mfloat8x16_t a) a -> Vd.16B NOP Vd.2D -> result A64 +int64x2_t vreinterpretq_s64_mf8(mfloat8x16_t a) a -> Vd.16B NOP Vd.2D -> 
result A64 +float64x2_t vreinterpretq_f64_mf8(mfloat8x16_t a) a -> Vd.16B NOP Vd.2D -> result A64 +poly64x2_t vreinterpretq_p64_mf8(mfloat8x16_t a) a -> Vd.16B NOP Vd.2D -> result A64 +poly128_t vreinterpretq_p128_mf8(mfloat8x16_t a) a -> Vd.16B NOP Vd.1Q -> result A64 +float16x8_t vreinterpretq_f16_mf8(mfloat8x16_t a) a -> Vd.16B NOP Vd.8H -> result A64 int8x16_t vreinterpretq_s8_p16(poly16x8_t a) a -> Vd.8H NOP Vd.16B -> result v7/A32/A64 int16x8_t vreinterpretq_s16_p16(poly16x8_t a) a -> Vd.8H NOP Vd.8H -> result v7/A32/A64 int32x4_t vreinterpretq_s32_p16(poly16x8_t a) a -> Vd.8H NOP Vd.4S -> result v7/A32/A64 @@ -3621,6 +3788,7 @@ uint8x16_t vreinterpretq_u8_p16(poly16x8_t a) a -> Vd.8H NOP Vd.16B -> result v7 uint16x8_t vreinterpretq_u16_p16(poly16x8_t a) a -> Vd.8H NOP Vd.8H -> result v7/A32/A64 uint32x4_t vreinterpretq_u32_p16(poly16x8_t a) a -> Vd.8H NOP Vd.4S -> result v7/A32/A64 poly8x16_t vreinterpretq_p8_p16(poly16x8_t a) a -> Vd.8H NOP Vd.16B -> result v7/A32/A64 +mfloat8x16_t vreinterpretq_mf8_p16(poly16x8_t a) a -> Vd.8H NOP Vd.16B -> result A64 uint64x2_t vreinterpretq_u64_p16(poly16x8_t a) a -> Vd.8H NOP Vd.2D -> result v7/A32/A64 int64x2_t vreinterpretq_s64_p16(poly16x8_t a) a -> Vd.8H NOP Vd.2D -> result v7/A32/A64 float64x2_t vreinterpretq_f64_p16(poly16x8_t a) a -> Vd.8H NOP Vd.2D -> result A64 @@ -3636,6 +3804,7 @@ uint16x8_t vreinterpretq_u16_u64(uint64x2_t a) a -> Vd.2D NOP Vd.8H -> result v7 uint32x4_t vreinterpretq_u32_u64(uint64x2_t a) a -> Vd.2D NOP Vd.4S -> result v7/A32/A64 poly8x16_t vreinterpretq_p8_u64(uint64x2_t a) a -> Vd.2D NOP Vd.16B -> result v7/A32/A64 poly16x8_t vreinterpretq_p16_u64(uint64x2_t a) a -> Vd.2D NOP Vd.8H -> result v7/A32/A64 +mfloat8x16_t vreinterpretq_mf8_u64(uint64x2_t a) a -> Vd.2D NOP Vd.16B -> result A64 int64x2_t vreinterpretq_s64_u64(uint64x2_t a) a -> Vd.2D NOP Vd.2D -> result v7/A32/A64 float64x2_t vreinterpretq_f64_u64(uint64x2_t a) a -> Vd.2D NOP Vd.2D -> result v7/A32/A64 float64x2_t 
vreinterpretq_f64_s64(int64x2_t a) a -> Vd.2D NOP Vd.2D -> result A64 @@ -3653,6 +3822,7 @@ uint16x8_t vreinterpretq_u16_s64(int64x2_t a) a -> Vd.2D NOP Vd.8H -> result v7/ uint32x4_t vreinterpretq_u32_s64(int64x2_t a) a -> Vd.2D NOP Vd.4S -> result v7/A32/A64 poly8x16_t vreinterpretq_p8_s64(int64x2_t a) a -> Vd.2D NOP Vd.16B -> result v7/A32/A64 poly16x8_t vreinterpretq_p16_s64(int64x2_t a) a -> Vd.2D NOP Vd.8H -> result v7/A32/A64 +mfloat8x16_t vreinterpretq_mf8_s64(int64x2_t a) a -> Vd.2D NOP Vd.16B -> result A64 uint64x2_t vreinterpretq_u64_s64(int64x2_t a) a -> Vd.2D NOP Vd.2D -> result v7/A32/A64 uint64x2_t vreinterpretq_u64_p64(poly64x2_t a) a -> Vd.2D NOP Vd.2D -> result A32/A64 float16x8_t vreinterpretq_f16_s64(int64x2_t a) a -> Vd.2D NOP Vd.8H -> result v7/A32/A64 @@ -3665,6 +3835,7 @@ uint16x8_t vreinterpretq_u16_f16(float16x8_t a) a -> Vd.8H NOP Vd.8H -> result v uint32x4_t vreinterpretq_u32_f16(float16x8_t a) a -> Vd.8H NOP Vd.4S -> result v7/A32/A64 poly8x16_t vreinterpretq_p8_f16(float16x8_t a) a -> Vd.8H NOP Vd.16B -> result v7/A32/A64 poly16x8_t vreinterpretq_p16_f16(float16x8_t a) a -> Vd.8H NOP Vd.8H -> result v7/A32/A64 +mfloat8x16_t vreinterpretq_mf8_f16(float16x8_t a) a -> Vd.8H NOP Vd.16B -> result A64 uint64x2_t vreinterpretq_u64_f16(float16x8_t a) a -> Vd.8H NOP Vd.2D -> result v7/A32/A64 int64x2_t vreinterpretq_s64_f16(float16x8_t a) a -> Vd.8H NOP Vd.2D -> result v7/A32/A64 float64x2_t vreinterpretq_f64_f16(float16x8_t a) a -> Vd.8H NOP Vd.2D -> result A64 @@ -3678,6 +3849,7 @@ uint16x4_t vreinterpret_u16_f64(float64x1_t a) a -> Vd.1D NOP Vd.4H -> result A6 uint32x2_t vreinterpret_u32_f64(float64x1_t a) a -> Vd.1D NOP Vd.2S -> result A64 poly8x8_t vreinterpret_p8_f64(float64x1_t a) a -> Vd.1D NOP Vd.8B -> result A64 poly16x4_t vreinterpret_p16_f64(float64x1_t a) a -> Vd.1D NOP Vd.4H -> result A64 +mfloat8x8_t vreinterpret_mf8_f64(float64x1_t a) a -> Vd.1D NOP Vd.8B -> result A64 uint64x1_t vreinterpret_u64_f64(float64x1_t a) a -> Vd.1D 
NOP Vd.1D -> result A64 int64x1_t vreinterpret_s64_f64(float64x1_t a) a -> Vd.1D NOP Vd.1D -> result A64 float16x4_t vreinterpret_f16_f64(float64x1_t a) a -> Vd.1D NOP Vd.4H -> result A64 @@ -3690,6 +3862,7 @@ uint16x8_t vreinterpretq_u16_f64(float64x2_t a) a -> Vd.2D NOP Vd.8H -> result A uint32x4_t vreinterpretq_u32_f64(float64x2_t a) a -> Vd.2D NOP Vd.4S -> result A64 poly8x16_t vreinterpretq_p8_f64(float64x2_t a) a -> Vd.2D NOP Vd.16B -> result A64 poly16x8_t vreinterpretq_p16_f64(float64x2_t a) a -> Vd.2D NOP Vd.8H -> result A64 +mfloat8x16_t vreinterpretq_mf8_f64(float64x2_t a) a -> Vd.2D NOP Vd.16B -> result A64 uint64x2_t vreinterpretq_u64_f64(float64x2_t a) a -> Vd.2D NOP Vd.2D -> result A64 int64x2_t vreinterpretq_s64_f64(float64x2_t a) a -> Vd.2D NOP Vd.2D -> result A64 float16x8_t vreinterpretq_f16_f64(float64x2_t a) a -> Vd.2D NOP Vd.8H -> result A64 @@ -3702,6 +3875,7 @@ uint16x4_t vreinterpret_u16_p64(poly64x1_t a) a -> Vd.1D NOP Vd.4H -> result A32 uint32x2_t vreinterpret_u32_p64(poly64x1_t a) a -> Vd.1D NOP Vd.2S -> result A32/A64 poly8x8_t vreinterpret_p8_p64(poly64x1_t a) a -> Vd.1D NOP Vd.8B -> result A32/A64 poly16x4_t vreinterpret_p16_p64(poly64x1_t a) a -> Vd.1D NOP Vd.4H -> result A32/A64 +mfloat8x8_t vreinterpret_mf8_p64(poly64x1_t a) a -> Vd.1D NOP Vd.8B -> result A64 int64x1_t vreinterpret_s64_p64(poly64x1_t a) a -> Vd.1D NOP Vd.1D -> result A32/A64 float64x1_t vreinterpret_f64_p64(poly64x1_t a) a -> Vd.1D NOP Vd.1D -> result A64 float16x4_t vreinterpret_f16_p64(poly64x1_t a) a -> Vd.1D NOP Vd.4H -> result A32/A64 @@ -3713,6 +3887,7 @@ uint16x8_t vreinterpretq_u16_p64(poly64x2_t a) a -> Vd.2D NOP Vd.8H -> result A3 uint32x4_t vreinterpretq_u32_p64(poly64x2_t a) a -> Vd.2D NOP Vd.4S -> result A32/A64 poly8x16_t vreinterpretq_p8_p64(poly64x2_t a) a -> Vd.2D NOP Vd.16B -> result A32/A64 poly16x8_t vreinterpretq_p16_p64(poly64x2_t a) a -> Vd.2D NOP Vd.8H -> result A32/A64 +mfloat8x16_t vreinterpretq_mf8_p64(poly64x2_t a) a -> Vd.2D NOP
Vd.16B -> result A64 int64x2_t vreinterpretq_s64_p64(poly64x2_t a) a -> Vd.2D NOP Vd.2D -> result A32/A64 float64x2_t vreinterpretq_f64_p64(poly64x2_t a) a -> Vd.2D NOP Vd.2D -> result A64 float16x8_t vreinterpretq_f16_p64(poly64x2_t a) a -> Vd.2D NOP Vd.8H -> result A32/A64 @@ -3724,10 +3899,15 @@ uint16x8_t vreinterpretq_u16_p128(poly128_t a) a -> Vd.1Q NOP Vd.8H -> result A3 uint32x4_t vreinterpretq_u32_p128(poly128_t a) a -> Vd.1Q NOP Vd.4S -> result A32/A64 poly8x16_t vreinterpretq_p8_p128(poly128_t a) a -> Vd.1Q NOP Vd.16B -> result A32/A64 poly16x8_t vreinterpretq_p16_p128(poly128_t a) a -> Vd.1Q NOP Vd.8H -> result A32/A64 +mfloat8x16_t vreinterpretq_mf8_p128(poly128_t a) a -> Vd.1Q NOP Vd.16B -> result A64 uint64x2_t vreinterpretq_u64_p128(poly128_t a) a -> Vd.1Q NOP Vd.2D -> result A32/A64 int64x2_t vreinterpretq_s64_p128(poly128_t a) a -> Vd.1Q NOP Vd.2D -> result A32/A64 float64x2_t vreinterpretq_f64_p128(poly128_t a) a -> Vd.1Q NOP Vd.2D -> result A64 float16x8_t vreinterpretq_f16_p128(poly128_t a) a -> Vd.1Q NOP Vd.8H -> result A32/A64 +mfloat8x8_t vreinterpret_mf8_u8(uint8x8_t a) a -> Vd.8B NOP Vd.8B -> result A64 +mfloat8x16_t vreinterpretq_mf8_u8(uint8x16_t a) a -> Vd.16B NOP Vd.16B -> result A64 +uint8x8_t vreinterpret_u8_mf8(mfloat8x8_t a) a -> Vd.8B NOP Vd.8B -> result A64 +uint8x16_t vreinterpretq_u8_mf8(mfloat8x16_t a) a -> Vd.16B NOP Vd.16B -> result A64 poly128_t vldrq_p128(poly128_t const *ptr) ptr -> Xn LDR Qd,[Xn] Qd -> result A32/A64 void vstrq_p128(poly128_t *ptr, poly128_t val) val -> Qt;ptr -> Xn STR Qt,[Xn] A32/A64 @@ -4470,6 +4650,7 @@ bfloat16x4_t vreinterpret_bf16_u16(uint16x4_t a) a -> Vd.4H NOP Vd.4H -> result bfloat16x4_t vreinterpret_bf16_u32(uint32x2_t a) a -> Vd.2S NOP Vd.4H -> result A32/A64 bfloat16x4_t vreinterpret_bf16_p8(poly8x8_t a) a -> Vd.8B NOP Vd.4H -> result A32/A64 bfloat16x4_t vreinterpret_bf16_p16(poly16x4_t a) a -> Vd.4H NOP Vd.4H -> result A32/A64 +bfloat16x4_t vreinterpret_bf16_mf8(mfloat8x8_t a) a -> Vd.8B 
NOP Vd.4H -> result A64 bfloat16x4_t vreinterpret_bf16_u64(uint64x1_t a) a -> Vd.1D NOP Vd.4H -> result A32/A64 bfloat16x4_t vreinterpret_bf16_s64(int64x1_t a) a -> Vd.1D NOP Vd.4H -> result A32/A64 bfloat16x8_t vreinterpretq_bf16_s8(int8x16_t a) a -> Vd.16B NOP Vd.8H -> result A32/A64 @@ -4481,6 +4662,7 @@ bfloat16x8_t vreinterpretq_bf16_u16(uint16x8_t a) a -> Vd.8H NOP Vd.8H -> result bfloat16x8_t vreinterpretq_bf16_u32(uint32x4_t a) a -> Vd.4S NOP Vd.8H -> result A32/A64 bfloat16x8_t vreinterpretq_bf16_p8(poly8x16_t a) a -> Vd.16B NOP Vd.8H -> result A32/A64 bfloat16x8_t vreinterpretq_bf16_p16(poly16x8_t a) a -> Vd.8H NOP Vd.8H -> result A32/A64 +bfloat16x8_t vreinterpretq_bf16_mf8(mfloat8x16_t a) a -> Vd.16B NOP Vd.8H -> result A64 bfloat16x8_t vreinterpretq_bf16_u64(uint64x2_t a) a -> Vd.2D NOP Vd.8H -> result A32/A64 bfloat16x8_t vreinterpretq_bf16_s64(int64x2_t a) a -> Vd.2D NOP Vd.8H -> result A32/A64 bfloat16x4_t vreinterpret_bf16_f64(float64x1_t a) a -> Vd.1D NOP Vd.4H -> result A64 @@ -4498,6 +4680,7 @@ uint16x4_t vreinterpret_u16_bf16(bfloat16x4_t a) a -> Vd.4H NOP Vd.4H -> result uint32x2_t vreinterpret_u32_bf16(bfloat16x4_t a) a -> Vd.4H NOP Vd.2S -> result A32/A64 poly8x8_t vreinterpret_p8_bf16(bfloat16x4_t a) a -> Vd.4H NOP Vd.8B -> result A32/A64 poly16x4_t vreinterpret_p16_bf16(bfloat16x4_t a) a -> Vd.4H NOP Vd.4H -> result A32/A64 +mfloat8x8_t vreinterpret_mf8_bf16(bfloat16x4_t a) a -> Vd.4H NOP Vd.8B -> result A64 uint64x1_t vreinterpret_u64_bf16(bfloat16x4_t a) a -> Vd.4H NOP Vd.1D -> result A32/A64 int64x1_t vreinterpret_s64_bf16(bfloat16x4_t a) a -> Vd.4H NOP Vd.1D -> result A32/A64 float64x1_t vreinterpret_f64_bf16(bfloat16x4_t a) a -> Vd.4H NOP Vd.1D -> result A64 @@ -4511,6 +4694,7 @@ uint16x8_t vreinterpretq_u16_bf16(bfloat16x8_t a) a -> Vd.8H NOP Vd.8H -> result uint32x4_t vreinterpretq_u32_bf16(bfloat16x8_t a) a -> Vd.8H NOP Vd.4S -> result A32/A64 poly8x16_t vreinterpretq_p8_bf16(bfloat16x8_t a) a -> Vd.8H NOP Vd.16B -> result A32/A64 
poly16x8_t vreinterpretq_p16_bf16(bfloat16x8_t a) a -> Vd.8H NOP Vd.8H -> result A32/A64 +mfloat8x16_t vreinterpretq_mf8_bf16(bfloat16x8_t a) a -> Vd.8H NOP Vd.16B -> result A64 uint64x2_t vreinterpretq_u64_bf16(bfloat16x8_t a) a -> Vd.8H NOP Vd.2D -> result A32/A64 int64x2_t vreinterpretq_s64_bf16(bfloat16x8_t a) a -> Vd.8H NOP Vd.2D -> result A32/A64 float64x2_t vreinterpretq_f64_bf16(bfloat16x8_t a) a -> Vd.8H NOP Vd.2D -> result A64 @@ -4547,4 +4731,70 @@ float32x4_t vbfmlaltq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) r -> Vd float32x4_t vbfmlalbq_lane_f32(float32x4_t r, bfloat16x8_t a, bfloat16x4_t b, __builtin_constant_p(lane)) r -> Vd.4S;a -> Vn.8H;b -> Vm.4H;0 <= lane <= 3 BFMLALB Vd.4S,Vn.8H,Vm.H[lane] Vd.4S -> result A32/A64 float32x4_t vbfmlalbq_laneq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b, __builtin_constant_p(lane)) r -> Vd.4S;a -> Vn.8H;b -> Vm.8H;0 <= lane <= 7 BFMLALB Vd.4S,Vn.8H,Vm.H[lane] Vd.4S -> result A32/A64 float32x4_t vbfmlaltq_lane_f32(float32x4_t r, bfloat16x8_t a, bfloat16x4_t b, __builtin_constant_p(lane)) r -> Vd.4S;a -> Vn.8H;b -> Vm.4H;0 <= lane <= 3 BFMLALT Vd.4S,Vn.8H,Vm.H[lane] Vd.4S -> result A32/A64 -float32x4_t vbfmlaltq_laneq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b, __builtin_constant_p(lane)) r -> Vd.4S;a -> Vn.8H;b -> Vm.8H;0 <= lane <= 7 BFMLALT Vd.4S,Vn.8H,Vm.H[lane] Vd.4S -> result A32/A64 \ No newline at end of file +float32x4_t vbfmlaltq_laneq_f32(float32x4_t r, bfloat16x8_t a, bfloat16x8_t b, __builtin_constant_p(lane)) r -> Vd.4S;a -> Vn.8H;b -> Vm.8H;0 <= lane <= 7 BFMLALT Vd.4S,Vn.8H,Vm.H[lane] Vd.4S -> result A32/A64 +
Modal 8-bit floating-point intrinsics +bfloat16x8_t vcvt1_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) vn -> Vn.8B BF1CVTL Vd.8H,Vn.8B Vd.8H -> result A64 +bfloat16x8_t vcvt1_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) vn -> Vn.8B BF1CVTL Vd.8H,Vn.8B Vd.8H -> result A64 +bfloat16x8_t vcvt2_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) vn -> Vn.8B BF2CVTL Vd.8H,Vn.8B Vd.8H -> result A64 +bfloat16x8_t vcvt2_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) vn -> Vn.8B BF2CVTL Vd.8H,Vn.8B Vd.8H -> result A64 + +bfloat16x8_t vcvt1_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) vn -> Vn.16B BF1CVTL2 Vd.8H,Vn.16B Vd.8H -> result A64 +bfloat16x8_t vcvt2_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) vn -> Vn.16B BF2CVTL2 Vd.8H,Vn.16B Vd.8H -> result A64 + +float16x8_t vcvt1_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) vn -> Vn.8B F1CVTL Vd.8H,Vn.8B Vd.8H -> result A64 +float16x8_t vcvt1_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) vn -> Vn.8B F1CVTL Vd.8H,Vn.8B Vd.8H -> result A64 +float16x8_t vcvt2_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) vn -> Vn.8B F2CVTL Vd.8H,Vn.8B Vd.8H -> result A64 +float16x8_t vcvt2_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) vn -> Vn.8B F2CVTL Vd.8H,Vn.8B Vd.8H -> result A64 + +float16x8_t vcvt1_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) vn -> Vn.16B F1CVTL2 Vd.8H,Vn.16B Vd.8H -> result A64 +float16x8_t vcvt2_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) vn -> Vn.16B F2CVTL2 Vd.8H,Vn.16B Vd.8H -> result A64 + +mfloat8x8_t vcvt_mf8_f32_fpm(float32x4_t vn, float32x4_t vm, fpm_t fpm) vn -> Vn.4S;vm -> Vm.4S FCVTN Vd.8B, Vn.4S, Vm.4S Vd.8B -> result A64 +mfloat8x16_t vcvt_high_mf8_f32_fpm(mfloat8x8_t vd, float32x4_t vn, float32x4_t vm, fpm_t fpm) vd -> Vd.8B;vn -> Vn.4S;vm -> Vm.4S FCVTN2 Vd.16B, Vn.4S, Vm.4S Vd.16B -> result A64 + +mfloat8x8_t vcvt_mf8_f16_fpm(float16x4_t vn, float16x4_t vm, fpm_t fpm) vn -> Vn.4H;vm -> Vm.4H FCVTN Vd.8B, Vn.4H, Vm.4H Vd.8B -> result A64 +mfloat8x16_t vcvtq_mf8_f16_fpm(float16x8_t vn, float16x8_t vm, fpm_t fpm) vn -> Vn.8H;vm -> Vm.8H FCVTN Vd.16B,
Vn.8H, Vm.8H Vd.16B -> result A64 + +float16x4_t vscale_f16(float16x4_t vn, int16x4_t vm) vn -> Vn.4H;vm -> Vm.4H FSCALE Vd.4H, Vn.4H, Vm.4H Vd.4H -> result A64 +float16x8_t vscaleq_f16(float16x8_t vn, int16x8_t vm) vn -> Vn.8H;vm -> Vm.8H FSCALE Vd.8H, Vn.8H, Vm.8H Vd.8H -> result A64 +float32x2_t vscale_f32(float32x2_t vn, int32x2_t vm) vn -> Vn.2S;vm -> Vm.2S FSCALE Vd.2S, Vn.2S, Vm.2S Vd.2S -> result A64 +float32x4_t vscaleq_f32(float32x4_t vn, int32x4_t vm) vn -> Vn.4S;vm -> Vm.4S FSCALE Vd.4S, Vn.4S, Vm.4S Vd.4S -> result A64 +float64x2_t vscaleq_f64(float64x2_t vn, int64x2_t vm) vn -> Vn.2D;vm -> Vm.2D FSCALE Vd.2D, Vn.2D, Vm.2D Vd.2D -> result A64 + +float32x2_t vdot_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x8_t vm, fpm_t fpm) vd -> Vd.2S;vn -> Vn.8B;vm -> Vm.8B FDOT Vd.2S, Vn.8B, Vm.8B Vd.2S -> result A64 +float32x4_t vdotq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm) vd -> Vd.4S;vn -> Vn.16B;vm -> Vm.16B FDOT Vd.4S, Vn.16B, Vm.16B Vd.4S -> result A64 + +float32x2_t vdot_lane_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.2S;vn -> Vn.8B;vm -> Vm.4B;0 <= lane <= 1 FDOT Vd.2S, Vn.8B, Vm.4B[lane] Vd.2S -> result A64 +float32x2_t vdot_laneq_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.2S;vn -> Vn.8B;vm -> Vm.4B;0 <= lane <= 3 FDOT Vd.2S, Vn.8B, Vm.4B[lane] Vd.2S -> result A64 +float32x4_t vdotq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.4S;vn -> Vn.16B;vm -> Vm.4B;0 <= lane <= 1 FDOT Vd.4S, Vn.16B, Vm.4B[lane] Vd.4S -> result A64 +float32x4_t vdotq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.4S;vn -> Vn.16B;vm -> Vm.4B;0 <= lane <= 3 FDOT Vd.4S, Vn.16B, Vm.4B[lane] Vd.4S -> result A64 + +float16x4_t vdot_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t
vm, fpm_t fpm) vd -> Vd.4H;vn -> Vn.8B;vm -> Vm.8B FDOT Vd.4H, Vn.8B, Vm.8B Vd.4H -> result A64 +float16x8_t vdotq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm) vd -> Vd.8H;vn -> Vn.16B;vm -> Vm.16B FDOT Vd.8H, Vn.16B, Vm.16B Vd.8H -> result A64 + +float16x4_t vdot_lane_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.4H;vn -> Vn.8B;vm -> Vm.2B;0 <= lane <= 3 FDOT Vd.4H, Vn.8B, Vm.2B[lane] Vd.4H -> result A64 +float16x4_t vdot_laneq_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.4H;vn -> Vn.8B;vm -> Vm.2B;0 <= lane <= 7 FDOT Vd.4H, Vn.8B, Vm.2B[lane] Vd.4H -> result A64 +float16x8_t vdotq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.8H;vn -> Vn.16B;vm -> Vm.2B;0 <= lane <= 3 FDOT Vd.8H, Vn.16B, Vm.2B[lane] Vd.8H -> result A64 +float16x8_t vdotq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.8H;vn -> Vn.16B;vm -> Vm.2B;0 <= lane <= 7 FDOT Vd.8H, Vn.16B, Vm.2B[lane] Vd.8H -> result A64 + +float16x8_t vmlalbq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm) vd -> Vd.8H;vn -> Vn.16B;vm -> Vm.16B FMLALB Vd.8H, Vn.16B, Vm.16B Vd.8H -> result A64 +float16x8_t vmlaltq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm) vd -> Vd.8H;vn -> Vn.16B;vm -> Vm.16B FMLALT Vd.8H, Vn.16B, Vm.16B Vd.8H -> result A64 + +float16x8_t vmlalbq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.8H;vn -> Vn.16B;vm -> Vm.B;0 <= lane <= 7 FMLALB Vd.8H, Vn.16B, Vm.B[lane] Vd.8H -> result A64 +float16x8_t vmlalbq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.8H;vn -> Vn.16B;vm -> Vm.B;0 <= lane <= 15 FMLALB Vd.8H, Vn.16B, Vm.B[lane] Vd.8H 
-> result A64 +float16x8_t vmlaltq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.8H;vn -> Vn.16B;vm -> Vm.B;0 <= lane <= 7 FMLALT Vd.8H, Vn.16B, Vm.B[lane] Vd.8H -> result A64 +float16x8_t vmlaltq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.8H;vn -> Vn.16B;vm -> Vm.B;0 <= lane <= 15 FMLALT Vd.8H, Vn.16B, Vm.B[lane] Vd.8H -> result A64 + +float32x4_t vmlallbbq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm) vd -> Vd.4S;vn -> Vn.16B;vm -> Vm.16B FMLALLBB Vd.4S, Vn.16B, Vm.16B Vd.4S -> result A64 +float32x4_t vmlallbtq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm) vd -> Vd.4S;vn -> Vn.16B;vm -> Vm.16B FMLALLBT Vd.4S, Vn.16B, Vm.16B Vd.4S -> result A64 +float32x4_t vmlalltbq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm) vd -> Vd.4S;vn -> Vn.16B;vm -> Vm.16B FMLALLTB Vd.4S, Vn.16B, Vm.16B Vd.4S -> result A64 +float32x4_t vmlallttq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm) vd -> Vd.4S;vn -> Vn.16B;vm -> Vm.16B FMLALLTT Vd.4S, Vn.16B, Vm.16B Vd.4S -> result A64 + +float32x4_t vmlallbbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.4S;vn -> Vn.16B; vm -> Vm.B; 0 <= lane <= 7 FMLALLBB Vd.4S, Vn.16B, Vm.B[lane] Vd.4S -> result A64 +float32x4_t vmlallbbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.4S;vn -> Vn.16B; vm -> Vm.B; 0 <= lane <= 15 FMLALLBB Vd.4S, Vn.16B, Vm.B[lane] Vd.4S -> result A64 +float32x4_t vmlallbtq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.4S;vn -> Vn.16B; vm -> Vm.B; 0 <= lane <= 7 FMLALLBT Vd.4S, Vn.16B, Vm.B[lane] Vd.4S -> result A64 +float32x4_t vmlallbtq_laneq_f32_mf8_fpm(float32x4_t vd,
mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.4S;vn -> Vn.16B; vm -> Vm.B; 0 <= lane <= 15 FMLALLBT Vd.4S, Vn.16B, Vm.B[lane] Vd.4S -> result A64 +float32x4_t vmlalltbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.4S;vn -> Vn.16B; vm -> Vm.B; 0 <= lane <= 7 FMLALLTB Vd.4S, Vn.16B, Vm.B[lane] Vd.4S -> result A64 +float32x4_t vmlalltbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.4S;vn -> Vn.16B; vm -> Vm.B; 0 <= lane <= 15 FMLALLTB Vd.4S, Vn.16B, Vm.B[lane] Vd.4S -> result A64 +float32x4_t vmlallttq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.4S;vn -> Vn.16B; vm -> Vm.B; 0 <= lane <= 7 FMLALLTT Vd.4S, Vn.16B, Vm.B[lane] Vd.4S -> result A64 +float32x4_t vmlallttq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.4S;vn -> Vn.16B; vm -> Vm.B; 0 <= lane <= 15 FMLALLTT Vd.4S, Vn.16B, Vm.B[lane] Vd.4S -> result A64 diff --git a/tools/intrinsic_db/advsimd_classification.csv index 5a22e518..ddfe70ed 100644 --- a/tools/intrinsic_db/advsimd_classification.csv +++ b/tools/intrinsic_db/advsimd_classification.csv @@ -1843,6 +1843,8 @@ vcopy_lane_p8 Vector manipulation|Copy vector lane vcopyq_lane_p8 Vector manipulation|Copy vector lane vcopy_lane_p16 Vector manipulation|Copy vector lane vcopyq_lane_p16 Vector manipulation|Copy vector lane +vcopy_lane_mf8 Vector manipulation|Copy vector lane +vcopyq_lane_mf8 Vector manipulation|Copy vector lane vcopy_laneq_s8 Vector manipulation|Copy vector lane vcopyq_laneq_s8 Vector manipulation|Copy vector lane vcopy_laneq_s16 Vector manipulation|Copy vector lane @@ -1869,6 +1871,8 @@ vcopy_laneq_p8 Vector manipulation|Copy vector lane vcopyq_laneq_p8 Vector manipulation|Copy vector lane
vcopy_laneq_p16 Vector manipulation|Copy vector lane vcopyq_laneq_p16 Vector manipulation|Copy vector lane +vcopy_laneq_mf8 Vector manipulation|Copy vector lane +vcopyq_laneq_mf8 Vector manipulation|Copy vector lane vrbit_s8 Vector manipulation|Reverse bits within elements vrbitq_s8 Vector manipulation|Reverse bits within elements vrbit_u8 Vector manipulation|Reverse bits within elements @@ -1889,6 +1893,7 @@ vcreate_f32 Vector manipulation|Create vector vcreate_p8 Vector manipulation|Create vector vcreate_p16 Vector manipulation|Create vector vcreate_f64 Vector manipulation|Create vector +vcreate_mf8 Vector manipulation|Create vector vdup_n_s8 Vector manipulation|Set all lanes to the same value vdupq_n_s8 Vector manipulation|Set all lanes to the same value vdup_n_s16 Vector manipulation|Set all lanes to the same value @@ -1915,6 +1920,8 @@ vdup_n_p16 Vector manipulation|Set all lanes to the same value vdupq_n_p16 Vector manipulation|Set all lanes to the same value vdup_n_f64 Vector manipulation|Set all lanes to the same value vdupq_n_f64 Vector manipulation|Set all lanes to the same value +vdup_n_mf8 Vector manipulation|Set all lanes to the same value +vdupq_n_mf8 Vector manipulation|Set all lanes to the same value vmov_n_s8 Vector manipulation|Set all lanes to the same value vmovq_n_s8 Vector manipulation|Set all lanes to the same value vmov_n_s16 Vector manipulation|Set all lanes to the same value @@ -1939,6 +1946,8 @@ vmov_n_p16 Vector manipulation|Set all lanes to the same value vmovq_n_p16 Vector manipulation|Set all lanes to the same value vmov_n_f64 Vector manipulation|Set all lanes to the same value vmovq_n_f64 Vector manipulation|Set all lanes to the same value +vmov_n_mf8 Vector manipulation|Set all lanes to the same value +vmovq_n_mf8 Vector manipulation|Set all lanes to the same value vdup_lane_s8 Vector manipulation|Set all lanes to the same value vdupq_lane_s8 Vector manipulation|Set all lanes to the same value vdup_lane_s16 Vector manipulation|Set 
all lanes to the same value @@ -1965,6 +1974,8 @@ vdup_lane_p16 Vector manipulation|Set all lanes to the same value vdupq_lane_p16 Vector manipulation|Set all lanes to the same value vdup_lane_f64 Vector manipulation|Set all lanes to the same value vdupq_lane_f64 Vector manipulation|Set all lanes to the same value +vdup_lane_mf8 Vector manipulation|Set all lanes to the same value +vdupq_lane_mf8 Vector manipulation|Set all lanes to the same value vdup_laneq_s8 Vector manipulation|Set all lanes to the same value vdupq_laneq_s8 Vector manipulation|Set all lanes to the same value vdup_laneq_s16 Vector manipulation|Set all lanes to the same value @@ -1991,6 +2002,8 @@ vdup_laneq_p16 Vector manipulation|Set all lanes to the same value vdupq_laneq_p16 Vector manipulation|Set all lanes to the same value vdup_laneq_f64 Vector manipulation|Set all lanes to the same value vdupq_laneq_f64 Vector manipulation|Set all lanes to the same value +vdup_laneq_mf8 Vector manipulation|Set all lanes to the same value +vdupq_laneq_mf8 Vector manipulation|Set all lanes to the same value vcombine_s8 Vector manipulation|Combine vectors vcombine_s16 Vector manipulation|Combine vectors vcombine_s32 Vector manipulation|Combine vectors @@ -2005,6 +2018,7 @@ vcombine_f32 Vector manipulation|Combine vectors vcombine_p8 Vector manipulation|Combine vectors vcombine_p16 Vector manipulation|Combine vectors vcombine_f64 Vector manipulation|Combine vectors +vcombine_mf8 Vector manipulation|Combine vectors vget_high_s8 Vector manipulation|Split vectors vget_high_s16 Vector manipulation|Split vectors vget_high_s32 Vector manipulation|Split vectors @@ -2019,6 +2033,7 @@ vget_high_f32 Vector manipulation|Split vectors vget_high_p8 Vector manipulation|Split vectors vget_high_p16 Vector manipulation|Split vectors vget_high_f64 Vector manipulation|Split vectors +vget_high_mf8 Vector manipulation|Split vectors vget_low_s8 Vector manipulation|Split vectors vget_low_s16 Vector manipulation|Split vectors 
vget_low_s32 Vector manipulation|Split vectors @@ -2033,6 +2048,7 @@ vget_low_f32 Vector manipulation|Split vectors vget_low_p8 Vector manipulation|Split vectors vget_low_p16 Vector manipulation|Split vectors vget_low_f64 Vector manipulation|Split vectors +vget_low_mf8 Vector manipulation|Split vectors vdupb_lane_s8 Vector manipulation|Extract one element from vector vduph_lane_s16 Vector manipulation|Extract one element from vector vdups_lane_s32 Vector manipulation|Extract one element from vector @@ -2045,6 +2061,7 @@ vdups_lane_f32 Vector manipulation|Extract one element from vector vdupd_lane_f64 Vector manipulation|Extract one element from vector vdupb_lane_p8 Vector manipulation|Extract one element from vector vduph_lane_p16 Vector manipulation|Extract one element from vector +vdupb_lane_mf8 Vector manipulation|Extract one element from vector vdupb_laneq_s8 Vector manipulation|Extract one element from vector vduph_laneq_s16 Vector manipulation|Extract one element from vector vdups_laneq_s32 Vector manipulation|Extract one element from vector @@ -2057,6 +2074,7 @@ vdups_laneq_f32 Vector manipulation|Extract one element from vector vdupd_laneq_f64 Vector manipulation|Extract one element from vector vdupb_laneq_p8 Vector manipulation|Extract one element from vector vduph_laneq_p16 Vector manipulation|Extract one element from vector +vdupb_laneq_mf8 Vector manipulation|Extract one element from vector vld1_s8 Load|Stride vld1q_s8 Load|Stride vld1_s16 Load|Stride @@ -2085,6 +2103,8 @@ vld1_p16 Load|Stride vld1q_p16 Load|Stride vld1_f64 Load|Stride vld1q_f64 Load|Stride +vld1_mf8 Load|Stride +vld1q_mf8 Load|Stride vld1_lane_s8 Load|Stride vld1q_lane_s8 Load|Stride vld1_lane_s16 Load|Stride @@ -2113,6 +2133,8 @@ vld1_lane_p16 Load|Stride vld1q_lane_p16 Load|Stride vld1_lane_f64 Load|Stride vld1q_lane_f64 Load|Stride +vld1_lane_mf8 Load|Stride +vld1q_lane_mf8 Load|Stride vldap1q_lane_u64 Load|Stride vldap1q_lane_s64 Load|Stride vldap1q_lane_f64 Load|Stride @@ -2157,6 
+2179,8 @@ vld1_dup_p16 Load|Stride vld1q_dup_p16 Load|Stride vld1_dup_f64 Load|Stride vld1q_dup_f64 Load|Stride +vld1_dup_mf8 Load|Stride +vld1q_dup_mf8 Load|Stride vst1_s8 Store|Stride vst1q_s8 Store|Stride vst1_s16 Store|Stride @@ -2185,6 +2209,8 @@ vst1_p16 Store|Stride vst1q_p16 Store|Stride vst1_f64 Store|Stride vst1q_f64 Store|Stride +vst1_mf8 Store|Stride +vst1q_mf8 Store|Stride vst1_lane_s8 Store|Stride vst1q_lane_s8 Store|Stride vst1_lane_s16 Store|Stride @@ -2213,6 +2239,8 @@ vst1_lane_p16 Store|Stride vst1q_lane_p16 Store|Stride vst1_lane_f64 Store|Stride vst1q_lane_f64 Store|Stride +vst1_lane_mf8 Store|Stride +vst1q_lane_mf8 Store|Stride vld2_s8 Load|Stride vld2q_s8 Load|Stride vld2_s16 Load|Stride @@ -2241,6 +2269,8 @@ vld2q_u64 Load|Stride vld2q_p64 Load|Stride vld2_f64 Load|Stride vld2q_f64 Load|Stride +vld2_mf8 Load|Stride +vld2q_mf8 Load|Stride vld3_s8 Load|Stride vld3q_s8 Load|Stride vld3_s16 Load|Stride @@ -2269,6 +2299,8 @@ vld3q_u64 Load|Stride vld3q_p64 Load|Stride vld3_f64 Load|Stride vld3q_f64 Load|Stride +vld3_mf8 Load|Stride +vld3q_mf8 Load|Stride vld4_s8 Load|Stride vld4q_s8 Load|Stride vld4_s16 Load|Stride @@ -2297,6 +2329,8 @@ vld4q_u64 Load|Stride vld4q_p64 Load|Stride vld4_f64 Load|Stride vld4q_f64 Load|Stride +vld4_mf8 Load|Stride +vld4q_mf8 Load|Stride vld2_dup_s8 Load|Stride vld2q_dup_s8 Load|Stride vld2_dup_s16 Load|Stride @@ -2325,6 +2359,8 @@ vld2q_dup_u64 Load|Stride vld2q_dup_p64 Load|Stride vld2_dup_f64 Load|Stride vld2q_dup_f64 Load|Stride +vld2_dup_mf8 Load|Stride +vld2q_dup_mf8 Load|Stride vld3_dup_s8 Load|Stride vld3q_dup_s8 Load|Stride vld3_dup_s16 Load|Stride @@ -2353,6 +2389,8 @@ vld3q_dup_u64 Load|Stride vld3q_dup_p64 Load|Stride vld3_dup_f64 Load|Stride vld3q_dup_f64 Load|Stride +vld3_dup_mf8 Load|Stride +vld3q_dup_mf8 Load|Stride vld4_dup_s8 Load|Stride vld4q_dup_s8 Load|Stride vld4_dup_s16 Load|Stride @@ -2381,6 +2419,8 @@ vld4q_dup_u64 Load|Stride vld4q_dup_p64 Load|Stride vld4_dup_f64 Load|Stride vld4q_dup_f64 
Load|Stride +vld4_dup_mf8 Load|Stride +vld4q_dup_mf8 Load|Stride vst2_s8 Store|Stride vst2q_s8 Store|Stride vst2_s16 Store|Stride @@ -2409,6 +2449,8 @@ vst2q_u64 Store|Stride vst2q_p64 Store|Stride vst2_f64 Store|Stride vst2q_f64 Store|Stride +vst2_mf8 Store|Stride +vst2q_mf8 Store|Stride vst3_s8 Store|Stride vst3q_s8 Store|Stride vst3_s16 Store|Stride @@ -2437,6 +2479,8 @@ vst3q_u64 Store|Stride vst3q_p64 Store|Stride vst3_f64 Store|Stride vst3q_f64 Store|Stride +vst3_mf8 Store|Stride +vst3q_mf8 Store|Stride vst4_s8 Store|Stride vst4q_s8 Store|Stride vst4_s16 Store|Stride @@ -2465,6 +2509,8 @@ vst4q_u64 Store|Stride vst4q_p64 Store|Stride vst4_f64 Store|Stride vst4q_f64 Store|Stride +vst4_mf8 Store|Stride +vst4q_mf8 Store|Stride vld2_lane_s16 Load|Stride vld2q_lane_s16 Load|Stride vld2_lane_s32 Load|Stride @@ -2493,6 +2539,8 @@ vld2_lane_p64 Load|Stride vld2q_lane_p64 Load|Stride vld2_lane_f64 Load|Stride vld2q_lane_f64 Load|Stride +vld2_lane_mf8 Load|Stride +vld2q_lane_mf8 Load|Stride vld3_lane_s16 Load|Stride vld3q_lane_s16 Load|Stride vld3_lane_s32 Load|Stride @@ -2521,6 +2569,8 @@ vld3_lane_p64 Load|Stride vld3q_lane_p64 Load|Stride vld3_lane_f64 Load|Stride vld3q_lane_f64 Load|Stride +vld3_lane_mf8 Load|Stride +vld3q_lane_mf8 Load|Stride vld4_lane_s16 Load|Stride vld4q_lane_s16 Load|Stride vld4_lane_s32 Load|Stride @@ -2549,15 +2599,20 @@ vld4_lane_p64 Load|Stride vld4q_lane_p64 Load|Stride vld4_lane_f64 Load|Stride vld4q_lane_f64 Load|Stride +vld4_lane_mf8 Load|Stride +vld4q_lane_mf8 Load|Stride vst2_lane_s8 Store|Stride vst2_lane_u8 Store|Stride vst2_lane_p8 Store|Stride +vst2_lane_mf8 Store|Stride vst3_lane_s8 Store|Stride vst3_lane_u8 Store|Stride vst3_lane_p8 Store|Stride +vst3_lane_mf8 Store|Stride vst4_lane_s8 Store|Stride vst4_lane_u8 Store|Stride vst4_lane_p8 Store|Stride +vst4_lane_mf8 Store|Stride vst2_lane_s16 Store|Stride vst2q_lane_s16 Store|Stride vst2_lane_s32 Store|Stride @@ -2583,6 +2638,7 @@ vst2_lane_p64 Store|Stride vst2q_lane_p64 
Store|Stride vst2_lane_f64 Store|Stride vst2q_lane_f64 Store|Stride +vst2q_lane_mf8 Store|Stride vst3_lane_s16 Store|Stride vst3q_lane_s16 Store|Stride vst3_lane_s32 Store|Stride @@ -2608,6 +2664,7 @@ vst3_lane_p64 Store|Stride vst3q_lane_p64 Store|Stride vst3_lane_f64 Store|Stride vst3q_lane_f64 Store|Stride +vst3q_lane_mf8 Store|Stride vst4_lane_s16 Store|Stride vst4q_lane_s16 Store|Stride vst4_lane_s32 Store|Stride @@ -2633,6 +2690,7 @@ vst4_lane_p64 Store|Stride vst4q_lane_p64 Store|Stride vst4_lane_f64 Store|Stride vst4q_lane_f64 Store|Stride +vst4q_lane_mf8 Store|Stride vst1_s8_x2 Store|Stride vst1q_s8_x2 Store|Stride vst1_s16_x2 Store|Stride @@ -2661,6 +2719,8 @@ vst1q_u64_x2 Store|Stride vst1q_p64_x2 Store|Stride vst1_f64_x2 Store|Stride vst1q_f64_x2 Store|Stride +vst1_mf8_x2 Store|Stride +vst1q_mf8_x2 Store|Stride vst1_s8_x3 Store|Stride vst1q_s8_x3 Store|Stride vst1_s16_x3 Store|Stride @@ -2689,6 +2749,8 @@ vst1q_u64_x3 Store|Stride vst1q_p64_x3 Store|Stride vst1_f64_x3 Store|Stride vst1q_f64_x3 Store|Stride +vst1_mf8_x3 Store|Stride +vst1q_mf8_x3 Store|Stride vst1_s8_x4 Store|Stride vst1q_s8_x4 Store|Stride vst1_s16_x4 Store|Stride @@ -2717,6 +2779,8 @@ vst1q_u64_x4 Store|Stride vst1q_p64_x4 Store|Stride vst1_f64_x4 Store|Stride vst1q_f64_x4 Store|Stride +vst1_mf8_x4 Store|Stride +vst1q_mf8_x4 Store|Stride vld1_s8_x2 Load|Stride vld1q_s8_x2 Load|Stride vld1_s16_x2 Load|Stride @@ -2745,6 +2809,8 @@ vld1q_u64_x2 Load|Stride vld1q_p64_x2 Load|Stride vld1_f64_x2 Load|Stride vld1q_f64_x2 Load|Stride +vld1_mf8_x2 Load|Stride +vld1q_mf8_x2 Load|Stride vld1_s8_x3 Load|Stride vld1q_s8_x3 Load|Stride vld1_s16_x3 Load|Stride @@ -2773,6 +2839,8 @@ vld1q_u64_x3 Load|Stride vld1q_p64_x3 Load|Stride vld1_f64_x3 Load|Stride vld1q_f64_x3 Load|Stride +vld1_mf8_x3 Load|Stride +vld1q_mf8_x3 Load|Stride vld1_s8_x4 Load|Stride vld1q_s8_x4 Load|Stride vld1_s16_x4 Load|Stride @@ -2801,6 +2869,8 @@ vld1q_u64_x4 Load|Stride vld1q_p64_x4 Load|Stride vld1_f64_x4 Load|Stride 
vld1q_f64_x4 Load|Stride +vld1_mf8_x4 Load|Stride +vld1q_mf8_x4 Load|Stride vpadd_s8 Vector arithmetic|Pairwise arithmetic|Pairwise addition vpadd_s16 Vector arithmetic|Pairwise arithmetic|Pairwise addition vpadd_s32 Vector arithmetic|Pairwise arithmetic|Pairwise addition @@ -2981,6 +3051,8 @@ vext_p8 Vector manipulation|Extract vector from a pair of vectors vextq_p8 Vector manipulation|Extract vector from a pair of vectors vext_p16 Vector manipulation|Extract vector from a pair of vectors vextq_p16 Vector manipulation|Extract vector from a pair of vectors +vext_mf8 Vector manipulation|Extract vector from a pair of vectors +vextq_mf8 Vector manipulation|Extract vector from a pair of vectors vrev64_s8 Vector manipulation|Reverse elements vrev64q_s8 Vector manipulation|Reverse elements vrev64_s16 Vector manipulation|Reverse elements @@ -2999,6 +3071,8 @@ vrev64_p8 Vector manipulation|Reverse elements vrev64q_p8 Vector manipulation|Reverse elements vrev64_p16 Vector manipulation|Reverse elements vrev64q_p16 Vector manipulation|Reverse elements +vrev64_mf8 Vector manipulation|Reverse elements +vrev64q_mf8 Vector manipulation|Reverse elements vrev32_s8 Vector manipulation|Reverse elements vrev32q_s8 Vector manipulation|Reverse elements vrev32_s16 Vector manipulation|Reverse elements @@ -3011,12 +3085,16 @@ vrev32_p8 Vector manipulation|Reverse elements vrev32q_p8 Vector manipulation|Reverse elements vrev32_p16 Vector manipulation|Reverse elements vrev32q_p16 Vector manipulation|Reverse elements +vrev32_mf8 Vector manipulation|Reverse elements +vrev32q_mf8 Vector manipulation|Reverse elements vrev16_s8 Vector manipulation|Reverse elements vrev16q_s8 Vector manipulation|Reverse elements vrev16_u8 Vector manipulation|Reverse elements vrev16q_u8 Vector manipulation|Reverse elements vrev16_p8 Vector manipulation|Reverse elements vrev16q_p8 Vector manipulation|Reverse elements +vrev16_mf8 Vector manipulation|Reverse elements +vrev16q_mf8 Vector manipulation|Reverse elements 
vzip1_s8 Vector manipulation|Zip elements vzip1q_s8 Vector manipulation|Zip elements vzip1_s16 Vector manipulation|Zip elements @@ -3039,6 +3117,8 @@ vzip1_p8 Vector manipulation|Zip elements vzip1q_p8 Vector manipulation|Zip elements vzip1_p16 Vector manipulation|Zip elements vzip1q_p16 Vector manipulation|Zip elements +vzip1_mf8 Vector manipulation|Zip elements +vzip1q_mf8 Vector manipulation|Zip elements vzip2_s8 Vector manipulation|Zip elements vzip2q_s8 Vector manipulation|Zip elements vzip2_s16 Vector manipulation|Zip elements @@ -3061,6 +3141,8 @@ vzip2_p8 Vector manipulation|Zip elements vzip2q_p8 Vector manipulation|Zip elements vzip2_p16 Vector manipulation|Zip elements vzip2q_p16 Vector manipulation|Zip elements +vzip2_mf8 Vector manipulation|Zip elements +vzip2q_mf8 Vector manipulation|Zip elements vuzp1_s8 Vector manipulation|Unzip elements vuzp1q_s8 Vector manipulation|Unzip elements vuzp1_s16 Vector manipulation|Unzip elements @@ -3083,6 +3165,8 @@ vuzp1_p8 Vector manipulation|Unzip elements vuzp1q_p8 Vector manipulation|Unzip elements vuzp1_p16 Vector manipulation|Unzip elements vuzp1q_p16 Vector manipulation|Unzip elements +vuzp1_mf8 Vector manipulation|Unzip elements +vuzp1q_mf8 Vector manipulation|Unzip elements vuzp2_s8 Vector manipulation|Unzip elements vuzp2q_s8 Vector manipulation|Unzip elements vuzp2_s16 Vector manipulation|Unzip elements @@ -3105,6 +3189,8 @@ vuzp2_p8 Vector manipulation|Unzip elements vuzp2q_p8 Vector manipulation|Unzip elements vuzp2_p16 Vector manipulation|Unzip elements vuzp2q_p16 Vector manipulation|Unzip elements +vuzp2_mf8 Vector manipulation|Unzip elements +vuzp2q_mf8 Vector manipulation|Unzip elements vtrn1_s8 Vector manipulation|Transpose elements vtrn1q_s8 Vector manipulation|Transpose elements vtrn1_s16 Vector manipulation|Transpose elements @@ -3127,6 +3213,8 @@ vtrn1_p8 Vector manipulation|Transpose elements vtrn1q_p8 Vector manipulation|Transpose elements vtrn1_p16 Vector manipulation|Transpose elements 
vtrn1q_p16 Vector manipulation|Transpose elements +vtrn1_mf8 Vector manipulation|Transpose elements +vtrn1q_mf8 Vector manipulation|Transpose elements vtrn2_s8 Vector manipulation|Transpose elements vtrn2q_s8 Vector manipulation|Transpose elements vtrn2_s16 Vector manipulation|Transpose elements @@ -3149,78 +3237,104 @@ vtrn2_p8 Vector manipulation|Transpose elements vtrn2q_p8 Vector manipulation|Transpose elements vtrn2_p16 Vector manipulation|Transpose elements vtrn2q_p16 Vector manipulation|Transpose elements +vtrn2_mf8 Vector manipulation|Transpose elements +vtrn2q_mf8 Vector manipulation|Transpose elements vtbl1_s8 Table lookup|Table lookup vtbl1_u8 Table lookup|Table lookup vtbl1_p8 Table lookup|Table lookup +vtbl1_mf8 Table lookup|Table lookup vtbx1_s8 Table lookup|Table lookup vtbx1_u8 Table lookup|Table lookup vtbx1_p8 Table lookup|Table lookup +vtbx1_mf8 Table lookup|Table lookup vtbl2_s8 Table lookup|Table lookup vtbl2_u8 Table lookup|Table lookup vtbl2_p8 Table lookup|Table lookup +vtbl2_mf8 Table lookup|Table lookup vtbl3_s8 Table lookup|Table lookup vtbl3_u8 Table lookup|Table lookup vtbl3_p8 Table lookup|Table lookup +vtbl3_mf8 Table lookup|Table lookup vtbl4_s8 Table lookup|Table lookup vtbl4_u8 Table lookup|Table lookup vtbl4_p8 Table lookup|Table lookup +vtbl4_mf8 Table lookup|Table lookup vtbx2_s8 Table lookup|Extended table lookup vtbx2_u8 Table lookup|Extended table lookup vtbx2_p8 Table lookup|Extended table lookup +vtbx2_mf8 Table lookup|Extended table lookup vtbx3_s8 Table lookup|Extended table lookup vtbx3_u8 Table lookup|Extended table lookup vtbx3_p8 Table lookup|Extended table lookup +vtbx3_mf8 Table lookup|Extended table lookup vtbx4_s8 Table lookup|Extended table lookup vtbx4_u8 Table lookup|Extended table lookup vtbx4_p8 Table lookup|Extended table lookup +vtbx4_mf8 Table lookup|Extended table lookup vqtbl1_s8 Table lookup|Table lookup vqtbl1q_s8 Table lookup|Table lookup vqtbl1_u8 Table lookup|Table lookup vqtbl1q_u8 Table 
lookup|Table lookup vqtbl1_p8 Table lookup|Table lookup vqtbl1q_p8 Table lookup|Table lookup +vqtbl1_mf8 Table lookup|Table lookup +vqtbl1q_mf8 Table lookup|Table lookup vqtbx1_s8 Table lookup|Extended table lookup vqtbx1q_s8 Table lookup|Extended table lookup vqtbx1_u8 Table lookup|Extended table lookup vqtbx1q_u8 Table lookup|Extended table lookup vqtbx1_p8 Table lookup|Extended table lookup vqtbx1q_p8 Table lookup|Extended table lookup +vqtbx1_mf8 Table lookup|Extended table lookup +vqtbx1q_mf8 Table lookup|Extended table lookup vqtbl2_s8 Table lookup|Table lookup vqtbl2q_s8 Table lookup|Table lookup vqtbl2_u8 Table lookup|Table lookup vqtbl2q_u8 Table lookup|Table lookup vqtbl2_p8 Table lookup|Table lookup vqtbl2q_p8 Table lookup|Table lookup +vqtbl2_mf8 Table lookup|Table lookup +vqtbl2q_mf8 Table lookup|Table lookup vqtbl3_s8 Table lookup|Table lookup vqtbl3q_s8 Table lookup|Table lookup vqtbl3_u8 Table lookup|Table lookup vqtbl3q_u8 Table lookup|Table lookup vqtbl3_p8 Table lookup|Table lookup vqtbl3q_p8 Table lookup|Table lookup +vqtbl3_mf8 Table lookup|Table lookup +vqtbl3q_mf8 Table lookup|Table lookup vqtbl4_s8 Table lookup|Table lookup vqtbl4q_s8 Table lookup|Table lookup vqtbl4_u8 Table lookup|Table lookup vqtbl4q_u8 Table lookup|Table lookup vqtbl4_p8 Table lookup|Table lookup vqtbl4q_p8 Table lookup|Table lookup +vqtbl4_mf8 Table lookup|Table lookup +vqtbl4q_mf8 Table lookup|Table lookup vqtbx2_s8 Table lookup|Extended table lookup vqtbx2q_s8 Table lookup|Extended table lookup vqtbx2_u8 Table lookup|Extended table lookup vqtbx2q_u8 Table lookup|Extended table lookup vqtbx2_p8 Table lookup|Extended table lookup vqtbx2q_p8 Table lookup|Extended table lookup +vqtbx2_mf8 Table lookup|Extended table lookup +vqtbx2q_mf8 Table lookup|Extended table lookup vqtbx3_s8 Table lookup|Extended table lookup vqtbx3q_s8 Table lookup|Extended table lookup vqtbx3_u8 Table lookup|Extended table lookup vqtbx3q_u8 Table lookup|Extended table lookup vqtbx3_p8 Table 
lookup|Extended table lookup vqtbx3q_p8 Table lookup|Extended table lookup +vqtbx3_mf8 Table lookup|Extended table lookup +vqtbx3q_mf8 Table lookup|Extended table lookup vqtbx4_s8 Table lookup|Extended table lookup vqtbx4q_s8 Table lookup|Extended table lookup vqtbx4_u8 Table lookup|Extended table lookup vqtbx4q_u8 Table lookup|Extended table lookup vqtbx4_p8 Table lookup|Extended table lookup vqtbx4q_p8 Table lookup|Extended table lookup +vqtbx4_mf8 Table lookup|Extended table lookup +vqtbx4q_mf8 Table lookup|Extended table lookup vget_lane_u8 Vector manipulation|Extract one element from vector vget_lane_u16 Vector manipulation|Extract one element from vector vget_lane_u32 Vector manipulation|Extract one element from vector @@ -3277,6 +3391,8 @@ vsetq_lane_p8 Vector manipulation|Set vector lane vsetq_lane_p16 Vector manipulation|Set vector lane vsetq_lane_f32 Vector manipulation|Set vector lane vsetq_lane_f64 Vector manipulation|Set vector lane +vset_lane_mf8 Vector manipulation|Set vector lane +vsetq_lane_mf8 Vector manipulation|Set vector lane vrecpxs_f32 Vector arithmetic|Reciprocal|Reciprocal exponent vrecpxd_f64 Vector arithmetic|Reciprocal|Reciprocal exponent vfma_n_f32 Scalar arithmetic|Fused multiply-accumulate by scalar @@ -3296,6 +3412,7 @@ vtrn_p16 Vector manipulation|Transpose elements vtrn_s32 Vector manipulation|Transpose elements vtrn_f32 Vector manipulation|Transpose elements vtrn_u32 Vector manipulation|Transpose elements +vtrn_mf8 Vector manipulation|Transpose elements vtrnq_s8 Vector manipulation|Transpose elements vtrnq_s16 Vector manipulation|Transpose elements vtrnq_s32 Vector manipulation|Transpose elements @@ -3305,12 +3422,14 @@ vtrnq_u16 Vector manipulation|Transpose elements vtrnq_u32 Vector manipulation|Transpose elements vtrnq_p8 Vector manipulation|Transpose elements vtrnq_p16 Vector manipulation|Transpose elements +vtrnq_mf8 Vector manipulation|Transpose elements vzip_s8 Vector manipulation|Zip elements vzip_s16 Vector 
manipulation|Zip elements vzip_u8 Vector manipulation|Zip elements vzip_u16 Vector manipulation|Zip elements vzip_p8 Vector manipulation|Zip elements vzip_p16 Vector manipulation|Zip elements +vzip_mf8 Vector manipulation|Zip elements vzip_s32 Vector manipulation|Zip elements vzip_f32 Vector manipulation|Zip elements vzip_u32 Vector manipulation|Zip elements @@ -3323,6 +3442,7 @@ vzipq_u16 Vector manipulation|Zip elements vzipq_u32 Vector manipulation|Zip elements vzipq_p8 Vector manipulation|Zip elements vzipq_p16 Vector manipulation|Zip elements +vzipq_mf8 Vector manipulation|Zip elements vuzp_s8 Vector manipulation|Unzip elements vuzp_s16 Vector manipulation|Unzip elements vuzp_s32 Vector manipulation|Unzip elements @@ -3332,15 +3452,17 @@ vuzp_u16 Vector manipulation|Unzip elements vuzp_u32 Vector manipulation|Unzip elements vuzp_p8 Vector manipulation|Unzip elements vuzp_p16 Vector manipulation|Unzip elements +vuzp_mf8 Vector manipulation|Unzip elements vuzpq_s8 Vector manipulation|Unzip elements vuzpq_s16 Vector manipulation|Unzip elements vuzpq_s32 Vector manipulation|Unzip elements vuzpq_f32 Vector manipulation|Unzip elements vuzpq_u8 Vector manipulation|Unzip elements vuzpq_u16 Vector manipulation|Unzip elements vuzpq_u32 Vector manipulation|Unzip elements vuzpq_p8 Vector manipulation|Unzip elements vuzpq_p16 Vector manipulation|Unzip elements +vuzpq_mf8 Vector manipulation|Unzip elements vreinterpret_s16_s8 Data type conversion|Reinterpret casts vreinterpret_s32_s8 Data type conversion|Reinterpret casts vreinterpret_f32_s8 Data type conversion|Reinterpret casts @@ -3349,6 +3471,7 @@ vreinterpret_u16_s8 Data type conversion|Reinterpret casts vreinterpret_u32_s8 Data type conversion|Reinterpret casts vreinterpret_p8_s8 Data type conversion|Reinterpret casts vreinterpret_p16_s8 Data type conversion|Reinterpret casts +vreinterpret_mf8_s8 Data type conversion|Reinterpret casts vreinterpret_u64_s8 Data type 
conversion|Reinterpret casts vreinterpret_s64_s8 Data type conversion|Reinterpret casts vreinterpret_f64_s8 Data type conversion|Reinterpret casts @@ -3362,6 +3485,7 @@ vreinterpret_u16_s16 Data type conversion|Reinterpret casts vreinterpret_u32_s16 Data type conversion|Reinterpret casts vreinterpret_p8_s16 Data type conversion|Reinterpret casts vreinterpret_p16_s16 Data type conversion|Reinterpret casts +vreinterpret_mf8_s16 Data type conversion|Reinterpret casts vreinterpret_u64_s16 Data type conversion|Reinterpret casts vreinterpret_s64_s16 Data type conversion|Reinterpret casts vreinterpret_f64_s16 Data type conversion|Reinterpret casts @@ -3375,6 +3499,7 @@ vreinterpret_u16_s32 Data type conversion|Reinterpret casts vreinterpret_u32_s32 Data type conversion|Reinterpret casts vreinterpret_p8_s32 Data type conversion|Reinterpret casts vreinterpret_p16_s32 Data type conversion|Reinterpret casts +vreinterpret_mf8_s32 Data type conversion|Reinterpret casts vreinterpret_u64_s32 Data type conversion|Reinterpret casts vreinterpret_s64_s32 Data type conversion|Reinterpret casts vreinterpret_f64_s32 Data type conversion|Reinterpret casts @@ -3388,6 +3513,7 @@ vreinterpret_u16_f32 Data type conversion|Reinterpret casts vreinterpret_u32_f32 Data type conversion|Reinterpret casts vreinterpret_p8_f32 Data type conversion|Reinterpret casts vreinterpret_p16_f32 Data type conversion|Reinterpret casts +vreinterpret_mf8_f32 Data type conversion|Reinterpret casts vreinterpret_u64_f32 Data type conversion|Reinterpret casts vreinterpret_s64_f32 Data type conversion|Reinterpret casts vreinterpret_f64_f32 Data type conversion|Reinterpret casts @@ -3402,6 +3528,7 @@ vreinterpret_u16_u8 Data type conversion|Reinterpret casts vreinterpret_u32_u8 Data type conversion|Reinterpret casts vreinterpret_p8_u8 Data type conversion|Reinterpret casts vreinterpret_p16_u8 Data type conversion|Reinterpret casts +vreinterpret_mf8_u8 Data type conversion|Reinterpret casts vreinterpret_u64_u8 Data type 
conversion|Reinterpret casts vreinterpret_s64_u8 Data type conversion|Reinterpret casts vreinterpret_f64_u8 Data type conversion|Reinterpret casts @@ -3415,6 +3542,7 @@ vreinterpret_u8_u16 Data type conversion|Reinterpret casts vreinterpret_u32_u16 Data type conversion|Reinterpret casts vreinterpret_p8_u16 Data type conversion|Reinterpret casts vreinterpret_p16_u16 Data type conversion|Reinterpret casts +vreinterpret_mf8_u16 Data type conversion|Reinterpret casts vreinterpret_u64_u16 Data type conversion|Reinterpret casts vreinterpret_s64_u16 Data type conversion|Reinterpret casts vreinterpret_f64_u16 Data type conversion|Reinterpret casts @@ -3428,6 +3556,7 @@ vreinterpret_u8_u32 Data type conversion|Reinterpret casts vreinterpret_u16_u32 Data type conversion|Reinterpret casts vreinterpret_p8_u32 Data type conversion|Reinterpret casts vreinterpret_p16_u32 Data type conversion|Reinterpret casts +vreinterpret_mf8_u32 Data type conversion|Reinterpret casts vreinterpret_u64_u32 Data type conversion|Reinterpret casts vreinterpret_s64_u32 Data type conversion|Reinterpret casts vreinterpret_f64_u32 Data type conversion|Reinterpret casts @@ -3446,6 +3575,21 @@ vreinterpret_s64_p8 Data type conversion|Reinterpret casts vreinterpret_f64_p8 Data type conversion|Reinterpret casts vreinterpret_p64_p8 Data type conversion|Reinterpret casts vreinterpret_f16_p8 Data type conversion|Reinterpret casts + +vreinterpret_s8_mf8 Data type conversion|Reinterpret casts +vreinterpret_s16_mf8 Data type conversion|Reinterpret casts +vreinterpret_s32_mf8 Data type conversion|Reinterpret casts +vreinterpret_f32_mf8 Data type conversion|Reinterpret casts +vreinterpret_u8_mf8 Data type conversion|Reinterpret casts +vreinterpret_u16_mf8 Data type conversion|Reinterpret casts +vreinterpret_u32_mf8 Data type conversion|Reinterpret casts +vreinterpret_p16_mf8 Data type conversion|Reinterpret casts +vreinterpret_u64_mf8 Data type conversion|Reinterpret casts +vreinterpret_s64_mf8 Data type 
conversion|Reinterpret casts +vreinterpret_f64_mf8 Data type conversion|Reinterpret casts +vreinterpret_p64_mf8 Data type conversion|Reinterpret casts +vreinterpret_f16_mf8 Data type conversion|Reinterpret casts + vreinterpret_s8_p16 Data type conversion|Reinterpret casts vreinterpret_s16_p16 Data type conversion|Reinterpret casts vreinterpret_s32_p16 Data type conversion|Reinterpret casts @@ -3454,6 +3598,7 @@ vreinterpret_u8_p16 Data type conversion|Reinterpret casts vreinterpret_u16_p16 Data type conversion|Reinterpret casts vreinterpret_u32_p16 Data type conversion|Reinterpret casts vreinterpret_p8_p16 Data type conversion|Reinterpret casts +vreinterpret_mf8_p16 Data type conversion|Reinterpret casts vreinterpret_u64_p16 Data type conversion|Reinterpret casts vreinterpret_s64_p16 Data type conversion|Reinterpret casts vreinterpret_f64_p16 Data type conversion|Reinterpret casts @@ -3468,6 +3613,7 @@ vreinterpret_u16_u64 Data type conversion|Reinterpret casts vreinterpret_u32_u64 Data type conversion|Reinterpret casts vreinterpret_p8_u64 Data type conversion|Reinterpret casts vreinterpret_p16_u64 Data type conversion|Reinterpret casts +vreinterpret_mf8_u64 Data type conversion|Reinterpret casts vreinterpret_s64_u64 Data type conversion|Reinterpret casts vreinterpret_f64_u64 Data type conversion|Reinterpret casts vreinterpret_p64_u64 Data type conversion|Reinterpret casts @@ -3481,6 +3627,7 @@ vreinterpret_u16_s64 Data type conversion|Reinterpret casts vreinterpret_u32_s64 Data type conversion|Reinterpret casts vreinterpret_p8_s64 Data type conversion|Reinterpret casts vreinterpret_p16_s64 Data type conversion|Reinterpret casts +vreinterpret_mf8_s64 Data type conversion|Reinterpret casts vreinterpret_u64_s64 Data type conversion|Reinterpret casts vreinterpret_f64_s64 Data type conversion|Reinterpret casts vreinterpret_u64_p64 Data type conversion|Reinterpret casts @@ -3494,6 +3641,7 @@ vreinterpret_u16_f16 Data type conversion|Reinterpret casts 
vreinterpret_u32_f16 Data type conversion|Reinterpret casts vreinterpret_p8_f16 Data type conversion|Reinterpret casts vreinterpret_p16_f16 Data type conversion|Reinterpret casts +vreinterpret_mf8_f16 Data type conversion|Reinterpret casts vreinterpret_u64_f16 Data type conversion|Reinterpret casts vreinterpret_s64_f16 Data type conversion|Reinterpret casts vreinterpret_f64_f16 Data type conversion|Reinterpret casts @@ -3506,6 +3654,7 @@ vreinterpretq_u16_s8 Data type conversion|Reinterpret casts vreinterpretq_u32_s8 Data type conversion|Reinterpret casts vreinterpretq_p8_s8 Data type conversion|Reinterpret casts vreinterpretq_p16_s8 Data type conversion|Reinterpret casts +vreinterpretq_mf8_s8 Data type conversion|Reinterpret casts vreinterpretq_u64_s8 Data type conversion|Reinterpret casts vreinterpretq_s64_s8 Data type conversion|Reinterpret casts vreinterpretq_f64_s8 Data type conversion|Reinterpret casts @@ -3520,6 +3669,7 @@ vreinterpretq_u16_s16 Data type conversion|Reinterpret casts vreinterpretq_u32_s16 Data type conversion|Reinterpret casts vreinterpretq_p8_s16 Data type conversion|Reinterpret casts vreinterpretq_p16_s16 Data type conversion|Reinterpret casts +vreinterpretq_mf8_s16 Data type conversion|Reinterpret casts vreinterpretq_u64_s16 Data type conversion|Reinterpret casts vreinterpretq_s64_s16 Data type conversion|Reinterpret casts vreinterpretq_f64_s16 Data type conversion|Reinterpret casts @@ -3534,6 +3684,7 @@ vreinterpretq_u16_s32 Data type conversion|Reinterpret casts vreinterpretq_u32_s32 Data type conversion|Reinterpret casts vreinterpretq_p8_s32 Data type conversion|Reinterpret casts vreinterpretq_p16_s32 Data type conversion|Reinterpret casts +vreinterpretq_mf8_s32 Data type conversion|Reinterpret casts vreinterpretq_u64_s32 Data type conversion|Reinterpret casts vreinterpretq_s64_s32 Data type conversion|Reinterpret casts vreinterpretq_f64_s32 Data type conversion|Reinterpret casts @@ -3548,6 +3699,7 @@ vreinterpretq_u16_f32 Data type 
conversion|Reinterpret casts vreinterpretq_u32_f32 Data type conversion|Reinterpret casts vreinterpretq_p8_f32 Data type conversion|Reinterpret casts vreinterpretq_p16_f32 Data type conversion|Reinterpret casts +vreinterpretq_mf8_f32 Data type conversion|Reinterpret casts vreinterpretq_u64_f32 Data type conversion|Reinterpret casts vreinterpretq_s64_f32 Data type conversion|Reinterpret casts vreinterpretq_f64_f32 Data type conversion|Reinterpret casts @@ -3564,6 +3716,7 @@ vreinterpretq_u16_u8 Data type conversion|Reinterpret casts vreinterpretq_u32_u8 Data type conversion|Reinterpret casts vreinterpretq_p8_u8 Data type conversion|Reinterpret casts vreinterpretq_p16_u8 Data type conversion|Reinterpret casts +vreinterpretq_mf8_u8 Data type conversion|Reinterpret casts vreinterpretq_u64_u8 Data type conversion|Reinterpret casts vreinterpretq_s64_u8 Data type conversion|Reinterpret casts vreinterpretq_f64_u8 Data type conversion|Reinterpret casts @@ -3578,6 +3731,7 @@ vreinterpretq_u8_u16 Data type conversion|Reinterpret casts vreinterpretq_u32_u16 Data type conversion|Reinterpret casts vreinterpretq_p8_u16 Data type conversion|Reinterpret casts vreinterpretq_p16_u16 Data type conversion|Reinterpret casts +vreinterpretq_mf8_u16 Data type conversion|Reinterpret casts vreinterpretq_u64_u16 Data type conversion|Reinterpret casts vreinterpretq_s64_u16 Data type conversion|Reinterpret casts vreinterpretq_f64_u16 Data type conversion|Reinterpret casts @@ -3592,6 +3746,7 @@ vreinterpretq_u8_u32 Data type conversion|Reinterpret casts vreinterpretq_u16_u32 Data type conversion|Reinterpret casts vreinterpretq_p8_u32 Data type conversion|Reinterpret casts vreinterpretq_p16_u32 Data type conversion|Reinterpret casts +vreinterpretq_mf8_u32 Data type conversion|Reinterpret casts vreinterpretq_u64_u32 Data type conversion|Reinterpret casts vreinterpretq_s64_u32 Data type conversion|Reinterpret casts vreinterpretq_f64_u32 Data type conversion|Reinterpret casts @@ -3612,6 +3767,22 @@ 
vreinterpretq_f64_p8 Data type conversion|Reinterpret casts
vreinterpretq_p64_p8 Data type conversion|Reinterpret casts
vreinterpretq_p128_p8 Data type conversion|Reinterpret casts
vreinterpretq_f16_p8 Data type conversion|Reinterpret casts
+
+vreinterpretq_s8_mf8 Data type conversion|Reinterpret casts
+vreinterpretq_s16_mf8 Data type conversion|Reinterpret casts
+vreinterpretq_s32_mf8 Data type conversion|Reinterpret casts
+vreinterpretq_f32_mf8 Data type conversion|Reinterpret casts
+vreinterpretq_u8_mf8 Data type conversion|Reinterpret casts
+vreinterpretq_u16_mf8 Data type conversion|Reinterpret casts
+vreinterpretq_u32_mf8 Data type conversion|Reinterpret casts
+vreinterpretq_p16_mf8 Data type conversion|Reinterpret casts
+vreinterpretq_u64_mf8 Data type conversion|Reinterpret casts
+vreinterpretq_s64_mf8 Data type conversion|Reinterpret casts
+vreinterpretq_f64_mf8 Data type conversion|Reinterpret casts
+vreinterpretq_p64_mf8 Data type conversion|Reinterpret casts
+vreinterpretq_p128_mf8 Data type conversion|Reinterpret casts
+vreinterpretq_f16_mf8 Data type conversion|Reinterpret casts
+
vreinterpretq_s8_p16 Data type conversion|Reinterpret casts
vreinterpretq_s16_p16 Data type conversion|Reinterpret casts
vreinterpretq_s32_p16 Data type conversion|Reinterpret casts
@@ -3620,6 +3791,7 @@ vreinterpretq_u8_p16 Data type conversion|Reinterpret casts
vreinterpretq_u16_p16 Data type conversion|Reinterpret casts
vreinterpretq_u32_p16 Data type conversion|Reinterpret casts
vreinterpretq_p8_p16 Data type conversion|Reinterpret casts
+vreinterpretq_mf8_p16 Data type conversion|Reinterpret casts
vreinterpretq_u64_p16 Data type conversion|Reinterpret casts
vreinterpretq_s64_p16 Data type conversion|Reinterpret casts
vreinterpretq_f64_p16 Data type conversion|Reinterpret casts
@@ -3635,6 +3807,7 @@ vreinterpretq_u16_u64 Data type conversion|Reinterpret casts
vreinterpretq_u32_u64 Data type conversion|Reinterpret casts
vreinterpretq_p8_u64 Data type conversion|Reinterpret casts
vreinterpretq_p16_u64 Data type conversion|Reinterpret casts
+vreinterpretq_mf8_u64 Data type conversion|Reinterpret casts
vreinterpretq_s64_u64 Data type conversion|Reinterpret casts
vreinterpretq_f64_u64 Data type conversion|Reinterpret casts
vreinterpretq_f64_s64 Data type conversion|Reinterpret casts
@@ -3652,6 +3825,7 @@ vreinterpretq_u16_s64 Data type conversion|Reinterpret casts
vreinterpretq_u32_s64 Data type conversion|Reinterpret casts
vreinterpretq_p8_s64 Data type conversion|Reinterpret casts
vreinterpretq_p16_s64 Data type conversion|Reinterpret casts
+vreinterpretq_mf8_s64 Data type conversion|Reinterpret casts
vreinterpretq_u64_s64 Data type conversion|Reinterpret casts
vreinterpretq_u64_p64 Data type conversion|Reinterpret casts
vreinterpretq_f16_s64 Data type conversion|Reinterpret casts
@@ -3664,6 +3838,7 @@ vreinterpretq_u16_f16 Data type conversion|Reinterpret casts
vreinterpretq_u32_f16 Data type conversion|Reinterpret casts
vreinterpretq_p8_f16 Data type conversion|Reinterpret casts
vreinterpretq_p16_f16 Data type conversion|Reinterpret casts
+vreinterpretq_mf8_f16 Data type conversion|Reinterpret casts
vreinterpretq_u64_f16 Data type conversion|Reinterpret casts
vreinterpretq_s64_f16 Data type conversion|Reinterpret casts
vreinterpretq_f64_f16 Data type conversion|Reinterpret casts
@@ -3677,6 +3852,7 @@ vreinterpret_u16_f64 Data type conversion|Reinterpret casts
vreinterpret_u32_f64 Data type conversion|Reinterpret casts
vreinterpret_p8_f64 Data type conversion|Reinterpret casts
vreinterpret_p16_f64 Data type conversion|Reinterpret casts
+vreinterpret_mf8_f64 Data type conversion|Reinterpret casts
vreinterpret_u64_f64 Data type conversion|Reinterpret casts
vreinterpret_s64_f64 Data type conversion|Reinterpret casts
vreinterpret_f16_f64 Data type conversion|Reinterpret casts
@@ -3689,6 +3865,7 @@ vreinterpretq_u16_f64 Data type conversion|Reinterpret casts
vreinterpretq_u32_f64 Data type conversion|Reinterpret casts
vreinterpretq_p8_f64 Data type conversion|Reinterpret casts
vreinterpretq_p16_f64 Data type conversion|Reinterpret casts
+vreinterpretq_mf8_f64 Data type conversion|Reinterpret casts
vreinterpretq_u64_f64 Data type conversion|Reinterpret casts
vreinterpretq_s64_f64 Data type conversion|Reinterpret casts
vreinterpretq_f16_f64 Data type conversion|Reinterpret casts
@@ -3701,6 +3878,7 @@ vreinterpret_u16_p64 Data type conversion|Reinterpret casts
vreinterpret_u32_p64 Data type conversion|Reinterpret casts
vreinterpret_p8_p64 Data type conversion|Reinterpret casts
vreinterpret_p16_p64 Data type conversion|Reinterpret casts
+vreinterpret_mf8_p64 Data type conversion|Reinterpret casts
vreinterpret_s64_p64 Data type conversion|Reinterpret casts
vreinterpret_f64_p64 Data type conversion|Reinterpret casts
vreinterpret_f16_p64 Data type conversion|Reinterpret casts
@@ -3712,6 +3890,7 @@ vreinterpretq_u16_p64 Data type conversion|Reinterpret casts
vreinterpretq_u32_p64 Data type conversion|Reinterpret casts
vreinterpretq_p8_p64 Data type conversion|Reinterpret casts
vreinterpretq_p16_p64 Data type conversion|Reinterpret casts
+vreinterpretq_mf8_p64 Data type conversion|Reinterpret casts
vreinterpretq_s64_p64 Data type conversion|Reinterpret casts
vreinterpretq_f64_p64 Data type conversion|Reinterpret casts
vreinterpretq_f16_p64 Data type conversion|Reinterpret casts
@@ -3723,10 +3902,15 @@ vreinterpretq_u16_p128 Data type conversion|Reinterpret casts
vreinterpretq_u32_p128 Data type conversion|Reinterpret casts
vreinterpretq_p8_p128 Data type conversion|Reinterpret casts
vreinterpretq_p16_p128 Data type conversion|Reinterpret casts
+vreinterpretq_mf8_p128 Data type conversion|Reinterpret casts
vreinterpretq_u64_p128 Data type conversion|Reinterpret casts
vreinterpretq_s64_p128 Data type conversion|Reinterpret casts
vreinterpretq_f64_p128 Data type conversion|Reinterpret casts
vreinterpretq_f16_p128 Data type conversion|Reinterpret casts
+vreinterpret_mf8_u8 Data type conversion|Reinterpret casts
+vreinterpretq_mf8_u8 Data type conversion|Reinterpret casts
+vreinterpret_u8_mf8 Data type conversion|Reinterpret casts
+vreinterpretq_u8_mf8 Data type conversion|Reinterpret casts
vldrq_p128 Load|Load
vstrq_p128 Store|Store
vaeseq_u8 Cryptography|AES
@@ -4320,6 +4504,7 @@ vreinterpret_bf16_u16 Data type conversion|Reinterpret casts
vreinterpret_bf16_u32 Data type conversion|Reinterpret casts
vreinterpret_bf16_p8 Data type conversion|Reinterpret casts
vreinterpret_bf16_p16 Data type conversion|Reinterpret casts
+vreinterpret_bf16_mf8 Data type conversion|Reinterpret casts
vreinterpret_bf16_u64 Data type conversion|Reinterpret casts
vreinterpret_bf16_s64 Data type conversion|Reinterpret casts
vreinterpretq_bf16_s8 Data type conversion|Reinterpret casts
@@ -4331,6 +4516,7 @@ vreinterpretq_bf16_u16 Data type conversion|Reinterpret casts
vreinterpretq_bf16_u32 Data type conversion|Reinterpret casts
vreinterpretq_bf16_p8 Data type conversion|Reinterpret casts
vreinterpretq_bf16_p16 Data type conversion|Reinterpret casts
+vreinterpretq_bf16_mf8 Data type conversion|Reinterpret casts
vreinterpretq_bf16_u64 Data type conversion|Reinterpret casts
vreinterpretq_bf16_s64 Data type conversion|Reinterpret casts
vreinterpret_bf16_f64 Data type conversion|Reinterpret casts
@@ -4347,6 +4533,7 @@ vreinterpret_u16_bf16 Data type conversion|Reinterpret casts
vreinterpret_u32_bf16 Data type conversion|Reinterpret casts
vreinterpret_p8_bf16 Data type conversion|Reinterpret casts
vreinterpret_p16_bf16 Data type conversion|Reinterpret casts
+vreinterpret_mf8_bf16 Data type conversion|Reinterpret casts
vreinterpret_u64_bf16 Data type conversion|Reinterpret casts
vreinterpret_s64_bf16 Data type conversion|Reinterpret casts
vreinterpret_f64_bf16 Data type conversion|Reinterpret casts
@@ -4360,6 +4547,7 @@ vreinterpretq_u16_bf16 Data type conversion|Reinterpret casts
vreinterpretq_u32_bf16 Data type conversion|Reinterpret casts
vreinterpretq_p8_bf16 Data type conversion|Reinterpret casts
vreinterpretq_p16_bf16 Data type conversion|Reinterpret casts
+vreinterpretq_mf8_bf16 Data type conversion|Reinterpret casts
vreinterpretq_u64_bf16 Data type conversion|Reinterpret casts
vreinterpretq_s64_bf16 Data type conversion|Reinterpret casts
vreinterpretq_f64_bf16 Data type conversion|Reinterpret casts
@@ -4447,4 +4635,55 @@ vluti4q_lane_bf16_x2 Table lookup|Lookup table read with 4-bit indices
vluti4q_lane_p16_x2 Table lookup|Lookup table read with 4-bit indices
vluti4q_lane_u8 Table lookup|Lookup table read with 4-bit indices
vluti4q_lane_s8 Table lookup|Lookup table read with 4-bit indices
-vluti4q_lane_p8 Table lookup|Lookup table read with 4-bit indices
\ No newline at end of file
+vluti4q_lane_p8 Table lookup|Lookup table read with 4-bit indices
+vcvt1_bf16_mf8_fpm Data type conversion|Conversions
+vcvt1_low_bf16_mf8_fpm Data type conversion|Conversions
+vcvt2_bf16_mf8_fpm Data type conversion|Conversions
+vcvt2_low_bf16_mf8_fpm Data type conversion|Conversions
+vcvt1_high_bf16_mf8_fpm Data type conversion|Conversions
+vcvt2_high_bf16_mf8_fpm Data type conversion|Conversions
+vcvt1_f16_mf8_fpm Data type conversion|Conversions
+vcvt1_low_f16_mf8_fpm Data type conversion|Conversions
+vcvt2_f16_mf8_fpm Data type conversion|Conversions
+vcvt2_low_f16_mf8_fpm Data type conversion|Conversions
+vcvt1_high_f16_mf8_fpm Data type conversion|Conversions
+vcvt2_high_f16_mf8_fpm Data type conversion|Conversions
+vcvt_mf8_f32_fpm Data type conversion|Conversions
+vcvt_high_mf8_f32_fpm Data type conversion|Conversions
+vcvt_mf8_f16_fpm Data type conversion|Conversions
+vcvtq_mf8_f16_fpm Data type conversion|Conversions
+vscale_f16 Vector arithmetic|Exponent
+vscaleq_f16 Vector arithmetic|Exponent
+vscale_f32 Vector arithmetic|Exponent
+vscaleq_f32 Vector arithmetic|Exponent
+vscaleq_f64 Vector arithmetic|Exponent
+vdot_f32_mf8_fpm Vector arithmetic|Dot product
+vdotq_f32_mf8_fpm Vector arithmetic|Dot product
+vdot_lane_f32_mf8_fpm Vector arithmetic|Dot product
+vdot_laneq_f32_mf8_fpm Vector arithmetic|Dot product
+vdotq_lane_f32_mf8_fpm Vector arithmetic|Dot product
+vdotq_laneq_f32_mf8_fpm Vector arithmetic|Dot product
+vdot_f16_mf8_fpm Vector arithmetic|Dot product
+vdotq_f16_mf8_fpm Vector arithmetic|Dot product
+vdot_lane_f16_mf8_fpm Vector arithmetic|Dot product
+vdot_laneq_f16_mf8_fpm Vector arithmetic|Dot product
+vdotq_lane_f16_mf8_fpm Vector arithmetic|Dot product
+vdotq_laneq_f16_mf8_fpm Vector arithmetic|Dot product
+vmlalbq_f16_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
+vmlaltq_f16_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
+vmlalbq_lane_f16_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
+vmlalbq_laneq_f16_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
+vmlaltq_lane_f16_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
+vmlaltq_laneq_f16_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
+vmlallbbq_f32_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
+vmlallbtq_f32_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
+vmlalltbq_f32_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
+vmlallttq_f32_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
+vmlallbbq_lane_f32_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
+vmlallbbq_laneq_f32_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
+vmlallbtq_lane_f32_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
+vmlallbtq_laneq_f32_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
+vmlalltbq_lane_f32_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
+vmlalltbq_laneq_f32_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
+vmlallttq_lane_f32_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
+vmlallttq_laneq_f32_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen