Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FP8 ACLE specification #323

Merged
merged 2 commits into from
Sep 23, 2024
Merged

Conversation

momchil-velikov
Copy link
Contributor


name: Pull request
about: Technical issues, document format problems, bugs in scripts or feature proposal.


Thank you for submitting a pull request!

If this PR is about a bugfix:

Please use the bugfix label and make sure to go through the checklist below.

If this PR is about a proposal:

We are looking forward to evaluate your proposal, and if possible to
make it part of the Arm C Language Extension (ACLE) specifications.

We would like to encourage you reading through the contribution
guidelines
, in particular the section on submitting
a proposal
.

Please use the proposal label.

As for any pull request, please make sure to go through the below
checklist.

Checklist: (mark with X those which apply)

  • If an issue reporting the bug exists, I have mentioned it in the
    PR (do not bother creating the issue if all you want to do is
    fixing the bug yourself).
  • I have added/updated the SPDX-FileCopyrightText lines on top
    of any file I have edited. Format is SPDX-FileCopyrightText: Copyright {year} {entity or name} <{contact informations}>
    (Please update existing copyright lines if applicable. You can
    specify year ranges with hyphen , as in 2017-2019, and use
    commas to separate gaps, as in 2018-2020, 2022).
  • I have updated the Copyright section of the sources of the
    specification I have edited (this will show up in the text
    rendered in the PDF and other output format supported). The
    format is the same described in the previous item.
  • I have run the CI scripts (if applicable, as they might be
    tricky to set up on non-*nix machines). The sequence can be
    found in the contribution
    guidelines
    . Don't
    worry if you cannot run these scripts on your machine, your
    patch will be automatically checked in the Actions of the pull
    request.
  • I have added an item that describes the changes I have
    introduced in this PR in the section Changes for next
    release
    of the section Change Control/Document history
    of the document. Create Changes for next release if it does
    not exist. Notice that changes that are not modifying the
    content and rendering of the specifications (both HTML and PDF)
    do not need to be listed.
  • When modifying content and/or its rendering, I have checked the
    correctness of the result in the PDF output (please refer to the
    instructions on how to build the PDFs
    locally
    ).
  • The variable draftversion is set to true in the YAML header
    of the sources of the specifications I have modified.
  • Please DO NOT add my GitHub profile to the list of contributors
    in the README page of the project.

main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
@andrewcarlotti
Copy link

I'd prefer slightly different naming for the intrinsics and new types. Specifically:

  1. Can we call the new types floatm8_t, floatm8x16_t, svfloatm8_t, etc.? This would be more consistent with existing type names while still preserving the "modal" distinction. It also makes the type name more easily distinguishable from FPMR values (which use fpm_t).

  2. Can we drop the _fpm suffix from all the intrinsic names, and instead represent the modality in the type suffix (by replacing _f8 with _fm8 wherever it appears)?

Combining these, my proposal is to replace, for example,
float16x4_t vdot_lane_f16_f8_fpm(float16x4_t vd, fpm8x8_t vn, fpm8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
with
float16x4_t vdot_lane_f16_fm8(float16x4_t vd, floatm8x8_t vn, floatm8x8_t vm, __builtin_constant_p(lane), fpm_t fpm).

@momchil-velikov
Copy link
Contributor Author

  1. Can we call the new types floatm8_t, floatm8x16_t, svfloatm8_t, etc.?
  2. Can we drop the _fpm suffix from all the intrinsic names, and instead represent the modality in the type suffix (by replacing _f8 with _fm8 wherever it appears)?

I am in favour of both proposals.

@rsandifo-arm
Copy link
Contributor

This might have been mentioned already, but the new vector types should also be added to svset_neonq, svget_neonq and svdup_neonq.

CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Jul 1, 2024
ARM ACLE PR#323[1] adds new modal types for 8-bit floating point intrinsic.

From the PR#323:
```
ACLE defines the `__fpm8` type, which can be used for the E5M2 and E4M3
8-bit floating-point formats. It is a storage and interchange only type
with no arithmetic operations other than intrinsic calls.
````

The type should be an opaque type and its format in undefined in Clang.
Only defined in the backend by a status/format register, for AArch64 the FPMR.

This patch is an attempt to the add the fpm8_t scalar type.
It has a parser and codegen for the new scalar type.

The patch it is lowering to and 8bit unsigned as it has no format.
But maybe we should add another opaque type.

[1]  ARM-software/acle#323
@momchil-velikov
Copy link
Contributor Author

  1. Can we call the new types floatm8_t, floatm8x16_t, svfloatm8_t, etc.?
  2. Can we drop the _fpm suffix from all the intrinsic names, and instead represent the modality in the type suffix (by replacing _f8 with _fm8 wherever it appears)?

I am in favour of both proposals.

Coming up Soon(tm).

@momchil-velikov
Copy link
Contributor Author

This might have been mentioned already, but the new vector types should also be added to svset_neonq, svget_neonq and svdup_neonq.

My next step is to add intrinsics for the untyped SVE/SME instructions, that would include these too.

CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Jul 22, 2024
This patch adds these new vector sizes for neon:
fpm8x16_t and fpm8x8_t

According to the ARM ACLE PR#323[1].

[1] ARM-software/acle#323
@momchil-velikov
Copy link
Contributor Author

I'd prefer slightly different naming for the intrinsics and new types. Specifically:

  1. Can we call the new types floatm8_t, floatm8x16_t, svfloatm8_t, etc.? This would be more consistent with existing type names while still preserving the "modal" distinction. It also makes the type name more easily distinguishable from FPMR values (which use fpm_t).

This part done.

main/acle.md Outdated Show resolved Hide resolved
@paulwalker-arm
Copy link

  1. Can we call the new types floatm8_t, floatm8x16_t, svfloatm8_t, etc.? This would be more consistent with existing type names while still preserving the "modal" distinction. It also makes the type name more easily distinguishable from FPMR values (which use fpm_t).

As an amendment that follows the scheme used when going from float16 -> bfloat16 what about mfloat8_t, mfloat8x16_t, svmfloat8_t with "m" meaning "modal"?

@momchil-velikov
Copy link
Contributor Author

momchil-velikov commented Jul 30, 2024

Things renamed according to the above naming scheme.

Copy link
Contributor

@rsandifo-arm rsandifo-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new feature macros should also be listed in the “Summary of predefined macros” section.

main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
tools/intrinsic_db/advsimd.csv Outdated Show resolved Hide resolved
tools/intrinsic_db/advsimd.csv Outdated Show resolved Hide resolved
tools/intrinsic_db/advsimd.csv Outdated Show resolved Hide resolved
tools/intrinsic_db/advsimd.csv Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Jul 31, 2024
 `mfloat8_t`     | equivalent to `__mfp8` |

According to ACLE[1] proposal
[1] ARM-software/acle#323
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Jul 31, 2024
 `mfloat8_t`     | equivalent to `__mfp8` |

According to ACLE[1] proposal
[1] ARM-software/acle#323
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Jul 31, 2024
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Jul 31, 2024
and fpm8x16_t to mfloat8x16_t

According to the ACLE[1]

[1]ARM-software/acle#323
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Aug 2, 2024
ARM ACLE PR#323[1] adds new modal types for 8-bit floating point intrinsic.

From the PR#323:
```
ACLE defines the `__mfp8` type, which can be used for the E5M2 and E4M3
8-bit floating-point formats. It is a storage and interchange only type
with no arithmetic operations other than intrinsic calls.
 `mfloat8_t`     | equivalent to `__mfp8` |
````

The type should be an opaque type and its format in undefined in Clang.
Only defined in the backend by a status/format register, for AArch64 the FPMR.

This patch is an attempt to the add the fpm8_t scalar type.
It has a parser and codegen for the new scalar type.

The patch it is lowering to and 8bit unsigned as it has no format.
But maybe we should add another opaque type.

According to ACLE[1] proposal
[1] ARM-software/acle#323
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Aug 2, 2024
This patch adds these new vector sizes for neon:
mfloat8x16_t and mfloat8x8_t

According to the ARM ACLE PR#323[1].

[1] ARM-software/acle#323
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Aug 2, 2024
This patch adds these new vector sizes for sve:
svmfloat8_t

According to the ARM ACLE PR#323[1].

[1] ARM-software/acle#323
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
tools/intrinsic_db/advsimd_classification.csv Outdated Show resolved Hide resolved
tools/intrinsic_db/advsimd_classification.csv Outdated Show resolved Hide resolved
tools/intrinsic_db/advsimd_classification.csv Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
tools/intrinsic_db/advsimd_classification.csv Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Show resolved Hide resolved
@momchil-velikov
Copy link
Contributor Author

Squished everything and rebased, not supposed to contain content changes.

@momchil-velikov momchil-velikov marked this pull request as draft September 18, 2024 10:05
[fixup] Add description of helper intrisnics

... and other clarifications.

[fixup] Add FP8 intrinsics for untyped NEON instructions (load/store/etc)

[fixup] Add FP8 intrinsics for untyped SVE/SME instructions

[fixup] Define the format for the FP8 state

[fixup] Mass renaming of FP8 instrinsics and types, and misc other

[fixup] Another naming scheme

[fixup] Miscellaneous fixes

[fixup] Add FP8 variants for some more "untyped" NEON intrinsics

[fixup] Add NEON reinterpret varaiants for FP8 and misc other

[fixup] Add FP8 feature macros to the predefined macros summary table

[fixup] Misc fixes

[fixup] Fix some indentation

[fixup] Move FP8 SVE intrinsics section alongside other streaming-compatible intrinsics

[fixup] Change FP8 NEON intrinsics classification
@momchil-velikov
Copy link
Contributor Author

Last update was completely botched, now fixed.

@momchil-velikov momchil-velikov marked this pull request as ready for review September 18, 2024 12:23
Copy link
Contributor

@CarolineConcatto CarolineConcatto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Momchil for the work.

tools/intrinsic_db/advsimd.csv Outdated Show resolved Hide resolved
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Sep 21, 2024
ARM ACLE PR#323[1] adds new modal types for 8-bit floating point intrinsic.

From the PR#323:
```
ACLE defines the `__mfp8` type, which can be used for the E5M2 and E4M3
8-bit floating-point formats. It is a storage and interchange only type
with no arithmetic operations other than intrinsic calls.
````

The type should be an opaque type and its format in undefined in Clang.
Only defined in the backend by a status/format register, for AArch64 the FPMR.

This patch is an attempt to the add the MFloat8_t scalar type.
It has a parser and codegen for the new scalar type.

The patch it is lowering to and 8bit unsigned as it has no format.
But maybe we should add another opaque type.

[1]  ARM-software/acle#323
@vhscampos vhscampos merged commit 5525258 into ARM-software:main Sep 23, 2024
4 checks passed
@@ -6515,7 +6714,7 @@ single vectors:

| **Signed integer** | **Unsigned integer** | **Floating-point** | |
| -------------------- | -------------------- | -------------------- | -------------------- |
| `svint8_t` | `svuint8_t` | | |
| `svint8_t` | `svuint8_t` | | `svmfloat8_t |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the

svbfloat16_t is only available if the header file also provides a definition of bfloat16_t

section below be repeated or extended for svmfloat8_t?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that be necessary only if the underlying type definition was conditional?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would, I was confused, thanks for clarifying :)

Copy link
Contributor

@sallyarmneale sallyarmneale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor suggestions

Comment on lines +5904 to +5905
The FP8 types are all opaque types. That is to say they can only be used
by intrinsics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is to say they can only be used by intrinsics.

Change to: That is, they can only be used by intrinsics.


#### FCVTNT, FCVTNB

Single-precision convert, narrow and interleave to 8-bit floating-point (top and bottom).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comma after 'narrow'

Lukacma added a commit to llvm/llvm-project that referenced this pull request Sep 25, 2024
This patch implements these intrinsics:

FSCALE SINGLE AND MULTI
``` 
  // Variants are also available for:
  // [_single_f32_x2], [_single_f64_x2],
  // [_single_f16_x4], [_single_f32_x4], [_single_f64_x4]
  svfloat16x2_t svscale[_single_f16_x2](svfloat16x2_t zd, svfloat16_t zm) __arm_streaming;

  // Variants are also available for:
  //  [_f32_x2], [_f64_x2],
  //  [_f16_x4], [_f32_x4], [_f64_x4]
  svfloat16x2_t svscale[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming

```
(cf. ARM-software/acle#323)

Co-authored-by: Caroline Concatto <[email protected]>
Lukacma added a commit to llvm/llvm-project that referenced this pull request Sep 26, 2024
This patch implements following intrinsics:

```
float16x4_t vscale_f16(float16x4_t vn, int16x4_t vm)	
float16x8_t vscaleq_f16(float16x8_t vn, int16x8_t vm)
float32x2_t vscale_f32(float32x2_t vn, int32x2_t vm)
float32x4_t vscaleq_f32(float32x4_t vn, int32x4_t vm)
float64x2_t vscaleq_f64(float64x2_t vn, int64x2_t vm)
```

as defined in ARM-software/acle#323

Co-authored-by: Hassnaa Hamdi <[email protected]>
augusto2112 pushed a commit to augusto2112/llvm-project that referenced this pull request Sep 26, 2024
This patch implements these intrinsics:

FSCALE SINGLE AND MULTI
``` 
  // Variants are also available for:
  // [_single_f32_x2], [_single_f64_x2],
  // [_single_f16_x4], [_single_f32_x4], [_single_f64_x4]
  svfloat16x2_t svscale[_single_f16_x2](svfloat16x2_t zd, svfloat16_t zm) __arm_streaming;

  // Variants are also available for:
  //  [_f32_x2], [_f64_x2],
  //  [_f16_x4], [_f32_x4], [_f64_x4]
  svfloat16x2_t svscale[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming

```
(cf. ARM-software/acle#323)

Co-authored-by: Caroline Concatto <[email protected]>
augusto2112 pushed a commit to augusto2112/llvm-project that referenced this pull request Sep 26, 2024
This patch implements following intrinsics:

```
float16x4_t vscale_f16(float16x4_t vn, int16x4_t vm)	
float16x8_t vscaleq_f16(float16x8_t vn, int16x8_t vm)
float32x2_t vscale_f32(float32x2_t vn, int32x2_t vm)
float32x4_t vscaleq_f32(float32x4_t vn, int32x4_t vm)
float64x2_t vscaleq_f64(float64x2_t vn, int64x2_t vm)
```

as defined in ARM-software/acle#323

Co-authored-by: Hassnaa Hamdi <[email protected]>
Sterling-Augustine pushed a commit to Sterling-Augustine/llvm-project that referenced this pull request Sep 27, 2024
This patch implements following intrinsics:

```
float16x4_t vscale_f16(float16x4_t vn, int16x4_t vm)	
float16x8_t vscaleq_f16(float16x8_t vn, int16x8_t vm)
float32x2_t vscale_f32(float32x2_t vn, int32x2_t vm)
float32x4_t vscaleq_f32(float32x4_t vn, int32x4_t vm)
float64x2_t vscaleq_f64(float64x2_t vn, int64x2_t vm)
```

as defined in ARM-software/acle#323

Co-authored-by: Hassnaa Hamdi <[email protected]>
puja2196 pushed a commit to puja2196/LLVM-tutorial that referenced this pull request Sep 30, 2024
This patch implements following intrinsics:

```
float16x4_t vscale_f16(float16x4_t vn, int16x4_t vm)	
float16x8_t vscaleq_f16(float16x8_t vn, int16x8_t vm)
float32x2_t vscale_f32(float32x2_t vn, int32x2_t vm)
float32x4_t vscaleq_f32(float32x4_t vn, int32x4_t vm)
float64x2_t vscaleq_f64(float64x2_t vn, int64x2_t vm)
```

as defined in ARM-software/acle#323

Co-authored-by: Hassnaa Hamdi <[email protected]>
puja2196 pushed a commit to puja2196/LLVM-tutorial that referenced this pull request Oct 2, 2024
This patch implements following intrinsics:

```
float16x4_t vscale_f16(float16x4_t vn, int16x4_t vm)	
float16x8_t vscaleq_f16(float16x8_t vn, int16x8_t vm)
float32x2_t vscale_f32(float32x2_t vn, int32x2_t vm)
float32x4_t vscaleq_f32(float32x4_t vn, int32x4_t vm)
float64x2_t vscaleq_f64(float64x2_t vn, int64x2_t vm)
```

as defined in ARM-software/acle#323

Co-authored-by: Hassnaa Hamdi <[email protected]>
xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024
This patch implements these intrinsics:

FSCALE SINGLE AND MULTI
``` 
  // Variants are also available for:
  // [_single_f32_x2], [_single_f64_x2],
  // [_single_f16_x4], [_single_f32_x4], [_single_f64_x4]
  svfloat16x2_t svscale[_single_f16_x2](svfloat16x2_t zd, svfloat16_t zm) __arm_streaming;

  // Variants are also available for:
  //  [_f32_x2], [_f64_x2],
  //  [_f16_x4], [_f32_x4], [_f64_x4]
  svfloat16x2_t svscale[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming

```
(cf. ARM-software/acle#323)

Co-authored-by: Caroline Concatto <[email protected]>
xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024
This patch implements following intrinsics:

```
float16x4_t vscale_f16(float16x4_t vn, int16x4_t vm)	
float16x8_t vscaleq_f16(float16x8_t vn, int16x8_t vm)
float32x2_t vscale_f32(float32x2_t vn, int32x2_t vm)
float32x4_t vscaleq_f32(float32x4_t vn, int32x4_t vm)
float64x2_t vscaleq_f64(float64x2_t vn, int64x2_t vm)
```

as defined in ARM-software/acle#323

Co-authored-by: Hassnaa Hamdi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.