Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add F4E2M1FN and F8E8M0FNU types #19096

Closed
wants to merge 11 commits into from
Closed

Add F4E2M1FN and F8E8M0FNU types #19096

wants to merge 11 commits into from

Conversation

sergey-kozub
Copy link
Contributor

@sergey-kozub sergey-kozub commented Nov 6, 2024

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats (RFC), such as MXFP4.

F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 11 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1

Related PRs:

The PR is split into multiple commits just to make the review easier, it is possible that some tests could fail if only some (i.e. not all) of these commits are applied.

@sergey-kozub sergey-kozub self-assigned this Nov 6, 2024
@sergey-kozub sergey-kozub force-pushed the skozub/e2m1 branch 4 times, most recently from 7a4377a to 29d4c58 Compare November 6, 2024 09:27
@sergey-kozub
Copy link
Contributor Author

sergey-kozub commented Nov 6, 2024

Note: clang-format check fails on issues unrelated to this PR.

Also, running with clang-format 18.1.8 yields no issues.

@reedwm
Copy link
Member

reedwm commented Nov 20, 2024

Sorry for the delay in reviewing.

You plan on adding the other MX types, right? If so, and convinient for you, could you add them in the same PR? Normally smaller PRs are better, but all these dtypes touch many of the same places in the code, so it's easier to batch review all of them at once. Granted, FP6 might be a bit weird if packed since their size doesn't divide the byte size -- those potentially should be in a separate PR.

But if adding the other MX types in inconvenient for you, happy to review this PR as-is.

@sergey-kozub
Copy link
Contributor Author

You plan on adding the other MX types, right? If so, and convinient for you, could you add them in the same PR?

I'll add E8M0 implementation to this PR, as you suggest.
The FP6 types are not implemented yet, I'll look into that later (not a priority).

@sergey-kozub sergey-kozub changed the title Add F4E2M1FN type Add F4E2M1FN and F8E8M0FNU types Nov 21, 2024
@sergey-kozub
Copy link
Contributor Author

Added the F8E8M0FNU support patch to this PR. Will update the description soon after.

Copy link
Member

@reedwm reedwm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Note I didn't review the MLIR codegen files (those in gpu/fusions/transforms) since I'm not familiar with those. @mooskagh can you find someone to review these files?

xla/hlo/builder/lib/math_test.cc Outdated Show resolved Hide resolved
xla/literal.h Outdated Show resolved Hide resolved
xla/literal_comparison_test.cc Outdated Show resolved Hide resolved
xla/primitive_util.h Show resolved Hide resolved
xla/python/ifrt/dtype.proto Outdated Show resolved Hide resolved
xla/service/elemental_ir_emitter.cc Outdated Show resolved Hide resolved
xla/service/elemental_ir_emitter.cc Outdated Show resolved Hide resolved
xla/service/elemental_ir_emitter.cc Outdated Show resolved Hide resolved
third_party/tsl/third_party/py/ml_dtypes/e8m0.patch Outdated Show resolved Hide resolved
xla/tests/convert_test.cc Outdated Show resolved Hide resolved
@mooskagh mooskagh requested a review from pifon2a November 25, 2024 07:51
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add to lower_tensors.mlir tests for the new 4-bit type similar to the existing tests for int4?

func.func @transfer_write_i4
func.func @transfer_read_i4
func.func @int4_constant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added "f4_constant", "transfer_read_f4" and "transfer_write_f4".
BTW, there's no "transfer_read_i4" in that file.

@sergey-kozub sergey-kozub force-pushed the skozub/e2m1 branch 3 times, most recently from 6972698 to b20cfd2 Compare November 27, 2024 20:15
@sergey-kozub
Copy link
Contributor Author

I believe I addressed all the prior comments. PTAL when possible.

Copy link
Contributor

@pifon2a pifon2a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@sergey-kozub sergey-kozub force-pushed the skozub/e2m1 branch 2 times, most recently from 3d63bba to be2e457 Compare December 3, 2024 07:49
@mooskagh
Copy link
Member

mooskagh commented Dec 3, 2024

UPD: Disregard the comment, found it.

I'm trying to merge this PR.
It references float8.h in ml_dtypes.h, but I cannot find it either in this repository, nor internally (it was deleted a year ago).

copybara-service bot pushed a commit to google/tsl that referenced this pull request Dec 3, 2024
Imported from GitHub PR openxla/xla#19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier, it is possible that some tests could fail if only some (i.e. not all) of these commits are applied.
Copybara import of the project:

--
fa539fbde987ff6421fd2937fade495baf633630 by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: import mxfloat.h

--
2c014035923e0394b2cfcb81eaf090a96621b0aa by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: primitive type

--
e919ed54e825f2e905aaf0cc279dd21cd80f1ce9 by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: literal support

--
ca16839096feb93e0454ec380c5c707c30199346 by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: conversion codegen

--
eedc079ca9a4db9e611d84877a25b3da21386f16 by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: python interface

--
8e0305cd47002f0c1f8668a3cbcbce5428f2a4c6 by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: FFI

--
aabe9c68d964609f78f29e17ee0680798ad0c6ac by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: HLO evaluator

--
87da2ebfab388f113482e852009401a9e416974a by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: add tests

--
e0ee48c3a37018ba985c850931592d62eadf7c2e by Sergey Kozub <[email protected]>:

Add F8E8M0FNU type

--
be2e457922e2cddeaf5aca13dd022f3ac2a1393b by Sergey Kozub <[email protected]>:

Addressing PR#19096 review comments

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19096 from openxla:skozub/e2m1 be2e457922e2cddeaf5aca13dd022f3ac2a1393b
PiperOrigin-RevId: 702273510
copybara-service bot pushed a commit that referenced this pull request Dec 3, 2024
Imported from GitHub PR #19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier, it is possible that some tests could fail if only some (i.e. not all) of these commits are applied.
Copybara import of the project:

--
fa539fb by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: import mxfloat.h

--
2c01403 by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: primitive type

--
e919ed5 by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: literal support

--
ca16839 by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: conversion codegen

--
eedc079 by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: python interface

--
8e0305c by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: FFI

--
aabe9c6 by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: HLO evaluator

--
87da2eb by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: add tests

--
e0ee48c by Sergey Kozub <[email protected]>:

Add F8E8M0FNU type

--
be2e457 by Sergey Kozub <[email protected]>:

Addressing PR#19096 review comments

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19096 from openxla:skozub/e2m1 be2e457
PiperOrigin-RevId: 702273510
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 3, 2024
Imported from GitHub PR openxla/xla#19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier, it is possible that some tests could fail if only some (i.e. not all) of these commits are applied.
Copybara import of the project:

--
fa539fbde987ff6421fd2937fade495baf633630 by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: import mxfloat.h

--
2c014035923e0394b2cfcb81eaf090a96621b0aa by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: primitive type

--
e919ed54e825f2e905aaf0cc279dd21cd80f1ce9 by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: literal support

--
ca16839096feb93e0454ec380c5c707c30199346 by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: conversion codegen

--
eedc079ca9a4db9e611d84877a25b3da21386f16 by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: python interface

--
8e0305cd47002f0c1f8668a3cbcbce5428f2a4c6 by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: FFI

--
aabe9c68d964609f78f29e17ee0680798ad0c6ac by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: HLO evaluator

--
87da2ebfab388f113482e852009401a9e416974a by Sergey Kozub <[email protected]>:

Add F4E2M1FN type: add tests

--
e0ee48c3a37018ba985c850931592d62eadf7c2e by Sergey Kozub <[email protected]>:

Add F8E8M0FNU type

--
be2e457922e2cddeaf5aca13dd022f3ac2a1393b by Sergey Kozub <[email protected]>:

Addressing PR#19096 review comments

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19096 from openxla:skozub/e2m1 be2e457922e2cddeaf5aca13dd022f3ac2a1393b
PiperOrigin-RevId: 702273510
@sergey-kozub
Copy link
Contributor Author

Thanks, Reed! I'll proceed with rebasing 2533c35 to HEAD and create a new PR.

copybara-service bot pushed a commit to google/tsl that referenced this pull request Jan 10, 2025
Also add mxfloat as a dependency to TensorFlow and TSL. This is needed to merge openxla/xla#19096. Previously this was done in the merge commit for that PR, but the PR was rolled back since the new types caused an internal TF Android build to fail. Now it's being done in this separate, smaller change so its easier to rollback if issues occur.

PiperOrigin-RevId: 713856483
copybara-service bot pushed a commit that referenced this pull request Jan 10, 2025
Also add mxfloat as a dependency to TensorFlow and TSL. This is needed to merge #19096. Previously this was done in the merge commit for that PR, but the PR was rolled back since the new types caused an internal TF Android build to fail. Now it's being done in this separate, smaller change so its easier to rollback if issues occur.

PiperOrigin-RevId: 713856483
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 10, 2025
Also add mxfloat as a dependency to TensorFlow and TSL. This is needed to merge openxla/xla#19096. Previously this was done in the merge commit for that PR, but the PR was rolled back since the new types caused an internal TF Android build to fail. Now it's being done in this separate, smaller change so its easier to rollback if issues occur.

PiperOrigin-RevId: 713856483
copybara-service bot pushed a commit to google/tsl that referenced this pull request Jan 10, 2025
Also add mxfloat as a dependency to TensorFlow and TSL. This is needed to merge openxla/xla#19096. Previously this was done in the merge commit for that PR, but the PR was rolled back since the new types caused an internal TF Android build to fail. Now it's being done in this separate, smaller change so its easier to rollback if issues occur.

PiperOrigin-RevId: 713856483
copybara-service bot pushed a commit that referenced this pull request Jan 10, 2025
Also add mxfloat as a dependency to TensorFlow and TSL. This is needed to merge #19096. Previously this was done in the merge commit for that PR, but the PR was rolled back since the new types caused an internal TF Android build to fail. Now it's being done in this separate, smaller change so its easier to rollback if issues occur.

PiperOrigin-RevId: 713856483
copybara-service bot pushed a commit to google/tsl that referenced this pull request Jan 11, 2025
Also add mxfloat as a dependency to TensorFlow and TSL. This is needed to merge openxla/xla#19096. Previously this was done in the merge commit for that PR, but the PR was rolled back since the new types caused an internal TF Android build to fail. Now it's being done in this separate, smaller change so its easier to rollback if issues occur.

PiperOrigin-RevId: 714285735
copybara-service bot pushed a commit that referenced this pull request Jan 11, 2025
Also add mxfloat as a dependency to TensorFlow and TSL. This is needed to merge #19096. Previously this was done in the merge commit for that PR, but the PR was rolled back since the new types caused an internal TF Android build to fail. Now it's being done in this separate, smaller change so its easier to rollback if issues occur.

PiperOrigin-RevId: 714285735
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 11, 2025
Also add mxfloat as a dependency to TensorFlow and TSL. This is needed to merge openxla/xla#19096. Previously this was done in the merge commit for that PR, but the PR was rolled back since the new types caused an internal TF Android build to fail. Now it's being done in this separate, smaller change so its easier to rollback if issues occur.

PiperOrigin-RevId: 714285735
@reedwm
Copy link
Member

reedwm commented Jan 11, 2025

I updated the ml_dtypes version used by TF in tensorflow/tensorflow@42bd1d3, and the TensorFlow Android build is still passing. So we should be able to merge the PR smoothly this time once you create the new PR :)

@sergey-kozub
Copy link
Contributor Author

Created a new PR #21380 (rebased)

copybara-service bot pushed a commit that referenced this pull request Jan 14, 2025
Imported from GitHub PR #21380

Previous PR #19096 was rolled back, re-trying.

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028
Copybara import of the project:

--
d7e00c4 by Sergey Kozub <[email protected]>:

Add F4E2M1FN and F8E8M0FNU types

Merging this change closes #21380

FUTURE_COPYBARA_INTEGRATE_REVIEW=#21380 from openxla:skozub/e2m1_e8m0 d7e00c4
PiperOrigin-RevId: 715070992
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 14, 2025
Imported from GitHub PR openxla/xla#21380

Previous PR openxla/xla#19096 was rolled back, re-trying.

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028
Copybara import of the project:

--
d7e00c49a4b4f26c06266d6bb941275e67464c01 by Sergey Kozub <[email protected]>:

Add F4E2M1FN and F8E8M0FNU types

Merging this change closes #21380

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#21380 from openxla:skozub/e2m1_e8m0 d7e00c49a4b4f26c06266d6bb941275e67464c01
PiperOrigin-RevId: 715070992
copybara-service bot pushed a commit that referenced this pull request Jan 14, 2025
Imported from GitHub PR #21380

Previous PR #19096 was rolled back, re-trying.

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028
Copybara import of the project:

--
d7e00c4 by Sergey Kozub <[email protected]>:

Add F4E2M1FN and F8E8M0FNU types

Merging this change closes #21380

FUTURE_COPYBARA_INTEGRATE_REVIEW=#21380 from openxla:skozub/e2m1_e8m0 d7e00c4
PiperOrigin-RevId: 715070992
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 14, 2025
Imported from GitHub PR openxla/xla#21380

Previous PR openxla/xla#19096 was rolled back, re-trying.

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028
Copybara import of the project:

--
d7e00c49a4b4f26c06266d6bb941275e67464c01 by Sergey Kozub <[email protected]>:

Add F4E2M1FN and F8E8M0FNU types

Merging this change closes #21380

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#21380 from openxla:skozub/e2m1_e8m0 d7e00c49a4b4f26c06266d6bb941275e67464c01
PiperOrigin-RevId: 715070992
copybara-service bot pushed a commit that referenced this pull request Jan 14, 2025
Imported from GitHub PR #21380

Previous PR #19096 was rolled back, re-trying.

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028
Copybara import of the project:

--
d7e00c4 by Sergey Kozub <[email protected]>:

Add F4E2M1FN and F8E8M0FNU types

Merging this change closes #21380

FUTURE_COPYBARA_INTEGRATE_REVIEW=#21380 from openxla:skozub/e2m1_e8m0 d7e00c4
PiperOrigin-RevId: 715070992
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 14, 2025
Imported from GitHub PR openxla/xla#21380

Previous PR openxla/xla#19096 was rolled back, re-trying.

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028
Copybara import of the project:

--
d7e00c49a4b4f26c06266d6bb941275e67464c01 by Sergey Kozub <[email protected]>:

Add F4E2M1FN and F8E8M0FNU types

Merging this change closes #21380

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#21380 from openxla:skozub/e2m1_e8m0 d7e00c49a4b4f26c06266d6bb941275e67464c01
PiperOrigin-RevId: 715070992
copybara-service bot pushed a commit that referenced this pull request Jan 14, 2025
Imported from GitHub PR #21380

Previous PR #19096 was rolled back, re-trying.

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028
Copybara import of the project:

--
d7e00c4 by Sergey Kozub <[email protected]>:

Add F4E2M1FN and F8E8M0FNU types

Merging this change closes #21380

FUTURE_COPYBARA_INTEGRATE_REVIEW=#21380 from openxla:skozub/e2m1_e8m0 d7e00c4
PiperOrigin-RevId: 715070992
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 14, 2025
Imported from GitHub PR openxla/xla#21380

Previous PR openxla/xla#19096 was rolled back, re-trying.

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028
Copybara import of the project:

--
d7e00c49a4b4f26c06266d6bb941275e67464c01 by Sergey Kozub <[email protected]>:

Add F4E2M1FN and F8E8M0FNU types

Merging this change closes #21380

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#21380 from openxla:skozub/e2m1_e8m0 d7e00c49a4b4f26c06266d6bb941275e67464c01
PiperOrigin-RevId: 715070992
copybara-service bot pushed a commit that referenced this pull request Jan 14, 2025
Imported from GitHub PR #21380

Previous PR #19096 was rolled back, re-trying.

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028
Copybara import of the project:

--
d7e00c4 by Sergey Kozub <[email protected]>:

Add F4E2M1FN and F8E8M0FNU types

Merging this change closes #21380

FUTURE_COPYBARA_INTEGRATE_REVIEW=#21380 from openxla:skozub/e2m1_e8m0 d7e00c4
PiperOrigin-RevId: 715070992
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 14, 2025
Imported from GitHub PR openxla/xla#21380

Previous PR openxla/xla#19096 was rolled back, re-trying.

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028
Copybara import of the project:

--
d7e00c49a4b4f26c06266d6bb941275e67464c01 by Sergey Kozub <[email protected]>:

Add F4E2M1FN and F8E8M0FNU types

Merging this change closes #21380

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#21380 from openxla:skozub/e2m1_e8m0 d7e00c49a4b4f26c06266d6bb941275e67464c01
PiperOrigin-RevId: 715070992
copybara-service bot pushed a commit to tensorflow/mlir-hlo that referenced this pull request Jan 14, 2025
Imported from GitHub PR openxla/xla#21380

Previous PR openxla/xla#19096 was rolled back, re-trying.

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028
Copybara import of the project:

--
d7e00c49a4b4f26c06266d6bb941275e67464c01 by Sergey Kozub <[email protected]>:

Add F4E2M1FN and F8E8M0FNU types

Merging this change closes #21380

PiperOrigin-RevId: 715434229
copybara-service bot pushed a commit that referenced this pull request Jan 14, 2025
Imported from GitHub PR #21380

Previous PR #19096 was rolled back, re-trying.

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028
Copybara import of the project:

--
d7e00c4 by Sergey Kozub <[email protected]>:

Add F4E2M1FN and F8E8M0FNU types

Merging this change closes #21380

COPYBARA_INTEGRATE_REVIEW=#21380 from openxla:skozub/e2m1_e8m0 d7e00c4
PiperOrigin-RevId: 715434229
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 14, 2025
Imported from GitHub PR openxla/xla#21380

Previous PR openxla/xla#19096 was rolled back, re-trying.

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028
Copybara import of the project:

--
d7e00c49a4b4f26c06266d6bb941275e67464c01 by Sergey Kozub <[email protected]>:

Add F4E2M1FN and F8E8M0FNU types

Merging this change closes #21380

PiperOrigin-RevId: 715434229
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants