Improve type_caster for floating-point types. #829

hpkfft · 2024-12-20T21:01:24Z

I would like to propose that nb::arg().noconvert() not allow value-changing conversions of floating-point arguments.

Given

#include <cstdio>
#include <nanobind/nanobind.h>

namespace nb = nanobind;

NB_MODULE(my_extension, m) {
    m.def("d",   [](double d) { printf("%.8f\n", d); });
    m.def("dnc", [](double d) { printf("%.8f\n", d); },
                 nb::arg().noconvert());

    m.def("f",   [](float f) { printf("%.8f\n", f); });
    m.def("fnc", [](float f) { printf("%.8f\n", f); },
                 nb::arg().noconvert());

#if 0
    m.def("ld",   [](long double x) { printf("%.8Lf\n", x); });
    m.def("ldnc", [](long double x) { printf("%.8Lf\n", x); },
                  nb::arg().noconvert());
#endif
}

we currently have the following:

x	"hello"	3	3.0	math.pi	1e40
d(x)	TypeError	3.00000000	3.00000000	3.14159265	100....752
dnc(x)	TypeError	TypeError	3.00000000	3.14159265	100....752
f(x)	TypeError	3.00000000	3.00000000	3.14159274	inf
fnc(x)	TypeError	TypeError	3.00000000	3.14159274	inf

and the two long double functions would not compile if they were uncommented.

I suggest the following is preferable:

x	"hello"	3	3.0	math.pi	1e40
d(x)	TypeError	3.00000000	3.00000000	3.14159265	100....752
dnc(x)	TypeError	TypeError	3.00000000	3.14159265	100....752
f(x)	TypeError	3.00000000	3.00000000	3.14159274	inf
fnc(x)	TypeError	TypeError	3.00000000	TypeError	TypeError
ld(x)	TypeError	3.00000000	3.00000000	3.14159265	100....752
ldnc(x)	TypeError	TypeError	3.00000000	3.14159265	100....752

The code in this PR is not well tested, but I wanted to show some code to help spur feedback.
If the idea seems good, I'll work on this in January.
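
As a rough illustration of the proposed rule for fnc, the following sketch tests whether a double survives a round trip through float. The helper name float_preserves_value is hypothetical and not part of nanobind, and the code assumes IEEE 754 float and double, where an out-of-range value such as 1e40 converts to infinity.

```cpp
// Hypothetical helper (not nanobind API): returns true when `d` keeps its
// exact value after a double -> float -> double round trip.
// Assumption: float and double are IEEE 754 types, so a finite double that
// is too large for float becomes infinity and then compares unequal to `d`.
inline bool float_preserves_value(double d) {
    float f = static_cast<float>(d);
    return static_cast<double>(f) == d;
}
```

Under that assumption, 3.0 passes while math.pi and 1e40 are rejected, which matches the fnc row of the proposed table.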

wjakob · 2025-01-06T02:14:50Z

include/nanobind/nb_cast.h

+                T result = static_cast<T>(d);
+                if ((flags & (uint8_t) cast_flags::convert)
+                        || static_cast<double>(result) == d
+                        || (result != result && d != d)) {


What does result != result && d != d accomplish that result != result does not do?

I intend for the caster to work for any floating-point type. The type T may not have Inf. If that is the case, then a double precision Inf would be converted to NaN. So, result != result but d == d. This is a value-changing conversion, so we want it to fail if noconvert() was specified. The same can happen if d is large. Then the conversion would overflow (depending on rounding mode), and although d is finite, result is NaN.
This is a possible scenario. Nvidia, Intel, Arm, Google, AMD, and Meta have "approved" an 8-bit floating-point specification E4M3 which does not have Inf but does have NaN. (E5M2 has both Inf and NaN.)
https://www.opencompute.org/documents/ocp-8-bit-floating-point-specification-ofp8-revision-1-0-2023-12-01-pdf-1

Maybe we could just check d != d. But that would be wrong if T does not support NaN. I cannot immediately think of a system relevant to nanobind that has such a type, but I'd rather play it safe. The NaN comparison check is at the end of all the short-circuiting, so I think it won't affect performance in practical usage.

wjakob · 2025-01-06T02:15:38Z

include/nanobind/nb_cast.h

+                double d;
+                if (!detail::load_f64(src.ptr(), flags, &d))
+                    return false;
+                T result = static_cast<T>(d);


I think that this would be better to still keep in a dedicated load_f32 routine with the double precision bits inlined. The goal is to keep binding code small that calls load_f32 thousands of times.

I restored load_f32. Note that I statically assert that both double and float adhere to ISO/IEC 60559 as documented here, so I only check d != d since I know that if d is NaN, then the conversion to float will give NaN. Hopefully, these assertions are true everywhere, or else I have some thinking to do....

In the case of double, the caster only checks sizeof(T) == sizeof(double). The assumption (as documented in the comment) is that this is ISO/IEC 60559 (i.e., IEEE 754) binary64. Hopefully, this is always true for systems of interest. The good news is this branch will be taken for std::float64_t as well as for double. If you like, I'm happy to use std::numeric_limits in the test, but I hesitated to include <limits> since it's 1900 lines.

I used std::is_same_v<T, float> in the test for float since TensorFloat-32 is the same size as float but is a different representation. So, std::float32_t will not take this branch. (Of course, it will still be handled correctly, but by the last branch. Without this PR, it doesn't work at all.)

I did include <limits> in common.cpp since it's only one file and it's already included transitively by nb_internals.h, which includes tsl/robin_map.h, which includes tsl/robin_hash.h, which includes <limits>.
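
The branch selection and the float check described in this reply might look roughly like the sketch below. It is not the code from the PR: Path, select_path, and load_f32_sketch are hypothetical names, the Python object handling is omitted, and C++17 is assumed for if constexpr.

```cpp
#include <limits>
#include <type_traits>

// The reply mentions static assertions that float and double follow
// ISO/IEC 60559 (IEEE 754); this is one way to express them.
static_assert(std::numeric_limits<double>::is_iec559 &&
              std::numeric_limits<float>::is_iec559,
              "float and double are assumed to be IEEE 754 types");

// Hypothetical labels for the three branches described in the reply.
enum class Path { binary64, binary32, generic };

template <typename T>
constexpr Path select_path() {
    if constexpr (sizeof(T) == sizeof(double))
        return Path::binary64;   // double, and any other type of that size
    else if constexpr (std::is_same_v<T, float>)
        return Path::binary32;   // exactly float, not merely "4 bytes wide"
    else
        return Path::generic;    // every other floating-point type
}

// Hypothetical float path. Because float is known to support NaN here,
// checking `d != d` is enough: a NaN double always converts to a NaN float.
inline bool load_f32_sketch(double d, bool convert, float *out) {
    float result = static_cast<float>(d);
    if (convert || static_cast<double>(result) == d || d != d) {
        *out = result;
        return true;
    }
    return false;
}
```

Testing for float by identity rather than by size keeps same-sized types with a different representation out of the binary32 branch, at the cost of routing them through the generic one.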

wjakob · 2025-01-06T02:16:12Z

I like this, the idea seems useful.

hpkfft · 2025-01-09T20:52:33Z

Here are some official quotes:

Since at least 2000, almost all machines use IEEE 754 binary floating-point arithmetic, and almost all platforms map Python floats to IEEE 754 binary64 “double precision” values.
https://docs.python.org/3/tutorial/floatingpoint.html#representation-error

and

Building CPython now requires support for IEEE 754 floating-point numbers.
The Py_NO_NAN macro has been removed. Since CPython now requires IEEE 754 floats, NaN values are always available.
https://docs.python.org/3/whatsnew/3.11.html#build-changes

and there's an answer to the question "On what systems does Python not use IEEE-754 double precision floats?"

wjakob · 2025-01-10T04:15:58Z

src/common.cpp

    }
-
-    is_float = false;


Can you re-enable this assignment? I am not sure that all compilers will understand that is_float can only be false following this conditional. Having the assignment guarantees that constant propagation will remove the check below.

wjakob · 2025-01-10T04:16:20Z

src/common.cpp

        return true;
    }
-
-    is_float = false;


Can you re-enable this assignment? I am not sure that all compilers will understand that is_float can only be false following this conditional. Having the assignment guarantees that constant propagation will remove the check below.

Done.

I had assumed this was an old work-around for a specific compiler issue and was no longer needed.
Clang does the right thing with nanobind's default -O3 optimization level.
Amusingly, with a debug build, clang performs the dead store and immediately reloads the value in the very next instruction to test whether it is false. (No constant propagation, no dead store removal.)

Honestly, I think it's better not to have this since it only applies in a code path that is not NB_LIKELY.
But then I do not have any experience with non-Linux systems/compilers....
On Linux release builds, the dead store is removed, so having it is harmless.

Feel free to change your mind; I'm happy to revert this latest commit. :)

I prefer to have it. I don't think it can do any harm in release mode, and debug mode performance is meaningless in any case.

If a compiler is bad at control flow optimization, this dead store may be helpful.
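
A minimal sketch of the pattern under discussion, assuming a simplified loader. FakeObject and load_double_sketch are hypothetical and do not reproduce the code in src/common.cpp; they only show where the redundant assignment sits relative to the fast path and the later check.

```cpp
// Hypothetical stand-in for a Python object; not a nanobind or CPython type.
struct FakeObject {
    bool is_float;  // true for an exact float
    bool is_int;    // true for an integer that may be converted
    double value;
};

// Hypothetical loader illustrating the "dead store" from the review.
inline bool load_double_sketch(const FakeObject &o, bool convert, double *out) {
    bool is_float = o.is_float;
    if (is_float) {          // fast path: exact float, no conversion needed
        *out = o.value;
        return true;
    }

    // Dead store: is_float is already false on this path. Writing it out
    // makes the known value explicit, so a compiler can fold the check
    // below even if it does not track the branch above.
    is_float = false;

    if (!is_float && convert && o.is_int) {
        *out = o.value;      // implicit conversion, only allowed with convert
        return true;
    }
    return false;
}
```

With or without the extra assignment the function returns the same results; the store only affects how easily an optimizer can simplify the second condition.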

Improve type_caster for floating-point types.

1e008be

Add dedicated detail::load_f32() function.

b025460

hpkfft added 3 commits January 9, 2025 17:59

Addressed feedback from code review.

2057e51

Update changelog.

59e449e

Merge branch 'master' into floatingpoint

44a3dca

Restore dead store.

b26f83b

If a compiler is bad at control flow optimization, this dead store may be helpful.

wjakob merged commit 9ae3ebd into wjakob:master Jan 10, 2025
31 checks passed
