Improve type_caster for floating-point types. #829

Merged
merged 6 commits into from
Jan 10, 2025
20 changes: 16 additions & 4 deletions include/nanobind/nb_cast.h
@@ -126,10 +126,22 @@ template <typename T>
 struct type_caster<T, enable_if_t<std::is_arithmetic_v<T> && !is_std_char_v<T>>> {
     NB_INLINE bool from_python(handle src, uint8_t flags, cleanup_list *) noexcept {
         if constexpr (std::is_floating_point_v<T>) {
-            if constexpr (sizeof(T) == 8)
-                return detail::load_f64(src.ptr(), flags, &value);
-            else
-                return detail::load_f32(src.ptr(), flags, &value);
+            if constexpr (sizeof(T) == 8) {
+                // Assume T, double, and Python float are all IEEE 754 binary64
+                return detail::load_f64(src.ptr(), flags, (double *) &value);
+            } else {
+                double d;
+                if (!detail::load_f64(src.ptr(), flags, &d))
+                    return false;
+                T result = static_cast<T>(d);
Owner

I think it would be better to keep this in a dedicated load_f32 routine, with the double-precision bits inlined there. The goal is to keep binding code small, since it may call load_f32 thousands of times.

Contributor Author

I restored load_f32. Note that I statically assert that both double and float adhere to ISO/IEC 60559 as documented here, so I only check d != d since I know that if d is NaN, then the conversion to float will give NaN. Hopefully, these assertions are true everywhere, or else I have some thinking to do....

In the case of double, the caster only checks sizeof(T) == sizeof(double). The assumption (as documented in the comment) is that this is ISO/IEC 60559 (i.e., IEEE 754) binary64. Hopefully, this is always true for systems of interest. The good news is this branch will be taken for std::float64_t as well as for double. If you like, I'm happy to use std::numeric_limits in the test, but I hesitated to include <limits> since it's 1900 lines.

I used std::is_same_v<T, float> in the test for float since TensorFloat-32 is the same size as float but is a different representation. So, std::float32_t will not take this branch. Of course, it will still be correct, but it will use the last branch; without this PR, it doesn't work at all.

I did include <limits> in common.cpp since it's only one file and it's already included transitively by nb_internals.h, which includes tsl/robin_map.h, which includes tsl/robin_hash.h, which includes <limits>.
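
For readers following the thread, the compile-time checks and the three-way dispatch described in this comment can be sketched roughly as below. This is only an illustration under stated assumptions: `LoadPath` and `load_path` are hypothetical names invented here, not part of nanobind, and the real caster calls `detail::load_f64` / `detail::load_f32` rather than returning a tag.

```cpp
#include <limits>
#include <type_traits>

// Compile-time guards: the narrowing logic relies on IEEE 754
// (ISO/IEC 60559) semantics for both double and float.
static_assert(std::numeric_limits<double>::is_iec559,
              "double is assumed to be IEEE 754 binary64");
static_assert(std::numeric_limits<float>::is_iec559,
              "float is assumed to be IEEE 754 binary32");

// Hypothetical tag naming the branch a floating-point type T would take.
enum class LoadPath { f64, f32, generic };

template <typename T>
constexpr LoadPath load_path() {
    static_assert(std::is_floating_point_v<T>, "floating-point types only");
    if constexpr (sizeof(T) == sizeof(double))
        return LoadPath::f64;      // size check only, as described above
    else if constexpr (std::is_same_v<T, float>)
        return LoadPath::f32;      // exact type check, not a size check
    else
        return LoadPath::generic;  // any other floating-point type
}
```

Which branch a type such as `long double` takes depends on the platform, since its size differs between ABIs.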

+                if ((flags & (uint8_t) cast_flags::convert)
+                    || static_cast<double>(result) == d
+                    || (result != result && d != d)) {
Owner

What does result != result && d != d accomplish that result != result does not do?

Contributor Author

I intend for the caster to work for any floating-point type. The type T may not have Inf. If that is the case, then a double precision Inf would be converted to NaN. So, result != result but d == d. This is a value-changing conversion, so we want it to fail if noconvert() was specified. The same can happen if d is large. Then the conversion would overflow (depending on rounding mode), and although d is finite, result is NaN.
This is a possible scenario. Nvidia, Intel, Arm, Google, AMD, and Meta have "approved" an 8-bit floating-point specification E4M3 which does not have Inf but does have NaN. (E5M2 has both Inf and NaN.)
https://www.opencompute.org/documents/ocp-8-bit-floating-point-specification-ofp8-revision-1-0-2023-12-01-pdf-1

Maybe we could just check d != d. But that would be wrong if T does not support NaN. I cannot immediately think of a system relevant to nanobind that has such a type, but I'd rather play it safe. The NaN comparison check is at the end of all the short-circuiting, so I think it won't affect performance in practical usage.
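
A minimal sketch of the check discussed here, outside of nanobind: `narrow` is a hypothetical helper that mirrors the three-way condition from the diff, and `NoInf` is a toy type invented for this example that stands in for a format without Inf (it is not an E4M3 implementation). The `convert` parameter plays the role of the `cast_flags::convert` bit.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: accept the narrowed value if conversion is allowed,
// if it round-trips exactly, or if both source and result are NaN.
template <typename T>
bool narrow(double d, bool convert, T *out) {
    T result = static_cast<T>(d);
    if (convert
        || static_cast<double>(result) == d
        || (result != result && d != d)) {
        *out = result;
        return true;
    }
    return false;
}

// Toy type (assumption for illustration): has NaN but no Inf, so an
// infinite input becomes NaN on conversion.
struct NoInf {
    float v = 0.0f;
    NoInf() = default;
    explicit NoInf(double d)
        : v(std::isinf(d) ? std::numeric_limits<float>::quiet_NaN()
                          : static_cast<float>(d)) {}
    explicit operator double() const { return static_cast<double>(v); }
    bool operator!=(const NoInf &o) const { return v != o.v; }
};
```

With `NoInf`, an infinite input yields `result != result` while `d == d`, so the call is rejected when `convert` is false. Checking `result != result` alone would have accepted that value-changing conversion.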

+                    value = result;
+                    return true;
+                }
+                return false;
+            }
         } else {
             if constexpr (std::is_signed_v<T>) {
                 if constexpr (sizeof(T) == 8)
1 change: 0 additions & 1 deletion include/nanobind/nb_lib.h
@@ -512,7 +512,6 @@ NB_CORE bool load_i32(PyObject *o, uint8_t flags, int32_t *out) noexcept;
 NB_CORE bool load_u32(PyObject *o, uint8_t flags, uint32_t *out) noexcept;
 NB_CORE bool load_i64(PyObject *o, uint8_t flags, int64_t *out) noexcept;
 NB_CORE bool load_u64(PyObject *o, uint8_t flags, uint64_t *out) noexcept;
-NB_CORE bool load_f32(PyObject *o, uint8_t flags, float *out) noexcept;
 NB_CORE bool load_f64(PyObject *o, uint8_t flags, double *out) noexcept;

 // ========================================================================
31 changes: 2 additions & 29 deletions src/common.cpp
@@ -904,18 +904,16 @@ bool load_f64(PyObject *o, uint8_t flags, double *out) noexcept {

 #if !defined(Py_LIMITED_API)
     if (NB_LIKELY(is_float)) {
-        *out = (double) PyFloat_AS_DOUBLE(o);
+        *out = PyFloat_AS_DOUBLE(o);
         return true;
     }
-
-    is_float = false;
Owner

Can you re-enable this assignment? I am not sure that all compilers will understand that is_float can only be false following this conditional. Having the assignment guarantees that constant propagation will remove the check below.

Contributor Author

Done.

I had assumed this was an old work-around for a specific compiler issue and was no longer needed.
Clang does the right thing with nanobind's default -O3 optimization level.
Amusingly, with a debug build, clang performs the dead store and immediately reloads the value in the very next instruction to test whether it is false. (No constant propagation, no dead store removal.)

Honestly, I think it's better not to have this since it only applies in a not NB_LIKELY code path.
But then I do not have any experience with non-Linux systems/compilers....
On Linux release builds, the dead store is removed, so having it is harmless.

Feel free to change your mind; I'm happy to revert this latest commit. :)

Owner

I prefer to have it. I don't think it can do any harm in release mode, and debug mode performance is in any case meaningless.
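
The pattern under discussion can be sketched without the Python C API as follows. Everything here is an assumption made for illustration: `Obj` is a hypothetical stand-in for a Python object, `FAST_PATH` stands in for the `!defined(Py_LIMITED_API)` guard, and `load` is not nanobind's function.

```cpp
// Hypothetical stand-in for a Python object.
struct Obj {
    bool is_exact_float;  // plays the role of PyFloat_CheckExact(o)
    bool convertible;     // whether a generic conversion would succeed
    double value;
};

#define FAST_PATH 1  // stands in for !defined(Py_LIMITED_API)

bool load(const Obj &o, bool convert, double *out) {
    bool is_float = o.is_exact_float;

#if FAST_PATH
    if (is_float) {
        *out = o.value;
        return true;
    }

    // Redundant at run time (is_float must be false here), but it states
    // the fact explicitly so the compiler can fold the test below.
    is_float = false;
#endif

    if (is_float || convert) {
        if (o.convertible) {
            *out = o.value;
            return true;
        }
    }

    return false;
}
```

When the fast path is compiled out, `is_float` keeps its original value and the slow path still handles exact floats, which is why the variable appears in the second condition at all.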

 #endif

     if (is_float || (flags & (uint8_t) cast_flags::convert)) {
         double result = PyFloat_AsDouble(o);

         if (result != -1.0 || !PyErr_Occurred()) {
-            *out = (double) result;
+            *out = result;
             return true;
         } else {
             PyErr_Clear();
@@ -925,31 +923,6 @@ bool load_f64(PyObject *o, uint8_t flags, double *out) noexcept {
     return false;
 }

-bool load_f32(PyObject *o, uint8_t flags, float *out) noexcept {
-    bool is_float = PyFloat_CheckExact(o);
-
-#if !defined(Py_LIMITED_API)
-    if (NB_LIKELY(is_float)) {
-        *out = (float) PyFloat_AS_DOUBLE(o);
-        return true;
-    }
-
-    is_float = false;
Owner

Can you re-enable this assignment? I am not sure that all compilers will understand that is_float can only be false following this conditional. Having the assignment guarantees that constant propagation will remove the check below.

-#endif
-
-    if (is_float || (flags & (uint8_t) cast_flags::convert)) {
-        double result = PyFloat_AsDouble(o);
-
-        if (result != -1.0 || !PyErr_Occurred()) {
-            *out = (float) result;
-            return true;
-        } else {
-            PyErr_Clear();
-        }
-    }
-
-    return false;
-}

 #if !defined(Py_LIMITED_API) && !defined(PYPY_VERSION) && PY_VERSION_HEX < 0x030c0000
 // Direct access for compact integers. These functions are