Improve type_caster for floating-point types. #829
I think it would be better to keep this in a dedicated `load_f32` routine with the double-precision bits inlined. The goal is to keep binding code small when it calls `load_f32` thousands of times.
I restored `load_f32`. Note that I statically assert that both `double` and `float` adhere to ISO/IEC 60559 as documented here, so I only check `d != d` since I know that if `d` is `NaN`, then the conversion to `float` will give `NaN`. Hopefully, these assertions are true everywhere, or else I have some thinking to do....

In the case of `double`, the caster only checks `sizeof(T) == sizeof(double)`. The assumption (as documented in the comment) is that this is ISO/IEC 60559 (i.e., IEEE 754) binary64. Hopefully, this is always true for systems of interest. The good news is this branch will be taken for `std::float64_t` as well as for `double`. If you like, I'm happy to use `std::numeric_limits` in the test, but I hesitated to include `<limits>` since it's 1900 lines.

I used `std::is_same_v<T, float>` in the test for `float` since TensorFloat-32 is the same size as `float` but is a different representation. So, `std::float32_t` will not take this branch. (Of course, it will still be correct, but it will use the last branch. (Without this PR, it doesn't work at all.))

I did include `<limits>` in `common.cpp` since it's only one file and it's already included transitively by `nb_internals.h`, which includes `tsl/robin_map.h`, which includes `tsl/robin_hash.h`, which includes `<limits>`.
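For readers following along, here is a minimal sketch of the kind of narrowing check described in this reply. It is not nanobind's actual `load_f32`: the name `narrow_to_f32`, its signature, and the `allow_convert` flag are hypothetical and chosen only for illustration. It assumes, via `static_assert`, that `float` and `double` are both IEEE 754 types, which is what allows the NaN case to be handled by testing `d != d` alone.

```cpp
#include <limits>

// Assumption stated in the thread: both types follow ISO/IEC 60559 (IEEE 754).
static_assert(std::numeric_limits<double>::is_iec559 &&
              std::numeric_limits<float>::is_iec559,
              "this sketch assumes IEEE 754 float and double");

// Hypothetical helper (not nanobind's load_f32). Narrows `d` to float.
// When conversions are not allowed, it rejects any input whose value
// would change, except NaN, which narrows to NaN.
inline bool narrow_to_f32(double d, bool allow_convert, float *out) {
    float result = static_cast<float>(d);
    // The round trip differs either because precision was lost or because
    // d is NaN. Only the first case is a value-changing conversion, and
    // `d == d` is false exactly when d is NaN.
    if (!allow_convert && static_cast<double>(result) != d && d == d)
        return false;
    *out = result;
    return true;
}
```

With this sketch, `narrow_to_f32(0.5, false, &f)` succeeds because 0.5 is exactly representable in `float`, while `narrow_to_f32(0.1, false, &f)` fails because 0.1 rounds differently in single precision.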
What does `result != result && d != d` accomplish that `result != result` does not do?
I intend for the caster to work for any floating-point type. The type `T` may not have `Inf`. If that is the case, then a double precision `Inf` would be converted to `NaN`. So, `result != result` but `d == d`. This is a value-changing conversion, so we want it to fail if `noconvert()` was specified. The same can happen if `d` is large. Then the conversion would overflow (depending on rounding mode), and although `d` is finite, `result` is `NaN`.

This is a possible scenario. Nvidia, Intel, Arm, Google, AMD, and Meta have "approved" an 8-bit floating-point specification E4M3 which does not have `Inf` but does have `NaN`. (E5M2 has both `Inf` and `NaN`.)

https://www.opencompute.org/documents/ocp-8-bit-floating-point-specification-ofp8-revision-1-0-2023-12-01-pdf-1

Maybe we could just check `d != d`. But that would be wrong if `T` does not support `NaN`. I cannot immediately think of a system relevant to nanobind that has such a type, but I'd rather play it safe. The `NaN` comparison check is at the end of all the short-circuiting, so I think it won't affect performance in practical usage.
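To make the `Inf`-to-`NaN` case concrete, here is a small sketch under stated assumptions. `NoInf` is a hypothetical toy type invented for this example (it is neither E4M3 nor anything in nanobind); it wraps a `float` but maps infinite or out-of-range inputs to `NaN`. `converts_without_change` is likewise a hypothetical name for the generic check discussed above, not the PR's code.

```cpp
#include <cmath>
#include <limits>

// Hypothetical toy type: float storage, but no infinity. Infinite or
// overflowing inputs become NaN, loosely mimicking a format without Inf.
struct NoInf {
    float v;
    explicit NoInf(double d) {
        if (std::isinf(d) || std::fabs(d) > std::numeric_limits<float>::max())
            v = std::numeric_limits<float>::quiet_NaN();
        else
            v = static_cast<float>(d);  // NaN inputs stay NaN here
    }
    explicit operator double() const { return static_cast<double>(v); }
    friend bool operator!=(NoInf a, NoInf b) { return a.v != b.v; }
};

// Hypothetical generic check: true when converting d to T preserves the value.
template <typename T>
bool converts_without_change(double d) {
    T result = static_cast<T>(d);
    if (static_cast<double>(result) == d)
        return true;                    // exact round trip
    // Accept only when NaN went in and NaN came out. Testing
    // `result != result` alone would also accept Inf -> NaN.
    return result != result && d != d;
}
```

Here `converts_without_change<NoInf>(inf)` is false because `result` is `NaN` while `d` is not, whereas `converts_without_change<float>(inf)` is true because `float` represents infinity exactly.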
Can you re-enable this assignment? I am not sure that all compilers will understand that `is_float` can only be false following this conditional. Having the assignment guarantees that constant propagation will remove the check below.
Done.
I had assumed this was an old work-around for a specific compiler issue and was no longer needed.
Clang does the right thing with nanobind's default -O3 optimization level.
Amusingly, with a debug build, clang performs the dead store and immediately reloads the value in the very next instruction to test whether it is false. (No constant propagation, no dead store removal.)
Honestly, I think it's better not to have this since it only applies in a not NB_LIKELY code path.
But then I do not have any experience with non-Linux systems/compilers....
On Linux release builds, the dead store is removed, so having it is harmless.
Feel free to change your mind; I'm happy to revert this latest commit. :)
I prefer to have it. I don't think it can do any harm in release mode, and debug mode performance is in any case meaningless.