Currently, the implementations of min_element and max_element for floating-point vec3 use the built-in min/max functions. These functions explicitly handle NaN values and do not propagate them, as is explained here.
This is potentially desired behavior; however, it is semantically different from the meaning of min in vectorized contexts, where NaN values would be propagated.
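To make the NaN handling concrete, here is a minimal sketch that relies only on the documented behavior of the standard `f32::min` (if one argument is NaN, the other argument is returned). The helper names `std_min_with_nan` and `std_min_both_nan` are made up for illustration and are not part of the library.

```rust
/// Hypothetical helper: calls `f32::min` with NaN as the first operand
/// and then as the second operand, returning both results.
pub fn std_min_with_nan(x: f32) -> (f32, f32) {
    (f32::NAN.min(x), x.min(f32::NAN))
}

/// Hypothetical helper: `f32::min` when both operands are NaN.
pub fn std_min_both_nan() -> f32 {
    f32::NAN.min(f32::NAN)
}
```

With a non-NaN `x`, both results equal `x`: the NaN is dropped rather than propagated. Only when both operands are NaN does the result stay NaN.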
The extra handling of NaNs in these functions incurs significant performance overhead.
In benchmarks on my computer, a custom f32::min implementation that does not check for NaN was ~25% faster.
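For anyone who wants to reproduce a measurement like this, the following is a rough timing sketch, not the benchmark that produced the number above. The names `select_min`, `time_min`, and `sample_data` are assumptions introduced here, and a proper benchmark harness would give more trustworthy numbers than a single `Instant` measurement.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical comparison-only minimum: no explicit NaN handling.
#[inline]
pub fn select_min(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}

/// Applies `f` as a three-way minimum to every [x, y, z] triple and
/// returns the elapsed time together with the sum of the minima.
pub fn time_min(data: &[[f32; 3]], f: impl Fn(f32, f32) -> f32) -> (Duration, f32) {
    let start = Instant::now();
    let mut acc = 0.0f32;
    for v in data {
        // black_box keeps the optimizer from precomputing the inputs.
        let v = black_box(*v);
        acc += f(v[0], f(v[1], v[2]));
    }
    (start.elapsed(), black_box(acc))
}

/// Deterministic, NaN-free sample input.
pub fn sample_data(n: usize) -> Vec<[f32; 3]> {
    (0..n)
        .map(|i| {
            let t = i as f32;
            [t.sin(), (t * 0.5).cos(), (t * 0.25).sin()]
        })
        .collect()
}
```

Calling `time_min(&data, f32::min)` and `time_min(&data, select_min)` on the same `sample_data` output, in a release build, gives two durations to compare. The returned sums should match on NaN-free input, which doubles as a sanity check that both variants compute the same thing.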
    // Current implementation
    pub fn min_element(self) -> f32 {
        self.x.min(self.y.min(self.z))
    }

    // Faster implementation
    pub fn min_element(self) -> f32 {
        let min = |a, b| if a < b { a } else { b };
        min(self.x, min(self.y, self.z))
    }
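To compare the two variants side by side, here is a self-contained sketch. The `Vec3` struct below is a stand-in defined only for this example (it is not the library's type), and the method names `min_element_std` and `min_element_select` are hypothetical.

```rust
/// Stand-in vector type for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vec3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

impl Vec3 {
    /// Mirrors the current implementation: `f32::min` drops NaN operands.
    pub fn min_element_std(self) -> f32 {
        self.x.min(self.y.min(self.z))
    }

    /// Mirrors the proposed implementation: a plain compare-and-select.
    /// Any comparison involving NaN is false, so the second operand wins.
    pub fn min_element_select(self) -> f32 {
        let min = |a: f32, b: f32| if a < b { a } else { b };
        min(self.x, min(self.y, self.z))
    }
}
```

On NaN-free input both methods agree. With NaN present, the compare-and-select version depends on operand position: with this nesting, a NaN in `z` reaches the result, while a NaN in `x` or `y` is dropped. So it neither ignores NaN consistently nor propagates it consistently, which may matter when deciding whether the change is acceptable.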
If I'm missing something please let me know, but otherwise this seems like an easy change which could make a significant difference in performance for some use-cases.
I didn't have the time to look into other uses of floating-point min throughout the library, but I'd guess its replacement could improve performance elsewhere as well.
Interesting. I guess some digging on godbolt is required to determine whether this manual "ternary style" replacement compiles to the expected instruction on all architectures.
Another thing I didn't consider with this proposal is non-optimized code-gen. This method could cause a substantial performance degradation in unoptimized builds. However, I'm not sure whether that is actually the case, or how important it is for the goals of this library. It would need to be tested.
We can see why this is so much faster by comparing the generated assembly: https://godbolt.org/z/3Ph1hWx8e