Improve performance of dnp.nan_to_num (#2228)

This PR adds a dedicated kernel for `dnp.nan_to_num` to improve its performance. This reduces the number of kernel calls to at most one in all cases.

Kernels for both strided and contiguous inputs have been added, to avoid allocating additional device memory for trivial strides when the input is fully C- or F-contiguous.

As an example of the performance gains, on a Max GPU, using master:

```python
In [1]: import dpnp as dnp

In [2]: import numpy as np

In [3]: x_np = np.random.randn(10**9)

In [4]: x_np[np.random.choice(x_np.size, 200, replace=False)] = np.nan

In [5]: x = dnp.asarray(x_np)

In [6]: q = x.sycl_queue

In [7]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 394 ms, sys: 43.8 ms, total: 438 ms
Wall time: 304 ms

In [8]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 333 ms, sys: 31.8 ms, total: 364 ms
Wall time: 134 ms
```

On this branch:

```python
In [8]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 49.6 ms, sys: 8.1 ms, total: 57.7 ms
Wall time: 60.9 ms

In [9]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 22.9 ms, sys: 16 ms, total: 38.9 ms
Wall time: 19.7 ms
```
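To picture why a single dedicated kernel helps, the sketch below builds `nan_to_num` semantics from separate elementwise passes, each of which would be its own kernel launch on a device. It is written in plain NumPy for illustration only: `nan_to_num_multipass` is a hypothetical helper, not the previous dpnp implementation, and the contiguity flags at the end only show the kind of check that could select between a contiguous and a strided kernel.

```python
import numpy as np


def nan_to_num_multipass(x, nan=0.0, posinf=None, neginf=None):
    """Hypothetical multi-pass version: one elementwise pass per special value."""
    info = np.finfo(x.dtype)
    posinf = info.max if posinf is None else posinf
    neginf = info.min if neginf is None else neginf
    out = np.where(np.isnan(x), nan, x)            # pass 1: replace NaN
    out = np.where(np.isposinf(out), posinf, out)  # pass 2: replace +inf
    out = np.where(np.isneginf(out), neginf, out)  # pass 3: replace -inf
    return out.astype(x.dtype, copy=False)


x = np.array([1.5, np.nan, np.inf, -np.inf, -2.0])

# Reference result from NumPy's own nan_to_num.
expected = np.nan_to_num(x)
result = nan_to_num_multipass(x)

# A strided view of the same data is neither C- nor F-contiguous,
# so it would need the strided code path rather than the contiguous one.
strided = x[::2]
print(x.flags.c_contiguous, strided.flags.c_contiguous)
```

Each `np.where` call above reads and writes the whole array and allocates a temporary; a fused kernel can make all three replacements in one pass over the data.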
IntelPython · Feb 5, 2025 · 77702b3
1 parent 5b140db
commit 77702b3
Showing 7 changed files with 770 additions and 19 deletions.
diff --git a/dpnp/backend/extensions/ufunc/CMakeLists.txt b/dpnp/backend/extensions/ufunc/CMakeLists.txt
@@ -38,6 +38,7 @@ set(_elementwise_sources
     ${CMAKE_CURRENT_SOURCE_DIR}/elementwise_functions/lcm.cpp
     ${CMAKE_CURRENT_SOURCE_DIR}/elementwise_functions/ldexp.cpp
     ${CMAKE_CURRENT_SOURCE_DIR}/elementwise_functions/logaddexp2.cpp
+    ${CMAKE_CURRENT_SOURCE_DIR}/elementwise_functions/nan_to_num.cpp
     ${CMAKE_CURRENT_SOURCE_DIR}/elementwise_functions/radians.cpp
     ${CMAKE_CURRENT_SOURCE_DIR}/elementwise_functions/sinc.cpp
     ${CMAKE_CURRENT_SOURCE_DIR}/elementwise_functions/spacing.cpp

diff --git a/dpnp/backend/extensions/ufunc/elementwise_functions/common.cpp b/dpnp/backend/extensions/ufunc/elementwise_functions/common.cpp
@@ -38,6 +38,7 @@
 #include "lcm.hpp"
 #include "ldexp.hpp"
 #include "logaddexp2.hpp"
+#include "nan_to_num.hpp"
 #include "radians.hpp"
 #include "sinc.hpp"
 #include "spacing.hpp"
@@ -64,6 +65,7 @@ void init_elementwise_functions(py::module_ m)
     init_lcm(m);
     init_ldexp(m);
     init_logaddexp2(m);
+    init_nan_to_num(m);
     init_radians(m);
     init_sinc(m);
     init_spacing(m);