Improve performance of `dnp.nan_to_num` (#2228)
This PR adds a dedicated kernel for `dnp.nan_to_num` to improve its performance. This reduces the number of kernel calls to at most one in all cases.

Kernels for both strided and contiguous inputs have been added, to avoid an additional allocation of device memory for trivial strides when the input is fully C- or F-contiguous.

As an example of the performance gains, measured on a Max GPU.

On master:

```python
In [1]: import dpnp as dnp

In [2]: import numpy as np

In [3]: x_np = np.random.randn(10**9)

In [4]: x_np[np.random.choice(x_np.size, 200, replace=False)] = np.nan

In [5]: x = dnp.asarray(x_np)

In [6]: q = x.sycl_queue

In [7]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 394 ms, sys: 43.8 ms, total: 438 ms
Wall time: 304 ms

In [8]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 333 ms, sys: 31.8 ms, total: 364 ms
Wall time: 134 ms
```

On branch:

```python
In [8]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 49.6 ms, sys: 8.1 ms, total: 57.7 ms
Wall time: 60.9 ms

In [9]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 22.9 ms, sys: 16 ms, total: 38.9 ms
Wall time: 19.7 ms
```
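To illustrate why a single kernel suffices, here is a minimal host-side sketch of the per-element logic such a fused kernel might apply. It is written against NumPy only and is not the code added by this PR; the helper names `replace_one` and `nan_to_num_single_pass` are hypothetical, and the default replacement values follow `np.nan_to_num` (NaN becomes `0.0`, infinities become the largest finite values of the dtype).

```python
import math

import numpy as np


def replace_one(v, nan=0.0, posinf=None, neginf=None, dtype=np.float64):
    """Hypothetical per-element rule: handle NaN, +inf and -inf in one visit."""
    info = np.finfo(dtype)
    if math.isnan(v):
        return nan
    if math.isinf(v):
        if v > 0:
            return info.max if posinf is None else posinf
        return info.min if neginf is None else neginf
    return v


def nan_to_num_single_pass(x, nan=0.0, posinf=None, neginf=None):
    """Visit every element exactly once (illustrative, float64 only)."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    # Iterating by index works for any strides, so no contiguous copy is needed.
    for idx in np.ndindex(x.shape):
        out[idx] = replace_one(float(x[idx]), nan, posinf, neginf)
    return out


sample = np.array([1.5, np.nan, np.inf, -np.inf, -2.0])
result = nan_to_num_single_pass(sample)
reference = np.nan_to_num(sample)
```

Because every replacement is decided while the element is being read, there is no need for separate masking passes over the array, which is the idea behind reducing the operation to one kernel launch.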