Improve performance of `dnp.nan_to_num` #2228

ndgrigorian · 2024-12-10T05:38:08Z

This PR adds a dedicated kernel for `dnp.nan_to_num` to improve its performance, reducing the number of kernel calls to at most one in all cases.
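To illustrate why a dedicated kernel helps, here is a minimal NumPy sketch (not dpnp's code). The PR does not show the previous implementation, so `nan_to_num_multipass` is a hypothetical stand-in that assumes one mask and one masked write per special value, each being a separate pass over the data. `nan_to_num_singlepass` is likewise hypothetical and decides per element, which is roughly what a fused device kernel would do.

```python
import numpy as np


def _defaults(x, posinf, neginf):
    # Mirror NumPy's defaults: +/-inf map to the largest/smallest finite value.
    info = np.finfo(x.dtype)
    return (info.max if posinf is None else posinf,
            info.min if neginf is None else neginf)


def nan_to_num_multipass(x, nan=0.0, posinf=None, neginf=None):
    """Hypothetical composed version: every step is its own pass over x."""
    posinf, neginf = _defaults(x, posinf, neginf)
    out = x.copy()                  # copy pass
    out[np.isnan(x)] = nan          # mask pass + masked write
    out[np.isposinf(x)] = posinf    # mask pass + masked write
    out[np.isneginf(x)] = neginf    # mask pass + masked write
    return out


def nan_to_num_singlepass(x, nan=0.0, posinf=None, neginf=None):
    """Per-element logic, visiting each element exactly once."""
    posinf, neginf = _defaults(x, posinf, neginf)
    out = np.empty_like(x)
    for i, v in enumerate(x.flat):
        if np.isnan(v):
            out.flat[i] = nan
        elif v == np.inf:
            out.flat[i] = posinf
        elif v == -np.inf:
            out.flat[i] = neginf
        else:
            out.flat[i] = v
    return out


x = np.array([1.0, np.nan, np.inf, -np.inf, -2.5])
print(np.array_equal(nan_to_num_multipass(x), nan_to_num_singlepass(x)))
```

Both functions should agree with `np.nan_to_num` on real floating-point input; the difference is only in how many times the data is traversed.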

Kernels for both strided and contiguous inputs have been added. The contiguous kernel avoids allocating additional device memory for trivial strides when the input is fully C- or F-contiguous.
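The choice between the two kernels can be pictured with the sketch below. It uses NumPy arrays only so that it runs anywhere, and `pick_kernel` is a hypothetical name, not a dpnp function; the actual dispatch lives in the C++ extension and may check more conditions than shown here.

```python
import numpy as np


def pick_kernel(x):
    """Return the code path a contiguity-based dispatch would take (hypothetical)."""
    if x.flags.c_contiguous or x.flags.f_contiguous:
        # Elements occupy one dense block, so they can be indexed linearly
        # and no shape/stride metadata has to be copied to the device.
        return "contiguous"
    # General case: the kernel needs shape and strides to locate elements.
    return "strided"


a = np.zeros((4, 6))
print(pick_kernel(a))          # C-contiguous input
print(pick_kernel(a.T))        # F-contiguous view of the same data
print(pick_kernel(a[:, ::2]))  # every other column: not contiguous
```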

As an example of the performance gains, measured on a Max GPU:

master:

In [1]: import dpnp as dnp

In [2]: import numpy as np

In [3]: x_np = np.random.randn(10**9)

In [4]: x_np[np.random.choice(x_np.size, 200, replace=False)] = np.nan

In [5]: x = dnp.asarray(x_np)

In [6]: q = x.sycl_queue

In [7]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 394 ms, sys: 43.8 ms, total: 438 ms
Wall time: 304 ms

In [8]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 333 ms, sys: 31.8 ms, total: 364 ms
Wall time: 134 ms

on branch:

In [8]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 49.6 ms, sys: 8.1 ms, total: 57.7 ms
Wall time: 60.9 ms

In [9]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 22.9 ms, sys: 16 ms, total: 38.9 ms
Wall time: 19.7 ms
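For a rough reading of these timings, the short script below computes the speedup from the wall times quoted above. The bandwidth figure is only an estimate under the assumption that the float64 input (the default dtype of `np.random.randn`) is read once and the result is written once.

```python
# Wall times in milliseconds, copied from the sessions above.
master_ms = {"first call": 304.0, "second call": 134.0}
branch_ms = {"first call": 60.9, "second call": 19.7}

speedup = {k: master_ms[k] / branch_ms[k] for k in master_ms}
for call, factor in speedup.items():
    print(f"{call}: {factor:.1f}x faster on branch")

# Assumption: 10**9 float64 elements, one read plus one write of 8 bytes each.
n_bytes = 2 * 10**9 * 8
bandwidth_gb_s = n_bytes / (branch_ms["second call"] / 1e3) / 1e9
print(f"estimated effective bandwidth on branch: {bandwidth_gb_s:.0f} GB/s")
```

This gives roughly 5.0x for the first call and 6.8x for the second, with the second branch call corresponding to about 812 GB/s under the stated assumption.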

Have you provided a meaningful PR description?
Have you added a test, reproducer or referred to issue with a reproducer?
Have you tested your changes locally for CPU and GPU devices?
Have you made sure that new changes do not introduce compiler warnings?
Have you checked performance impact of proposed changes?
If this PR is a work in progress, are you filing the PR as a draft?

dpnp/dpnp_iface_mathematical.py

dpnp/backend/kernels/elementwise_functions/nan_to_num.hpp

dpnp/backend/extensions/ufunc/elementwise_functions/nan_to_num.cpp


coveralls · 2025-01-12T05:56:43Z

coverage: 71.298% (-0.02%) from 71.316%
when pulling 54dfaf5 on ndgrigorian:improve-nan-to-num-performance
into 91161a8 on IntelPython:master.


ndgrigorian requested review from antonwolfy, AlexanderKalistratov, vlad-perevezentsev and vtavana as code owners December 10, 2024 05:38

antonwolfy reviewed Dec 10, 2024


dpnp/backend/kernels/elementwise_functions/nan_to_num.hpp (review thread resolved)

ndgrigorian mentioned this pull request Dec 10, 2024

Performance issue with NaN functions #2086

Open

ndgrigorian force-pushed the improve-nan-to-num-performance branch from 48f623b to 0693c0b Compare December 26, 2024 18:39

ndgrigorian force-pushed the improve-nan-to-num-performance branch from 0693c0b to 50c28f7 Compare January 12, 2025 02:13

ndgrigorian force-pushed the improve-nan-to-num-performance branch 2 times, most recently from 0c6d3f8 to 54dfaf5 Compare January 28, 2025 19:16

ndgrigorian added 15 commits February 2, 2025 14:36

Add kernel for nan_to_num to ufunc extension

0ee8db9

Add missing headers in nan_to_num.cpp

aab60a3

Add contiguous kernel for nan_to_num

441f466

Fix missed ssize_t to dpctl::tensor::ssize_t

40af05c

Clean-up nan_to_num.cpp dead code

215c280

Use dpnp.copy instead of copy method in nan_to_num

c14f07f

Fix typo in nan_to_num

795628c

inline to_num in nan_to_num kernel

281e33b

Add additional const qualifiers in nan_to_num impl functions

43f0ae1

Use is_complex_v in nan_to_num kernel

54218bd

Simplify nan_to_num call logic

31ec48a

Use std::conditional and value_type_of_t struct to avoid constexpr branches with redundant code

Improve test coverage for nan_to_num

4e4878c

Align with changes to device_allocate_and_pack in dpctl

c4007dd

Add subgroup load and store based implementation for nan_to_num kernel

4f5514a

size_t -> std::size_t in nan_to_num Python binding

d1fb595

ndgrigorian force-pushed the improve-nan-to-num-performance branch from 5d19afb to d1fb595 Compare February 2, 2025 22:36
