Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of dnp.nan_to_num #2228

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

ndgrigorian
Copy link
Contributor

@ndgrigorian ndgrigorian commented Dec 10, 2024

This PR adds a dedicated kernel for dnp.nan_to_num to improve its performance. This reduces the number of kernel calls to at most one in all cases.

A kernel for both strided and contiguous inputs have been added, to avoid additional allocation of device memory for trivial strides when input is fully C- or F-contiguous.

For example of performance gains, using Max GPU

master:

In [1]: import dpnp as dnp

In [2]: import numpy as np

In [3]: x_np = np.random.randn(10**9)

In [4]: x_np[np.random.choice(x_np.size, 200, replace=False)] = np.nan

In [5]: x = dnp.asarray(x_np)

In [6]: q = x.sycl_queue

In [7]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 394 ms, sys: 43.8 ms, total: 438 ms
Wall time: 304 ms

In [8]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 333 ms, sys: 31.8 ms, total: 364 ms
Wall time: 134 ms

on branch:

In [8]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 49.6 ms, sys: 8.1 ms, total: 57.7 ms
Wall time: 60.9 ms

In [9]: %time r = dnp.nan_to_num(x); q.wait()
CPU times: user 22.9 ms, sys: 16 ms, total: 38.9 ms
Wall time: 19.7 ms
  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you filing the PR as a draft?

dpnp/dpnp_iface_mathematical.py Show resolved Hide resolved
dpnp/dpnp_iface_mathematical.py Outdated Show resolved Hide resolved
dpnp/dpnp_iface_mathematical.py Outdated Show resolved Hide resolved
dpnp/backend/kernels/elementwise_functions/nan_to_num.hpp Outdated Show resolved Hide resolved
dpnp/backend/kernels/elementwise_functions/nan_to_num.hpp Outdated Show resolved Hide resolved
@ndgrigorian ndgrigorian force-pushed the improve-nan-to-num-performance branch from 48f623b to 0693c0b Compare December 26, 2024 18:39
@ndgrigorian ndgrigorian force-pushed the improve-nan-to-num-performance branch from 0693c0b to 50c28f7 Compare January 12, 2025 02:13
@coveralls
Copy link
Collaborator

coveralls commented Jan 12, 2025

Coverage Status

coverage: 71.298% (-0.02%) from 71.316%
when pulling 54dfaf5 on ndgrigorian:improve-nan-to-num-performance
into 91161a8 on IntelPython:master.

@ndgrigorian ndgrigorian force-pushed the improve-nan-to-num-performance branch 2 times, most recently from 0c6d3f8 to 54dfaf5 Compare January 28, 2025 19:16
@ndgrigorian ndgrigorian force-pushed the improve-nan-to-num-performance branch from 5d19afb to d1fb595 Compare February 2, 2025 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants