
Use dpctl.tensor.matmul in the backend of dpnp.matmul when inputs are integer #2296

Merged: 9 commits into master from fix_issue-2270 on Feb 7, 2025

Conversation

vtavana (Collaborator) commented Feb 5, 2025

resolves #2270

The OneMath (oneMKL) routines for matrix multiplication (`gemm`, `gemv`, `gemm_batch`) only support floating-point data types. If the inputs are integer, using OneMath requires upcasting them to a floating-point dtype, performing the calculation, and then casting the result back to an integer dtype, which is unsafe: for large integers, information may be lost.
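As a NumPy-only illustration of that precision loss (not code from this PR): int64 values above 2**53 are not all exactly representable in float64, so a round-trip through floating point can silently change them.

import numpy

x = numpy.array([2**53 + 1], dtype="i8")
y = x.astype("f8").astype("i8")  # upcast to float64, then cast back
print(x[0], y[0])  # 9007199254740993 9007199254740992 -> information lost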
In this PR, the logic of `dpnp.matmul` is updated to use `dpctl.tensor.matmul` when the result has an integer dtype.
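A minimal sketch of that dispatch, with a hypothetical helper name (the actual logic lives in dpnp/dpnp_utils/dpnp_utils_linearalgebra.py):

import numpy
import dpctl.tensor as dpt
import dpnp

def matmul_dispatch_sketch(x1, x2):
    # Illustrative only: pick the backend from the result dtype.
    res_dtype = dpnp.result_type(x1, x2)
    if numpy.issubdtype(res_dtype, numpy.integer):
        # Integer result: dpctl.tensor.matmul computes it exactly,
        # with no round-trip through floating point.
        res = dpt.matmul(dpnp.get_usm_ndarray(x1), dpnp.get_usm_ndarray(x2))
        return dpnp.asarray(res)
    # Floating-point result: keep the OneMath gemm/gemv/gemm_batch path.
    return dpnp.matmul(x1, x2)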

Performance Analysis

$ sycl-ls
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Data Center GPU Max 1100 12.60.7 [1.6.31294+9]
[opencl:cpu][opencl:0] Intel(R) OpenCL, Intel(R) Xeon(R) Platinum 8480+ OpenCL 3.0 (Build 0) [2025.19.1.0.16_160000]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max 1100 OpenCL 3.0 NEO  [24.39.31294]
import numpy, dpnp

# Case 1 - Matrix-matrix multiplication
n=1024
a=numpy.ones((n,n), dtype="i8")
%timeit numpy.matmul(a, a)

ia=dpnp.array(a, device="gpu")
# wait() ensures the asynchronous SYCL kernel finishes inside the timed statement
%timeit dpnp.matmul(ia, ia); ia.sycl_queue.wait()

# Case 2 - Matrix-vector multiplication
n=4096*4
a=numpy.ones((n,n), dtype="i8")
b=numpy.ones((n,), dtype="i8")
%timeit numpy.matmul(a, b)

ia=dpnp.array(a, device="gpu")
ib=dpnp.array(b, device="gpu")
%timeit dpnp.matmul(ia, ib); ia.sycl_queue.wait()

# Case 3 - Vector-matrix multiplication
n=4096*4
a=numpy.ones((n,n), dtype="i8")
b=numpy.ones((n,), dtype="i8")
%timeit numpy.matmul(b, a)

ia=dpnp.array(a, device="gpu")
ib=dpnp.array(b, device="gpu")
%timeit dpnp.matmul(ib, ia); ia.sycl_queue.wait()

# Case 4 - Batch matrix-matrix multiplication
n=256
a=numpy.ones((n,n,n), dtype="i8")
%timeit numpy.matmul(a, a)

ia=dpnp.array(a, device="gpu")
%timeit dpnp.matmul(ia, ia); ia.sycl_queue.wait()
| Case | n | NumPy | dpnp (Xeon) | dpnp (PVC) |
|------|---|-------|-------------|------------|
| Case 1 | 1024 | 4.98 s ± 867 μs | 9.82 ms ± 40.6 μs | 3.34 ms ± 4.16 μs |
| Case 2 | 4096×16 | 4.2 s ± 616 μs | 9.77 s ± 112 ms | 1.27 s ± 2.04 ms |
| Case 3 | 4096×4 | 9.6 s ± 3.3 ms | 94.6 ms ± 2.15 ms | 50.9 ms ± 38.2 μs |
| Case 4 | 256 | 3.19 s ± 291 μs | 48.9 ms ± 244 μs | 10.4 ms ± 32.3 μs |

dpnp outperforms NumPy in all cases except Case 2 on the Xeon CPU.
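Beyond speed, the change is about exactness. A small sanity check one can run (values chosen so the exact integer result is not representable in float64, i.e. the old cast-through-float path could not return it):

import numpy, dpnp

a = numpy.full((4, 4), 2**30 + 1, dtype="i8")
ia = dpnp.array(a)

expected = numpy.matmul(a, a)  # exact int64 result: 4*(2**30 + 1)**2 per entry
assert (dpnp.asnumpy(dpnp.matmul(ia, ia)) == expected).all()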

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you filing the PR as a draft?

@vtavana vtavana self-assigned this Feb 5, 2025
github-actions bot (Contributor) commented Feb 5, 2025

View rendered docs @ https://intelpython.github.io/dpnp/index.html

coveralls (Collaborator) commented Feb 5, 2025

coverage: 71.629% (+0.06%) from 71.572% when pulling 1edddbe on fix_issue-2270 into 0c455a6 on master.

github-actions bot (Contributor) commented Feb 5, 2025

Array API standard conformance tests for dpnp=0.17.0dev5=py312he4f9c94_31 ran successfully.
Passed: 971
Failed: 0
Skipped: 29

@vtavana vtavana marked this pull request as ready for review February 6, 2025 00:47
Review threads (all resolved):
  • .github/workflows/conda-package.yml
  • dpnp/dpnp_utils/dpnp_utils_linearalgebra.py
  • dpnp/tests/test_product.py
antonwolfy (Contributor) left a comment:

Thank you @vtavana, I left two more minor comments, but overall LGTM!

Further review threads on dpnp/tests/test_product.py (resolved)
@vtavana vtavana merged commit db97d59 into master Feb 7, 2025
66 of 69 checks passed
@vtavana vtavana deleted the fix_issue-2270 branch February 7, 2025 22:32
github-actions bot added a commit that referenced this pull request Feb 7, 2025:
Use dpctl.tensor.matmul in the backend of dpnp.matmul when inputs are integer (#2296) db97d59
Successfully merging this pull request may close these issues:
  • dpnp.tensordot returns wrong result (#2270)