Add syclSolverInverter to use oneMKL on Intel GPUs #7

jngkim · 2022-05-11T00:23:20Z

Please review the developer documentation
on the wiki of this project that contains help and requirements.

Proposed changes

Add syclSolverInverter and support functions in Platforms/SYCL.

What type(s) of changes does this code introduce?

Added QMCWaveFunctions/Fermion/syclSolverInverter.hpp and a unit test
Modified syclBLAS and sycl_determinant_helper
Modified CMakeLists.txt to use QMCPACK provided config file

Does this introduce a breaking change?

No

What systems has this change been tested on?

Intel GPU using nightly builds

Checklist

Yes. This PR is up to date with the current state of 'develop'
No. Code added or changed in the PR has been clang-formatted
Yes. This PR adds tests to cover any new code, or to catch a bug that is being fixed
No. Documentation has been added (if appropriate)

ye-luo · 2022-05-17T21:06:25Z

src/Platforms/SYCL/syclBLAS.cpp

+  const size_t m_max         = ((m + tile_size - 1) / tile_size) * tile_size;
+  const size_t n_max         = ((n + tile_size - 1) / tile_size) * tile_size;
+
+  return q.submit([&](sycl::handler& cgh) {


Could you remove all the handlers and use the compact form of q.parallel_for with events?
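For readers following the hunk above: `m_max` and `n_max` round `m` and `n` up to the next multiple of `tile_size`, so the launch range covers whole tiles and the kernel has to guard against out-of-range indices. Below is a minimal host-only sketch of that arithmetic in plain C++ (no SYCL). The helper names `round_up_to_tile` and `count_guarded_items` are illustrative assumptions and do not come from the PR.

```cpp
#include <cassert>
#include <cstddef>

// Round `extent` up to the nearest multiple of `tile_size`
// (same expression as m_max / n_max in the hunk above).
inline std::size_t round_up_to_tile(std::size_t extent, std::size_t tile_size)
{
  assert(tile_size > 0);
  return ((extent + tile_size - 1) / tile_size) * tile_size;
}

// Host emulation of a padded 2D launch: iterate over the padded range and
// count the work-items that pass the bounds guard a kernel would need.
inline std::size_t count_guarded_items(std::size_t m, std::size_t n, std::size_t tile_size)
{
  const std::size_t m_max = round_up_to_tile(m, tile_size);
  const std::size_t n_max = round_up_to_tile(n, tile_size);
  std::size_t active = 0;
  for (std::size_t i = 0; i < m_max; ++i)
    for (std::size_t j = 0; j < n_max; ++j)
      if (i < m && j < n) // guard: padded items do no work
        ++active;
  return active;
}
```

With a hypothetical tile size of 16, the test size N = 911 used later in this PR pads to 912, and the guarded item count always equals `m * n` regardless of padding.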

ye-luo · 2022-05-17T21:13:19Z

src/QMCWaveFunctions/detail/SYCL/sycl_determinant_helper.hpp

+                                   int n,
+                                   int lda,
+                                   const TMAT* a,
+                                   const INDEX* pivot,


Likely we should just hard-code int64_t

I don't think this is better than what the CUDA/HIP path does, because it is blocking. The CUDA code path just transfers the pivot and diagonal terms to the host asynchronously.
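As a rough illustration of the host-side alternative mentioned here, the sketch below computes the log-determinant from the LU diagonal and the pivot array once both have been copied to the host. It is an assumption-laden sketch, not the PR's or the CUDA path's code: the function name `log_det_from_lu` is hypothetical, pivots are assumed to be 1-based as in LAPACK-style `getrf`, and the pivot type is fixed to `std::int64_t` as suggested above.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: log(det(A)) from the diagonal of the LU factors and
// the pivot array, both already on the host.
// Assumption: pivot[i] is 1-based (LAPACK convention), so pivot[i] != i + 1
// means row i was swapped and the determinant changes sign.
template<typename T>
std::complex<double> log_det_from_lu(const std::vector<T>& lu_diag, const std::vector<std::int64_t>& pivot)
{
  assert(lu_diag.size() == pivot.size());
  std::complex<double> logdet(0.0, 0.0);
  for (std::size_t i = 0; i < lu_diag.size(); ++i)
  {
    std::complex<double> d(lu_diag[i]);
    if (pivot[i] != static_cast<std::int64_t>(i + 1))
      d = -d; // each row swap flips the sign of the determinant
    logdet += std::log(d); // complex log keeps track of the phase
  }
  return logdet;
}
```

For the 2x2 matrix {{4, 3}, {6, 3}}, partial pivoting gives the diagonal {6, 1} and pivots {2, 2}, and `std::exp` of the result recovers the determinant -6 up to rounding.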

ye-luo · 2022-05-17T21:15:18Z

src/QMCWaveFunctions/tests/test_syclSolverInverter.cpp

+{
+  const int N = 911;
+
+#ifdef MIXED_PRECISION


Can we remove the ifdef and always test both?
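One way to read this request is to template the test body on the value type and instantiate it for both precisions, instead of selecting one with `MIXED_PRECISION`. The sketch below shows the idea in plain C++ under stated assumptions: `invert2x2` is a hypothetical stand-in for the solver under test, and the tolerance of 100 machine epsilons is an arbitrary illustrative choice, not a value from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-in for the inverter under test: 2x2, row-major.
template<typename T>
void invert2x2(const T a[4], T inv[4])
{
  const T det = a[0] * a[3] - a[1] * a[2];
  inv[0] = a[3] / det;
  inv[1] = -a[1] / det;
  inv[2] = -a[2] / det;
  inv[3] = a[0] / det;
}

// Largest deviation of A * A^{-1} from the identity for value type T.
template<typename T>
T max_identity_error()
{
  const T a[4] = {T(4), T(3), T(6), T(3)};
  T inv[4];
  invert2x2(a, inv);
  T err = T(0);
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
    {
      T sum = T(0);
      for (int k = 0; k < 2; ++k)
        sum += a[i * 2 + k] * inv[k * 2 + j];
      err = std::max(err, std::abs(sum - (i == j ? T(1) : T(0))));
    }
  return err;
}

// Precision-dependent tolerance, so one test body serves both types.
template<typename T>
bool passes()
{
  return max_identity_error<T>() < T(100) * std::numeric_limits<T>::epsilon();
}

// Always exercise both precisions; no preprocessor switch involved.
inline bool run_both_precisions() { return passes<float>() && passes<double>(); }
```

In a unit-test framework the same structure would typically become one templated test case instantiated for each type.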

jngkim added 3 commits May 10, 2022 14:23

Add syclSolverInverter routines.

27390ec

Merge DelayedUpdateSYCL.h

ead9857

Use syclSolverInverter in DelayedUpdateSYCL.h

2e3012d

ye-luo closed this May 11, 2022

ye-luo reopened this May 11, 2022

ye-luo closed this May 11, 2022

ye-luo reopened this May 11, 2022

ye-luo and others added 4 commits May 10, 2022 19:28

Merge remote-tracking branch 'origin/develop' into sycl-allocator-solver

6b5d0ed

Merge branch 'QMCPACK:develop' into sycl-allocator-solver

5d40b0c

Fix complex compilation.

b5fb457

Formatting

64fcaff

ye-luo reviewed May 17, 2022

jngkim and others added 18 commits May 26, 2022 12:57

Merge branch 'QMCPACK:develop' into sycl-allocator-solver

6222150

Merge branch 'QMCPACK:develop' into sycl-allocator-solver

4eaa3c6

Merge branch 'QMCPACK:develop' into sycl-allocator-solver

58776ff

Merge branch 'QMCPACK:develop' into sycl-allocator-solver

9958031

Merge branch 'QMCPACK:develop' into sycl-allocator-solver

42005de

Update with interop and add waits.

c6ee364

Merge branch 'QMCPACK:develop' into sycl-allocator-solver

8edea7a

Add tests for Rotated SPOs using LCAO

10c90b2

A python script (using autograd) is used to generate reference values for the values of the wavefunction and derivatives at one point in space. The test system is a helium atom with two orbitals and optionally a Jastrow factor.

Merge pull request QMCPACK#4059 from markdewing/test_rotated_lcao

01e2c26

Add tests for Rotated SPOs using LCAO

Add data access APIs in OhmmsArray

58f01ad

Renaming variables.

6685f2f

Replace operator() with parameter packing.

7f77018

Rename data(offset) to data_at(indices)

c210527

Replace redundant code with a function.

ba4363b

Grouping doxygen comments.

036d619

Expand a bit unit test.

908f40e

Adjust phi_vgl layout.

d362348

Align SplineC2COMPTarget layout as SplineC2ROMPTarget

4c96e70

prckent and others added 25 commits June 20, 2022 12:25

Merge pull request QMCPACK#4067 from ye-luo/testing-scripts

de1db4f

Update bora test scripts and remove unused test scripts

Orbital rotation test with legacy driver

e7027ae

The gen_rotated_lcao_wf.py file is extended to generate QMC averages for the parameter gradients. A test for the legacy driver using one thread is introduced.

Do not use cudaDeviceProp maxTexture1D in HIP

0a0d55a

Merge pull request QMCPACK#4070 from jakurzak/develop

7193bc4

Do not use cudaDeviceProp maxTexture1D in HIP

Merge branch 'develop' into he_orb_rot_test

9f8548b

Merge pull request QMCPACK#4069 from markdewing/he_orb_rot_test

3637ba2

Add orbital rotation test with legacy driver

Add MPI support to ROCm legacy CI

cdf7430

For CI running on nitrogen

Merge pull request QMCPACK#4071 from williamfgc/ci-rocm-legacy-mpi

cf217ed

Add MPI support to ROCm legacy CI

Rewrite loop to avoid NVHPC hang.

e0afba1

Remove omp parallel over walkers.

49bfe4b

Merge pull request QMCPACK#4073 from ye-luo/avoid-nvhpc-hang

01e69af

Rewrite loop to workaround NVHPC bugs

Merge branch 'develop' into remove-parallel

30d63d2

Merge pull request QMCPACK#4074 from ye-luo/remove-parallel

f98bddf

Remove omp parallel over walkers

Add MSD::mw_accept_rejectMove unit test

608cdab

Add mock-up mw_accept_rejectMove in MSD.

2fce443

drop standalone-debug for offload builds

17160d4

Merge pull request QMCPACK#4072 from ye-luo/MSD_mw_accept_reject

c0ea563

Add mock-up mw_accept_rejectMove in MSD.

adding a message about -fstandalone-debug

8019f7f

Merge branch 'develop' into remove_annoying_standalone_warning

7e9baad

Guard div by 0

179a2d0

Merge pull request QMCPACK#4080 from PDoakORNL/remove_annoying_standa…

01b4f73

…lone_warning Drop standalone-debug for offload builds

Merge branch 'develop' into fixr

7559242

Merge pull request QMCPACK#4082 from prckent/fixr

7d8618d

Fix test_structure divide by 0

Merge branch 'QMCPACK:develop' into sycl-allocator-solver

5867ffb

Fix to avoid double-free with latest compilers.

b89a911

ye-luo force-pushed the sycl-allocator branch from 8e5221c to 76c6bc8 Compare July 20, 2022 00:23

ye-luo force-pushed the sycl-allocator branch from d36f048 to 475c84a Compare January 3, 2023 20:24

ye-luo pushed a commit that referenced this pull request Nov 21, 2023

Merge pull request #7 from jptowns/make_variableset_real_for_ray

f486407

Make variableset real for ray

ye-luo pushed a commit that referenced this pull request Dec 13, 2023

Merge pull request #7 from camelto2/kappa_list_unpacking_for_ray

4ae4e66

fix to derivatives

ye-luo pushed a commit that referenced this pull request Jun 2, 2024

Merge pull request #7 from correaa/afqmc-use-multi-algo-priority-system

5b49a87

only use this overload if semantically correct
