Add syclSolverInverter to use oneMKL on Intel GPUs #7

Open
wants to merge 70 commits into
base: sycl-allocator


jngkim

@jngkim jngkim commented May 11, 2022

Please review the developer documentation
on the wiki of this project that contains help and requirements.

Proposed changes

Add syclSolverInverter and support functions in Platforms/SYCL.

What type(s) of changes does this code introduce?

  • Added QMCWaveFunctions/Fermion/syclSolverInverter.hpp and a unit test
  • Modified syclBLAS and sycl_determinant_helper
  • Modified CMakeLists.txt to use the QMCPACK-provided config file

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

  • Intel GPU using nightly builds

Checklist

  • Yes. This PR is up to date with the current state of 'develop'
  • No. Code added or changed in the PR has been clang-formatted
  • Yes. This PR adds tests to cover any new code, or to catch a bug that is being fixed
  • No. Documentation has been added (if appropriate)

@ye-luo ye-luo closed this May 11, 2022
@ye-luo ye-luo reopened this May 11, 2022
@ye-luo ye-luo closed this May 11, 2022
@ye-luo ye-luo reopened this May 11, 2022
const size_t m_max = ((m + tile_size - 1) / tile_size) * tile_size;
const size_t n_max = ((n + tile_size - 1) / tile_size) * tile_size;

return q.submit([&](sycl::handler& cgh) {
Owner

Could you remove all the explicit handler (q.submit) code and use the compact form of q.parallel_for with events?
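The review context above pads m and n up to whole tiles (m_max, n_max) before the kernel launch, so the global range is a multiple of tile_size. A minimal plain C++ sketch of that round-up arithmetic; the helper name round_up_to_tile is hypothetical and not taken from the PR:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round n up to the nearest multiple of tile_size,
// the same expression used for m_max and n_max in the review context.
std::size_t round_up_to_tile(std::size_t n, std::size_t tile_size)
{
  assert(tile_size > 0); // a zero tile size would divide by zero
  return ((n + tile_size - 1) / tile_size) * tile_size;
}
```

For example, with N = 911 (the matrix size used in the unit test) and an assumed tile size of 16, the padded range covers 912 rows, so a kernel launched over it needs to skip indices at or beyond m and n.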

int n,
int lda,
const TMAT* a,
const INDEX* pivot,
Owner

We should probably just hard-code int64_t.
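A minimal sketch of what dropping the INDEX template parameter could look like, assuming 1-based LAPACK-style pivots (to our understanding, oneMKL's SYCL LAPACK routines such as getrf report pivots as std::int64_t). The function permutation_sign is hypothetical and is not code from the PR; it only shows a non-templated signature consuming the pivots:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper with the pivot type fixed to std::int64_t rather than
// a template parameter. Pivots are 1-based: row i was interchanged with row
// pivot[i] - 1. Returns the sign of the permutation (+1 or -1), which flips
// once for every actual interchange.
int permutation_sign(int n, const std::int64_t* pivot)
{
  assert(n >= 0);
  int sign = 1;
  for (int i = 0; i < n; ++i)
    if (pivot[i] != i + 1)
      sign = -sign;
  return sign;
}
```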

Owner

I don't think this is better than what CUDA/HIP does, because it is blocking. The CUDA code path just transfers the pivot and diagonal terms to the host asynchronously.
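A minimal host-side sketch of the CUDA-style alternative described in the comment: once the pivots and the diagonal of the LU factor have been copied to the host, log(det A) can be accumulated there without blocking on the device. The function log_det_from_lu is hypothetical, the pivots are assumed to be 1-based LAPACK-style, and the asynchronous device-to-host copy itself is not shown:

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: given the diagonal of an LU factor and its 1-based
// pivots (both already on the host), return log(det A) as a complex number.
// Each row interchange multiplies det by -1, i.e. adds i*pi to the log.
// A negative real diagonal entry gets its phase (pi) from std::log.
std::complex<double> log_det_from_lu(const std::vector<std::complex<double>>& diag,
                                     const std::vector<std::int64_t>& pivot)
{
  assert(diag.size() == pivot.size());
  const double pi = 3.14159265358979323846;
  std::complex<double> log_det{0.0, 0.0};
  for (std::size_t i = 0; i < diag.size(); ++i)
  {
    log_det += std::log(diag[i]);
    if (pivot[i] != static_cast<std::int64_t>(i) + 1)
      log_det += std::complex<double>(0.0, pi);
  }
  return log_det;
}
```

Only n diagonal values and n pivots cross to the host, which is why the copy can be overlapped with other work instead of waiting on a blocking call.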

{
const int N = 911;

#ifdef MIXED_PRECISION
Owner

Can we remove the #ifdef and always test both precisions?
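One way to act on this suggestion is to write the check once as a template over the floating-point type and call it for both float and double, instead of picking one with #ifdef MIXED_PRECISION. A minimal sketch under that assumption; check_diagonal_inverse and run_both_precisions are hypothetical toy functions, not the PR's test (if the unit tests use Catch2, its TEMPLATE_TEST_CASE macro is another way to express the same idea):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical precision-generic check: invert the entries of a diagonal
// n x n matrix in precision T and verify d * (1/d) == 1 within a tolerance
// scaled by T's machine epsilon.
template<typename T>
bool check_diagonal_inverse(int n)
{
  assert(n > 0);
  const T tol = T(100) * std::numeric_limits<T>::epsilon();
  for (int i = 0; i < n; ++i)
  {
    const T d   = T(1) + T(i) / T(n); // diagonal entries in [1, 2)
    const T inv = T(1) / d;
    if (std::abs(d * inv - T(1)) > tol)
      return false;
  }
  return true;
}

// Run the same check in both precisions rather than selecting one at compile time.
bool run_both_precisions(int n)
{
  return check_diagonal_inverse<float>(n) && check_diagonal_inverse<double>(n);
}
```

Deriving the tolerance from std::numeric_limits<T>::epsilon() lets one test body serve both precisions without hand-tuned per-precision thresholds.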

prckent and others added 25 commits June 20, 2022 12:25
Update bora test scripts and remove unused test scripts
The gen_rotated_lcao_wf.py file is extended to generate QMC averages for
the parameter gradients.

A test for the legacy driver using one thread is introduced.
Do not use cudaDeviceProp maxTexture1D in HIP
Add orbital rotation test with legacy driver
For CI running on nitrogen
Rewrite loop to workaround NVHPC bugs
Add mock-up mw_accept_rejectMove in MSD.
…lone_warning

Drop standalone-debug for offload builds
Fix test_structure divide by 0
ye-luo pushed a commit that referenced this pull request Nov 21, 2023
ye-luo pushed a commit that referenced this pull request Dec 13, 2023
ye-luo pushed a commit that referenced this pull request Jun 2, 2024
only use this overload if semantically correct
8 participants