Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add implementation of complex numbers #1336

Conversation

sbastrakov
Copy link
Member

@sbastrakov sbastrakov commented Jun 7, 2021

This PR adds implementation of alpaka's own Complex class template. It is basically std::complex<T> but its methods also work on device. Math functions are (WIP now) done via alpaka traits, same way as for real numbers.

TODO:

  • Math traits. For now I've done only one for Abs, just to demonstrate how I would like to do it, and to discuss.
  • Unit tests.
  • Figure out what to do with SYCL math implementation.

Resolve #734.

Copy link
Member

@bernhardmgruber bernhardmgruber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether the complex type should be templated on the accelerator. Because then you could just use std::complex everywhere on the CPU and only supply a different implementation for CUDA/HIP. I think this is also what @sliwowitz does with his RNG. AFAIK there is a different Philox implementation depending on the Acc.

include/alpaka/math/Complex.hpp Outdated Show resolved Hide resolved
include/alpaka/math/Complex.hpp Show resolved Hide resolved
@sbastrakov
Copy link
Member Author

sbastrakov commented Jun 8, 2021

During the VC @bernhardmgruber suggested if it is woth to have Acc as an additional template parameter of Complex. Then it would be possible to partially specialize there, maybe for CUDA/HIP enabled vs. disabled. Then device-side functions would change from Complex<T> like now to Complex<TAcc, T> or TComplex, and generally it becomes futher away from std::complex<T>

@sbastrakov
Copy link
Member Author

sbastrakov commented Jun 8, 2021

Notes from @SimeonEhrig on the VC. In the example we need to demostrate more: how to cast between buffer of alpaka::Complex and std::complex, how to call math operations on host (either cast to std::complex and call, or have a kernel called on Host acc). I agree, will extend the example.

@SimeonEhrig
Copy link
Member

I tested complex numbers with vikunja: alpaka-group/vikunja#54
The API usage is nice and it works well with the CPU backend. But I have a problem with the CUDA backend. If I compile it for the CUDA backend, I get the following error:

/opt/spack-modules/linux-ubuntu20.04-skylake_avx512/gcc-10.2.0/cuda-11.0.2-mtxiuzc35ykbmjk4fnpmhwnjbgquvcv5/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/bin/c++ -DALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED -DALPAKA_ACC_GPU_CUDA_ENABLED -DALPAKA_BLOCK_SHARED_DYN_MEMBER_ALLOC_KIB=47 -DALPAKA_DEBUG=0 -DALPAKA_DEBUG_OFFLOAD_ASSUME_HOST -DALPAKA_OFFLOAD_MAX_BLOCK_SIZE=256 -DBOOST_ALL_NO_LIB -I../vikunja/include -I/opt/spack-modules/linux-ubuntu20.04-skylake_avx512/gcc-10.2.0/cuda-11.0.2-mtxiuzc35ykbmjk4fnpmhwnjbgquvcv5/targets/x86_64-linux/include -I../alpaka/include -isystem=/opt/spack-modules/linux-ubuntu20.04-skylake_avx512/gcc-10.2.0/boost-1.74.0-lqyr7ou7iafiwvq5fb6ipp62o6l2gqlf/include -isystem=/opt/spack-modules/linux-ubuntu20.04-skylake_avx512/gcc-10.2.0/cuda-11.0.2-mtxiuzc35ykbmjk4fnpmhwnjbgquvcv5/include --generate-code=arch=compute_52,code=[compute_52,sm_52] -Xcompiler=-fPIE --expt-extended-lambda --expt-relaxed-constexpr -Xcudafe=--display_error_number -Xcudafe=--diag_suppress=esa_on_defaulted_function_ignored -Xcompiler -pthread -std=c++17 -MD -MT CMakeFiles/complexVikunja.dir/main.cpp.o -MF CMakeFiles/complexVikunja.dir/main.cpp.o.d -x cu -c ../main.cpp -o CMakeFiles/complexVikunja.dir/main.cpp.o
../alpaka/include/alpaka/math/abs/Traits.hpp(54): error #3050: calling a __device__ function("operator()") from a __host__ __device__ function("abs") is not allowed
          detected during instantiation of "auto alpaka::math::abs(const T &, const TArg &) [with T=alpaka::AccGpuCudaRt<alpaka::DimInt<1UL>, uint64_t>, TArg=alpaka::Complex<float>]" 
../main.cpp(114): here

../alpaka/include/alpaka/math/abs/Traits.hpp(54): error #3050: calling a __device__ function("operator()") from a __host__ __device__ function("abs") is not allowed
          detected during instantiation of "auto alpaka::math::abs(const T &, const TArg &) [with T=alpaka::AccGpuCudaRt<alpaka::DimInt<1UL>, uint64_t>, TArg=alpaka::Complex<float>]" 
../main.cpp(114): here

2 errors detected in the compilation of "../main.cpp".
ninja: build stopped: subcommand failed.

This line is causing the error
https://github.com/alpaka-group/vikunja/blob/28c82eada54e042adc316a57c9ef172c21bc1834/example/complex/main.cpp#L117

The problem has to do with the specialization of the acc object. Maybe I use it wrong.

@sbastrakov
Copy link
Member Author

Thanks for reporting. I will try.

@sbastrakov sbastrakov force-pushed the topic-addComplexImplementation branch from 598331e to 425df88 Compare November 9, 2021 15:47
@psychocoderHPC
Copy link
Member

I discussed @SimeonEhrig issue with him offline, it could be he must change https://github.com/alpaka-group/vikunja/blob/28c82eada54e042adc316a57c9ef172c21bc1834/example/complex/main.cpp#L116 to

auto transform = [] ALPAKA_FN_HOST_ACC(auto const& acc, alpaka::Complex<Data> const& a) -> Data

to enforce the compile to evaluate the lambda on the device side only. IMO the issue is coming from the two-phase compiling of the cuda compiler.

@SimeonEhrig
Copy link
Member

After an additional discussion with @psychocoderHPC we found a solution. Now I use a struct with templated operator() function for the transform (I was not aware, that is already working 😅 ). Now the template injection is working correctly and the whole application too.

I uploaded the change: https://github.com/alpaka-group/vikunja/blob/77ed0ce48b33721d3d5a90705968817682a98121/example/complex/main.cpp#L19

Therefore, I'm fine with the complex numbers.

@sbastrakov
Copy link
Member Author

Just pushed a mostly finished version. Will hopefully fix the failing tests and do some polish tomorrow, also will clean up commit history.

@sbastrakov
Copy link
Member Author

sbastrakov commented Nov 10, 2021

@SimeonEhrig there was an issue in my original code that caused a mixup of host and host-device math. Now I believe it is fixed for everything but arg and conj, however so far i do not see what's exactly wrong with them

So perhaps your original code was fine and the error was on my side, I do not know.

@SimeonEhrig
Copy link
Member

@SimeonEhrig there was an issue in my original code that caused a mixup of host and host-device math. Now I believe it is fixed for everything but arg and conj, however so far i do not see what's exactly wrong with them

Looks like, I was not affected of this bug. Nevertheless I tested you latest commit (bd35e7b) with my vikunja example and following acc's successful.

  • AccCpuSerial
  • AccCpuOmp2Blocks
  • AccGpuCudaRt
  • AccGpuHipRt

@sbastrakov sbastrakov marked this pull request as ready for review November 24, 2021 18:53
@sbastrakov sbastrakov force-pushed the topic-addComplexImplementation branch 2 times, most recently from 0f8e51f to e73af15 Compare November 30, 2021 16:15
@j-stephan
Copy link
Member

What is the state here?

@psychocoderHPC
Copy link
Member

@sbastrakov Could you finish this PR?

@sbastrakov sbastrakov force-pushed the topic-addComplexImplementation branch from 8115b59 to 59fe66f Compare March 29, 2022 13:47
@sbastrakov
Copy link
Member Author

I updated the old PR state to match the current alpaka develop. Math tests build and run on the dev system. However I believe the previously encountered issues with accuracy threshold for unit tests on some CI configurations still remain (since i didn't do anything to fix them yet).

@sbastrakov
Copy link
Member Author

SYCL backend is not updated yet, added a checkbox.

@sbastrakov
Copy link
Member Author

CI passed, I just need to clean up history a bit, since it took some effort to make it pass.

Ready for review @j-stephan @bernhardmgruber

example/complex/CMakeLists.txt Outdated Show resolved Hide resolved
include/alpaka/math/Complex.hpp Outdated Show resolved Hide resolved
include/alpaka/math/Complex.hpp Show resolved Hide resolved
include/alpaka/math/Complex.hpp Show resolved Hide resolved
include/alpaka/math/Complex.hpp Show resolved Hide resolved
include/alpaka/math/Complex.hpp Outdated Show resolved Hide resolved
include/alpaka/math/Complex.hpp Show resolved Hide resolved
include/alpaka/math/Complex.hpp Show resolved Hide resolved
include/alpaka/math/Complex.hpp Show resolved Hide resolved
test/unit/math/src/Functor.hpp Outdated Show resolved Hide resolved
@SimeonEhrig
Copy link
Member

@sbastrakov If you are finish with applying the suggestions, I would test the PR again with the vikunja example: alpaka-group/vikunja#54

After adding tests for Complex<>, Visual Studio CI setup was consistently running out of heap memory.
Split the tests into separate translation units for float, double, complex float, complex double, and all ADL tests separately.
The split fixes this issue at the cost of slight code duplication.
@sbastrakov
Copy link
Member Author

sbastrakov commented Apr 6, 2022

@bernhardmgruber thanks for your review! As i answered in a few points, the choices were made simply to mirror interface of std::complex<T>. The idea is, that it's as smooth as possible to transition from std::complex<T> to alpaka::Complex<T> when porting a code base to alpaka. So personally i would also do what you suggest in some places, but think here it's not worth to deviate from the std:: implementation unless there is a good reason.

I resolved the points i implemented in my local branch, and kept open and answered the points I think should not be implemented.

@bernhardmgruber
Copy link
Member

@sbastrakov the PR is fine as is. I only suggested a few enhancements to improve the implementation. I would really like to see the constexpr and layout comaptible check. But if you don't want to add it, so be it.

@sbastrakov sbastrakov force-pushed the topic-addComplexImplementation branch 2 times, most recently from d40d21f to 956d4ec Compare April 6, 2022 09:24
@SimeonEhrig
Copy link
Member

@sbastrakov If you are finish with applying the suggestions, I would test the PR again with the vikunja example: alpaka-group/vikunja#54

I tested the example with Serial CPU, OpenMP Grid-Block, CUDA and HIP backkend. Looks good.

@sbastrakov
Copy link
Member Author

sbastrakov commented Apr 6, 2022

Damn, now 1 test that have passed yesterday fails. Even tho i'm sure i didn't make any changes that should have affected it.

I now loosened the bounds a little bit, hopefully will pass now. Also implemented @bernhardmgruber 's latest review suggestion.

@sbastrakov sbastrakov force-pushed the topic-addComplexImplementation branch from 956d4ec to c19fa36 Compare April 6, 2022 11:28
@sbastrakov
Copy link
Member Author

@bernhardmgruber @j-stephan ready to be merged

@bernhardmgruber bernhardmgruber merged commit dacd3a2 into alpaka-group:develop Apr 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Abstraction of complex numbers
6 participants