Add implementation of complex numbers #1336
I am wondering whether the complex type should be templated on the accelerator. Then you could just use std::complex everywhere on the CPU and only supply a different implementation for CUDA/HIP. I think this is also what @sliwowitz does with his RNG. AFAIK there is a different Philox implementation depending on the Acc.
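To make the suggestion concrete, here is a minimal sketch of selecting the complex type per accelerator. It is not code from this PR: the tag types `CpuSerialTag` and `GpuCudaTag`, the placeholder `DeviceComplex`, and the trait `ComplexFor` are all hypothetical names that stand in for real alpaka accelerator types and a real device implementation.

```cpp
#include <complex>
#include <type_traits>

// Hypothetical accelerator tags (assumption: stand-ins for real accelerator types).
struct CpuSerialTag {};
struct GpuCudaTag {};

// Hypothetical placeholder for a complex type implemented specifically for CUDA/HIP.
template<typename T>
struct DeviceComplex
{
    T re;
    T im;
};

// Primary template: any accelerator without a specialization uses std::complex.
template<typename TAcc, typename T>
struct ComplexFor
{
    using type = std::complex<T>;
};

// Specialization: the GPU tag selects the device-specific implementation.
template<typename T>
struct ComplexFor<GpuCudaTag, T>
{
    using type = DeviceComplex<T>;
};

// Convenience alias so that user code can write ComplexFor_t<Acc, float>.
template<typename TAcc, typename T>
using ComplexFor_t = typename ComplexFor<TAcc, T>::type;
```

With this shape, kernel code would spell the type as `ComplexFor_t<TAcc, T>`, at the cost of making the complex type differ between accelerators.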
During the VC @bernhardmgruber asked whether it is worth having |
Notes from @SimeonEhrig on the VC. In the example we need to demonstrate more: how to cast between buffer of |
9a9f6c3
to
598331e
Compare
I tested complex numbers with vikunja: alpaka-group/vikunja#54
This line is causing the error. The problem has to do with the specialization of the |
Thanks for reporting. I will try. |
598331e
to
425df88
Compare
I discussed @SimeonEhrig's issue with him offline. It could be that he must change https://github.com/alpaka-group/vikunja/blob/28c82eada54e042adc316a57c9ef172c21bc1834/example/complex/main.cpp#L116 to auto transform = [] ALPAKA_FN_HOST_ACC(auto const& acc, alpaka::Complex<Data> const& a) -> Data to force the compiler to evaluate the lambda on the device side only. IMO the issue comes from the two-phase compilation of the CUDA compiler. |
After an additional discussion with @psychocoderHPC we found a solution. Now I use a I uploaded the change: https://github.com/alpaka-group/vikunja/blob/77ed0ce48b33721d3d5a90705968817682a98121/example/complex/main.cpp#L19 Therefore, I'm fine with the complex numbers. |
425df88
to
bd35e7b
Compare
Just pushed a mostly finished version. Will hopefully fix the failing tests and do some polish tomorrow, also will clean up commit history. |
@SimeonEhrig there was an issue in my original code that caused a mixup of host and host-device math. Now I believe it is fixed for everything but So perhaps your original code was fine and the error was on my side, I do not know. |
Looks like I was not affected by this bug. Nevertheless, I tested your latest commit (bd35e7b) with my vikunja example, and it was successful with the following accs:
|
8ea8d96
to
1c1d99f
Compare
0f8e51f
to
e73af15
Compare
What is the state here? |
@sbastrakov Could you finish this PR? |
8115b59
to
59fe66f
Compare
I updated the old PR state to match the current alpaka |
SYCL backend is not updated yet, added a checkbox. |
d083cf9
to
5747da3
Compare
CI passed, I just need to clean up history a bit, since it took some effort to make it pass. Ready for review @j-stephan @bernhardmgruber |
@sbastrakov Once you are finished applying the suggestions, I would test the PR again with the vikunja example: alpaka-group/vikunja#54 |
After adding tests for Complex<>, the Visual Studio CI setup was consistently running out of heap memory. I split the tests into separate translation units for float, double, complex float, complex double, and all ADL tests separately. The split fixes this issue at the cost of slight code duplication.
@bernhardmgruber thanks for your review! As I answered in a few points, the choices were made simply to mirror interface of I resolved the points I implemented in my local branch, and kept open and answered the points I think should not be implemented. |
@sbastrakov the PR is fine as is. I only suggested a few enhancements to improve the implementation. I would really like to see the |
d40d21f
to
956d4ec
Compare
I tested the example with the Serial CPU, OpenMP Grid-Block, CUDA and HIP backends. Looks good. |
Damn, now one test that passed yesterday fails, even though I'm sure I didn't make any changes that should have affected it. I have now loosened the bounds a little bit; hopefully it will pass now. Also implemented @bernhardmgruber's latest review suggestion. |
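For readers unfamiliar with what loosening the bounds means in a floating-point test, here is a minimal sketch of a tolerance comparison for complex values. It is not the test code of this PR: the helper `almostEqual`, its `ulps` parameter, and the demo function `checkLoosenedBound` are hypothetical, and the sketch uses std::complex rather than alpaka's type so that it stays self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <limits>

// Hypothetical helper: true if |actual - expected| is at most `ulps` machine epsilons,
// scaled by the larger of the two magnitudes (with a floor of 1 for values near zero).
template<typename T>
bool almostEqual(std::complex<T> const& actual, std::complex<T> const& expected, T ulps)
{
    T const scale = std::max(T(1), std::max(std::abs(actual), std::abs(expected)));
    return std::abs(actual - expected) <= ulps * std::numeric_limits<T>::epsilon() * scale;
}

// Demo: a result that is off by 3 epsilons fails a bound of 1 but passes a bound of 4.
inline void checkLoosenedBound()
{
    double const eps = std::numeric_limits<double>::epsilon();
    std::complex<double> const expected(1.0, 0.0);
    std::complex<double> const actual(1.0 + 3.0 * eps, 0.0);
    assert(!almostEqual(actual, expected, 1.0)); // tight bound rejects the result
    assert(almostEqual(actual, expected, 4.0)); // loosened bound accepts it
}
```

Different backends and math libraries may round differently, so a bound that passes on one configuration can fail on another.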
956d4ec
to
c19fa36
Compare
@bernhardmgruber @j-stephan ready to be merged |
This PR adds an implementation of alpaka's own Complex class template. It is basically std::complex<T>, but its methods also work on device. Math functions are (WIP now) done via alpaka traits, the same way as for real numbers.
TODO:
Resolve #734.
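As an illustration of the idea in the description, here is a minimal sketch of a std::complex-like class template whose methods are callable on host and device. It is not the implementation from this PR: `SketchComplex` and the macro `SKETCH_FN_HOST_ACC` are hypothetical names, and the macro only approximates what a host/device function annotation does under CUDA or HIP compilers.

```cpp
// Hypothetical annotation macro: expands to __host__ __device__ under CUDA/HIP
// compilers and to nothing for ordinary host compilers.
#if defined(__CUDACC__) || defined(__HIPCC__)
#    define SKETCH_FN_HOST_ACC __host__ __device__
#else
#    define SKETCH_FN_HOST_ACC
#endif

// Hypothetical complex type mirroring a small part of the std::complex interface.
template<typename T>
class SketchComplex
{
public:
    using value_type = T;

    SKETCH_FN_HOST_ACC constexpr SketchComplex(T re = T(), T im = T()) : m_re(re), m_im(im)
    {
    }

    SKETCH_FN_HOST_ACC constexpr T real() const
    {
        return m_re;
    }

    SKETCH_FN_HOST_ACC constexpr T imag() const
    {
        return m_im;
    }

private:
    T m_re;
    T m_im;
};

// Component-wise addition.
template<typename T>
SKETCH_FN_HOST_ACC constexpr SketchComplex<T> operator+(SketchComplex<T> const& a, SketchComplex<T> const& b)
{
    return SketchComplex<T>(a.real() + b.real(), a.imag() + b.imag());
}

// Complex multiplication: (a + bi)(c + di) = (ac - bd) + (ad + bc)i.
template<typename T>
SKETCH_FN_HOST_ACC constexpr SketchComplex<T> operator*(SketchComplex<T> const& a, SketchComplex<T> const& b)
{
    return SketchComplex<T>(
        a.real() * b.real() - a.imag() * b.imag(),
        a.real() * b.imag() + a.imag() * b.real());
}
```

Because every member and operator carries the annotation, the same expression such as `a * b + c` can appear in host code and inside a kernel. Math functions are left out of the sketch, since the description says they go through alpaka traits.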