Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arctan avx512 #759

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Arctan avx512 #759

wants to merge 4 commits into from

Conversation

Ka-zam
Copy link
Contributor

@Ka-zam Ka-zam commented Feb 25, 2024

Added AVX512 kernels and some minor cleanup.

Using AVX512F yields 40% speedup over the AVX2_FMA implementation on my 7950X3D. Compared to the generic atan2 implementation this is a 65x speedup.

                      magnus@r7950x3d:~/src/kazam/volk/build$ volk_profile -R atan
                      RUN_VOLK_TESTS: volk_32f_atan_32f(131071,1987)
                      generic completed in 2010.97 ms
                      polynomial completed in 62.8946 ms
                      a_avx512 completed in 40.8423 ms
                      a_avx2_fma completed in 56.9026 ms
                      a_avx2 completed in 55.9691 ms
                      a_sse4_1 completed in 110.292 ms
                      u_avx512 completed in 39.5152 ms
                      u_avx2_fma completed in 55.4739 ms
                      u_avx2 completed in 55.6364 ms
                      u_sse4_1 completed in 110.009 ms
                      Best aligned arch: u_avx512
                      Best unaligned arch: u_avx512
                      RUN_VOLK_TESTS: volk_32fc_s32f_atan2_32f(131071,1987)
 ------>              generic completed in 4199.28 ms
                      polynomial completed in 95.92 ms
 ------>              a_avx512 completed in 64.0566 ms
                      a_avx2_fma completed in 99.0502 ms
                      a_avx2 completed in 98.3313 ms
 ------>              u_avx512 completed in 63.4753 ms
                      u_avx2_fma completed in 98.3834 ms
                      u_avx2 completed in 98.6633 ms
                      Best aligned arch: u_avx512
                      Best unaligned arch: u_avx512
                      Writing /home/magnus/.volk/volk_config...

Signed-off-by: Magnus Lundmark <[email protected]>
Signed-off-by: Magnus Lundmark <[email protected]>
@Ka-zam
Copy link
Contributor Author

Ka-zam commented Feb 28, 2024

I just noticed there's a NaN test as well...

#731

Need to update this PR with this as well for AVX512!

@jj1bdx
Copy link
Contributor

jj1bdx commented Feb 29, 2024

@Ka-zam #731 is essential for airspy-fmradion, and I've spent a few weeks solving the NaN issue. Please add the NaN test before completing your implementation.

Copy link
Contributor

@jdemel jdemel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution. The NaN checks should really be integrated. Other than that, only minor comments.

include/volk/volk_avx512_intrinsics.h Show resolved Hide resolved
kernels/volk/volk_32f_atan_32f.h Outdated Show resolved Hide resolved
kernels/volk/volk_32f_atan_32f.h Outdated Show resolved Hide resolved
kernels/volk/volk_32f_atan_32f.h Outdated Show resolved Hide resolved
kernels/volk/volk_32f_atan_32f.h Outdated Show resolved Hide resolved
kernels/volk/volk_32fc_s32f_atan2_32f.h Outdated Show resolved Hide resolved
kernels/volk/volk_32f_atan_32f.h Show resolved Hide resolved
kernels/volk/volk_32fc_s32f_atan2_32f.h Outdated Show resolved Hide resolved
kernels/volk/volk_32fc_s32f_atan2_32f.h Outdated Show resolved Hide resolved
kernels/volk/volk_32fc_s32f_atan2_32f.h Outdated Show resolved Hide resolved
@Ka-zam
Copy link
Contributor Author

Ka-zam commented Feb 29, 2024

@Ka-zam #731 is essential for airspy-fmradion, and I've spent a few weeks solving the NaN issue. Please add the NaN test before completing your implementation.

I think it's already in there and should work fine! Wrote a test program and

atan2(0.f, 0.f) == 0.f

for all kernels.

Signed-off-by: Magnus Lundmark <[email protected]>
@Ka-zam
Copy link
Contributor Author

Ka-zam commented Mar 5, 2024

Here's some special values and a sanity check:

magnus@r7950x3d:~/src/kazam/scratch$ ./a.out 
          y :  1.00  -1.00   1.00  -1.00    nan    nan   0.00  -0.00   1.00  -1.00 
          x :  1.00   1.00  -1.00  -1.00   1.00    nan   0.00   0.00   0.00   0.00 
 atan2(y, x):
    generic :  0.79  -0.79   2.36  -2.36    nan    nan   0.00  -0.00   1.57  -1.57 
 polynomial :  0.79  -0.79   2.36  -2.36    nan    nan   0.00  -0.00   1.57  -1.57 
 a_avx512dq :  0.79  -0.79   2.36  -2.36   0.00   0.00   0.00   0.00   1.57  -1.57 

Do we care about the sign of atan2(-0, 0)? What about propagating nan?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants