Adds Neonv8 protokernel to 'volk_32f_64f_add_64f' #282

dmiralles2009 · 2019-09-14T16:20:49Z

Adds Adds Neonv8 protokernel to volk_32f_64f_add_64f. Code in toolchains credited to Albin
Stigo (@ast). Execution improvement shown below

ubuntu@ubuntu:~/volk/build$ ./apps/volk_profile -d -R 32f_64f_add

RUN_VOLK_TESTS: volk_32f_64f_add_64f(131071,1987)
generic completed in 1079.18 ms
neon completed in 1056.38 ms
Best aligned arch: neon
Best unaligned arch: neon
Writing /home/ubuntu/.volk/volk_config...

@ast

Adds proto-kernel with NEONv8 support. Code in toolchains credited to Albin Stigo (@ast). Also special thanks in this commit to evryone that participated in

dmiralles2009 · 2019-09-18T01:57:27Z

Hello friends,
It seems like the code is currently breaking in areas not related to the changes this PR is trying to fix. I am adding this as part of a discussion initiated by @ast in issue #243 (more details there). Is the build system currently broken?. Below is the output error.

The following tests FAILED:
	 85 - qa_volk_32fc_s32f_magnitude_16i (Failed)
Errors while running CTest
The command "mkdir build && cd build && BOOST_ROOT=$BOOST_ROOT cmake ${CMAKE_ARG} ../ && make && ctest -V && cd .." exited with 8.

Thanks in advance,
Damian

jdemel · 2019-09-18T07:29:51Z

@dmiralles2009 could you test this with volk_profile -n -R 32fc_s32f_magnitude_16i?
Just a dry-run of this specific kernel.
Travis CI reports this same kernel as broken for your PR. But only for one build while all other configurations go through smoothly.

85: RUN_VOLK_TESTS: volk_32fc_s32f_magnitude_16i(131071,1)
85: a_avx2 completed in 0.10756 ms
85: a_sse3 completed in 0.566765 ms
85: a_sse completed in 0.594867 ms
85: generic completed in 0.490799 ms
85: u_avx2 completed in 0.131633 ms
85: offset 77213 in1: 407 in2: 406 tolerance was: 0
85: volk_32fc_s32f_magnitude_16i: fail on arch a_avx2
85: offset 77213 in1: 407 in2: 406 tolerance was: 0
85: volk_32fc_s32f_magnitude_16i: fail on arch a_sse3
85: offset 77213 in1: 407 in2: 406 tolerance was: 0
85: volk_32fc_s32f_magnitude_16i: fail on arch a_sse
85: offset 77213 in1: 407 in2: 406 tolerance was: 0
85: volk_32fc_s32f_magnitude_16i: fail on arch u_avx2
85: Best aligned arch: a_avx2
85: Best unaligned arch: u_avx2

Since this is a converter from float to int and the error says in1: 407 in2: 406 aka off-by-one error, for the time being this is one of those flaky QA tests that we'd love to fix. BTW, I can't make this test fail on my machine.

dmiralles2009 · 2019-09-18T14:36:14Z

Hi @jdemel, that is curious because my PR does not modify that portion of the code. I also did a dry run at home and it seems to build without failing in my machine. I might look over the failing QA in the next few days. Thanks for the response :)

hcab14 · 2019-09-18T17:42:41Z

85: RUN_VOLK_TESTS: volk_32fc_s32f_magnitude_16i(131071,1)
85: a_avx2 completed in 0.10756 ms
85: a_sse3 completed in 0.566765 ms
85: a_sse completed in 0.594867 ms
85: generic completed in 0.490799 ms
85: u_avx2 completed in 0.131633 ms
85: offset 77213 in1: 407 in2: 406 tolerance was: 0
85: volk_32fc_s32f_magnitude_16i: fail on arch a_avx2
85: offset 77213 in1: 407 in2: 406 tolerance was: 0
85: volk_32fc_s32f_magnitude_16i: fail on arch a_sse3
85: offset 77213 in1: 407 in2: 406 tolerance was: 0
85: volk_32fc_s32f_magnitude_16i: fail on arch a_sse
85: offset 77213 in1: 407 in2: 406 tolerance was: 0
85: volk_32fc_s32f_magnitude_16i: fail on arch u_avx2
85: Best aligned arch: a_avx2
85: Best unaligned arch: u_avx2

Since this is a converter from float to int and the error says in1: 407 in2: 406 aka off-by-one error, for the time being this is one of those flaky QA tests that we'd love to fix. BTW, I can't make this test fail on my machine.

#285

jdemel · 2019-10-30T11:09:39Z

cmake/Toolchains/arm_cortex_a72_hardfp_native.cmake

@@ -4,5 +4,6 @@
 ########################################################################
 set(CMAKE_CXX_COMPILER g++)
 set(CMAKE_C_COMPILER  gcc)
-set(CMAKE_CXX_FLAGS "-march=armv8-a -mtune=cortex-a72 -mfpu=neon-fp-armv8 -mfloat-abi=hard" CACHE STRING "" FORCE)
+set(CMAKE_CXX_FLAGS "-ffast-math -march=armv8-a -mtune=cortex-a72 -mfpu=neon-fp-armv8 -mfloat-abi=hard" CACHE STRING "" FORCE)


Recently we had a discussion on -ffast-math and my understanding is that we should not use it because it breaks things.

Yes if I remember correctly we agreed that it might break things because it allows non IEEE-754 compliant optimizations, and is therefore not a good default.

On the other hand NEONv7 is also not strictly IEEE-754 compliant and all tests passed for me with fast-math enabled.

But for now I think it's better to leave it out!

michaelld · 2019-11-01T00:35:28Z

Any further updates to this PR, or do y'all think it is ready for review?

michaelld

Please leave out the fast math flag for this specific PR. Otherwise, looks OK to me.

dmiralles2009 · 2019-11-07T03:35:14Z

Hi all, I have been busy with work but will look at this closely during the weekend. Thanks for the replies :)

Updating requests from PR

michaelld

LGTM

jdemel

LGTM now

michaelld · 2019-11-07T15:33:56Z

CI checks pass. Multiple positive reviews. Merging.

proto-kernel: Adds NEONv8 protokernel to volk_32f_64f_add_64f

8250c36

Adds proto-kernel with NEONv8 support. Code in toolchains credited to Albin Stigo (@ast). Also special thanks in this commit to evryone that participated in

michaelld added ARM / Neon Neon ARCH specific Enhancement new kernel entirely or for some specific ARCH labels Oct 25, 2019

jdemel reviewed Oct 30, 2019

View reviewed changes

jdemel mentioned this pull request Oct 30, 2019

Adds Neonv8 support to 32fc_deinterleave_64f_x2 and 32fc_deinterleave_real_64f #283

Merged

michaelld requested changes Nov 7, 2019

View reviewed changes

Update arm_cortex_a72_hardfp_native.cmake

70261c9

Updating requests from PR

michaelld approved these changes Nov 7, 2019

View reviewed changes

jdemel approved these changes Nov 7, 2019

View reviewed changes

michaelld merged commit e5a5271 into gnuradio:master Nov 7, 2019

argilo mentioned this pull request Dec 14, 2023

Remove unnecessary volatiles from volk_32fc_s32f_magnitude_16i #724

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adds Neonv8 protokernel to 'volk_32f_64f_add_64f' #282

Adds Neonv8 protokernel to 'volk_32f_64f_add_64f' #282

dmiralles2009 commented Sep 14, 2019

dmiralles2009 commented Sep 18, 2019

jdemel commented Sep 18, 2019

dmiralles2009 commented Sep 18, 2019

hcab14 commented Sep 18, 2019

jdemel Oct 30, 2019

ast Oct 30, 2019

michaelld commented Nov 1, 2019

michaelld left a comment

dmiralles2009 commented Nov 7, 2019

michaelld left a comment

jdemel left a comment

michaelld commented Nov 7, 2019

Adds Neonv8 protokernel to 'volk_32f_64f_add_64f' #282

Adds Neonv8 protokernel to 'volk_32f_64f_add_64f' #282

Conversation

dmiralles2009 commented Sep 14, 2019

dmiralles2009 commented Sep 18, 2019

jdemel commented Sep 18, 2019

dmiralles2009 commented Sep 18, 2019

hcab14 commented Sep 18, 2019

jdemel Oct 30, 2019

Choose a reason for hiding this comment

ast Oct 30, 2019

Choose a reason for hiding this comment

michaelld commented Nov 1, 2019

michaelld left a comment

Choose a reason for hiding this comment

dmiralles2009 commented Nov 7, 2019

michaelld left a comment

Choose a reason for hiding this comment

jdemel left a comment

Choose a reason for hiding this comment

michaelld commented Nov 7, 2019