Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds Neonv8 protokernel to 'volk_32f_64f_add_64f' #282

Merged
merged 2 commits into from
Nov 7, 2019

Conversation

dmiralles2009
Copy link
Contributor

Adds Adds Neonv8 protokernel to volk_32f_64f_add_64f. Code in toolchains credited to Albin
Stigo (@ast). Execution improvement shown below

ubuntu@ubuntu:~/volk/build$ ./apps/volk_profile -d -R 32f_64f_add

RUN_VOLK_TESTS: volk_32f_64f_add_64f(131071,1987)
generic completed in 1079.18 ms
neon completed in 1056.38 ms
Best aligned arch: neon
Best unaligned arch: neon
Writing /home/ubuntu/.volk/volk_config...

Adds proto-kernel with NEONv8 support. Code in toolchains credited to Albin
Stigo (@ast). Also special thanks in this commit to evryone that participated in
@dmiralles2009
Copy link
Contributor Author

Hello friends,
It seems like the code is currently breaking in areas not related to the changes this PR is trying to fix. I am adding this as part of a discussion initiated by @ast in issue #243 (more details there). Is the build system currently broken?. Below is the output error.

The following tests FAILED:
	 85 - qa_volk_32fc_s32f_magnitude_16i (Failed)
Errors while running CTest
The command "mkdir build && cd build && BOOST_ROOT=$BOOST_ROOT cmake ${CMAKE_ARG} ../ && make && ctest -V && cd .." exited with 8.

Thanks in advance,
Damian

@jdemel
Copy link
Contributor

jdemel commented Sep 18, 2019

@dmiralles2009 could you test this with volk_profile -n -R 32fc_s32f_magnitude_16i?
Just a dry-run of this specific kernel.
Travis CI reports this same kernel as broken for your PR. But only for one build while all other configurations go through smoothly.

85: RUN_VOLK_TESTS: volk_32fc_s32f_magnitude_16i(131071,1)
85: a_avx2 completed in 0.10756 ms
85: a_sse3 completed in 0.566765 ms
85: a_sse completed in 0.594867 ms
85: generic completed in 0.490799 ms
85: u_avx2 completed in 0.131633 ms
85: offset 77213 in1: 407 in2: 406 tolerance was: 0
85: volk_32fc_s32f_magnitude_16i: fail on arch a_avx2
85: offset 77213 in1: 407 in2: 406 tolerance was: 0
85: volk_32fc_s32f_magnitude_16i: fail on arch a_sse3
85: offset 77213 in1: 407 in2: 406 tolerance was: 0
85: volk_32fc_s32f_magnitude_16i: fail on arch a_sse
85: offset 77213 in1: 407 in2: 406 tolerance was: 0
85: volk_32fc_s32f_magnitude_16i: fail on arch u_avx2
85: Best aligned arch: a_avx2
85: Best unaligned arch: u_avx2

Since this is a converter from float to int and the error says in1: 407 in2: 406 aka off-by-one error, for the time being this is one of those flaky QA tests that we'd love to fix. BTW, I can't make this test fail on my machine.

@dmiralles2009
Copy link
Contributor Author

Hi @jdemel, that is curious because my PR does not modify that portion of the code. I also did a dry run at home and it seems to build without failing in my machine. I might look over the failing QA in the next few days. Thanks for the response :)

@hcab14
Copy link
Contributor

hcab14 commented Sep 18, 2019

85: RUN_VOLK_TESTS: volk_32fc_s32f_magnitude_16i(131071,1)
85: a_avx2 completed in 0.10756 ms
85: a_sse3 completed in 0.566765 ms
85: a_sse completed in 0.594867 ms
85: generic completed in 0.490799 ms
85: u_avx2 completed in 0.131633 ms
85: offset 77213 in1: 407 in2: 406 tolerance was: 0
85: volk_32fc_s32f_magnitude_16i: fail on arch a_avx2
85: offset 77213 in1: 407 in2: 406 tolerance was: 0
85: volk_32fc_s32f_magnitude_16i: fail on arch a_sse3
85: offset 77213 in1: 407 in2: 406 tolerance was: 0
85: volk_32fc_s32f_magnitude_16i: fail on arch a_sse
85: offset 77213 in1: 407 in2: 406 tolerance was: 0
85: volk_32fc_s32f_magnitude_16i: fail on arch u_avx2
85: Best aligned arch: a_avx2
85: Best unaligned arch: u_avx2

Since this is a converter from float to int and the error says in1: 407 in2: 406 aka off-by-one error, for the time being this is one of those flaky QA tests that we'd love to fix. BTW, I can't make this test fail on my machine.

#285

@michaelld michaelld added ARM / Neon Neon ARCH specific Enhancement new kernel entirely or for some specific ARCH labels Oct 25, 2019
@@ -4,5 +4,6 @@
########################################################################
set(CMAKE_CXX_COMPILER g++)
set(CMAKE_C_COMPILER gcc)
set(CMAKE_CXX_FLAGS "-march=armv8-a -mtune=cortex-a72 -mfpu=neon-fp-armv8 -mfloat-abi=hard" CACHE STRING "" FORCE)
set(CMAKE_CXX_FLAGS "-ffast-math -march=armv8-a -mtune=cortex-a72 -mfpu=neon-fp-armv8 -mfloat-abi=hard" CACHE STRING "" FORCE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recently we had a discussion on -ffast-math and my understanding is that we should not use it because it breaks things.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes if I remember correctly we agreed that it might break things because it allows non IEEE-754 compliant optimizations, and is therefore not a good default.

On the other hand NEONv7 is also not strictly IEEE-754 compliant and all tests passed for me with fast-math enabled.

But for now I think it's better to leave it out!

@michaelld
Copy link
Contributor

Any further updates to this PR, or do y'all think it is ready for review?

Copy link
Contributor

@michaelld michaelld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please leave out the fast math flag for this specific PR. Otherwise, looks OK to me.

@dmiralles2009
Copy link
Contributor Author

Hi all, I have been busy with work but will look at this closely during the weekend. Thanks for the replies :)

Copy link
Contributor

@michaelld michaelld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@jdemel jdemel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now

@michaelld
Copy link
Contributor

CI checks pass. Multiple positive reviews. Merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ARM / Neon Neon ARCH specific Enhancement new kernel entirely or for some specific ARCH
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants