-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missing NEON implementations #243
Comments
Excellent! If you don't mind me asking what's your motivation? It usually helps to prioritize kernels that you care about for your application of interest unless you're just trying to learn more about NEON/SIMD |
We're building an SDR "hat" for the raspberry pi, tujasdr.com. I already have some private implementations so might as well try to get them into volk. The others I will probably try to implement as I go along. Most are quite easy to do because the avx implementations are often very similar. |
@n-west are you the maintainer now? |
Perhaps of interest to @lemire ? |
Yes. Following. |
@vielmetti @lemire thanks for the support! |
I'm the only one merging PRs but everyone of @gnuradio/gr-officers is entitled to merge things. |
@ast do get your work merged once you are done please contact [email protected] for getting a CLA in place, thanks (: |
Is this list updated?. Planning to add some additional NEON support but not sure what has been completed? |
I'm not sure if @ast is keeping this updated? |
@dmiralles2009 @bhilburn yes it's updated! I would suggest you start at the bottom. Also I had some trouble with atan2 and the functions based on atan2, that might be interesting for you? Ping me if you have any questions! |
Great! atan works fine but I lose precision in the division required in atan2... Not sure how to solve that. |
@dmiralles2009 Did you a chance to have a look at it? |
Hey @ast ..Yes, I have been looking at it. I think I will be able to put some code during the weekend. I will keep you posted :) |
No stress, just tell me if you need some pointers. I implemented these today:
|
@ast Jaja, oh man you are on fire!!. I like the friendly pressure :). |
Volk I compile native on the raspberry pi 4. I normally mount the device using sshfs so I can edit on my laptop. I have rootfs on an SSD on USB3, SD-cards are painfully slow. |
@dmiralles2009 maybe this is helpful https://wiki.gnuradio.org/index.php/Cross_compile_for_Raspberry_Pi I wrote this (@ast helped too), but still need to link it into a top-level page in the wiki. This write-up came from this gist |
Well done!! Fork volk and push your changes so we can review! |
Hey @ast, so just got a new RPI4 to further develop some volks and now I am unable to properly compile. This is odd because with RPI3+ was working. Any clues?
|
Yeah ok this is a known bug, thought I already had submitted a change for that..? My toolchain file looks like this. Notice
You might have to remove the line |
Hi @ast , that was a good fix. I do not think the changes are available in volk (master). Maybe they are on your repo. I switched development to my RPI 3b+ board, I was unable to get a 64 bit OS functional in RPI 4. Have you been luckier in this regard?.
but I have been trying to run those in rpi4 to validate results. I should be pushing then soon. |
I had Ubuntu aarch64 working very well on rbpi3 using the raspbian
bootloader (to correctly load the device tree). If I remember correctly I
also added some users/groups that raspberry pi software expects.
Just try to edit the Cmake and toolchain flags to add the -mthumb flag. OR
you can just rename/remove the assembly file that fails on Raspberry Pi 4,
this is the easiest fix, you don't need it anyway.
…On Mon, Sep 9, 2019 at 3:09 AM Damian Miralles ***@***.***> wrote:
Hi @ast <https://github.com/ast> , that was a good fix. I do not think
the changes are available in volk (master). Maybe they are on your repo. I
switched development to my RPI 3b+ board, I was unable to get a 64 bit OS
functional in RPI 4. Have you been luckier in this regard?.
I did complete a couple of proto-kernels
1. volk_8ic_s32f_deinterleave_real_32f.h
2. volk_8ic_x2_multiply_conjugate_16ic.h
3. volk_8ic_s32f_deinterleave_32f_x2.h
4. volk_32fc_deinterleave_64f_x2.h
5. volk_32fc_deinterleave_real_64f.h
6. volk_32f_64f_add_64f.h
but I have been trying to run those in rpi4 to validate results. I should
be pushing then soon.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#243?email_source=notifications&email_token=AAA7YBSH46FXDHJVNSHDNRTQIWO3DA5CNFSM4HI4O4BKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6F7QII#issuecomment-529266721>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAA7YBWU2I4N7WSJQ7J4NNTQIWO3DANCNFSM4HI4O4BA>
.
|
@ast Is this list up to date? If not, can you get it updated? Thanks for your work getting NEON kernels into Volk! |
It's up to date! All the code is in my linked repo. I have limited time to create PRs so everyone is welcome to look at/improve upon my work and create PRs! |
I also have completed the proto kernel I listed but I am busy with paying
jobs. In a couple of days, I will submit a new PR.
@ast you ok if I submit your code changes on your behalf?. Having those on
the mainstream repo will be quite helpful.
Best,
Damian
…On Fri, Nov 15, 2019 at 3:05 PM Albin Stigo ***@***.***> wrote:
It's up to date! All the code is in my linked repo. I have limited time to
create PRs so everyone is welcome to look at/improve upon my work and
create PRs!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#243?email_source=notifications&email_token=AB2KYSB5XYXQYYYIQFBWWU3QT4FKBA5CNFSM4HI4O4BKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEGW2UY#issuecomment-554528083>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB2KYSEL7QD66CKBPVF7V6LQT4FKBANCNFSM4HI4O4BA>
.
--
Thanks
Damian Miralles
|
@dmiralles2009 yes go ahead! That would be great! Please update the list if you do! Or ping me and I'll update it... |
@ast do you still have kernels in your repo that are not yet merged? It would be great to see your work merged upstream. I added a recent overview of kernels per platform on libvolk.org. |
I can work on some of these as well now as I have recently got my aarch64 environment up and running. |
This is just a suggestion, but for some (more complicated) kernels, a conversion to NEON from SSE using something like sse2neon works pretty decently. I wouldn't especially include the header in volk but perhaps simply the generated NEON code after preprocessing. I did this on the convolutional decoding kernel obtaining a significant performance boost. |
This sounds like an interesting idea. Do you have a link to the sse2neon tool? |
It's a single header file. Project is on on github under the same name. |
I think it's a pretty good idea. Just mark them as machine-generated in a comment so it's easy to know which to (maybe) try to improve. With more recent compiler versions and ARM architectures, I have noted diminishing returns in hand written NEON code. But it's always interesting to try!! |
That sounds very good. Mark them as a conversion of the SSE implementation, for possible future rewrites or improvements. If this is all fine, I could PR my port of the Viterbi kernel to NEON (alongside tests results once I test it on a non-Android ARM platform for a proper performance check), and perhaps a few others kernels later. |
Ok, after more testing around, it would quite a bit simpler to have sse2neon as a header in Volk. The license is MIT so should be no problem with LGPL as far as I am aware. The last question that remains then, in case a SSE kernel is made NEON-compatible with this method, duplicating the kernel feels redundant. In the case of the convolutional decoder, I am just swapping out SSE headers with sse2neon if NEON support is detected. The actual kernel codes remains identical. As spiral is the kernel arch name in this instance, not an issue, but what about a "sse" kernel? Should it conserve its SSE naming (which does not seem to affect Volk in any negative way), or be changed over to "neon" or perhaps sse_neon or something else? Personally I'd think keeping the original SSE naming even on NEON would reflect the implementation being utilized is a "port" of the SSE kernel, and not something purpose-made for NEON. If a kernel ported this way is not too complicated to swap out SSE intrinsics to NEON's, it could perhaps become a "proper" NEON kernel? Those are just some ideas I had thinking about how starting to go this route could affect Volk outside of already sligtly different kernels. Edit : It is not possible with the current Volk machinery to have more one LV_HAVE_* in a kernel #if. So my intent to use the same for sse2neon is not feasible without some changes to the python scripts. |
Here's a list of kernels missing NEON implementations. Some of these are easy, some are hard, some would not provide any benefit. I'm determined to write as many as I can, so I'm creating this list to keep track.
Benchmarks are rough estimates based on
volk_test_all
on a Raspberry PI 3b aarch64 mode.My branch is here https://github.com/TujaSDR/volk/tree/tuja
Queued for improvement
Ideas for new kernels
New kernels added
Kernels where I updated NEON support
Kernels that were missing NEON support
The text was updated successfully, but these errors were encountered: