Document how to combine SIMDe and CPU feature based runtime dispatch #1268
Comments
If you want a single binary to work on all architectures, it will require a level of indirection: you call through a pointer to the best-fitting routine (so that you never hit an instruction that doesn't exist), and you populate the pointers to all possible functions beforehand, based on a runtime CPU capability check. Correct me if I'm wrong, but I believe this is out of scope for SIMDe itself, because the library aims to be completely header-based and aggressively inlined, without any indirection (or overhead) whatsoever. That being said, these aforementioned function pointers could be…

If, however, you do that for each possible SIMD function, you risk practically nullifying most of the benefits, which come from locality and from good compile-time optimization that often relies on the context in which you're using these functions. With an indirection you risk losing this context and jumping all over memory. Measurements might show that all that trouble was for nothing. So when implementing such indirection, it is better to do it for higher-level functions that do a lot of work under the hood, rather than for each and every small inner operation.
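To make the pointer-based indirection described above concrete, here is a minimal sketch in C, not an official SIMDe recipe. Everything named here (`dispatch_sum`, `sum_baseline`, `sum_avx2`, `resolve_sum`) is hypothetical. It assumes GCC or Clang on x86 for the `__builtin_cpu_supports` check and the `target` attribute, and falls back to the baseline everywhere else. In a real project each variant would more likely live in its own translation unit built with different `-m` flags, with the baseline being the SIMDe-backed build.

```c
#include <stddef.h>

/* Hypothetical coarse-grained kernel: sums n floats. The dispatch happens
   once per call to this whole routine, not once per SIMD operation. */
typedef float (*sum_fn)(const float *data, size_t n);

/* Portable baseline; stands in for the build that relies on SIMDe's
   portable fallbacks. */
static float sum_baseline(const float *data, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) acc += data[i];
    return acc;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
/* Same source, but the compiler may emit AVX2 instructions for this
   function only. It must never run on a CPU without AVX2. */
__attribute__((target("avx2")))
static float sum_avx2(const float *data, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) acc += data[i];
    return acc;
}
#endif

/* Pick the best variant the running CPU supports. */
static sum_fn resolve_sum(void) {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return sum_avx2;
#endif
    return sum_baseline;
}

/* Resolved lazily on first use. This is not thread-safe as written;
   real code would resolve at startup or use a once-style guard. */
static sum_fn sum_impl = NULL;

float dispatch_sum(const float *data, size_t n) {
    if (!sum_impl) sum_impl = resolve_sum();
    return sum_impl(data, n);
}
```

Callers only ever see `dispatch_sum(values, count)`, so the cost of the indirect call is paid once per array rather than once per vector operation, and each variant can still be inlined and optimized internally.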
I agree with @Epixu, runtime CPU dispatch is out of scope for SIMDe. However, it would be nice to have a demonstration of how to implement it in combination with compiling for different CPU features.
Vectorscan uses SIMDe, but it does not switch between native SIMD instructions and SIMDe at runtime.
Is it possible to implement a mechanism that allows switching between native SIMD (e.g., AVX2) and SIMDe-based implementations dynamically, depending on the CPU's capabilities?
Some platforms may lack SIMD support, so ensuring that a single binary runs efficiently across different architectures is important.
What would be the best approach to achieve this?
Thanks in advance for your insights!