Document how to combine SIMDe and CPU feature based runtime dispatch #1268

sdtckr · 2025-01-27T11:02:43Z

Vectorscan uses SIMDe, but it does not switch between native SIMD instructions and SIMDe at runtime.

Is it possible to implement a mechanism that allows switching between native SIMD (e.g., AVX2) and SIMDe-based implementations dynamically, depending on the CPU's capabilities?

Some platforms may lack SIMD support, so ensuring that a single binary runs efficiently across different architectures is important.

What would be the best approach to achieve this?

Thanks in advance for your insights!

Epixu · 2025-01-27T11:24:08Z

If you want a single binary to work on all architectures, it will require a level of indirection: you call each routine through a pointer to the best-fitting implementation (so that you never hit an instruction that doesn't exist on the running CPU), and you populate those pointers up front, based on a runtime CPU capability check. Correct me if I'm wrong, but I believe this is out of scope for SIMDe itself, because the library aims to be completely header-based and aggressively inlined, without any indirection (or overhead) whatsoever.
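A minimal sketch of that indirection, not something SIMDe provides: it assumes GCC or Clang on x86, where `__builtin_cpu_supports` and the `target` function attribute are available, and falls back to plain C everywhere else. The names `sum_f32`, `sum_f32_portable`, `sum_f32_avx2` and `resolve_sum_f32` are made up for illustration.

```c
#include <stddef.h>

/* Assumption: GCC/Clang on x86. Other compilers and targets use the fallback. */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

typedef float (*sum_f32_fn)(const float *data, size_t n);

/* Portable implementation: safe on every CPU. */
static float sum_f32_portable(const float *data, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) acc += data[i];
    return acc;
}

#ifdef HAVE_X86_DISPATCH
/* Built with AVX2 enabled for this function only; it must never be
 * called on a CPU that lacks AVX2. */
__attribute__((target("avx2")))
static float sum_f32_avx2(const float *data, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float total = 0.0f;
    for (int k = 0; k < 8; ++k) total += lanes[k];
    for (; i < n; ++i) total += data[i];  /* leftover elements */
    return total;
}
#endif

/* Runtime capability check: choose the best implementation the CPU can run. */
static sum_f32_fn resolve_sum_f32(void) {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return sum_f32_avx2;
#endif
    return sum_f32_portable;
}

static sum_f32_fn sum_f32_impl = NULL;

/* Public entry point: resolves the pointer on first use, then calls through it.
 * The lazy initialization here is not thread-safe; a real program would
 * resolve the pointer once at startup. */
float sum_f32(const float *data, size_t n) {
    if (!sum_f32_impl) sum_f32_impl = resolve_sum_f32();
    return sum_f32_impl(data, n);
}
```

Callers only ever see `sum_f32`; which implementation runs is decided once per process, so the AVX2 code path is never entered on a CPU that cannot execute it.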

That being said, the aforementioned function pointers could be extern and imported from a DLL or shared object: at runtime you would simply load the appropriate prebuilt dynamic library (one for each CPU capability combination, for example), and that would populate the pointers for you.

However, if you do that for each individual SIMD function, you risk nullifying most of the benefits, which come from locality and from compile-time optimization that often relies on the context in which the functions are used. With an indirection you risk losing that context and jumping all over memory, and measurements might show that all the trouble was for nothing. So when implementing such an indirection, it is better to apply it to higher-level functions that do a lot of work under the hood, rather than to each and every small inner operation.
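To make the granularity point concrete, here is a small sketch that contrasts dispatching every inner operation with dispatching a whole kernel. All names are hypothetical, and `indirect_calls` exists only to count how often the code goes through a function pointer. It does not measure performance.

```c
#include <stddef.h>

typedef float (*mul_op_fn)(float a, float b);
typedef void (*scale_kernel_fn)(float *data, size_t n, float factor);

/* Illustration only: counts calls made through a function pointer. */
static size_t indirect_calls = 0;

/* Fine-grained unit: one tiny operation per call. */
static float mul_portable(float a, float b) { return a * b; }

/* Coarse-grained unit: the whole loop lives inside the dispatched function,
 * so the compiler can inline and vectorize its body as one piece. */
static void scale_kernel_portable(float *data, size_t n, float factor) {
    for (size_t i = 0; i < n; ++i) data[i] *= factor;
}

/* In a real program these would be set by a runtime CPU capability check. */
static mul_op_fn mul_op = mul_portable;
static scale_kernel_fn scale_kernel = scale_kernel_portable;

/* One indirect call per element: the optimizer cannot see through it. */
void scale_fine_grained(float *data, size_t n, float factor) {
    for (size_t i = 0; i < n; ++i) {
        data[i] = mul_op(data[i], factor);
        ++indirect_calls;
    }
}

/* One indirect call per buffer: the cost is amortized over n elements. */
void scale_coarse_grained(float *data, size_t n, float factor) {
    scale_kernel(data, n, factor);
    ++indirect_calls;
}
```

Both variants compute the same result, but the fine-grained one pays for an indirect call on every element, while the coarse-grained one pays once per buffer and keeps the inner loop visible to the optimizer.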

mr-c · 2025-01-27T12:13:47Z

I agree with @Epixu: runtime CPU dispatch is out of scope for SIMDe. However, it would be nice to have a demonstration of how to implement it in combination with compiling for different CPU features.
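Pending such a demonstration, one possible shape for the "compile with different CPU features" half is to build the same kernel source several times, each time with different flags and a different symbol suffix. This is only a sketch under stated assumptions: the macro `VARIANT_SUFFIX`, the file name `kernel.c` and the function `dot_f32` are hypothetical, and the body is plain C so that it compiles without SIMDe installed. In a real build the body would use `simde_*` functions, which SIMDe maps to native intrinsics when the matching feature macro (for example `__AVX2__`) is defined at compile time.

```c
/* kernel.c (hypothetical): compiled once per CPU feature level, e.g.
 *
 *   cc -O2 -c kernel.c -DVARIANT_SUFFIX=_portable -o kernel_portable.o
 *   cc -O2 -mavx2 -c kernel.c -DVARIANT_SUFFIX=_avx2 -o kernel_avx2.o
 *
 * Each object file exports the same kernel under a different name. A
 * dispatcher built without extra feature flags then picks one of them
 * at runtime. */
#include <stddef.h>
/* A SIMDe-based kernel would also include, e.g., "simde/x86/avx2.h". */

#ifndef VARIANT_SUFFIX
#define VARIANT_SUFFIX _portable  /* default when no suffix is passed */
#endif

/* Two-step concatenation so VARIANT_SUFFIX is expanded before pasting. */
#define CONCAT2(a, b) a##b
#define CONCAT(a, b) CONCAT2(a, b)
#define VARIANT(name) CONCAT(name, VARIANT_SUFFIX)

/* Expands to dot_f32_portable, dot_f32_avx2, ... depending on the build. */
float VARIANT(dot_f32)(const float *a, const float *b, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) acc += a[i] * b[i];
    return acc;
}
```

Because SIMDe decides between native and emulated code at preprocessing time, the feature flags have to be applied per translation unit like this. The dispatcher itself should be compiled without them so that it runs on every CPU.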

mr-c changed the title ~~Switching Between Native SIMD and SIMDe at Runtime?~~ Document how to combine SIMDe and CPU feature based runtime dispatch Jan 27, 2025
