-
I'm relatively new to SIMD management, so I have some questions.

It seems that choosing the SIMD level when compiling Jolt requires knowing exactly which CPU the game will run on. That makes sense, and on consoles I guess it's easy to get the best performance. On PC it is quite limiting, though, because users can have a wide range of CPUs. According to the Steam hardware survey, AVX2 appears widely supported, but the roughly 5% of users without it would end up crashing. That means the distribution would have to target no more than SSE3 to cover 100% of users with the same executable, leaving everyone else with no way to exploit better performance.

So, to get maximum performance, the game would have to be built not only for every platform but also for multiple SIMD levels, and somehow have users install the right one.

Another way would be to compile just Jolt as multiple dynamic libraries, one per SIMD level, and load the best one at runtime. However, that means the game can't use Jolt directly anymore. Jolt would have to be abstracted behind an interface, and the dynamic library would instantiate its implementation or fill in function pointers, which feels kinda meh.

If I had to ship just one executable, I would lean towards downgrading to SSE3, as it is the simplest and safest option, and maybe use AVX2 for my personal testing since I know my CPU supports it. But I feel like I'd be missing out.

Also, I'm wondering when I can enable …
Replies: 3 comments
-
I personally favor multiple binaries for each architecture rather than dynamically dispatching AVX or SSE code for every … So rather than having conditional branches all over the code for certain SIMD features, it is just a static high-level decision (picking an applicable .dll, shipping multiple executables, etc.) that maintains performance and avoids all the branching and slowdowns required to dispatch arch-optimized functions at runtime.

Clear Linux takes this route: the host machine's capabilities are detected, and then the most applicable version of a package, compiled specifically for that architecture's features, is downloaded. I think Clear Linux delimits this by … Linux also has formal names for particular microarchitecture feature levels: …

Some software, such as PCSX2, goes this route by releasing an SSE4 and an AVX version.
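To make the "detect once, then pick a build" idea concrete, here is a minimal sketch of a one-time capability check. The `SimdTier` enum and both function names are made up for illustration and are not part of Jolt. It assumes GCC or Clang on x86, where `__builtin_cpu_supports` is available; any other toolchain simply falls back to the baseline tier.

```cpp
#include <string>

// Hypothetical tiers for illustration only; these names are not part of Jolt.
enum class SimdTier { Baseline, Sse42, Avx, Avx2 };

// Returns the highest tier the host CPU reports.
// Assumption: only GCC/Clang on x86 are handled; everything else is Baseline.
// A real check should test every extension the matching binary was built with
// (e.g. FMA or F16C), not just the headline one.
inline SimdTier detect_simd_tier() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return SimdTier::Avx2;
    if (__builtin_cpu_supports("avx")) return SimdTier::Avx;
    if (__builtin_cpu_supports("sse4.2")) return SimdTier::Sse42;
#endif
    return SimdTier::Baseline;
}

// Maps a tier to a short suffix that could be used in a file name.
inline std::string tier_name(SimdTier tier) {
    switch (tier) {
        case SimdTier::Avx2:     return "avx2";
        case SimdTier::Avx:      return "avx";
        case SimdTier::Sse42:    return "sse42";
        case SimdTier::Baseline: return "baseline";
    }
    return "baseline";
}
```

The point is that the branch happens once at startup, not on every vector operation.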
-
I have the same preference as @Wunkolo.

Switching between versions for every call to Abs or Dot has too much overhead indeed. The alternative is to compile hot spots in the code in multiple versions (e.g. I know that some physics engines do this for the inner loop of the solver). Obviously that limits the benefit to only those portions of the code, at the cost of a much more complex code base (you basically have to support multiple vector classes, multiple matrix classes, etc., or SIMD-ify everything in place, which makes things much less readable).

I think you'll get the most gain by compiling the executable in multiple flavors and having a 'main' executable whose only job is to start the right one.

That said, I have to say that the performance benefit of AVX2 over SSE2 (Jolt doesn't have an SSE3 version) is much less than you may expect. See this discussion: #327 (reply in thread)

So if you don't want to go through the hassle, dropping down to SSE2 is not the end of the world.
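As a rough illustration of the 'main' executable idea, the selection step could look something like the sketch below. Everything here is hypothetical: the `CpuFeatures` struct, the function name and the file names are invented for the example, and filling in the flags (CPUID or a compiler builtin) and actually launching the chosen process (platform-specific) are left out.

```cpp
#include <string>

// Hypothetical feature flags, filled in by whatever CPU detection the launcher uses.
struct CpuFeatures {
    bool sse4_2 = false;
    bool avx2 = false;
};

// Picks the most capable flavor the CPU can run, falling back to the SSE2 build.
// The file names are placeholders, not anything Jolt produces.
inline std::string choose_executable(const CpuFeatures& cpu) {
    if (cpu.avx2) return "game_avx2.exe";
    if (cpu.sse4_2) return "game_sse42.exe";
    return "game_sse2.exe";
}
```

The launcher itself has to be compiled for the lowest level you support (SSE2 here), otherwise it could crash before it gets to make the choice.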
-
B.t.w. LZCNT, TZCNT, F16C and FMADD are all newer than SSE2, so if you want to drop down to SSE2 you should disable these as well. In fact, the 32-bit build turns all of these off by default; see cmake_vs2022_cl_32bit.bat.
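One way to double-check that a build really is SSE2-only is to look at the compiler's predefined macros. The sketch below is not part of Jolt, and the function names are made up. It relies on the macros GCC and Clang define when an extension is enabled (`__BMI__` is the one that covers TZCNT); MSVC only defines `__AVX__` and `__AVX2__` out of this list, so there the check is weaker.

```cpp
#include <string>
#include <vector>

// Lists the post-SSE2 extensions this translation unit was compiled with,
// based on predefined compiler macros (GCC/Clang naming).
inline std::vector<std::string> enabled_extensions() {
    std::vector<std::string> found;
#if defined(__SSE4_1__)
    found.push_back("SSE4.1");
#endif
#if defined(__SSE4_2__)
    found.push_back("SSE4.2");
#endif
#if defined(__AVX__)
    found.push_back("AVX");
#endif
#if defined(__AVX2__)
    found.push_back("AVX2");
#endif
#if defined(__FMA__)
    found.push_back("FMA");
#endif
#if defined(__F16C__)
    found.push_back("F16C");
#endif
#if defined(__LZCNT__)
    found.push_back("LZCNT");
#endif
#if defined(__BMI__)
    found.push_back("BMI1 (TZCNT)");
#endif
    return found;
}

// True when none of the extensions above were enabled at compile time.
inline bool is_baseline_build() {
    return enabled_extensions().empty();
}
```

Printing the result of `enabled_extensions()` at startup of each flavor is a cheap sanity check that the build flags did what you intended.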