Request for clarification: What target is actually built? #11
Comments
No, it only contains a single implementation.
The 32-bit version targets any generic x86 processor (so it works on anything). The 64-bit version requires SSE3 or newer: all 64-bit processors must support SSE2 as a minimum anyway, and apart from some very old early AMD CPUs, all 64-bit processors have SSE3 or better. The assembly code for the specific optimised functions is created from the following optimised variants (for x86_64): x86_64/coreisbr/addmul_2.asm with default values from:
Not easily; you would have to replace the assembly files with ones written for the different architecture. However, apart from one or two functions, the 64-bit lib uses routines that are recommended for Sandy Bridge and newer architectures (still compatible with older hardware as well), so there's not much more performance you can get out of it. |
Thanks for the list. Can you please include hints about that in the documentation (possibly referencing this issue)? In any case it would be good to either directly add additional projects that are more CPU-specific, or document how best to adjust the current one (possibly a PowerShell/batch script that adjusts the project configuration?). This could likely be handled in a different issue.
Otherwise... What do you think of switching to the "fat" configuration (compilation for all CPUs with runtime detection of features)? |
Ahh, that's actually a little bit different, as it is due to gmp using inline assembly that is not supported by the MSVC compiler. As I mentioned previously, the external assembly in 64-bit builds is already using modern optimised variants. The difference you're referring to here is due to 'inline' assembly optimisations that are compiler-specific and just do not work with Visual Studio's compiler. New MSVC-specific code would have to be written and added for that (see longlong.h for where it is missing).
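For illustration, here is a minimal sketch of the kind of MSVC-specific replacement meant here, using a leading-zero count as the example. The helper name `clz64` is hypothetical and this is not the code from longlong.h; it only assumes the `_BitScanReverse64` intrinsic on 64-bit MSVC and `__builtin_clzll` on GCC/Clang, with a plain loop as a fallback.

```c
#include <assert.h>
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>
#endif

/* Hypothetical helper (not GMP's actual macro): number of leading zero
   bits in a non-zero 64-bit value. */
static int clz64(uint64_t x)
{
    assert(x != 0); /* result is undefined for zero on the intrinsic paths */
#if defined(_MSC_VER) && defined(_M_X64)
    unsigned long index;
    _BitScanReverse64(&index, x);  /* index of the highest set bit */
    return 63 - (int)index;
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_clzll(x);     /* the compiler picks bsr/lzcnt itself */
#else
    /* Portable fallback: shift left until the top bit is set. */
    int n = 0;
    while (!(x & (UINT64_C(1) << 63))) {
        x <<= 1;
        n++;
    }
    return n;
#endif
}
```

With this sketch, `clz64(1)` gives 63 and `clz64(UINT64_C(1) << 63)` gives 0 on any of the three paths.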
Using fat requires a fair bit of work, and the last time I checked it had issues with MSVC compilation (it was a long time ago though). Fat also only enables switching for functions written in 'external' asm, so it doesn't actually provide much advantage over the existing implementation, as it still won't support the inline assembly. So there will always be some slight performance differences between compilers, but I'm guessing most of the missing performance is due to missing inline asm (at least the lzcnt issue mentioned is inline asm). That requires writing custom MSVC code, which for these sorts of functions is something I've done before, but I won't have the time to actually test whether any of the changes work. |
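To make the "fat" idea concrete, here is a rough sketch of runtime dispatch for an externally implemented routine. All names (`my_add_n`, `add_n_tuned`, `cpu_has_sse3`) are hypothetical and GMP's real fat-binary machinery is organised differently; the tuned variant here is only a stand-in for what would be an external asm routine. Note that only the function behind the pointer can be switched this way, which is why inline asm inside macros is not covered.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>
#endif

typedef uint64_t (*add_n_fn)(uint64_t *r, const uint64_t *a,
                             const uint64_t *b, size_t n);

/* Portable baseline: r = a + b over n limbs, returns the carry out. */
static uint64_t add_n_generic(uint64_t *r, const uint64_t *a,
                              const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + carry;
        uint64_t c1 = s < carry;     /* overflow from adding the carry */
        r[i] = s + b[i];
        carry = c1 | (r[i] < s);     /* overflow from adding b[i] */
    }
    return carry;
}

/* Stand-in for a CPU-specific external asm routine (assumption). */
static uint64_t add_n_tuned(uint64_t *r, const uint64_t *a,
                            const uint64_t *b, size_t n)
{
    return add_n_generic(r, a, b, n);
}

/* Returns non-zero if the CPU reports SSE3 (CPUID leaf 1, ECX bit 0). */
static int cpu_has_sse3(void)
{
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4];
    __cpuid(regs, 1);
    return (regs[2] & 1) != 0;
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("sse3") != 0;
#else
    return 0; /* unknown platform: stay on the generic path */
#endif
}

static uint64_t add_n_init(uint64_t *r, const uint64_t *a,
                           const uint64_t *b, size_t n);

/* The pointer starts at the resolver and is replaced on first call. */
static add_n_fn add_n_ptr = add_n_init;

static uint64_t add_n_init(uint64_t *r, const uint64_t *a,
                           const uint64_t *b, size_t n)
{
    add_n_ptr = cpu_has_sse3() ? add_n_tuned : add_n_generic;
    return add_n_ptr(r, a, b, n);
}

/* Public entry point: every call goes through the pointer. */
uint64_t my_add_n(uint64_t *r, const uint64_t *a,
                  const uint64_t *b, size_t n)
{
    return add_n_ptr(r, a, b, n);
}
```

The first call pays for the CPUID check; later calls cost one indirect jump each.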
That's good to know. So builds may have less performance than builds done from "upstream" with MSYS2 because of:
I suggest wrapping up all the information here in the docs (README?) and then closing this issue as solved. ... and maybe create a follow-up issue for the last point, which requires writing and adding custom MSVC code (to be done later; if you could add one example with a CPU feature you can test, then you could reference it in that issue, allowing others to work on this). |
Actually, I just created a patch to add MSVC intrinsics for some of the missing inline asm. I just need someone to check that the results are still valid (i.e. test that calculations are still correct). If I add the patch here, are you able to double-check it? |
Hm, what about copying the Makefiles and scripts in from the matching release (just copy not-existing files) [or run I'll have a look how this should be done if you aren't familiar with that. |
Not a bad idea. I gave it a go and was able to get some of the tests to run. Unfortunately, because of the way the config script changes the build, I couldn't get everything working due to symbol differences between the MSVC-compiled lib and the mingw-compiled tests. On top of that, the changes are to macro defines and not to functions directly, so some of them aren't picked up in the mingw case. I had, however, already tested the changes in isolation by comparing the output to the pre-existing gcc assembly, and everything tested correctly, which is why I have already added the changes to the repo. If someone has a benchmark they can use, I would be interested to see what difference the changes made. |
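For anyone who wants to repeat that kind of isolated check, a minimal sketch follows. It is not the actual test that was run: it compares an intrinsic-based 64x64 to 128-bit multiply against a portable reference built from 32-bit halves, and the names `umul_ref`, `umul_fast` and `count_mismatches` are hypothetical. It assumes `_umul128` on 64-bit MSVC and `unsigned __int128` on GCC/Clang.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>
#endif

/* Reference: full 128-bit product assembled from 32-bit halves. */
static void umul_ref(uint64_t *hi, uint64_t *lo, uint64_t u, uint64_t v)
{
    uint64_t ul = (uint32_t)u, uh = u >> 32;
    uint64_t vl = (uint32_t)v, vh = v >> 32;
    uint64_t x0 = ul * vl;
    uint64_t x1 = ul * vh;
    uint64_t x2 = uh * vl;
    uint64_t x3 = uh * vh;

    x1 += x0 >> 32;                 /* cannot overflow */
    x1 += x2;                       /* may overflow ... */
    if (x1 < x2)
        x3 += UINT64_C(1) << 32;    /* ... so propagate into the high word */

    *hi = x3 + (x1 >> 32);
    *lo = (x1 << 32) | (x0 & 0xffffffffu);
}

/* Candidate: the compiler-specific fast path under test. */
static void umul_fast(uint64_t *hi, uint64_t *lo, uint64_t u, uint64_t v)
{
#if defined(_MSC_VER) && defined(_M_X64)
    *lo = _umul128(u, v, hi);
#elif defined(__SIZEOF_INT128__)
    unsigned __int128 p = (unsigned __int128)u * v;
    *hi = (uint64_t)(p >> 64);
    *lo = (uint64_t)p;
#else
    umul_ref(hi, lo, u, v);         /* no fast path on this compiler */
#endif
}

/* Compares both versions on edge cases plus pseudo-random inputs and
   returns the number of mismatches (0 means they agree). */
static int count_mismatches(unsigned iterations)
{
    static const uint64_t edge[] = {
        0, 1, 0xffffffffu, UINT64_C(1) << 32,
        UINT64_C(1) << 63, UINT64_MAX
    };
    const size_t n_edge = sizeof edge / sizeof edge[0];
    uint64_t state = UINT64_C(0x9e3779b97f4a7c15); /* xorshift64 seed */
    int bad = 0;

    for (size_t i = 0; i < n_edge; i++) {
        for (size_t j = 0; j < n_edge; j++) {
            uint64_t h1, l1, h2, l2;
            umul_ref(&h1, &l1, edge[i], edge[j]);
            umul_fast(&h2, &l2, edge[i], edge[j]);
            bad += (h1 != h2) || (l1 != l2);
        }
    }
    for (unsigned k = 0; k < iterations; k++) {
        uint64_t u, v, h1, l1, h2, l2;
        state ^= state << 13; state ^= state >> 7; state ^= state << 17;
        u = state;
        state ^= state << 13; state ^= state >> 7; state ^= state << 17;
        v = state;
        umul_ref(&h1, &l1, u, v);
        umul_fast(&h2, &l2, u, v);
        bad += (h1 != h2) || (l1 != l2);
    }
    return bad;
}
```

A check like this only covers correctness of one primitive in isolation; it says nothing about performance, so a benchmark is still needed for that.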
There was someone on the GMP mailing list with a benchmark who could re-run it once the vcpkg package is updated. As soon as a new version is available on vcpkg, I can ask "the benchmarker" to do the checks. |
Could you please explain how you create the assembly code from the original asm files? If I want to replace the files with ones written for another architecture, should I use some kind of generator or just rewrite them from scratch? |
The tuning values from 'gmp-mparam.h' can be changed by copying across a different 'gmp-mparam.h' from one of the suitable folders within the 'mpn' directory and placing it into the SMP\x86 or SMP\x86-64 folder accordingly. Changing one of the assembly files to one written for a different architecture is a bit more demanding.
Then for each desired assembly file (changing the filename and operation accordingly):
For x86_64:
Then for each assembly file:
|
It would be good to know this, but it isn't part of the documentation:
--enable-fat
- so contains code for all CPUs (bigger lib)? Is it possible to explicitly generate for a specific CPU?