Compiler: parallel codegen with MT #14227
We only use the bitcode for cache purposes; for MT safety we must parse the bitcode into a _new_ LLVM module in a new LLVM context for each compilation unit. We also can't share an LLVM target machine and must create one for each compilation unit... but maybe we could share one per thread?
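To make the isolation rule above concrete, here is a minimal Python sketch of the pattern. It does not use LLVM or the Crystal compiler: `Context`, `Module`, `TargetMachine`, `codegen_unit` and `parallel_codegen` are all hypothetical stand-ins invented for illustration. It assumes the "one target machine per thread" idea floated above, which has not been confirmed to be safe.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


class Context:
    """Hypothetical stand-in for an LLVM context (one per compilation unit)."""


@dataclass
class Module:
    """Hypothetical stand-in for an LLVM module parsed into its own context."""
    context: Context
    bitcode: bytes


class TargetMachine:
    """Hypothetical stand-in for a target machine bound to the creating thread."""

    def __init__(self) -> None:
        self.owner = threading.get_ident()

    def emit_object(self, module: Module) -> bytes:
        # Fail loudly if the machine is ever used from another thread.
        assert threading.get_ident() == self.owner, "target machine crossed threads"
        return b"OBJ:" + module.bitcode


_local = threading.local()


def thread_target_machine() -> TargetMachine:
    # Assumption: one target machine per worker thread, created lazily.
    tm = getattr(_local, "tm", None)
    if tm is None:
        tm = _local.tm = TargetMachine()
    return tm


def codegen_unit(bitcode: bytes) -> bytes:
    # Each unit gets a brand-new context and a module "parsed" from bitcode,
    # so nothing is shared with the main thread or with other units.
    module = Module(Context(), bitcode)
    return thread_target_machine().emit_object(module)


def parallel_codegen(units: list[bytes], n_threads: int = 4) -> list[bytes]:
    # The main thread only hands out serialized bitcode; workers rebuild state.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(codegen_unit, units))
```

Only the serialized bytes cross the thread boundary, which is why the dump/parse round trip is needed even though the bitcode is otherwise only used for caching.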
The segfault when the compiler is built in release mode:
The segfault appearing in release mode is likely because the program spends less time in Crystal and more time in LLVM, which leaves more opportunities for thread-unsafe code in LLVM to run in parallel and corrupt memory (leading to the segfault). I noticed that we can check whether LLVM has been compiled with support for multithreading (and this patch should check for it), but I checked and my LLVM library has it, so that's not the culprit (damn). More runs through NOTE: to speed up the reproducibility and simplify
When you build |
@kostya no, this is when crystal has been compiled with Said differently: there are no issues with the generated binary, there is however an issue in the parallel codegen. |
This pull request has been mentioned on Crystal Forum. There might be relevant details there: https://forum.crystal-lang.org/t/choosing-cpu-for-fastest-compilation/6665/2 |
For future reference: Each module has its own I'm not sure it explains the segfault in this pull request, since we dump/parse the LLVM IR from the main thread to the codegen threads into a new context, but it likely explains why the dump/parse is required. |
I think the segfault in this PR is related to an LLVM pass, for example GlobalISel (Global Instruction Selector).
This pull request has been mentioned on Crystal Forum. There might be relevant details there: https://forum.crystal-lang.org/t/very-slow-build-speeds-for-hello-world/6881/17 |
Closing. Superseded by #14748 |
This implements parallel codegen of object files when MT is enabled in the compiler, which brings some performance improvements.
For example, to compile Crystal itself on my laptop with empty caches: the codegen takes ~20s with `fork` and only ~14s with MT (~35s without this patch). With a filled cache, `fork` takes ~9s and only ~4s with MT (~5s without this patch) 🚀
The biggest issue is LLVM having thread safety issues. The most prominent is that LLVMContext can't be shared, while the C library creates a global context, and from my tests LLVMTargetMachine can't be shared across threads either. There are no issues with the LLVM optimization pass with LLVM 16 at least; it might be different in LLVM 12 and earlier, which use a different API (and reuse LLVM objects).
The good: I applied the patch on top of Crystal 1.11.1 and I could build and rebuild the compiler in non-release mode, `-O1`, `-O2`, `-O3` or `--single-module`.
The bad: Sadly, a compiler built with `--single-module -O1` or `--release` will segfault during codegen (which means other modes could probably also segfault, just not as often) 😭