Compiler: parallel codegen with MT #14227

ysbaddaden · 2024-01-13T00:08:52Z

This implements parallel codegen of object files when MT is enabled in the compiler, which brings some performance improvements.

For example, compiling Crystal itself on my laptop with empty caches: codegen takes ~20s with fork and only ~14s with MT (~35s without this patch). With a filled cache, fork takes ~9s and MT only ~4s (~5s without this patch) 🚀

The biggest issue is LLVM's thread-safety limitations. The most prominent is that an LLVMContext can't be shared between threads, yet the C library creates a global context; from my tests, an LLVMTargetMachine can't be shared across threads either. The LLVM optimization passes cause no issues, at least with LLVM 16; it might be different with LLVM 12 and earlier, which use a different API (and reuse LLVM objects).

The good: I applied the patch on top of Crystal 1.11.1 and I could build and rebuild the compiler in non-release mode, -O1, -O2, -O3 or --single-module.
The bad: Sadly, a compiler built with --single-module -O1 or --release will segfault during codegen (which means other modes could probably also segfault, just not as often) 😭

The bitcode was only used for caching; for MT safety we must now also parse it into a _new_ LLVM module, in a new LLVM context, for each compilation unit. We also can't share an LLVM target machine and must create one for each compilation unit... but maybe we could share one per thread?
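The comment above ends by asking whether one target machine per thread could be shared. Below is a minimal sketch of that idea in C with pthreads, assuming (not verified here) that a target machine is safe to reuse sequentially within a single thread: each compilation unit still gets a fresh context, while each worker lazily creates one target machine and reuses it for every unit it claims. `FakeContext`, `FakeTargetMachine`, `run_codegen` and the other names are hypothetical stand-ins for `LLVMContextRef`/`LLVMTargetMachineRef`, so the sketch runs without LLVM.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for LLVMContextRef and LLVMTargetMachineRef, so the
 * sketch compiles without LLVM. */
typedef struct { int unit; } FakeContext;
typedef struct { pthread_t owner; } FakeTargetMachine;

static atomic_int contexts_created;   /* one per compilation unit */
static atomic_int machines_created;   /* at most one per worker thread */

/* Each thread lazily creates its own target machine and reuses it for every
 * unit it compiles; the machine is never handed to another thread. */
static _Thread_local FakeTargetMachine *thread_machine = NULL;

static FakeTargetMachine *machine_for_this_thread(void) {
    if (thread_machine == NULL) {
        thread_machine = malloc(sizeof *thread_machine);
        assert(thread_machine != NULL);
        thread_machine->owner = pthread_self();
        atomic_fetch_add(&machines_created, 1);
    }
    return thread_machine;
}

typedef struct {
    atomic_int next;  /* index of the next unit to claim */
    int count;        /* total number of compilation units */
    int *done;        /* done[i] is incremented when unit i is compiled */
} WorkQueue;

static void compile_unit(WorkQueue *q, int unit) {
    /* Fresh context per unit: in real code the cached bitcode would be parsed
     * into this context, so no module touches another thread's context. */
    FakeContext *ctx = malloc(sizeof *ctx);
    assert(ctx != NULL);
    ctx->unit = unit;
    atomic_fetch_add(&contexts_created, 1);

    FakeTargetMachine *tm = machine_for_this_thread();
    assert(pthread_equal(tm->owner, pthread_self()));
    (void)tm;

    q->done[unit] += 1;  /* safe: each index is claimed by exactly one thread */
    free(ctx);
}

static void *codegen_worker(void *arg) {
    WorkQueue *q = arg;
    int unit;
    while ((unit = atomic_fetch_add(&q->next, 1)) < q->count)
        compile_unit(q, unit);
    free(thread_machine);  /* dispose of this thread's machine before exiting */
    thread_machine = NULL;
    return NULL;
}

/* Compiles `units` units on `threads` worker threads (1..64). */
static void run_codegen(int units, int threads, int *done) {
    WorkQueue q;
    atomic_init(&q.next, 0);
    q.count = units;
    q.done = done;

    pthread_t tids[64];
    assert(threads >= 1 && threads <= 64);
    for (int i = 0; i < threads; i++)
        pthread_create(&tids[i], NULL, codegen_worker, &q);
    for (int i = 0; i < threads; i++)
        pthread_join(tids[i], NULL);
}
```

For example, `run_codegen(100, 4, done)` compiles 100 units on 4 workers, creating 100 contexts but at most 4 target machines. Compared with one target machine per unit, this saves allocations but relies on the assumption that a target machine only needs to be confined to one thread, not to one unit.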

ysbaddaden · 2024-01-13T00:12:08Z

The segfault when the compiler is built in release mode:

Thread 11 "crystal" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe3fff700 (LWP 231004)]
0x00007ffff1f0892f in ?? () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
(gdb) bt
#0  0x00007ffff1f0892f in ?? () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#1  0x00007ffff1f088d4 in llvm::TargetLoweringObjectFileELF::getExplicitSectionGlobal(llvm::GlobalObject const*, llvm::SectionKind, llvm::TargetMachine const&) const () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#2  0x00007ffff2139248 in llvm::SelectionDAG::computeKnownBits(llvm::SDValue, llvm::APInt const&, unsigned int) const () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#3  0x00007ffff21943c4 in llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt const&, llvm::KnownBits&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const ()
   from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#4  0x00007ffff21942a9 in llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt const&, llvm::KnownBits&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const ()
   from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#5  0x00007ffff2191dab in llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::KnownBits&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const ()
   from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#6  0x00007ffff1fd395f in ?? () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#7  0x00007ffff1fd1457 in ?? () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#8  0x00007ffff1fce425 in ?? () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#9  0x00007ffff1f8275c in ?? () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#10 0x00007ffff1f80d4a in ?? () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#11 0x00007ffff1f7de93 in llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AAResults*, llvm::CodeGenOpt::Level) () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#12 0x00007ffff2177182 in llvm::SelectionDAGISel::CodeGenAndEmitDAG() () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#13 0x00007ffff2176a97 in llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#14 0x00007ffff2174acf in llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#15 0x00007ffff42eeedf in ?? () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#16 0x00007ffff1d16fdb in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#17 0x00007ffff1acab6d in llvm::FPPassManager::runOnFunction(llvm::Function&) () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#18 0x00007ffff1ad07b3 in llvm::FPPassManager::runOnModule(llvm::Module&) () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#19 0x00007ffff1acb225 in llvm::legacy::PassManagerImpl::run(llvm::Module&) () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#20 0x00007ffff345a4eb in ?? () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#21 0x00007ffff345a2f2 in LLVMTargetMachineEmitToFile () from /lib/x86_64-linux-gnu/libLLVM-16.so.1
#22 0x0000555555ccf715 in emit_to_file () at /home/julien/src/crystal-1.11.1/src/llvm/target_machine.cr:36
#23 emit_obj_to_file () at /home/julien/src/crystal-1.11.1/src/llvm/target_machine.cr:23
#24 compile_to_object () at /home/julien/src/crystal-1.11.1/src/compiler/crystal/compiler.cr:914
#25 compile () at /home/julien/src/crystal-1.11.1/src/compiler/crystal/compiler.cr:860
#26 0x0000555555ccff26 in -> () at /home/julien/src/crystal-1.11.1/src/compiler/crystal/compiler.cr:539
#27 0x00005555555ccd2b in run () at /home/julien/src/crystal-1.11.1/src/fiber.cr:146
#28 0x0000000000000000 in ?? ()

ysbaddaden · 2024-01-15T14:10:00Z

The segfault likely appears in release mode because the program spends less time in Crystal code and proportionally more time in LLVM, which gives the thread-unsafe code in LLVM more opportunities to run in parallel and corrupt memory (leading to the segfault).

I noticed that we can check whether LLVM has been compiled with multithreading support (and this patch should check for it), but my LLVM library does support it, so that's not the culprit (damn).
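LLVM's C API exposes `LLVMIsMultithreaded()` in `llvm-c/Core.h`, which reports whether LLVM was built with thread support. Below is a hypothetical sketch of how that check could be combined with the other conditions discussed in this PR (MT enabled, more than one module to generate) to decide whether to use parallel codegen. `CodegenConfig` and `use_parallel_codegen` are illustrative names, not the compiler's actual code, and the LLVM result is passed in as a plain field so the sketch builds without LLVM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs to the decision. In a real build, llvm_multithreaded
 * would come from LLVMIsMultithreaded() (llvm-c/Core.h); here it is a plain
 * field so the sketch compiles without LLVM. */
typedef struct {
    bool mt_enabled;          /* compiler built with -Dpreview_mt */
    bool single_module;       /* --single-module: only one module to generate */
    bool llvm_multithreaded;  /* LLVM built with thread support */
    int units;                /* number of compilation units */
} CodegenConfig;

/* Returns true when object files may be generated on several threads;
 * otherwise the caller would keep the existing codegen path. */
static bool use_parallel_codegen(CodegenConfig c) {
    if (!c.mt_enabled || !c.llvm_multithreaded)
        return false;  /* no threads, or an LLVM that is not thread-safe */
    if (c.single_module || c.units < 2)
        return false;  /* nothing to parallelize */
    return true;
}
```

As the comment notes, this check passes on the author's machine, so it guards against a misconfigured LLVM but does not explain the segfault.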

More runs through gdb would help show where it fails. It would also be interesting to see whether changing the ISEL (instruction selection) mode has any effect.

NOTE: to make the crash easier to reproduce and to simplify gdb sessions, we could write a simple program that loads the existing .bc files from the cache and tries to compile them in multiple threads.
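A skeleton of such a reproducer, in C with pthreads, might look like the following. The harness runs each cached .bc path `iterations` times across N threads and counts failures; the per-file work goes through a hypothetical `compile_fn` callback. The comment on `compile_fn` lists real LLVM C API functions a full implementation would plausibly use (one fresh context and target machine per file, mirroring the constraints described in this PR), but that part is left out so the skeleton builds without LLVM. All names defined in the code are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Compiles one cached .bc file on the calling thread; returns 0 on success.
 * A real reproducer could implement this with the LLVM C API:
 * LLVMContextCreate, LLVMCreateMemoryBufferWithContentsOfFile,
 * LLVMParseBitcodeInContext2, LLVMCreateTargetMachine and
 * LLVMTargetMachineEmitToFile, then dispose of everything it created. */
typedef int (*compile_fn)(const char *bc_path, int thread_id);

typedef struct {
    const char **paths;
    int npaths;
    int iterations;      /* repeat the whole set to widen the race window */
    atomic_int next;     /* next job index in [0, npaths * iterations) */
    atomic_int failures;
    compile_fn compile;
} Repro;

typedef struct { Repro *r; int id; } WorkerArg;

static void *repro_worker(void *p) {
    WorkerArg *a = p;
    Repro *r = a->r;
    int total = r->npaths * r->iterations;
    int job;
    while ((job = atomic_fetch_add(&r->next, 1)) < total) {
        const char *path = r->paths[job % r->npaths];
        if (r->compile(path, a->id) != 0) {
            fprintf(stderr, "thread %d: failed on %s\n", a->id, path);
            atomic_fetch_add(&r->failures, 1);
        }
    }
    return NULL;
}

/* Runs every file `iterations` times across `nthreads` threads (1..64)
 * and returns the number of failed compilations. */
static int run_repro(const char **paths, int npaths, int iterations,
                     int nthreads, compile_fn compile) {
    assert(nthreads >= 1 && nthreads <= 64);
    Repro r;
    r.paths = paths;
    r.npaths = npaths;
    r.iterations = iterations;
    r.compile = compile;
    atomic_init(&r.next, 0);
    atomic_init(&r.failures, 0);

    pthread_t tids[64];
    WorkerArg args[64];
    for (int i = 0; i < nthreads; i++) {
        args[i] = (WorkerArg){ &r, i };
        pthread_create(&tids[i], NULL, repro_worker, &args[i]);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return atomic_load(&r.failures);
}

/* Dry-run placeholder that only counts calls, to exercise the harness
 * before wiring in LLVM. */
static atomic_int dry_run_calls;

static int dry_run_compile(const char *bc_path, int thread_id) {
    (void)thread_id;
    atomic_fetch_add(&dry_run_calls, 1);
    return bc_path == NULL;  /* 0 (success) for any non-NULL path */
}
```

With a real callback, a high `iterations` count would likely surface the crash much faster under gdb than rebuilding the compiler each time, since only the threaded codegen step is exercised.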

kostya · 2024-01-15T16:47:16Z

When you build with --single-module, there's only one module, so how can MT help here? Maybe enable it only without --single-module.

ysbaddaden · 2024-01-15T20:30:30Z

@kostya no, this happens when crystal has been compiled with -Dpreview_mt --single-module -O1 (or another optimization level), and that compiler is then used to compile something (e.g. crystal again) without --single-module. That triggers the parallel codegen with MT, which segfaults.

In other words: the generated binary has no issues; the issue is in the parallel codegen.

crysbot · 2024-03-04T18:12:58Z

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/choosing-cpu-for-fastest-compilation/6665/2

ysbaddaden · 2024-04-01T13:44:02Z

For future reference:

Each module has its own LLVMContext but each module also has a reference to the main module's LLVMContext.

I'm not sure it explains the segfault in this pull request, since we dump the LLVM IR in the main thread and parse it into a new context in the codegen threads, but it likely explains why the dump/parse is required.

ysbaddaden · 2024-04-01T13:45:29Z

I think the segfault in this PR is related to an LLVM pass, for example GlobalISel (Global Instruction Selector).

crysbot · 2024-06-03T14:16:39Z

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/very-slow-build-speeds-for-hello-world/6881/17

ysbaddaden · 2024-06-25T09:50:11Z

Closing. Superseded by #14748

ysbaddaden added 4 commits January 12, 2024 18:31

Compiler: enable parallel codegen with MT

1b7b362

fixup: explain why we gen the bitcode in the main thread

87a9b18

Fix: missing lib_llvm/bit_reader.cr file

1ad62a3

Blacksmoke16 added topic:compiler performance topic:multithreading labels Jan 13, 2024

ysbaddaden added 2 commits January 15, 2024 15:57

Fix: check if LLVM has been compiled with multithread support

0cdd9b0

fixup! Fix: check if LLVM has been compiled with multithread support

c6caabe

ysbaddaden mentioned this pull request Jan 30, 2024

Codegen: on demand distribution to forked processes #14273

Merged

ysbaddaden mentioned this pull request May 11, 2024

Occasionally recurring bug: Invalid Int32: "280375465082892" (ArgumentError) #14496

Closed

ysbaddaden mentioned this pull request Jun 25, 2024

Compiler: enable parallel codegen with MT #14748

Merged


ysbaddaden closed this Jun 25, 2024

ysbaddaden deleted the feature/compiler-parallel-codegen-with-mt branch June 25, 2024 09:50
