Add support for fast-math optimizations #531

elliottslaughter · 2022-04-09T00:41:11Z

Adds support for fast-math optimizations in terralib.saveobj.

Fixes #530

TODO:

documentation
cudalib.toptx
choose default fast-math options on GPU? (going to keep optimizations disabled by default; users can enable them)

elliottslaughter · 2022-04-09T05:29:29Z

This is pretty much ready to go. The one remaining question is what default to set for (particularly GPU) fast-math flags.

Originally, Terra used NVVM for CUDA code generation. (We still support this path with LLVM 3.8.) It turns out that NVVM performs the equivalent of LLVM's contract fast-math flag by default. In fact, I'm not aware of any way to turn this off. Strictly speaking, this is incorrect, but the cost must be small enough that NVIDIA decided it was worth it. They made the same decision in NVCC, which I guess is not surprising.

With the migration to NVPTX, this goes away. Fast-math flags are encoded in each floating-point LLVM instruction, and NVPTX actually respects these (as opposed to NVVM, which applies contract whether users asked for it or not). Because the default is to have all flags disabled, users on NVPTX (and therefore, from Terra's perspective, all users on LLVM 5.0 or newer) see a performance regression relative to NVVM. But it is, strictly speaking, more correct. In other words, NVVM's choice might introduce some numerical inaccuracy, and NVPTX's default avoids that (at the cost of performance when this is not necessary).
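
To make the numerical difference concrete, here is a small sketch in Python (not Terra) of what contraction changes: `a*b + c` evaluated with two roundings, versus a single rounding as a fused multiply-add would perform. The input values are an assumption chosen for illustration, and the single-rounding result is emulated with exact rational arithmetic rather than a hardware FMA instruction.

```python
from fractions import Fraction


def unfused_mul_add(a, b, c):
    # Strict evaluation: the product is rounded to a double,
    # then the sum is rounded again.
    product = a * b
    return product + c


def fused_mul_add(a, b, c):
    # Emulated fused multiply-add: compute a*b + c exactly using
    # rationals, then round to a double only once at the end.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Illustrative inputs (hypothetical, picked so the two results differ).
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

strict = unfused_mul_add(a, b, c)
contracted = fused_mul_add(a, b, c)
print(strict, contracted)
```

With these inputs the exact product is 1 - 2^-54, which rounds to 1.0 as a double, so the strict version returns 0.0, while the single-rounding version returns -2^-54. Both are reasonable answers, but a backend that contracts without being asked produces results that differ from what the program specified.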

Since it's easy to turn these flags on now, I think it's reasonable to leave them off by default, but again this does result in a performance regression for anyone who has been sticking it out with LLVM 3.8 and doesn't pay attention to the new arguments to terralib.saveobj / cudalib.toptx.

Edit: to be clear, the option that has been implemented in this PR is to keep fast-math flags disabled by default.

If anyone has opinions on this, let me know.

elliottslaughter · 2022-04-11T19:04:19Z

CI failure looks like a Nix issue, and my last (nearly identical) commit passed, so I'm going to go ahead and merge this.

elliottslaughter added 14 commits April 5, 2022 14:37

Hack: enable fast math on all floating point arithmetic.

79b661e

Scaffolding to pass fast math flags around.

4c7d667

Parse fastmath flags and make sure it gets all the way through.

fcc41cc

Be sure to catch the other places where floating point instructions may be generated.

f7b9010

Fix version bound for contract.

ddf73bd

More version bounds.

244475b

Fix version bound.

25b1591

Fix version bound.

3607fce

Fix version bound.

dbda3f5

Choose test settings that will pass in LLVM 3.8.

28a5ea3

Fix for CUDA compilation unit.

9ea3da4

Make fast math flags configurable in cudalib.toptx.

0a038a6

Document optimization profiles.

b479311

Fix typo.

6916813

elliottslaughter added 2 commits April 8, 2022 23:18

Refactor.

38ee600

Refactor.

22570cd

elliottslaughter merged commit d115054 into terralang:master Apr 11, 2022

elliottslaughter deleted the fast-math branch April 11, 2022 19:04

elliottslaughter mentioned this pull request Apr 11, 2022

Support fast math flags #530

Closed
