Add support for fast-math optimizations #531

Merged: elliottslaughter merged 16 commits into terralang:master from fast-math on Apr 11, 2022

@elliottslaughter (Member) commented Apr 9, 2022

Adds support for fast-math optimizations in terralib.saveobj.

Fixes #530

TODO:

  • documentation
  • cudalib.toptx
  • choose default fast-math options on GPU? (decision: keep optimizations disabled by default; users can enable them)

@elliottslaughter (Member, Author) commented Apr 9, 2022

This is pretty much ready to go. The one remaining question is what default to use for fast-math flags, particularly on the GPU.

Originally, Terra used NVVM for CUDA code generation. (We still support this path with LLVM 3.8.) It turns out that NVVM performs the equivalent of LLVM's contract fast-math flag by default: it fuses floating-point multiplies and adds into fused multiply-add (FMA) operations, which round once instead of twice. In fact, I'm not aware of any way to turn this off. Strictly speaking, this is incorrect, because the fused result can differ from what the source code specifies, but presumably NVIDIA judged the numerical cost small enough to be worth it. They made the same decision in NVCC, which I guess is not surprising.

With the migration to NVPTX, this goes away. Fast-math flags are encoded in each floating-point LLVM instruction, and NVPTX actually respects them (unlike NVVM, which applies contract whether or not users asked for it). Because the default is to have all flags disabled, users on NVPTX (and therefore, from Terra's perspective, all users on LLVM 5.0 or newer) see a performance regression relative to NVVM. But the result is, strictly speaking, more correct. In other words, NVVM's choice can make results deviate from the source-level semantics, and NVPTX's default avoids that, at the cost of performance in cases where that exactness is not needed.

Since it's now easy to turn these flags on, I think it's reasonable to leave them off by default. Again, though, this does result in a performance regression for anyone who has been sticking it out with LLVM 3.8 and, after upgrading, doesn't pay attention to the new arguments to terralib.saveobj / cudalib.toptx.

Edit: to be clear, this PR keeps fast-math flags disabled by default.

If anyone has opinions on this, let me know.

@elliottslaughter (Member, Author) commented

CI failure looks like a Nix issue, and my last (nearly identical) commit passed, so I'm going to go ahead and merge this.

@elliottslaughter elliottslaughter merged commit d115054 into terralang:master Apr 11, 2022
@elliottslaughter elliottslaughter deleted the fast-math branch April 11, 2022 19:04
Successfully merging this pull request may close these issues: Support fast math flags