First of all, congratulations on writing the most accurate micro-optimization textbook I have read so far.
There are, however, still a couple of important shortcomings that prevent me from recommending it to my computational science community right now. One of them is your discussion of fast-math, which presents it as a unilateral win that costs just a few bits of precision. This description lacks the nuance that the topic deserves, as explained in https://simonbyrne.github.io/notes/fastmath/ .
A good discussion of fast-math should cover the following points:
fast-math turns many important parts of the IEEE-754 specification into undefined behavior: essentially every special value (-0.0, NaN, +/-inf) is assumed never to appear as an input or an output at runtime. As a result, it becomes dangerously easy to write programs that invoke undefined behavior while seemingly doing nothing more than basic FP operations. This can, in turn, result in arbitrary misbehavior, as any undefined behavior can.
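A minimal sketch of how this can bite, assuming GCC or Clang with `-ffast-math` (which implies `-ffinite-math-only`): the `isnan()` guard below may be folded to false because the compiler assumes NaN never occurs, so the sanitization the programmer wrote silently disappears.

```c
#include <math.h>

/* Sketch (assumed flags: gcc/clang -O2 -ffast-math). Under -ffinite-math-only
 * the compiler may assume no argument is ever NaN, so the isnan() guard can
 * be folded to "false" and the intended sanitization quietly vanishes. */
double sanitized_sum(const double *v, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        if (isnan(v[i]))   /* may be optimized away under fast-math */
            continue;      /* intended: skip missing/invalid entries */
        sum += v[i];
    }
    return sum;
}
```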
fast-math makes the floating-point output of a program depend on the hardware, the compiler, and even the compiler version. While exact reproducibility is a very costly property to enforce on modern hardware, it also makes programs a lot easier to test. A program that uses fast-math in production should have tests that assert the correctness of results through more mathematically general, and thus more complex, properties.
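For instance (a hypothetical test sketch; `solve_quadratic_root` is a made-up stand-in for the code under test): instead of comparing against a bit-exact golden value, a test can assert a mathematical property that any acceptable result must satisfy, such as a small residual.

```c
#include <math.h>

/* Hypothetical test sketch. Rather than comparing against a golden bit
 * pattern, check a property that holds regardless of compiler, flags or
 * hardware: the returned root r makes the residual |r*r - 2| small. */
extern double solve_quadratic_root(void);   /* assumed: returns ~sqrt(2) */

int test_root(void) {
    const double r = solve_quadratic_root();
    const double residual = fabs(r * r - 2.0);
    return residual < 1e-12;                 /* tolerance, not bit-exactness */
}
```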
While fast-math only costs a few bits of precision on small amounts of simple arithmetic, it can and will break fancier numerical algorithms, such as transcendental functions and statistics packages, not just in your code but in the third-party libraries you use. This matters because many manipulations that are harmless in the set of reals (e.g. polynomial evaluation, ratios of small numbers...) are numerically unstable in the set of floating-point numbers, and getting them to produce even one correct bit requires special precautions. These precautions are exactly the kind of seemingly unnecessary, complex code that fast-math optimizes out.
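A classic illustration (a sketch, not taken from the book): Kahan compensated summation relies on floating-point addition not being associative; under `-ffast-math` (which enables `-fassociative-math`) the compiler is allowed to simplify the compensation term to zero, quietly turning the routine back into a plain, less accurate sum.

```c
/* Kahan compensated summation. The correction c = (t - sum) - y is
 * algebraically zero but captures the low-order bits lost in sum + y.
 * With -ffast-math (-fassociative-math) the compiler may apply exactly
 * that algebra and delete the compensation, degrading the accuracy. */
double kahan_sum(const double *v, int n) {
    double sum = 0.0, c = 0.0;
    for (int i = 0; i < n; i++) {
        double y = v[i] - c;
        double t = sum + y;
        c = (t - sum) - y;   /* lost low-order bits of y */
        sum = t;
    }
    return sum;
}
```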
Because of these points, I find it safer to use fast-math only as a guide to manual optimization during development, rather than as a production binary feature, and would encourage you to advise the same in your book:
Periodically turn on fast-math.
Locate any resulting program speedup using a profiler.
Study the assembler before and after to see what fast-math did.
Make sure this transformation is valid for your algorithm (ideally, you would have tests for that).
If so, apply the corresponding transformation to the C code so that you get the same benefit without fast-math (see the sketch after these steps).
Turn off fast-math once it does not make any meaningful difference anymore.
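As a concrete example of that manual step (a sketch, assuming the profiler pointed at a division-heavy loop): `-ffast-math`'s `-freciprocal-math` would replace the per-element division by a multiplication with the reciprocal; if that rounding change is acceptable for the algorithm, the same rewrite can be done by hand and fast-math switched off.

```c
/* Before: one division per element. */
void scale_down(double *v, int n, double s) {
    for (int i = 0; i < n; i++)
        v[i] = v[i] / s;
}

/* After: the transformation fast-math would have applied, written out
 * explicitly. One division total, slightly different rounding, and the
 * behavior no longer depends on a global compiler flag. */
void scale_down_recip(double *v, int n, double s) {
    const double inv = 1.0 / s;
    for (int i = 0; i < n; i++)
        v[i] *= inv;
}
```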