-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exponential: Dual Xeon Gold 6154 result #11
Comments
That's pretty good for exp. For In the past I've done the mantissa part with a mixture of Taylor series for near 1 and Padé approximants for further away. That got down to about 60 to 17 cycles on Intel core2 series from Nehalem to Skylake, respectively, for a 1e-7 relative error single precision log, anyway. There may be a faster approach than Taylor/Padé, but it seemed very likely the "IEEE exponent field" trick would be applicable to whatever you do if you're willing to assume IEEE. I also perused your references and didn't see it mentioned. They seem more (edit: clarified performance mention in time and accuracy) |
Log focus is coming, I didn't get to that yet. |
Well, when you do get to log, I suggest as a starting point these blog posts from some Ebay guy. {That material didn't exist back when I was figuring this out for myself :-) } : https://www.ebayinc.com/stories/blogs/tech/fast-approximate-logarithms-part-i-the-basics/ https://www.ebayinc.com/stories/blogs/tech/fast-approximate-logarithms-part-ii-rounding-error/ https://www.ebayinc.com/stories/blogs/tech/fast-approximate-logarithms-part-iii-the-formulas/ He never really mentions Padé approximants, though his rational approximations are similar. He argues for a [3/4, 6/4) interval. (As mentioned, any [t,2t) works - whatever gets you a fast, well approximated "shifted mantissa".) Seems like if you have all this Intel-specific SIMD code, assuming IEEE (even little-endian IEEE) is not much of a stretch. Particularly with log, there can be unavoidable speed/accuracy trade-offs. So, you may want to have a Nim call that accepts a target precision (and conceivably even returns an error estimate), and then just default it to 23 bits for float32 and 51 bits for float64 and let the call-site use a larger error when faster performance is desired and in-context lower accuracy is acceptable. |
cc @Laurae2
The text was updated successfully, but these errors were encountered: