
Posit log & log1p native implementations #244

Open
touisteur opened this issue Aug 28, 2021 · 16 comments

touisteur (Author) commented Aug 28, 2021

Hi,

I was looking for a posit implementation of the log and log1p functions, actually trying to benchmark how a posit32 (2,30?) would fare (precision and perf-wise) on a gpu or cpu (compared to a float binary32) - also trying to use (otherwise unused) int32 resources in a CUDA kernel.

Reading the code on elementary functions, they seem implemented as 'call std::log and convert to posit'? Did I miss anything?

If not present, is there a canonical posit log/log1p implementation I could start from?

Thanks in advance,
Lionel

Ravenwater self-assigned this Aug 30, 2021
Ravenwater (Contributor) commented Aug 30, 2021

@touisteur indeed, the elementary functions in Universal do not yet use native implementations. However, extensive research has been done on what is required for correctly rounded math libraries. In particular, Jay P. Lim and Santosh Nagarakatte have worked out the Minefield method that John Gustafson proposed for posits. Here is the research page: https://people.cs.rutgers.edu/~sn349/rlibm/

Their RLibm contains a correctly rounded elementary function library for posit<32,2>. That would be the best place to start.

If you would be interested in implementing the log() and log1p() methods for posits in Universal, that would be most appreciated, as we have been looking for a subject matter expert to get the math library work in Universal started. The task is non-trivial, as the Minefield method depends on the specific number system configuration and the sampling of the Reals it represents. Otherwise stated, providing a correctly rounded math library for a parameterized number system that can represent thousands of number systems is an open research question. But we have PoCs for specific posit configurations, so it is not unreasonable to get started there and work our way up to the meta problem.

Secondly, Jay's methodology for synthesizing the coefficients of the Minefield polynomial is not feasible for posits with nbits > 32. This creates a problem for creating an Oracle number system and associated math library that could serve as a baseline. So that is an open question as well.

Love to hear your thoughts on the above.

Ravenwater (Contributor) commented Aug 31, 2021

@touisteur just released v3.34, which has a unified mathlib shim shared by fixpnt, cfloat, and posit.

While adding the fixpnt shim, the pattern became clear: a general math library shim that uses the compiler's math library and marshals values through an IEEE-754 double. The nice side effect of this pattern is that to bring up a new number system we only need to implement two operations: operator=() to map from IEEE-754 to the encoding, and operator double() to synthesize the IEEE-754 value from the encoding.

Having this general shim layer will also reduce the amount of code we need to manage. So we have a cool architectural task: design the general mathlib shim to offer the fallback functionality for 'small' values that are representable in double-precision IEEE-754, with an overload to a native implementation when the function is available.
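That shim pattern can be sketched in a few lines. The names below are hypothetical, not Universal's actual API; the only assumption is a Number type that converts to double and constructs from double:

```cpp
#include <cmath>

// Sketch of the generic mathlib shim described above (hypothetical names).
// Any Number type with a double() conversion out and construction/assignment
// from double in gets a fallback log/log1p marshalled through IEEE-754 double.
template <typename Number>
Number shim_log(Number x) {
    return Number(std::log(static_cast<double>(x)));
}

template <typename Number>
Number shim_log1p(Number x) {
    return Number(std::log1p(static_cast<double>(x)));
}
```

With Number = double this reduces to std::log itself; a posit or fixpnt type would plug in through its conversion operators.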

Would that be a task that interests you?

touisteur (Author) commented

Hi @Ravenwater, very nice of you to answer so quickly and in such detail. I'm (sadly) not a subject matter expert and feel I wouldn't do justice to the task. I'll probably revisit the subject in 2022. In the meantime, I'll take a look at rlibm and the published paper for inspiration, though.

Ravenwater (Contributor) commented Aug 31, 2021

@touisteur thanks for the update. I'll add the mathlib shim to the todo list, and I will use this issue to trigger the first step of implementing a native log().

Definitely check out Jay's work, as it is a very cool, state-of-the-art approach to the table maker's dilemma.

http://perso.ens-lyon.fr/jean-michel.muller/Intro-to-TMD.htm

touisteur (Author) commented Sep 1, 2021

Heh, I did try my hand with Sollya for a similar topic (I want to take into account some information I have about the input of the log or log1p; in fact it's more log(x/y), and x and y are bounded...) but I had to admit defeat. Either I'm not quite competent enough, or these kinds of tools are pretty hard to use. Papers on the subject are fascinating, though, and I'm quite impressed we're still making so much progress.

Ravenwater (Contributor) commented

@touisteur I have collected a set of references that we can compare and contrast:

glibc's libm implementation: https://code.woboq.org/userspace/glibc/sysdeps/ieee754/dbl-64/e_log.c.html
musl: https://git.musl-libc.org/cgit/musl/tree/src/math/log.c
fdlibm: https://www.netlib.org/fdlibm/e_log.c
rlibm: https://github.com/rutgers-apl/rlibm-32/blob/main/source/posit32/log.c

I'll ping Jay and Santosh to see if they have a polynomial fitted for 8-bit and 16-bit posits. I am very interested in exploring the differences among the polynomials that correctly round 8, 16, 24, 32, 40, and 48-bit posits. I have not yet seen an analysis that studies the structure of those differences, so maybe we can drive that in this experiment.
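All four implementations listed above share the same skeleton: reduce x to f · 2^e with f near 1, approximate log(f) with a fitted polynomial, and recombine. A minimal sketch of that range reduction, using std::log on the reduced argument as a stand-in for the fitted polynomial:

```cpp
#include <cmath>

// Classic log range reduction shared by glibc, musl, fdlibm, and rlibm:
// x = f * 2^e with f in [1/sqrt(2), sqrt(2)), so log(x) = e*ln2 + log(f).
// std::log here stands in for the polynomial each library fits to log(f).
double log_via_reduction(double x) {
    const double ln2     = 0.6931471805599453;
    const double sqrt1_2 = 0.7071067811865476;
    int e;
    double f = std::frexp(x, &e);         // x = f * 2^e, f in [0.5, 1)
    if (f < sqrt1_2) { f *= 2.0; --e; }   // recenter f around 1
    return e * ln2 + std::log(f);
}
```

The libraries differ mainly in how they split e*ln2 into high/low parts and in the polynomial itself, which is where the posit-specific fitting comes in.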

touisteur (Author) commented Sep 4, 2021

Oh thanks! I had no idea rlibm had a posit32 implementation. I need to dive in and check how it fares against the float and double logs from the libcs...

Edit: it seems to be based on softposit and looks quite 'easy' to use and read. I need to find a keyboard+pc after the weekend and try that. I didn't find a log1p, but that might not be far away. Also need to check whether it uses ints larger than int32 underneath. Thanks!

Ravenwater (Contributor) commented Sep 4, 2021

Well, not quite: it still uses doubles for compute, and maps back to posit on the final conversion.

posit32_t rlibm_log(posit32_t x) {
    if (x.v >= 0x80000000) {
        return castP32(0x80000000);
    } else if (x.v == 0) {
        return castP32(0x80000000);
    } else if (x.v == 0x40000000) {
        return castP32(0x0);
    }
    
    doubleX fix, fit, dX;
    fix.d = convertP32ToDouble(x);

    int m = fix.x >> 52lu;
    m -= 1023lu;
    fix.x &= 0xFFFFFFFFFFFFFlu;
    fix.x |= 0x3FF0000000000000lu;
    
    fit.x = fix.x & 0xFE00000000000lu;
    int FIndex = fit.x >> 45lu;
    fit.x |= 0x3FF0000000000000lu;
    
    dX.d = fix.d - fit.d;
    dX.d *= oneByF[FIndex];
    
    // Figure out index. 7 bits are the same. 64 - 18 = 46
    unsigned long index = (dX.x & 0x01FFFFFFFFFFFFFFlu) >> 46;
    const double* coeff = logcoeffs[index];
    
    // Compute polynomial
    double y = coeff[3];
    y *= dX.d;
    y += coeff[2];
    y *= dX.d;
    y += coeff[1];
    y *= dX.d;
    y += coeff[0];
    y *= dX.d;
    
    // Output compensation
    y += m * LN2LOW;
    y += lnOneDotF[FIndex];
    y += m * LN2HIGH;
    return convertDoubleToP32(y);
}

what is different is the polynomial: just three terms.

Universal has native posit sqrt, and for coarse posit configurations, the approximation algorithms don't work as they assume a lot finer sampling of the continuum. That implies that there is a HUGE design space for polynomials and appropriate sampling resolution.
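For the coarsest configurations the polynomial machinery may not even be needed: an 8-bit format has only 256 encodings, so a correctly rounded log can be tabulated exhaustively offline. A toy sketch with a hypothetical 8-bit unsigned fixed-point format (not a posit; illustration only):

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Toy 8-bit unsigned fixed-point format with 4 fraction bits (hypothetical,
// NOT a posit): value = v / 16. With only 256 encodings, a correctly rounded
// log() is just a 256-entry table, with each entry rounded once offline.
double decode(uint8_t v) { return v / 16.0; }
uint8_t encode(double d) { return static_cast<uint8_t>(std::lround(d * 16.0)); }

std::array<uint8_t, 256> build_log_table() {
    std::array<uint8_t, 256> t{};
    for (int v = 1; v < 256; ++v) {
        double y = std::log(decode(static_cast<uint8_t>(v)));
        t[v] = (y >= 0.0) ? encode(y) : 0;  // toy format is unsigned: clamp
    }
    return t;
}
```

The interesting question in the thread is exactly where in the nbits range this brute-force approach stops being viable and a fitted polynomial becomes necessary.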

touisteur (Author) commented

Oh yeah I just re-read the code and fell on that double accumulator, errrr.

touisteur (Author) commented

> Well, not quite: it still uses doubles for compute, and maps back to posit on the final conversion.
>
> what is different is the polynomial: just three terms.
>
> Universal has native posit sqrt, and for coarse posit configurations, the approximation algorithms don't work as they assume a lot finer sampling of the continuum. That implies that there is a HUGE design space for polynomials and appropriate sampling resolution.

So I gather it's done with double because coefficients in posit32 wouldn't 'do the job'. I wonder how far we'd be.
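One quick way to see why the intermediates stay in double: evaluate the same short polynomial for log(1+d) in float (standing in here for a lower-precision format) and in double, and compare both against libm. Under that assumption:

```cpp
#include <cmath>

// Truncated Taylor series for log(1+d): d - d^2/2 + d^3/3 - d^4/4.
// Evaluated in float (a stand-in for a narrower format), the result picks up
// rounding error well above the series' own truncation error; evaluated in
// double it does not. This is the gap a posit32-only pipeline has to close.
template <typename T>
T log1p_poly(T d) {
    return d - d * d / 2 + d * d * d / 3 - d * d * d * d / 4;
}
```

A posit32 accumulator would sit somewhere between the two, with the extra wrinkle that its precision tapers off away from 1.0.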

Ravenwater (Contributor) commented

That is the $64,000 question. I just pinged the FPbench.org community with that question. There are several research groups (Rutgers, Utah, and UWash) working on math function library precision, so I am hoping they have some inkling of what the issues are.

touisteur (Author) commented Sep 5, 2021

I am probably going to end up trying fpm for the log/log1p computation. I know it's fixed-point, but it allows defining 'where' the 'fixed point' is. So if I know the dynamic range of the input (I do), I can probably place my 'point' where it gives enough accuracy... Will report. Not soon, though.

Edit: https://mikelankamp.github.io/fpm/
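The trade-off described here can be sketched without fpm: in a Qm.n layout, the position of the binary point fixes both the range and the accuracy. A sketch with a hypothetical Q16.16 layout (these helpers are not fpm's API):

```cpp
#include <cmath>
#include <cstdint>

// Q16.16 fixed-point sketch (hypothetical helpers, not fpm's API):
// value = raw / 2^16. Moving FRAC trades range for accuracy, which is the
// point of choosing 'where' the point sits when the input's range is known.
using q16_16 = int32_t;
constexpr int FRAC = 16;

q16_16 q_from(double d) { return static_cast<q16_16>(std::lround(d * (1 << FRAC))); }
double q_to(q16_16 q)   { return static_cast<double>(q) / (1 << FRAC); }

// log for x > 0: shift-normalize to m in [1,2), then log(x) = e*ln2 + log(m).
// std::log on the mantissa stands in for a fixed-point polynomial.
q16_16 q_log(q16_16 x) {
    int k = 31 - __builtin_clz(static_cast<uint32_t>(x)); // leading 1 (GCC/Clang)
    int e = k - FRAC;                                     // x = m * 2^e
    double m = static_cast<double>(x) / (1u << k);
    return q_from(e * 0.6931471805599453 + std::log(m));
}
```

With a bounded input such as log(x/y), the same normalization can be done once for the quotient, which is where knowing the dynamic range pays off.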

Ravenwater (Contributor) commented

@touisteur took a look at fpm. If I implement the log/exp functions for you, would you consider using Universal instead of fpm?

touisteur (Author) commented Sep 6, 2021 via email

Ravenwater (Contributor) commented

Cool, I'll put it on my docket for this week. I'll report here on the progress.

touisteur (Author) commented Sep 7, 2021 via email

Status: Backlog · No branches or pull requests · 2 participants