Will wrote a set of micro-benchmarks to learn why Larceny's bignums are so slow.
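
As a rough guide to what is being measured, here is a hedged sketch of the kinds of expressions the benchmarks time. The operand sizes are illustrative guesses, and this is not Will's actual harness, which would run each expression in a tight loop and divide the elapsed time by the iteration count.

    ;; Hedged sketch of the operations timed below (not the actual benchmark code).
    (define fix 12345)                  ; a small exact integer (fixnum)
    (define big (expt 10 100))          ; an exact integer too large for a fixnum (bignum)

    (* fix fix)                         ; fixnum * fixnum
    (+ big fix)                         ; bignum + fixnum
    (* big fix)                         ; bignum * fixnum
    (* big (expt 2 64))                 ; bignum * 2^64
    (+ big big)                         ; bignum + bignum
    (expt 10 1000)                      ; (expt 10 1000)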

Timings on a 1.5 GHz Sparc:


                    Larceny     Larceny     Chez        Scheme 48   MzScheme
                    v0.963      v0.965a1    6.1         1.7         v4.1
                                rev 5764

fixnum * fixnum     160 nsec    160 nsec    375 nsec    240 nsec      3 usec
bignum + fixnum      12 usec      8 usec    560 nsec      2 usec      2 usec
bignum * fixnum     144 usec     10 usec      1 usec      4 usec      5 usec
bignum * 2^32        42 usec      9 usec      1 usec      5 usec      5 usec
bignum * 2^64        64 usec     10 usec      1 usec      7 usec      5 usec
bignum * 2^128      111 usec     10 usec      1 usec     10 usec      5 usec
bignum * 2^256      202 usec     10 usec      1 usec     15 usec      5 usec
bignum + bignum      13 usec      8 usec    680 nsec      2 usec      2 usec
(expt 10 10)          1 usec      1 usec      2 usec      9 usec      8 usec
(expt 10 100)       300 usec     50 usec      6 usec     21 usec     22 usec
(expt 10 1000)       38 msec      3 msec    177 usec    680 usec    380 usec
(expt 10 10000)       4 sec     110 msec     19 msec     58 msec     15 msec
(expt 10 100000)    363 sec       4 sec       2 sec       6 sec     390 msec

Timings on a 2.8 GHz Pentium:


                    Larceny     Larceny     Petite      MzScheme    Ikarus      Gambit
                    v0.963      v0.965a1    Chez 7.3    v370        0.0.3       v4.1.0
                                rev 5764

fixnum * fixnum     121 nsec    112 nsec    413 nsec    873 nsec     96 nsec      2 usec
bignum + fixnum       4 usec      4 usec    444 nsec    536 nsec     76 nsec      2 usec
bignum * fixnum      60 usec     12 usec    560 nsec    920 nsec     96 nsec      3 usec
bignum * 2^32        16 usec     10 usec    568 nsec    964 nsec    159 nsec      2 usec
bignum * 2^64        25 usec     11 usec    572 nsec    944 nsec    190 nsec      2 usec
bignum * 2^128       41 usec     11 usec    600 nsec      1 usec    253 nsec      2 usec
bignum * 2^256       74 usec     11 usec    600 nsec      1 usec    352 nsec      3 usec
bignum + bignum       4 usec      4 usec    500 nsec    552 nsec    131 nsec      2 usec
(expt 10 10)        824 nsec    960 nsec    641 nsec      1 usec    712 nsec      2 usec
(expt 10 100)       130 usec     53 usec      2 usec      4 usec      2 usec      6 usec
(expt 10 1000)       16 msec      2 msec     40 usec    356 usec     18 usec    112 usec
(expt 10 10000)       2 sec      80 msec      3 msec      1 msec    840 usec      5 msec
(expt 10 100000)    153 sec       3 sec     344 msec     40 msec     24 msec     52 msec

Except for Larceny, Chez, Ikarus, and MzScheme (Pentium only), these timings are for interpreted code. Interpreters often perform reasonably well on bignum benchmarks, so I didn't bother to compile the benchmarks for Gambit.

These numbers show that:

  • Larceny takes a long time to get into the bignum code.
    • Larceny traps first to millicode, then to C, then back to Scheme.
    • To get into the sub-microsecond range, we need millicode that calls Scheme without going through C.
  • Multiplying a bignum by a fixnum was very slow.
    • Rewriting the inner loop made this case run 5 to 15 times as fast.
    • Another factor of 2 to 4 could be obtained using 16×16 or 32×32 multiplication.
  • Larceny wasn't recognizing shifts.
    • Another factor of 2 to 4 could be obtained using 16-bit or 32-bit loads/stores.
  • Larceny was using the classical algorithm for multiplication.
    • Larceny now uses Karatsuba's algorithm for large bignums (a sketch follows this list).
    • Another factor of 2 to 4 could be obtained using 16-bit or 32-bit operations.
    • For really large bignums, convolution-based multiplication is fastest.
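
As a rough illustration, here is a hedged sketch of Karatsuba multiplication on nonnegative exact integers. It is not Larceny's implementation: a real bignum library works on digit vectors rather than calling back into * and expt, and the cutoff below is an arbitrary illustrative value rather than a tuned one.

    ;; Hedged sketch of Karatsuba multiplication on nonnegative exact integers.
    ;; Not Larceny's code; the cutoff and the helper are illustrative only.
    (define (bit-length n)                       ; number of bits in n >= 0
      (if (zero? n) 0 (+ 1 (bit-length (quotient n 2)))))

    (define (karatsuba x y)
      (if (or (< x 65536) (< y 65536))           ; small case: schoolbook multiply
          (* x y)
          (let* ((half (quotient (max (bit-length x) (bit-length y)) 2))
                 (b  (expt 2 half))              ; split point
                 (x1 (quotient x b)) (x0 (remainder x b))
                 (y1 (quotient y b)) (y0 (remainder y b))
                 (z2 (karatsuba x1 y1))          ; high parts
                 (z0 (karatsuba x0 y0))          ; low parts
                 ;; middle term from one multiplication instead of two
                 (z1 (- (karatsuba (+ x1 x0) (+ y1 y0)) z2 z0)))
            (+ (* z2 b b) (* z1 b) z0))))

The point is that each level does three recursive multiplications instead of four, which is why it beats the classical algorithm on large inputs such as (expt 10 100000).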

The easiest and most generally applicable improvements would come from bypassing C and using more 16-bit operations.
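
To make the 16-bit point concrete, here is a hedged sketch (the names and the choice of radix are illustrative, not Larceny's actual representation) of how the exact product of two 32-bit digits can be assembled from four 16×16 partial products:

    ;; Hedged sketch: the exact product of two 32-bit values built from
    ;; four 16x16 partial products.  Illustrative only.
    (define (mul32 a b)                          ; assumes 0 <= a, b < 2^32
      (let ((ah (quotient a 65536)) (al (remainder a 65536))
            (bh (quotient b 65536)) (bl (remainder b 65536)))
        (+ (* al bl)                             ; low * low
           (* 65536 (+ (* ah bl) (* al bh)))     ; cross terms, shifted by 16 bits
           (* 4294967296 (* ah bh)))))           ; high * high, shifted by 32 bits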

See WhyMillicodeIsSlow.
