Skip to content
Felix S. Klock II edited this page Jul 28, 2013 · 3 revisions

= Why Twobit is Slow =

Twobit is an order of magnitude slower than the fastest Scheme compilers: Chez Scheme, Ikarus, MzScheme.

Furthermore, before v0.96, Twobit took an outrageously long time to macro-expand large quasiquote forms (ticket:65) and to perform CSE and/or representation inference on certain large programs. These pathological misbehaviors were limited in v0.96. Though some of the underlying problems still remain, it looks like we can now begin to focus on the speed of Twobit on more typical inputs.

Overall Compilation Speed

The nboyer benchmark is 770 lines long, including a long block comment at the beginning. The compiler benchmark is 11693 dense lines of code. Here's how long it takes (in milliseconds) to load (or to compile) the nboyer and compiler benchmarks on the Solaris and Linux machines we have been using for benchmarking:


                                     nboyer                       compiler
                             --------------------           --------------------
                             Solaris        Linux           Solaris        Linux
Chez Scheme 6.1                 30                             960
MzScheme v371                   80 (avg)      30 (avg)        1790
Ikarus 0.0.2                                  33 (avg)                      1850
Larceny v0.96
  (global-optimization #f)     147           185              6350         16300
  (global-optimization #t)     345           287             16600         21000
Gambit  (compile to file)                   4730
Chicken (compile to file)                  19280                          500000

On our Linux machine, it looks as though Twobit compiles 500 to 2000 lines per second. That isn't terrible, but it isn't anything to brag about either.

Note that Twobit compiles about as fast on our relatively slow Solaris machine as on our Linux machine. That's a clue.

Timing Individual Phases

The figures above show that Twobit's global optimizations take a lot of time. Here's a little program that times initial segments of the compilation process:


; A hack for determining how much time is spent
; in each phase of Twobit.
;
; Works only when Twobit is exposed.

(define (time-all-phases filename)
  (t0 filename)
  (t1 filename)
  (t2 filename)
  (t3 filename)
  (t4 filename)
  (t5 filename)
  (t6 filename))

(define use (the-usual-syntactic-environment))

(define (time-phase filename name f)
  (run-benchmark
   name
   1
   (lambda ()
     (call-with-input-file filename
      (lambda (p)
        (do ((x (read p) (read p)))
            ((eof-object? x))
          (f x)))))
   (lambda (x) #t)))

(define (t0 filename)
  (time-phase filename "read" (lambda (x) x)))

(define (t1 filename)
  (time-phase filename "pass 1" (lambda (x) (pass1 x use))))

(define (t2 filename)
  (time-phase filename "pass 1&2" (lambda (x) (pass2 (pass1 x use)))))

(define (t3 filename)
  (time-phase filename
              "pass 1&2&3"
              (lambda (x) (pass3 (pass2 (pass1 x use))))))

(define (t4 filename)
  (time-phase filename
              "pass 1&2&3&4"
              (lambda (x)
                (pass4 (pass3 (pass2 (pass1 x use)))
                       $usual-integrable-procedures$))))

(define (t5 filename) (time-phase filename "compile" compile))

(define (t6 filename)
  (time-phase filename
              "compile&assemble"
              (lambda (x) (assemble (compile x)))))

Individual Phases in v0.96

Here are timings in milliseconds for individual phases in v0.96:


                                     nboyer                       compiler
                             --------------------           --------------------
                             Solaris        Linux           Solaris        Linux
read                             25            9               490          172
pass 1                           43           24              1744         1465
pass 2                            9            5               787         1158
pass 3                          233          130              9581        11801
pass 4                           18           11              1675         1875
assembly                         17           28              2366        27867
total                           345          307             14272        44431

The Solaris timings are roughly the same here as before, but the total time for Linux is more than twice as slow as before. That may be related to differences between the larceny and twobit heaps. The most likely cause is that the twobit heap uses a portable implementation of hashtables, and symbol-hash calls symbol->string to hash on the string instead of fetching a symbol's hash value directly from the symbol's data structure.

We need timing data for the larceny heap.

  • PnkFelix has written experimental code for timing the twobit passes even when they have been sealed off within the Larceny heap. It is available in [source:trunk/larceny_src/lib/Experimental/twobit-pass-times.sch twobit-pass-times.sch]. After loading the file, one can invoke (twobit-pass-times) to see a summary of time spent in various passes. Feel free to adopt the ideas within the code if you like. Questions about (and critiques of) the methodology it uses are also welcome.
Clone this wiki locally