-
Notifications
You must be signed in to change notification settings - Fork 32
TwobitIsSlow
= Why Twobit is Slow =
Twobit is an order of magnitude slower than the fastest Scheme compilers: Chez Scheme, Ikarus, MzScheme.
Furthermore, before v0.96, Twobit took an outrageously long time to macro-expand large quasiquote forms (ticket:65) and to perform CSE and/or representation inference on certain large programs. These pathological misbehaviors were limited in v0.96. Though some of the underlying problems still remain, it looks like we can now begin to focus on the speed of Twobit on more typical inputs.
The nboyer
benchmark is 770 lines long, including a long block comment at the beginning. The compiler
benchmark is 11693 dense lines of code. Here's how long it takes (in milliseconds) to load (or to compile) the nboyer
and compiler
benchmarks on the Solaris and Linux machines we have been using for benchmarking:
nboyer compiler
-------------------- --------------------
Solaris Linux Solaris Linux
Chez Scheme 6.1 30 960
MzScheme v371 80 (avg) 30 (avg) 1790
Ikarus 0.0.2 33 (avg) 1850
Larceny v0.96
(global-optimization #f) 147 185 6350 16300
(global-optimization #t) 345 287 16600 21000
Gambit (compile to file) 4730
Chicken (compile to file) 19280 500000
On our Linux machine, it looks as though Twobit compiles 500 to 2000 lines per second. That isn't terrible, but it isn't anything to brag about either.
Note that Twobit compiles about as fast on our relatively slow Solaris machine as on our Linux machine. That's a clue.
The figures above show that Twobit's global optimizations take a lot of time. Here's a little program that times initial segments of the compilation process:
; A hack for determining how much time is spent
; in each phase of Twobit.
;
; Works only when Twobit is exposed.
(define (time-all-phases filename)
(t0 filename)
(t1 filename)
(t2 filename)
(t3 filename)
(t4 filename)
(t5 filename)
(t6 filename))
(define use (the-usual-syntactic-environment))
(define (time-phase filename name f)
(run-benchmark
name
1
(lambda ()
(call-with-input-file filename
(lambda (p)
(do ((x (read p) (read p)))
((eof-object? x))
(f x)))))
(lambda (x) #t)))
(define (t0 filename)
(time-phase filename "read" (lambda (x) x)))
(define (t1 filename)
(time-phase filename "pass 1" (lambda (x) (pass1 x use))))
(define (t2 filename)
(time-phase filename "pass 1&2" (lambda (x) (pass2 (pass1 x use)))))
(define (t3 filename)
(time-phase filename
"pass 1&2&3"
(lambda (x) (pass3 (pass2 (pass1 x use))))))
(define (t4 filename)
(time-phase filename
"pass 1&2&3&4"
(lambda (x)
(pass4 (pass3 (pass2 (pass1 x use)))
$usual-integrable-procedures$))))
(define (t5 filename) (time-phase filename "compile" compile))
(define (t6 filename)
(time-phase filename
"compile&assemble"
(lambda (x) (assemble (compile x)))))
Here are timings in milliseconds for individual phases in v0.96:
nboyer compiler
-------------------- --------------------
Solaris Linux Solaris Linux
read 25 9 490 172
pass 1 43 24 1744 1465
pass 2 9 5 787 1158
pass 3 233 130 9581 11801
pass 4 18 11 1675 1875
assembly 17 28 2366 27867
total 345 307 14272 44431
The Solaris timings are roughly the same here as before, but the total time for Linux is more than twice as slow as before. That may be related to differences between the larceny
and twobit
heaps. The most likely cause is that the twobit
heap uses a portable implementation of hashtables, and symbol-hash
calls symbol->string
to hash on the string instead of fetching a symbol's hash value directly from the symbol's data structure.
We need timing data for the larceny
heap.
- PnkFelix has written experimental code for timing the twobit passes even when they have been sealed off within the Larceny heap. It is available in [source:trunk/larceny_src/lib/Experimental/twobit-pass-times.sch twobit-pass-times.sch]. After loading the file, one can invoke
(twobit-pass-times)
to see a summary of time spent in various passes. Feel free to adopt the ideas within the code if you like. Questions about (and critiques of) the methodology it uses are also welcome.