-
Notifications
You must be signed in to change notification settings - Fork 32
QuasiquoteIsSlow
This page is to document PnkFelix's experiments with Larceny to determine why we are seeing apparently quadratic time to expand quasiquote expressions.
First, the way we'll generate test expressions to expand.
;; build-list : Nat (Nat -> X) -> [Listof X]
;; Standard list c'tor (by folding n)
(define (build-list n f)
(reverse
(let rec ((n n))
(if (zero? n)
'()
(cons (f (- n 1)) (rec (- n 1)))))))
;; make-qq-exp : Nat Sym Sym -> Sexp
;; Constructs sexp w/ head qq and sz elems, each prefixed by uq
(define (make-qq-exp sz qq uq)
(list qq (build-list sz (lambda (x) (list uq (+ x 1))))))
(equal? (make-qq-exp 5 'quasiquote 'quote)
(quote `('1 '2 '3 '4 '5)))
(equal? (make-qq-exp 5 'quote 'unquote)
(quote '(,1 ,2 ,3 ,4 ,5)))
-
m-expand
of a quasiquote doessyntactic-lookup
, finds the denotation class for quasiquote ismacro
, and invokesm-macro
-
m-macro
callsm-transcribe
, passing a continuation that invokesm-expand
. -
m-transcribe
expands the quasiquote, creating a new expression with the hea keyword.descend-quasiquote
, which has a denotation class ofmacro
; this is then passed (by the above step) tom-expand
, and the cycle continues.
Therefore, it seems like directly iterating the right number of steps m-transcribe
should be equivalent to calling m-macro
. However, the calls to m-macro
are definitely slower. So what's happening in between? The only thing PnkFelix can identify is that m-expand
is, in the interim, invoking syntactic-lookup
and denotation-class
...
PnkFelix just realized that (the limit of) the below transcribe-steps
procedure is not a faithful reproduction of m-expand
. In particular, calling m-transcribe
many times in a row is evaluating much more quickly than the invocation sequence that m-macro
is using...
To test this theory, PnkFelix developed the following function:
;; transcribe-steps : Nat -> (Sexp StxEnv -> Sexp) & Nat (Sexp StxEnv -> X) -> X
;; Takes exp through n transcription steps.
;; N.B.: m-transcribe will die if n is too large.
(define (transcribe-steps n . rest)
(lambda (exp env)
(let loop ((n n) (exp exp) (env env))
(cond ((zero? n) (if (null? rest) exp ((car rest) exp env)))
(else (m-transcribe exp env (lambda (exp2 env2)
(loop (- n 1) exp2 env2))))))))
(This is a conservative extension of the transcribe-steps
procedure defined below; it now takes an optional continuation argument so that we can try to do useful things with the generated syntax environment at the end of the transcription.)
So now we can check that iterating this transcriber is producing answers that are consistent with those produced by m-expand
, if we also include one last call to m-expand
to remove those pesky suffixes on the identifiers:
Here is some informal evidence that the two are equivalent:
> (equal? (let ((expand (transcribe-steps 49 m-expand)))
(expand (make-qq-exp 5 'quasiquote 'unquote)
(the-usual-syntactic-environment)))
(m-expand (make-qq-exp 5 'quasiquote 'unquote)
(the-usual-syntactic-environment)))
#t
> (equal? (let ((expand (transcribe-steps 4505 m-expand)))
(expand (make-qq-exp 500 'quasiquote 'unquote)
(the-usual-syntactic-environment)))
(m-expand (make-qq-exp 500 'quasiquote 'unquote)
(the-usual-syntactic-environment)))
#t
>
Now PnkFelix has tried timing the two approaches, and has found that transcribe-steps 4505 m-expand)
produces a slow expander (6304 ms for N=500), but (transcribe-steps 4505)
produces a fast one (720 ms for N=500). So there's still something going on with that call to m-expand
, even after the expression has been through the maximal number of transcription steps.
One can verify this by doing the following:
> (time-thunk (lambda () (let ((expand (transcribe-steps 4505 (lambda (exp env) (set! save-exp exp) (set! save-env env)))))
(expand (make-qq-exp 500 'quasiquote 'unquote)
(the-usual-syntactic-environment)))))
353
> (time-thunk (lambda () (m-expand save-exp save-env)))
5474
Manual inspection of save-exp
confirms that it is an expanded form of the quasiquote, an expanded one with many many invocations of several freshly generated aliases of cons e.g. .cons|82401
, .cons|82396
, etc. Also, (length save-env)
produces 5548.
So perhaps the problem is all the lookups we need to do of the aliases of cons? In particular, the syntax environment is represented by a simple association list. Perhaps we should be using a hashtable?
To see how the above thoughts developed, you can read the notes below.
At the heart of twobit-expand
is the m-transcribe
procedure.
It is written in continuation passing style; we can transcribe for a fixed
number of steps by using the following procedure:
;; transcribe-steps : Nat -> (Sexp StxEnv -> Sexp)
;; Takes exp through n transcription steps.
;; N.B.: m-transcribe will die if n is too large.
(define (transcribe-steps n)
(lambda (exp env)
(let loop ((n n) (exp exp) (env env))
(cond ((zero? n) exp)
(else (m-transcribe exp env (lambda (exp2 env2)
(loop (- n 1) exp2 env2))))))))
Just to illustrate, here is how we take a quasiquote through 8 transcription steps: *
((transcribe-steps 8) '`(,1 ,2 ,3 ,4 ,5)
(the-usual-syntactic-environment))
- which produces:
(.interpret-continuation|98919
(1
(,3 ,4 ,5)
(,2 ,3 ,4 ,5)
()
(2 .unquote|98915 1 (,1 ,2 ,3 ,4 ,5) (0)))
.unquote|98919
2)
Based on direct experimentation with transcribe-steps
and make-qq-exp
, PnkFelix hypothesizes the following:
- Fully expanding the Sexp produced by
(make-qq-exp N 'quasiquote 'unquote)
requires 9*N
+5 transcription steps. - Fully expanding the Sexp produced by
(make-qq-exp N 'quasiquote 'quote)
requires 15*N
+5 transcription steps.
PnkFelix wanted to time individual transcription steps.
For this, we'll grab the elapsed time from Larceny's memstats
function.
Note that if you wnt to get the system time or the user time, just
change the car
below accordingly.
;; (() -> Any) -> Nat
;; Invokes thunk, discards result, returns running time.
(define (time-thunk thunk)
(let* ((s1 (memstats))
(r (thunk))
(s2 (memstats))
(map- (lambda (f x y) (- (f x) (f y)))))
(car (map (lambda (f) (map- f s2 s1))
(list memstats-elapsed-time
memstats-system-time
memstats-user-time)))))
PnkFelix wanted to control whether garbage collection left over from previous computations would be included in the measurement, so there's a global variable to control whether we're going to explicit collect before each timing (each such invocation is expensive in overall experiment time, so that's why there's a way to turn the behavior off.)
(define *gc-between-timings* #f) ;; Boolean
Finally, here is the code to actually gather the times for each transcription step.
;; Nat -> (Sexp StxEnv -> [Listof Nat])
;; Takes exp through n transcription steps, timing each step. Discards
;; expansion result, and returns a list of the measured times.
(define (time-transcribe-steps n)
(lambda (exp env)
(let loop ((n n) (exp exp) (env env))
(let ((next-exp (undefined))
(next-env (undefined)))
(cond ((zero? n) '())
(else
(cons
(begin (if *gc-between-timings* (collect))
(time-thunk
(lambda ()
(m-transcribe
exp env
(lambda (exp2 env2)
(set! next-exp exp2)
(set! next-env env2))))))
(loop (- n 1) next-exp next-env ))))))))
The code for time-transcribe-steps
is a lot uglier than for transcribe-steps
because PnkFelix had to explicitly break the continuation
invocation... he supposes there might be a way to use call/cc
to get the
same effect in a slightly cleaner fashion, but this was the obvious way to do it in his mind.