diff --git a/doc/optimization.pdf b/doc/optimization.pdf
new file mode 100644
index 00000000..e13a6f60
Binary files /dev/null and b/doc/optimization.pdf differ
diff --git a/doc/optimization.txt b/doc/optimization.txt
new file mode 100755
index 00000000..e0b05321
--- /dev/null
+++ b/doc/optimization.txt
@@ -0,0 +1,192 @@

        Optimization in a Compiler for Stack-based Lisp Machines

                   L. P. Deutsch and L. M. Masinter

I. INTRODUCTION

This paper describes the optimization phase of a compiler for translating the INTERLISP [Teitelman et al., 1978] dialect of LISP into stack-architecture instruction sets [Deutsch, IJCAI]. We discuss the general organization of the compiler, and then the set of optimization techniques we have found most useful. The compiler, including its optimization phase, is machine-independent, in that it generates a stream of instructions for an abstract stack machine, which is subsequently turned into the actual machine instructions by an assembler. The compiler has been in successful use for several years, producing code both for an 8-bit Lisp instruction set on the Alto [Deutsch, 1978] and for a 9-bit instruction set [Fiala, 19??].

There are always tradeoffs in designing a compiler. Each additional optimization usually increases the running time of the compiler as well as its complexity. That cost must be weighed against the benefit gained, measured by the amount of code improvement weighted by the frequency with which the optimization applies. Without providing a multiplicity of compiler controls (which most users do not want to know about), the compiler designer must use empirical knowledge of "average" user programs in order to make the appropriate design choices. One of the major purposes of this paper is to publish some empirical results on the relative utility of different code transformations when compiling a large set of programs.

Why this compiler is different

Compiling LISP for a stack-based architecture differs from compiling other languages such as PASCAL [P-code paper] or ALGOL [B5000 compiler] for several reasons. Procedures are independently compiled, so global optimization techniques are not applicable. Compiling for a stack-based instruction set also differs from compiling for more conventional machine architectures, in that register allocation is not an issue, and randomly addressable compiler-generated temporaries other than the top of the stack are difficult to access.

II. ABOUT THE COMPILER AND THE OBJECT LANGUAGE

The compiler operates in several passes. The first pass takes the S-expression definition of the function being compiled and walks down it recursively, generating a simple intermediate code, called ByteLap, analogous to assembly code. During this first pass, the compiler expands all macros, CLISP [Teitelman, 19??], record accesses, and iterative statements. A few optimizations are performed during pass one, but most of the optimization work is saved for later.

The next pass of the compiler is a "post-optimization" phase, which performs transformations on the ByteLap to improve it. Optimizations are tried repeatedly, until no further improvement is possible.

After the post-optimization phase is done, the results are passed to an assembler, which transforms the ByteLap into the actual machine instructions. We currently have in use two different assemblers, which generate code for two different instruction sets: one for the Maxc 9-bit instruction set [fiala maxc article] and one for the Alto/Dorado 8-bit instruction set [Deutsch, 1980]. (These are considerably different implementation systems; for example, the Maxc system employs shallow variable binding, while the Alto/Dorado system employs deep binding.) The translation from ByteLap to machine code is straightforward.

THE STRUCTURE OF BYTELAP

The ByteLap intermediate code generated by the compiler can be viewed as the instruction set for an abstract stack machine. Each opcode has some effect on the state of the linear temporary value stack. The format of ByteLap is described here to simplify the subsequent discussion of optimizations. The instruction set has only NN opcodes, as follows:

[variable]      Push the value of the given variable on the stack.
SETQ[variable]  Store the top of the stack into the given variable; the value is left on the stack.
POP             Pop the top of the stack.
COPY            Duplicate the top of the stack.
'constant       Push the given constant on the stack.
JUMP tag        Jump to the indicated location.
FJUMP tag       Jump to the indicated location if top-of-stack is NIL, otherwise continue. In either case, pop the stack.
TJUMP tag       Similar to FJUMP, but jumps if top-of-stack is non-NIL.
NTJUMP tag      Similar to TJUMP, but does not pop if it jumps. This is useful when a value is tested and then subsequently used.
NFJUMP tag      Analogous to NTJUMP: like FJUMP, but does not pop if it jumps.
FN(n)function   Call the indicated function with n arguments.
BIND[v1,...,vn;n1,...,nk]  Bind the variables v1...vn to the n values on the top of the stack (for internal PROGs and LAMBDAs). Also bind the variables n1...nk to NIL. Remember the current stack location.
DUNBIND         Unwind the stack to the location remembered by the corresponding BIND, and unbind the variables bound there.

Note that a given ByteLap opcode could have one of several different translations in the actual code executed. For example, both the Dorado and Maxc implementations have a separate opcode 'NIL for pushing NIL, distinct from the more general constant opcode. In addition, all operations such as arithmetic or CAR are encoded as FN calls; the assembler distinguishes between the built-in operations and those that must actually perform external calls. Furthermore, a sequence of ByteLap instructions can assemble into a single machine instruction; for example, both instruction sets have instructions which can do a SETQ and a POP in the same instruction.
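To make the notation concrete, consider the expression (SETQ X (CONS X Y)) compiled in a context where its value is not otherwise used. A plausible ByteLap sequence (shown here only for illustration; the exact code depends on the implementation and on which optimizations apply) is:

    [X]           ; push the value of X
    [Y]           ; push the value of Y
    FN(2)CONS     ; call CONS with 2 arguments; the result replaces them on the stack
    SETQ[X]       ; store the top of the stack into X, leaving the value on the stack
    POP           ; discard the value, since it is not used

As noted above, the assembler is free to encode such a sequence compactly; for example, both instruction sets can perform the SETQ and the POP in a single machine instruction.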
III. OPTIMIZATIONS PERFORMED

One of the most important ground rules for the optimization phase has been that all optimizations are conservative: an optimization must not increase code size or running time, and it must either decrease one of them or make further optimizations possible. Only optimizations which experience has shown to be useful are described here.

Optimizations during the code generation phase

The code generated during this phase is kept as a singly linked list.

    Eff, Retf (no Ncf): CONST, CAR, and VAR are eliminated in EFF context. (This is an unchecked difference between compiled and interpreted code: the VAR reference does not cause an error.)

    Recursion removal (not done inside a frame with SPECVARS if there is code outside).

    No POP is generated in a RETF'd PROGN or PROG.

    Trailing NIL arguments in a spread call are removed (the compiler assumes the FNTYP of the callee is known at compile time).

    Constant expansion (in tests, etc.): not so much for user code, but for program-generated programs (simple macro expansions, etc.); it allows reasonable macros to be generated with less work.

    Arithmetic functions: constants are merged.

    Commutative functions: constants are pushed to the end. This improves other optimizations (e.g. jump copy) and reduces stack depth. (See the illustration following this list.)

    Special optimization of the expansion of mapping functions (MAPC, etc.).
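The constant-merging and commutative-function items above can be illustrated with a hypothetical fragment (assuming, purely for illustration, that the n-ary IPLUS is open-coded as a chain of two-argument FN calls, and with FOO standing for an arbitrary user function). The form (IPLUS 1 (FOO X) 2) would naively compile as

    '1  [X]  FN(1)FOO  FN(2)IPLUS  '2  FN(2)IPLUS

Pushing the constants to the end of the commutative argument list and merging them gives

    [X]  FN(1)FOO  '3  FN(2)IPLUS

which is shorter, and which no longer holds a constant on the stack across the call to FOO.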
Post optimizations

The ByteLap is kept as a doubly linked list during this phase.

    [x][x] -> [x][copy], and [setq x][x] -> [setq x][copy]: reduces the number of references to x (making dead-variable detection and bind removal possible), and is a possible efficiency improvement.

    [x][pop][x] -> [x]: same benefits, and also reduces code size.

    A [setq x] can be removed where x is a LOCALVAR whose value is not used, or where x is a SPECVAR which is bound but there are no subsequent function calls which might "see" its binding.

    [value][pop]: delete both (smaller, faster).

    [pop][dunbind] -> [dunbind] (smaller, faster).

    [const][unbind] -> [dunbind][const] (enables further optimizations).

Jump optimizations

    Jumpthru (jump to jump, jump to return); very frequent.

    Limited versions of scan optimizations across jumps, e.g. [var] [jump a] a: [var] -> [var] [ntjump b] b:

    [setq x] [pop] [jump a] a: [x] -> [setq x] [jump b] b: (smaller code, faster). This is common in loop expansions; e.g. (PROG NIL LP (test x) ... (SETQ X (fn x)) (GO LP)) compiles as

        lp: [x] [test] ... [x] [fn] [setq x] [pop] [jump lp]

    which becomes

        lp: [x] lp1: [test] ... [x] [fn] [setq x] [jump lp1]

    Code after tags which are no longer referenced, and which is otherwise dead (e.g. because some other optimization eliminated the references), is deleted.

    A jump to .+1 is eliminated (such jumps often arise from other optimizations).

    Commonback (quite frequent, and very important): [same code] [jump a] ... [same code] a: -> [jump b] ... b: [same code] a:

    Jump merge inline: .... [jump a] [jump ?] a: ... -> the code at a: is merged inline in place of the [jump a]. This can only be done if the end of a's code can be found (a jump or the like).

    [fjump a] [jump b] a: -> [tjump b] a:

    [x] [tjump a] [x] a: [x] -> [x] [copy] [tjump a] (e.g. SELECTQ: [x] 'a [eq] [fjump no] [x] no: [x] [stringp] ... -> [x] [copy] 'a [eq] [fjump no1] ... no1: 'b ...).

Return optimizations

    Return merging.

    POP [value] RETURN: the POP is deleted (done after the scan optimizations). The same applies to DUNBIND and UNBIND, except when [value] is a variable which was a SPECVAR and was also bound (a restriction needed only because of the shallow-binding system).

    Unused variables and frames are eliminated (for macro expansions).

    Variables are made LOCALVARS when no functions are called.

CONCLUSIONS

Without optimization, the compiler could be quite simple.

The code transformations form a non-deterministic algorithm. As with other "hill-climbing" algorithms, performing one transformation can prevent other (possibly better) transformations from occurring.

Peephole optimization is easier here than in cases where jumps are not explicit; the simple intermediate language pays off. It can yield better optimizations than source-to-source transformations, because of the ambiguity of source definitions: (PROGN ... (FOO A B C)) = (FOO (PROGN ... A) B C) in the compiled code.

BIBLIOGRAPHY

    Pascal/P-code optimizers.

    Smalltalk/Mesa compilers?

    B5000 series compilers?

    Other LISP compilers [Rabbit].

    McCarthy etc.? compilers?