Optimization in a Compiler for Stack-based Lisp Machines

L. P. Deutsch and L. M. Masinter

I. INTRODUCTION

This paper describes the optimization phase of a compiler for translating the INTERLISP [Teitelman et al., 1978] dialect of LISP into stack-architecture instruction sets [Deutsch, IJCAI]. We discuss the general organization of the compiler, and then the set of optimization techniques we have found most useful. The compiler and its optimization phase are machine independent, in that they generate a stream of instructions for an abstract stack machine, which an assembler subsequently turns into actual machine instructions. The compiler has been in successful use for several years, producing code both for an 8-bit Lisp instruction set for the Alto [Deutsch, 1978] and for a 9-bit instruction set [Fiala, 19??].

There are always tradeoffs in designing a compiler. Each additional optimization usually increases the running time of the compiler as well as its complexity. This cost must be weighed against the benefit gained, measured by the amount of code improvement weighted by the frequency with which the optimization applies. Without providing a multiplicity of compiler controls (which most users do not want to know about), the compiler designer must use empirical knowledge of "average" user programs in order to make the appropriate design choices. One of the major purposes of this paper is to publish some empirical results on the relative utility of different code transformations when compiling a large set of programs.

Why this compiler is different

Compiling LISP for a stack-based architecture differs from compiling other languages such as PASCAL [P-code paper] or ALGOL [B5000 compiler] for several reasons. Procedures are independently compiled, so global optimization techniques are not relevant. Compiling for a stack-based instruction set also differs from compiling for more conventional machine architectures, in that register allocation is not relevant, and randomly addressable compiler-generated temporary variables other than top-of-stack are difficult to access.

II. ABOUT THE COMPILER AND THE OBJECT LANGUAGE

The compiler operates in several passes. The first pass takes the S-expression definition of the function being compiled and walks down it recursively, generating a simple intermediate code, called ByteLap, analogous to assembly code. During this first pass the compiler expands all macros, CLISP [Teitelman, 19??], record accesses, and iterative statements. A few optimizations are performed during pass one, but most of the optimization work is saved for later.
The next pass of the compiler is a "post-optimization" phase, which performs transformations on the ByteLap to improve it. Optimizations are tried repeatedly, until no further improvement is possible.
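As an illustration of that repeat-until-fixed-point structure, here is a minimal sketch in Python. The list-of-tuples representation and the single rule shown are assumptions made for the example; they are not the compiler's actual ByteLap data structures or rule set.

    # Illustrative sketch only: apply rewrite rules until none applies.
    def remove_push_pop(code):
        """Delete a pushed value that is immediately popped."""
        for i in range(len(code) - 1):
            if code[i][0] in ("VAR", "CONST") and code[i + 1][0] == "POP":
                return code[:i] + code[i + 2:]
        return None                      # no change possible

    RULES = [remove_push_pop]            # the real compiler has many more rules

    def post_optimize(code):
        changed = True
        while changed:                   # repeat until no rule makes progress
            changed = False
            for rule in RULES:
                new_code = rule(code)
                if new_code is not None:
                    code, changed = new_code, True
        return code

    print(post_optimize([("VAR", "x"), ("POP", None), ("CONST", 3)]))
    # -> [('CONST', 3)]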

After the post-optimization phase is done, the results are passed to an assembler, which transforms the ByteLap into the actual machine instructions.
We currently have in use two different assemblers, which generate code for two different instruction sets: one for the Maxc 9-bit instruction set [fiala maxc article] and one for the Alto/Dorado 8-bit instruction set [Deutsch, 1980]. (These are considerably different implementation systems; for example, the Maxc system employs shallow variable binding, while the Alto/Dorado system employs deep binding.) The translation from ByteLap to machine code is straightforward.

THE STRUCTURE OF BYTELAP

The ByteLap intermediate code generated by the compiler can be viewed as the instruction set for an abstract stack machine: each opcode has some effect on the state of the linear temporary value stack. The format of ByteLap is described here to simplify the subsequent discussion of optimizations.
The instruction set has only NN opcodes, as follows:

[variable]       Push the value of the given variable on the stack.
SETQ[variable]   Store the top of the stack into the given variable (the value remains on the stack).
POP              Pop the top of the stack.
COPY             Duplicate the top of the stack.
'constant        Push the given constant on the stack.
JUMP tag         Jump to the indicated location.
FJUMP tag        Jump to the indicated location if top-of-stack is NIL, otherwise continue. In either case, pop the stack.
TJUMP tag        Similar to FJUMP, but jumps if top-of-stack is non-NIL.
NTJUMP tag       Similar to TJUMP, but does not pop if it jumps. This is useful when a value is tested and then subsequently used.
NFJUMP tag       Analogous to NTJUMP, but jumps if top-of-stack is NIL.
FN(n)function    Call the indicated function with n arguments.
BIND[v1,...,vn;n1,...,nk]  Bind the variables v1...vn to the n values on the top of the stack (for internal PROGs and LAMBDAs). Also bind the variables n1...nk to NIL. Remember the current stack location.
DUNBIND          Unwind the stack to the location remembered by the corresponding BIND and unbind its variables.

Note that a given ByteLap opcode could have one of several different translations in the actual code executed. For example, both the Dorado and Maxc implementations have a separate opcode for pushing NIL, distinct from the more general constant opcode. In addition, all operations such as arithmetic or CAR are encoded as FN calls; the assembler distinguishes between the built-in operations and those that must actually perform external calls. Furthermore, a sequence of ByteLap instructions can assemble into a single instruction; for example, both instruction sets have instructions which do a SETQ and a POP in the same instruction.
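To make the stack effects above concrete, the following is a small illustrative sketch (in Python, not part of the compiler) of an interpreter for a subset of the opcodes. The tuple encoding, the use of Python None for NIL, jump targets as instruction indices, and the RETURN opcode (which the truncated list above omits) are simplifying assumptions for the example.

    # Illustrative only: a tiny interpreter showing the stack effect of each opcode.
    def run(code, env):
        stack, pc = [], 0
        while pc < len(code):
            op, arg = code[pc]
            pc += 1
            if op == "VAR":                      # push the value of a variable
                stack.append(env[arg])
            elif op == "CONST":                  # push a constant
                stack.append(arg)
            elif op == "SETQ":                   # store top of stack; value stays pushed
                env[arg] = stack[-1]
            elif op == "POP":
                stack.pop()
            elif op == "COPY":                   # duplicate top of stack
                stack.append(stack[-1])
            elif op == "JUMP":
                pc = arg
            elif op == "FJUMP":                  # jump if NIL; pop either way
                pc = arg if stack.pop() is None else pc
            elif op == "TJUMP":                  # jump if non-NIL; pop either way
                pc = arg if stack.pop() is not None else pc
            elif op == "FN":                     # arg = (n, function): call with n args
                n, fn = arg
                args = [stack.pop() for _ in range(n)][::-1]
                stack.append(fn(*args))
            elif op == "RETURN":
                return stack.pop()
        return stack.pop() if stack else None

    # (SETQ Y (PLUS X 1)) in this toy encoding:
    code = [("VAR", "x"), ("CONST", 1), ("FN", (2, lambda a, b: a + b)),
            ("SETQ", "y"), ("RETURN", None)]
    print(run(code, {"x": 41}))                  # prints 42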

III. OPTIMIZATIONS PERFORMED

One of the most important ground rules for the optimization phase has been that all optimizations are conservative: an optimization must not increase code size or running time, and it must either decrease one of them or make further optimizations possible.
Only optimizations which experience has shown to be useful are described here.

optimizations during code generation phase
(the code at this stage is represented as a singly linked list)

Eff, Retf (no Ncf): eliminate CONST, CAR, VAR in EFF context (an unchecked difference between compiled and interpreted code: the VAR does not cause an error).

Recursion removal (not inside a frame with SPECVARS if there is code outside).

no POP in RETF'd PROGN or PROG.

NIL in spread call removed (the compiler assumes the FNTYP of the callee is known at compile time).

Constant expansion (in tests, etc.): not so much for user code, but for program-generated programs (simple macro expansions, etc.); it means less work is required to generate reasonable macros.

Arithmetic functions: merge constants.

Commutative functions: push constants at the end. This improves other optimizations (e.g. jump copy) and reduces stack depth. (A sketch of the constant-merging idea appears at the end of this list.)

Special optimization of the expansion of mapping functions (MAPC, etc.).
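As an illustration of the constant-merging note above, here is a minimal sketch in Python for a PLUS-like commutative, associative function. The Python list representation of the argument forms is an assumption for the example; the real compiler does this on the S-expression form during pass one.

    # Illustrative sketch: fold constant arguments of a commutative,
    # associative function and move the folded constant to the end.
    def merge_plus_constants(args):
        """(PLUS 1 x 2 y) -> (PLUS x y 3) in this toy representation."""
        constants = [a for a in args if isinstance(a, int)]
        others    = [a for a in args if not isinstance(a, int)]
        if not constants:
            return others
        return others + [sum(constants)]   # constants last: reduces stack depth
                                           # and helps later jump/copy optimizations

    print(merge_plus_constants([1, "x", 2, "y"]))   # -> ['x', 'y', 3]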

post optimizations
(these operate on a doubly linked list representation of the code)

[x][x] -> [x][copy], [setq x][x] -> [setq x][copy]: reduces references to x (possible dead variable, bind removal); possible efficiency improvement. (A sketch of these rewrites appears at the end of this subsection.)
[x][pop][x] -> [x]: same, and also reduces code size.

[setq x], where x is a localvar and not used, or else where x is a specvar that is bound but has no subsequent function calls which might "see" its binding, can be removed.

[value][pop] delete - smaller, faster

[pop][dunbind] -> [dunbind] - smaller, faster

[const][unbind] -> [dunbind][const] - enables further optimizations.
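A minimal sketch of two of the rules above ([value][pop] deletion and the [x][x] -> [x][copy] rewrites), written in Python over a flat list of (opcode, operand) pairs rather than the compiler's doubly linked ByteLap:

    # Illustrative sketch of two post-optimization peephole rules.
    def peephole(code):
        out, i = [], 0
        while i < len(code):
            cur = code[i]
            nxt = code[i + 1] if i + 1 < len(code) else None
            # [value][pop] -> nothing: smaller and faster
            if cur[0] in ("VAR", "CONST", "COPY") and nxt is not None and nxt[0] == "POP":
                i += 2
                continue
            # [x][x] -> [x][copy], [setq x][x] -> [setq x][copy]:
            # removes a reference to x, possibly enabling bind removal later
            if (nxt is not None and cur[0] in ("VAR", "SETQ")
                    and nxt == ("VAR", cur[1])):
                out += [cur, ("COPY", None)]
                i += 2
                continue
            out.append(cur)
            i += 1
        return out

    print(peephole([("VAR", "x"), ("VAR", "x"), ("CONST", 1), ("POP", None)]))
    # -> [('VAR', 'x'), ('COPY', None)]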



Jump optimizations

jumpthru (jump to jump, jump to return) (very frequent)

limited versions of scan optimizations across jumps, e.g. [var] [jump a] a: [var] -> [var] [ntjump b] b:

[setq x] [pop] [jump a] a: [x] -> [setq x] [jump b] b: (smaller code size, faster). This is common in loop expansions; e.g. (PROG NIL LP (test x) ... (SETQ X (fn x)) (GO LP)) compiles as
    lp:  [x] [test] ... [x] [fn] [setq x] [pop] [jump lp]
which becomes
    lp:  [x] lp1: [test] ... [x] [fn] [setq x] [jump lp1]

Delete code after tags which are no longer referenced and which are otherwise dead (e.g. because some other optimization eliminated them).

Jump .+1 eliminated (often because of other optimizations).

Commonback (quite frequent, and very important): [same code] [jump a] ... [same code] a: -> [jump b] ... b: [same code] a:

Jump merge inline: [jump a] [jump ?] a: ... -> a: ... (the code at a is moved inline in place of the [jump a]); this can only be done if the end of a's code can be found (a jump or the like).

[fjump a] [jump b] a: -> [tjump b]

[x] [tjump a] [x] a: [x] -> [x] [copy] [tjump a] (e.g. SELECTQ: [x] 'a [eq] [fjump no] [x] no: [x] [stringp] ... -> [x] [copy] 'a [eq] [fjump no1] ... no1: 'b ...)
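A minimal sketch of the jumpthru (jump-to-jump) rule at the start of this subsection, in Python. Mapping labels to instruction indices and the 2-tuple instruction encoding are assumptions for the example, not the compiler's representation:

    # Illustrative sketch: any jump whose target is itself an unconditional
    # JUMP can be retargeted directly at the final destination.
    def chase_jumps(code, labels):
        def final_target(tag, seen=frozenset()):
            i = labels[tag]
            instr = code[i] if i < len(code) else None
            if instr is not None and instr[0] == "JUMP" and instr[1] not in seen:
                return final_target(instr[1], seen | {tag})   # follow the chain
            return tag
        return [(op, final_target(arg)) if op in ("JUMP", "FJUMP", "TJUMP") else (op, arg)
                for op, arg in code]

    labels = {"a": 3, "b": 5}
    code = [("VAR", "x"), ("FJUMP", "a"), ("CONST", 1),
            ("JUMP", "b"),               # a: (index 3) is itself just a jump
            ("CONST", 2),
            ("RETURN", None)]            # b: (index 5)
    print(chase_jumps(code, labels))     # the FJUMP now targets b directly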

return optimizations

return merging

POP [value] RETURN: delete the POP (done after scanopt). The same applies to DUNBIND and UNBIND (except when [value] is a var which was a specvar and also bound; this exception exists only because of the shallow-binding system).


Eliminate unused variables and frames (for macro expansion).

Make vars LOCALVARS when no functions are called.
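A minimal sketch of the POP-before-RETURN rule above, again over a flat Python list of (opcode, operand) pairs rather than the compiler's ByteLap:

    # Illustrative sketch: [POP] [value] [RETURN] -> [value] [RETURN];
    # the popped value is dead, since returning discards the rest of the stack.
    def drop_pop_before_return(code):
        out, i = [], 0
        while i < len(code):
            if (i + 2 < len(code) and code[i][0] == "POP"
                    and code[i + 1][0] in ("VAR", "CONST")
                    and code[i + 2][0] == "RETURN"):
                i += 1                   # drop the POP; keep [value] [RETURN]
                continue
            out.append(code[i])
            i += 1
        return out

    print(drop_pop_before_return([("POP", None), ("VAR", "x"), ("RETURN", None)]))
    # -> [('VAR', 'x'), ('RETURN', None)]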



CONCLUSIONS

Without optimization, the compiler could be quite simple. The optimizer is a non-deterministic algorithm of code transformations: as with other "hill-climbing" algorithms, it can perform one transformation which prevents other (possibly better) transformations from occurring.

Peephole optimization is easier than in cases where jumps are not explicit; the simple intermediate language pays off. It can yield better optimizations than source transformations because of the ambiguity of source definitions: (PROGN ... (FOO A B C)) = (FOO (PROGN ... A) B C) in compiled code.




BIBLIOGRAPHY

Pascal/P-code optimizers.

Smalltalk/Mesa compilers?

B5000 series compilers?

Other lisp compilers [Rabbit]

McCarthy etc? compilers?