Alex Wood edited this page Nov 18, 2022 · 1 revision

We've been thinking about moving to a system where everything is compiled only to bytecode by default, and then compiled to machine code with clasp-cleavir when requested or when a function is called a lot. This raises the question of how this second compilation phase should take place. There are also related questions about how to improve aspects of the bytecode that will matter if it is used pervasively: debuggability, support for our custom special forms (which I have reduced to being mostly defcallback), compile-file handling (mostly load time value), and better error messages for syntactically invalid code.

Two basic paths to doing this are:

From source

First, we could keep compiling to bytecode as we do now, but also retain the source code as a CST, in macroexpanded form. When it comes time to native-compile, we just grab the CST and run it through the clasp-cleavir process. In this system, the bytecode itself would not carry any source information; whenever the debugger asks for it, we could lazily perform a compilation.

The basic advantage of this would be that it would be fairly simple to do. We'd rewrite the bytecode compiler to retain the macroexpansion as it goes (this would be better than macroexpanding separately, as that would mean macroexpanding the same code twice - more expensive and kind of pointless).
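To make the "retain the macroexpansion as it goes" idea concrete, here is a minimal sketch in Python rather than Lisp. Everything in it is hypothetical: the toy form representation, the `inc` macro, the `const`/`ref`/`call` opcodes, and the `BytecodeFunction` class are illustrations, not Clasp's actual compiler or bytecode. It only shows that one compiler pass can emit bytecode and return the fully expanded form, so each macro is expanded once.

```python
# Toy model: a form is an int, a string (variable name), or a tuple
# (operator, *args). None of this mirrors Clasp's real data structures.
MACROS = {
    # Hypothetical macro: (inc x) expands to (+ x 1).
    "inc": lambda x: ("+", x, 1),
}

def compile_form(form, code):
    """Append bytecode for FORM to CODE and return the macroexpanded form."""
    if isinstance(form, int):
        code.append(("const", form))
        return form
    if isinstance(form, str):
        code.append(("ref", form))
        return form
    op, *args = form
    if op in MACROS:
        # Expand once, compile the expansion, and return that same
        # expansion so it can be retained for later native compilation.
        return compile_form(MACROS[op](*args), code)
    expanded_args = [compile_form(arg, code) for arg in args]
    code.append(("call", op, len(args)))
    return (op, *expanded_args)

class BytecodeFunction:
    """Bytecode plus the macroexpanded source it was compiled from."""
    def __init__(self, form):
        self.code = []
        self.expanded_source = compile_form(form, self.code)

fn = BytecodeFunction(("inc", ("inc", "x")))
print(fn.expanded_source)
print(fn.code)
```

A later native compilation would start from `fn.expanded_source` without running any macro functions again.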

The disadvantage is mostly in the amount of code to maintain. We would continue to have two separate frontends (the parts that deal with Lisp source): the one for bytecode, and the one for Cleavir. As in the past, this will cause some confusion, since any change to syntax needs to be coordinated across two different systems, and the two frontends may produce inconsistent results. Another, lesser disadvantage is that every bytecode function will potentially be carrying around a lot of source code in memory.

From bytecode

Second, we could generate Cleavir IR directly from the bytecode, and compile that to native code. The disadvantage here is mostly in the up-front cost: I would need to write a system to translate bytecode to IR, and the bytecode format would need to be augmented to retain debug information and some environment information (e.g. optimize declarations, which are ignored by the bytecode compiler but used by Cleavir).
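As a rough illustration of what translating bytecode to IR involves, the sketch below simulates the operand stack of a toy stack machine symbolically and emits three-address instructions. It is written in Python, and the opcodes, the IR shape, and the `debug_info` side table (bytecode index to source location) are all assumptions made for the example; they are not Clasp's bytecode or Cleavir's BIR. Real bytecode would also have branches, closures, and multiple values, which this ignores.

```python
def bytecode_to_ir(code, debug_info=None):
    """Translate straight-line toy stack bytecode into three-address IR.

    CODE is a list of ("const", n), ("ref", name) or ("call", op, nargs).
    DEBUG_INFO optionally maps a bytecode index to a source location.
    Returns (ir, stack), where stack holds the values left at the end.
    """
    debug_info = debug_info or {}
    stack = []      # symbolic operand stack
    ir = []         # emitted IR instructions
    counter = 0     # used to name temporaries %0, %1, ...
    for pc, (opcode, *operands) in enumerate(code):
        if opcode == "const":
            stack.append(("const", operands[0]))
        elif opcode == "ref":
            stack.append(("var", operands[0]))
        elif opcode == "call":
            op, nargs = operands
            split = len(stack) - nargs
            args = stack[split:]
            del stack[split:]
            out = f"%{counter}"
            counter += 1
            # Carry the source location over from the bytecode side table.
            ir.append({"out": out, "op": op, "args": args,
                       "source": debug_info.get(pc)})
            stack.append(("tmp", out))
        else:
            raise ValueError(f"unknown opcode: {opcode!r}")
    return ir, stack

# (* (+ x 1) 2) in the toy bytecode, with made-up source locations.
code = [("ref", "x"), ("const", 1), ("call", "+", 2),
        ("const", 2), ("call", "*", 2)]
ir, stack = bytecode_to_ir(code, {2: "demo.lisp:1:3", 4: "demo.lisp:1:0"})
for instruction in ir:
    print(instruction)
```

The point of the side table is that the IR can only report source locations the bytecode kept, which is why the bytecode format would need augmenting.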

The first advantage is in long-term maintenance cost. We could basically drop all parts of Clasp that deal with cleavir-cst-to-ast, cleavir-ast, cleavir-ast-to-bir, and cleavir-env, and have one single frontend. (Things in the SICL world may be moving in this direction already - scymtym has written an s-expression-syntax library that does the work of cst-to-ast, and with more consistent error messages.)

Second and more interestingly, if the bytecode were made more robust in this way, we could use it in FASLs as we have briefly discussed, without sacrificing source information or requiring source forms to be kept in the FASLs for later native compilation. This would be a lot of work, but could have a lot of cool consequences: FASLs portable between Clasps running in different environments, FASL versioning to catch incompatibilities in a nice way rather than through crashes, the ability to distribute compact FASLs to users rather than source code, and possibly even generating and using binaries on other Lisps if the bytecode compiler and VM are ported to them.
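For the versioning point, one common approach is a small header with a magic number and a version pair that the loader checks before touching the rest of the file. The Python sketch below assumes a made-up layout (4-byte magic `TOYF`, then 16-bit major and minor versions, big-endian); none of these values come from Clasp's FASL format. It only shows how a mismatch can surface as a clear error rather than a crash.

```python
import struct

# Hypothetical header layout: 4-byte magic, major version, minor version.
MAGIC = b"TOYF"
HEADER = struct.Struct(">4sHH")
SUPPORTED_MAJOR = 1
SUPPORTED_MINOR = 3

class IncompatibleFasl(Exception):
    """Raised when a file cannot be loaded by this reader."""

def write_fasl(payload, major=SUPPORTED_MAJOR, minor=SUPPORTED_MINOR):
    """Prefix PAYLOAD (bytes) with the versioned header."""
    return HEADER.pack(MAGIC, major, minor) + payload

def read_fasl(data):
    """Check the header and return the payload, or raise IncompatibleFasl."""
    if len(data) < HEADER.size:
        raise IncompatibleFasl("truncated header")
    magic, major, minor = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise IncompatibleFasl("not a bytecode FASL")
    # Assumed policy: same major version, and a minor version no newer
    # than what this reader understands.
    if major != SUPPORTED_MAJOR or minor > SUPPORTED_MINOR:
        raise IncompatibleFasl(
            f"FASL version {major}.{minor} is not supported by reader "
            f"{SUPPORTED_MAJOR}.{SUPPORTED_MINOR}")
    return data[HEADER.size:]

payload = read_fasl(write_fasl(b"bytecode goes here"))
try:
    read_fasl(write_fasl(b"from the future", major=2, minor=0))
    error_message = None
except IncompatibleFasl as error:
    error_message = str(error)
print(payload, error_message)
```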

Conclusion

I favor the second idea, since I think it is more useful in the long term. However, the first idea may be very simple to implement, and if it is, we could start with that. We could then build the other systems we need under that regime, e.g. the one that decides when to compile, and carry them over to the second approach later.
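The "decide when to compile" piece is independent of which path produces the native code. A minimal sketch of one possible policy, in Python: count calls, and swap in a native version once a threshold is reached or when compilation is explicitly requested. The class name, the threshold of 3, and the `compile_native` thunk are all assumptions for illustration; the thunk stands in for either approach (compile from retained source, or from bytecode).

```python
class TieredFunction:
    """Runs a slow implementation until it is worth compiling."""

    def __init__(self, interpret, compile_native, threshold=3):
        self.interpret = interpret            # bytecode-level implementation
        self.compile_native = compile_native  # thunk returning a faster callable
        self.threshold = threshold
        self.calls = 0
        self.native = None

    def force_compile(self):
        """Compile now; models the 'when requested' case."""
        if self.native is None:
            self.native = self.compile_native()

    def __call__(self, *args):
        if self.native is None:
            self.calls += 1
            if self.calls >= self.threshold:
                self.force_compile()
        impl = self.interpret if self.native is None else self.native
        return impl(*args)

events = []

def compile_add_one():
    events.append("compiled")
    return lambda x: x + 1

add_one = TieredFunction(lambda x: x + 1, compile_add_one)
results = [add_one(n) for n in range(5)]
print(results, events)
```

Because the policy only sees a thunk, it could be written against the first approach and kept unchanged if the second replaces it.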
