The source language is Mx*, a Java-like language. The target is RISC-V assembly. The compiler has three main stages: semantic check, code generation, and optimization.
ANTLR4 is a powerful tool that generates a lexer and parser from a single .g4 grammar file.
A CST (Concrete Syntax Tree) represents the grammar in tree form. An AST (Abstract Syntax Tree) can be seen as a simplified CST that still keeps the information needed for semantic checking. We traverse the CST using the BaseVisitor generated by ANTLR4 to build the AST.
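For example, a visitor method that folds a CST binary-expression rule into one AST node might look like this (a sketch only: `MxParser` and `MxBaseVisitor` are the ANTLR4-generated classes, while the rule name `binaryExpression`, its labels, and the AST node classes are hypothetical):

```java
// Sketch: MxParser and MxBaseVisitor are generated by ANTLR4 from the .g4
// file; the rule name, labels, and AST node classes here are hypothetical.
public class ASTBuilder extends MxBaseVisitor<ASTNode> {
    @Override
    public ASTNode visitBinaryExpression(MxParser.BinaryExpressionContext ctx) {
        // Build the children first, then fold them into a single AST node,
        // dropping CST-only details such as parentheses and punctuation.
        ASTNode lhs = visit(ctx.expression(0));
        ASTNode rhs = visit(ctx.expression(1));
        return new BinaryExprNode(ctx.op.getText(), lhs, rhs);
    }
}
```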
Build an implicit scope tree and a symbol table. There are lots of details and strange cases like `(new int[10])[0]`. The remaining work is straightforward data-oriented programming and checking.
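A minimal sketch of the scope tree, assuming a simple name-to-type symbol table (class and method names are illustrative, not my actual implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Each scope keeps its own symbols and a pointer to the enclosing scope,
// which together form the implicit scope tree.
class Scope {
    private final Scope parent;                                  // null for the global scope
    private final Map<String, String> symbols = new HashMap<>(); // name -> type

    Scope(Scope parent) { this.parent = parent; }

    void define(String name, String type) {
        if (symbols.containsKey(name))
            throw new RuntimeException("semantic error: redefinition of " + name);
        symbols.put(name, type);
    }

    // Walk up the scope tree until the symbol is found.
    String resolve(String name) {
        for (Scope s = this; s != null; s = s.parent)
            if (s.symbols.containsKey(name)) return s.symbols.get(name);
        throw new RuntimeException("semantic error: undefined symbol " + name);
    }
}
```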
LLVM IR is a great IR. If we follow LLVM's rules, code generation and optimization become easier and more efficient, and we don't need to spend much time designing a personal IR. Although this makes the design and implementation much easier, the homework becomes a little boring and less challenging. In my opinion, the most challenging work is designing an efficient IR that makes the optimization process easy.
See the LLVM Language Reference (LangRef).
Translate the IR to pseudo RISC-V assembly. We can do some optimizations in this process, such as:
- replace `a + a` or `a * 2` with `a << 1`;
- fold a `getelementptr` with a constant offset into the following load or store:
%ptr = getelementptr i32, i32* array, i32 1
%res = load i32, i32* %ptr
We can translate the above IR into `lw res, 4(array)` instead of `addi ptr, array, 4` followed by `lw res, 0(ptr)`.
The hard part of instruction selection is understanding the RISC-V calling convention and handling registers like `sp`, `fp`, and `ra` when calling functions. Here's a good reference: Understanding RISC-V Calling Convention.
See Chapter 11 of the Tiger Book. Although the algorithm is difficult for me to understand, the Tiger Book provides pseudocode.
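A rough sketch of the simplify/select skeleton from that chapter, with virtual registers as integers and the k physical registers as colors (no coalescing and no spill code generation; those simplifications are mine, not the book's full algorithm):

```java
import java.util.*;

// Graph-coloring register allocation, simplify/select phases only.
class GraphColoring {
    static Map<Integer, Integer> color(Map<Integer, Set<Integer>> graph, int k) {
        Map<Integer, Set<Integer>> g = deepCopy(graph);
        Deque<Integer> stack = new ArrayDeque<>();
        while (!g.isEmpty()) {
            // Simplify: remove a node of degree < k; if none exists, pick any
            // node as an optimistic (potential) spill candidate.
            Integer node = g.keySet().stream()
                    .filter(n -> g.get(n).size() < k)
                    .findFirst()
                    .orElse(g.keySet().iterator().next());
            stack.push(node);
            for (int nb : g.get(node)) g.get(nb).remove(node);
            g.remove(node);
        }
        // Select: pop nodes and assign the lowest color unused by neighbors.
        Map<Integer, Integer> colors = new HashMap<>();
        while (!stack.isEmpty()) {
            int node = stack.pop();
            Set<Integer> used = new HashSet<>();
            for (int nb : graph.get(node))
                if (colors.containsKey(nb)) used.add(colors.get(nb));
            int c = 0;
            while (used.contains(c)) c++;
            if (c >= k) throw new RuntimeException("actual spill, node " + node);
            colors.put(node, c);
        }
        return colors;
    }

    static Map<Integer, Set<Integer>> deepCopy(Map<Integer, Set<Integer>> g) {
        Map<Integer, Set<Integer>> copy = new HashMap<>();
        g.forEach((n, s) -> copy.put(n, new HashSet<>(s)));
        return copy;
    }
}
```

A real allocator must insert spill code and rerun when an actual spill occurs; this sketch just throws.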
The definition of SSA (Static Single Assignment) in the SSA book:
A program is defined to be in SSA form if each variable is a target of exactly one assignment statement in the program text.
But implementing LLVM IR in SSA form directly is a little hard, so we first emit pseudo-LLVM and then restructure it into real LLVM. In pseudo-LLVM, all definitions of variables go through `store` instructions and all uses go through `load` instructions. The following is an example.
int a = 1, b = 2;
int c = a + b;
The pesudo-LLVM of the above code is
%a_address = alloca i32
store i32 1, i32* %a_address
%b_address = alloca i32
store i32 2, i32* %b_address
%c_address = alloca i32
%a = load i32, i32* %a_address
%b = load i32, i32* %b_address
%c = add i32 %a, %b
store i32 %c, i32* %c_address
In other words, we achieve pseudo-SSA by sacrificing memory accesses. The mem2reg optimization then builds a dominator tree and eliminates these expensive memory accesses. The algorithm is in Chapter 2 of the SSA book.
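As a building block for mem2reg, here is a sketch of iterative dominator computation in the style of Cooper, Harvey and Kennedy's "A Simple, Fast Dominance Algorithm" (my assumptions: blocks are numbered 0..n-1 in reverse postorder and block 0 is the entry; the SSA book's Chapter 2 presentation differs in details):

```java
import java.util.*;

class Dominators {
    // preds.get(b) lists the predecessors of block b; returns idom[b] for
    // every reachable block (idom[0] == 0 for the entry).
    static int[] idom(List<List<Integer>> preds) {
        int n = preds.size();
        int[] idom = new int[n];
        Arrays.fill(idom, -1);
        idom[0] = 0;
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int b = 1; b < n; b++) {
                int newIdom = -1;
                for (int p : preds.get(b)) {
                    if (idom[p] == -1) continue;   // p not processed yet
                    newIdom = (newIdom == -1) ? p : intersect(idom, p, newIdom);
                }
                if (newIdom != -1 && idom[b] != newIdom) {
                    idom[b] = newIdom;
                    changed = true;
                }
            }
        }
        return idom;
    }

    // Walk the two dominator chains upward until they meet; with reverse
    // postorder numbering, dominators always have smaller numbers.
    static int intersect(int[] idom, int a, int b) {
        while (a != b) {
            while (a > b) a = idom[a];
            while (b > a) b = idom[b];
        }
        return a;
    }
}
```

With the dominator tree (and the dominance frontiers derived from it), mem2reg can place phi instructions and rewrite the `alloca`/`load`/`store` pattern into SSA values.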
In the CFG (control flow graph), vertices represent basic blocks and edges represent jump instructions in the control flow. CFG simplification includes removing unreachable blocks, simplifying constant branches, simplifying single-path phi instructions, etc.
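The first of those passes is a plain graph search; a sketch, with `Block` as an illustrative stand-in for a basic block:

```java
import java.util.*;

// A basic block reduced to its successor list, for illustration only.
class Block { List<Block> successors = new ArrayList<>(); }

class RemoveUnreachable {
    // Every block not reachable from the entry can be deleted.
    static Set<Block> reachable(Block entry) {
        Set<Block> visited = new HashSet<>();
        Deque<Block> worklist = new ArrayDeque<>();
        worklist.push(entry);
        visited.add(entry);
        while (!worklist.isEmpty()) {
            Block b = worklist.pop();
            for (Block s : b.successors)
                if (visited.add(s)) worklist.push(s);
        }
        return visited;
    }
}
```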
Two pointers may alias if they might point to the same memory location. The result of alias analysis is used in CSE and LICM. I use Andersen's points-to analysis, which is easy to implement.
Is alias analysis useful? In practice, in most cases only global variables can be proven not to alias; all other pointers must be treated as may-alias.
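A sketch of the Andersen-style fixpoint, reduced to address-of and copy constraints (a real IR also needs load/store constraints; all names here are illustrative):

```java
import java.util.*;

class Andersen {
    Map<String, Set<String>> pointsTo = new HashMap<>();
    Map<String, Set<String>> includes = new HashMap<>(); // q -> {p : p = q}

    // Constraint "p = &x": p points to x.
    void addressOf(String p, String x) {
        pointsTo.computeIfAbsent(p, k -> new HashSet<>()).add(x);
    }

    // Constraint "p = q": p includes whatever q points to.
    void copy(String p, String q) {
        includes.computeIfAbsent(q, k -> new HashSet<>()).add(p);
    }

    // Propagate points-to sets along copy edges until nothing changes.
    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, Set<String>> e : includes.entrySet()) {
                Set<String> src = pointsTo.getOrDefault(e.getKey(), Set.of());
                for (String p : e.getValue()) {
                    Set<String> dst = pointsTo.computeIfAbsent(p, k -> new HashSet<>());
                    if (dst.addAll(src)) changed = true;
                }
            }
        }
    }

    // Two pointers may alias iff their points-to sets intersect.
    boolean mayAlias(String p, String q) {
        Set<String> a = pointsTo.getOrDefault(p, Set.of());
        Set<String> b = pointsTo.getOrDefault(q, Set.of());
        return !Collections.disjoint(a, b);
    }
}
```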
If `c = a + b` in `block1`, `d = a + b` in `block2`, and `block1` dominates `block2`, then we can replace `d = a + b` with `d = c`. But this case doesn't happen often. In most cases, we eliminate `load`s like `a[i]` to reduce memory accesses. The following is an example.
Before CSE:
for(i = 0; i < n; ++i) {
s1 = s1 + a[i];
s2 = s2 + a[i] * a[i];
s3 = s3 + a[i] * a[i] * a[i];
}
After CSE:
for(i = 0; i < n; ++i) {
t1 = a[i];
s1 = s1 + t1;
t2 = t1 * t1;
s2 = s2 + t2;
t3 = t2 * t1;
s3 = s3 + t3;
}
In the above example, there's no `store` instruction, so we don't need to check aliases; we just eliminate the redundant `load` instructions. But most cases are more complex. In my CSE pass, we DFS the dominator tree, find common subexpressions with a map, and check aliases before eliminating a `load`.
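A sketch of that dominator-tree walk, with `Inst.key` standing for a hashable encoding of the expression (opcode plus operands) and the alias check omitted for brevity (all classes are illustrative):

```java
import java.util.*;

// Stand-ins: an instruction with an expression key and a result name, and a
// dominator-tree node holding its block's instructions.
class Inst { String key; String result; String replacedBy; }
class DomNode { List<Inst> insts = new ArrayList<>(); List<DomNode> children = new ArrayList<>(); }

class CSE {
    static void run(DomNode root) {
        dfs(root, new HashMap<>());
    }

    static void dfs(DomNode node, Map<String, String> available) {
        List<String> added = new ArrayList<>();
        for (Inst inst : node.insts) {
            String prev = available.get(inst.key);
            if (prev != null) {
                inst.replacedBy = prev;          // reuse the dominating result
            } else {
                available.put(inst.key, inst.result);
                added.add(inst.key);
            }
        }
        for (DomNode child : node.children) dfs(child, available);
        for (String k : added) available.remove(k); // pop this block's scope
    }
}
```

Entries added in a block are visible in all dominator-tree children, exactly because a block dominates everything below it in the tree.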
A variable `a` is loop invariant if:

- `a` is defined outside the loop;
- `a = b op c` where `b` and `c` are loop invariant;
- `a = getelementptr ptr, x` where `ptr` is loop invariant;
- `a = load ptr` where `ptr` is loop invariant and there is no `store` instruction to `ptr` or to `ptr`'s aliases.
Without alias analysis, LICM would be almost useless, because in most cases we move `getelementptr` and `load` instructions, as in the following example.
Before LICM:
for(i = 0; i < n; ++i) {
for (j = 0; j < n; ++j) {
s = s + a[i][j];
}
}
After LICM:
for(i = 0; i < n; ++i) {
b = a[i];
for (j = 0; j < n; ++j) {
s = s + b[j];
}
}
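A sketch of the LICM fixpoint that collects hoistable instructions; the `hasSideEffect` flag stands in for the store/alias check on loads, and all names are illustrative:

```java
import java.util.*;

// Stand-in: an instruction defining `def` from the values in `operands`.
// hasSideEffect is set conservatively for stores, calls, and loads whose
// pointer may be clobbered inside the loop.
class LoopInst { String def; List<String> operands = new ArrayList<>(); boolean hasSideEffect; }

class LICM {
    // definedOutside: values defined before the loop, hence invariant.
    static List<LoopInst> invariants(List<LoopInst> loopBody, Set<String> definedOutside) {
        Set<String> invariant = new HashSet<>(definedOutside);
        List<LoopInst> hoistable = new ArrayList<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (LoopInst inst : loopBody) {
                if (inst.hasSideEffect || invariant.contains(inst.def)) continue;
                if (invariant.containsAll(inst.operands)) {
                    invariant.add(inst.def);
                    hoistable.add(inst);   // safe to move to the preheader
                    changed = true;
                }
            }
        }
        // Discovery order respects dependencies: an instruction only becomes
        // invariant after its operands, so hoist in this order.
        return hoistable;
    }
}
```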
A very useful optimization. For non-recursive functions, we build a call tree rooted at the `main` function and start inlining from the leaves of the call tree. For recursive functions, we only inline a few levels.
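A sketch of choosing that order: repeatedly pick functions whose callees have all been processed, so leaves come first (the call-graph map is an assumed input; recursive functions never satisfy the condition and are handled separately):

```java
import java.util.*;

class InlineOrder {
    // calls: function name -> the set of functions it calls.
    static List<String> order(Map<String, Set<String>> calls) {
        List<String> result = new ArrayList<>();
        Set<String> done = new HashSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, Set<String>> e : calls.entrySet()) {
                if (done.contains(e.getKey())) continue;
                if (done.containsAll(e.getValue())) {  // all callees processed
                    result.add(e.getKey());
                    done.add(e.getKey());
                    changed = true;
                }
            }
        }
        return result; // inline call sites in this order, leaves first
    }
}
```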
I simply move global variables into the `main` function if they appear only in `main`. In fact, most global variables can be moved into `main` after inlining, but some remain. Here's a good optimization: load all global variables into locals when entering a function and store them back when exiting. I don't implement it, because the same effect can be achieved by CSE. So perhaps my CSE mostly just eliminates global variable accesses?
A naive dead code elimination: if an instruction's result is not used by any other instruction (and the instruction has no side effects), remove it! DCE can also clean up redundant IR instructions if the IR emitted by codegen is a little sloppy.
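A sketch of the naive pass as a worklist algorithm, so that removing one instruction can expose its operands' definitions as dead too (`DCEInst` is an illustrative stand-in for an IR instruction with explicit use lists):

```java
import java.util.*;

// Stand-in IR instruction with def-use links.
class DCEInst {
    boolean hasSideEffect;                  // store, call, branch, return...
    List<DCEInst> operandDefs = new ArrayList<>();
    Set<DCEInst> users = new HashSet<>();
    boolean removed;
}

class DCE {
    static void run(List<DCEInst> insts) {
        Deque<DCEInst> worklist = new ArrayDeque<>(insts);
        while (!worklist.isEmpty()) {
            DCEInst inst = worklist.pop();
            if (inst.removed || inst.hasSideEffect || !inst.users.isEmpty()) continue;
            inst.removed = true;
            for (DCEInst def : inst.operandDefs) {
                def.users.remove(inst);
                worklist.push(def);          // re-check: it may be dead now
            }
        }
    }
}
```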
See Chapter 19.3 in the Tiger Book.
Are DCE and SCCP useful for programs written by a reasonably good programmer?
Replace `println(toString(x))` with `printlnInt(x)`. Maybe this is just a data-oriented (testcase-specific) optimization. I'm curious why `println(toString(...))` appears so often in some testcases; in my opinion it's meaningless. Is there really a programmer who writes such redundant code?
Replace `store a, ptr; b = load ptr` with `store a, ptr; b = move a`. For every `load`, we only need to check the nearby `store`s before it. The optimization is useful in the following case.
Before Peephole Optimization:
for (i = 0; i < n; ++i) {
a[i] = i * i;
printlnInt(a[i]);
}
After Peephole Optimization:
for (i = 0; i < n; ++i) {
t = i * i;
a[i] = t;
printlnInt(t);
}
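A minimal sketch of this forwarding within a single basic block; `MemInst` and its fields are illustrative stand-ins for my IR, and without alias information every new `store` or call conservatively clears the table:

```java
import java.util.*;

// Stand-in: a linear list of instructions with kind, address, and value.
class MemInst {
    enum Kind { STORE, LOAD, CALL, OTHER }
    Kind kind; String address; String value; String replacedBy;
}

class StoreLoadForwarding {
    static void run(List<MemInst> block) {
        Map<String, String> lastStore = new HashMap<>(); // address -> value
        for (MemInst inst : block) {
            switch (inst.kind) {
                case STORE:
                    // Without alias info, drop everything that may alias;
                    // keep only the exact address just written.
                    lastStore.clear();
                    lastStore.put(inst.address, inst.value);
                    break;
                case LOAD:
                    String v = lastStore.get(inst.address);
                    if (v != null) inst.replacedBy = v;   // b = move a
                    break;
                case CALL:
                    lastStore.clear();   // a call may write any memory
                    break;
                default:
                    break;
            }
        }
    }
}
```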
If a block has a single predecessor, we can also forward `store`s to `load`s across basic blocks. The optimization is useful in `if` statements like the following case.
Before Peephole Optimization:
for (i = 0; i < n; ++i) {
a[i] = i * i;
if (a[i] > c)
printlnInt(a[i]);
}
After Peephole Optimization:
for (i = 0; i < n; ++i) {
t = i * i;
a[i] = t;
if (t > c)
printlnInt(t);
}
For reference, gcc -O1 scores 4.00 on this benchmark.
When `maxInstNum = 2000` in inlining:
testcase | time (clock cycles of simulator) | score |
---|---|---|
binary_tree | 532814506 | 5.00 |
dijkstra | 166188863 | 4.22 |
humble | 1018041821 | 3.05 |
kruskal | 158220677 | 3.42 |
lca | 96375308 | 4.06 |
lunatic | 2656322338 | 3.51 |
maxflow | 23437143 | 5.00 |
pi | 32980003 | 3.49 |
segtree | 1027958443 | 3.96 |
sha_1 | 1243551947 | 4.78 |
overall | | 40.55 |
When `maxInstNum = 1000` in inlining:
testcase | time (clock cycles of simulator) | score |
---|---|---|
binary_tree | 595389094 | 4.82 |
dijkstra | 161618896 | 4.38 |
humble | 1018041821 | 3.05 |
kruskal | 158315403 | 3.41 |
lca | 97050821 | 4.02 |
lunatic | 2656524739 | 3.51 |
maxflow | 23437143 | 5.00 |
pi | 32980003 | 3.49 |
segtree | 1027826250 | 3.96 |
sha_1 | 1249243188 | 4.76 |
overall | | 40.44 |
When `maxInstNum = 500` in inlining:
testcase | time (clock cycles of simulator) | score |
---|---|---|
binary_tree | 679571883 | 4.09 |
dijkstra | 170628330 | 4.07 |
humble | 1018041821 | 3.05 |
kruskal | 158897962 | 3.39 |
lca | 96616199 | 4.05 |
lunatic | 2657771281 | 3.50 |
maxflow | 23437143 | 5.00 |
pi | 32980003 | 3.49 |
segtree | 1040642244 | 3.89 |
sha_1 | 1265151948 | 4.69 |
overall | | 38.96 |
- The Definitive ANTLR 4 Reference.
- Tiger Book: Modern Compiler Implementation in Java.
- 《自制编译器》 (Write Your Own Compiler), Minero Aoki.
- Engineering a Compiler.
- SSA Book.
- The RISC-V Instruction Set Manual.