Commit

note
hzqmwne committed Feb 3, 2018
1 parent 27a89de commit cc32b44
Showing 21 changed files with 1,145 additions and 0 deletions.
31 changes: 31 additions & 0 deletions README.md
@@ -0,0 +1,31 @@
### SJTU SE302 Principle and Technology of Compiler
##### The Tiger compiler from *Modern Compiler Implementation in C*
 
- lab1: Straight-line program interpreter
- lab2: Lexical Analysis
- lab3: Syntax Analysis
- final_tiger: Final Test for lab2 and lab3 (an in-depth test of both labs)
- lab4: Type Checking
- lab5: Intermediate Code (intermediate code generation)
- lab6: A Workable Tiger Compiler
 
#### Usage:
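Build the Tiger compiler (requires Linux with gcc, make, lex, and yacc installed):
cd lab6
make
This produces tiger-compiler.

Compile a Tiger program:
./tiger-compiler hello.tig
This produces hello.tig.s.

Link (on a 64-bit system, the 32-bit runtime libraries must be installed):
gcc -Wl,--wrap,getchar -m32 hello.tig.s runtime.c -o hello.out
(On a 32-bit system, drop the -m32 flag.)

Run: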
./hello.out
 
 

See note.txt for more information.
1 change: 1 addition & 0 deletions final_tiger/README.md
@@ -0,0 +1 @@
Final Test, for lab2 and lab3 only
10 changes: 10 additions & 0 deletions lab1/README.md
@@ -0,0 +1,10 @@
### Lab1 Straight-line program interpreter
See [lab1.md](./lab1.md) for the detailed requirements.
 
 
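A warm-up before the main project. It builds familiarity with C, recursion, trees, and key-value storage, and introduces programming without side effects (no assignment statements other than initialization).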
 

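Two functions must be implemented:
- `int maxargs(A_stm stm)`: returns the largest number of arguments of any print statement within the given statement, including print statements nested in its subexpressions
- `void interp(A_stm stm)`: interprets (executes) the given statement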
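A minimal sketch of `maxargs` follows. It assumes simplified AST declarations modeled on the textbook's `slp.h`. The handout's real headers may differ, and the constructors below are hypothetical stand-ins for the handout's own. The key point is that a print statement can hide inside an `A_eseqExp`, so the recursion must visit every statement and expression, not just the top-level prints.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified AST, assumed to mirror the handout's slp.h. */
typedef struct A_stm_ *A_stm;
typedef struct A_exp_ *A_exp;
typedef struct A_expList_ *A_expList;
typedef enum { A_plus, A_minus, A_times, A_div } A_binop;

struct A_stm_ {
  enum { A_compoundStm, A_assignStm, A_printStm } kind;
  union {
    struct { A_stm stm1, stm2; } compound;
    struct { const char *id; A_exp exp; } assign;
    struct { A_expList exps; } print;
  } u;
};
struct A_exp_ {
  enum { A_idExp, A_numExp, A_opExp, A_eseqExp } kind;
  union {
    const char *id;
    int num;
    struct { A_exp left; A_binop oper; A_exp right; } op;
    struct { A_stm stm; A_exp exp; } eseq;
  } u;
};
struct A_expList_ {
  enum { A_pairExpList, A_lastExpList } kind;
  union {
    struct { A_exp head; A_expList tail; } pair;
    A_exp last;
  } u;
};

/* Hypothetical constructors, standing in for the handout's. */
static void *checked_malloc(size_t n) { void *p = malloc(n); assert(p); return p; }
A_exp A_NumExp(int num) {
  A_exp e = checked_malloc(sizeof *e); e->kind = A_numExp; e->u.num = num; return e;
}
A_exp A_EseqExp(A_stm stm, A_exp exp) {
  A_exp e = checked_malloc(sizeof *e); e->kind = A_eseqExp;
  e->u.eseq.stm = stm; e->u.eseq.exp = exp; return e;
}
A_expList A_PairExpList(A_exp head, A_expList tail) {
  A_expList l = checked_malloc(sizeof *l); l->kind = A_pairExpList;
  l->u.pair.head = head; l->u.pair.tail = tail; return l;
}
A_expList A_LastExpList(A_exp last) {
  A_expList l = checked_malloc(sizeof *l); l->kind = A_lastExpList; l->u.last = last; return l;
}
A_stm A_PrintStm(A_expList exps) {
  A_stm s = checked_malloc(sizeof *s); s->kind = A_printStm; s->u.print.exps = exps; return s;
}
A_stm A_CompoundStm(A_stm stm1, A_stm stm2) {
  A_stm s = checked_malloc(sizeof *s); s->kind = A_compoundStm;
  s->u.compound.stm1 = stm1; s->u.compound.stm2 = stm2; return s;
}

static int max_int(int a, int b) { return a > b ? a : b; }
static int maxargs_exp(A_exp e);

/* Length of a print statement's argument list. */
static int count_args(A_expList l) {
  return l->kind == A_lastExpList ? 1 : 1 + count_args(l->u.pair.tail);
}
/* Arguments may themselves contain print statements (via eseq). */
static int maxargs_list(A_expList l) {
  if (l->kind == A_lastExpList) return maxargs_exp(l->u.last);
  return max_int(maxargs_exp(l->u.pair.head), maxargs_list(l->u.pair.tail));
}

int maxargs(A_stm s) {
  switch (s->kind) {
  case A_compoundStm:
    return max_int(maxargs(s->u.compound.stm1), maxargs(s->u.compound.stm2));
  case A_assignStm:
    return maxargs_exp(s->u.assign.exp);
  case A_printStm:
    return max_int(count_args(s->u.print.exps), maxargs_list(s->u.print.exps));
  }
  return 0;
}

static int maxargs_exp(A_exp e) {
  switch (e->kind) {
  case A_opExp:
    return max_int(maxargs_exp(e->u.op.left), maxargs_exp(e->u.op.right));
  case A_eseqExp:
    return max_int(maxargs(e->u.eseq.stm), maxargs_exp(e->u.eseq.exp));
  default: /* A_idExp and A_numExp contain no statements */
    return 0;
  }
}
```

For example, `print((print(1, 2, 3), 4))` yields 3. The outer print has one argument, but the print nested inside the eseq has three. The recursion itself uses no assignments, in keeping with the lab's without-side-effects rule.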
91 changes: 91 additions & 0 deletions lab1/lab1.md
@@ -0,0 +1,91 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
</head>

<body onload="on_resize()" onclick="on_resize()">
<div id="header">
<span class="MainTitle">Compilers</span>
<span class="SubTitle">2017</span>
</div>


<div id="content">

<h2>Rules</h2>
<ol>
<li><font color="red">Plagiarism is prohibited!</font></li>
<li>Uploading the lab: you must use <font color="red">`make handin`</font> provided by the Makefile and rename the file to your student ID.</li>
<li>You will get <font color="red">0 points</font> if you miss the deadline.</li>
</ol>


<h2>Lab1 Straight-line program interpreter</h2>
<p>
<b>Due:</b> Sep 17, 12:00, 2017 <font color="red">At noon!</font><br/>
If you have any problems with this lab, please contact the TA.
</p>
<hr>


<h2>Introduction</h2>
<p>
In this lab you will implement a simple straight-line program interpreter for the programming language described in <b>chapter 1</b> of your textbook. Please read it carefully before you start.
</p>
<p>
This lab serves as an introduction to:
</p>
<p>
&nbsp;&nbsp;&nbsp;&nbsp;<b>environments</b> (symbol tables mapping variable names to information about the variables)
</p>
<p>
&nbsp;&nbsp;&nbsp;&nbsp;<b>abstract syntax</b> (data structures representing the phrase structure of programs)
</p>
<p>
&nbsp;&nbsp;&nbsp;&nbsp;<b>recursion over tree data structures</b> (which is useful in many parts of a compiler)
</p>

<p> Please pay attention to the following things: </p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;Write the interpreter <b>without side effects</b> (please read your textbook carefully).</p>



<h2>Getting started</h2>
<p>
Download the original environment <a href="https://ipads.se.sjtu.edu.cn/courses/compilers/handout-2017/lab1.tar.gz">here</a>.
</p>
<p>
Hint: first take a quick look at all the files. You must <font color="red">put all your implementation</font> in the file called `myimpl.c`. Do not modify other files!
<pre>
$ tar -xf lab1.tar.gz
$ cd lab1
</pre>

</p>

<h2>Testing</h2>
<p>
We've provided a script file called <b>gradeMe.sh</b> for testing. After programming, try running:
<pre>
$ ./gradeMe.sh
[^_^]: Pass
</pre>
The smiling face implies that you have passed all the test points. <b>gradeMe.sh</b> compares your output with <b>ref.txt</b>, so if there's something wrong with your answer, you can check the content in <b>ref.txt</b>.
</p>

<h2>Handin procedure</h2>
<p>
Submit your code to the TA. Before submission, archive your code as a gzipped tar file with the following commands:
<pre>
$ cd lab1
$ make handin
$ # then rename the tarball to your student ID, e.g. 5150379000.tar.gz (do not add any other characters!)
</pre>
You will receive full credit if your code passes the tests in <b>gradeMe.sh</b> and your interpreter is written '<b>without side effects</b>'.
<p></p>
We will use the timestamp of your <b>last</b> submission for the purpose of calculating late days.

</p>
</div>
</body>
</html>
5 changes: 5 additions & 0 deletions lab1/note.txt
@@ -0,0 +1,5 @@
https://ipads.se.sjtu.edu.cn/courses/compilers/labs/lab1.html

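Implement myimpl.c.
lab1 is unrelated to the implementation of the Tiger compiler itself, but the recursion, trees, and key-value storage it practices run through the entire project.
The textbook explains "without side effects" in detail.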
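One way to meet the without-side-effects requirement for the environment is sketched below. This is a hedged C sketch that assumes the handout lets you define your own table type; the names `Table_`, `Table`, `update`, and `lookup` are illustrative. Each binding creates a new list cell in front of the old table, so older tables remain valid and nothing is overwritten.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Environment as an immutable linked list: binding a variable prepends a
   new cell and returns a new table; older tables are never modified. */
typedef struct table *Table_;
struct table {
  const char *id;
  int value;
  Table_ tail;
};

/* Constructor: the field stores below initialize a fresh cell; they do
   not update existing data. */
Table_ Table(const char *id, int value, Table_ tail) {
  Table_ t = malloc(sizeof *t);
  assert(t != NULL);
  t->id = id;
  t->value = value;
  t->tail = tail;
  return t;
}

/* "Update" without side effects: shadow any older binding of id. */
Table_ update(Table_ t, const char *id, int value) {
  return Table(id, value, t);
}

/* The most recent binding wins. This sketch treats a lookup of an unbound
   variable as a fatal error. */
int lookup(Table_ t, const char *id) {
  assert(t != NULL);
  return strcmp(t->id, id) == 0 ? t->value : lookup(t->tail, id);
}
```

With this representation, an interpreter helper such as `Table_ interpStm(A_stm s, Table_ t)` can return the extended table instead of mutating a global variable.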
6 changes: 6 additions & 0 deletions lab2/README.md
@@ -0,0 +1,6 @@
### Lab2 Lexical Analysis
See [lab2.md](./lab2.md) for the detailed requirements.
&emsp;

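Use lex to write regular expressions that perform lexical analysis on Tiger programs.
Pay particular attention to the handling of strings and nested comments.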
121 changes: 121 additions & 0 deletions lab2/lab2.md
@@ -0,0 +1,121 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
</head>

<body onload="on_resize()" onclick="on_resize()">
<div id="header">
<span class="MainTitle">Compilers</span>
<span class="SubTitle">2017</span>
</div>

<div id="content">
<h2>Lab2 Lexical Analysis</h2>
<p>
<b>Due:</b> Oct 7, 12:00, 2017 <font color="red">At noon!</font><br/>
</p>
<hr>

<h2><font color="red">Update</font></h2>
<p>
<a href="https://ipads.se.sjtu.edu.cn/courses/compilers/handout-2017/update.tar.gz">Here</a> are the two new test cases (test50.out and test51.out). Please copy them into the directory named "refs".
</p>

<h2>Introduction</h2>
<p>
In this lab you will implement a lexical analyzer for the Tiger language with Lex. Please read <b>chapter 2</b> of your textbook carefully before you start.
</p>

<h2>Getting started</h2>
<p>
Download the original environment <a href="https://ipads.se.sjtu.edu.cn/courses/compilers/handout-2017/lab2.tar.gz">here</a>.
<pre>
$ tar -xf lab2.tar.gz
$ cd lab2
</pre>
</p>

<h2>Guide</h2>
<p>
0. You may only modify <b>tiger.lex</b>, and you must leave its first several lines unchanged.
</p>
<p>
1. This lab is the first step in building our Tiger compiler, so you should <b>carefully manage</b> your code for this lab and the following ones, for example with Git or cloud storage.
</p>

<p>
2. We recommend using <b>states</b> to handle comments (see pages 32-33 of the textbook). Note that the Tiger language allows <b>nested comments</b>.
This one is legal:
<pre>
/* This is comment. /*nested comment*/*/
</pre>
While this one is illegal:
<pre>
/* This is comment. /*nested comment*/
</pre>
</p>

<p>
3. When handling an "ID", you should use the <b>"String"</b> function to allocate the object.
</p>

<p>
4. In addition, you should turn in <b>documentation</b> covering the following points:
</p>
<ul>
<li> how you handle comments</li>
<li> how you handle strings</li>
<li> error handling</li>
<li> end-of-file handling</li>
<li> other interesting features of your lexer</li>
</ul>
<p>
Send your <b>documentation</b> to the <b>TA</b>, named `studentID_name_lab2_document.pdf`.
</p>

<h2>Testing</h2>
<p>
<b>Warning:</b> Before testing, make sure that compiling your code produces no warnings. Our testing environment might differ from yours, so warnings can sometimes cause trouble. Watch out!

<hr>
We've provided a script file called <b>gradeMe.sh</b> for testing. After programming, try running:
<pre>
$ ./gradeMe.sh
[^_^]: Pass
</pre>
The smiling face means that you have passed all the tests on the Tiger code files in <b>/testcases</b>. <b>gradeMe.sh</b> compares your output with the files in <b>/refs</b>, so if something is wrong with your answer, you can check the content in <b>/refs</b>.
</p>

<h2>Final Test</h2>
<p>
Since our Tiger compiler is built step by step, some bugs may remain hidden at the current step.
Therefore, we provide a final test for uncovering potential bugs.<br/>
Although this test does not affect your score for this lab, it can help you find bugs as early as possible.
</p>
<p>
To avoid compatibility issues, we provide a virtual machine (running on VMware) for the final test.<br/>
You do <b>not</b> need to write code inside the VM, although it does provide a working environment.
You can download it from <a href="https://jbox.sjtu.edu.cn/l/71KeTu">JBox</a>. VM user: tmac ; password: 123
<pre>
For testing,
$ # first, copy tiger.lex into the VM
$ cp tiger.lex ~/final-tiger
$ cd final-tiger
$ ./gradeMe.sh
</pre>
</p>


<h2>Handin procedure</h2>
<p>
Submit your code to the TA. Before submission, archive your code as a gzipped tar file with the following commands:
<pre>
$ cd lab2
$ make handin
$ # then rename the tarball to your student ID, e.g. 5150379000.tar.gz (do not add any other characters!)
</pre>
You will receive full credit if your code passes the test in <b>gradeMe.sh</b> when running on our machines. We will use the timestamp of your <b>last</b> submission for the purpose of calculating late days.
</p>
</div>
</body>
</html>
68 changes: 68 additions & 0 deletions lab2/note.txt
@@ -0,0 +1,68 @@
https://ipads.se.sjtu.edu.cn/courses/compilers/labs/lab2.html

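This is the first step in implementing the whole compiler: lexical analysis. The task is to complete tiger.lex.
The lexer reads the source file byte by byte and breaks it into tokens.
Each call to the lexer returns an integer that identifies the token.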
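The calling convention can be illustrated with a hand-written stand-in for a lex-generated scanner. It is not lex output, and the token codes, `lval`, `lex_init`, and `next_token` are hypothetical. Each call skips whitespace, recognizes one token, stores its semantic value on the side, and returns an integer code. As with `yylex()`, 0 means end of input.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes (values chosen arbitrarily); in the lab they
   come from the handout's headers. 0 signals end of input. */
enum { T_EOF = 0, T_ID = 258, T_INT, T_ASSIGN, T_SEMICOLON, T_ERROR };

/* Semantic value of the most recent token, playing the role of yylval. */
static union { int ival; char sval[64]; } lval;
static const char *cursor;  /* next unread byte of the source */

void lex_init(const char *src) { cursor = src; }

/* Each call consumes one token and returns its integer code. */
int next_token(void) {
  while (isspace((unsigned char)*cursor)) cursor++;
  if (*cursor == '\0') return T_EOF;
  if (isalpha((unsigned char)*cursor)) {
    size_t n = 0;
    while (isalnum((unsigned char)cursor[n]) || cursor[n] == '_') n++;
    assert(n < sizeof lval.sval);  /* sketch: identifiers up to 63 bytes */
    memcpy(lval.sval, cursor, n);
    lval.sval[n] = '\0';
    cursor += n;
    return T_ID;
  }
  if (isdigit((unsigned char)*cursor)) {
    char *end;
    lval.ival = (int)strtol(cursor, &end, 10);
    cursor = end;
    return T_INT;
  }
  if (cursor[0] == ':' && cursor[1] == '=') { cursor += 2; return T_ASSIGN; }
  if (cursor[0] == ';') { cursor++; return T_SEMICOLON; }
  cursor++;  /* skip the offending byte */
  return T_ERROR;
}
```

The parser in lab3 drives the lexer the same way: it calls it repeatedly, switches on the returned code, and reads the semantic value when it needs the identifier text or the integer.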

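The textbook lists the lexical tokens of the Tiger language. Most of them can be recognized directly; the parts that need care are strings and nested comments.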


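Nested comments:
lex starts in the INITIAL state and enters the COMMENT state when it sees "/*". In the COMMENT state, each "/*" increments a counter and each "*/" decrements it. When the counter returns to zero, lex goes back to the INITIAL state.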
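The counter logic can be checked outside lex. The C sketch below uses a hypothetical helper that is not part of the lab. It scans from an opening "/*" and tracks the nesting depth as described above. It returns the length of the outermost comment, or -1 if the input runs out first, which is the unterminated-comment error.

```c
#include <assert.h>

// s must point at an opening "/*". Mirrors the COMMENT state: each "/*"
// raises the depth, each "*/" lowers it, and depth 0 means we are back in
// INITIAL. Returns the length of the whole (possibly nested) comment, or
// -1 if the input ends while a comment is still open.
long comment_length(const char *s) {
  assert(s[0] == '/' && s[1] == '*');
  int depth = 0;
  const char *p = s;
  while (*p != '\0') {
    if (p[0] == '/' && p[1] == '*') {         // opener
      depth++;
      p += 2;
    } else if (p[0] == '*' && p[1] == '/') {  // closer
      depth--;
      p += 2;
      if (depth == 0) return (long)(p - s);
    } else {
      p++;
    }
  }
  return -1;  // unterminated comment: a lexical error in Tiger
}
```

Advancing by two characters after each delimiter matters. In "*/*/", the second "*" must not be paired with the preceding "/" as a new opener.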


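Strings:
Tiger strings support several escape sequences (quoted from the textbook):
\n      newline
\t      tab
\^c     the control character c
\ddd    the single character with ASCII code ddd (three decimal digits)
\"      the double-quote character (")
\\      the backslash character (\)
\f___f\ this sequence is ignored, where f___f stands for a sequence of one or more formatting characters
        (a subset of the non-printable characters including at least space, tab, newline, and formfeed).
        This allows a long string to span several lines: write a "\" at the end of one line
        and another at the start of the next.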
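The following sketches how a single escape could be decoded under this table. It assumes the caller has already consumed the backslash, and the helper name `decode_escape` is hypothetical. Two details are assumptions that the table does not fix. First, `\^c` uses the common caret notation, so `\^@` through `\^_` map to 0 through 31. Second, `\ddd` values above 255 are rejected.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static int is_format_char(char c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\f';
}

/* p points just past a backslash inside a string literal.
   Returns how many source characters after the backslash the escape uses
   (0 means the escape is illegal). *out receives the decoded byte, or -1
   for the ignored formatting-character sequence. */
int decode_escape(const char *p, int *out) {
  assert(p != NULL && out != NULL);
  switch (p[0]) {
  case 'n':  *out = '\n'; return 1;
  case 't':  *out = '\t'; return 1;
  case '"':  *out = '"';  return 1;
  case '\\': *out = '\\'; return 1;
  case '^':  /* assumed caret notation: @ .. _ map to 0 .. 31 */
    if (p[1] >= '@' && p[1] <= '_') { *out = p[1] - '@'; return 2; }
    return 0;
  default:
    break;
  }
  if (isdigit((unsigned char)p[0]) && isdigit((unsigned char)p[1]) &&
      isdigit((unsigned char)p[2])) {
    int v = (p[0] - '0') * 100 + (p[1] - '0') * 10 + (p[2] - '0');
    if (v > 255) return 0;  /* assumed: must fit in one byte */
    *out = v;
    return 3;
  }
  if (is_format_char(p[0])) {  /* skip formatting characters */
    int n = 0;
    while (is_format_char(p[n])) n++;
    if (p[n] != '\\') return 0;  /* must be closed by another backslash */
    *out = -1;
    return n + 1;
  }
  return 0;
}
```

A string-reading loop would copy ordinary characters and call such a helper on every backslash. It would append `*out` unless it is -1, then advance by the returned count.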
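lab2 implements only the \n and \t escapes. In every other case the "\" is simply dropped, which implicitly handles \" and \\. A newline also ends the string.
Concretely, when lex sees a double quote (\x22), it calls a custom function that reads the string character by character, processes the escapes, and returns the resulting string.
(Another option is to write a regular expression that matches the whole string and then process the escapes afterwards. That regular expression is fairly complex and hard to read and debug, and the escapes would still need separate handling, so this approach was not used.)
(As for recognizing the string and its escapes with a single regular expression in one pass... that approach was abandoned outright.)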


====================


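Note that in lab2 a zero-length string yields a NULL pointer rather than an empty string. This is only so that both the lab2 tests and the final test pass. From lab3 onward, an empty string is returned instead of a NULL pointer.
Another issue found during testing is how C's printf handles a NULL argument:
Program 1:
#include <stdio.h>
int main() {
printf("%s\n", NULL); // ----------> segmentation fault
return 0;
}
Program 2:
#include <stdio.h>
int main() {
printf("%s", NULL); // -----> prints (null)
return 0;
}
Program 3:
#include <stdio.h>
int main() {
puts(NULL); // ----------> segmentation fault
return 0;
}
(These results were obtained on Linux with gcc 5.4.0, compiling with -O0.)
In case 1, printf was rewritten into puts even though -O0 asked for no optimization, but in case 2 it was not (disassembly confirms this). GCC treats printf as a builtin and applies this rewrite even at -O0; compiling with -fno-builtin-printf should suppress it.
Passing NULL for %s is undefined behavior. glibc's printf happens to print "(null)", whereas puts crashes on a NULL argument.
The objdump -d output is identical for programs 1 and 3.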


======================


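Note in particular that Tiger allows "\0" as a legitimate part of a string, whereas C treats "\0" as the string terminator. If the lexer simply returns a pointer to the start of the string, later compilation phases lose everything after the "\0". This bug was not discovered until the final assembly-generation stage...
For the improved lexer, see lab6. It adds handling of the \ddd escape (so a string can contain any character) and special handling of "\0".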
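One way to keep the characters after an embedded "\0" is to carry the length alongside the bytes. The sketch below assumes a hypothetical counted-string type (`TigerString`) and is not necessarily how lab6 implements the fix. It also emits such a string for a GNU as `.ascii` directive. Every unprintable byte is written as a three-digit octal escape, so a `\0` byte survives into the assembly.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: the length travels with the bytes, so an
   embedded '\0' no longer truncates the literal in later phases. */
typedef struct {
  size_t len;
  char *bytes;  /* may contain '\0'; one extra '\0' is appended after len bytes */
} TigerString;

TigerString tstr_make(const char *bytes, size_t len) {
  TigerString s;
  s.len = len;
  s.bytes = malloc(len + 1);
  assert(s.bytes != NULL);
  memcpy(s.bytes, bytes, len);
  s.bytes[len] = '\0';
  return s;
}

/* Writes the body of a GNU as .ascii directive into buf. Printable bytes
   other than '"' and '\\' are copied; every other byte becomes a
   three-digit octal escape. Returns the number of characters written. */
size_t tstr_to_ascii(TigerString s, char *buf, size_t cap) {
  size_t n = 0;
  assert(cap > 0);
  for (size_t i = 0; i < s.len; i++) {
    unsigned char c = (unsigned char)s.bytes[i];
    assert(n + 5 <= cap);  /* room for one escape plus the final '\0' */
    if (c >= 0x20 && c < 0x7f && c != '"' && c != '\\')
      buf[n++] = (char)c;
    else
      n += (size_t)sprintf(buf + n, "\\%03o", (unsigned)c);
  }
  buf[n] = '\0';
  return n;
}
```

For the three bytes `a`, `\0`, `b`, `strlen` reports 1, while `len` stays 3. The emitted body is `a\000b`, which can be placed inside `.ascii "..."`.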

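lab2 provides a final test to check whether the lexer can pass the final tests. The parts to be completed in later labs are provided there as .o files.
Copy tiger.lex into the final_tiger folder, then run make; ./gradeMe.sh. First roll final_tiger back to its initial commit, because lab3 modified final_tiger.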
6 changes: 6 additions & 0 deletions lab3/README.md
@@ -0,0 +1,6 @@
### Lab3 Syntax Analysis
See [lab3.md](./lab3.md) for the detailed requirements.
&emsp;

Use yacc to write grammar rules that parse Tiger programs and build the abstract syntax tree.
This requires the lexer written in lab2: copy lab2/tiger.lex into lab3.