The goal of this project is to understand how a pipelined processor works. In this project, you need to extend the existing 5-stage pipelined RISC-V simulator to support new instructions and a branch prediction scheme using the branch target buffer (BTB).
The target RISC-V processor snurisc5
consists of five pipeline stages: IF, ID, EX, MM, and WB. The following briefly summarizes the tasks performed in each stage:
- IF: Fetches an instruction from imem (instruction memory)
- ID: Decodes the instruction, reads register files, and prepares immediate values
- EX: Performs arithmetic/logical computation and determines the branch outcome
- MM: Accesses dmem (data memory), if necessary
- WB: Writes back the result to the register file and updates
pc
Please note that snurisc5
is slightly different from the 5-stage pipeline described in the textbook. For more information, please refer to this file.
This project assignment consists of the following three parts.
The RISC-V ISA is designed to support custom instructions. System architects can add any instruction for the application that they want to accelerate. This is a powerful feature as it allows for a new invention without breaking any software compatibility.
In this project, we want to add two new instructions called push
and pop
. Traditionally, these instructions are used to access data in the stack memory. The operations of the push
and pop
instructions are defined as follows (sp
denotes the stack pointer register and reg
can be any register):
push reg: R[sp] <- R[sp] - 4
M[R[sp]] <- R[reg]
pop reg: R[reg] <- M[R[sp]]
R[sp] <- R[sp] + 4
Please note that executing the pop sp
instruction leads to an undefined result and should be avoided, while push sp
is a valid instruction.
Using the push
and pop
instructions, function prologue and epilogue needed to save and restore register values can be simplified as shown in the following example.
func: addi sp, sp, -8 func: push ra
sw ra, 4(sp) push s0
sw s0, 0(sp)
... => ...
lw s0, 0(sp) pop s0
lw ra, 4(sp) pop ra
addi sp, sp, 8 ret
ret
We encode push
and pop
instructions as R-type instructions, but with a new opcode 0b1101011
. Both instructions have only one register argument. For push
, the register number is encoded in the rs2
field as its value will be written into the memory just like in the sw
instruction. For pop
, the register number is encoded in the rd
field. The below shows how push x31
and pop x30
instructions are encoded.
# R-type instruction format
bits 31...25 24...20 19...15 14...12 11...7 6...0
funct7 rs2 rs1 funct3 rd opcode
push x31: 0000001 11111 00000 000 00000 1101011
pop x30: 0000010 00000 00000 000 11110 1101011
Your task is to modify the existing 5-stage pipelined processor to support the push
and pop
instructions with fully implementing data forwarding wherever necessary. For push
and pop
instructions, use the ALU to compute the next sp
value (sp + 4
or sp - 4
).
There are several things you need to consider.
- Both
push
andpop
instructions may also have an implicit dependency onsp
with the preceding instructions that modifysp
. In these cases, you have to forward its value as well.
- Example 1: Implicit dependency on
sp
C0 C1 C2 C3 C4 C5 C6 C7 (cycle) lui sp, 0x80020 IF ID EX MM WB push t0 IF ID EX MM WB ; sp forwarded from lui pop t1 IF ID EX MM WB ; sp forwarded from push ebreak IF ID EX MM WB
- The
pop
instruction can cause another load-use hazard, where the pipeline should be stalled for one cycle.
- Example 2: Load-use hazard
C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 (cycle) lui sp, 0x80020 IF ID EX MM WB push zero IF ID EX MM WB pop t0 IF ID EX MM WB push t0 IF ID ID EX MM WB ; 1-cycle stall ebreak IF IF ID EX MM WB
- For
pop
, you need to write back two register values, namelysp
andrd
, in the WB stage. For this, we provide you with the new register file (Pipe.cpu.rf
) that has two write ports:rd
andrd2
for two register numbers to be written, andwbdata
andwbdata2
for their data, respectively. If you specify any non-zero register numbers in bothrd
andrd2
, they will be updated together at the end of the WB stage. If you don't have anything to write inrd2
, set it to zero. Again, setting bothrd
andrd2
to the same register number (except zero) results in an undefined behavior. You can assume that our test cases will not include such an instruction aspop sp
.
The current pipelined processor uses an always-not-taken branch prediction scheme that naively fetches the next instructions when it encounters a branch instruction. In case the branch is determined to be taken at the EX stage, the wrong instructions in the IF and ID stages should be set to BUBBLEs, wasting two cycles.
The role of the branch predictor is to increase the chances of fetching the right instruction after a branch or jump instruction. Based on the past history of whether the current branch instruction was previously taken or not, the branch predictor may be able to make a better decision. Your second task is to implement a branch predictor using the branch target buffer (BTB).
Basically, the BTB is a small table inside the processor that caches the recent information on taken branches. Because the BTB predicts the next instruction address and will send it out before decoding the instruction, we must know whether the fetched instruction is predicted as a taken branch. The hardware for BTB is very similar to the hardware for a cache. It has a fixed number of entries and each entry consists of valid bit (V), tag bits (T), and the target address (A) as shown below.
BTB structure:
+---+---------+----------------------+
0 | V | tag (T) | target address (A) |
+---+---------+----------------------+
1 | | | |
+---+---------+----------------------+
| | ... | ... |
+---+---------+----------------------+
N-1 | | | |
+---+---------+----------------------+
The valid bit indicates whether the corresponding entry has a valid information. Initially, all the entries in the BTB are set to be invalid. The tag bits represent the address of a branch instruction. And the target address has the address to jump when the branch is taken.
In the IF stage, the instruction memory (imem) and the BTB are accessed simultaneously with the same pc
. If the pc
of the fetched instruction matches a tag in the BTB, then it indicates that the instruction is a branch and it was taken previously. In this case, the current branch instruction is also predicted as taken, and the corresponding target address is used as the next pc
. If the matching entry is not found in the BTB, the current instruction is either (1) a branch instruction which were not taken previously, or (2) not a branch instruction at all. In both cases, the next instruction at pc
+ 4 will be fetched.
The branch outcome is known at the end of the EX stage. If the prediction using the BTB was right, we have nothing to do. When the prediction was wrong, we need to cancel two instructions in the IF and ID stages, and fetch the right instruction. Also, when the branch is predicted to be taken but it was not taken, we should make the corresponding entry in the BTB invalid. Likewise, when the branch is predicted to be not-taken but it was taken, we should add an entry for the current branch instruction to the BTB.
Consider the following things when you implement the BTB.
-
We assume that the BTB always has the power-of-2 (i.e. N = 2k, 0 <= k <= 8) entries.
-
The BTB has a direct-mapped organization, where
(pc >> 2) % N
is used as an index. We omit the last 2 bits in thepc
because they will be always0b00
(all the instruction addresses are aligned to the 4-byte boundary inpyrisc
). The remaining bits of thepc
(i.e.pc >> (k + 2)
) are used as the tag bits. If the valid bit (V
) is 1, it indicates a valid entry. -
We have added a new class named
BTB
in the skeleton code. You need to implement the initialization code as well asadd()
andremove()
functions. You can refer to the BTB object using the namePipe.cpu.btb
. Because of the direct-mapped organization, two or more branch instructions can be mapped to the same entry in the BTB. If the corresponding entry is already occupied by another branch instruction at the time you want to add a new entry, it is simply overwritten with the information of the current branch instruction. -
You may need to add/remove an entry to/from the BTB in the EX stage when the prediction was wrong. Please make sure you update the BTB inside of the
update(self)
function of the EX stage so that it can be applied to the BTB at the end of the current cycle. -
It is very difficult to predict the target address of the
jalr
instruction as it depends on a register value. To make the problem simpler, thejalr
instruction is handled in the same way as insnurisc5
; the instructions next to thejalr
instruction are fetched until we have the target address in the EX stage and then the two instructions being executed in the IF and ID stages are converted into BUBBLEs while the target address is forwarded to the nextpc
value immediately. On the other hand, thejal
instruction can be treated in the same way as the other branch instructions.
The following shows some example scenarios.
-
Example 3: For the first
bne
instruction at line 7, the BTB entry is empty. Hence, it is predicted as not-taken and the followingebreak
instruction is fetched. AtC4
, we know that it is mispredicted, so two instructions at line 11 and 13 are flushed and theaddi
instruction is fetched atC5
(line 4). Next time we meet thebne
instruction at line 8, the previous history is available in the BTB and it is predicted as taken, fetching theaddi
instruction immediately at the next cycle atC7
. On the third execution of thebne
instruction atC8
, it will be still predicted as taken, but this is wrong. Again, two instructions at line 6 and 10 are flushed and theebreak
instruction is fetched atC11
.1 C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 (cycle) 2 li t0, 3 IF ID EX MM WB 3 L0: addi t0, t0, -1 IF ID EX MM WB 4 IF ID EX MM WB 5 IF ID EX MM WB 6 IF ID - - - 7 bne t0, x0, L0 IF ID EX MM WB 8 IF ID EX MM WB 9 IF ID EX MM WB 10 IF - - - - 11 ebreak IF ID - - - 12 IF ID EX MM WB 13 <illegal> IF - - - -
-
Example 4: In this example, the
beq
instruction will be mispredicted. But you can see that the instruction at the branch target is actually the same as the next instruction. In this case, you may feel tempted to keep them going instead of making them BUBBLEs. In order to do that, however, you would need another comparator at the EX stage to compare the branch target address with the address of the next instruction. To make the design simpler, we do not consider this optimization. So, whenever a branch is mispredicted, you must cancel two instructions being executed in the IF and ID stages.1 C0 C1 C2 C3 C4 C5 C6 C7 C8 (cycle) 2 beq x0, x0, L0 IF ID EX MM WB 3 L0: li t0, 1 IF ID - - - 4 IF ID EX MM WB 5 ebreak IF - - - - 6 IF ID EX MM WB
You need to prepare and submit the design document (in PDF format) for the modified snurisc5
processor. If you design the pipeline correctly with satisfying all the above requirements, you will get 20 points even if your implementation does not work. Your design document should answer the following questions.
- About Part 1: When do the new data hazards occur due to the
push
andpop
instructions and how do you deal with them?
- Show all the possible cases when data hazards can occur and your solutions to them
- Explain the changes in the datapath
- Explain the changes in the control signals. You should present how each control signal is generated in detail similar to the descriptions in pp.319-320 in the textbook.
- About Part 2: How do you implement the branch prediction using the BTB?
- Explain your implementation of the BTB
- Explain how you find out that the current branch is mispredicted
- Explain how you handle the mispredicted branch
- Explain the changes in the datapath
- Explain the changes in the control signals. Again, you should present how each control signal is generated in detail.
You need to rebuild the GNU RISC-V toolchain so that it understands the push
and pop
instructions. If you have deleted the source code of the GNU RISC-V toolchain, please download it again from the Github as follows.
$ git clone --recursive https://github.com/riscv/riscv-gnu-toolchain
There is a patch file named pushpop.patch
in the patch
directory of the skeleton code. Move the file into the riscv-gnu-toolchain
directory and perform the following command. This will update ./binutils/include/opcode/riscv-opc.h
and ./binutils/opcodes/riscv-opc.c
files. (The following assumes that ca-pa4
and riscv-gnu-toolchain
directories exist in your home directory.)
$ cp ~/ca-pa4/patch/pushpop.patch ~/riscv-gnu-toolchain
$ cd ~/riscv-gnu-toolchain
$ patch -p1 < ./pushpop.patch
Now you can build the toolchain as before.
$ mkdir build
$ cd build
$ ../configure --prefix=/opt/riscv --with-arch=rv32i --disable-gdb
$ sudo make
We provide you with the skeleton code that can be downloaded from https://github.com/snu-csl/ca-pa4. To download the skeleton code, please take the following step:
$ git clone https://github.com/snu-csl/ca-pa4.git
It is basically the same as the 5-stage pipelined simulator (snurisc5
) available in the PyRISC project. Please refer to the snurisc5.pdf file for the current pipeline structure of the snurisc5
simulator. We have slightly changed the simulator structure so that you only need to modify the stages.py
file. Currently, the instruction encodings and masks for push
and pop
instructions are already added to the ISA table in isa.py
as shown below. However, the datapath and the control logic cannot handle those instructions yet.
You may find the GUIDE.md in the PyRISC project useful, which describes the overall architecture and implementation details of the snurisc5
simulator.
# Instruction Encodings
PUSH = WORD(0b00000010000000000000000001101011)
POP = WORD(0b00000100000000000000000001101011)
# Instruction Masks
PUSH_MASK = WORD(0b11111110000000000111000001111111)
POP_MASK = WORD(0b11111110000000000111000001111111)
# ISA table
PUSH : [ "push", PUSH_MASK, R_TYPE, CL_MEM, ]
POP : [ "pop", POP_MASK, R_TYPE, CL_MEM, ]
Several RISC-V executable files, such as fib
, sum100
, forward
, branch
, loaduse
, ex1
, ex2
, ex3
, and ex4
, are available in the ./asm
directory of the skeleton code. In particular, ex1
, ex2
, ex3
, and ex4
programs are the ones explained in Example 1-4 of this document. You can test your simulator with these programs. Also, it is highly recommended to write your own test programs to see how your simulator works. Use the log level 4 as follows if you want to examine what's happening in the pipeline in each cycle. Also, you can change the size of the BTB using the -b
option. For example, -b 7
sets the size of the BTB to 27 = 128 entries (The default BTB size is set to k
= 4, i.e. 16 entries).
$ ./snurisc5.py -l 4 -b 7 asm/ex4
-
You should not change any files other than
stages.py
. -
Your
stages.py
file should not contain anyprint()
function even in comment lines. Please remove them before you submit your code to the server. -
You should not introduce unnecessary pipeline stalls.
-
Your code should finish within a reasonable number of of cycles. If your simulator runs beyond the predefined threshold, you will get the
TIMEOUT
error.
-
Submit only the
stages.py
file to the submission server. -
Also, submit the design document (in PDF file only) to the submission server.
-
The submitted code will NOT be graded instantly. Instead, it will be graded every four hours (12:00am, 4:00am, 8:00am, 12:00pm, 4:00pm, 8:00pm). You may submit multiple versions, but only the last version will be graded.
-
The
sys
server will be closed at 11:59PM on December 22nd. This is the firm deadline.
-
You will work on this project alone.
-
Only the upload submitted before the deadline will receive the full credit. 25% of the credit will be deducted for every single day delay.
-
You can use up to 4 slip days during this semester. If your submission is delayed by 1 day and if you decided to use 1 slip day, there will be no penalty. In this case, you should explicitly declare the number of slip days you want to use in the QnA board of the submission server after each submission.
-
Any attempt to copy others' work will result in heavy penalty (for both the copier and the originator). Don't take a risk.
This is the final project. I hope it was fun!
Jin-Soo Kim
Systems Software and Architecture Laboratory
Dept. of Computer Science and Engineering
Seoul National University