From 8baeb8ed9ab650a25b61428755ee5139ef10fc2d Mon Sep 17 00:00:00 2001 From: Myrausman Date: Wed, 11 Sep 2024 00:31:40 +0500 Subject: [PATCH] improved readme Signed-off-by: Myrausman --- README.md | 355 +++++++++++++++++++++++++++++++----------------------- 1 file changed, 204 insertions(+), 151 deletions(-) diff --git a/README.md b/README.md index e7a988cc..762745ba 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,20 @@ # riscv-opcodes -This repo enumerates standard RISC-V instruction opcodes and control and -status registers. It also contains a script to convert them into several -formats (C, Scala, LaTeX). +This repository enumerates standard RISC-V instruction opcodes and control/status registers. It also contains a script to convert them into various formats (C, Scala, LaTeX). -Artifacts (encoding.h, latex-tables, etc) from this repo are used in other -tools and projects like Spike, PK, RISC-V Manual, etc. +Artifacts like `encoding.h`, `latex-tables`, etc., from this repo are used in tools and projects such as Spike, PK, and the RISC-V Manual. +## Table of Contents +1. [Project Structure](#project-structure) +2. [File Naming Policy](#file-naming-policy) +3. [Encoding Syntax](#encoding-syntax) +4. [Flow for parse.py](#flow-for-parsepy) +5. [Artifact Generation and Usage](#artifact-generation-and-usage) +6. [Adding a New Extension](#adding-a-new-extension) +7. [Debugging](#debugging) +8. [Contributing](#contributing) + +--- ## Project Structure ```bash @@ -14,206 +22,251 @@ tools and projects like Spike, PK, RISC-V Manual, etc. 
├── encoding.h # the template encoding.h file ├── LICENSE # license file ├── Makefile # makefile to generate artifacts -├── parse.py # python file to perform checks on the instructions and generate artifacts +├── parse.py # performs checks and generates artifacts ├── README.md # this file ├── rv* # instruction opcode files └── unratified # contains unratified instruction opcode files ``` - +--- ## File Naming Policy -This project follows a very specific file structure to define the instruction encodings. All files -containing instruction encodings start with the prefix `rv`. These files can either be present in -the root directory (if the instructions have been ratified) or the `unratified` directory. The exact -file-naming policy and location is as mentioned below: +This project follows a specific file naming convention for instruction encodings: + +* **`rv_x`**: Instructions common to both 32-bit and 64-bit modes of extension `X`. +* **`rv32_x`**: Instructions specific to `rv32x` (e.g., `brev8`). +* **`rv64_x`**: Instructions specific to `rv64x` (e.g., `addw`). +* **`rv_x_y`**: Instructions valid when both extensions `X` and `Y` are enabled. Canonical ordering as specified by the RISC-V spec should be followed. +* **`unratified/`**: Contains instruction encodings that are not ratified yet, following the same policy. -1. `rv_x` - contains instructions common within the 32-bit and 64-bit modes of extension X. -2. `rv32_x` - contains instructions present in rv32x only (absent in rv64x e.g.. brev8) -3. `rv64_x` - contains instructions present in rv64x only (absent in rv32x, e.g. addw) -4. `rv_x_y` - contains instructions when both extension X and Y are available/enabled. It is recommended to follow canonical ordering for such file names as specified by the spec. -5. `unratified` - this directory will also contain files similar to the above policies, but will - correspond to instructions which have not yet been ratified. 
+For instructions present in multiple extensions where the spec is vague, the encoding should be placed in the canonically ordered first extension and imported into others using `$import`. -When an instruction is present in multiple extensions and the spec is vague in defining the extension which owns the instruction, the instruction encoding must be placed in the first canonically ordered extension and should be imported(via the `$import` keyword) in the remaining extensions. +--- ## Encoding Syntax +Instruction encoding files in this project use the following syntax: -The encoding syntax uses `$` to indicate keywords. As of now 2 keywords have been identified : `$import` and `$pseudo_op` (described below). The syntax also uses `::` as a means to define the relationship between extension and instruction. `..` is used to defined bit ranges. We use `#` to define comments in the files. All comments must be in a separate line. In-line comments are not supported. +* **Keywords**: `$import` and `$pseudo_op` are keywords used to indicate special operations. +* **Operators**: `::` defines relationships between extensions and instructions; `..` defines bit ranges. +* **Comments**: Use `#` for comments. Inline comments are not supported. -Instruction syntaxes used in this project are broadly categorized into three: +### Instruction Categories -- **regular instructions** :- these are instructions which hold a unique opcode in the encoding space. A very generic syntax guideline - for these instructions is as follows: - ``` - - ``` - where `` is either `` or ``. +1. **Regular Instructions**: Instructions with unique opcodes. 
+ - **Syntax**: ` ` + - **Example**: + ```plaintext + lui rd imm20 6..2=0x0D 1..0=3 + beq bimm12hi rs1 rs2 bimm12lo 14..12=0 6..2=0x18 1..0=3 + ``` - Examples: - ``` - lui rd imm20 6..2=0x0D 1..0=3 - beq bimm12hi rs1 rs2 bimm12lo 14..12=0 6..2=0x18 1..0=3 - ``` - The bit encodings are usually of 2 types: - - *single bit assignment* : here the value of a single bit is assigned using syntax `=`. For e.g. `6=1` means bit 6 should be 1. Here the value must be 1 or 0. - - *range assignment*: here a range of bits is assigned a value using syntax: `..=`. For e.g. `31..24=0xab`. The value here can be either unsigned integer, hex (0x) or binary (0b). + - **Bit Encoding Types**: + - *Single Bit Assignment*: `=` + - *Range Assignment*: `..=` -- **pseudo_instructions** (a.k.a pseudo\_ops) - These are instructions which are aliases of regular instructions. Their encodings force - certain restrictions over the regular instruction. The syntax for such instructions uses the `$pseudo_op` keyword as follows: - ``` - $pseudo_op :: - ``` - Here the `` specifies the extension which contains the base instruction. `` indicates the name of the instruction - this pseudo-instruction is an alias of. The remaining fields are the same as the regular instruction syntax, where all the args and the fields - of the pseudo instruction are specified. +2. **Pseudo Instructions** (`$pseudo_op`): Aliases for regular instructions with restricted bit encodings. + - **Syntax**: `$pseudo_op :: ` + - ``: Specifies the extension which contains the base instruction. + - ``: Indicates the name of the instruction this pseudo-instruction is an alias of. + - The remaining fields are the same as the regular instruction syntax, where all the arguments and the fields of the pseudo instruction are specified. 
+ - **Example**: + ```plaintext + $pseudo_op rv_zicsr::csrrs frflags rd 19..15=0 31..20=0x001 14..12=2 6..2=0x1C 1..0=3 + ``` - Example: - ``` - $pseudo_op rv_zicsr::csrrs frflags rd 19..15=0 31..20=0x001 14..12=2 6..2=0x1C 1..0=3 - ``` - - If a ratified instruction is a pseudo\_op of a regular unratified - instruction, it is recommended to maintain this pseudo\_op relationship i.e. - define the new instruction as a pseudo\_op of the unratified regular - instruction, as this avoids existence of overlapping opcodes for users who are - experimenting with unratified extensions as well. - -- **imported_instructions** - these are instructions which are borrowed from an extension into a new/different extension/sub-extension. Only regular instructions can be imported. Pseudo-op or already imported instructions cannot be imported. Example: - ``` - $import rv32_zkne::aes32esmi - ``` + - **Recommendation**: If a ratified instruction is a `$pseudo_op` of a regular unratified instruction, it is recommended to maintain this `$pseudo_op` relationship. Define the new instruction as a `$pseudo_op` of the unratified regular instruction to avoid overlapping opcodes for users experimenting with unratified extensions. -### RESTRICTIONS +3. **Imported Instructions** (`$import`): Instructions borrowed from another extension. + - These are instructions borrowed from an extension into a new or different extension/sub-extension. Only regular instructions can be imported. Pseudo-ops or already imported instructions cannot be imported. + - **Syntax**: `$import :: ` + - **Example**: + ```plaintext + $import rv32_zkne::aes32esmi + ``` -Following are the restrictions one should keep in mind while defining $pseudo\_ops and $imported\_ops +### Restrictions -- Pseudo-op or already imported instructions cannot be imported again in another file. One should - always import base-instructions only. 
-- While defining a $pseudo\_op, the base-instruction itself cannot be a $pseudo\_op +* Pseudo-ops or already imported instructions cannot be imported again. +* A base instruction for a pseudo-op cannot be a pseudo-op itself. +--- ## Flow for parse.py -The `parse.py` python file is used to perform checks on the current set of instruction encodings and also generates multiple artifacts : latex tables, encoding.h header file, etc. This section will provide a brief overview of the flow within the python file. +The `parse.py` Python file is used to perform checks on the current set of instruction encodings and also generates multiple artifacts: LaTeX tables, `encoding.h` header file, etc. This section provides a brief overview of the flow within the Python file. -To start with, `parse.py` creates a list of all `rv*` files currently checked into the repo (including those inside the `unratified` directory as well). -It then starts parsing each file line by line. In the first pass, we only capture regular instructions and ignore the imported or pseudo instructions. -For each regular instruction, the following checks are performed : +1. **Initial Setup**: + - `parse.py` creates a list of all `rv*` files currently checked into the repo (including those inside the `unratified` directory). + - It starts parsing each file line by line. - - for range-assignment syntax, the *msb* position must be higher than the *lsb* position - - for range-assignment syntax, the value of the range must representable in the space identified by *msb* and *lsb* - - values for the same bit positions should not be defined multiple times. - - All bit positions must be accounted for (either as args or constant value fields) +2. **First Pass - Regular Instructions**: + - Capture only regular instructions and ignore imported or pseudo instructions. 
+ - **Checks performed**: + - For range-assignment syntax, the *msb* (most significant bit) position must be higher than the *lsb* (least significant bit) position. + - The value of the range must be representable in the space identified by *msb* and *lsb*. + - Values for the same bit positions should not be defined multiple times. + - All bit positions must be accounted for (either as arguments or constant value fields). -Once the above checks are passed for a regular instruction, we then create a dictionary for this instruction which contains the following fields: - - encoding : contains a 32-bit string defining the encoding of the instruction. Here `-` is used to represent instruction argument fields - - extension : string indicating which extension/filename this instruction was picked from - - mask : a 32-bit hex value indicating the bits of the encodings that must be checked for legality of that instruction - - match : a 32-bit hex value indicating the values the encoding must take for the bits which are set as 1 in the mask above - - variable_fields : This is list of args required by the instruction + - **Dictionary Creation**: + - Create a dictionary for each instruction with the following fields: + - `encoding`: A 32-bit string defining the encoding of the instruction. `-` is used to represent instruction argument fields. + - `extension`: String indicating which extension/filename this instruction was picked from. + - `mask`: A 32-bit hex value indicating the bits of the encodings that must be checked for legality. + - `match`: A 32-bit hex value indicating the values the encoding must take for the bits which are set as 1 in the mask. + - `variable_fields`: A list of arguments required by the instruction. -The above dictionary elements are added to a main `instr_dict` dictionary under the instruction node. This process continues until all regular -instructions have been processed. In the second pass, we now process the `$pseudo_op` instructions. 
Here, we first check if the *base-instruction* of -this pseudo instruction exists in the relevant extension/filename or not. If it is present, the the remaining part of the syntax undergoes the same -checks as above. Once the checks pass and if the *base-instruction* is not already added to the main `instr_dict` then the pseudo-instruction is added to -the list. In the third, and final, pass we process the imported instructions. + - Add the dictionary elements to a main `instr_dict` dictionary under the instruction node. This process continues until all regular instructions have been processed. -The case where the *base-instruction* for a pseudo-instruction may not be present in the main `instr_dict` after the first pass is if the only a subset -of extensions are being processed such that the *base-instruction* is not included. +3. **Second Pass - Pseudo Instructions**: + - Process `$pseudo_op` instructions. + - **Checks performed**: + - Verify if the *base-instruction* of the pseudo instruction exists in the relevant extension/filename. + - The remaining part of the syntax undergoes the same checks as above. + - If the checks pass and the *base-instruction* is not already added to the main `instr_dict`, then add the pseudo-instruction to the list. +4. **Third Pass - Imported Instructions**: + - Process imported instructions. + +5. **Special Case**: + - If the *base-instruction* for a pseudo-instruction is not present in the main `instr_dict` after the first pass, it may be due to processing only a subset of extensions where the *base-instruction* is not included. ## Artifact Generation and Usage -The following artifacts can be generated using parse.py: +The `parse.py` script can generate the following artifacts: -- instr\_dict.json : This is always generated by parse.py and contains the - entire main dictionary `instr\_dict` in JSON format. Note, in this file the - *dots* in an instruction are replaced with *underscores*. 
In previous - versions of this project the generated file was instr\_dict.yaml. Note that - JSON is a subset of YAML so the file can still be read by any YAML parser. -- encoding.out.h : this is the header file that is used by tools like spike, pk, etc -- instr-table.tex : the latex table of instructions used in the riscv-unpriv spec -- priv-instr-table.tex : the latex table of instruction used in the riscv-priv spec -- inst.chisel : chisel code to decode instructions -- inst.sverilog : system verilog code to decode instructions -- inst.rs : rust code containing mask and match variables for all instructions -- inst.spinalhdl : spinalhdl code to decode instructions -- inst.go : go code to decode instructions +* **`instr_dict.json`**: Always generated by `parse.py`; contains the entire main dictionary `instr_dict` in JSON format. Dots in instruction names are replaced with underscores in this file. Earlier versions of this project generated `instr_dict.yaml`; JSON is a subset of YAML, so the file can still be read by any YAML parser. +* **`encoding.out.h`**: A header file used by tools such as Spike, PK, etc. +* **`instr-table.tex`**: LaTeX table of instructions for the RISC-V unprivileged specification. +* **`priv-instr-table.tex`**: LaTeX table of instructions for the RISC-V privileged specification. +* **`inst.chisel`**: Chisel code for decoding instructions. +* **`inst.sverilog`**: SystemVerilog code for decoding instructions. +* **`inst.rs`**: Rust code containing mask and match variables for all instructions. +* **`inst.spinalhdl`**: SpinalHDL code for decoding instructions. +* **`inst.go`**: Go code for decoding instructions. -To generate all the above artifacts for all instructions currently checked in, simply run `make` from the root-directory. This should print the following log on the command-line: +### Prerequisites +Ensure you have the required Python dependencies installed. 
Run the following commands: + +```bash +sudo apt-get install python3-pip +pip3 install -r requirements.txt ``` -Running with args : ['./parse.py', '-c', '-go', '-chisel', '-sverilog', '-rust', '-latex', '-spinalhdl', 'rv*', 'unratified/rv*'] -Extensions selected : ['rv*', 'unratified/rv*'] -INFO:: encoding.out.h generated successfully -INFO:: inst.chisel generated successfully -INFO:: inst.spinalhdl generated successfully -INFO:: inst.sverilog generated successfully -INFO:: inst.rs generated successfully -INFO:: inst.go generated successfully -INFO:: instr-table.tex generated successfully -INFO:: priv-instr-table.tex generated successfully -``` +### Generating Artifacts +To generate all artifacts for all instructions currently checked in, run make from the root directory. This will produce the following output: + ```plaintext + Running with args : ['./parse.py', '-c', '-go', '-chisel', '-sverilog', '-rust', '-latex', '-spinalhdl', 'rv*', 'unratified/rv*'] + Extensions selected : ['rv*', 'unratified/rv*'] + INFO:: encoding.out.h generated successfully + INFO:: inst.chisel generated successfully + INFO:: inst.spinalhdl generated successfully + INFO:: inst.sverilog generated successfully + INFO:: inst.rs generated successfully + INFO:: inst.go generated successfully + INFO:: instr-table.tex generated successfully + INFO:: priv-instr-table.tex generated successfully -By default all extensions are enabled. To select only a subset of extensions you can change the `EXTENSIONS` variable of the makefile to contains only the file names of interest. + ``` +### Selecting Specific Extensions +By default, all extensions are enabled. To select a subset of extensions, modify the EXTENSIONS variable in the Makefile to include only the filenames of interest. 
For example if you want only the I and M extensions you can do the following: ```bash make EXTENSIONS='rv*_i rv*_m' ``` -Which will print the following log: - -``` -Running with args : ['./parse.py', '-c', '-chisel', '-sverilog', '-rust', '-latex', 'rv32_i', 'rv64_i', 'rv_i', 'rv64_m', 'rv_m'] -Extensions selected : ['rv32_i', 'rv64_i', 'rv_i', 'rv64_m', 'rv_m'] -INFO:: encoding.out.h generated successfully -INFO:: inst.chisel generated successfully -INFO:: inst.sverilog generated successfully -INFO:: inst.rs generated successfully -INFO:: instr-table.tex generated successfully -INFO:: priv-instr-table.tex generated successfully +This will produce the following output: + +```plaintext + Running with args : ['./parse.py', '-c', '-go', '-chisel', '-sverilog', '-rust', '-latex', 'rv32_i', 'rv64_i', 'rv_i', 'rv64_m', 'rv_m'] + Extensions selected : ['rv32_i', 'rv64_i', 'rv_i', 'rv64_m', 'rv_m'] + INFO:: encoding.out.h generated successfully + INFO:: inst.chisel generated successfully + INFO:: inst.sverilog generated successfully + INFO:: inst.rs generated successfully + INFO:: instr-table.tex generated successfully + INFO:: priv-instr-table.tex generated successfully ``` +### Generating Specific Artifacts -If you only want a specific artifact you can use one or more of the following targets : `c`, `rust`, `chisel`, `sverilog`, `latex` +To generate specific artifacts, use one or more of the following targets: -You can use the `clean` target to remove all artifacts. +* `c` +* `rust` +* `chisel` +* `sverilog` +* `latex` -## Adding a new extension +### Cleaning Up -To add a new extension of instructions, create an appropriate `rv*` file based on the policy defined in [File Structure](#file-naming-policy). Run `make` from the root directory to ensure that all checks pass and all artifacts are created correctly. 
A successful run should print the following log on the terminal: +To remove all generated artifacts, use the `clean` target: +```bash +make clean ``` -Running with args : ['./parse.py', '-c', '-chisel', '-sverilog', '-rust', '-latex', 'rv*', 'unratified/rv*'] -Extensions selected : ['rv*', 'unratified/rv*'] -INFO:: encoding.out.h generated successfully -INFO:: inst.chisel generated successfully -INFO:: inst.sverilog generated successfully -INFO:: inst.rs generated successfully -INFO:: instr-table.tex generated successfully -INFO:: priv-instr-table.tex generated successfully -``` +--- + +## Adding a New Extension -Create a PR for review. +To add a new extension of instructions, follow these steps: -## Enabling Debug logs in parse.py +1. **Create the Extension File**: + - Create a new `rv*` file according to the policy defined in [File Naming Policy](#file-naming-policy). -To enable debug logs in parse.py change `level=logging.INFO` to `level=logging.DEBUG` and run the python command. You will now see debug statements on -the terminal like below: +2. **Run Checks and Generate Artifacts**: + - From the root directory, run the `make` command to ensure that all checks pass and that all artifacts are generated correctly. + - A successful run will produce the following output: + + ```plaintext + Running with args : ['./parse.py', '-c', '-chisel', '-sverilog', '-rust', '-latex', 'rv*', 'unratified/rv*'] + Extensions selected : ['rv*', 'unratified/rv*'] + INFO:: encoding.out.h generated successfully + INFO:: inst.chisel generated successfully + INFO:: inst.sverilog generated successfully + INFO:: inst.rs generated successfully + INFO:: instr-table.tex generated successfully + INFO:: priv-instr-table.tex generated successfully + ``` + +3. **Submit for Review**: + - Create a pull request (PR) to submit your changes for review. + +Ensure you follow these steps carefully to integrate the new extension properly. +--- + +## How do I find where an instruction is defined? 
+ +You can locate the definition of an instruction using one of the following methods: + +1. **Using `grep`**: + ```bash + grep "^\s*" rv* unratified/rv* + ``` +2. **Using `make`**: + - Run `make` to generate the `instr_dict.json` file. + - Open `instr_dict.json` and search for the instruction. + - The `extension` field of that instruction indicates which file the instruction was picked from. +--- +## Debugging +To enable debug logs in `parse.py`: + +1. Modify the logging level in `parse.py`: +```python +level=logging.INFO ``` -DEBUG:: Collecting standard instructions first -DEBUG:: Parsing File: ./rv_i -DEBUG:: Processing line: lui rd imm20 6..2=0x0D 1..0=3 -DEBUG:: Processing line: auipc rd imm20 6..2=0x05 1..0=3 -DEBUG:: Processing line: jal rd jimm20 6..2=0x1b 1..0=3 -DEBUG:: Processing line: jalr rd rs1 imm12 14..12=0 6..2=0x19 1..0=3 -DEBUG:: Processing line: beq bimm12hi rs1 rs2 bimm12lo 14..12=0 6..2=0x18 1..0=3 -DEBUG:: Processing line: bne bimm12hi rs1 rs2 bimm12lo 14..12=1 6..2=0x18 1..0=3 +Change it to: +```python +level=logging.DEBUG ``` +2. Example debug output: + ```plaintext + DEBUG:: Parsing File: ./rv_i + DEBUG:: Processing line: lui rd imm20 6..2=0x0D 1..0=3 + ``` +--- +## Contributing + If you wish to contribute to this project: -## How do I find where an instruction is defined? + - Open a pull request (PR) or issue. + - Ensure that all tests pass. + - Follow the repository’s coding guidelines. -You can use `grep "^\s*" rv* unratified/rv*` OR run `make` and open -`instr_dict.json` and search for the instruction you are looking for. Within -that instruction the `extension` field will indicate which file the -instruction was picked from.
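
The `encoding`, `mask`, `match`, and `variable_fields` entries described in the `parse.py` flow can be illustrated with a minimal Python sketch. This is not the code in `parse.py`: the helper names `parse_value` and `parse_regular` are hypothetical, only regular-instruction lines are handled, and the check that every bit position is accounted for is skipped because it needs the widths of the argument fields.

```python
import re

# Token shapes from the encoding syntax: "msb..lsb=value" and "bit=value".
RANGE = re.compile(r"^(\d+)\.\.(\d+)=(\w+)$")
SINGLE = re.compile(r"^(\d+)=([01])$")


def parse_value(text):
    """Accept unsigned decimal, hex (0x) or binary (0b) literals."""
    lowered = text.lower()
    if lowered.startswith("0x"):
        return int(lowered, 16)
    if lowered.startswith("0b"):
        return int(lowered, 2)
    return int(lowered, 10)


def parse_regular(line):
    """Hypothetical helper: build the per-instruction dictionary for one line."""
    name, *tokens = line.split()
    bits = ["-"] * 32  # bits[0] is bit 31; "-" marks an argument bit
    variable_fields = []
    for token in tokens:
        ranged, single = RANGE.match(token), SINGLE.match(token)
        if ranged:
            msb, lsb = int(ranged.group(1)), int(ranged.group(2))
            value = parse_value(ranged.group(3))
        elif single:
            msb = lsb = int(single.group(1))
            value = int(single.group(2))
        else:
            variable_fields.append(token)  # e.g. rd, rs1, imm20
            continue
        # Assumption: a range with msb below lsb, or outside 32 bits, is an error.
        if msb < lsb or msb > 31:
            raise ValueError(f"{name}: invalid bit range {msb}..{lsb}")
        # The value must be representable in the msb..lsb field.
        if value >= 1 << (msb - lsb + 1):
            raise ValueError(f"{name}: {value:#x} does not fit in {msb}..{lsb}")
        for pos in range(lsb, msb + 1):
            # The same bit position must not be defined twice.
            if bits[31 - pos] != "-":
                raise ValueError(f"{name}: bit {pos} is assigned twice")
            bits[31 - pos] = str((value >> (pos - lsb)) & 1)
    encoding = "".join(bits)
    # mask has a 1 for every constant bit; match holds the constant values.
    mask = int(encoding.replace("0", "1").replace("-", "0"), 2)
    match = int(encoding.replace("-", "0"), 2)
    return {
        "encoding": encoding,
        "mask": hex(mask),
        "match": hex(match),
        "variable_fields": variable_fields,
    }
```

For the `lui` line quoted in the README, `parse_regular("lui rd imm20 6..2=0x0D 1..0=3")` gives a mask of `0x7f` and a match of `0x37`, with `rd` and `imm20` as the variable fields.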