diff --git a/doc/Sysdarft.md b/doc/Sysdarft.md
index 0cecc2421..6e5cf062a 100644
--- a/doc/Sysdarft.md
+++ b/doc/Sysdarft.md
@@ -234,7 +234,7 @@ Suppose the stack pointer `%SP` initially points to address `0x1000` and `%SB` p
 When a value is pushed onto the stack, the `%SP` is decreased (since the stack grows downward),
 and the value is stored at the new address.

-```c
+```
 (Stack End) (Stack Pointer) (Stack Base)
 [ -- 8 Byte Data -- ][ -- Free Space -- ]
 ^ ^ ^
@@ -245,7 +245,7 @@ and the value is stored at the new address.
 When a value is popped from the stack, the `%SP` is increased and the stack grows back up,
 freeing the space in the process.

-```c
+```
 (Stack End) (Stack Pointer After Pop) (Stack Base)
 [ -- 7 Byte Data -- ][ --- --- --- Free Space -- --- --- --- ]
 [ -- 1 Byte Data -- ]
@@ -350,7 +350,7 @@ current allocated stack space for local variables.
 [^PUSHALL]: Push all preservable registers into the stack in the following order
 (Higher in order means register being pushed earlier):

-```asm
+```
 %FER[0-15], %FG, %SB, %SP, %DB, %DP, %EB, %EP, %CPS
 ```
 Code segment registers `%CB` and `%IP` are not preserved by `PUSHALL`.
@@ -363,7 +363,7 @@ Refer to *Assembler Syntax* and *Appendix A* for more information.
 [^Enter]: `ENTER [Width] [Number]` preserves a stack space to allocate spaces for local variables.
 Procedure of `ENTER` can be described as:

-```c++
+```
 CBS = Number; // CBS is Current Procedure Stack Preservation Space
 SP = SP - CPS; // SP is stack pointer
 ```
@@ -372,7 +372,7 @@ Refer to *Assembler Syntax* and *Appendix A* for more information.
 [^Leave]: `LEAVE` tears down a stack space allocated through `ENTER`
 Procedure of `LEAVE` can be described as:

-```c++
+```
 SP = SP + CPS; // SP is stack pointer
 CBS = 0; // CBS is Current Procedure Stack Preservation Space
 ```
@@ -390,8 +390,6 @@ Preprocessor directives are not program statements but directives for the prepro
 The preprocessor examines the code before actual compilation of code begins
 and resolves all these directives before any code is actually generated by regular statements[@CPPPrimer].

-There are three preprocessor directives for Sysdarft Assembly Language.
-
 #### .org

 `.org`, or origin, defines the starting offset for code in memory.
@@ -404,7 +402,7 @@ ensuring proper offset calculations for absolute code.

 #### Syntax and Example

-```asm
+```
     .org [Decimal or Hexadecimal]
     .org 0xC1800
 ```
@@ -431,14 +429,14 @@ It's essentially a way to define *symbolic constants* or *aliases* for values or

 #### Syntax and Example

-```asm
+```
     .equ '[Search Target]', '[Replacement]'
     ; regular expression not enabled
     .equ 'HDD_IO', '0x1234'

     ; regular expression enabled
-    ; this will replace occurrances like add(%fer0, %fer1) to add .64bit <%fer0>, <%fer1>
-    .equ 'add\s*\((.*), (.*)\)', 'add .64bit <\1>, <\2>'
+    ; this will replace occurrences like ADD(%FER0, %FER1) with ADD .64bit <%FER0>, <%FER1>
+    .equ 'ADD\s*\((.*), (.*)\)', 'ADD .64bit <\1>, <\2>'
 ```

 #### .lab (*deprecated*)
@@ -449,18 +447,52 @@ Line markers can be auto scanned and defined without relying on this directive.

 #### Syntax and Example

-```asm
+```
     .lab marker1, [marker2, ...]
     .lab _start, _end
 ```

+> **NOTE**: The preprocessor directives mentioned above are called `Declarative Preprocessor Directives`,
+> which are processed only if they appear at the beginning of the file.
+> If a declarative preprocessor directive appears within the code region,
+> that is, after an instruction or a valid line marker,
+> the assembler refuses to process it
+> and throws an exception (error).
+
+#### `@` and `@@`
+
+`@` and `@@` are code offset references.
+`@` means the segment offset of the current instruction.
+`@@` means the code origin; if `.org` is not specified, its value is `0x00`.
+Both `@` and `@@` are constant values and should be treated as such.
+
+#### Syntax and Example
+
+```
+    JMP <%CB>, <$(@)>
+```
+
+#### .resvb
+
+`.resvb` is short for `reserve bytes`.
+It reserves a fixed-size data region inside the code area.
+This is essential when it comes to size alignment or padding.
+It supports mathematical expressions using operators such as `+`, `-`, `*`, `/`, and `%`.
+
+#### Syntax and Example
+
+```
+    .resvb < [Mathematical Expression] >
+    .resvb < 16 - ( (@ - @@) % 16 ) > ; ensure 16 byte alignment
+```
+
 ## Instruction Statements

 Instruction statements are actions performed by processor.
 For all instruction statements, it follows this syntax:

-```asm
+```
 Mnemonic [Width] [, ]
 ```

@@ -480,9 +512,9 @@ The following is the breakdown of each part of the instruction expression.

 #### Mnemonic

-Mnemonic is a symbolic name represents each of the machine-language instructions[@ComputerScienceIlluminated][^MnemonicTable].
+Mnemonic is a symbolic name that represents each of the machine-language instructions[@ComputerScienceIlluminated]'[^MnemonicTable].

-[^MnemonicTable]: Refer to Appendix A for the whole instruction mnemonic table.
+[^MnemonicTable]: Refer to *Appendix A* for the whole instruction mnemonic table.

 #### Operation Width

@@ -497,7 +529,7 @@ There are three possible operand types: registers, constants, or memory referenc

 #### Register Operands

-Register operands are accessible internal CPU registers: general-purpose or special-purpose.
+Register operands are accessible internal CPU registers, either general-purpose or special-purpose.
 Registers must start with `%`, with no space between `%` and register name.
 For example: `%EXR2` is valid, but `"% EXR2"` is not and will not be detected as an operand.

@@ -515,8 +547,8 @@ Expression, if being a stand-alone operand, is enclosed by `<` and `>`,
 resulting in a double enclosure of both signs.
 For example, a constant in an instruction expression can look like this:

-```asm
-    add .64bit <%FER0>, < $( 0xFFFF + 0xBC ) >
+```
+    ADD .64bit <%FER0>, < $( 0xFFFF + 0xBC ) >
 ```

 Constant expressions are always 64 bits wide.
@@ -529,29 +561,30 @@ Memory references are data stored at a specific memory location.

 Memory references are a complicated expression:

-```asm
+```
 *[Ratio]&[Width](Base, Offset1, Offset2)
 ```

-**and**
+*and*

-$\text{Memory Linear Address} = \text{Ratio} \times ( \text{Base} + \text{Offset1} + \text{Offset2} )$
+$\text{Memory Reference Physical Address} = \text{\%DB} + \text{Ratio} \times (\text{Base} + \text{Offset1} + \text{Offset2})$

-**where**
+*where*

-- `Ratio` can be `1`, `2`, `4`, `8`, `16`.
-- `Base`, `Offset1`, `Offset2` can be either constant expressions or registers.
-- `Width` specifies data width of the memory location, which can be `8`, `16`, `32`, `64`,
-  representing $8\text{-bit}$, $16\text{-bit}$, $32\text{-bit}$, and $64\text{-bit}$ data respectively.
+- *Ratio* can be `1`, `2`, `4`, `8`, or `16`.
+- *Base*, *Offset1*, *Offset2* must each be either a constant expression or a register.
+- *Width* specifies the data width of the memory location, which can be `8`, `16`, `32`, or `64`,
+  representing $8\text{-bit}$, $16\text{-bit}$, $32\text{-bit}$, and $64\text{-bit}$ data respectively.
+- *`%DB`* is the Data Segment Base Register.
 The following is an example of a memory reference:

-```asm
+```
 *2&64(%FER1, $(0xFC), $(0xBC))
 ```

 This address points to a $64\text{-bit}$ data width space at the address
-$(\text{\%FER1} + \text{0xFC} + \text{0xBC}) \times 2$
+$(\text{\%FER1} + \text{0xFC} + \text{0xBC}) \times 2 + \text{\%DB}$

 #### Line Markers

@@ -559,18 +592,18 @@ Line markers are special operands that record the offset of their corresponding

 For example:

-```asm
-    jmp <%cb>, <_start>
+```
+    JMP <%CB>, <_start>
     _start:
-        xor .32bit <%her0>, <%her0>
+        XOR .32bit <%HER0>, <%HER0>
 ```

-`_start` is identified as a line marker by a following `:`.
+`_start` is identified as a line marker by its trailing `:`.
 Only spaces and tabs may appear after the colon,
 any other elements like instructions will be considered as errors.

 If `.org` is not specified, line markers are calculated as offsets from the beginning of the file, starting at `0`.
-If `.org` is specified, the offset is calculated from the given address.
+If `.org` is specified, the offset is calculated as $\text{the offset within the file} + \text{specified origin}$.

 # **Interruption**

@@ -678,4 +711,188 @@ so that the next field will begin at the edge of the next allocation unit.
 | **INS** | `OPR1` $=$ `OPR1` - `OPR2` | `SUB [Width] , ` |
 | **OUTS** | `OPR1` $=$ `OPR1` - `OPR2` - `CF` | `SBB [Width] , ` |

+# **Appendix B: Examples**
+
+## Example A, Disk I/O
+
+```
+00000000000C1800: 30 01 64 A2 02 64 E3 18    JMP <%CB>, <$(0xC18E3)>
+                  0C 00 00 00 00 00
+00000000000C180E: 24                         PUSHALL
+00000000000C180F: 51 64 02 64 37 01 00 00    OUT .64bit <$(0x137)>, <$(0x0)>
+                  00 00 00 00 02 64 00 00
+                  00 00 00 00 00 00
+00000000000C1825: 51 64 02 64 38 01 00 00    OUT .64bit <$(0x138)>, <$(0x4)>
+                  00 00 00 00 02 64 04 00
+                  00 00 00 00 00 00
+00000000000C183B: 20 64 01 64 00 02 64 00    MOV .64bit <%FER0>, <$(0x800)>
+                  08 00 00 00 00 00 00
+00000000000C184A: 52 64 02 64 39 01 00 00    INS .64bit <$(0x139)>
+                  00 00 00 00
+00000000000C1856: 25                         POPALL
+00000000000C1857: 32                         RET
+00000000000C1858: 24                         PUSHALL
+00000000000C1859: 20 64 01 64 00 02 64 D0    MOV .64bit <%FER0>, <$(0x7D0)>
+                  07 00 00 00 00 00 00
+00000000000C1868: 20 64 01 64 A4 02 64 00    MOV .64bit <%DP>, <$(0xB8000)>
+                  80 0B 00 00 00 00 00
+00000000000C1877: 28                         MOVS
+00000000000C1878: 39 02 64 18 00 00 00 00    INT <$(0x18)>
+                  00 00 00
+00000000000C1883: 20 16 01 16 00 02 64 CF    MOV .16bit <%EXR0>, <$(0x7CF)>
+                  07 00 00 00 00 00 00
+00000000000C1892: 39 02 64 11 00 00 00 00    INT <$(0x11)>
+                  00 00 00
+00000000000C189D: 25                         POPALL
+00000000000C189E: 32                         RET
+00000000000C189F: 24                         PUSHALL
+00000000000C18A0: 39 02 64 14 00 00 00 00    INT <$(0x14)>
+                  00 00 00
+00000000000C18AB: 0A 08 01 08 00 02 64 71    CMP .8bit <%R0>, <$(0x71)>
+                  00 00 00 00 00 00 00
+00000000000C18BA: 33 01 64 A2 02 64 E1 18    JE <%CB>, <$(0xC18E1)>
+                  0C 00 00 00 00 00
+00000000000C18C8: 39 02 64 10 00 00 00 00    INT <$(0x10)>
+                  00 00 00
+00000000000C18D3: 30 01 64 A2 02 64 A0 18    JMP <%CB>, <$(0xC18A0)>
+                  0C 00 00 00 00 00
+00000000000C18E1: 25                         POPALL
+00000000000C18E2: 32                         RET
+00000000000C18E3: 20 64 01 64 A1 02 64 FF    MOV .64bit <%SP>, <$(0xFFF)>
+                  0F 00 00 00 00 00 00
+00000000000C18F2: 20 64 01 64 A0 02 64 35    MOV .64bit <%SB>, <$(0xC1A35)>
+                  1A 0C 00 00 00 00 00
+00000000000C1901: 20 64 03 64 02 64 00 00    MOV .64bit <*1&64($(0xA0000), $(0x30), $(0x8))>, <$(0xC1986)>
+                  0A 00 00 00 00 00 02 64
+                  30 00 00 00 00 00 00 00
+                  02 64 08 00 00 00 00 00
+                  00 00 01 02 64 86 19 0C
+                  00 00 00 00 00
+00000000000C192E: 20 64 03 64 02 64 00 00    MOV .64bit <*1&64($(0xA0000), $(0x50), $(0x8))>, <$(0xC1987)>
+                  0A 00 00 00 00 00 02 64
+                  50 00 00 00 00 00 00 00
+                  02 64 08 00 00 00 00 00
+                  00 00 01 02 64 87 19 0C
+                  00 00 00 00 00
+00000000000C195B: 31 01 64 A2 02 64 0E 18    CALL <%CB>, <$(0xC180E)>
+                  0C 00 00 00 00 00
+00000000000C1969: 31 01 64 A2 02 64 58 18    CALL <%CB>, <$(0xC1858)>
+                  0C 00 00 00 00 00
+00000000000C1977: 31 01 64 A2 02 64 9F 18    CALL <%CB>, <$(0xC189F)>
+                  0C 00 00 00 00 00
+00000000000C1985: 40                         HLT
+00000000000C1986: 3B                         IRET
+00000000000C1987: 12 64 01 64 02 01 64 02    XOR .64bit <%FER2>, <%FER2>
+00000000000C198F: 12 64 01 64 00 01 64 00    XOR .64bit <%FER0>, <%FER0>
+00000000000C1997: 20 64 01 64 01 02 64 1A    MOV .64bit <%FER1>, <$(0xC1A1A)>
+                  1A 0C 00 00 00 00 00
+00000000000C19A6: 20 08 01 08 00 03 08 01    MOV .8bit <%R0>, <*1&8(%FER1, %FER2, $(0x0))>
+                  64 01 01 64 02 02 64 00
+                  00 00 00 00 00 00 00 01
+00000000000C19BE: 01 64 01 64 02 02 64 01    ADD .64bit <%FER2>, <$(0x1)>
+                  00 00 00 00 00 00 00
+00000000000C19CD: 0A 08 01 08 00 02 64 00    CMP .8bit <%R0>, <$(0x0)>
+                  00 00 00 00 00 00 00
+00000000000C19DC: 33 01 64 A2 02 64 03 1A    JE <%CB>, <$(0xC1A03)>
+                  0C 00 00 00 00 00
+00000000000C19EA: 39 02 64 10 00 00 00 00    INT <$(0x10)>
+                  00 00 00
+00000000000C19F5: 30 01 64 A2 02 64 A6 19    JMP <%CB>, <$(0xC19A6)>
+                  0C 00 00 00 00 00
+00000000000C1A03: 39 02 64 13 00 00 00 00    INT <$(0x13)>
+                  00 00 00
+00000000000C1A0E: 39 02 64 17 00 00 00 00    INT <$(0x17)>
+                  00 00 00
+00000000000C1A19: 3B                         IRET
+00000000000C1A1A: .8bit_data < 'K', 'e', 'y', 'b', 'o', 'a', 'r', 'd', ' ', >
+                  .8bit_data < 'I', 'n', 't', 'e', 'r', 'r', 'u', 'p', 't', ' ', >
+                  .8bit_data < 'c', 'a', 'l', 'l', 'e', 'd', '!', 0x00, >
+00000000000C1A35: 00                         NOP
+```
+
+## Example B, Real Time Clock
+
+```
+00000000000C1800: 20 64 01 64 A1 02 64 FF    MOV .64bit <%SP>, <$(0xFFF)>
+                  0F 00 00 00 00 00 00
+00000000000C180F: 20 64 01 64 A0 02 64 1D    MOV .64bit <%SB>, <$(0xC1A1D)>
+                  1A 0C 00 00 00 00 00
+00000000000C181E: 20 64 03 64 02 64 00 00    MOV .64bit <*1&64($(0xA0000), $(0x800), $(0x8))>, <$(0xC1940)>
+                  0A 00 00 00 00 00 02 64
+                  00 08 00 00 00 00 00 00
+                  02 64 08 00 00 00 00 00
+                  00 00 01 02 64 40 19 0C
+                  00 00 00 00 00
+00000000000C184B: 20 64 03 64 02 64 00 00    MOV .64bit <*1&64($(0xA0000), $(0x50), $(0x8))>, <$(0xC18B7)>
+                  0A 00 00 00 00 00 02 64
+                  50 00 00 00 00 00 00 00
+                  02 64 08 00 00 00 00 00
+                  00 00 01 02 64 B7 18 0C
+                  00 00 00 00 00
+00000000000C1878: 51 64 02 64 71 00 00 00    OUT .64bit <$(0x71)>, <$(0x9C4080)>
+                  00 00 00 00 02 64 80 40
+                  9C 00 00 00 00 00
+00000000000C188E: 39 02 64 14 00 00 00 00    INT <$(0x14)>
+                  00 00 00
+00000000000C1899: 0A 08 01 08 00 02 64 71    CMP .8bit <%R0>, <$(0x71)>
+                  00 00 00 00 00 00 00
+00000000000C18A8: 34 01 64 A2 02 64 8E 18    JNE <%CB>, <$(0xC188E)>
+                  0C 00 00 00 00 00
+00000000000C18B6: 40                         HLT
+00000000000C18B7: 12 64 01 64 02 01 64 02    XOR .64bit <%FER2>, <%FER2>
+00000000000C18BF: 12 64 01 64 00 01 64 00    XOR .64bit <%FER0>, <%FER0>
+00000000000C18C7: 20 64 01 64 01 02 64 02    MOV .64bit <%FER1>, <$(0xC1A02)>
+                  1A 0C 00 00 00 00 00
+00000000000C18D6: 20 08 01 08 00 03 08 01    MOV .8bit <%R0>, <*1&8(%FER1, %FER2, $(0x0))>
+                  64 01 01 64 02 02 64 00
+                  00 00 00 00 00 00 00 01
+00000000000C18EE: 0B 64 01 64 02             INC .64bit <%FER2>
+00000000000C18F3: 0A 08 01 08 00 02 64 00    CMP .8bit <%R0>, <$(0x0)>
+                  00 00 00 00 00 00 00
+00000000000C1902: 33 01 64 A2 02 64 29 19    JE <%CB>, <$(0xC1929)>
+                  0C 00 00 00 00 00
+00000000000C1910: 39 02 64 10 00 00 00 00    INT <$(0x10)>
+                  00 00 00
+00000000000C191B: 30 01 64 A2 02 64 D6 18    JMP <%CB>, <$(0xC18D6)>
+                  0C 00 00 00 00 00
+00000000000C1929: 39 02 64 13 00 00 00 00    INT <$(0x13)>
+                  00 00 00
+00000000000C1934: 39 02 64 17 00 00 00 00    INT <$(0x17)>
+                  00 00 00
+00000000000C193F: 3B                         IRET
+00000000000C1940: 50 64 02 64 70 00 00 00    IN .64bit <$(0x70)>, <%FER0>
+                  00 00 00 00 01 64 00
+00000000000C194F: 20 64 01 64 02 02 64 00    MOV .64bit <%FER2>, <$(0x0)>
+                  00 00 00 00 00 00 00
+00000000000C195E: 20 64 01 64 01 02 64 0A    MOV .64bit <%FER1>, <$(0xA)>
+                  00 00 00 00 00 00 00
+00000000000C196D: 08 64 01 64 01             DIV .64bit <%FER1>
+00000000000C1972: 20 64 01 64 03 01 64 00    MOV .64bit <%FER3>, <%FER0>
+00000000000C197A: 20 64 01 64 04 01 64 01    MOV .64bit <%FER4>, <%FER1>
+00000000000C1982: 20 64 01 64 00 01 64 01    MOV .64bit <%FER0>, <%FER1>
+00000000000C198A: 01 64 01 64 00 02 64 30    ADD .64bit <%FER0>, <$(0x30)>
+                  00 00 00 00 00 00 00
+00000000000C1999: 22 64 01 64 00             PUSH .64bit <%FER0>
+00000000000C199E: 0B 64 01 64 02             INC .64bit <%FER2>
+00000000000C19A3: 20 64 01 64 01 01 64 04    MOV .64bit <%FER1>, <%FER4>
+00000000000C19AB: 20 64 01 64 00 01 64 03    MOV .64bit <%FER0>, <%FER3>
+00000000000C19B3: 0A 64 01 64 00 02 64 00    CMP .64bit <%FER0>, <$(0x0)>
+                  00 00 00 00 00 00 00
+00000000000C19C2: 34 01 64 A2 02 64 5E 19    JNE <%CB>, <$(0xC195E)>
+                  0C 00 00 00 00 00
+00000000000C19D0: 20 64 01 64 03 01 64 02    MOV .64bit <%FER3>, <%FER2>
+00000000000C19D8: 23 64 01 64 00             POP .64bit <%FER0>
+00000000000C19DD: 39 02 64 10 00 00 00 00    INT <$(0x10)>
+                  00 00 00
+00000000000C19E8: 60 01 64 A2 02 64 D8 19    LOOP <%CB>, <$(0xC19D8)>
+                  0C 00 00 00 00 00
+00000000000C19F6: 39 02 64 13 00 00 00 00    INT <$(0x13)>
+                  00 00 00
+00000000000C1A01: 3B                         IRET
+00000000000C1A02: .8bit_data < 'K', 'e', 'y', 'b', 'o', 'a', 'r', 'd', ' ', >
+                  .8bit_data < 'I', 'n', 't', 'e', 'r', 'r', 'u', 'p', 't', ' ', >
+                  .8bit_data < 'c', 'a', 'l', 'l', 'e', 'd', '!', 0x00, >
+00000000000C1A1D: 00                         NOP
+```
+
 # References
diff --git a/src/include/EncodingDecoding.h b/src/include/EncodingDecoding.h
index 19f37f845..718e0a662 100644
--- a/src/include/EncodingDecoding.h
+++ b/src/include/EncodingDecoding.h
@@ -606,7 +606,7 @@ void decode_target(std::vector < std::string > & literal_buffer,
     std::vector < uint8_t > & code_buffer);

 /// Regular expression of the operand, captures '<' and '>' as well
-const std::regex target_pattern(R"(<\s*(?:\*\s*(?:1|2|4|8|16)\&(8|16|32|64)\s*\([^,]+,[^,]+,[^,]+\)|%(?:R|EXR|HER)[0-7]|%(FER)([\d]+)|%(SB|SP|CB|DB|DP|EB|EP)|%XMM[0-5]|\$\s*\(\s*(?:0[xX][A-Fa-f0-9]+|\s|[+\-.',*\/^%()xX0-9-])+\s*\))\s*>)");
+const std::regex target_pattern(R"(<\s*(?:\*\s*(?:1|2|4|8|16)\&(8|16|32|64)\s*\([^,]+,[^,]+,[^,]+\)|%(?:R|EXR|HER)[0-7]|%(FER)([\d]+)|%(SB|SP|CB|DB|DP|EB|EP)|\$\s*\(\s*(?:0[xX][A-Fa-f0-9]+|\s|[+\-.',*\/^%()xX0-9-])+\s*\))\s*>)");

 /*!
  * @brief Encode an instruction