Assembly language
tenyr has sixteen 32-bit two's-complement signed integer registers,
named `A` through `P` (case is ignored). Two of these registers, `A` and
`P`, are special, while the others are general purpose (see Instruction
Shorthand and Control Flow for descriptions of the special `A` and `P`
registers).
Every tenyr instruction can be expressed in one of four algebraic instruction formats (type0, type1, type2, and type3, in that order):
Z <- X op Y + I
Z <- X op I + Y
Z <- I op X + Y
Z <- X + I20
where `Z` is a register, `op` is one of the accepted arithmetic
operations, `+` is addition, `X` and `Y` are registers, `I` is any
immediate integer value between -2048 and 2047 inclusive, and `I20` is
any immediate integer value between -524288 and 524287 inclusive (i.e.,
`I` and `I20` are 12-bit and 20-bit signed two's-complement immediates,
respectively). In the first three formats, any one of the operands on
the right-hand side can be left out, along with the operation that
precedes it. Examples:
a <- a + a + 0 # type 0
b <- c * d + 3 # type 0
c <- d - e + -2 # type 0
d <- e ^ f # type 0
e <- f - 2 # type 1
f <- 2 | g # type 2
g <- -h # type 1
h <- i >= 0 # type 1
i <- j < k # type 0
j <- k # type 0
k <- j + 0x7abcd # type 3
l <- 0x7abcd # type 3
m <- a + a + 0x79a # type 0
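The immediate ranges quoted above follow directly from the field widths. As a rough illustration (a Python sketch, not part of the tenyr toolchain; the helper name `fits_signed` is hypothetical), the following checks whether a constant fits a signed two's-complement field of a given width, which is why `0x79a` can appear in a type0 instruction while `0x7abcd` needs type3:

```python
def fits_signed(value: int, bits: int) -> bool:
    """Return True if value is representable as a bits-wide two's-complement integer."""
    lo = -(1 << (bits - 1))       # e.g. -2048 for 12 bits
    hi = (1 << (bits - 1)) - 1    # e.g.  2047 for 12 bits
    return lo <= value <= hi


print(fits_signed(0x79a, 12))    # True: fits the 12-bit immediate I
print(fits_signed(0x7abcd, 12))  # False: too large for I
print(fits_signed(0x7abcd, 20))  # True: fits the 20-bit immediate I20
```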
Here are the operations that tenyr supports, with their binary
encodings. Operations are encoded this way to group hardware-similar
operations together, differing by the most significant bit only. Each
row of the table below pairs two operations whose encodings differ only
in that bit. Encodings are written most significant bit first; e.g., `&`
is `0b0001` and `-` is `0b1100`.
Encoding | Operator | Description | Encoding | Operator | Description |
---|---|---|---|---|---|
0000 | `\|` | bitwise OR | 1000 | `\|~` | bitwise OR with complemented second operand |
0001 | `&` | bitwise AND | 1001 | `&~` | bitwise AND with complemented second operand |
0010 | `^` | bitwise XOR | 1010 | `^^` | pack (see below) |
0011 | `>>` | arithmetic right shift | 1011 | `>>>` | logical right shift |
0100 | `+` | signed addition | 1100 | `-` | signed subtraction |
0101 | `*` | signed multiplication | 1101 | `<<` | left shift |
0110 | `==` | bitwise equality | 1110 | `@` | test bit at position |
0111 | `<` | signed less-than | 1111 | `>=` | signed greater-than-or-equal |
Some of the operations merit explanation. The test operations (`<`,
`>=`, `==`, and `@`) produce a result that is either `0` (false) or `-1`
(true). The canonical truth value in tenyr is `-1`, not `1`. This
allows us to do clever things with masks, and also explains the
existence of the special `&~` and `|~` operations: when the second
operand is a truth value, the bitwise complement works as a Boolean NOT.
The operations also support some syntactic sugar; for example, `B <- ~C`
is accepted by the assembler and transformed into `B <- A |~ C` or
`B <- 0 |~ C`, depending on the required operation type.
The pack operation (represented by `^^`) concatenates the 20 least
significant bits of the left operand with the 12 least significant bits
of the right operand. This operation makes it easier to construct large
values in registers using immediates.
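A minimal Python sketch of pack, assuming that "concatenates" places the left operand's 20 bits in the upper part of the word and the right operand's 12 bits in the lower part (the text above does not spell out the bit order, so treat this as an assumption). The function name `pack` is illustrative, and the result is shown as an unsigned 32-bit pattern:

```python
def pack(left: int, right: int) -> int:
    """Model of `^^` under the assumed layout: left[19:0] followed by right[11:0]."""
    high = (left & 0xFFFFF) << 12   # 20 least significant bits of the left operand
    low = right & 0xFFF             # 12 least significant bits of the right operand
    return high | low


print(hex(pack(0xdeadb, 0xeef)))    # 0xdeadbeef
```

Under that assumption, a 32-bit constant can be built from one 20-bit immediate and one 12-bit immediate, which matches the two immediate widths the instruction formats provide.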
tenyr works with signed two's-complement numbers. The only operation
(besides bitwise operations, which have no concept of sign) that is
explicitly unsigned is `>>>`, the logical right shift. Whereas `>>`
(arithmetic right shift) fills in shifted bits with the most significant
bit of the word, `>>>` fills in zeros.
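The difference between the two right shifts can be modelled in Python as follows. This is a sketch for shift amounts from 0 to 31 only; how tenyr treats larger shift amounts is not described here, so the model does not try to cover them. The names `asr32` and `lsr32` are illustrative:

```python
MASK32 = 0xFFFFFFFF


def asr32(word: int, amount: int) -> int:
    """Model of `>>`: vacated bits are copies of the sign bit (0 <= amount <= 31)."""
    word &= MASK32
    signed = word - (1 << 32) if word & 0x80000000 else word
    return (signed >> amount) & MASK32


def lsr32(word: int, amount: int) -> int:
    """Model of `>>>`: vacated bits are zeros (0 <= amount <= 31)."""
    return (word & MASK32) >> amount


print(hex(asr32(0x80000000, 4)))   # 0xf8000000
print(hex(lsr32(0x80000000, 4)))   # 0x8000000
```

For words whose most significant bit is clear, the two functions agree.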
A memory operation looks just like a register-register operation, but with one side of the instruction dereferenced, using brackets:
D <- [E * 4 + F] # a load into D
E -> [F << 2] # a store from E
[F] <- 2 # another kind of store, with an immediate value
One instruction can't have brackets on both sides of an arrow, and an immediate value cannot appear on the left side of an arrow.
Although pieces of the right-hand side of an instruction can be left
out during assembly, under the covers all the pieces are still there;
the missing parts are filled in with zeros or with references to the
special `A` register, which always contains `0`, even if it is written
to. Therefore, each instruction in the following pairs is identical to
the other one in the pair:
B <- 3 ; B <- A | A + 0x00000003
C <- D * E ; C <- D * E + 0x00000000
E <- 1 << B ; E <- 0x00000001 << B + A
To see the expanded form, invoke the disassembler (`tas -d`) with the
`-v` option.
tenyr has no dedicated control-flow instructions; flow is controlled by
updating the `P` register, which is the program counter / instruction
pointer. Reading from `P` will produce the address of the currently
executing instruction, plus one. Writing to it will cause the next
instruction executed to be fetched from the address written into `P`.
For example, if this program starts at address 0:
B <- P # after this instruction, B contains 1
D <- 3 # after this instruction, D contains 3
P <- P - 3 # this is a loop back to the first instruction above
Notice that in the third instruction it was necessary to subtract 3
instead of 2, because the value in `P` was effectively the location of
the next instruction that would have been executed in the absence of
a control flow change.
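The arithmetic in the three-instruction example can be traced with a short Python model. It is a hand-written sketch of just this program, not a general tenyr simulator; the function name `trace_loop` and its parameters are hypothetical. It relies only on the rule stated above, that reading `P` yields the address of the executing instruction plus one:

```python
def trace_loop(base: int = 0, steps: int = 6):
    """Execute the example program for `steps` instructions; return (addresses, registers)."""
    regs = {"B": 0, "D": 0}
    pc = base
    executed = []
    for _ in range(steps):
        executed.append(pc)
        p_read = pc + 1              # value seen when an instruction reads P
        offset = pc - base
        if offset == 0:              # B <- P
            regs["B"] = p_read
            pc = p_read
        elif offset == 1:            # D <- 3
            regs["D"] = 3
            pc = p_read
        else:                        # P <- P - 3
            pc = p_read - 3
    return executed, regs


addresses, regs = trace_loop()
print(addresses)   # [0, 1, 2, 0, 1, 2]
print(regs)        # {'B': 1, 'D': 3}
```

Calling `trace_loop(base=0x1000)` gives the same loop shape at the new addresses, since `P - 3` is relative to the executing instruction.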
Under normal circumstances, the programmer will likely use the
`@+symbol` shorthand forms (syntax sugar) to update the `P` register:
D <- 5
C <- 10
loop_top:
C <- C - 1
N <- C > D
P <- @+loop_top & N + P
Notice that we used `>` even though this is not one of the supported
operations. The assembler accepts `>` and rewrites it into a valid
tenyr instruction by swapping the order of the operands and using `<`
instead. An analogous transformation occurs for `<=`.
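The countdown loop above can be modelled in Python to show how the masked branch behaves. This sketch makes two assumptions that go beyond what the text states outright: that `@+loop_top` evaluates to the label's address minus the value `P` holds at the branch (so that adding `P` yields the label's runtime address), and that the branch is computed as `(offset & N) + P`. The function names are hypothetical:

```python
def branch_target(p_read: int, offset: int, cond: int) -> int:
    """Model of `P <- offset & cond + P`, read as (offset & cond) + P."""
    return (offset & cond) + p_read


def run_countdown(base: int = 0):
    """Run the loop from the example; return (final C, iterations, final P)."""
    d, c = 5, 10                            # D <- 5 ; C <- 10
    loop_top = base + 2                     # address of the loop_top label
    branch_addr = base + 4                  # address of the branch instruction
    offset = loop_top - (branch_addr + 1)   # assumed value of @+loop_top (here -3)
    iterations = 0
    pc = loop_top
    while pc == loop_top:
        c -= 1                              # C <- C - 1
        n = -1 if d < c else 0              # N <- C > D, rewritten as D < C
        pc = branch_target(branch_addr + 1, offset, n)
        iterations += 1
    return c, iterations, pc


print(run_countdown())   # (5, 5, 5)
```

When `N` is `-1` the mask keeps the offset and the branch is taken; when `N` is `0` the offset is cleared and `P` simply keeps the address of the next instruction. The result does not depend on `base`, apart from the final address.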
All 32-bit words decode to a legal instruction of type 0, 1, 2, or 3.
The token `illegal` is accepted by the assembler and encoded as
`0xffffffff`, which is the type3 instruction `P <- [P - 1]`. This
instruction will update `P` with the value of the instruction itself, so
it has the effect of `P <- 0xffffffff`. The simulator halts before
attempting to execute the instruction at address `0xffffffff`.
Labels can be used to identify segments of code and data. A label
is defined by a sequence of alphanumeric characters and underscores,
where that sequence cannot look like a register name (this restriction
may be relaxed in the future). A label is referred to by prefixing `@`
to its name:
data:
.word 0xdeadbeef
top:
B <- C
D <- @data
E <- @top
Getting the value of `@label` directly isn't generally useful, because
its value is relative to the base of the fully-linked object that
contains that symbol, and the link-time base address might not be equal
to the runtime base address where the code or data is loaded in memory.
If code is loaded in a single section at a base address of `0x1000`, one
would need to add `0x1000` to `@data` to get the absolute value in
memory. This is easier when using the special label `.`, which gives the
offset from the beginning of the current section to the current address;
then the expression `P - (. + 1)` will be the loading offset. This is
handled by the `@+symbol` syntax sugar mentioned previously, which
produces a "PC-relocated" address from a label reference.
Immediate values in type{0,1,2} instructions are 12 bits wide, thus ranging from -2048 to 2047. In type3 instructions, they are 20 bits wide, thus ranging from -524288 to 524287. Character constants (in the local character encoding) can appear in immediate expressions:
B <- '$'
C <- 4
An immediate value can also be an expression with multiple terms, as long as:
- all of the terms are constants
- the entire expression is enclosed in parentheses
- a `@label` reference occurs at most once, at the outermost nesting
The result of an immediate expression is computed in the assembler, and
only the resulting immediate value is written out. Many of the tenyr
operations can be used in immediate expressions, too, as well as an
additional one: integer division, with `/`. Operator precedence within
an immediate expression follows the rules for the C language.
B <- B ^ ('A' ^ 'a') # flip case of the character in B
C <- ((124 - 1) | 1) # after this, C will contain 123
D <- (8 / 4) # D will contain 2
E <- (16 - 8 / 4) # E will contain 14, not 2
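The values claimed in the comments above can be checked with ordinary integer arithmetic. The Python sketch below assumes an ASCII character encoding (the assembler uses the local one) and uses `//` for integer division; Python's `//` rounds toward negative infinity whereas C truncates toward zero, so the two agree only for the non-negative operands used here. The name `flip_case` is illustrative:

```python
CASE_BIT = ord('A') ^ ord('a')    # 0x41 ^ 0x61 = 0x20 in ASCII


def flip_case(code: int) -> int:
    """Model of `B <- B ^ ('A' ^ 'a')` for a register holding a letter's code."""
    return code ^ CASE_BIT


c_value = (124 - 1) | 1           # 123 is odd, so OR with 1 leaves it unchanged
d_value = 8 // 4                  # 2
e_value = 16 - 8 // 4             # division binds tighter than subtraction: 14

print(chr(flip_case(ord('q'))))   # Q
print(c_value, d_value, e_value)  # 123 2 14
```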
There are a few assembly directives to make assembly easier:
.word 0, 1, 2, 0x1234, 'A' # each value is expanded to 32 bits
.word (2 + @bar), (8 / 4) # expressions are accepted by `.word`
.chars "Hello, world" # each character in its own 32-bit word
.chars "concat" "" "enate" # string constants concatenate
.chars "concat", "enate" # string constants store consecutively
.zero 0x14 # this creates 0x14 = 20 zeros
.global foo # mark symbol visible during linking
Comments start at a `#` character and continue to the end of a line. The
code examples above show valid comments.
It is intended that disassembling a program and reassembling it will produce an identical binary. The disassembler takes care to explicitly emit otherwise unnecessary (zero) operands to disambiguate instructions. Any situation where an assembler-disassembler-assembler round-trip does not produce identical output on each round is a bug, and should be reported.