-
Notifications
You must be signed in to change notification settings - Fork 7
Syntax
The GDSL compiler translates a complete program at a time. This program may be spread over several files. By using certain files (such as specifications/basis/prelude) in several decoders, we can re-use certain functions or type definitions.
Functions and constants are defined using the keyword val followed by the function name and its arguments (if any):
val f x y = x*x+y*y
The function body x*x+y*y is an expression which can contain local definitions that are defined using a let construct. The function above can thus be rewritten as follows:
val f x y =
let
val double x = x*x
val sum x y = x+y
in
sum (double x) (double y)
end
Note that a function call is written by the function name followed by its arguments, so that compound arguments must be put into parenthesis. The sequence of declarations does not matter and functions may be mutually recursive. Any top-level constant is evaluated when initializing the GDSL runtime, so they are always evaluated once.
GDSL is a pure functional language, that is, variables can not be updated by side-effects. In order to conveniently pass around a state, GDSL provides so-called monadic actions which are functions that implicitly take a state and return a modified state. (Haskell programmers say that functions have the type of a state monad.) These actions are executed in sequence using the do keyword:
val action = do
res1 <- action1;
res2 <- action2;
action3
end
The do-notation is thus used to compose actions so that they are evaluated in sequence, meaning that the input state of action is passed to action1 whose result is passed to action2, then to action3 and after that, the state is returned by action. Each action may return an additional result that can be bound to a variable using a left arrow; here the result res1 of action1 can be used as argument to action2. The result of the last action action3 is returned as the result from action.
All actions ultimately boil down to performing three built-in actions: query and update and return, all of them taking one argument. The action query f applies the function f to the internal state and returns the result (the state is unchanged). update f applies f to the internal state and the result becomes the output state of update f. The return value of the action is () (the empty tuple) which can only be discarded. Finally, return v leaves the state unchanged and simply returns the value v. The return is like the let keyword in that it can be used to bind a value to a variable. Thus, it does not alter the control flow as the return keyword in imperative languages. As an example, the following example updates the internal state to 5, reads this state, sets it to 6, reads this value and returns their sum:
# const resturns a one-argument function that always returns x
val const x =
let
val set y = x
in
set
end
# the identity function
val id x = x
val action =
do
update (const 5);
x <- query id;
update (const 6);
y <- query id;
return (x+y)
end
Being able to pass a state implicitly is key to define very succinct decoders, which are discussed next.
Decoders are a special kind of functions that pattern match against an implicit internal byte stream. In contrast to functions which can only have one definition, a decoder can have several definitions that accumulate. For instance, the following decoder d pattern matches against either a byte sequence starting with 0x01 or 0x02 and fails otherwise:
val d [0x01] = return "found 01"
val d [0x02] = return "found 02"
A failing decoder raises an exception at runtime that can be caught by the programmer. A decoder can be written to match patterns of different lengths. In particular, one can add a default pattern that matches if no other pattern does:
val d [0x01 0x00] = return "found 01 zero"
val d [0x01 0x01] = return "found 01 one"
val d [0x02] = return "found 02"
val d [] = return "found something else"
This feature can be used to implement backtracking by calling another decoder in the [] case.
For variable-length decoder rules, the GDSL compiler requires that the matching rule can be determined by examining the input from left to right, reading on so-called token from the input and matching it. Each decoder has one fixed token size which is determined by the patterns: Each hexadecimal number gives rise to a token of size 4n bits where n is the number of digits after the 0x. It is an error if a decoder contains tokens of different size such as 0x33 0x444.
Instructions usually have arguments that follow certain patterns. Suppose that the second byte of the following patterns denote the argument. This argument should translate to "zero" or "one":
val d [0x01 0x00] = return "found 01 zero"
val d [0x01 0x01] = return "found 01 one"
val d [0x02 0x00] = return "found 02 zero"
val d [0x02 0x01] = return "found 02 one"
Instead of re-defining the logic to match an argument in every instruction, it is possible to define a subdecoder that matches the argument and stores it into the internal state. The content of the internal state can then be extracted when constructing the result of the main decoder. The following definition is equivalent to the one above:
# take the argument and concatenate it to the string in the state
# (here +++ denotes string concatenation)
val genRes mnemonic = do
arg <- query id;
return ("mnmeonic" +++ arg)
end
val d [0x01 arg] = genRes "found 01 "
val d [0x02 arg] = genRes "found 02 "
val arg [0x00] = update (const "zero")
val arg [0x01] = update (const "one")
The patterns of a decoder may not overlap, so that matching the rules in a different sequence than in which are they were written has not effect on which rule matches. This property is enforced by the compiler and allows it to generate efficient decoders.
Specifying non-overlapping patterns requires a way to concisely specify several patterns duplicating rules. We provide no general way of negating patterns (i.e. match everything but 0x01). Instead, we provide bit-patterns with wild cards. A bit pattern is enclosed in ticks and contains the following:
- 0: matches a zero bit
- 1: matches a one bit
- .: matches either a zero or a one
- p|q: matches either bit pattern p or pattern q which must consists of 0, 1, or . (no space allowed)
- x:n: matches a bit pattern of n bits and bits this pattern to variable x
- x@p: matches bit pattern p (no spaces allowed) and binds the result to x
- x: call to a subdecoder
The following decoder contains non-overlapping patterns:
val d ['0000 x@11|10 ..'] = return "eight bits, value greater than 0b1000"
val d ['0000 x@00|01 ..'] = return "eight bits, value less than 0b0100"
Most instruction sets are quite orthogonal in their pattern space and often use patterns where a few bits are parameters:
val d ['0000 x:2 ..'] = return "upper nibble 0, two-bit argument in x"
More examples for common bit patterns can be found in the specifications/avr/avr.ml decoder file.
Occasionally, a decoder has a mode that determines which instruction is generated. In this case, the same pattern matches but further tests on the internal state of the decoder are necessary to determine the final result. GDSL provides guards to simplify writing this branching. A guard is a function that is passed the internal state of the decoder and that then return true or false. The right-hand side of the guard is only executed if the guard evaluates to true. Consider the following rule:
val isFive x = x==5
val d ['0000 arg']
| isFive = "the arg is five"
| otherwise = "the arg is not five"
val arg ['0101'] = update (const 5)
val arg ['1010'] = update (const 0)
Here, the function isFive is passed the internal state (and integer which is 5 or 0, depending on the arg decoder). The function otherwise is pre-defined in the prelude.ml file and always returns true so that the second rule is taken if isFive returns false. Examples of guards can be found in the decoder in specifications/x86/x86.ml.
The base types of GDSL are integers, strings, and bit-vectors of fixed size. Their operations are as follows:
-
Integers literals are decimal digits or hexadecimal numbers written 0xABCD. Operations are +, -, *, /z (division with rounding towards zero), /m (towards minus infinity), /p (towards plus infinity) and the relational tests <, <=, === (note that == is equality of vectors), >=, >.
-
String constants "I am a string" can be concatenated using the infix operator +++. Internally, strings are ropes, that is, a data structure that represents a concatenation of strings as a tree of string literals. When passing a string to a C function, a rope (GDSL string) can be converted to a pointer to a NUL-terminated string using the function string-from-rope. An integer can be turned into a string using show-int.
-
Bit vectors literals are written 100100 and have a fixed size. Bit vectors can be pattern matched in case statements using the bit patterns shown earlier (without subdecoders). Two bit vectors can be concatenated using the infix operator ^. Two vectors of equal size can be combined using the functions and and or. A vector can be negated using not and compared using ==. Functions that take bit vectors can be polymorphic in the size of the bit vectors.
Bit vectors of size one are used as Booleans. In particular, GDSL features the standard if cond then expr1 else expr2 construct that is equivalent to the following case-statement:
case cond of
'1' : expr1
| '0' : expr2
end
More complex values can be constructed using records (products of types) and algebraic data types (sums of types).
Records are sets of field names and their values, written { foo=7, bar="Hello" }. A field can be add or replaced using a record update @{ bar=42 } which is a function that takes a record. Thus, @{ bar=42 } { foo=7, bar="Hello" } evaluates to { foo=7, bar=42 }. The sequence of the field names does not matter. Extracting a value from a record is done using a record selector $foo which is also a function from a record to the type of the field. Thus, $foo { foo=7, bar=42 } evaluates to 7.
One stronghold of functional languages is their ability to define algebraic data structures that allow a compact representation of syntax trees. In GDSL, algebraic data types must be declared using a type declaration as follows:
type adt =
Con1
| Con2 of int
Here, the new type adt is declared which has two variants: it can either have the value Con1 or the value Con2 x where x must be an integer. The payload of the constructor Con2 can be extracted using pattern matching as follows:
val f v =
case v of
Con1 : "v contains Con1"
| Con2 x: "v contains Con2 with value " ^ show-int x
end
In order to write a compilable program, it also has to pass the type checker.
The syntax of GDSL is presented in GDSL: A Generic Decoder Specification Language for Interpreting Machine Language and GDSL: A Universal Toolkit for Giving Semantics to Machine Language.
- Overview
- QuickStart
- Installation
- GDSL
-
Specifications
- Intel x86 32/64 bit
- MIPS32
- Atmel AVR 8-bit
- TI MSP430 8-bit
- Contributing
- RReil
- Libraries and Demos
- Acknowledgements
- Support