DecoderTranslatorAPI


Benedikt Geßele edited this page Apr 7, 2015 · 2 revisions

API Syntax

Function signatures are given in two forms: in GDSL, using the syntax known from export declarations, and in C, as the signature of the C interface (using the default prefix "gdsl").

Required API functions and data types for Decoder/Translator modules

Decoder/Translator modules have to provide a defined API so that they can be loaded by the gdsl-multiplex library. The required functions are described in the following.

Decoder API: Functions

This section describes the API functions of decoders.

config-default

config-default : decoder-configuration // GDSL
int_t gdsl_config_default(state_t s) // C

The config-default function returns a default configuration vector for the architecture. Each bit of this vector represents one configuration option. The default configuration can be used to decode binary code without architecture-specific configuration code.

decode

decode : (configuration : decoder-configuration) -> S insndata <{} => {}> // GDSL
insndata_t gdsl_decode(state_t s, int_t config) // C

The decode function decodes one instruction read from the input stream. It expects the architecture configuration as a parameter; this parameter is a vector consisting of configuration options. The function returns the instruction object.

generalize

generalize : (instruction : insndata) -> asm-insn // GDSL
asm_insn_t gdsl_generalize(state_t s, insndata_t insn) // C

The generalize function turns an instruction into a generic representation common to all decoders. This representation allows for architecture-independent processing of assembly instructions. The function expects the instruction to be transformed as its only parameter.
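The three functions above are typically called in sequence. The following is a minimal sketch of that call order, assuming the C signatures listed above. The typedefs, the decoder_api table, the decode_generic helper and the stub_* functions are placeholders invented for this example so that it compiles without a GDSL library; they are not part of the GDSL distribution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder types (assumption): the real definitions come from the GDSL C header. */
typedef int64_t int_t;
typedef struct state *state_t;
typedef void *insndata_t;
typedef void *asm_insn_t;

/* Hypothetical table of the three decoder entry points, using the C signatures listed above. */
typedef struct {
  int_t (*config_default)(state_t s);
  insndata_t (*decode)(state_t s, int_t config);
  asm_insn_t (*generalize)(state_t s, insndata_t insn);
} decoder_api;

/* Decode one instruction with the default configuration and generalize it. */
static asm_insn_t decode_generic(const decoder_api *api, state_t s) {
  assert(api && api->config_default && api->decode && api->generalize);
  int_t config = api->config_default(s);    /* default configuration vector */
  insndata_t insn = api->decode(s, config); /* machine-specific instruction */
  return api->generalize(s, insn);          /* architecture-independent form */
}

/* Stand-in implementations so that the sketch compiles without a GDSL library. */
static int stub_insn = 7;
static int_t stub_config_default(state_t s) { (void)s; return 0; }
static insndata_t stub_decode(state_t s, int_t config) { (void)s; (void)config; return &stub_insn; }
static asm_insn_t stub_generalize(state_t s, insndata_t insn) { (void)s; return insn; }
```

In a real module the table entries would point to gdsl_config_default, gdsl_decode and gdsl_generalize instead of the stubs.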

int insn-length(instruction insn)

The insn-length function returns the length of an instruction in bytes. It expects the instruction as a parameter.

Remark: This function is obsolete and no longer needed.

int operands(instruction insn)

The operands function determines the number of operands of an instruction. The respective instruction object is expected as a parameter.

Remark: This function is obsolete and no longer needed.

rope pretty(instruction insn)

The pretty function converts an instruction to its string representation. It returns a rope object since strings are represented as ropes in GDSL. The function expects the instruction object to convert as its only parameter.

rope pretty-operand(instruction insn, int i)

The pretty-operand function converts the i-th operand of an instruction to its string representation. It returns a rope object since strings are represented as ropes in GDSL. The function expects the instruction object and the index of the operand to convert (starting with zero) as its parameters.

Remark: This function is obsolete and no longer needed.

rope pretty-mnemonic(instruction insn)

The pretty-mnemonic function returns the mnemonic of the instruction. It returns a rope object since strings are represented as ropes in GDSL. The function expects the instruction object as its only parameter.

Remark: This function is obsolete and no longer needed.

int typeof-opnd(instruction insn, int i)

The typeof-opnd function determines the type of the i-th operand of an instruction. The type is represented by an integer value. The instruction and the index of the operand (starting with zero) are expected as parameters. Currently, the following operand types exist:

operand type	mapped integer
Immediate	0
Register	1
Memory location	2
Linear expression	3
Flow operand	4

Remark: This function is obsolete and no longer needed.
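For code that still consumes the integer codes of typeof-opnd, the mapping from the table above could be mirrored on the C side roughly as follows. The enum and function names are assumptions made for this sketch; only the integer values are taken from the table.

```c
#include <assert.h>
#include <stddef.h>

/* Integer codes taken from the operand type table above; the identifiers are hypothetical. */
typedef enum {
  OPND_IMMEDIATE = 0,
  OPND_REGISTER = 1,
  OPND_MEMORY = 2,
  OPND_LINEAR = 3,
  OPND_FLOW = 4
} opnd_type;

/* Map an operand type code to a readable name; unknown codes yield "unknown". */
static const char *opnd_type_name(int code) {
  static const char *const names[] = {
    "Immediate", "Register", "Memory location", "Linear expression", "Flow operand"
  };
  const size_t count = sizeof names / sizeof names[0];
  assert(count == (size_t)OPND_FLOW + 1); /* the table and the enum must stay in sync */
  if (code < 0 || (size_t)code >= count) {
    return "unknown";
  }
  return names[code];
}
```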

Decoder API: Data Types

Some data types need to be defined by decoders using predetermined names since they are referenced by other modules. The following describes all such data types.

insndata

The insndata data type is the GDSL type of machine-specific instructions.

decoder-configuration

The decoder-configuration data type is the GDSL type used for the configuration vector of the decoder. This data type needs to be an alias for a vector of a specific length. For example, the following definition defines the decoder-configuration type for an architecture that requires two configuration options:

type decoder-configuration = |2|
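On the C side, such a configuration vector is passed as an integer (see the int_t config parameter of gdsl_decode). The sketch below shows one way to handle a two-bit vector as a bit set. The option names CONFIG_OPT_A and CONFIG_OPT_B, the helper functions and the int_t typedef are assumptions made for illustration; each decoder defines its own options.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: int_t is a 64-bit integer here; the real typedef comes from the GDSL C header. */
typedef int64_t int_t;

/* Hypothetical option names for a decoder with `type decoder-configuration = |2|`. */
enum {
  CONFIG_OPT_A = 1 << 0, /* bit 0 of the configuration vector */
  CONFIG_OPT_B = 1 << 1  /* bit 1 of the configuration vector */
};

#define CONFIG_WIDTH 2 /* number of bits in the configuration vector */

/* Return config with the given option bit switched on. */
static int_t config_enable(int_t config, int_t option) {
  assert(option >= 0 && (option >> CONFIG_WIDTH) == 0); /* option must fit into the vector */
  return config | option;
}

/* Return nonzero if the given option bit is set in config. */
static int config_has(int_t config, int_t option) {
  return (config & option) != 0;
}
```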

Translator API: Functions

This section describes the API functions of semantic translators.

rreil translate(instruction insn)

The translate function translates a given instruction to RReil. It expects the instruction as its only parameter.

rreil translate-block-single(instruction insn)

The translate-block-single function is similar to the translate function; it also translates one machine instruction into RReil. In contrast to the translate function, it accumulates the additional RReil statements in the global state instead of returning them. This way it can be integrated into the translation of an entire basic block.

int_option relative-next(rreil stmts)

The relative-next function tries to determine the address(es) of the succeeding blocks relative to the value of the IP at the end of the block given as parameter. The function returns a record {a, b} that contains two int options representing up to two addresses. An int option either contains an integer value (IO_SOME v) or no value (IO_NONE). It is recommended for this function to make use of the function

int_option relative-next-generic(boolean (is-sem-ip)(sem_id), rreil stmts)

that determines the respective addresses given an additional function as parameter that helps to tell the IP register and other registers apart.
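To illustrate how a caller might consume the {a, b} record, the following sketch models an int option as a tagged struct and resolves the relative addresses against the IP at the end of the block. The tag names IO_SOME and IO_NONE follow the text above; the struct layout and the names int_option_tag, next_offsets and absolute_successors are assumptions and do not reflect the actual GDSL runtime representation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag names follow the text above; the layout is an assumption for this sketch. */
typedef enum { IO_NONE, IO_SOME } int_option_tag;

typedef struct {
  int_option_tag tag;
  int64_t value; /* only meaningful when tag == IO_SOME */
} int_option;

/* The record {a, b}: up to two successor addresses relative to the IP. */
typedef struct {
  int_option a;
  int_option b;
} next_offsets;

/* Resolve the relative addresses against the IP at the end of the block.
   Writes the absolute addresses to out and returns how many were written (0 to 2). */
static int absolute_successors(next_offsets next, uint64_t ip_at_block_end, uint64_t out[2]) {
  int count = 0;
  assert(out != NULL);
  if (next.a.tag == IO_SOME) {
    out[count++] = ip_at_block_end + (uint64_t)next.a.value;
  }
  if (next.b.tag == IO_SOME) {
    out[count++] = ip_at_block_end + (uint64_t)next.b.value;
  }
  return count;
}
```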

rope pretty-arch-id(sem_id id)

The pretty-arch-id function converts an architecture-specific semantics id, i.e. an RReil register that represents (some part of) a machine register, into its string representation.

rope pretty-arch-exception(exception ex)

The pretty-arch-exception function converts an architecture-specific exception into its string representation.

GDSL Decoder API

void endianness(vector end_mask)

The endianness function allows the implementor of a decoder to specify the endianness of the architecture. The setting influences the order in which bytes are read from the input stream. Assuming the programmer used endianness to set end_mask, the offset into the code buffer for the next byte to be consumed is calculated as follows:

offset = ((bytes_read + base_address) XOR zero_extend(end_mask)) - base_address

In the formula, bytes_read represents the number of bytes already read and base_address is the base address of the code (see set-code).
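The formula can be written down directly in C. The sketch below is an illustration only; the function name next_byte_offset and the choice of 64-bit integers are assumptions. Passing end_mask as an unsigned 64-bit value plays the role of zero_extend.

```c
#include <assert.h>
#include <stdint.h>

/* Offset into the code buffer of the next byte to be consumed:
   offset = ((bytes_read + base_address) XOR zero_extend(end_mask)) - base_address */
static int64_t next_byte_offset(uint64_t bytes_read, uint64_t base_address, uint64_t end_mask) {
  assert(bytes_read <= UINT64_MAX - base_address); /* the address must not wrap around */
  uint64_t address = (bytes_read + base_address) ^ end_mask;
  /* The result is signed: it is negative if the XOR maps the address below base_address. */
  return (int64_t)(address - base_address);
}
```

With an end_mask of zero the offset equals bytes_read, so the bytes are consumed in buffer order. Note that the XOR is applied to the absolute address, so the result depends on the alignment of base_address.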

GDSL Decoder Pattern Matching Order

In GDSL, decoder patterns are at the heart of each decoder implementation. Currently, decoder patterns are matched from left to right expecting the most significant bit of an instruction on the very left. As a result, GDSL expects the first byte returned by consume to be the byte with the highest address within the current instruction word. As an example, consider matching the consecutively consumed bytes 0x42, 0x99, 0x12, and 0x77 to a bit pattern:

[Figure: Pattern Matching]
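The left-to-right matching order can be mimicked in C by concatenating the consumed bytes so that the first consumed byte ends up in the most significant position. This is a sketch of the idea, not of the GDSL implementation; the function names and the mask/value encoding of a bit pattern are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Concatenate consumed bytes so that the first consumed byte becomes the
   leftmost (most significant) byte of the word. */
static uint32_t word_from_consumed(const uint8_t *bytes, size_t count) {
  uint32_t word = 0;
  assert(bytes != NULL && count <= 4); /* at most four bytes fit into the word */
  for (size_t i = 0; i < count; i++) {
    word = (word << 8) | bytes[i];
  }
  return word;
}

/* A bit pattern as a mask/value pair: every bit set in mask must agree with value. */
static int pattern_matches(uint32_t word, uint32_t mask, uint32_t value) {
  return (word & mask) == (value & mask);
}
```

For the consumed bytes 0x42, 0x99, 0x12 and 0x77 this yields the word 0x42991277, so a pattern that fixes only the leftmost eight bits to 01000010 (0x42) matches.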

Finding the Right end_mask

Depending on the architecture, the programmer may have to consider two different sizes when it comes to byte ordering: the size of a single instruction (or a fixed part of it) and the size of the chunks the processor reads from memory at a time. The default end_mask value is zero; this is correct for processors that use a big-endian encoding and load each instruction separately. By setting the bit at index b in end_mask to one, the programmer configures a deviation from that chunk ordering considering memory blocks of size 2^b. As an example, consider a processor that accesses the memory through blocks of 8 bytes, each containing two 4-byte instruction words. Furthermore, assume the instruction words to be in big-endian order (within the 8-byte block); the bytes of each instruction are also encoded using big-endian ordering. In this case, a value of '100' is required for end_mask:

[Figure: Offset Calculation]
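To see what an end_mask of '100' does, the offsets for one 8-byte block can be enumerated. The sketch below assumes a base address of zero, in which case the offset formula reduces to bytes_read XOR end_mask; the function name read_order is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill order[i] with the buffer offset of the i-th consumed byte.
   Assumes base_address == 0, so that offset = bytes_read XOR end_mask. */
static void read_order(uint64_t end_mask, size_t count, uint64_t *order) {
  assert(order != NULL);
  for (size_t i = 0; i < count; i++) {
    order[i] = (uint64_t)i ^ end_mask;
  }
}
```

For end_mask = '100' (decimal 4) and eight bytes, the resulting order is 4, 5, 6, 7, 0, 1, 2, 3: the two 4-byte words of the block are visited in swapped order, while the byte order within each word is unchanged.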