Skip to content
Julian Kranz edited this page May 28, 2015 · 8 revisions

Overview

The toolkit generates frontends for the analysis of executable code. It is meant as a common platform to specify instruction decoders and translations into intermediate representations. It consists of a compiler from a domain specific language called GDSL to C (other target languages can be added) and several decoders and semantic translations. See the specifications for a list of existing decoders and semantic translations. The toolkit also contains a few demo applications and libraries.

Generic Decoder Specification Language (GDSL)

GDSL is an ML-like functional language that is designed to create decoders from byte streams to assembler instructions and from there to a semantics. Its design goals can be found here. It was motivated by a discussion at the Dagstuhl seminar 12051 where the need for a public-domain cross-platform frontend was raised.

GDSL Compiler

Our compiler for GDSL is written in SML and emits C code. Other output languages can be added if needed. The aim of the code generation is to preserve the structure of the input program as much as possible in order to enable the user to debug and profile the program at the C level. Indeed, the emitted code is closed to hand-written C which allows standard C compilers to create very efficient code. In our experiments our Intel x86 decoder outperforms the XED instruction decoder that is shipped with Intel's Pin toolkit.

Type Inference

Besides testing our decoders against other available decoders, our language contains a sophisticated type inference that infers standard Hindley-Milner types (as in ML or Haskell) but furthermore infers the length of bit vectors in the presence of concatenation and features row-polymorphic records. The latter are simply collections of field-name, value pairs. The inference is able to check that a field is defined whenever it is used. Besides obvious programming errors, the type inference has highlighted a few misprints in the Intel instruction manual (the version used is from May 2012):

  • CVTDQ2PD and CVTPD2DQ with f3 and vex f3 0f prefix lack an /r

  • PSLLDQ and PSLRDQ are specified with /7 and /3, respectively, but a general register is used afterwards; we assume the instructions are valid for the more general /r

  • multi-byte NOP instruction can only take EAX as argument

Releases

Version Date Platform URL
1.1.0 28.05.2015 Unix https://github.com/gdslang/gdsl-toolkit/wiki/dist/gdsl-toolkit-1.1.0.tar.gz
1.1.0 28.05.2015 Windows https://github.com/gdslang/gdsl-toolkit/wiki/dist/win-gdsl-toolkit-1.1.0.zip
1.0.0 17.09.2014 Unix https://github.com/gdslang/gdsl-toolkit/wiki/dist/gdsl-toolkit-1.0.0.tar.gz
1.0.0 17.09.2014 Windows https://github.com/gdslang/gdsl-toolkit/wiki/dist/win-gdsl-toolkit-1.0.0.zip
0.9.1 21.01.2014 Unix https://github.com/gdslang/gdsl-toolkit/wiki/dist/gdsl-toolkit-0.9.1.tar.gz
0.9.0 23.10.2013 Unix https://github.com/gdslang/gdsl-toolkit/wiki/dist/gdsl-toolkit-0.9.0.tar.gz