The compiler specification is a required part of a Ghidra language module for supporting disassembly and analysis of a particular processor. Its purpose is to encode information about a target binary which is specific to the compiler that generated that binary. Within Ghidra, the SLEIGH specification allows the decoding of machine instructions for a particular processor, like Intel x86, but more than one compiler can produce those instructions. For a particular target binary, understanding details about the specific compiler used to build it is important to the reverse engineering process. The compiler specification fills this need, allowing concepts like parameter passing conventions and stack mechanisms to be formally described.
A compiler specification is a single file with a ".cspec" suffix, located in a module's data/languages directory. The directory may contain more than one ".cspec" file if Ghidra supports multiple compilers for the same processor. The compiler specification is identified by the fifth field of Ghidra's processor id. The id is explicitly linked with the ".cspec" file by adding a <compiler> tag to the root ".ldefs" file for the processor, which resides in the same directory.
Example 1. Declaring the id x86:LE:32:default:gcc and associating it with the file x86gcc.cspec
<language_definitions>
  ...
  <language processor="x86"
            endian="little"
            size="32"
            variant="default"
            version="2.3"
            slafile="x86.sla"
            processorspec="x86.pspec"
            manualindexfile="../manuals/x86.idx"
            id="x86:LE:32:default">
    <description>Intel/AMD 32-bit x86</description>
    <compiler name="Visual Studio" spec="x86win.cspec" id="windows"/>
    <compiler name="gcc" spec="x86gcc.cspec" id="gcc"/>
    <compiler name="Borland" spec="x86borland.cspec" id="borland"/>
  </language>
  ...
</language_definitions>
A compiler specification is an XML file, so it must start with the usual XML declaration, and it always has <compiler_spec> as the root XML tag. All specific compiler features are described using subtags of this tag. In principle, all the subtags are optional except the <default_proto> tag, but there is generally a minimum set of tags needed to create a useful specification (See ???). In general, the subtags can appear in any order. The only exception is that tags which define names, such as <prototype>, must appear before other tags that use those names.
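The overall file layout described above can be sketched as follows. This is a minimal illustration, not a complete or authoritative specification: it is loosely modeled on a 32-bit x86 cdecl convention, and the prototype name, register names, and all attribute values are assumptions chosen for the example. The element name default_proto follows the ".cspec" files shipped with Ghidra. The tags inside the root element are not covered in this section.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Root tag: every compiler feature is a subtag of compiler_spec -->
<compiler_spec>
  <!-- Illustrative only: names the stack pointer register and its space -->
  <stackpointer register="ESP" space="ram"/>
  <!-- The one subtag that is required in principle -->
  <default_proto>
    <!-- Defines the name "__cdecl"; any tag that refers to this name
         must appear later in the file -->
    <prototype name="__cdecl" extrapop="4" stackshift="4">
      <input>
        <!-- Assumed convention: parameters are passed on the stack -->
        <pentry minsize="1" maxsize="500" align="4">
          <addr offset="4" space="stack"/>
        </pentry>
      </input>
      <output>
        <!-- Assumed convention: small return values come back in EAX -->
        <pentry minsize="1" maxsize="4">
          <register name="EAX"/>
        </pentry>
      </output>
    </prototype>
  </default_proto>
</compiler_spec>
```

A file like this would be saved with a ".cspec" suffix in the module's data/languages directory and referenced from a <compiler> tag in the ".ldefs" file, as in Example 1.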
The rest of this document is broken up into sections that roughly correspond with aspects of compiler design, and then subsections within these address particular tags.
Many parts of the compiler specification use tags that describe a single varnode. Since architectures frequently name many of their registers or special memory locations, it is convenient for the specification designer to be able to use these names. But in some cases there is no name and the designer must fall back on the defining triple for a varnode: an address space, an offset and a size. Hence there are really two different XML tags that are used to describe varnodes and both are referred to as a varnode tag.
The <register> tag is used to specify formally named registers, usually defined by the SLEIGH specification for the processor. The name must be given in the tag's name attribute.
The <varnode> tag is used to generically describe any varnode. It takes three required attributes: space is the formal name of the address space containing the varnode, offset is an unsigned integer specifying the byte offset of the varnode within the space, and size is an integer specifying the size of the varnode in bytes.
The <varnode> tag can be used to describe any varnode, including named registers, global RAM locations, and stack locations. For stack locations, the offset is interpreted relative to the function that is being decompiled or is otherwise in scope. An offset of 0, for instance, typically refers to the memory location on the stack pointed to by the formal stack pointer register upon entry to the function being analyzed.
Example 2.
<register name="EAX"/>
<register name="r1"/>
<varnode space="ram" offset="0x1020" size="4"/>
<varnode space="stack" offset="8" size="8"/>
<varnode space="stack" offset="0xfffffff8" size="2"/>
<varnode space="register" offset="0" size="1"/>
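To make the relationship between the two forms concrete, the fragment below describes what is intended to be the same one-byte storage location twice, once by name and once by its defining triple. It is a sketch under a stated assumption: that the processor's SLEIGH specification defines a register named AL at offset 0 of the register space, as Ghidra's x86 module does. For other processors the name and offset will differ. The enclosing <example> element is hypothetical and is present only to keep the fragment well-formed.

```xml
<!-- Hypothetical wrapper element, not part of the cspec format -->
<example>
  <!-- By name: the SLEIGH specification supplies space, offset and size -->
  <register name="AL"/>
  <!-- By triple: assumes AL sits at offset 0 of the "register" space -->
  <varnode space="register" offset="0" size="1"/>
</example>
```

The named form is usually easier to read and survives changes to the register layout. The triple form is the fallback when the location has no name.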