Ghidra Decompiler Analysis Engine
Core Classes

Introduction

The decompiler attempts to translate from low-level representations of computer programs into high-level representations. Thus it needs to model concepts from both the low-level machine hardware domain and from the high-level software programming domain.

Understanding the classes that implement these models is the quickest way to gain an overall understanding of the code.

We list all these fundamental classes here, loosely grouped as follows. One set of classes describes the Syntax Trees, which are built up from the original p-code and transformed during the decompiler's simplification process. The Translation classes do the actual building of the syntax trees from binary executables, and the Transformation classes do the actual work of transforming the syntax trees. Finally, there are the High-level classes, which represent the information the decompiler recovers, describing familiar software development concepts like datatypes, prototypes, symbols, and variables.

Syntax Trees

  • AddrSpace
    • A place within the reverse engineering model where data can be stored. The typical address spaces are ram, modeling the main databus of a processor, and register, modeling a processor's on-board registers. Data is stored a byte at a time at offsets within the AddrSpace.
  • Address
    • An AddrSpace and an offset within the space forms the Address of the byte at that offset.
  • Varnode
    • A contiguous set of bytes, given by an Address and a size, encoding a single value in the model. In terms of the SSA syntax tree, a Varnode is also a node in the tree.
  • SeqNum
  • PcodeOp
    • A single p-code operation. A single machine instruction is translated into (possibly several) operations in this Register Transfer Language.
    • Overview of PcodeOp
  • BlockBasic
  • Funcdata
    • The root object holding all information about a function, including: the p-code syntax tree, prototype, and local symbol information.
    • Overview of Funcdata
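
The relationship between AddrSpace, Address, and Varnode described in the list above can be sketched with a few plain structs. This is only an illustration: MiniSpace, MiniAddress, MiniVarnode, and overlaps are hypothetical names invented for this sketch, not the decompiler's real classes or methods.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-ins; not the decompiler's real classes.
struct MiniSpace {
  std::string name;  // e.g. "ram" or "register"
};

struct MiniAddress {
  const MiniSpace *space;  // which address space the byte lives in
  std::uint64_t offset;    // byte offset within that space
};

struct MiniVarnode {
  MiniAddress addr;  // address of the first byte
  int size;          // number of contiguous bytes
};

// True if the two varnodes share at least one byte of storage.
bool overlaps(const MiniVarnode &a, const MiniVarnode &b) {
  assert(a.size > 0 && b.size > 0);
  if (a.addr.space != b.addr.space) return false;  // different spaces never overlap
  return a.addr.offset < b.addr.offset + b.size &&
         b.addr.offset < a.addr.offset + a.size;
}
```

The point of the sketch is that an offset alone does not identify storage: offset 0x1000 in ram and offset 0x1000 in register are unrelated bytes.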

Translation

Transformation

High-level Representation

Overview of SeqNum

A sequence number is a form of extended address for multiple p-code operations that may be associated with the same address. There is a normal Address field. There is a time field which is a static value, determined when an operation is created, that guarantees the uniqueness of the SeqNum. There is also an order field which preserves order information about operations within a basic block. This value may change if the syntax tree is manipulated.

Address & getAddr(); // get the Address field
uintm getTime(); // get the time field
uintm getOrder(); // get the order field
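
As a rough illustration of how the three fields divide the work, the sketch below uses a hypothetical MiniSeqNum struct (not the real SeqNum class). It assumes identity is decided by the address and time fields only, since the order field may be renumbered; how the real class implements its comparisons is not stated here.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical, simplified stand-in for illustration; not the real SeqNum class.
struct MiniSeqNum {
  std::uint64_t addr;   // address of the originating machine instruction
  std::uint32_t time;   // fixed at creation, makes the identifier unique
  std::uint32_t order;  // position within a basic block, may be renumbered

  // Identity ignores `order`, because order can change while the op stays the same.
  bool operator==(const MiniSeqNum &o) const {
    return addr == o.addr && time == o.time;
  }
  // Stable sort key: by address first, then by creation time.
  bool operator<(const MiniSeqNum &o) const {
    return std::tie(addr, time) < std::tie(o.addr, o.time);
  }
};

// Compare the relative position of two ops assumed to be in the same basic block.
bool comesBefore(const MiniSeqNum &a, const MiniSeqNum &b) {
  return a.order < b.order;
}
```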

Overview of PcodeOp

A single operation in the p-code language. It has, at most, one Varnode output, and some number of Varnode inputs. The inputs are operated on depending on the opcode of the instruction, producing the output.

OpCode code(); // get the opcode for this op
Address & getAddr(); // get Address of the associated processor instruction
// which generated this op.
SeqNum & getSeqNum(); // get the full unique identifier for this op
int4 numInput(); // get number of Varnode inputs to this op
Varnode * getOut(); // get Varnode output
Varnode * getIn(int4 i); // get (one of the) Varnode inputs
BlockBasic * getParent(); // get basic block containing this op
bool isDead(); // op may no longer be in syntax tree
bool isCall(); // various categories of op
bool isBranch();
bool isBoolOutput();
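
The "at most one output, some number of inputs" shape can be sketched as follows. MiniOp, MiniVarnode, and render are hypothetical names made up for this example; only the accessor names numInput, getOut, and getIn mirror the list above, and the opcode is modeled as a plain string rather than the real OpCode type.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; not the decompiler's real classes.
struct MiniVarnode {
  std::string name;  // printable label, e.g. "EAX"
};

struct MiniOp {
  std::string opcode;             // e.g. "INT_ADD" (a string here, for simplicity)
  MiniVarnode *out;               // nullptr when the op produces no value
  std::vector<MiniVarnode *> in;  // inputs, in operand order

  int numInput() const { return static_cast<int>(in.size()); }
  MiniVarnode *getOut() const { return out; }
  MiniVarnode *getIn(int i) const {
    assert(i >= 0 && i < numInput());
    return in[i];
  }
};

// Render an op as "out = OPCODE in0, in1", or "OPCODE in0, in1" without an output.
std::string render(const MiniOp &op) {
  std::string text;
  if (op.getOut() != nullptr) text += op.getOut()->name + " = ";
  text += op.opcode;
  for (int i = 0; i < op.numInput(); ++i) {
    text += (i == 0 ? " " : ", ");
    text += op.getIn(i)->name;
  }
  return text;
}
```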

Overview of BlockBasic

A sequence of PcodeOps with a single path of execution.

int4 sizeOut(); // get number of paths flowing out of this block
int4 sizeIn(); // get number of paths flowing into this block
BlockBasic *getIn(int4 i); // get (one of the) blocks flowing into this
BlockBasic *getOut(int4 i); // get (one of the) blocks flowing out of this
SeqNum & getStart(); // get SeqNum of first operation in block
SeqNum & getStop(); // get SeqNum of last operation in block
BlockBasic *getImmedDom(); // get immediate dominator block
iterator beginOp(); // get iterator to first PcodeOp in block
iterator endOp();
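
To show how the in/out accessors support walking control flow, here is a minimal sketch using a hypothetical MiniBlock struct. The helpers addEdge and countReachable are invented for this example and are not part of the decompiler.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified stand-in for a basic block; not the real BlockBasic.
struct MiniBlock {
  std::vector<MiniBlock *> in;   // predecessor blocks
  std::vector<MiniBlock *> out;  // successor blocks

  int sizeIn() const { return static_cast<int>(in.size()); }
  int sizeOut() const { return static_cast<int>(out.size()); }
  MiniBlock *getOut(int i) const {
    assert(i >= 0 && i < sizeOut());
    return out[i];
  }
};

// Record a control-flow edge on both of its endpoints.
void addEdge(MiniBlock &from, MiniBlock &to) {
  from.out.push_back(&to);
  to.in.push_back(&from);
}

// Count the blocks reachable from `entry` by following out-edges (depth first).
int countReachable(MiniBlock &entry) {
  std::set<MiniBlock *> seen;
  std::vector<MiniBlock *> work{&entry};
  while (!work.empty()) {
    MiniBlock *b = work.back();
    work.pop_back();
    if (!seen.insert(b).second) continue;  // already visited
    for (int i = 0; i < b->sizeOut(); ++i) work.push_back(b->getOut(i));
  }
  return static_cast<int>(seen.size());
}
```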

Overview of Funcdata

This is a container for the syntax tree associated with a single function and all other function-specific data. It has an associated start address, function prototype, and local scope.

string & getName(); // get name of function
Address & getAddress(); // get Address of function's entry point
int4 numCalls(); // number of subfunctions called by this function
FuncCallSpecs *getCallSpecs(int4 i); // get specs for one of the subfunctions
BlockGraph & getBasicBlocks(); // get the collection of basic blocks
iterator beginLoc(Address &); // Search for Varnodes in tree
iterator beginLoc(int4,Address &); // based on the Varnode's address
iterator beginLoc(int4,Address &,Address &,uintm);
iterator beginDef(uint4,Address &); // Search for Varnode based on the
// address of its defining operation
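
One way to picture the address-based searches is a container kept sorted by storage location, so that all Varnodes at an address form a contiguous range. The sketch below assumes such an ordering; MiniVarnodeBank, MiniVarnode, and findAt are hypothetical names, and the real container and iterator types are not described here.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in; a single address space is assumed.
struct MiniVarnode {
  std::uint64_t offset;  // starting offset of the storage
  int size;              // number of bytes
  std::string label;     // printable name for the example
};

class MiniVarnodeBank {
  // Sorted by (offset, size), so varnodes at one location are adjacent.
  std::multimap<std::pair<std::uint64_t, int>, MiniVarnode> byLoc;

 public:
  void add(const MiniVarnode &vn) {
    byLoc.emplace(std::make_pair(vn.offset, vn.size), vn);
  }

  // Labels of all varnodes starting at `offset`, regardless of size.
  std::vector<std::string> findAt(std::uint64_t offset) const {
    std::vector<std::string> res;
    auto it = byLoc.lower_bound(std::make_pair(offset, 0));
    for (; it != byLoc.end() && it->first.first == offset; ++it)
      res.push_back(it->second.label);
    return res;
  }

  // Labels of all varnodes starting at `offset` with exactly `size` bytes.
  std::vector<std::string> findAt(std::uint64_t offset, int size) const {
    std::vector<std::string> res;
    auto range = byLoc.equal_range(std::make_pair(offset, size));
    for (auto it = range.first; it != range.second; ++it)
      res.push_back(it->second.label);
    return res;
  }
};
```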

LoadImage

Action

Rule

Translate

Decodes machine instructions and can produce p-code.

int4 oneInstruction(PcodeEmit &,Address &) const; // produce pcode for one instruction
void printAssembly(ostream &,int4,Address &) const; // print the assembly for one instruction
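
The translator hands p-code to a caller-supplied emitter rather than returning it, which the following sketch imitates with a toy two-instruction byte code. Everything here is hypothetical: MiniEmit, MiniTranslate, CollectEmit, translateAll, and the toy encodings are invented, and the sketch assumes that oneInstruction returns the instruction length in bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical callback interface that receives each emitted p-code op.
class MiniEmit {
 public:
  virtual ~MiniEmit() = default;
  virtual void dump(std::uint64_t addr, const std::string &op) = 0;
};

// Toy translator: 0x01 is "inc" (1 byte), 0x02 is "push imm8" (2 bytes).
class MiniTranslate {
  std::vector<std::uint8_t> bytes;

 public:
  explicit MiniTranslate(std::vector<std::uint8_t> b) : bytes(std::move(b)) {}

  // Emit the ops for the instruction at `addr`; return its length in bytes (assumed).
  int oneInstruction(MiniEmit &emit, std::uint64_t addr) const {
    assert(addr < bytes.size());
    switch (bytes[addr]) {
      case 0x01:
        emit.dump(addr, "INT_ADD");
        return 1;
      case 0x02:  // one machine instruction, two p-code ops
        emit.dump(addr, "INT_SUB");
        emit.dump(addr, "STORE");
        return 2;
      default:
        emit.dump(addr, "UNIMPLEMENTED");
        return 1;
    }
  }
};

// Emitter that simply records opcode names in emission order.
class CollectEmit : public MiniEmit {
 public:
  std::vector<std::string> ops;
  void dump(std::uint64_t, const std::string &op) override { ops.push_back(op); }
};

// Translate instructions back to back over the offsets [0, end).
std::vector<std::string> translateAll(const MiniTranslate &trans, std::uint64_t end) {
  CollectEmit emit;
  std::uint64_t addr = 0;
  while (addr < end) addr += trans.oneInstruction(emit, addr);
  return emit.ops;
}
```

Using a callback keeps the translator independent of what the caller does with the ops, whether that is building a syntax tree or just printing them.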

Datatype

Many objects have an associated Datatype, including Varnodes, Symbols, and FuncProtos. A Datatype is built to resemble the type systems of common high-level languages like C or Java.

type_metatype getMetatype(); // categorize type as VOID, UNKNOWN,
// INT, UINT, BOOL, CODE, FLOAT,
// PTR, ARRAY, STRUCT
string & getName(); // get name of the type
int4 getSize(); // get number of bytes encoding this type

There are base types (in varying sizes) as returned by getMetatype.

enum type_metatype {
TYPE_VOID, // void type
TYPE_UNKNOWN, // unknown type
TYPE_INT, // signed integer
TYPE_UINT, // unsigned integer
TYPE_BOOL, // boolean
TYPE_CODE, // function data
TYPE_FLOAT, // floating point
};

Then these can be used to build compound types, with pointer, array, and structure qualifiers.

class TypePointer : public Datatype { // pointer to (some other type)
Datatype *getBase(); // get Datatype being pointed to
};
class TypeArray : public Datatype { // array of (some other type)
Datatype *getBase(); // get Datatype of array element
};
class TypeStruct : public Datatype { // structure with fields of (some other types)
TypeField *getField(int4,int4,int4 *); // get Datatype of a field
};
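
How compound types nest can be sketched with a small class hierarchy. The Mini* types below are hypothetical stand-ins, and the getField semantics shown (find the field containing a byte range and report the offset relative to that field) are an assumption based on the parameter list, not a statement of the real implementation.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; not the decompiler's real classes.
enum MiniMeta { META_INT, META_PTR, META_ARRAY, META_STRUCT };

struct MiniType {
  std::string name;
  int size;  // number of bytes encoding the type
  MiniMeta meta;
  MiniType(std::string n, int s, MiniMeta m) : name(std::move(n)), size(s), meta(m) {}
  virtual ~MiniType() = default;
};

struct MiniPointer : MiniType {
  const MiniType *base;  // type being pointed to
  MiniPointer(int ptrSize, const MiniType *b)
      : MiniType(b->name + " *", ptrSize, META_PTR), base(b) {}
};

struct MiniArray : MiniType {
  const MiniType *base;  // element type
  MiniArray(int count, const MiniType *b)
      : MiniType(b->name + "[" + std::to_string(count) + "]", count * b->size, META_ARRAY),
        base(b) {}
};

struct MiniField {
  int offset;  // byte offset of the field within the structure
  std::string name;
  const MiniType *type;
};

struct MiniStruct : MiniType {
  std::vector<MiniField> fields;  // assumed sorted by offset
  MiniStruct(std::string n, std::vector<MiniField> f)
      : MiniType(std::move(n), 0, META_STRUCT), fields(std::move(f)) {
    if (!fields.empty()) size = fields.back().offset + fields.back().type->size;
  }
  // Find the field containing bytes [off, off+sz); store the offset relative
  // to that field in *newoff. Returns nullptr if no single field contains them.
  const MiniField *getField(int off, int sz, int *newoff) const {
    for (const MiniField &f : fields) {
      if (off >= f.offset && off + sz <= f.offset + f.type->size) {
        *newoff = off - f.offset;
        return &f;
      }
    }
    return nullptr;
  }
};
```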

TypeFactory

This is a container for Datatypes.

Datatype *findByName(string &); // find a Datatype by name
Datatype *getTypeVoid(); // retrieve common types
Datatype *getTypeChar();
Datatype *getBase(int4 size,type_metatype);
Datatype *getTypePointer(int4,Datatype *,uint4); // get a pointer to another type
Datatype *getTypeArray(int4,Datatype *); // get an array of another type
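
A container that owns every type object can hand back the same object when the same type is requested twice, so types can be compared by pointer. The sketch below illustrates that idea only; MiniTypeFactory and MiniType are hypothetical, the name-keyed interning is an assumption, and the extra uint4 argument of the real getTypePointer is omitted.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical, simplified stand-in; not the decompiler's real Datatype.
struct MiniType {
  std::string name;
  int size;              // number of bytes encoding the type
  const MiniType *base;  // pointed-to or element type, nullptr for base types
};

class MiniTypeFactory {
  std::map<std::string, std::unique_ptr<MiniType>> byName;  // owns all types

  // Return the existing type with this name, or create and store a new one.
  const MiniType *intern(const std::string &name, int size, const MiniType *base) {
    auto it = byName.find(name);
    if (it != byName.end()) return it->second.get();
    auto res = byName.emplace(name, std::unique_ptr<MiniType>(new MiniType{name, size, base}));
    return res.first->second.get();
  }

 public:
  const MiniType *getBase(int size, const std::string &name) {
    return intern(name, size, nullptr);
  }
  const MiniType *getTypePointer(int ptrSize, const MiniType *to) {
    return intern(to->name + " *", ptrSize, to);
  }
  const MiniType *getTypeArray(int count, const MiniType *elem) {
    return intern(elem->name + "[" + std::to_string(count) + "]", count * elem->size, elem);
  }
  // Look up a type by name; nullptr if it has never been created.
  const MiniType *findByName(const std::string &name) const {
    auto it = byName.find(name);
    return it == byName.end() ? nullptr : it->second.get();
  }
};
```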

HighVariable

A single high-level variable can move in and out of various memory locations and registers during the course of its lifetime. A HighVariable encapsulates this concept. It is a collection of (low-level) Varnodes, all of which are used to store data for one high-level variable.

int4 numInstances(); // get number of different Varnodes associated
// with this variable.
Varnode * getInstance(int4); // get (one of the) Varnodes associated with
// this variable.
Datatype * getType(); // get Datatype of this variable
Symbol * getSymbol(); // get Symbol associated with this variable
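
The grouping of several storage locations under one variable can be sketched as below. MiniHigh, MiniVarnode, and attach are hypothetical names for this illustration; the back-pointer from a Varnode to its variable is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct MiniHigh;

// Hypothetical, simplified stand-in for a low-level storage location.
struct MiniVarnode {
  std::string storage;  // e.g. "register:EAX" or "stack:-0x8"
  MiniHigh *high;       // the high-level variable this storage belongs to
  explicit MiniVarnode(std::string s) : storage(std::move(s)), high(nullptr) {}
};

// Hypothetical, simplified stand-in for a high-level variable.
struct MiniHigh {
  std::string name;
  std::vector<MiniVarnode *> inst;  // every Varnode that holds this variable
  explicit MiniHigh(std::string n) : name(std::move(n)) {}

  int numInstances() const { return static_cast<int>(inst.size()); }
  MiniVarnode *getInstance(int i) const {
    assert(i >= 0 && i < numInstances());
    return inst[i];
  }
};

// Associate a Varnode with a variable; each Varnode belongs to one variable.
void attach(MiniHigh &h, MiniVarnode &vn) {
  assert(vn.high == nullptr);
  vn.high = &h;
  h.inst.push_back(&vn);
}
```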

FuncProto

FuncCallSpecs

Symbol

A particular symbol used for describing memory in the model. This behaves like a normal (high-level language) symbol. It lives in a scope, has a name, and has a Datatype.

string & getName(); // get the name of the symbol
Datatype * getType(); // get the Datatype of the symbol
Scope * getScope(); // get the scope containing the symbol
SymbolEntry * getFirstWholeMap(); // get the (first) SymbolEntry associated
// with this symbol

SymbolEntry

This associates a memory location with a particular symbol, i.e. it maps the symbol to memory. It is, in theory, possible to have more than one SymbolEntry associated with a Symbol.

Address & getAddr(); // get Address of memory location
int4 getSize(); // get size of memory location
Symbol * getSymbol(); // get Symbol associated with location
RangeList & getUseLimit(); // get range of code addresses for which
// this mapping applies

Scope

This is a container for symbols.

SymbolEntry *findAddr(Address &,Address &); // find a Symbol by address
SymbolEntry *findContainer(Address &,int4,Address &); // find containing symbol
Funcdata * findFunction(Address &); // find a function by entry address
Symbol * findByName(string &); // find a Symbol by name
SymbolEntry *queryByAddr(Address &,Address &); // search for symbols across multiple scopes
SymbolEntry *queryContainer(Address &,int4,Address &);
Funcdata * queryFunction(Address &);
Scope * discoverScope(Address &,int4,Address &); // discover scope of an address
string & getName(); // get name of scope
Scope * getParent(); // get parent scope
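
The difference between the find and query families can be sketched as a search of one scope versus a search that continues into enclosing scopes. This is an assumption-laden illustration: MiniScope and MiniEntry are hypothetical, the walk up the parent chain is a simplification of "across multiple scopes", and the use-point Address argument of the real methods is omitted.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a symbol-to-memory mapping.
struct MiniEntry {
  std::uint64_t addr;  // start of the mapped storage
  int size;            // number of bytes mapped
  std::string symbol;  // name of the symbol stored there
};

class MiniScope {
  std::string name;
  MiniScope *parent;  // nullptr for the global scope
  std::vector<MiniEntry> entries;

 public:
  MiniScope(std::string n, MiniScope *p) : name(std::move(n)), parent(p) {}
  void addEntry(const MiniEntry &e) { entries.push_back(e); }
  MiniScope *getParent() const { return parent; }

  // Search only this scope for an entry containing bytes [addr, addr+size).
  const MiniEntry *findContainer(std::uint64_t addr, int size) const {
    for (const MiniEntry &e : entries) {
      if (addr >= e.addr && addr + size <= e.addr + e.size) return &e;
    }
    return nullptr;
  }

  // Search this scope first, then each enclosing scope in turn.
  const MiniEntry *queryContainer(std::uint64_t addr, int size) const {
    for (const MiniScope *s = this; s != nullptr; s = s->parent) {
      if (const MiniEntry *e = s->findContainer(addr, size)) return e;
    }
    return nullptr;
  }
};
```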

Database

This is the container for Scopes.

Scope *getGlobalScope(); // get the root/global scope
Scope *resolveScope(string &,Scope *); // resolve a scope by name

Architecture

This is the repository for all information about a particular processor and executable. It holds the symbol table, the processor translator, the load image, the type database, and the transform engine.

class Architecture {
Database * symboltab; // the symbol table
Translate * translate; // the processor translator
LoadImage * loader; // the executable loadimage
ActionDatabase allacts; // transforms which can be performed
TypeFactory * types; // the Datatype database
};