Ghidra Decompiler Analysis Engine
|
The decompiler attempts to translate from low-level representations of computer programs into high-level representations. Thus it needs to model concepts from both the low-level machine hardware domain and from the high-level software programming domain.
Understanding the classes within the source code that implement these models provides the quickest inroad into obtaining an overall understanding of the code.
We list all these fundemental classes here, loosely grouped as follows. There is one set of classes that describe the Syntax Trees, which are built up from the original p-code, and transformed during the decompiler's simplification process. The Translation classes do the actual building of the syntax trees from binary executables, and the Transformation classes do the actual work of transforming the syntax trees. Finally there is the High-level classes, which for the decompiler represents recovered information, describing familiar software development concepts, like datatypes, prototypes, symbols, variables, etc.
Syntax Trees
Translation
Transformation
High-level Representation
A sequence number is a form of extended address for multiple p-code operations that may be associated with the same address. There is a normal Address field. There is a time field which is a static value, determined when an operation is created, that guarantees the uniqueness of the SeqNum. There is also an order field which preserves order information about operations within a basic block. This value may change if the syntax tree is manipulated.
A single operation in the p-code language. It has, at most, one Varnode output, and some number of Varnode inputs. The inputs are operated on depending on the opcode of the instruction, producing the output.
A sequence of PcodeOps with a single path of execution.
This is a container for the sytax tree associated with a single function and all other function specific data. It has an associated start address, function prototype, and local scope.
Decodes machine instructions and can produce p-code.
Many objects have an associated Datatype, including Varnodes, Symbols, and FuncProtos. A Datatype is built to resemble the type systems of common high-level languages like C or Java.
There are base types (in varying sizes) as returned by getMetatype.
Then these can be used to build compound types, with pointer, array, and structure qualifiers.
This is a container for Datatypes.
A single high-level variable can move in and out of various memory locations and registers during the course of its lifetime. A HighVariable encapsulates this concept. It is a collection of (low-level) Varnodes, all of which are used to store data for one high-level variable.
A particular symbol used for describing memory in the model. This behaves like a normal (high-level language) symbol. It lives in a scope, has a name, and has a Datatype.
This associates a memory location with a particular symbol, i.e. it maps the symbol to memory. Its, in theory, possible to have more than one SymbolEntry associated with a Symbol.
This is a container for symbols.
This is the container for Scopes.
This is the repository for all information about a particular processor and executable. It holds the symbol table, the processor translator, the load image, the type database, and the transform engine.