Ghidra Decompiler Analysis Engine
The Basic SLEIGH Interface

To use SLEIGH as a library within an application, you need to be aware of five classes.

Translate (or Sleigh)

The core SLEIGH class is Sleigh, which is derived from the Translate interface. To instantiate it in your code, you need a LoadImage object and a ContextDatabase object. The load image is responsible for retrieving instruction bytes, by address, from a binary executable. The context database provides the library with any extra mode information needed for disassembly or translation. It can be used, for instance, to specify that an x86 binary is running in 32-bit mode, or that an ARM processor is running in THUMB mode. Once these objects are built, the Sleigh object can be instantiated immediately.

LoadImageBfd *loader;
ContextDatabase *context;
Translate *trans;
// Set up the loadimage
// Providing an executable name and architecture
string loadimagename = "x86testcode";
string bfdtarget= "default";
loader = new LoadImageBfd(loadimagename,bfdtarget);
loader->open(); // Load the executable from file
context = new ContextInternal(); // Create a processor context
trans = new Sleigh(loader,context); // Instantiate the translator

Once the Sleigh object is in hand, the only required initialization step left is to inform it of the ".sla" file. The file is in XML format and needs to be read in using SLEIGH's built-in XML parser. The following code accomplishes this.

string sleighfilename = "specfiles/x86.sla";
DocumentStorage docstorage;
Element *sleighroot = docstorage.openDocument(sleighfilename)->getRoot();
docstorage.registerTag(sleighroot);
trans->initialize(docstorage); // Initialize the translator

AssemblyEmit

To do disassembly, you need to derive a class from AssemblyEmit and implement its dump method. The library calls this method exactly once for each instruction disassembled.

This routine simply needs to decide how (and where) to print the corresponding portion of the disassembly. For instance,

class AssemblyRaw : public AssemblyEmit {
public:
  virtual void dump(const Address &addr,const string &mnem,const string &body) {
    addr.printRaw(cout);
    cout << ": " << mnem << ' ' << body << endl;
  }
};

This is a minimal implementation that simply dumps the disassembly straight to standard out. Once this object is instantiated, the Sleigh object can use it to write out assembly via the Translate::printAssembly() method.

AssemblyEmit *assememit = new AssemblyRaw();
Address addr(trans->getDefaultCodeSpace(),0x80484c0);
int4 length; // Length of instruction in bytes
length = trans->printAssembly(*assememit,addr);
addr = addr + length; // Advance to next instruction
length = trans->printAssembly(*assememit,addr);
addr = addr + length;
length = trans->printAssembly(*assememit,addr);

PcodeEmit

In order to generate a pcode translation of a machine instruction, you need to derive a class from PcodeEmit and implement the virtual method dump. This method will be invoked once for each pcode operation in the translation of a machine instruction. There will likely be multiple calls per instruction. Each call passes in a single pcode operation, complete with its possible varnode output, and all of its varnode inputs. Here is an example of a PcodeEmit object that simply prints out the pcode.

class PcodeRawOut : public PcodeEmit {
public:
  virtual void dump(const Address &addr,OpCode opc,VarnodeData *outvar,VarnodeData *vars,int4 isize);
};
static void print_vardata(ostream &s,VarnodeData &data)
{
  s << '(' << data.space->getName() << ',';
  data.space->printOffset(s,data.offset);
  s << ',' << dec << data.size << ')';
}
void PcodeRawOut::dump(const Address &addr,OpCode opc,VarnodeData *outvar,VarnodeData *vars,int4 isize)
{
  if (outvar != (VarnodeData *)0) { // The output is optional
    print_vardata(cout,*outvar);
    cout << " = ";
  }
  cout << get_opname(opc);
  // Possibly check for a code reference or a space reference
  for(int4 i=0;i<isize;++i) {
    cout << ' ';
    print_vardata(cout,vars[i]);
  }
  cout << endl;
}

Notice that the dump routine uses the built-in function get_opname to find a string version of the opcode. Each varnode is defined in terms of the VarnodeData object, which is defined simply:

struct VarnodeData {
  AddrSpace *space; // The address space
  uintb offset; // The offset within the space
  uint4 size; // The number of bytes at that location
};

Once the PcodeEmit object is instantiated, the Sleigh object can use it to generate pcode, one instruction at a time, using the Translate::oneInstruction() method.

PcodeEmit *pcodeemit = new PcodeRawOut();
Address addr(trans->getDefaultCodeSpace(),0x80484c0);
int4 length; // Length of instruction in bytes
length = trans->oneInstruction(*pcodeemit,addr);
addr = addr + length; // Advance to next instruction
length = trans->oneInstruction(*pcodeemit,addr);
addr = addr + length;
length = trans->oneInstruction(*pcodeemit,addr);

For an application to properly follow flow, while translating machine instructions into pcode, the emitted pcode must be inspected for the various branch operations.
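As a rough illustration of what inspecting the emitted pcode for branch operations might look like, the sketch below collects flow information from a stream of operations. It is not library code: ToyOpCode, ToyVarnode and FlowCollector are hypothetical stand-ins so that the example compiles on its own. The enumerator names mirror the library's branch op-codes, but the numeric values are arbitrary, and the address space of a varnode is reduced to a single flag. In a real PcodeEmit subclass the same switch would sit inside dump and test vars[0].space instead.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for a subset of the op-codes (values here are illustrative only)
enum ToyOpCode {
  CPUI_COPY, CPUI_INT_ADD,
  CPUI_BRANCH, CPUI_CBRANCH, CPUI_BRANCHIND,
  CPUI_CALL, CPUI_CALLIND, CPUI_RETURN
};

// Stand-in varnode: the address space is reduced to one flag (assumption)
struct ToyVarnode {
  bool inCodeSpace;      // true if this varnode refers to a code address
  std::uint64_t offset;  // The offset within the space
  std::uint32_t size;    // The number of bytes at that location
};

// Collects flow information for the pcode of ONE machine instruction
class FlowCollector {
public:
  bool fallsThrough = true;            // Can execution reach the next instruction?
  bool hasIndirectFlow = false;        // Is some target only known at run time?
  std::vector<std::uint64_t> targets;  // Statically known destinations

  void dump(ToyOpCode opc, const ToyVarnode *vars, int isize) {
    switch (opc) {
    case CPUI_BRANCH:                  // Unconditional jump to vars[0]
      if (recordTarget(vars, isize)) fallsThrough = false;
      break;
    case CPUI_CBRANCH:                 // Conditional jump: target or fall through
    case CPUI_CALL:                    // Assumes the callee eventually returns
      recordTarget(vars, isize);
      break;
    case CPUI_BRANCHIND:               // Computed jump
    case CPUI_RETURN:
      hasIndirectFlow = true;
      fallsThrough = false;
      break;
    case CPUI_CALLIND:                 // Computed call, assumed to return
      hasIndirectFlow = true;
      break;
    default:                           // Not a flow operation
      break;
    }
  }

private:
  // Record vars[0] as a destination if it is a code address. Destinations
  // outside the code space are treated here as jumps within the instruction's
  // own pcode and are ignored.
  bool recordTarget(const ToyVarnode *vars, int isize) {
    assert(vars != nullptr && isize >= 1);
    if (!vars[0].inCodeSpace) return false;
    targets.push_back(vars[0].offset);
    return true;
  }
};
```

An application following flow would create one collector per instruction, then continue at the fall-through address when fallsThrough is still set and queue every entry in targets for later translation.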

LoadImage

A LoadImage holds all the binary data from an executable file in a format similar to how it would exist when being executed by a real processor. The interface to this from SLEIGH is very simple, although it can hide a complicated structure. One method, LoadImage::loadFill(), does most of the work. It takes a byte pointer ptr, a size, and an Address addr, and is expected to fill the ptr array with size bytes taken from the load image, starting at the address addr. Two more virtual methods, getArchType and adjustVma, are required for a complete implementation of LoadImage, but they do not need to be implemented fully.

class MyLoadImage : public LoadImage {
public:
  MyLoadImage(const string &nm) : LoadImage(nm) {}
  virtual void loadFill(uint1 *ptr,int4 size,const Address &addr);
  virtual string getArchType(void) const { return "mytype"; }
  virtual void adjustVma(long adjust) {}
};
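The class above only declares loadFill. One way its body might look, for an image that is simply a byte buffer mapped at a base address, is sketched below. BufferImage is a hypothetical, self-contained stand-in: it does not derive from LoadImage, and the Address parameter is reduced to a raw offset so that the example compiles without the library. Filling bytes outside the buffer with zero is a policy chosen for this sketch; a real implementation might prefer to report an error.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical in-memory image (not library API): a byte buffer that is
// considered to be loaded at address 'baseaddr'.
class BufferImage {
  std::uint64_t baseaddr;           // Address of data[0]
  std::vector<std::uint8_t> data;   // The raw bytes of the image
public:
  BufferImage(std::uint64_t base, std::vector<std::uint8_t> bytes)
    : baseaddr(base), data(std::move(bytes)) {}

  // Same contract as loadFill: write 'size' bytes starting at 'offset'
  // into 'ptr'. Bytes that fall outside the buffer are returned as 0.
  void loadFill(std::uint8_t *ptr, int size, std::uint64_t offset) const {
    assert(ptr != nullptr && size >= 0);
    for (int i = 0; i < size; ++i) {          // For every byte requested
      std::uint64_t cur = offset + i;         // Address of this byte
      if (cur < baseaddr || cur - baseaddr >= data.size())
        ptr[i] = 0;                           // Outside the window
      else
        ptr[i] = data[cur - baseaddr];        // Inside the window
    }
  }
};
```

Checking each byte individually keeps the sketch short and handles requests that straddle either end of the buffer; a production loader would more likely copy the overlapping range in one step.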

ContextDatabase

The ContextDatabase keeps track of every context variable and its value over different address ranges. In most cases you do not need to derive from the class yourself and can use the built-in class, ContextInternal, which provides the basic functionality required and works for different architectures. What you may need to do is set values for certain variables, depending on the processor and the environment it is running in. For instance, for the x86 platform, you need to set the addrsize and opsize bits to indicate that the processor is running in 32-bit mode. The context variables specific to a particular processor are established by the SLEIGH spec, so they can only be set after the spec has been loaded.

...
context = new ContextInternal();
trans = new Sleigh(loader,context);
DocumentStorage docstorage;
Element *root = docstorage.openDocument("specfiles/x86.sla")->getRoot();
docstorage.registerTag(root);
trans->initialize(docstorage);
context->setVariableDefault("addrsize",1); // Address size is 32-bits
context->setVariableDefault("opsize",1); // Operand size is 32-bits
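To make the idea of tracking a variable's value over address ranges more concrete, here is a toy model of such a database. It is only a conceptual sketch and says nothing about how ContextInternal is actually implemented: ToyContext, ToyContextVar, setVariableFrom and getVariable are hypothetical names introduced for illustration, and only setVariableDefault borrows its name from the call shown above. Each variable has a default value plus a sorted set of change points, and a lookup returns the value of the nearest change point at or below the queried address.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>

// One context variable in the toy model (hypothetical, not library API)
struct ToyContextVar {
  std::uint32_t defaultValue = 0;                   // Used where nothing overrides it
  std::map<std::uint64_t, std::uint32_t> changes;   // Change point -> new value
};

class ToyContext {
  std::map<std::string, ToyContextVar> vars;
public:
  // Value used at every address without an explicit change point below it
  void setVariableDefault(const std::string &nm, std::uint32_t val) {
    vars[nm].defaultValue = val;
  }

  // Value used from 'addr' upward, until the next change point
  void setVariableFrom(const std::string &nm, std::uint64_t addr, std::uint32_t val) {
    vars[nm].changes[addr] = val;
  }

  // Look up the value in effect at 'addr'
  std::uint32_t getVariable(const std::string &nm, std::uint64_t addr) const {
    auto it = vars.find(nm);
    assert(it != vars.end());                       // Variable must be known
    const ToyContextVar &v = it->second;
    auto next = v.changes.upper_bound(addr);        // First change point above addr
    if (next == v.changes.begin())
      return v.defaultValue;                        // No change point at or below addr
    return std::prev(next)->second;                 // Nearest change point at or below
  }
};
```

In this model, setting addrsize and opsize to 1 as defaults makes every address report 32-bit mode unless a later change point says otherwise, which matches the intent of the two setVariableDefault calls above.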