A prototype model, in Ghidra, is a set of rules for determining how parameters and return values are passed between a function and its subfunction. For a high-level language (such as C or Java), a function prototype is the ordered list of parameters (each specified as a name and a datatype) that are passed to the function as input plus the optional value (specified as just a datatype) returned by the function. A prototype model specifies how a compiler decides which storage locations are used to hold the actual values at run time.
From a reverse engineering perspective, Ghidra also needs to solve the inverse problem: given a set of storage locations (registers and stack locations) that look like they are inputs and outputs to a function, determine a high-level function prototype that produces those locations when compiled. The same prototype model is used to solve this problem as well, but in this case, the solution may not be unique, or can only be exactly derived from information that Ghidra doesn't have.
The <prototype> tag encodes details about a specific prototype model within a compiler specification. A given compiler spec can have multiple prototype models, which are distinguished by the mandatory name attribute of the tag. Other Ghidra tools refer to prototype models by this name, and it must be unique across all models in the compiler spec. All <prototype> tags must include the subtags <input> and <output>, which list storage locations (registers, stack, and other varnodes) as the raw material for the prototype model to decide where parameters are stored for passing between functions. The <input> tag holds the resources used to pass input parameters, and <output> describes resources for return value storage. A resource is described by the <pentry> tag, which comes in two flavors. Most <pentry> tags describe a storage location to be used by a single variable. If the tag has an align attribute, however, multiple variables can be allocated from the same resource, where each variable must be aligned relative to the start of the resource as specified by the attribute's value.
How <pentry> resources are used is determined by the prototype model's strategy. This is specified as an optional attribute to the main <prototype> tag. There are currently only two strategies: standard and register. If the attribute is not present, the prototype model defaults to the standard strategy.
Under the standard strategy, the <pentry> subtags under the <input> tag are viewed as an ordered resource list. When assigning storage locations from a list of datatypes, each datatype is evaluated in order. The first <pentry> from the resource list that fits the datatype and hasn't been fully used by previous datatypes is assigned to that datatype. In this case, the <input> tag lists varnodes in the order that a compiler would dole them out when given a list of parameters to pass. Integer or pointer values are usually passed first in specially designated registers, and then on the stack if there are not enough available registers. There can be one stack-based <pentry> at the end of the list that will typically match any number of parameters of any size or type.
If there are separate <pentry> tags for dedicated floating-point registers, the standard strategy treats them as a separate resource list, independent of the one for integer and pointer datatypes. The <pentry> tags specifying floating-point registers are listed in the same <input> tag, but are distinguished by the metatype="float" attribute labeling the individual tags.
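As a rough illustration of the assignment order described above, the sketch below uses hypothetical register names (r0, r1, fp0, fp1) and a hypothetical stack offset; none of them come from a real processor specification. The trailing comment walks through how one parameter list would be assigned under the standard strategy, assuming 4-byte integers and pointers.

```xml
<!-- Hypothetical resource list: register names and the stack offset are assumptions. -->
<input>
  <pentry minsize="1" maxsize="4">
    <register name="r0"/>
  </pentry>
  <pentry minsize="1" maxsize="4">
    <register name="r1"/>
  </pentry>
  <pentry minsize="1" maxsize="8" metatype="float">
    <register name="fp0"/>
  </pentry>
  <pentry minsize="1" maxsize="8" metatype="float">
    <register name="fp1"/>
  </pentry>
  <pentry minsize="1" maxsize="500" align="4">
    <addr offset="4" space="stack"/>
  </pentry>
</input>
<!--
  Parameter list (int a, double b, char *c, int d):
    a: r0     first unused integer/pointer entry
    b: fp0    first unused float entry; the float list is consumed independently
    c: r1     next unused integer/pointer entry
    d: stack  r0 and r1 are taken, so the stack entry supplies the next aligned slot
-->
```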
For the inverse case, where the decompiler must infer a prototype from data-flow and liveness, the
standard strategy expects there to be no gaps in the usage of the
(either) resource list.
For a putative input varnode to be considered a formal parameter, it must occur somewhere in the
<pentry>
resource list. If there is a gap, i.e. the second
<pentry>
occurs as a varnode but not the first, then the decompiler
will fill in the gap by creating an extra unused parameter. Or if the gap is too big,
the original input varnode will not be considered a formal parameter.
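To make the gap rule concrete, consider the hypothetical three-register list below (the register names are assumptions for illustration only). The comment restates the behavior described above for a function whose data-flow shows only some of these registers as inputs.

```xml
<!-- Hypothetical resource list used to illustrate gap handling. -->
<input>
  <pentry minsize="1" maxsize="4">
    <register name="r0"/>
  </pentry>
  <pentry minsize="1" maxsize="4">
    <register name="r1"/>
  </pentry>
  <pentry minsize="1" maxsize="4">
    <register name="r2"/>
  </pentry>
</input>
<!--
  Observed inputs r0 and r1:
    no gap, so both become formal parameters.
  Observed input r1 only:
    r0 is a gap; the decompiler creates an extra unused parameter for r0,
    and the function ends up with two parameters.
  Observed input in a register that has no pentry:
    the varnode is not considered a formal parameter.
-->
```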
The register allocation strategy is designed for software with a lot of hand-coded assembly routines that do not stick to a particular parameter passing convention. The idea is to provide <pentry> tags for any register that might conceivably be considered an input location. Then the input varnodes for a function that have a corresponding <pentry> are automatically promoted to formal parameters. In practical terms, this strategy behaves in the same way as the standard strategy, except that in the reverse case, the decompiler does not care about gaps in the resource list. It will not fill in gaps, and it will not throw out putative inputs because of large gaps.
When assigning storage locations from a list of datatypes, the same algorithm is applied as in
the standard strategy. The first <pentry>
that hasn't been used and that fits the
datatype is assigned. Note that this may not make as much sense for hand-coded assembly.
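A prototype using the register strategy might look like the following sketch. The prototype name and register names are hypothetical, and the attribute values assume a link-register style call mechanism; treat it as an illustration of the strategy attribute rather than a model for any particular processor.

```xml
<!-- Hypothetical prototype: any of r0..r3 seen as an input is promoted to a parameter. -->
<prototype name="__asm_example" extrapop="0" stackshift="0" strategy="register">
  <input>
    <pentry minsize="1" maxsize="4">
      <register name="r0"/>
    </pentry>
    <pentry minsize="1" maxsize="4">
      <register name="r1"/>
    </pentry>
    <pentry minsize="1" maxsize="4">
      <register name="r2"/>
    </pentry>
    <pentry minsize="1" maxsize="4">
      <register name="r3"/>
    </pentry>
  </input>
  <output>
    <pentry minsize="1" maxsize="4">
      <register name="r0"/>
    </pentry>
  </output>
</prototype>
```

With this list, a routine that reads only r2 would get a single parameter stored in r2, with no unused parameters created for r0 and r1.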
There must be exactly one <default_proto> tag, which contains exactly one <prototype> sub-tag. Other <prototype> tags can be listed outside of this tag. The enclosed <prototype> is the designated default prototype model. Where users are given the option of choosing among different prototype models, the name "default" is always presented as an option and refers to this prototype model. It is also used in some situations where the prototype model is unknown but analysis needs to proceed.
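The arrangement can be sketched as follows. The prototype names, register names, and attribute values are hypothetical placeholders; only the nesting of the tags is the point of the example.

```xml
<!-- Exactly one default_proto, wrapping exactly one prototype. -->
<default_proto>
  <prototype name="__example_default" extrapop="4" stackshift="4">
    <input>
      <pentry minsize="1" maxsize="500" align="4">
        <addr offset="4" space="stack"/>
      </pentry>
    </input>
    <output>
      <pentry minsize="1" maxsize="4">
        <register name="r0"/>
      </pentry>
    </output>
  </prototype>
</default_proto>
<!-- Additional models are listed outside the default_proto tag. -->
<prototype name="__example_regcall" extrapop="4" stackshift="4">
  <input>
    <pentry minsize="1" maxsize="4">
      <register name="r1"/>
    </pentry>
    <pentry minsize="1" maxsize="500" align="4">
      <addr offset="4" space="stack"/>
    </pentry>
  </input>
  <output>
    <pentry minsize="1" maxsize="4">
      <register name="r0"/>
    </pentry>
  </output>
</prototype>
```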
Attributes and Children
name | The name of the prototype model
extrapop | Amount stack pointer changes across a call, or unknown
stackshift | Amount stack changes due to the call mechanism
type | (Optional) Generic calling convention type: stdcall, cdecl, fastcall, or thiscall
strategy | (Optional) Allocation strategy: standard or register
<input> | Resources for input variables
  pointermax | (Optional) Max size of parameter before converting to pointer
  thisbeforeretpointer | (Optional) true if this pointer comes before hidden return pointer
  killedbycall | (Optional) true indicates all input storage locations are considered killed by call
  <pentry> | (1 or more) Storage resources
<output> | Resources for return value
  killedbycall | (Optional) true indicates all output storage locations are considered killed by call
  <pentry> | (1 or more) Storage resources
<returnaddress> | (Optional) Storage location of the return address
<unaffected> | (Optional) Registers whose value is unaffected across calls
<killedbycall> | (Optional) Registers whose value does not persist across calls
<likelytrash> | (Optional) Registers that may hold a trash value entering the function
<localrange> | (Optional) Range of stack locations that may hold mapped local variables
The <prototype> tag specifies a prototype model. It must have a name attribute, which gives the name that can be used both in the Ghidra GUI and at other points within the compiler spec. The strategy attribute indicates the allocation strategy, as described in the section called “Describing Parameters and Allocation Strategies”. If omitted, the strategy defaults to standard.
Every <prototype> must specify the extrapop attribute. This indicates the change in the stack pointer to expect across a call, within the p-code model. For architectures where a call instruction pushes a return address on the stack, this value will usually be positive and match the size of the stack pointer in bytes, indicating that a called function usually pops the return address itself and changes the stack pointer in a way not apparent in the caller's p-code. For architectures that use a link register to store the return address, extrapop is usually zero, indicating to the decompiler that it can expect the stack pointer value not to change across a call. The attribute can also be specified as unknown. This turns on the fairly onerous analysis associated with the Microsoft stdcall calling convention, where functions, upon return, pop off their own stack parameters in addition to the return address.
The stackshift attribute is also mandatory and indicates the amount the stack pointer changes just due to the call mechanism used to access a function with this prototype. The call instruction for many processors pushes the return address onto the stack. The stackshift attribute would typically be 2, 4, or 8, matching the code address size, in this case. For link register mechanisms, this attribute is set to zero.
The type attribute can be used to associate one of Ghidra's generic calling convention types with the prototype. The possible values are: stdcall, cdecl, fastcall, and thiscall. Each of these values can be assigned to at most one calling convention across the compiler specification. Generic calling conventions are used to encode calling convention information in a Ghidra datatype, like a FunctionDefinitionDataType, which can apply to more than one program or architecture.
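The sketch below shows how extrapop, stackshift, and type might be combined for a hypothetical 32-bit processor whose call instruction pushes a 4-byte return address. The prototype names and register names are assumptions for illustration; each generic type value is used on only one prototype, as required.

```xml
<!-- Caller cleans up stack parameters: the callee pops only the return address. -->
<prototype name="__example_cdecl" extrapop="4" stackshift="4" type="cdecl">
  <input>
    <pentry minsize="1" maxsize="500" align="4">
      <addr offset="4" space="stack"/>
    </pentry>
  </input>
  <output>
    <pentry minsize="1" maxsize="4">
      <register name="r0"/>
    </pentry>
  </output>
</prototype>
<!-- Callee pops its own stack parameters, so the total change is not fixed. -->
<prototype name="__example_stdcall" extrapop="unknown" stackshift="4" type="stdcall">
  <input>
    <pentry minsize="1" maxsize="500" align="4">
      <addr offset="4" space="stack"/>
    </pentry>
  </input>
  <output>
    <pentry minsize="1" maxsize="4">
      <register name="r0"/>
    </pentry>
  </output>
</prototype>
<!-- A link-register architecture would instead use extrapop="0" and stackshift="0". -->
```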
The <input>
tag lists the resources used to pass input parameters to a function
with this prototype. The varnodes used for passing are selected by an
allocation strategy (See the section called “Describing Parameters and Allocation Strategies”)
from among the resources specified here. The
<input>
tag contains a list of <pentry>
sub-tags describing the varnodes.
Depending on the allocation strategy, the ordering is typically important.
The killedbycall attribute if true indicates that all storage locations listed in
the <input>
should be considered as killed by call (See the section called “<killedbycall>”).
This attribute is optional and defaults to false.
The pointermax attribute can be used if there is an absolute limit on the size of datatypes passed directly using the standard resources. If present and non-zero, the attribute indicates the largest number of bytes for a parameter. Bigger inputs are assumed to have a pointer passed instead. When a user specifies a function prototype with a big parameter, Ghidra will automatically allocate a storage location that holds the pointer. By default, this substitution does not occur, and large parameters go through the normal resource allocation process and are assigned storage that holds the whole value directly.
The thisbeforeretpointer attribute indicates how the two hidden parameters, the this pointer and the hidden return pointer, are ordered on the stack, in the rare case where both occur in a single prototype. If the attribute is true, the this pointer comes first. By default, the hidden return pointer comes first.
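As an illustrative sketch, the <input> tag below sets both optional attributes. The register names, the 8-byte pointer size, and the stack offset are assumptions and are not taken from a specific processor.

```xml
<!-- Hypothetical 64-bit resource list with a size cap on directly passed values. -->
<input pointermax="8" thisbeforeretpointer="true">
  <pentry minsize="1" maxsize="8">
    <register name="r0"/>
  </pentry>
  <pentry minsize="1" maxsize="8">
    <register name="r1"/>
  </pentry>
  <pentry minsize="1" maxsize="500" align="8">
    <addr offset="8" space="stack"/>
  </pentry>
</input>
<!--
  With pointermax="8", a 24-byte structure parameter is not stored directly.
  Storage is allocated for an 8-byte pointer to it instead, for example r0
  if it is the first parameter. An 8-byte or smaller parameter is unaffected.
-->
```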
The following is an example tag using the standard allocation strategy with 3 integer registers and 2 floating-point registers. If there are more parameters of either type, the compiler allocates storage from the stack.
Example 15.
<input>
  <pentry minsize="1" maxsize="8" metatype="float">
    <register name="f1"/>
  </pentry>
  <pentry minsize="1" maxsize="8" metatype="float">
    <register name="f2"/>
  </pentry>
  <pentry minsize="1" maxsize="4">
    <register name="a0"/>
  </pentry>
  <pentry minsize="1" maxsize="4">
    <register name="a1"/>
  </pentry>
  <pentry minsize="1" maxsize="4">
    <register name="a2"/>
  </pentry>
  <pentry minsize="1" maxsize="500" align="4">
    <addr offset="16" space="stack"/>
  </pentry>
</input>
The handling of
<pentry>
subtags within the <output>
tag is slightly different
than for the input case. Technically, this tag is sensitive to the allocation strategy
selected for the prototype. Currently however, all (both) strategies behave the same for the output parameter.
When assigning a storage location for a return value of a given data-type, the first <pentry> within the list that matches the data-type is used as the storage location. If none of the <pentry> storage locations fit the data-type, a Hidden Return Parameter is triggered. An extra hidden input parameter is passed which holds a pointer to where the function will store the return value.
In the inverse case, the decompiler examines all (possible) output varnodes that have
a corresponding <pentry>
tag in the resource list. The varnode whose corresponding
tag occurs the earliest in the list becomes the formal return value for the function.
If an output varnode matches no <pentry>
, then it is rejected as a formal return value.
Example 16.
<output killedbycall="true">
  <pentry minsize="4" maxsize="10" metatype="float" extension="float">
    <register name="ST0"/>
  </pentry>
  <pentry minsize="1" maxsize="4">
    <register name="EAX"/>
  </pentry>
  <pentry minsize="5" maxsize="8">
    <addr space="join" piece1="EDX" piece2="EAX"/>
  </pentry>
</output>
Attributes and Children
minsize | Size (in bytes) of smallest variable stored here
maxsize | Size (in bytes) of largest variable stored here
align | (Optional) Alignment of successive locations within this entry
metatype | (Optional) Restriction on datatype: unknown, float, int, uint, or ptr
extension | (Optional) How small values are extended: sign, zero, inttype, float, or none
<register> | Storage location of the entry
  name | Name of register
<addr> | (alternate form)
  space | Address space of the location
  offset | Offset (in bytes) of location
The <pentry>
tag describes the individual memory resources that make up both
the <input>
and <output>
resource lists. These
are consumed by the allocation strategy as it assigns storage for parameters and return values.
Attributes describe restrictions on how a particular <pentry>
resource
can be used.
The storage for the entry is specified by either the <register>
or the
<addr>
subtag. The minsize
and maxsize
attributes
restrict the size of the parameter to which the entry is assigned, and the metatype
attribute restricts the type of the parameter.
Metatype refers to the class of the datatype, independent of size: integer, unsigned integer, floating-point, or pointer. The default is unknown, meaning no type restriction. The metatype attribute can be used to split out a separate floating-point resource list for some allocation strategies. In the standard strategy, for instance, any <pentry> that has the attribute metatype="float" is pulled out into a separate list from all the other entries.
The optional extension
attribute indicates that variables are extended to fill the
entire location, if the datatype would otherwise occupy fewer bytes. The type
of extension depends on this attribute's value: zero
for zero extension,
sign
for sign extension, and float
for floating-point extension.
A value of inttype
indicates the value is either sign or zero extended depending on
the original datatype. The default is none
for no extension.
The align
attribute indicates that multiple variables can be drawn from the
pentry
resource. The first variable occupies bytes starting with the address
of the storage location specified in the tag. Additional variables start at the next available
aligned byte. The attribute value must be a positive integer that specifies the alignment. This
is typically used to model parameters pulled from a stack resource. The example below draws
up to 500 bytes of parameters from the stack, which are 4 byte aligned, starting at an offset
of 16 bytes from the initial value of the stack pointer.
Example 17.
<pentry minsize="1" maxsize="500" align="4">
  <addr space="stack" offset="16"/>
</pentry>
The <returnaddress> tag is optional and describes where the return address is stored upon entering a function. If present, it overrides the default value for functions that use this particular prototype model (see the section called “<returnaddress>”). It takes a single varnode tag describing the storage location.
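A <returnaddress> tag inside a prototype might take one of the two forms sketched below. The register name, the 4-byte size, and the zero stack offset are assumptions chosen for illustration; only one form would appear in a given prototype.

```xml
<!-- Form 1 (hypothetical): return address pushed on the stack by the call instruction. -->
<returnaddress>
  <varnode space="stack" offset="0" size="4"/>
</returnaddress>

<!-- Form 2 (hypothetical): return address held in a link register. -->
<returnaddress>
  <register name="lr"/>
</returnaddress>
```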
The <unaffected> tag lists one or more storage locations that the compiler knows will not be modified by any sub-function. Each storage location is specified as a varnode tag.
By contract, sub-functions must either not touch these locations at all, or they must save off the value and then restore it before returning to their caller. Many ABI documents refer to these as saved registers. Fundamentally, this allows the decompiler to propagate values across function calls. Without this tag, because it is generally looking at a single function in isolation, the decompiler doesn't have enough information to safely allow this kind of propagation.
The <killedbycall> tag lists one or more storage locations, each specified as a varnode tag, whose value should be considered killed by call.
A register or other storage location is killed by call if, from the point
of view of the calling function, the value of the register before a sub-function call is unrelated
to its value after the call. This is effectively the opposite of the <unaffected>
tag which specifies that the value is unchanged across the call.
A storage location marked neither <unaffected> nor <killedbycall> is treated as if it may hold different values before and after the call. In other words, the storage location represents the same high-level variable before and after, but the call may modify the value.
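The contrast between the two tags can be sketched as follows, using hypothetical register names. Here the stack pointer and two saved registers are declared unaffected, while a scratch register is declared killed by call; a register listed in neither tag gets the intermediate treatment described above.

```xml
<!-- Hypothetical registers: values the caller can rely on after the call returns. -->
<unaffected>
  <register name="sp"/>
  <register name="r4"/>
  <register name="r5"/>
</unaffected>
<!-- Hypothetical register: value after the call is unrelated to the value before it. -->
<killedbycall>
  <register name="r1"/>
</killedbycall>
```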
The <likelytrash> tag lists one or more storage locations, each specified as a varnode tag. In specialized cases, compilers can move around what seem like input values to functions, but the values are actually unused and the movement is incidental. The canonical example is the push of a register on the stack, where the code is simply trying to make space on the stack.
If there is movement and no other explicit manipulation of the input value in a storage location tagged this way, the decompiler will treat the movement as dead code.
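A minimal sketch of the tag, with a hypothetical register name, could look like this:

```xml
<!-- Hypothetical: r2 is sometimes pushed at function entry only to reserve stack space. -->
<likelytrash>
  <register name="r2"/>
</likelytrash>
```

Under this assumption, a function that pushes r2 on entry and never otherwise reads the incoming value would not have r2 recovered as an input.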
Attributes and Children
<range> | (1 or more) Range of bytes eligible for local variables
  space | Address space containing range (usually "stack")
  first | (Optional) Starting byte offset of range, default is 0
  last | (Optional) Ending byte offset, default is maximal offset of space
The <localrange> tag lists one or more <range> tags that explicitly describe all the possible ranges on the stack that can hold mapped local variables other than parameters. Individual functions will be assumed to use some subset of this region. The first and last attributes of the <range> tag give offsets relative to the incoming value of the stack pointer. This affects the decompiler's reconstruction of the stack frame for a function and parameter recovery.
Omitting this tag and accepting the default is often sufficient. The default sets the local
range as all bytes not yet pushed on the stack, where the incoming
stack pointer points to the last byte pushed. An explicit tag is useful when a specific
region needs to be added to or
excised from the default. The following example is for the 64-bit x86 prototype model, where
the caller reserves extra space on the stack for register parameters that needs
to be added to the default. The <localrange>
tag replaces the default,
so it needs to specify the default range if it wants to keep it.
Example 22.
<localrange>
  <range space="stack" first="0xfffffffffff0bdc1" last="0xffffffffffffffff"/>
  <range space="stack" first="8" last="39"/>
</localrange>