Parameter Passing

A prototype model, in Ghidra, is a set of rules for determining how parameters and return values are passed between a function and its subfunctions. For a high-level language (such as C or Java), a function prototype is the ordered list of parameters (each specified as a name and a datatype) that are passed to the function as input, plus the optional value (specified as just a datatype) returned by the function. A prototype model specifies how a compiler decides which storage locations are used to hold the actual values at run time.

From a reverse engineering perspective, Ghidra also needs to solve the inverse problem: given a set of storage locations (registers and stack locations) that look like they are inputs and outputs to a function, determine a high-level function prototype that produces those locations when compiled. The same prototype model is used to solve this problem as well, but in this case the solution may not be unique, or it may be exactly derivable only from information that Ghidra doesn't have.

Describing Parameters and Allocation Strategies

The <prototype> tag encodes details about a specific prototype model within a compiler specification. A given compiler spec can have multiple prototype models, which are distinguished by the mandatory name attribute of the tag. Other Ghidra tools refer to prototype models by this name, and it must be unique across all models in the compiler spec. Every <prototype> tag must include the subtags <input> and <output>, which list storage locations (registers, stack, and other varnodes) as the raw material from which the prototype model decides where parameters are stored for passing between functions. The <input> tag holds the resources used to pass input parameters, and <output> describes resources for return value storage. A resource is described by the <pentry> tag, which comes in two flavors. Most <pentry> tags describe a storage location to be used by a single variable. If the tag has an align attribute, however, multiple variables can be allocated from the same resource, and each variable must be aligned relative to the start of the resource as specified by the attribute's value.
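
As a minimal sketch of the overall shape described above, the following prototype model shows both flavors of <pentry>. The model name __examplecall and the register r0 are hypothetical, and the zero values for extrapop and stackshift assume a link-register architecture.

  <prototype name="__examplecall" extrapop="0" stackshift="0">
    <input>
      <!-- single-variable resource: at most one parameter is stored in r0 -->
      <pentry minsize="1" maxsize="4">
        <register name="r0"/>
      </pentry>
      <!-- align is present: several parameters can be drawn from this stack region -->
      <pentry minsize="1" maxsize="500" align="4">
        <addr space="stack" offset="0"/>
      </pentry>
    </input>
    <output>
      <pentry minsize="1" maxsize="4">
        <register name="r0"/>
      </pentry>
    </output>
  </prototype>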

How <pentry> resources are used is determined by the prototype model's strategy. This is specified as an optional attribute to the main <prototype> tag. There are currently only two strategies: standard and register. If the attribute is not present, the prototype model defaults to the standard strategy.

Standard Strategy

For this strategy, the <pentry> subtags under the <input> tag are viewed as an ordered resource list. When assigning storage locations from a list of datatypes, each datatype is evaluated in order. The first <pentry> from the resource list that fits the datatype and hasn't been fully used by previous datatypes is assigned to that datatype. In this case, the <input> tag lists varnodes in the order that a compiler would dole them out when given a list of parameters to pass. Integer or pointer values are usually passed first in specially designated registers, and then on the stack if there are not enough available registers. There can be one stack-based <pentry> at the end of the list that will typically match any number of parameters of any size or type.
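
The assignment walk can be illustrated with a small resource list. This is only a sketch: the registers r0 and r1 are hypothetical, and 4-byte int and pointer datatypes are assumed.

  <input>
    <pentry minsize="1" maxsize="4">
      <register name="r0"/>
    </pentry>
    <pentry minsize="1" maxsize="4">
      <register name="r1"/>
    </pentry>
    <pentry minsize="1" maxsize="500" align="4">
      <addr space="stack" offset="0"/>
    </pentry>
  </input>
  <!-- For parameters (int a, int *b, int c, int d), evaluated in order:
         a is assigned r0            (first entry that fits and is unused)
         b is assigned r1            (r0 is already used)
         c is assigned stack 0       (both registers are used)
         d is assigned stack 4       (next 4-byte aligned slot in the same entry) -->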

If there are separate <pentry> tags for dedicated floating-point registers, the standard strategy treats them as a separate resource list, independent of the one for integer and pointer datatypes. The <pentry> tags specifying floating-point registers are listed in the same <input> tag as the integer registers and are distinguished by the metatype="float" attribute labeling the individual tags.
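
A short sketch of how the two lists stay independent, assuming hypothetical registers f1, a0, and a1, a 4-byte int, and an 8-byte double:

  <input>
    <pentry minsize="1" maxsize="8" metatype="float">
      <register name="f1"/>
    </pentry>
    <pentry minsize="1" maxsize="4">
      <register name="a0"/>
    </pentry>
    <pentry minsize="1" maxsize="4">
      <register name="a1"/>
    </pentry>
  </input>
  <!-- For parameters (int x, double y, int z):
         x is assigned a0   (first free entry in the integer list)
         y is assigned f1   (first free entry in the floating-point list)
         z is assigned a1   (the floating-point parameter did not consume an integer slot) -->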

For the inverse case, where the decompiler must infer a prototype from data-flow and liveness, the standard strategy expects there to be no gaps in the usage of either resource list. For a putative input varnode to be considered a formal parameter, it must occur somewhere in the <pentry> resource list. If there is a gap, i.e. the second <pentry> occurs as a varnode but not the first, then the decompiler will fill in the gap by creating an extra unused parameter. If the gap is too big, however, the original input varnode will not be considered a formal parameter.
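
The gap rule can be pictured with the following resource list. The registers are hypothetical, and the comments only restate the behavior described above; the text does not say how large a gap must be before an input is rejected, so no threshold is shown.

  <input>
    <pentry minsize="1" maxsize="4">
      <register name="r0"/>
    </pentry>
    <pentry minsize="1" maxsize="4">
      <register name="r1"/>
    </pentry>
    <pentry minsize="1" maxsize="4">
      <register name="r2"/>
    </pentry>
  </input>
  <!-- Inverse case under the standard strategy:
         function reads r0 and r1   : two formal parameters, no gap
         function reads only r1     : r0 is a gap, so an unused first parameter
                                      is created and r1 becomes the second parameter -->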

Register Strategy

This allocation strategy is designed for software with a lot of hand-coded assembly routines that do not follow a particular parameter passing convention. The idea is to provide <pentry> tags for any register that might conceivably be considered an input location. The input varnodes for a function that have a corresponding <pentry> are then automatically promoted to formal parameters. In practical terms, this strategy behaves in the same way as the standard strategy, except that in the reverse case, the decompiler does not care about gaps in the resource list. It will not fill in gaps, and it will not throw out putative inputs because of large gaps.

When assigning storage locations from a list of datatypes, the same algorithm is applied as in the standard strategy. The first <pentry> that hasn't been used and that fits the datatype is assigned. Note that this may not make as much sense for hand-coded assembly.
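
A minimal sketch of a model that selects this strategy follows. The model name __asmcall and the registers are hypothetical, and the zero extrapop and stackshift values assume a link-register architecture.

  <prototype name="__asmcall" extrapop="0" stackshift="0" strategy="register">
    <input>
      <pentry minsize="1" maxsize="4">
        <register name="r0"/>
      </pentry>
      <pentry minsize="1" maxsize="4">
        <register name="r1"/>
      </pentry>
      <pentry minsize="1" maxsize="4">
        <register name="r2"/>
      </pentry>
    </input>
    <output>
      <pentry minsize="1" maxsize="4">
        <register name="r0"/>
      </pentry>
    </output>
  </prototype>
  <!-- Inverse case: a routine that reads only r2 gets a single formal
       parameter stored in r2; no filler parameters are created for r0 or r1. -->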

<default_proto>

Attributes and Children
<prototype> Specification for the default prototype

There must be exactly one <default_proto> tag, which contains exactly one <prototype> sub-tag. Other <prototype> tags can be listed outside of this tag. The enclosed <prototype> is the designated default prototype model. Where users are given the option of choosing from among different prototype models, the name "default" is always presented as an option and refers to this prototype model. It is also used in some situations where the prototype model is unknown but analysis needs to proceed.
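
The arrangement might look like the sketch below, with one model wrapped in <default_proto> and a second model listed outside it. The model names, registers, and attribute values are hypothetical.

  <default_proto>
    <prototype name="__maincall" extrapop="0" stackshift="0">
      <input>
        <pentry minsize="1" maxsize="4">
          <register name="r0"/>
        </pentry>
      </input>
      <output>
        <pentry minsize="1" maxsize="4">
          <register name="r0"/>
        </pentry>
      </output>
    </prototype>
  </default_proto>
  <!-- an additional, non-default model listed outside <default_proto> -->
  <prototype name="__altcall" extrapop="0" stackshift="0">
    <input>
      <pentry minsize="1" maxsize="4">
        <register name="r1"/>
      </pentry>
    </input>
    <output>
      <pentry minsize="1" maxsize="4">
        <register name="r1"/>
      </pentry>
    </output>
  </prototype>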

<prototype>

Attributes and Children
name The name of the prototype model
extrapop Amount the stack pointer changes across a call, or unknown
stackshift Amount stack changes due to the call mechanism
type (Optional) Generic calling convention type: stdcall, cdecl, fastcall, or thiscall
strategy (Optional) Allocation strategy: standard or register
<input> Resources for input variables
pointermax (Optional) Max size of parameter before converting to pointer
thisbeforeretpointer (Optional) true if this pointer comes before hidden return pointer
killedbycall (Optional) true indicates all input storage locations are considered killed by call
<pentry> (1 or more) Storage resources
<output> Resources for return value
killedbycall (Optional) true indicates all output storage locations are considered killed by call
<pentry> (1 or more) Storage resources
<returnaddress> (Optional) Storage location of return address
<unaffected> (Optional) Registers whose value is unaffected across calls
<killedbycall> (Optional) Registers whose value does not persist across calls
<likelytrash> (Optional) Registers that may hold a trash value entering the function
<localrange> (Optional) Range of stack locations that may hold mapped local variables

The <prototype> tag specifies a prototype model. It must have a name attribute, which gives the name that can be used both in the Ghidra GUI and at other points within the compiler spec. The strategy attribute indicates the allocation strategy (See the section called “Describing Parameters and Allocation Strategies”). If omitted, the strategy defaults to standard.

Every <prototype> must specify the extrapop attribute. This indicates the change in the stack pointer to expect across a call, within the p-code model. For architectures where a call instruction pushes a return address on the stack, this value will usually be positive and match the size of the stack pointer in bytes, indicating that a called function usually pops the return address itself and changes the stack pointer in a way not apparent in the caller's p-code. For architectures that use a link register to store the return address, extrapop is usually zero, indicating to the decompiler that it can expect the stack pointer value not to change across a call. The attribute can also be specified as unknown. This turns on the fairly onerous analysis associated with the Microsoft stdcall calling convention, where functions, upon return, pop off their own stack parameters in addition to the return address.

The stackshift attribute is also mandatory and indicates the amount the stack pointer changes just due to the call mechanism used to access a function with this prototype. The call instruction for many processors pushes the return address onto the stack. The stackshift attribute would typically be 2, 4, or 8, matching the code address size, in this case. For link register mechanisms, this attribute is set to zero.
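
The typical combinations of extrapop and stackshift can be sketched as follows. The link-register model name is hypothetical, the values assume a 32-bit target with 4-byte code addresses, and the mandatory <input> and <output> children are left out to keep the fragment short.

  <!-- call pushes a 4-byte return address, which the callee pops on return -->
  <prototype name="__cdecl" extrapop="4" stackshift="4">
    <!-- <input> and <output> omitted from this sketch -->
  </prototype>
  <!-- callee also pops its own stack parameters, so the total is not fixed -->
  <prototype name="__stdcall" extrapop="unknown" stackshift="4">
    <!-- <input> and <output> omitted from this sketch -->
  </prototype>
  <!-- return address is held in a link register, so the stack is untouched -->
  <prototype name="__linkcall" extrapop="0" stackshift="0">
    <!-- <input> and <output> omitted from this sketch -->
  </prototype>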

The type attribute can be used to associate one of Ghidra's generic calling convention types with the prototype. The possible values are: stdcall, cdecl, fastcall, and thiscall. Each of these values can be assigned to at most one calling convention across the compiler specification. Generic calling conventions are used to encode calling convention information in a Ghidra datatype, like a FunctionDefinitionDataType, which can apply to more than one program or architecture.
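
For instance, a model could be tied to the generic fastcall type as in this sketch. The extrapop and stackshift values are assumptions for a 32-bit target, and the <input> and <output> children are omitted.

  <prototype name="__fastcall" extrapop="unknown" stackshift="4" type="fastcall">
    <!-- <input> and <output> omitted from this sketch -->
  </prototype>
  <!-- No other <prototype> in the same compiler spec may also use type="fastcall". -->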

<input>

The <input> tag lists the resources used to pass input parameters to a function with this prototype. The varnodes used for passing are selected by an allocation strategy (See the section called “Describing Parameters and Allocation Strategies”) from among the resources specified here. The <input> tag contains a list of <pentry> sub-tags describing the varnodes. Depending on the allocation strategy, the ordering is typically important.

The killedbycall attribute, if true, indicates that all storage locations listed in the <input> tag should be considered as killed by call (See the section called “<killedbycall>”). This attribute is optional and defaults to false.

The pointermax attribute can be used if there is an absolute limit on the size of datatypes passed directly using the standard resources. If present and non-zero, the attribute indicates the largest number of bytes for a parameter. Bigger inputs are assumed to have a pointer passed instead. When a user specifies a function prototype with a big parameter, Ghidra will automatically allocate a storage location that holds the pointer. By default, this substitution does not occur, and large parameters go through the normal resource allocation process and are assigned storage that holds the whole value directly.
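
A sketch of pointermax in use, assuming a hypothetical register r0 and 8-byte pointers:

  <input pointermax="8">
    <pentry minsize="1" maxsize="8">
      <register name="r0"/>
    </pentry>
    <pentry minsize="1" maxsize="500" align="8">
      <addr space="stack" offset="0"/>
    </pentry>
  </input>
  <!-- A 16-byte structure passed as the first parameter exceeds pointermax,
       so a pointer to it is passed instead; the 8-byte pointer is what gets
       assigned storage, here r0. Without pointermax, the whole 16-byte value
       would go through normal allocation and land in the stack entry. -->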

The thisbeforeretpointer attribute indicates how the two hidden parameters, the this pointer and the hidden return pointer, are ordered in the rare case where both occur in a single prototype. If the attribute is true, the this pointer comes first. By default, the hidden return pointer comes first.
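
A minimal sketch, with hypothetical registers r0 and r1 and 4-byte pointers assumed:

  <input thisbeforeretpointer="true">
    <pentry minsize="1" maxsize="4">
      <register name="r0"/>
    </pentry>
    <pentry minsize="1" maxsize="4">
      <register name="r1"/>
    </pentry>
  </input>
  <!-- For a method that needs both hidden parameters:
         with thisbeforeretpointer="true"  : this is assigned r0, hidden return pointer r1
         with the attribute omitted        : hidden return pointer r0, this r1 -->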

The following is an example tag using the standard allocation strategy with 3 integer registers and 2 floating-point registers. If there are more parameters of either type, the compiler allocates storage from the stack.

Example 15.

  <input>
    <pentry minsize="1" maxsize="8" metatype="float">
      <register name="f1"/>
    </pentry>
    <pentry minsize="1" maxsize="8" metatype="float">
      <register name="f2"/>
    </pentry>
    <pentry minsize="1" maxsize="4">
      <register name="a0"/>
    </pentry>
    <pentry minsize="1" maxsize="4">
      <register name="a1"/>
    </pentry>
    <pentry minsize="1" maxsize="4">
      <register name="a2"/>
    </pentry>
    <pentry minsize="1" maxsize="500" align="4">
      <addr offset="16" space="stack"/>
    </pentry>
  </input>

<output>

The handling of <pentry> subtags within the <output> tag is slightly different than for the input case. Technically, this tag is sensitive to the allocation strategy selected for the prototype. Currently, however, both strategies behave the same for the output parameter.

When assigning a storage location for a return value of a given data-type, the first <pentry> within the list that matches the data-type is used as the storage location. If none of the <pentry> storage locations fit the data-type, a Hidden Return Parameter is triggered: an extra hidden input parameter is passed, which holds a pointer to where the function will store the return value.

In the inverse case, the decompiler examines all (possible) output varnodes that have a corresponding <pentry> tag in the resource list. The varnode whose corresponding tag occurs the earliest in the list becomes the formal return value for the function. If an output varnode matches no <pentry>, then it is rejected as a formal return value.

Example 16.

  <output killedbycall="true">
    <pentry minsize="4" maxsize="10" metatype="float" extension="float">
      <register name="ST0"/>
    </pentry>
    <pentry minsize="1" maxsize="4">
      <register name="EAX"/>
    </pentry>
    <pentry minsize="5" maxsize="8">
      <addr space="join" piece1="EDX" piece2="EAX"/>
    </pentry>
  </output>
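
The hidden return case can be seen in a smaller sketch with a single entry. The register r0 is hypothetical.

  <output>
    <pentry minsize="1" maxsize="4">
      <register name="r0"/>
    </pentry>
  </output>
  <!-- A 4-byte int return value fits the only entry and comes back in r0.
       A 12-byte structure fits no entry, so a Hidden Return Parameter is
       triggered: the caller passes an extra input pointer to the storage
       where the function writes the structure. -->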

<pentry>

Attributes and Children
minsize Size (in bytes) of smallest variable stored here
maxsize Size (in bytes) of largest variable stored here
align (Optional) Alignment of successive locations within this entry
metatype (Optional) Restriction on datatype: unknown, float, int, uint, or ptr
extension (Optional) How small values are extended: sign, zero, inttype, float, or none
<register> Storage location of the entry
name Name of register
<addr> (alternate form)
space Address space of the location
offset Offset (in bytes) of location

The <pentry> tag describes the individual memory resources that make up both the <input> and <output> resource lists. These are consumed by the allocation strategy as it assigns storage for parameters and return values. Attributes describe restrictions on how a particular <pentry> resource can be used.

The storage for the entry is specified by either the <register> or the <addr> subtag. The minsize and maxsize attributes restrict the size of the parameter to which the entry is assigned, and the metatype attribute restricts the type of the parameter.

Metatype refers to the class of the datatype, independent of size: integer, unsigned integer, floating-point, or pointer. The default is unknown, meaning no type restriction. The metatype attribute can be used to split out a separate floating-point resource list for some allocation strategies. In the standard strategy, for instance, any <pentry> that has the attribute metatype="float" is pulled out into a separate list from all the other entries.

The optional extension attribute indicates that variables are extended to fill the entire location, if the datatype would otherwise occupy fewer bytes. The type of extension depends on this attribute's value: zero for zero extension, sign for sign extension, and float for floating-point extension. A value of inttype indicates the value is either sign or zero extended depending on the original datatype. The default is none for no extension.
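
As a sketch, an entry that extends small integer values according to their signedness might read as follows. The register r0 is hypothetical.

  <pentry minsize="1" maxsize="4" extension="inttype">
    <register name="r0"/>
  </pentry>
  <!-- A 1-byte signed value stored here is sign extended to fill all 4 bytes;
       a 1-byte unsigned value is zero extended. With extension="sign" or
       extension="zero", the same extension would apply to both. -->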

The align attribute indicates that multiple variables can be drawn from the <pentry> resource. The first variable occupies bytes starting with the address of the storage location specified in the tag. Additional variables start at the next available aligned byte. The attribute value must be a positive integer that specifies the alignment. This is typically used to model parameters pulled from a stack resource. The example below draws up to 500 bytes of parameters from the stack, which are 4-byte aligned, starting at an offset of 16 bytes from the initial value of the stack pointer.

Example 17.

  <pentry minsize="1" maxsize="500" align="4">
    <addr space="stack" offset="16"/>
  </pentry>

<returnaddress>

Attributes and Children
<register> or <varnode> One varnode tag

This is an optional tag that describes where the return address is stored, upon entering a function. If present, it overrides the default value for functions that use this particular prototype model. (See the section called “<returnaddress>”) It takes a single varnode tag describing the storage location.

Example 18.

  <returnaddress>
    <register name="RA" />
  </returnaddress>

<unaffected>

Attributes and Children
<register> or <varnode> (1 or more) varnode tags

This tag lists one or more storage locations that the compiler knows will not be modified by any sub-function. Each storage location is specified as a varnode tag.

By contract, sub-functions must either not touch these locations at all, or they must save off the value and then restore it before returning to their caller. Many ABI documents refer to these as saved registers. Fundamentally, this allows the decompiler to propagate values across function calls. Without this tag, because it is generally looking at a single function in isolation, the decompiler doesn't have enough information to safely allow this kind of propagation.

Example 19.

  <unaffected>
    <register name="ESP"/>
    <register name="EBP"/>
  </unaffected>

<killedbycall>

Attributes and Children
<register> or <varnode> (1 or more) varnode tags

This tag lists one or more storage locations, each specified as a varnode tag, whose value should be considered killed by call.

A register or other storage location is killed by call if, from the point of view of the calling function, the value of the register before a sub-function call is unrelated to its value after the call. This is effectively the opposite of the <unaffected> tag which specifies that the value is unchanged across the call.

A storage location marked neither <unaffected> nor <killedbycall> is treated as if it may hold different values before and after the call. In other words, the storage location represents the same high-level variable before and after, but the call may modify the value.

Example 20.

  <killedbycall>
    <register name="ECX"/>
    <register name="EDX"/>
  </killedbycall>

<likelytrash>

Attributes and Children
<register> or <varnode> (1 or more) varnode tags

This tag lists one or more storage locations, each specified as a varnode tag. In specialized cases, compilers can move around what seem like input values to functions, but the values are actually unused and the movement is incidental. The canonical example is the push of a register on the stack, where the code is simply trying to make space on the stack.

If there is movement and no other explicit manipulation of the input value in a storage location tagged this way, the decompiler will treat the movement as dead code.

Example 21.

  <likelytrash>
    <register name="ECX"/>
  </likelytrash>

<localrange>

Attributes and Children
<range> (1 or more) Range of bytes eligible for local variables
space Address space containing range (Usually "stack")
first (Optional) Starting byte offset of range, default is 0
last (Optional) Ending byte offset, default is maximal offset of space

This tag lists one or more <range> tags that explicitly describe all the possible ranges on the stack that can hold mapped local variables other than parameters. Individual functions will be assumed to use some subset of this region. The first and last attributes to the <range> tag give offsets relative to the incoming value of the stack pointer. This affects the decompiler's reconstruction of the stack frame for a function and parameter recovery.

Omitting this tag and accepting the default is often sufficient. The default sets the local range as all bytes not yet pushed on the stack, where the incoming stack pointer points to the last byte pushed. An explicit tag is useful when a specific region needs to be added to or excised from the default. The following example is for the 64-bit x86 prototype model, where the caller reserves extra space on the stack for register parameters that needs to be added to the default. The <localrange> tag replaces the default, so it needs to specify the default range if it wants to keep it.

Example 22.

  <localrange>
    <range space="stack" first="0xfffffffffff0bdc1" last="0xffffffffffffffff"/>
    <range space="stack" first="8" last="39"/>
  </localrange>