Ghidra Decompiler Analysis Engine
PrintLanguage Class Reference
The base class API for emitting a high-level language.
#include <printlanguage.hh>
Classes | |
struct | Atom |
A single non-operator token emitted by the decompiler. | |
struct | NodePending |
A pending data-flow node waiting to be placed on the reverse polish notation stack. | |
struct | ReversePolish |
An entry on the reverse polish notation (RPN) stack. | |
Public Types | |
enum | modifiers { force_hex = 1, force_dec = 2, bestfit = 4, force_scinote = 8, force_pointer = 0x10, print_load_value = 0x20, print_store_value = 0x40, no_branch = 0x80, only_branch = 0x100, comma_separate = 0x200, flat = 0x400, falsebranch = 0x800, nofallthru = 0x1000, negatetoken = 0x2000, hide_thisparam = 0x4000 } |
Possible context sensitive modifiers to how tokens get emitted. | |
enum | tagtype { syntax, vartoken, functoken, optoken, typetoken, fieldtoken, blanktoken } |
Possible types of Atom. | |
enum | namespace_strategy { MINIMAL_NAMESPACES = 0, NO_NAMESPACES = 1, ALL_NAMESPACES = 2 } |
Strategies for displaying namespace tokens. | |
Public Member Functions | |
PrintLanguage (Architecture *g, const string &nm) | |
Constructor. | |
virtual | ~PrintLanguage (void) |
Destructor. | |
void | setLineCommentIndent (int4 val) |
Set the number of characters to indent comment lines. | |
void | setCommentDelimeter (const string &start, const string &stop, bool usecommentfill) |
Establish comment delimiters for the language. | |
void | setXML (bool val) |
Set whether the low-level emitter emits XML markup. | |
void | setFlat (bool val) |
Set whether nesting code structure should be emitted. | |
virtual void | adjustTypeOperators (void)=0 |
Set basic data-type information for p-code operators. | |
virtual void | resetDefaults (void) |
Set printing options to their default value. | |
virtual void | clear (void) |
Clear the RPN stack and the low-level emitter. | |
virtual void | setIntegerFormat (const string &nm) |
Set the default integer format. | |
virtual void | setCommentStyle (const string &nm)=0 |
Set the way comments are displayed in decompiler output. | |
virtual void | docTypeDefinitions (const TypeFactory *typegrp)=0 |
Emit definitions of data-types. | |
virtual void | docAllGlobals (void)=0 |
Emit declarations of global variables. | |
virtual void | docSingleGlobal (const Symbol *sym)=0 |
Emit the declaration for a single (global) Symbol. | |
virtual void | docFunction (const Funcdata *fd)=0 |
Emit the declaration (and body) of a function. | |
virtual void | emitBlockBasic (const BlockBasic *bb)=0 |
Emit statements in a basic block. | |
virtual void | emitBlockGraph (const BlockGraph *bl)=0 |
Emit (an unspecified) list of blocks. | |
virtual void | emitBlockCopy (const BlockCopy *bl)=0 |
Emit a basic block (with any labels). | |
virtual void | emitBlockGoto (const BlockGoto *bl)=0 |
Emit a block ending with a goto statement. | |
virtual void | emitBlockLs (const BlockList *bl)=0 |
Emit a sequence of blocks. | |
virtual void | emitBlockCondition (const BlockCondition *bl)=0 |
Emit a conditional statement. | |
virtual void | emitBlockIf (const BlockIf *bl)=0 |
Emit an if/else style construct. | |
virtual void | emitBlockWhileDo (const BlockWhileDo *bl)=0 |
Emit a loop structure, check at top. | |
virtual void | emitBlockDoWhile (const BlockDoWhile *bl)=0 |
Emit a loop structure, check at bottom. | |
virtual void | emitBlockInfLoop (const BlockInfLoop *bl)=0 |
Emit an infinite loop structure. | |
virtual void | emitBlockSwitch (const BlockSwitch *bl)=0 |
Emit a switch structure. | |
virtual void | opCopy (const PcodeOp *op)=0 |
Emit a COPY operator. | |
virtual void | opLoad (const PcodeOp *op)=0 |
Emit a LOAD operator. | |
virtual void | opStore (const PcodeOp *op)=0 |
Emit a STORE operator. | |
virtual void | opBranch (const PcodeOp *op)=0 |
Emit a BRANCH operator. | |
virtual void | opCbranch (const PcodeOp *op)=0 |
Emit a CBRANCH operator. | |
virtual void | opBranchind (const PcodeOp *op)=0 |
Emit a BRANCHIND operator. | |
virtual void | opCall (const PcodeOp *op)=0 |
Emit a CALL operator. | |
virtual void | opCallind (const PcodeOp *op)=0 |
Emit a CALLIND operator. | |
virtual void | opCallother (const PcodeOp *op)=0 |
Emit a CALLOTHER operator. | |
virtual void | opConstructor (const PcodeOp *op, bool withNew)=0 |
Emit an operator constructing an object. | |
virtual void | opReturn (const PcodeOp *op)=0 |
Emit a RETURN operator. | |
virtual void | opIntEqual (const PcodeOp *op)=0 |
Emit an INT_EQUAL operator. | |
virtual void | opIntNotEqual (const PcodeOp *op)=0 |
Emit an INT_NOTEQUAL operator. | |
virtual void | opIntSless (const PcodeOp *op)=0 |
Emit an INT_SLESS operator. | |
virtual void | opIntSlessEqual (const PcodeOp *op)=0 |
Emit an INT_SLESSEQUAL operator. | |
virtual void | opIntLess (const PcodeOp *op)=0 |
Emit an INT_LESS operator. | |
virtual void | opIntLessEqual (const PcodeOp *op)=0 |
Emit an INT_LESSEQUAL operator. | |
virtual void | opIntZext (const PcodeOp *op, const PcodeOp *readOp)=0 |
Emit an INT_ZEXT operator. | |
virtual void | opIntSext (const PcodeOp *op, const PcodeOp *readOp)=0 |
Emit an INT_SEXT operator. | |
virtual void | opIntAdd (const PcodeOp *op)=0 |
Emit an INT_ADD operator. | |
virtual void | opIntSub (const PcodeOp *op)=0 |
Emit an INT_SUB operator. | |
virtual void | opIntCarry (const PcodeOp *op)=0 |
Emit an INT_CARRY operator. | |
virtual void | opIntScarry (const PcodeOp *op)=0 |
Emit an INT_SCARRY operator. | |
virtual void | opIntSborrow (const PcodeOp *op)=0 |
Emit an INT_SBORROW operator. | |
virtual void | opInt2Comp (const PcodeOp *op)=0 |
Emit an INT_2COMP operator. | |
virtual void | opIntNegate (const PcodeOp *op)=0 |
Emit an INT_NEGATE operator. | |
virtual void | opIntXor (const PcodeOp *op)=0 |
Emit an INT_XOR operator. | |
virtual void | opIntAnd (const PcodeOp *op)=0 |
Emit an INT_AND operator. | |
virtual void | opIntOr (const PcodeOp *op)=0 |
Emit an INT_OR operator. | |
virtual void | opIntLeft (const PcodeOp *op)=0 |
Emit an INT_LEFT operator. | |
virtual void | opIntRight (const PcodeOp *op)=0 |
Emit an INT_RIGHT operator. | |
virtual void | opIntSright (const PcodeOp *op)=0 |
Emit an INT_SRIGHT operator. | |
virtual void | opIntMult (const PcodeOp *op)=0 |
Emit an INT_MULT operator. | |
virtual void | opIntDiv (const PcodeOp *op)=0 |
Emit an INT_DIV operator. | |
virtual void | opIntSdiv (const PcodeOp *op)=0 |
Emit an INT_SDIV operator. | |
virtual void | opIntRem (const PcodeOp *op)=0 |
Emit an INT_REM operator. | |
virtual void | opIntSrem (const PcodeOp *op)=0 |
Emit an INT_SREM operator. | |
virtual void | opBoolNegate (const PcodeOp *op)=0 |
Emit a BOOL_NEGATE operator. | |
virtual void | opBoolXor (const PcodeOp *op)=0 |
Emit a BOOL_XOR operator. | |
virtual void | opBoolAnd (const PcodeOp *op)=0 |
Emit a BOOL_AND operator. | |
virtual void | opBoolOr (const PcodeOp *op)=0 |
Emit a BOOL_OR operator. | |
virtual void | opFloatEqual (const PcodeOp *op)=0 |
Emit a FLOAT_EQUAL operator. | |
virtual void | opFloatNotEqual (const PcodeOp *op)=0 |
Emit a FLOAT_NOTEQUAL operator. | |
virtual void | opFloatLess (const PcodeOp *op)=0 |
Emit a FLOAT_LESS operator. | |
virtual void | opFloatLessEqual (const PcodeOp *op)=0 |
Emit a FLOAT_LESSEQUAL operator. | |
virtual void | opFloatNan (const PcodeOp *op)=0 |
Emit a FLOAT_NAN operator. | |
virtual void | opFloatAdd (const PcodeOp *op)=0 |
Emit a FLOAT_ADD operator. | |
virtual void | opFloatDiv (const PcodeOp *op)=0 |
Emit a FLOAT_DIV operator. | |
virtual void | opFloatMult (const PcodeOp *op)=0 |
Emit a FLOAT_MULT operator. | |
virtual void | opFloatSub (const PcodeOp *op)=0 |
Emit a FLOAT_SUB operator. | |
virtual void | opFloatNeg (const PcodeOp *op)=0 |
Emit a FLOAT_NEG operator. | |
virtual void | opFloatAbs (const PcodeOp *op)=0 |
Emit a FLOAT_ABS operator. | |
virtual void | opFloatSqrt (const PcodeOp *op)=0 |
Emit a FLOAT_SQRT operator. | |
virtual void | opFloatInt2Float (const PcodeOp *op)=0 |
Emit a FLOAT_INT2FLOAT operator. | |
virtual void | opFloatFloat2Float (const PcodeOp *op)=0 |
Emit a FLOAT_FLOAT2FLOAT operator. | |
virtual void | opFloatTrunc (const PcodeOp *op)=0 |
Emit a FLOAT_TRUNC operator. | |
virtual void | opFloatCeil (const PcodeOp *op)=0 |
Emit a FLOAT_CEIL operator. | |
virtual void | opFloatFloor (const PcodeOp *op)=0 |
Emit a FLOAT_FLOOR operator. | |
virtual void | opFloatRound (const PcodeOp *op)=0 |
Emit a FLOAT_ROUND operator. | |
virtual void | opMultiequal (const PcodeOp *op)=0 |
Emit a MULTIEQUAL operator. | |
virtual void | opIndirect (const PcodeOp *op)=0 |
Emit an INDIRECT operator. | |
virtual void | opPiece (const PcodeOp *op)=0 |
Emit a PIECE operator. | |
virtual void | opSubpiece (const PcodeOp *op)=0 |
Emit a SUBPIECE operator. | |
virtual void | opCast (const PcodeOp *op)=0 |
Emit a CAST operator. | |
virtual void | opPtradd (const PcodeOp *op)=0 |
Emit a PTRADD operator. | |
virtual void | opPtrsub (const PcodeOp *op)=0 |
Emit a PTRSUB operator. | |
virtual void | opSegmentOp (const PcodeOp *op)=0 |
Emit a SEGMENTOP operator. | |
virtual void | opCpoolRefOp (const PcodeOp *op)=0 |
Emit a CPOOLREF operator. | |
virtual void | opNewOp (const PcodeOp *op)=0 |
Emit a NEW operator. | |
virtual void | opInsertOp (const PcodeOp *op)=0 |
Emit an INSERT operator. | |
virtual void | opExtractOp (const PcodeOp *op)=0 |
Emit an EXTRACT operator. | |
virtual void | opPopcountOp (const PcodeOp *op)=0 |
Emit a POPCOUNT operator. | |
Static Public Member Functions | |
static int4 | mostNaturalBase (uintb val) |
Determine the most natural base for an integer. | |
static void | formatBinary (ostream &s, uintb val) |
Print a number in binary form. | |
Protected Member Functions | |
void | popScope (void) |
Pop to the previous symbol scope. | |
void | pushOp (const OpToken *tok, const PcodeOp *op) |
Push an operator token onto the RPN stack. | |
void | pushAtom (const Atom &atom) |
Push a variable token onto the RPN stack. | |
void | pushVnImplied (const Varnode *vn, const PcodeOp *op, uint4 m) |
Push an implied variable onto the RPN stack. | |
void | pushVnExplicit (const Varnode *vn, const PcodeOp *op) |
Push an explicit variable onto the RPN stack. | |
void | pushVnLHS (const Varnode *vn, const PcodeOp *op) |
Push a variable as the left-hand side of an expression. | |
bool | parentheses (const OpToken *op2) |
Determine if the given token should be emitted in its own parenthetic expression. | |
void | emitOp (const ReversePolish &entry) |
Send an operator token from the RPN to the emitter. | |
void | emitAtom (const Atom &atom) |
Send a variable token from the RPN to the emitter. | |
bool | escapeCharacterData (ostream &s, const uint1 *buf, int4 count, int4 charsize, bool bigend) const |
Emit a byte buffer to the stream as unicode characters. | |
void | recurse (void) |
Emit from the RPN stack as much as possible. | |
void | opBinary (const OpToken *tok, const PcodeOp *op) |
Push a binary operator onto the RPN stack. | |
void | opUnary (const OpToken *tok, const PcodeOp *op) |
Push a unary operator onto the RPN stack. | |
void | resetDefaultsInternal (void) |
Reset options to default for PrintLanguage. | |
virtual void | printUnicode (ostream &s, int4 onechar) const =0 |
Print a single unicode character as a character constant for the high-level language. | |
virtual void | pushType (const Datatype *ct)=0 |
Push a data-type name onto the RPN expression stack. | |
virtual void | pushConstant (uintb val, const Datatype *ct, const Varnode *vn, const PcodeOp *op)=0 |
Push a constant onto the RPN stack. | |
virtual bool | pushEquate (uintb val, int4 sz, const EquateSymbol *sym, const Varnode *vn, const PcodeOp *op)=0 |
Push a constant marked up by an EquateSymbol onto the RPN stack. | |
virtual void | pushAnnotation (const Varnode *vn, const PcodeOp *op)=0 |
Push an address which is not in the normal data-flow. | |
virtual void | pushSymbol (const Symbol *sym, const Varnode *vn, const PcodeOp *op)=0 |
Push a specific Symbol onto the RPN stack. | |
virtual void | pushUnnamedLocation (const Address &addr, const Varnode *vn, const PcodeOp *op)=0 |
Push an address as a substitute for a Symbol onto the RPN stack. | |
virtual void | pushPartialSymbol (const Symbol *sym, int4 off, int4 sz, const Varnode *vn, const PcodeOp *op, Datatype *outtype)=0 |
Push a variable that represents only part of a symbol onto the RPN stack. | |
virtual void | pushMismatchSymbol (const Symbol *sym, int4 off, int4 sz, const Varnode *vn, const PcodeOp *op)=0 |
Push an identifier for a variable that mismatches with its Symbol. | |
virtual void | emitLineComment (int4 indent, const Comment *comm) |
Emit a comment line. | |
virtual void | emitVarDecl (const Symbol *sym)=0 |
Emit a variable declaration. | |
virtual void | emitVarDeclStatement (const Symbol *sym)=0 |
Emit a variable declaration statement. | |
virtual bool | emitScopeVarDecls (const Scope *scope, int4 cat)=0 |
Emit all the variable declarations for a given scope. | |
virtual void | emitExpression (const PcodeOp *op)=0 |
Emit a full expression. | |
virtual void | emitFunctionDeclaration (const Funcdata *fd)=0 |
Emit a function declaration. | |
virtual bool | checkPrintNegation (const Varnode *vn)=0 |
Check whether a given boolean Varnode can be printed in negated form. | |
Static Protected Member Functions | |
static bool | unicodeNeedsEscape (int4 codepoint) |
Determine if the given codepoint needs to be escaped. | |
Protected Attributes | |
Architecture * | glb |
The Architecture owning the language emitter. | |
const Scope * | curscope |
The current symbol scope. | |
CastStrategy * | castStrategy |
The strategy for emitting explicit cast operations. | |
EmitXml * | emit |
The low-level token emitter. | |
uint4 | mods |
Currently active printing modifications. | |
uint4 | instr_comment_type |
Type of instruction comments to display. | |
uint4 | head_comment_type |
Type of header comments to display. | |
namespace_strategy | namespc_strategy |
How namespace tokens should be displayed. | |
The base class API for emitting a high-level language.
Instances of this object are responsible for converting a function's (transformed) data-flow graph into the final stream of tokens of a high-level source code language. There are a few main entry points, including docTypeDefinitions(), docAllGlobals(), docSingleGlobal(), and docFunction().
The system is responsible for printing:
As part of all this printing, the system is also responsible for
To accomplish this, the API is broken up into three sections. The first section consists of the main entry-point 'doc' methods. The second section consists of the 'emit' methods, which are responsible for printing a representation of a particular high-level code construct. The third section consists of the 'push' and 'op' methods, which are responsible for walking expression trees. The order in which tokens are emitted for an expression is determined by a Reverse Polish Notation (RPN) stack, which the 'push' methods manipulate. Operators and variables are pushed onto this stack and are ultimately emitted in the correct order.
The base class provides a generic printing modifications stack and a symbol scope stack to provide a printing context mechanism for derived classes.
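The interplay between pushOp(), pushAtom() and the RPN stack is easier to see in a toy model. The sketch below is a hypothetical simplification, not Ghidra code: ToyOpToken, ToyEntry and ToyPrinter are invented names, output goes to a plain string instead of an EmitXml emitter, and only prefix unary and left-associative binary operators are modeled. What it shares with the description above is the order of events: an operator is pushed before its inputs, each token is written as soon as its position is known, and an entry leaves the stack once all of its inputs have been emitted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical operator token: printed text, binding strength, number of inputs.
struct ToyOpToken {
  std::string print;
  int precedence;  // higher binds tighter
  int arity;       // 1 = prefix unary, 2 = infix binary
};

// Hypothetical RPN entry: the operator, how many of its inputs are finished,
// and whether its sub-expression was opened with a parenthesis.
struct ToyEntry {
  const ToyOpToken *tok;
  int visited;
  bool paren;
};

class ToyPrinter {
  std::vector<ToyEntry> revpol;  // the RPN stack
  std::string out;               // stands in for the low-level emitter

  // Does an operator, pushed as the next input of the entry on top of the
  // stack, need its own parentheses? (Left-associative binary operators only.)
  bool needsParen(const ToyOpToken *tok) const {
    const ToyEntry &parent = revpol.back();
    if (tok->precedence < parent.tok->precedence) return true;
    // Equal precedence in the right-hand slot: a - (b - c)
    return tok->precedence == parent.tok->precedence && parent.visited == 1;
  }

 public:
  void pushOp(const ToyOpToken *tok) {
    bool paren = !revpol.empty() && needsParen(tok);
    if (paren) out += "(";
    if (tok->arity == 1) out += tok->print;  // prefix operator is written immediately
    revpol.push_back({tok, 0, paren});
  }

  void pushAtom(const std::string &name) {
    out += name;
    // An input just finished: advance the operators above it on the stack.
    while (!revpol.empty()) {
      ToyEntry &top = revpol.back();
      top.visited += 1;
      if (top.visited < top.tok->arity) {  // binary op now waits for its right input
        out += " " + top.tok->print + " ";
        return;
      }
      if (top.paren) out += ")";
      revpol.pop_back();  // finished sub-expression is an input of the next entry
    }
  }

  const std::string &result() const { return out; }
};
```

With `*` at precedence 10 and `+` at 9, pushing `*`, `a`, `+`, `b`, `c` in that order yields `a * (b + c)`, while pushing `*`, `+`, `a`, `b`, `c` yields `(a + b) * c`. The parenthesis decision is made when an operator is pushed, because at that moment both the parent operator and the input slot being filled are known.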
enum PrintLanguage::modifiers
Possible context-sensitive modifiers to how tokens get emitted.
enum PrintLanguage::tagtype
Possible types of Atom.
PrintLanguage::PrintLanguage(Architecture *g, const string &nm)
g | is the Architecture that owns and will use this PrintLanguage |
nm | is the formal name of the language |
virtual bool PrintLanguage::checkPrintNegation(const Varnode *vn) [protected, pure virtual]
Check whether a given boolean Varnode can be printed in negated form.
In many situations a boolean value can be inverted by flipping the operator token producing it to a complementary token.
vn | is the given boolean Varnode |
Implemented in PrintC.
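The token-flipping idea can be illustrated with a string-level sketch. The complementToken and printNegated helpers below are hypothetical; they work on operator strings rather than Varnodes and PcodeOps, and they assume integer comparisons. For floating-point operands the flip is not generally safe, since `!(a < b)` and `a >= b` differ when either operand is NaN.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical table: each comparison token mapped to the token that prints
// the logical negation of the same comparison (integer semantics assumed).
std::string complementToken(const std::string &tok) {
  static const std::map<std::string, std::string> table = {
      {"==", "!="}, {"!=", "=="}, {"<", ">="},
      {">=", "<"},  {">", "<="},  {"<=", ">"},
  };
  auto it = table.find(tok);
  return it == table.end() ? std::string() : it->second;
}

// Print the negation of "lhs tok rhs": flip the operator token when a
// complement exists, otherwise fall back to an explicit !( ... ).
std::string printNegated(const std::string &lhs, const std::string &tok,
                         const std::string &rhs) {
  std::string comp = complementToken(tok);
  if (!comp.empty()) return lhs + " " + comp + " " + rhs;
  return "!(" + lhs + " " + tok + " " + rhs + ")";
}
```

So `printNegated("a", "<", "b")` gives `a >= b`, while an operator with no single-token complement, such as `&&`, falls back to `!(p && q)`.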
virtual void PrintLanguage::docTypeDefinitions(const TypeFactory *typegrp) [pure virtual]
Emit definitions of data-types.
typegrp | is the container for the data-types that should be defined |
Implemented in PrintC.
virtual void PrintLanguage::emitLineComment(int4 indent, const Comment *comm) [protected, virtual]
Emit a comment line.
The comment is emitted as a single line, using the high-level language's comment delimiters, at the given indent level.
indent | is the number of characters to indent |
comm | is the Comment object containing the character data and associated markup info |
void PrintLanguage::emitOp(const ReversePolish &entry) [protected]
Send an operator token from the RPN to the emitter.
An OpToken directly from the RPN is sent to the low-level emitter, resolving any final spacing or parentheses.
entry | is the RPN entry to be emitted |
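bool PrintLanguage::escapeCharacterData(ostream &s, const uint1 *buf, int4 count, int4 charsize, bool bigend) const [protected]
Emit a byte buffer to the stream as unicode characters.
Characters are emitted until a terminator character is reached or count bytes have been consumed.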
s | is the output stream |
buf | is the byte buffer |
count | is the maximum number of bytes to consume |
charsize | is 1 for UTF8, 2 for UTF16, or 4 for UTF32 |
bigend | is true for a big endian encoding of UTF elements |
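The byte-level part of this job can be sketched as a decoder that assembles code units of charsize bytes in the requested byte order and stops at a zero character or after count bytes. This is a hypothetical helper (decodeChars and readUnit are invented names), not the Ghidra routine: it only collects code points and does no escaping or printing, malformed sequences become U+FFFD, and overlong UTF-8 forms are not rejected.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assemble one code unit of `size` bytes starting at buf[pos], honoring byte order.
static uint32_t readUnit(const uint8_t *buf, int pos, int size, bool bigend) {
  uint32_t val = 0;
  for (int i = 0; i < size; ++i) {
    int idx = bigend ? i : size - 1 - i;  // visit the most significant byte first
    val = (val << 8) | buf[pos + idx];
  }
  return val;
}

// Hypothetical decoder: collect code points until a zero (terminator)
// character is found or `count` bytes have been consumed.
std::vector<uint32_t> decodeChars(const uint8_t *buf, int count, int charsize, bool bigend) {
  assert(charsize == 1 || charsize == 2 || charsize == 4);
  std::vector<uint32_t> result;
  int pos = 0;
  while (pos + charsize <= count) {
    uint32_t cp = readUnit(buf, pos, charsize, bigend);
    pos += charsize;
    if (charsize == 1 && cp >= 0x80) {  // UTF-8 multi-byte sequence
      int extra = cp >= 0xF8 ? -1 : cp >= 0xF0 ? 3 : cp >= 0xE0 ? 2 : cp >= 0xC0 ? 1 : -1;
      if (extra < 0) {  // stray continuation byte or invalid lead byte
        result.push_back(0xFFFD);
        continue;
      }
      cp &= 0x3Fu >> extra;  // keep the payload bits of the lead byte
      for (int k = 0; k < extra; ++k) {
        if (pos >= count || (buf[pos] & 0xC0) != 0x80) {  // truncated sequence
          cp = 0xFFFD;
          break;
        }
        cp = (cp << 6) | (buf[pos] & 0x3F);
        pos += 1;
      }
    } else if (charsize == 2 && cp >= 0xD800 && cp < 0xE000) {  // UTF-16 surrogate
      uint32_t lo = (pos + 2 <= count) ? readUnit(buf, pos, 2, bigend) : 0;
      if (cp < 0xDC00 && lo >= 0xDC00 && lo < 0xE000) {  // valid high/low pair
        cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        pos += 2;
      } else {
        cp = 0xFFFD;  // unpaired surrogate
      }
    }
    if (cp == 0) break;  // terminator character
    result.push_back(cp);
  }
  return result;
}
```

A printer built on this would then decide, code point by code point, whether to write the character literally or as an escape sequence.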
void PrintLanguage::formatBinary(ostream &s, uintb val) [static]
Print a number in binary form.
Print a string of '0' and '1' characters representing the given value.
s | is the output stream |
val | is the given value |
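A minimal sketch of this formatting as a free function, with uint64_t standing in for uintb. The names formatBinarySketch and toBinaryString are hypothetical, and zero-padding the output to 8, 16, 32 or 64 digits is an assumption made for readability; the description above only states that '0' and '1' characters are printed.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Sketch of formatBinary(): write val as a run of '0'/'1' characters.
// Assumption: the digits are zero-padded to the smallest of 8, 16, 32 or 64
// bits that can hold val; zero prints as a single '0'.
void formatBinarySketch(std::ostream &s, uint64_t val) {
  if (val == 0) {
    s << '0';
    return;
  }
  int msb = 63;
  while (((val >> msb) & 1) == 0) --msb;  // position of the highest set bit
  assert(msb >= 0);
  int width = msb < 8 ? 8 : msb < 16 ? 16 : msb < 32 ? 32 : 64;
  for (int bit = width - 1; bit >= 0; --bit)
    s << (((val >> bit) & 1) ? '1' : '0');
}

// Convenience wrapper returning the text instead of streaming it.
std::string toBinaryString(uint64_t val) {
  std::ostringstream ss;
  formatBinarySketch(ss, val);
  return ss.str();
}
```

Under this padding rule, toBinaryString(5) gives 00000101 and toBinaryString(0x1ff) gives 0000000111111111.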
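int4 PrintLanguage::mostNaturalBase(uintb val) [static]
Determine the most natural base for an integer.
Count the '0' and '9' digits of the value written in base 10, and the '0' and 'f' digits of the value written in base 16. The base with the higher count is the preferred base.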
val | is the given integer |
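A literal reading of this heuristic, as a standalone sketch: count the '0' and '9' digits of the decimal form and the '0' and 'f' digits of the hexadecimal form, then pick the base with more of them. naturalBaseSketch and countDigits are hypothetical names, uint64_t stands in for uintb, and the tie-breaking rule (prefer base 10) is an assumption; Ghidra's own implementation may weigh the digits differently.

```cpp
#include <cassert>
#include <cstdint>

// Count the digits of val, written in the given base, that equal a or b.
static int countDigits(uint64_t val, uint64_t base, uint64_t a, uint64_t b) {
  int count = 0;
  for (; val != 0; val /= base) {
    uint64_t dig = val % base;
    if (dig == a || dig == b) count += 1;
  }
  return count;
}

// Return 16 if val has more "round" hex digits ('0'/'f') than "round"
// decimal digits ('0'/'9'); otherwise (including ties and val == 0) return 10.
int naturalBaseSketch(uint64_t val) {
  int decCount = countDigits(val, 10, 0, 9);
  int hexCount = countDigits(val, 16, 0, 0xf);
  return hexCount > decCount ? 16 : 10;
}
```

For instance, 1000 has three '0' digits in decimal but no '0' or 'f' digits in hex (0x3e8), so decimal wins; 65280 (0xff00) has four qualifying hex digits against a single decimal '0', so hexadecimal wins.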
void PrintLanguage::opBinary(const OpToken *tok, const PcodeOp *op) [protected]
Push a binary operator onto the RPN stack.
Push an operator onto the stack that has a normal binary format. Both of its input expressions are also pushed.
tok | is the operator token to push |
op | is the associated PcodeOp |
void PrintLanguage::opUnary(const OpToken *tok, const PcodeOp *op) [protected]
Push a unary operator onto the RPN stack.
Push an operator onto the stack that has a normal unary format. Its input expression is also pushed.
tok | is the operator token to push |
op | is the associated PcodeOp |
bool PrintLanguage::parentheses(const OpToken *op2) [protected]
Determine if the given token should be emitted in its own parenthetic expression.
The token at the top of the stack is being emitted. Check if its input expression, ending with the given operator token, needs to be surrounded by parentheses to convey the proper meaning.
op2 | is the input token to this operator |
virtual void PrintLanguage::printUnicode(ostream &s, int4 onechar) const [protected, pure virtual]
Print a single unicode character as a character constant for the high-level language.
For most languages, this prints the character surrounded by single quotes.
s | is the output stream |
onechar | is the unicode code point of the character to print |
Implemented in PrintC.
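For a C-like language, the single-quote convention might look like the sketch below. It is hypothetical and deliberately simplified (printCharConstant and charConstant are invented names): it uses the familiar C escapes for a few characters, writes other code points below U+00A0 that are not printable ASCII as `\xNN`, and writes everything else as `\uNNNN` or `\UNNNNNNNN` instead of consulting a full escape classification or emitting UTF-8.

```cpp
#include <cassert>
#include <cstdio>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical C-style printer for one character constant.
void printCharConstant(std::ostream &s, int onechar) {
  assert(onechar >= 0);
  char buf[16];
  s << '\'';
  switch (onechar) {
    case '\n': s << "\\n"; break;
    case '\t': s << "\\t"; break;
    case '\r': s << "\\r"; break;
    case '\0': s << "\\0"; break;
    case '\\': s << "\\\\"; break;
    case '\'': s << "\\'"; break;
    default:
      if (onechar >= 0x20 && onechar < 0x7f) {
        s << static_cast<char>(onechar);  // printable ASCII as-is
      } else if (onechar < 0xa0) {
        // C forbids \u escapes below U+00A0 (other than $, @, `), so use \x
        std::snprintf(buf, sizeof(buf), "\\x%02x", static_cast<unsigned>(onechar));
        s << buf;
      } else if (onechar <= 0xffff) {
        std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(onechar));
        s << buf;
      } else {
        std::snprintf(buf, sizeof(buf), "\\U%08x", static_cast<unsigned>(onechar));
        s << buf;
      }
      break;
  }
  s << '\'';
}

// Convenience wrapper returning the constant as a string.
std::string charConstant(int onechar) {
  std::ostringstream ss;
  printCharConstant(ss, onechar);
  return ss.str();
}
```

With this sketch, 'A' prints as `'A'`, a newline as `'\n'`, U+00E9 as `'\u00e9'` and U+1F600 as `'\U0001f600'`.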
void PrintLanguage::pushAtom(const Atom &atom) [protected]
Push a variable token onto the RPN stack.
Push a single token (an Atom) onto the RPN stack. This may trigger some amount of the RPN stack to get emitted, depending on what was pushed previously. The 'emit' routines are called, popping off as much as possible.
atom | is the token to be pushed |
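virtual void PrintLanguage::pushConstant(uintb val, const Datatype *ct, const Varnode *vn, const PcodeOp *op) [protected, pure virtual]
Push a constant onto the RPN stack.
The value is ultimately emitted based on its data-type and other associated mark-up.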
val | is the value of the constant |
ct | is the data-type of the constant |
vn | is the Varnode holding the constant (optional) |
op | is the PcodeOp using the constant (optional) |
Implemented in PrintC.
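virtual bool PrintLanguage::pushEquate(uintb val, int4 sz, const EquateSymbol *sym, const Varnode *vn, const PcodeOp *op) [protected, pure virtual]
Push a constant marked up by an EquateSymbol onto the RPN stack.
The equate may substitute a name or force a conversion for the constant.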
val | is the value of the constant |
sz | is the number of bytes to use for the encoding |
sym | is the EquateSymbol that marks up the constant |
vn | is the Varnode holding the constant (optional) |
op | is the PcodeOp using the constant (optional) |
Implemented in PrintC.
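virtual void PrintLanguage::pushMismatchSymbol(const Symbol *sym, int4 off, int4 sz, const Varnode *vn, const PcodeOp *op) [protected, pure virtual]
Push an identifier for a variable that mismatches with its Symbol.
This happens when a Varnode overlaps, but is not contained by, a Symbol. This most commonly occurs when the size of the Symbol is unknown.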
sym | is the overlapped symbol |
off | is the byte offset of the variable relative to the symbol |
sz | is the size of the variable in bytes |
vn | is the Varnode representing the variable |
op | is a PcodeOp associated with the Varnode |
Implemented in PrintC.
void PrintLanguage::pushOp(const OpToken *tok, const PcodeOp *op) [protected]
Push an operator token onto the RPN stack.
This generally will recursively push an entire expression onto the RPN stack, up to Varnode objects marked as explicit, and will decide token order and parenthesis placement. As the ordering gets resolved, some amount of the expression may get emitted.
tok | is the operator token to push |
op | is the PcodeOp associated with the token |
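virtual void PrintLanguage::pushPartialSymbol(const Symbol *sym, int4 off, int4 sz, const Varnode *vn, const PcodeOp *op, Datatype *outtype) [protected, pure virtual]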
Push a variable that represents only part of a symbol onto the RPN stack.
Generally member syntax specifying a field within a structure gets emitted.
sym | is the root Symbol |
off | is the byte offset, within the Symbol, of the partial variable |
sz | is the number of bytes in the partial variable |
vn | is the Varnode holding the partial value |
op | is a PcodeOp associated with the Varnode |
outtype | is the data-type expected by the expression using the partial variable |
Implemented in PrintC.
virtual void PrintLanguage::pushType(const Datatype *ct) [protected, pure virtual]
Push a data-type name onto the RPN expression stack.
The data-type is generally emitted as if for a cast.
ct | is the data-type to push |
Implemented in PrintC.
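virtual void PrintLanguage::pushUnnamedLocation(const Address &addr, const Varnode *vn, const PcodeOp *op) [protected, pure virtual]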
Push an address as a substitute for a Symbol onto the RPN stack.
If there is no Symbol or other name source for an explicit variable, this method is used to print something to represent the variable based on its storage address.
addr | is the storage address |
vn | is the Varnode representing the variable (if present) |
op | is a PcodeOp associated with the variable |
Implemented in PrintC.
void PrintLanguage::pushVnLHS(const Varnode *vn, const PcodeOp *op) [protected]
Push a variable as the left-hand side of an expression.
The given Varnode will ultimately be emitted as an explicit variable on the left-hand side of an assignment statement. As with pushVnExplicit(), this method decides how the Varnode will be emitted and pushes the resulting Atom onto the RPN stack.
void PrintLanguage::recurse(void) [protected]
Emit from the RPN stack as much as possible.
Any complete sub-expressions that are still on the RPN will get emitted.
void PrintLanguage::setCommentDelimeter(const string &start, const string &stop, bool usecommentfill)
Establish comment delimiters for the language.
By default, comments are indicated in the high-level language by preceding them with a specific sequence of delimiter characters, and optionally by ending the comment with another set of delimiter characters.
start | is the initial sequence of characters delimiting a comment |
stop | if not empty, is the sequence delimiting the end of the comment |
usecommentfill | is true if the delimiter needs to be emitted after every line break |
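How the three parameters interact can be sketched with a small formatter. renderComment is a hypothetical helper, not part of the API: it indents each line, writes start before the first line (and before every continuation line when usecommentfill is true), and writes stop after the last line if stop is non-empty. Placing stop once at the end and aligning unfilled continuation lines under the first line's text are assumptions about the layout.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical layout of a multi-line comment from setCommentDelimeter()-style settings.
// Assumptions: stop is written once after the last line, and continuation lines
// without a fill delimiter are padded so their text lines up with the first line.
std::string renderComment(const std::vector<std::string> &lines, int indent,
                          const std::string &start, const std::string &stop,
                          bool usecommentfill) {
  assert(indent >= 0);
  const std::string pad(static_cast<std::size_t>(indent), ' ');
  std::string out;
  for (std::size_t i = 0; i < lines.size(); ++i) {
    out += pad;
    if (i == 0 || usecommentfill)
      out += start;                           // opening (or repeated) delimiter
    else
      out += std::string(start.size(), ' ');  // align with the first line's text
    out += lines[i];
    if (i + 1 == lines.size() && !stop.empty())
      out += stop;                            // closing delimiter after the last line
    out += '\n';
  }
  return out;
}
```

With C-style settings ("/* ", " */", false) the delimiters bracket the whole block once. With C++-style settings ("// ", "", true) every line carries its own "// ".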
virtual void PrintLanguage::setCommentStyle(const string &nm) [pure virtual]
Set the way comments are displayed in decompiler output.
This method can be given either a formal name or a sample of the initial delimiter; it then chooses from among the schemes it knows.
nm | is the configuration description |
Implemented in PrintC.
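A sketch of that selection step for a C-like language, producing the arguments that would be passed to setCommentDelimeter(). The accepted names "c" and "cplusplus", the delimiter strings, and the CommentDelims and chooseCommentStyle names are assumptions modeled on a C printer. They are not a statement of what PrintC actually accepts.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Delimiter settings in the shape of setCommentDelimeter()'s parameters.
struct CommentDelims {
  std::string start;
  std::string stop;
  bool usecommentfill;
};

// Accept a formal style name, or a sample that begins with the start delimiter.
CommentDelims chooseCommentStyle(const std::string &nm) {
  if (nm == "c" || nm.compare(0, 2, "/*") == 0)
    return {"/* ", " */", false};
  if (nm == "cplusplus" || nm.compare(0, 2, "//") == 0)
    return {"// ", "", true};
  throw std::invalid_argument("Unknown comment style: " + nm);
}
```

Matching on the first two characters is what lets a sample such as "/* like this */" stand in for the formal name.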
void PrintLanguage::setFlat(bool val)
Set whether nesting code structure should be emitted.
Emitting formal code structuring can be turned off, causing all control-flow to be represented as goto statements and labels.
val | is true if no code structuring should be emitted |
virtual void PrintLanguage::setIntegerFormat(const string &nm) [virtual]
Set the default integer format.
This determines how integers are displayed by default. Possible values are "hex" and "dec" to force a given format, or "best" can be used to let the decompiler select what it thinks best for each individual integer.
nm | is "hex", "dec", or "best" |
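Given the force_hex, force_dec and bestfit values listed in the modifiers enum, this option can be pictured as selecting one of those bits in the `mods` word while leaving other modifiers alone. This is a hedged sketch: applyIntegerFormat is a hypothetical free function, the enum subset is re-declared so the block compiles on its own, mapping "best" to the bestfit bit is an assumption (the real method might instead just clear the forcing bits), and throwing on an unknown name is also assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Subset of PrintLanguage::modifiers, re-declared with the documented values.
enum Modifiers : uint32_t {
  force_hex = 1,
  force_dec = 2,
  bestfit = 4,
};

// Hypothetical: clear the integer-format bits in mods, then set the requested one.
uint32_t applyIntegerFormat(uint32_t mods, const std::string &nm) {
  uint32_t bit;
  if (nm == "hex")
    bit = force_hex;
  else if (nm == "dec")
    bit = force_dec;
  else if (nm == "best")
    bit = bestfit;
  else
    throw std::invalid_argument("Unknown integer format: " + nm);
  return (mods & ~uint32_t(force_hex | force_dec | bestfit)) | bit;
}
```

Because only the three format bits are cleared, unrelated modifiers such as `flat` (0x400) survive a change of integer format.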
void PrintLanguage::setLineCommentIndent(int4 val)
Set the number of characters to indent comment lines.
val | is the number of characters |
void PrintLanguage::setXML(bool val)
Set whether the low-level emitter emits XML markup.
Tell the emitter whether to emit just the raw tokens or if output is in XML format with additional mark-up on the raw tokens.
val | is true for XML mark-up |
bool PrintLanguage::unicodeNeedsEscape(int4 codepoint) [static, protected]
Determine if the given codepoint needs to be escaped.
Separate unicode characters that can be clearly emitted in a source code string (letters, numbers, punctuation, symbols) from characters that are better represented in source code with an escape sequence (control characters, unusual spaces, separators, private use characters, etc.).
codepoint | is the given unicode codepoint to categorize. |
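A rough approximation of this classification using hard-coded code point ranges. The ranges are an assumption chosen for illustration: C0 and C1 controls, DEL, a few unusual spaces and separators, the byte order mark, surrogates, private use areas, and noncharacters. A faithful implementation would follow the Unicode general categories, and needsEscapeSketch is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical approximation: true if the code point is better written as an
// escape sequence than emitted literally inside a source code string.
bool needsEscapeSketch(int cp) {
  if (cp < 0x20 || (cp >= 0x7f && cp <= 0x9f)) return true;    // C0/C1 controls, DEL
  if (cp == 0xa0 || cp == 0x2028 || cp == 0x2029) return true;  // NBSP, line/paragraph separators
  if (cp >= 0x2000 && cp <= 0x200f) return true;                // unusual spaces, zero-width and direction marks
  if (cp == 0xfeff) return true;                                // zero-width no-break space (BOM)
  if (cp >= 0xd800 && cp <= 0xdfff) return true;                // surrogates
  if (cp >= 0xe000 && cp <= 0xf8ff) return true;                // private use area
  if (cp >= 0xf0000) return true;                               // supplementary private use planes and beyond
  if ((cp & 0xfffe) == 0xfffe) return true;                     // noncharacters U+xxFFFE / U+xxFFFF
  return false;
}
```

Under this approximation, letters such as 'A' and U+00E9 and emoji such as U+1F600 are emitted literally, while a newline, U+2028 or a private use character is escaped.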