Legal part
This package includes the source code of a 32-bit Disassembler and a 32-bit single-line Assembler for 80x86-compatible processors. The source is a slightly stripped version of the code used in OllyDbg v1.04 and is well proven by its numerous users. (If you haven't heard of it before, OllyDbg is a 32-bit assembler-level debugger with powerful analyzing capabilities that make binary machine code understandable).
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License (http://www.fsf.org/copyleft/gpl.html) for more details.
You should have received a copy of the GNU General Public License (gpl.txt) along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
All brand names and product names used in 80x86 Assembler and Disassembler, accompanying files or in this help file are trademarks, registered trademarks, or trade names of their respective holders.
Introduction
Disassembler understands all standard 80x86 commands, FPU, MMX, AMD's MMX extensions, Athlon/PIII MMX extensions and 3DNow! instructions. It does not decode SSE or SSE2 commands. Disassembler assumes 32-bit code and data segments but correctly decodes prefixed 16-bit commands. Several decoding modes let you select the amount of returned information (the more information, the slower the execution): command length only, basic information useful for code analysis, or full decoding with dump and assembler form. Multiple options select the desired format. Disassembler and Assembler support both MASM and Borland's IDEAL modes.
Assembler converts a single command from ASCII form to binary code. It can find several possible encodings of the same command, and can even create search patterns with undefined operands.
This package includes the following files:
- disasm.h - common definitions
- disasm.c - Disassembler
- assembl.c - Assembler
- asmserv.c - table of commands and service functions
- main.c - demo program
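To use the package, include disasm.h in every source file that calls Disassembler or Assembler. Exactly one file, usually the main one, must define MAINPROG before the include:
#define MAINPROG // Place all unique variables here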
#include "disasm.h"
(I use this trick to define shared global variables). Below is a small piece of code disassembled with OllyDbg 1.04 using different text settings:
004505B3  A1 DC464B00       MOV EAX,DS:[4B46DC]
004505B8  8B0498            MOV EAX,DS:[EAX+EBX*4]
004505BB  50                PUSH EAX
004505BC  8D85 E0FBFFFF     LEA EAX,SS:[EBP-420]
004505C2  50                PUSH EAX
004505C3  E8 141BFCFF       CALL 004120DC
004505C8  83C4 08           ADD ESP,8
004505CB  43                INC EBX
004505CC  3B1D D8464B00     CMP EBX,DS:[4B46D8]
004505D2  0F8C AFFEFFFF     JL 00450487
004505D8  80BD E0FDFFFF 00  CMP BYTE PTR SS:[EBP-220],0
004505DF  75 14             JNZ SHORT 004505F5
004505E1  68 B39E4600       PUSH 469EB3
004505E6  8D85 E0FDFFFF     LEA EAX,SS:[EBP-220]
004505EC  50                PUSH EAX
004505ED  E8 521BFCFF       CALL 00412144

004505B3  A1 DC464B00       mov eax,[dword ds:4B46DC]
004505B8  8B0498            mov eax,[dword ds:eax+ebx*4]
004505BB  50                push eax
004505BC  8D85 E0FBFFFF     lea eax,[dword ss:ebp-420]
004505C2  50                push eax
004505C3  E8 141BFCFF       call 004120DC
004505C8  83C4 08           add esp,8
004505CB  43                inc ebx
004505CC  3B1D D8464B00     cmp ebx,[dword ds:4B46D8]
004505D2  0F8C AFFEFFFF     jl 00450487
004505D8  80BD E0FDFFFF 00  cmp [byte ss:ebp-220],0
004505DF  75 14             jnz short 004505F5
004505E1  68 B39E4600       push 469EB3
004505E6  8D85 E0FDFFFF     lea eax,[dword ss:ebp-220]
004505EC  50                push eax
004505ED  E8 521BFCFF       call 00412144
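The MAINPROG trick mentioned above can be sketched as follows. This is only an illustration of the general idiom, not a copy of disasm.h: the macro name unique and the two option variables are assumptions made for this example, and the header part is shown inline so that the fragment compiles on its own.

```c
#include <assert.h>

/* In a real project the part between the two dashed lines would live in a
   shared header, and MAINPROG would be defined in exactly one .c file. */
#define MAINPROG

/* ------------------------------------------------------------------ */
#ifdef MAINPROG
  #define unique            /* Main file: lines below become definitions */
#else
  #define unique extern     /* Other files: lines below only declare */
#endif

unique int ideal;           /* Hypothetical shared option */
unique int lowercase;       /* Hypothetical shared option */
/* ------------------------------------------------------------------ */

/* Any file that includes the header can read or change the shared options.
   Variables with static storage start at zero, so options are off by default. */
void use_ideal_lowercase(void) {
  assert(ideal == 0 || ideal == 1);    /* Options are simple on/off switches */
  ideal = 1;
  lowercase = 1;
}
```

The benefit of the idiom is that each shared variable is written down once: the same line serves as the definition in the main file and as an extern declaration everywhere else.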
Brief description of functions
- int Assemble(char *cmd,ulong ip,t_asmmodel *model,int attempt,int constsize,char *errtext) - assembles text command to binary code;
- int Checkcondition(int code,ulong flags) - checks whether flags meet the condition in the command;
- int Decodeaddress(ulong addr,char *symb,int nsymb,char *comment) - user-supplied function that decodes addresses into symbolic names;
- ulong Disasm(char *src,ulong srcsize,ulong srcip,t_disasm *disasm,int disasmmode) - determines length of the binary command or disassembles it to the text;
- ulong Disassembleback(char *block,ulong base,ulong size,ulong ip,int n) - walks binary code backward;
- ulong Disassembleforward(char *block,ulong base,ulong size,ulong ip,int n) - walks binary code forward;
- int Isfilling(ulong addr,char *data,ulong size,ulong align) - determines whether command is equivalent to NOP;
- int Print3dnow(char *s,char *f) - converts 3DNow! constant to text without triggering FPU exception for invalid operands;
- int Printfloat10(char *s,long double ext) - converts 10-byte floating constant to text without causing exception;
- int Printfloat4(char *s,float f) - converts 4-byte floating constant to text without causing exception;
- int Printfloat8(char *s,double d) - converts 8-byte floating constant to text without causing exception.
Assemble
Function Assemble(), as expected, converts command from ASCII form to binary 32 bit code. It shares command table with Disasm(), so if some command can be disassembled, it can be assembled back too, with one exception: Assemble doesn't support 16 bit addresses. With some unimportant exceptions, 16 bit addresses cannot be used in Win32 programs.
Some commands have more than one encoding. Assemble() allows you to find them all. This is important, for example, if you want to find the shortest possible code or to find all possible occurrences of this command in the code. Two parameters control the search. constsize selects the size of the immediate constant and address constant (8 or 32 bits); attempt is the occurrence of the command in the command table. To find all variants, call Assemble() with attempt=0,1,2... and for each attempt with constsize=0,1,2,3, continuing as long as the function reports success for at least one constsize. Generated codes may repeat. Please note that if the command uses memory addresses, only one form will be generated in each case: [EAX*2] but not [EAX+EAX]; [EBX+EAX] but not [EAX+EBX]; [EAX] will not use SIB byte; no DS: prefix and so on.
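The search loop described above can be sketched like this. It is a hedged illustration, not code from the package: shortest_encoding() and fake_assemble() are names invented here, the assembler is passed in as a function pointer so that the fragment compiles without the package, MAXCMDSIZE is an assumed value, and t_asmmodel is repeated from the description below. fake_assemble() is a stand-in that simply pretends two table entries exist; in real use you would pass Assemble itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifndef MAXCMDSIZE
#define MAXCMDSIZE 16              /* Assumed; the real value is in disasm.h */
#endif

typedef struct t_asmmodel {        /* As described for Assemble() */
  char code[MAXCMDSIZE];
  char mask[MAXCMDSIZE];
  int length;
  int jmpsize;
  int jmpoffset;
  int jmppos;
} t_asmmodel;

/* Same parameter list as Assemble(), with ulong spelled out. */
typedef int (*assemble_fn)(char *cmd, unsigned long ip, t_asmmodel *model,
                           int attempt, int constsize, char *errtext);

/* Tries attempt=0,1,2... and, for each attempt, constsize=0..3. Stops at the
   first attempt for which no constsize succeeds. Copies the shortest encoding
   to *best and returns its length, or 0 if nothing could be assembled. */
int shortest_encoding(assemble_fn assemble, char *cmd, unsigned long ip,
                      t_asmmodel *best, char *errtext) {
  int attempt, constsize, n, success, bestlen = 0;
  t_asmmodel model;
  assert(assemble != NULL && best != NULL);
  for (attempt = 0; ; attempt++) {
    success = 0;
    for (constsize = 0; constsize < 4; constsize++) {
      n = assemble(cmd, ip, &model, attempt, constsize, errtext);
      if (n <= 0) continue;        /* This variant does not exist */
      success = 1;
      if (bestlen == 0 || n < bestlen) { bestlen = n; *best = model; }
    }
    if (!success) break;           /* No more entries in the command table */
  }
  return bestlen;
}

/* Hypothetical stand-in for Assemble(): two table entries, lengths invented. */
int fake_assemble(char *cmd, unsigned long ip, t_asmmodel *model,
                  int attempt, int constsize, char *errtext) {
  static const int length[2][4] = { { 0, 0, 0, 5 }, { 0, 2, 0, 5 } };
  (void)cmd; (void)ip; (void)errtext;
  memset(model, 0, sizeof(*model));
  if (attempt >= 2) return 0;
  model->length = length[attempt][constsize];
  return model->length;
}
```

Because generated codes may repeat, a caller that collects all variants instead of the shortest one would probably want to compare each new model with those already stored.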
Assemble() also compiles imprecise commands that include the following generalized operands:
- R8 - any 8-bit register (stands for AL, BL, CL, DL, AH, BH, CH, DH)
- R16 - any 16 bit register (AX, BX, CX, DX, SP, BP, SI, DI)
- R32 - any 32 bit register (EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI)
- FPU - any FPU register (ST0..ST7)
- MMX - any MMX register (MM0..MM7)
- CRX - any control register (CR0..CR7)
- DRX - any debug register (DR0..DR7)
- CONST - any constant
The function returns the number of bytes in the assembled code, or a non-positive number (zero or negative) in case of error or when the variant selected by the combination of attempt and constsize doesn't exist. In that case the returned number is the negated position of the error in the input command. If you generate executable code, imprecise commands are usually not allowed. To make sure that a command is precise, check that all significant bytes in mask contain 0xFF.
int Assemble(char *cmd,ulong ip,t_asmmodel *model,int attempt,int constsize,char *errtext);
Parameters:
- cmd - pointer to zero terminated ASCII command;
- ip - address of the first byte of generated binary command in memory;
- model - pointer to the structure that receives machine code and mask, see detailed description below;
- attempt - index of alternative encoding of the command. Call Assemble with attempt=0,1,2... to obtain all possible versions of the command. Stop this sequence when Assemble reports error;
- constsize - requested size of address constant and immediate data. Call Assemble with constsize=0,1,2,3 to obtain all possible encodings of the version selected by attempt;
- errtext - pointer to text buffer of length at least TEXTLEN bytes that receives description of detected error.
typedef struct t_asmmodel { // Model to search for assembler command
char code[MAXCMDSIZE]; // Binary code
char mask[MAXCMDSIZE]; // Mask for binary code (0: bit ignored)
int length; // Length of code, bytes (0: empty)
int jmpsize; // Offset size if relative jump
int jmpoffset; // Offset relative to IP
int jmppos; // Position of jump offset in command
} t_asmmodel;
Members:
- code - binary code of the command. Only bits that have 1's in corresponding mask bits are significant;
- mask - comparison mask. Search routine ignores all code bits where mask is set to 0;
- length - length of code and mask, bytes. If length is 0, search model is empty or invalid;
- jmpsize - if nonzero, command is a relative jump and jmpsize is a size of offset in bytes;
- jmpoffset - if jmpsize is nonzero, jump offset relative to address of the following command, otherwise undefined;
- jmppos - if jmpsize is nonzero, position of the first byte of the offset in code, otherwise undefined.
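The members above can be used roughly as follows. This is a sketch under stated assumptions, not code from the package: the three helper names are invented, MAXCMDSIZE is an assumed value, "significant bytes" is taken to mean the first length bytes of mask, and the jump destination follows the description of jmpoffset given above (offset relative to the address of the following command).

```c
#include <assert.h>
#include <stddef.h>

#ifndef MAXCMDSIZE
#define MAXCMDSIZE 16              /* Assumed; the real value is in disasm.h */
#endif

typedef struct t_asmmodel {        /* As described above */
  char code[MAXCMDSIZE];
  char mask[MAXCMDSIZE];
  int length;
  int jmpsize;
  int jmpoffset;
  int jmppos;
} t_asmmodel;

/* Returns 1 if every mask byte within length is 0xFF, i.e. the model
   describes exactly one binary code and may be used as executable code. */
int model_is_precise(const t_asmmodel *model) {
  int i;
  assert(model != NULL);
  if (model->length <= 0 || model->length > MAXCMDSIZE) return 0;
  for (i = 0; i < model->length; i++)
    if ((unsigned char)model->mask[i] != 0xFF) return 0;
  return 1;
}

/* Returns 1 if the bytes at data match the model. Bits where mask is 0
   are ignored, as a search routine would do. */
int model_matches(const t_asmmodel *model, const char *data, unsigned long size) {
  int i;
  assert(model != NULL && data != NULL);
  if (model->length <= 0 || model->length > MAXCMDSIZE) return 0;
  if (size < (unsigned long)model->length) return 0;
  for (i = 0; i < model->length; i++) {
    unsigned char diff = (unsigned char)(data[i] ^ model->code[i]);
    if (diff & (unsigned char)model->mask[i]) return 0;
  }
  return 1;
}

/* Destination of a relative jump assembled at address ip. Meaningful only
   if jmpsize is nonzero. */
unsigned long model_jump_target(const t_asmmodel *model, unsigned long ip) {
  assert(model != NULL);
  return ip + (unsigned long)model->length + (unsigned long)model->jmpoffset;
}
```

For the command JNZ SHORT 004505F5 at address 004505DF from the listing in the introduction (2 bytes, offset 14 hex), model_jump_target() yields 004505F5.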
Checkcondition
Checks whether 80x86 flags meet condition code in the command. Returns 1 if condition is met and 0 if not.
int Checkcondition(int code,ulong flags);
Parameters:
- code - byte of command that contains condition code;
- flags - contents of register EFL.
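For orientation, the standard 80x86 rules that such a check has to follow can be sketched as below. This is not the package's implementation; checkcondition_sketch() and the FLAG_x names are invented for the example. It relies only on the architectural facts that the low 4 bits of the opcode byte of Jcc, SETcc and CMOVcc hold the condition code and that bit 0 of this code inverts the test.

```c
#include <assert.h>

#define FLAG_C 0x0001UL            /* Carry flag */
#define FLAG_P 0x0004UL            /* Parity flag */
#define FLAG_Z 0x0040UL            /* Zero flag */
#define FLAG_S 0x0080UL            /* Sign flag */
#define FLAG_O 0x0800UL            /* Overflow flag */

/* code is the command byte that holds the condition (for example 0x74 for
   JE or 0x8C for the second byte of near JL); flags is the contents of EFL.
   Returns 1 if the condition is met and 0 if not. */
int checkcondition_sketch(int code, unsigned long flags) {
  int cf = (flags & FLAG_C) != 0;
  int pf = (flags & FLAG_P) != 0;
  int zf = (flags & FLAG_Z) != 0;
  int sf = (flags & FLAG_S) != 0;
  int of = (flags & FLAG_O) != 0;
  int cond;
  assert((code & ~0xFF) == 0);     /* code is a single byte of the command */
  switch ((code >> 1) & 0x07) {    /* Bits 3..1 select the test */
    case 0:  cond = of; break;                 /* O  */
    case 1:  cond = cf; break;                 /* B  */
    case 2:  cond = zf; break;                 /* E  */
    case 3:  cond = cf || zf; break;           /* BE */
    case 4:  cond = sf; break;                 /* S  */
    case 5:  cond = pf; break;                 /* P  */
    case 6:  cond = (sf != of); break;         /* L  */
    default: cond = zf || (sf != of); break;   /* LE */
  }
  return (code & 1) ? !cond : cond;  /* Bit 0 inverts: NO, AE, NE, A, ... */
}
```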
Decodeaddress
Custom user-supplied function that converts constant (address) into symbolic name. Initially, source code includes dummy function that returns 0.
Decodeaddress() decodes memory address or constant to the ASCII string and optionally comments this address. Returns length of decoded string (not including terminal 0), or 0 on error or if symbolic name is not available.
int Decodeaddress(ulong addr,char *symb,int nsymb,char *comment);
Parameters:
- addr - address to decode in address space of debugged program;
- symb - pointer to buffer of length at least nsymb bytes where Decodeaddress() places decoded string;
- nsymb - length, in characters, of buffer symb;
- comment - pointer to string of length at least TEXTLEN bytes or NULL, receives comment associated with addr.
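A replacement for the dummy function might look like the sketch below. It assumes a small fixed symbol table; the type t_symbol, the table contents, the symbol names and the value of TEXTLEN are all invented for illustration, and ulong is written out as unsigned long. A real debugger would look the address up in its own export or debug information.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#ifndef TEXTLEN
#define TEXTLEN 256                /* Assumed; the real value is in disasm.h */
#endif

typedef struct {                   /* Hypothetical symbol table entry */
  unsigned long addr;
  const char *name;
  const char *comment;             /* May be NULL */
} t_symbol;

static const t_symbol symbols[] = {           /* Invented names */
  { 0x004120DCUL, "Copystring", "hypothetical helper routine" },
  { 0x004B46D8UL, "itemcount",  NULL }
};

/* Returns length of the name placed into symb (not including terminal 0),
   or 0 if addr has no name or the name does not fit into nsymb bytes. */
int Decodeaddress(unsigned long addr, char *symb, int nsymb, char *comment) {
  size_t i, n;
  assert(symb != NULL);
  if (nsymb <= 0) return 0;
  symb[0] = '\0';
  if (comment != NULL) comment[0] = '\0';
  for (i = 0; i < sizeof(symbols) / sizeof(symbols[0]); i++) {
    if (symbols[i].addr != addr) continue;
    n = strlen(symbols[i].name);
    if (n >= (size_t)nsymb) return 0;          /* Name does not fit */
    memcpy(symb, symbols[i].name, n + 1);
    if (comment != NULL && symbols[i].comment != NULL)
      snprintf(comment, TEXTLEN, "%s", symbols[i].comment);
    return (int)n;
  }
  return 0;                                    /* No symbolic name available */
}
```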
Disasm
The most important (and complex) function in this package. Depending on the specified disasmmode, Disasm() performs one of the four functions:
- DISASM_SIZE - quickly determines size of the command. Use this mode if you want to walk through the code. In this mode, treat all members of disasm as undefined;
- DISASM_DATA - determines size and analyses operands. Use this mode for quick analysis, for example, if you need to calculate jump destination. Members of disasm marked with asterisk (*) are undefined;
- DISASM_FILE - determines size, analyses operand and disassembles command, but doesn't attempt to convert addresses to symbols. Use this mode if there is no correspondence between addresses and symbols, for example, if you dump the contents of binary file;
- DISASM_CODE - full disassembly.
ulong Disasm(char *src,ulong srcsize,ulong srcip,t_disasm *disasm,int disasmmode);
Parameters:
- src - pointer to binary code that must be disassembled;
- srcsize - size of src. Length of 80x86 command is limited to MAXCMDSIZE bytes;
- srcip - address of the command;
- disasm - pointer to structure that receives results of disassembling, see detailed description below;
- disasmmode - disassembly mode, one of DISASM_xxx (see above).
typedef struct t_disasm { // Results of disassembling
ulong ip; // Instruction pointer
char dump[TEXTLEN]; // (*) Hexadecimal dump of the command
char result[TEXTLEN]; // (*) Disassembled command
char comment[TEXTLEN]; // (*) Brief comment
int cmdtype; // One of C_xxx
int memtype; // Type of addressed variable in memory
int nprefix; // Number of prefixes
int indexed; // Address contains register(s)
ulong jmpconst; // Constant jump address
ulong jmptable; // Possible address of switch table
ulong adrconst; // Constant part of address
ulong immconst; // Immediate constant
int zeroconst; // Whether contains zero constant
int fixupoffset; // Possible offset of 32 bit fixups
int fixupsize; // Possible total size of fixups or 0
int error; // Error while disassembling command
int warnings; // Combination of DAW_xxx
} t_disasm;
Members:
- ip - address of the disassembled command;
- dump - ASCII string, formatted hexadecimal dump of the command;
- result - ASCII string, disassembled command itself;
- comment - ASCII string, brief comment that applies to the whole command;
- cmdtype - type of the disassembled command, one of C_xxx possibly ORed with C_RARE to indicate that the command is seldom used in ordinary Win32 applications. Commands of type C_MMX additionally contain the size of MMX data in the 3 least significant bits (0 means 8-byte operands). Non-MMX commands may have the C_EXPL bit set, which means that some memory operand has a size that does not conform to standard 80x86 rules;
- memtype - type of memory operand, one of DEC_xxx, or DEC_UNKNOWN if operand is non-standard or command does not access memory;
- nprefix - number of prefixes that this command contains;
- indexed - if memory address contains index register, set to scale, otherwise 0;
- jmpconst - address of jump destination if this address is a constant, and 0 otherwise;
- jmptable - if indirect jump can be interpreted as switch, base address of switch table and 0 otherwise;
- adrconst - constant part of memory address;
- immconst - immediate constant or 0 if command contains no immediate constant. The only command that contains two immediate constants is ENTER. Disasm() ignores second constant which is anyway 0 in most cases;
- zeroconst - nonzero if command contains immediate zero constant;
- fixupoffset - possible start of 32 bit fixup within the command, or 0 if command can't contain fixups;
- fixupsize - possible total size of fixups (0, 4 or 8). If command contains both immediate constant and immediate address, they are always adjacent on 80x86 processors;
- error - Disasm() was unable to disassemble command (for example, command does not exist or crosses end of memory block), one of DAE_xxx;
- warnings - command is suspicious or meaningless (for example, far jump or MOV EAX,EAX preceded with segment prefix), combination of DAW_xxx bits;
The following global options control the output format and the set of accepted commands:
- ideal - force IDEAL decoding mode
- lowercase - force lowercase
- tabarguments - insert tab between mnemonic and arguments
- extraspace - insert extra space between arguments
- putdefseg - show default segments
- showmemsize - always show memory size
- shownear - show NEAR modifiers
- shortstringcmds - use short form of string commands
- sizesens - mode of decoding of size-sensitive mnemonics (16/32 bits) like:
1 - PUSHAW/PUSHAD
2 - PUSHAW/PUSHA
- symbolic - show symbolic addresses, requires Decodeaddress()
- farcalls - accept far calls, returns & addresses
- decodevxd - decode VxD calls (Win95/98)
- privileged - accept privileged commands
- iocommand - accept I/O commands
- badshift - accept shift out of range 1..31
- extraprefix - accept superfluous prefixes
- lockedbus - accept LOCK prefixes
- stackalign - accept unaligned stack operations
- iswindowsnt - when checking for dangerous commands, assume NT-based OS
Disassembleback
Calculates the address of the assembler instruction that is n instructions (at most 127) back from the instruction at the specified ip. Returns the address of the found instruction. In case of error, the returned instruction may be fewer than n instructions back.
80x86 commands have variable length. Disassembleback uses heuristic methods to separate commands and in some (astoundingly rare!) cases may return an invalid answer.
ulong Disassembleback(char *block,ulong base,ulong size,ulong ip,int n);
Parameters:
- block - pointer to the copy of code;
- base - address of first byte in the code block;
- size - size of code block;
- ip - address of current instruction;
- n - number of instructions to walk back.
Disassembleforward
Calculates the address of the assembler instruction that is n instructions forward from the instruction at the specified ip. Returns the address of the found instruction. In case of error, the returned instruction may be fewer than n instructions forward.
ulong Disassembleforward(char *block,ulong base,ulong size,ulong ip,int n);
Parameters:
- block - pointer to the copy of code;
- base - address of first byte in the code block;
- size - size of code block;
- ip - address of current instruction;
- n - number of instructions to walk forward.
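The idea behind walking forward can be sketched with a pluggable length decoder. This is not the package's code: walk_forward(), cmdlen_fn and toy_cmdlen() are names invented here. With the package, the natural length decoder would presumably be a small wrapper around Disasm() in DISASM_SIZE mode; toy_cmdlen() is only a stand-in that knows a handful of opcodes so that the fragment compiles and runs on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Returns length of the command at src, or 0 if it cannot be decoded. */
typedef unsigned long (*cmdlen_fn)(const char *src, unsigned long srcsize);

/* Returns address of the command that is n commands forward from ip. Stops
   early at the end of the block or at a command that cannot be decoded, so
   the result may be fewer than n commands away. */
unsigned long walk_forward(const char *block, unsigned long base,
                           unsigned long size, unsigned long ip, int n,
                           cmdlen_fn cmdlen) {
  unsigned long offset, len;
  assert(block != NULL && cmdlen != NULL);
  if (ip < base) ip = base;                    /* Clamp ip to the block */
  if (ip > base + size) ip = base + size;
  offset = ip - base;
  while (n > 0 && offset < size) {
    len = cmdlen(block + offset, size - offset);
    if (len == 0 || len > size - offset) break;  /* Bad or truncated command */
    offset += len;
    n--;
  }
  return base + offset;
}

/* Toy decoder for 32-bit code: NOP, INC/DEC/PUSH/POP r32, CALL/JMP rel32. */
unsigned long toy_cmdlen(const char *src, unsigned long srcsize) {
  unsigned char op;
  if (srcsize == 0) return 0;
  op = (unsigned char)src[0];
  if (op == 0x90 || (op >= 0x40 && op <= 0x5F)) return 1;
  if (op == 0xE8 || op == 0xE9) return 5;
  return 0;                                    /* Unknown to this toy */
}
```

Walking backward is harder because a command's length is known only from its first byte, which is why Disassembleback has to rely on heuristics.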
Isfilling
Function determines whether pointed instruction is a no-action command (equivalent to NOP) used by different compilers to fill the gap between procedures or data blocks to a specified aligned border. Returns length of filling command in bytes or 0 if command is not a recognized filling.
int Isfilling(ulong addr,char *data,ulong size,ulong align);
Parameters:
- addr - address of the first byte of analyzed command;
- data - pointer to the binary command;
- size - size of data;
- align - assumed alignment of the next non-filling command (power of 2), or 0 if alignment is not required.
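To illustrate what a filling command looks like at the byte level, here is a deliberately simplified recognizer. It is not the package's Isfilling(): the name isfilling_sketch() is invented, addr and align are ignored, prefixes are not handled, and only NOP and register-to-itself MOV and XCHG are recognized, although compilers use other fillers as well.

```c
#include <assert.h>
#include <stddef.h>

/* Returns length of the filling command at data, or 0 if the command is not
   one of the few forms recognized by this sketch. */
int isfilling_sketch(const char *data, unsigned long size) {
  unsigned char op, modrm;
  assert(data != NULL);
  if (size < 1) return 0;
  op = (unsigned char)data[0];
  if (op == 0x90) return 1;                    /* NOP */
  if (size < 2) return 0;
  modrm = (unsigned char)data[1];
  /* MOV (opcodes 88..8B) or XCHG (86, 87) in register form (mod=11) where
     both operands are the same register, for example MOV EAX,EAX */
  if (((op & 0xFC) == 0x88 || (op & 0xFE) == 0x86) &&
      (modrm & 0xC0) == 0xC0 &&
      ((modrm >> 3) & 0x07) == (modrm & 0x07))
    return 2;
  return 0;
}
```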
Printfloat* functions
These functions decode 4-, 8-, 10-byte floating point number or 8-byte 3DNow! operand into the text form to string s. They correctly decode all cases of NANs or INFs without triggering floating point exceptions. If operand is not a valid floating point number, functions print hexadecimal dump of the number. Return length of decoded string in bytes, not including terminal 0.
int Print3dnow(char *s,char *f);
int Printfloat10(char *s,long double ext);
int Printfloat4(char *s,float f);
int Printfloat8(char *s,double d);
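The approach for the 4-byte case can be sketched as follows: inspect the raw bits first and hand the value to the floating point formatter only when it is an ordinary number. This is not the package's Printfloat4(); the name printfloat4_sketch() and the exact output strings (+INF, -INF, NAN followed by the hexadecimal dump, %g for ordinary numbers) are assumptions made for this example, and an IEEE 754 single-precision float is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decodes f into s and returns length of the decoded string, not including
   terminal 0. INFs and NANs are recognized from the bit pattern alone. */
int printfloat4_sketch(char *s, float f) {
  uint32_t bits;
  assert(s != NULL && sizeof(f) == sizeof(bits));
  memcpy(&bits, &f, sizeof(bits));             /* Raw bits, no FP arithmetic */
  if ((bits & 0x7F800000UL) == 0x7F800000UL) { /* Exponent is all ones */
    if ((bits & 0x007FFFFFUL) != 0)            /* Nonzero mantissa: NAN */
      return sprintf(s, "NAN %08lX", (unsigned long)bits);
    strcpy(s, (bits & 0x80000000UL) ? "-INF" : "+INF");
    return 4;
  }
  return sprintf(s, "%g", (double)f);          /* Ordinary number */
}
```

The 8-byte case works the same way with a 64-bit pattern; the 10-byte format has additional invalid encodings, which is where a hexadecimal dump of the operand becomes useful.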
Copyleft (C) 2001 Oleh Yuschuk