This is an expanded and improved version of a talk I gave at the last AHA! meeting here in Austin. My slides for that talk can be found here
**
Recently I've been working on my windbg extension, narly. Part of my intended purpose for it is to be able to search for ROP gadgets, but to do that I needed to find a disassembler that I could use. I couldn't use the disassembler available through the windbg extension API, since the only thing you can do is call IDebugControl::Disassemble, which just returns a string of the disassembly. I could have used a number of other open-source disassemblers, but decided to reinvent the wheel and write my own. A lot of people would groan at the thought of reinventing the wheel, but I usually enjoy it since it's one of the best ways I learn.
I've been coding my disassembler straight from the Intel® 64 and IA-32 Architectures Software Developer's Manual: Volume 2A: Instruction Set Reference, A-M and Volume 2B: Instruction Set Reference, N-Z. Reading through specs that define how things are supposed to behave is always interesting, because you start to understand some of the boundaries/corner cases and see ways that some potentially interesting behaviors might occur. This blog post is about some interesting techniques I've come across to represent the same instruction in multiple ways (using a different set of bytes to do the exact same thing.) Note that I've only been able to test this out on Intel processors.
To understand most of what I'll be talking about, you'll have to somewhat understand the parts of an x86 instruction, as well as the "rules" that go with them. Once I go over the "rules", I'll then explain the ways I've come up with to represent the same instruction using different sets of bytes, both by following the rules and by breaking them.
instructions and rules
An x86 instruction is made up of several parts:
+----------+--------+--------+-----+--------------+-----------+ | prefixes | opcode | modR/M | sib | displacement | immediate | +----------+--------+--------+-----+--------------+-----------+
prefixes
+----------+--------+--------+-----+--------------+-----------+
| prefixes | opcode | modR/M | sib | displacement | immediate |
+----------+--------+--------+-----+--------------+-----------+
prefix | name -------+----------------------- 0xf0 | LOCK 0xf2 | REPNE/REPZ 0xf3 | REPE/REPZ | 0x2e | CS Segment override 0x36 | SS Segment override 0x3e | DS Segment override 0x26 | ES Segment override 0x64 | FS Segment override 0x65 | GS Segment override | 0x2e | *Branch not taken 0x3e | *Branch taken | 0x66 | Operand-size override 0x67 | Address-size override *for use only with Jcc instructions
lock rep add dword ptr[eax],eax
01011a3f f0 ??? 01011a40 f30100 rep add dword ptr [eax],eax
Below are some examples of the affects prefixes have on instructions:
prefixes | opcode | modRM | sib | disp | imm | disasm --------------+--------+-------+-----+----------+----------+-------- | 59 | | | | | pop ecx 66 | 59 | | | | | pop cx f366 | 59 | | | | | rep pop cx f266 | 59 | | | | | repne pop cx f22e66 | 59 | | | | | repne pop cx | | | | | | | 8b | 03 | | | | mov eax,dword ptr [ebx] 67 | 8b | 03 | | | | mov eax,dword ptr [bp+di] 6667 | 8b | 03 | | | | mov ax,word ptr [bp+di] 663e | 8b | 03 | | | | mov ax,word ptr ds:[ebx] | | | | | | | 01 | 00 | | | | add dword ptr[eax],eax f0 | 01 | 00 | | | | lock add dword ptr[eax],eax
opcode
+----------+--------+--------+-----+--------------+-----------+
| prefixes | opcode | modR/M | sib | displacement | immediate |
+----------+--------+--------+-----+--------------+-----------+
Each opcode determines how its operands are encoded. For example, an opcode might use the modR/M byte, modR/M and sib bytes, or modR/M byte and displacement,
prefixes | opcode | modRM | sib | disp | imm | disasm --------------+--------+-------+-----+----------+----------+-------- | 8b | 03 | | | | mov eax,dword ptr[ebx] | 8b | 04 | bb | | | mov eax,dword ptr[ebx+edi*4] | 8b | 83 | | 10203040 | | mov eax,dword ptr[ebx+40302010]
prefixes | opcode | modRM | sib | disp | imm | disasm --------------+--------+-------+-----+----------+----------+-------- | b8 | | | | aabbccdd | mov eax,ddccbbaa
prefixes | opcode | modRM | sib | disp | imm | disasm --------------+--------+-------+-----+----------+----------+-------- | 58 | | | | | pop eax eax = 00 | 59 | | | | | pop ecx ecx = 01 | 5a | | | | | pop edx edx = 02 | 5b | | | | | pop ebx ebx = 03 | 5c | | | | | pop esp esp = 04 | 5d | | | | | pop ebp ebp = 05 | 5e | | | | | pop esi esi = 06 | 5f | | | | | pop edi edi = 07
modR/M
+----------+--------+--------+-----+--------------+-----------+ | prefixes | opcode | modR/M | sib | displacement | immediate | +----------+--------+--------+-----+--------------+-----------+
0 1 2 3 4 5 6 7 +---+---+---+---+---+---+---+---+ | mod |reg/opcode | r/m | +-------+-----------+-----------+
Depending on the value of the r/m part of the modR/M byte, the sib byte (32-bit addressing mode only) or a displacement (16 or 32) may also be required. Also, if there is no second operand, the reg/opcode part may be used as an opcode extension.
There are two tables that define the meaning of the modR/M byte, one for each addressing mode:
(tables from Intel® 64 and IA-32 Architectures Software Developer's Manual: Volume 2A: Instruction Set Reference, A-M pages 2-6 and 2-7)
Below are examples of different ways the modR/M byte might be used:
prefixes | opcode | modRM | sib | disp | imm | disasm --------------+--------+-------+-----+----------+----------+-------- | 8b | 03 | | | | mov eax,dword ptr [ebx] | 8b | 05 | |aabbccdd | | mov eax,dword ptr ds:[ddccbbaa] | 8b | 41 | | 10 | | mov eax,dword ptr[ecx+10] | | | | | | 67 | 8b | 03 | | | | mov eax, dword ptr[bp+di] 67 | 8b | 05 | | | | mov eax, dword ptr[di] 67 | 8b | 0c | | | | mov ecx, dword ptr[si] 67 | 8b | 41 | | 10 | | mov eax, dword ptr[bx+di+10] 67 | 8b | 81 | | 1010 | | mov eax, dword ptr[bx+di+1010]
sib
+----------+--------+--------+-----+--------------+-----------+
| prefixes | opcode | modR/M | sib | displacement | immediate |
+----------+--------+--------+-----+--------------+-----------+
[--][--]
0 1 2 3 4 5 6 7 +---+---+---+---+---+---+---+---+ | scale | index | base | +-------+-----------+-----------+
prefixes | opcode | modRM | sib | disp | imm | disasm --------------+--------+-------+-----+----------+----------+-------- | 8b | 1c | 83 | | | mov ebx,dword ptr[ebx+eax*4] | | \----> scale | | | \----> index | \----> base
(table is from Intel® 64 and IA-32 Architectures Software Developer's Manual: Volume 2A: Instruction Set Reference, A-M page 2-8)
Some examples on using the sib byte are below:
prefixes | opcode | modRM | sib | disp | imm | disasm --------------+--------+-------+-----+----------+----------+-------- | 8b | 1c | 03 | | | mov ebx,dword ptr[ebx+eax] | 8b | 1c | 43 | | | mov ebx,dword ptr[ebx+eax*2] | 8b | 1c | 83 | | | mov ebx,dword ptr[ebx+eax*4] | 8b | 1c | c3 | | | mov ebx,dword ptr[ebx+eax*8] | | | | | | | 8b | 1c | 23 | | | mov ebx,dword ptr[ebx] | 8b | 1c | 63 | | | mov ebx,dword ptr[ebx] | 8b | 1c | a3 | | | mov ebx,dword ptr[ebx] | 8b | 1c | e3 | | | mov ebx,dword ptr[ebx]
immediate
+----------+--------+--------+-----+--------------+-----------+
| prefixes | opcode | modR/M | sib | displacement | immediate |
+----------+--------+--------+-----+--------------+-----------+
prefixes | opcode | modRM | sib | disp | imm | disasm --------------+--------+-------+-----+----------+----------+-------- | 04 | | | | aa | add al,aa 66 | 05 | | | | aaaa | add ax,aaaa | 05 | | | | aaaaaaaa | add eax,aaaaaaaa
multiple representations
Now it's time for the good stuff :^)
playing fair with modR/M
If you take another look at the 16 and 32-bit modR/M tables, you'll notice that when the mod part is 0x3 (binary 11), the operand is not dereferenced. This can be used to create another representation of an instruction:
prefixes | opcode | modRM | sib | disp | imm | disasm --------------+--------+-------+-----+----------+----------+-------- | 5a | | | | | pop edx | 8f | c2 | | | | pop edx
playing fair with sib
In the sib table, you'll notice that whenever the index part is 0x4 (100 binary), that a scaled index isn't used. This can also let us make multiple representations of an instruction:
prefixes | opcode | modRM | sib | disp | imm | disasm --------------+--------+-------+-----+----------+----------+-------- | 8b | 00 | | | | mov eax,dword ptr[eax] | 8b | 04 | 20 | | | mov eax,dword ptr[eax] | 8b | 04 | 60 | | | mov eax,dword ptr[eax] | 8b | 04 | a0 | | | mov eax,dword ptr[eax] | 8b | 04 | e0 | | | mov eax,dword ptr[eax]
being a bully
Now is where we start going into undefined territory.
Certain prefixes are only supposed to be used with certain instructions, and certain prefixes only make sense to be used when certain types of operands are used. For example, if a modR/M byte isn't used, the 0x67 prefix doesn't always do something:
prefixes | opcode | modRM | sib | disp | imm | disasm --------------+--------+-------+-----+----------+----------+-------- 67 | 58 | | | | | pop eax
prefixes | opcode | modRM | sib | disp | imm | disasm --------------+--------+-------+-----+----------+----------+-------- | a1 | | | 10203040 | | mov eax,dword ptr ds:[40302010] 67 | a1 | | | 1020 | | mov eax,dword ptr ds:[00002010]
prefixes | opcode | modRM | sib | disp | imm | disasm --------------+--------+-------+-----+----------+----------+-------- 2e | 58 | | | | | pop eax 3e | 58 | | | | | pop eax
prefixes | opcode | modRM | sib | disp | imm | disasm --------------+--------+-------+-----+----------+----------+-------- | 8b | 00 | | | | mov eax,dword ptr[eax] 2e | 8b | 00 | | | | mov eax,dword ptr cs:[eax] 3e | 8b | 00 | | | | mov eax,dword ptr ds:[eax]
being a bully*4
Earlier in this post I mentioned that up to four prefixes are allowed on an instruction, and that they can be in any order. This also applies to the "nop" prefixes: they can be in any order, but they can also be repeated:
prefixes | opcode | modRM | sib | disp | imm | disasm --------------+--------+-------+-----+----------+----------+-------- 67 | 58 | | | | | pop eax 6767 | 58 | | | | | pop eax 676767 | 58 | | | | | pop eax 67676767 | 58 | | | | | pop eax 67 | 58 | | | | | pop eax 6766 | 58 | | | | | pop ax 673e67 | 58 | | | | | pop eax 3e66672e | 58 | | | | | pop ax
0100c38c 67 ??? 0100c38d 67 ??? 0100c38e 67 ??? 0100c38f 6758 pop eax
being a bully*14
Being a bully times 14? No way, you say. YES WAY. I decided to test the "rule" about having only four prefixes. It's a lie! In my testing, I was able to add up to fourteen prefixes before I started getting errors. Below are some examples:
prefixes | opcode | modRM | sib | disp | imm | disasm --------------+--------+-------+-----+----------+----------+-------- 6767676767676767676767676767 | 58 | | | | | pop eax 673e2e673e2e673e2e673e2e6767 | 58 | | | | | pop eax windbg disassembly: 0100c3a2 67 ??? 0100c3a3 67 ??? 0100c3a4 67 ??? 0100c3a5 67 ??? 0100c3a6 67 ??? 0100c3a7 67 ??? 0100c3a8 67 ??? 0100c3a9 67 ??? 0100c3aa 67 ??? 0100c3ab 67 ??? 0100c3ac 67 ??? 0100c3ad 67 ??? 0100c3ae 67 ??? 0100c3af 6758 pop eax
Crazy stuff, huh? The biggest thing I learned from this is to question everything. I'm not used to having to question my disassembler too much. That's something I've been able to trust up to this point, at least for disassembling _one_instrustion_. I almost missed seeing most of this behavior because the first few times I saw windbg fail to disassemble a single instruction, I thought it meant that it wouldn't execute. Luckily, I stepped (a literal hit-F10-and-execute-the-next-instruction step) through it anyways and noticed this behavior.
One other thing to note is that this isn't the first time somebody has run across the multiple-prefixes behavior. A quick conversation with Mark Dowd made it apparent that this behavior is pretty well-known, that different processors may allow even more than 14 prefixes, and that there are major differences in how emulators and native clients execute instructions.
I have other ideas for some more potentially interesting research in this area, but this is all for now. Laters.