
X64 Assembly

From UIC Archive

Author: Daniel Pistelli
Website: http://ntcore.com
Date: 01/01/2007 (dd/mm/yyyy)
Level: Working brain required
Language: English

Links and References

AMD64 documentation


This article is extracted from "Moving to Windows x64" by Daniel Pistelli (Ntoskrnl). It was published in New Years Pack 2, downloadable from "Documents from UIC".


Now I'll try to explain the basics of x64 assembly. I assume the reader is already familiar with x86 assembly; otherwise, it will be hard to make heads or tails of this section.
Moreover, since this is just a very (but very) brief guide, you'll have to look into the AMD64 documentation for more advanced material. Some things I won't even mention; you'll see for yourself that some idioms are no longer in use: for instance, the lea instruction has largely taken the place of mov offset.

What you're going to notice at once is that x64 has some more registers:

  • 8 new general-purpose registers (GPRs).
  • 8 new 128-bit XMM registers.

Of course, all general-purpose registers are 64 bits wide. The old ones we already knew are easy to recognize in their 64-bit form: rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp (and rip if we want to count the instruction pointer). These old registers can still be accessed in their smaller bit ranges, for instance: rax, eax, ax, ah, al.
The new registers go from r8 to r15, and can be accessed in their various bit ranges like this: r8 (qword), r8d (dword), r8w (word), r8b (low byte).
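The sub-register names are simply views on the low bits of the same 64-bit register. A minimal C sketch of that relationship follows; the function names are made up for this illustration, and only reads are modeled:

```c
#include <assert.h>
#include <stdint.h>

/* Each helper returns the part of a 64-bit register value that the
   corresponding sub-register name would read. */
uint32_t low_dword(uint64_t reg) { return (uint32_t)reg; }        /* eax, r8d */
uint16_t low_word(uint64_t reg)  { return (uint16_t)reg; }        /* ax,  r8w */
uint8_t  low_byte(uint64_t reg)  { return (uint8_t)reg; }         /* al,  r8b */
uint8_t  high_byte(uint64_t reg) { return (uint8_t)(reg >> 8); }  /* ah, bh, ch, dh only */

/* Example value chosen so that every byte is distinct. */
void check_subregisters(void) {
    const uint64_t rax = 0x1122334455667788ULL;
    assert(low_dword(rax) == 0x55667788u);
    assert(low_word(rax)  == 0x7788u);
    assert(low_byte(rax)  == 0x88u);
    assert(high_byte(rax) == 0x77u);
}
```

Keep in mind that this covers reads only: writing a 32-bit sub-register such as eax also clears the upper 32 bits of the full register.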

Here's a figure taken from the AMD docs:

[Figure: X64 registers.jpg]

Applications can still use segment registers as a base for addressing, but 64-bit mode only recognizes three of the old ones (CS, FS and GS), and only two of them (FS and GS) can be used for base address calculations. Here's another figure:

[Figure: X64 segments.jpg]

And now, the most important things: calling convention and stack. x64 code on Windows uses a FASTCALL-style calling convention, meaning that the first 4 parameters are passed in registers and the remaining ones on the stack.
Thus, the stack frame is made of: the stack parameters, the register parameters, the return address (which, I remind you, is a qword) and the local variables.
The first parameter goes in the rcx register, the second in rdx, the third in r8 and the fourth in r9 (these are the registers for integer and pointer parameters). Saying that the parameter registers are part of the stack frame also makes it clear that any function that calls a child function has to reserve stack space for these four registers, even if fewer than four parameters are passed to the child function.
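The mapping from parameter position to register and stack slot can be sketched in C. The helper names below are hypothetical, indices are 0-based, and only integer or pointer parameters are considered:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Register that carries integer/pointer parameter `index`,
   or NULL when the parameter is passed on the stack. */
const char *arg_register(unsigned index) {
    static const char *const regs[4] = { "rcx", "rdx", "r8", "r9" };
    return index < 4 ? regs[index] : NULL;
}

/* Offset from rsp, right before the call instruction, of the slot
   reserved for parameter `index` (slots 0..3 belong to the registers). */
unsigned caller_slot_offset(unsigned index) { return 8u * index; }

/* Same slot as seen by the callee on entry: the call instruction has
   pushed the 8-byte return address in between. */
unsigned callee_slot_offset(unsigned index) { return 8u + 8u * index; }

void check_arg_slots(void) {
    assert(strcmp(arg_register(0), "rcx") == 0);
    assert(strcmp(arg_register(3), "r9") == 0);
    assert(arg_register(4) == NULL);          /* fifth parameter: stack */
    assert(caller_slot_offset(4) == 0x20);    /* first stack parameter  */
    assert(callee_slot_offset(0) == 0x08);    /* home slot of rcx       */
    assert(callee_slot_offset(3) == 0x20);    /* home slot of r9        */
}
```

The slots at offsets 0x00 to 0x18 on the caller side are the space reserved for the four register parameters; the callee is free to spill rcx, rdx, r8 and r9 there.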
The stack pointer is adjusted only in the prologue of a function: the reserved area has to be large enough to hold all the arguments passed to child functions, and it's always the caller's duty to clean the stack. Now, the most important thing to understand about how the space is provided in the stack frame is that the stack has to be 16-byte aligned.
More precisely, rsp has to be 16-byte aligned at the moment a call is executed. Since the call itself pushes the 8-byte return address, the amount subtracted in the prologue will always be something like 16n + 8, where n depends on the number of parameters (and on the local variables). Here's a small figure of a stack frame:


Don't worry if you haven't completely figured out how it works: now we will see a few code samples, which, in my opinion, always make the theory a lot easier to understand. Let us take for instance a hello-world application like:

int WINAPI _tWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR szCmdLine, int iCmdShow) {

   MessageBox(NULL, _T("Hello World!"), _T("My First x64 Application"), 0);
   return 0;
}

This code disassembled would look like:

.text:0000000000401220 sub_401220 proc near  ; CODE XREF: start+10E p
.text:0000000000401220
.text:0000000000401220 arg_0= qword ptr 8
.text:0000000000401220 arg_8= qword ptr 10h
.text:0000000000401220 arg_10= qword ptr 18h
.text:0000000000401220 arg_18= dword ptr 20h
.text:0000000000401220
.text:0000000000401220 mov [rsp+arg_18], r9d
.text:0000000000401225 mov [rsp+arg_10], r8
.text:000000000040122A mov [rsp+arg_8], rdx
.text:000000000040122F mov [rsp+arg_0], rcx
.text:0000000000401234 sub rsp, 28h
.text:0000000000401238 xor r9d, r9d  ; uType
.text:000000000040123B lea r8, Caption  ; "My First x64 Application"
.text:0000000000401242 lea rdx, Text  ; "Hello World!"
.text:0000000000401249 xor ecx, ecx  ; hWnd
.text:000000000040124B call cs:MessageBoxA
.text:0000000000401251 xor eax, eax
.text:0000000000401253 add rsp, 28h
.text:0000000000401257 retn
.text:0000000000401257 sub_401220 endp

The stack pointer initialization follows from what I said earlier.
Since we are calling a child function with parameters, we need space for all four parameter registers (0x20, a value already aligned to 16 bytes) plus 0x08 to account for the return address. Thus, we'll have 0x28.
Remember that if the stack value is too small or not aligned, your code is likely to crash at once. Also, don't wonder why there's no ExitProcess in this function: compiling the code above with Visual C++ always adds a stub (WinMainCRTStartup), which then calls our WinMain.
So, the ExitProcess is in the stub code. But what happens when the code before the MessageBox calls a function which takes seven parameters instead of four?

.text:0000000000401180 sub_401180 proc near  ; CODE XREF: sub_4011F0+4 p
.text:0000000000401180  ; sub_4011F0+11 p
.text:0000000000401180
.text:0000000000401180 var_28= qword ptr -28h
.text:0000000000401180 var_20= qword ptr -20h
.text:0000000000401180 var_18= qword ptr -18h
.text:0000000000401180
.text:0000000000401180 sub rsp, 48h
.text:0000000000401184 lea rax, unk_402040
.text:000000000040118B mov [rsp+48h+var_18], rax
.text:0000000000401190 lea rax, unk_402044
.text:0000000000401197 mov [rsp+48h+var_20], rax
.text:000000000040119C lea rax, unk_402048
.text:00000000004011A3 mov [rsp+48h+var_28], rax
.text:00000000004011A8 lea r9, qword_40204C  ; __int64
.text:00000000004011AF lea r8, qword_40204C+4  ; __int64
.text:00000000004011B6 lea rdx, unk_402054  ; __int64
.text:00000000004011BD lea rcx, aAa  ; "ptr"
.text:00000000004011C4 call TakeSevenParameters
.text:00000000004011C9 xor r9d, r9d  ; uType
.text:00000000004011CC lea r8, Caption  ; "My First x64 Application"
.text:00000000004011D3 lea rdx, Text  ; "Hello World!"
.text:00000000004011DA xor ecx, ecx  ; hWnd
.text:00000000004011DC call cs:MessageBoxA
.text:00000000004011E2 add rsp, 48h
.text:00000000004011E6 retn
.text:00000000004011E6 sub_401180 endp

As said, the child function takes 7 parameters, so space is needed for 3 extra parameters on the stack besides the four register slots. So, 7 * 8 = 0x38, which aligned to 16 bytes is 0x40. Adding the 8 bytes that account for the return address makes it 0x48, our value indeed.
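The arithmetic used for these frames can be written down as a small helper. This is only a sketch under stated assumptions: frame_size and its parameters are hypothetical names, and the function being described is assumed not to push any registers in its prologue.

```c
#include <assert.h>

/* Bytes to subtract from rsp in the prologue of a function that calls
   children taking at most `max_child_args` parameters and that needs
   `local_bytes` of local storage.
   Assumption: nothing else is pushed between entry and `sub rsp`. */
unsigned frame_size(unsigned max_child_args, unsigned local_bytes) {
    /* Space for the four register parameters is always reserved. */
    unsigned slots = max_child_args < 4u ? 4u : max_child_args;
    unsigned bytes = slots * 8u + local_bytes;

    bytes = (bytes + 15u) & ~15u;   /* round up to a multiple of 16 */
    return bytes + 8u;              /* the 16n + 8 form described above */
}

/* The two values quoted in the text. */
void check_frame_sizes(void) {
    assert(frame_size(4, 0) == 0x28);   /* MessageBox: 4 parameters     */
    assert(frame_size(7, 0) == 0x48);   /* child taking 7 parameters    */
}
```

A compiler or a hand-written prologue may of course reserve more than this minimum, as long as the result keeps the 16n + 8 form.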
I think you have understood the stack-frame logic by now. It's actually quite easy, but it takes a moment to switch from the old x86/stdcall logic to this one. Enough of this: now that we've seen how x64 code works, we'll try assembling a source file ourselves.

Before we start, I have to make something clear. There are some assemblers on the internet which make the job easier, mainly because they initialize the stack by themselves or because they create code that is easy to convert from/to x86.
But I think that is not the point of this article. In fact, I'm going to use the Microsoft assembler (ml64.exe), which requires you to write everything down, just like in the disassembly. Another option could be assembling the code with another assembler and then linking it with ml64.
I think the reader should really make these decisions on his own. As far as I am concerned, I believe that little code should be written in assembly, and that it should be avoided whenever possible. This new x64 technology is a good opportunity to rethink these matters.
In recent years I have always written 64-bit compatible code in C/C++ (I mean unmanaged, of course), and when I had to recompile a project of 70,000 lines of code for x64, I didn't have to change a single line of code (I'll talk about C/C++ programming later). Despite all the macros an assembler offers, I seriously doubt that people who wrote their whole code in assembly will be able to switch so easily to x64 (remember that one day even the IA64 syntax could be adopted). I think in most cases the obvious choice will be not to convert to the new technology and to stick to x86, but this isn't always possible: it depends on the software category.

The Microsoft assembler is contained in the SDK and in the DDK (WDK for Vista). Right now, I'm using Vista's WDK, which I downloaded for free from MSDN. The first sample of code I'm going to show you is a simple Hello-World messagebox application.

extrn MessageBoxA : proc
extrn ExitProcess : proc

.data

body db 'Hello World!', 0
capt db 'My First x64 Application', 0

.code

Main proc

 sub rsp, 28h
 xor r9d, r9d       ; uType = 0
 lea r8, capt       ; lpCaption
 lea rdx, body      ; lpText
 xor rcx, rcx       ; hWnd = NULL
 call MessageBoxA
 xor ecx, ecx       ; exit code = 0
 call ExitProcess

Main endp

end


As you can see, I didn't bother unwinding the stack, since I call ExitProcess. The syntax is very similar to the old MASM one, although there are a few dissimilarities. The ml64 console output should be something like this:

The command line to compile is:

ml64 C:\...\test.asm /link /subsystem:windows /defaultlib:C:\WinDDK\6000\lib\wnet\amd64\kernel32.lib /defaultlib:C:\WinDDK\6000\lib\wnet\amd64\user32.lib /entry:Main

If the libs are not in the same directory as ml64.exe, you'll have to provide the path like I did. The entry has to be provided, otherwise you would have to use WinMainCRTStartup as main entry.

The next sample of code I'm going to show you displays a window by calling CreateWindowEx. What you're going to learn through this code is structure alignment and how to integrate resources in your projects.
Like I said earlier, I don't want to encourage you to write your windows in assembly, but I believe that this sort of code is good for learning. Now the code, afterwards the explanation.

extrn GetModuleHandleA : proc
extrn MessageBoxA : proc
extrn RegisterClassExA : proc
extrn CreateWindowExA : proc
extrn DefWindowProcA : proc
extrn ShowWindow : proc
extrn GetMessageA : proc
extrn TranslateMessage : proc
extrn DispatchMessageA : proc
extrn PostQuitMessage : proc
extrn DestroyWindow : proc
extrn ExitProcess : proc


WNDCLASSEX struct

 cbSize            dd      ?
 style             dd      ?
 lpfnWndProc       dq      ?
 cbClsExtra        dd      ?
 cbWndExtra        dd      ?
 hInstance         dq      ?
 hIcon             dq      ?
 hCursor           dq      ?
 hbrBackground     dq      ?
 lpszMenuName      dq      ?
 lpszClassName     dq      ?
 hIconSm           dq      ?

WNDCLASSEX ends


POINT struct

 x                 dd      ?
 y                 dd      ?

POINT ends

MSG struct

 hwnd              dq      ?
 message           dd      ?
 padding1          dd      ?      ; padding
 wParam            dq      ?
 lParam            dq      ?
 time              dd      ?
 pt                POINT   <>
 padding2          dd      ?      ; padding

MSG ends


.data

szWindowClass db 'FirstApp', 0
szTitle       db 'My First x64 Windows', 0
szHelpTitle   db 'Help', 0
szHelpText    db 'This will be a big help...', 0

.data?

hInstance qword ?
hWnd      qword ?
wndclass  WNDCLASSEX <>
wmsg      MSG <>

.code


WndProc: ; proc hWnd : qword, uMsg : dword, wParam : qword, lParam : qword

 mov [rsp+8], rcx        ; hWnd (save parameters as locals)
 mov [rsp+10h], edx      ; Msg
 mov [rsp+18h], r8       ; wParam
 mov [rsp+20h], r9       ; lParam
 sub rsp, 38h
 cmp edx, WM_DESTROY
 jnz @next1

 xor ecx, ecx            ; exit code
 call PostQuitMessage
 xor rax, rax            ; return 0
 add rsp, 38h
 ret

@next1:

 cmp edx, WM_COMMAND
 jnz @default

 mov rax, rsp            ; rax is volatile, so it can serve as scratch
 add rax, 38h
 mov r10, [rax+18h]      ; wParam
 cmp r10w, IDM_ABOUT
 jz @about
 cmp r10w, IDM_EXIT
 jz @exit
 jmp @default

@about:

 xor r9d, r9d
 lea r8, szHelpTitle
 lea rdx, szHelpText
 xor ecx, ecx
 call MessageBoxA
 jmp @default

@exit:

 mov rax, rsp
 add rax, 38h
 mov rcx, [rax+8h]       ; hWnd
 call DestroyWindow

@default:

 mov rax, rsp
 add rax, 38h
 mov r9, [rax+20h]       ; lParam
 mov r8, [rax+18h]       ; wParam
 mov edx, [rax+10h]      ; Msg
 mov rcx, [rax+8]        ; hWnd
 call DefWindowProcA
 add rsp, 38h
 ret

MyRegisterClass: ; proc hInst : qword

 sub rsp, 28h
 mov wndclass.cbSize, sizeof WNDCLASSEX
 mov eax, CS_VREDRAW
 or eax, CS_HREDRAW
 mov wndclass.style, eax
 lea rax, WndProc
 mov wndclass.lpfnWndProc, rax
 mov wndclass.cbClsExtra, 0
 mov wndclass.cbWndExtra, 0
 mov wndclass.hInstance, rcx
 mov wndclass.hIcon, NULL
 mov wndclass.hCursor, NULL
 mov wndclass.hbrBackground, COLOR_WINDOW
 mov wndclass.lpszMenuName, IDC_MENU
 lea rax, szWindowClass
 mov wndclass.lpszClassName, rax
 mov wndclass.hIconSm, NULL
 lea rcx, wndclass
 call RegisterClassExA
 add rsp, 28h
 ret

InitInstance: ; proc hInst : qword

 sub rsp, 78h
 mov eax, CW_USEDEFAULT        ; assumption: X and Width use CW_USEDEFAULT
 xor r10, r10                  ; r10 is volatile, used here as zero
 mov [rsp+58h], r10            ; lpParam
 mov [rsp+50h], rcx            ; hInstance
 mov [rsp+48h], r10            ; hMenu = NULL
 mov [rsp+40h], r10            ; hWndParent = NULL
 mov [rsp+38h], r10            ; Height
 mov [rsp+30h], rax            ; Width
 mov [rsp+28h], r10            ; Y
 mov [rsp+20h], rax            ; X
 mov r9d, WS_OVERLAPPEDWINDOW  ; dwStyle
 lea r8, szTitle               ; lpWindowName
 lea rdx, szWindowClass        ; lpClassName
 xor ecx, ecx                  ; dwExStyle
 call CreateWindowExA
 mov hWnd, rax
 mov edx, SW_SHOW
 mov rcx, hWnd
 call ShowWindow
 mov rax, hWnd                 ; set return value
 add rsp, 78h
 ret

Main proc

 sub rsp, 28h
 xor rcx, rcx
 call GetModuleHandleA
 mov hInstance, rax
 mov rcx, rax
 call MyRegisterClass
 test rax, rax
 jz @close               ; if the RegisterClassEx fails, exit

 mov rcx, hInstance
 call InitInstance
 test rax, rax
 jz @close               ; if the InitInstance fails, exit

@handlemsgs:             ; message processing routine

 xor r9d, r9d
 xor r8d, r8d
 xor edx, edx
 lea rcx, wmsg
 call GetMessageA
 test eax, eax
 jz @close
 lea rcx, wmsg
 call TranslateMessage
 lea rcx, wmsg
 call DispatchMessageA
 jmp @handlemsgs

@close:

 xor ecx, ecx
 call ExitProcess

Main endp

end


As you can see, I tried to stay as low level as I could. The reason why I avoided the proc macro for the functions other than Main is that ml64 adds a prologue and an epilogue by itself, which I didn't want.
Avoiding the macro made it possible to define my own stack frame without any interference from the assembler. The first thing to notice when scrolling through this code is the structure:

MSG struct

 hwnd              dq      ?
 message           dd      ?
 padding1          dd      ?      ; padding
 wParam            dq      ?
 lParam            dq      ?
 time              dd      ?
 pt                POINT   <>
 padding2          dd      ?      ; padding

MSG ends

It requires two paddings which the x86 declaration of the same structure didn't. The reason, in a few words, is that qword members should be aligned to qword boundaries: wParam would otherwise start at offset 12, so the first padding moves it to offset 16.
The additional padding at the end of the structure follows the rule that the size of a structure should be a multiple of the alignment of its largest member. Since the largest member here is a qword, the structure is padded from 44 to 48 bytes, a multiple of 8.
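A C compiler inserts the same paddings automatically, which gives an easy way to double-check a hand-written declaration. The sketch below uses stand-in types (Point and Msg64 are names invented here, built from stdint types instead of the Windows headers) and assumes a typical 64-bit ABI in which a 64-bit integer is 8-byte aligned:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int32_t x; int32_t y; } Point;

/* Same members as MSG, without the explicit padding fields:
   the compiler is expected to add them on its own. */
typedef struct {
    uint64_t hwnd;
    uint32_t message;
    uint64_t wParam;
    int64_t  lParam;
    uint32_t time;
    Point    pt;
} Msg64;

/* Offsets and size that the assembly declaration has to reproduce. */
void check_msg_layout(void) {
    assert(offsetof(Msg64, hwnd)    == 0);
    assert(offsetof(Msg64, message) == 8);
    assert(offsetof(Msg64, wParam)  == 16);  /* 12 + first padding  */
    assert(offsetof(Msg64, lParam)  == 24);
    assert(offsetof(Msg64, time)    == 32);
    assert(offsetof(Msg64, pt)      == 36);
    assert(sizeof(Msg64)            == 48);  /* 44 + second padding */
}
```

Comparing offsetof and sizeof values like these against the assembly structure is a quick sanity check whenever a structure is translated by hand.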

To compile this sample, the command line is:

ml64 c:\myapp\test.asm /link /subsystem:windows /defaultlib:C:\WinDDK\6000\lib\wnet\amd64\kernel32.lib /defaultlib:C:\WinDDK\6000\lib\wnet\amd64\user32.lib /entry:Main c:\myapp\test.res

test.res is a file I took from a VC++ wizard project; I was too lazy to make one myself. Anyway, making a resource file is very easy with VC++, but nothing stops you from using Notepad, it just takes more time.
To compile the resource file, all you need to do is use the command line: "rc test.rc".

I think the rest of the code is pretty easy to understand. I didn't cover everything with this paragraph, but now you should have quite a good insight into x64 assembly.

Daniel Pistelli

