I've made a simple test using again the power example (like in previuos posts: [1] and [2]).
In this case, an assembly function called pow_caller, which in turn calls the p_pow function, is being called from C.
This is the calling C code:
#include "stdlib.h" #include "stdio.h" #include "assert.h" extern int pow_caller(int, int); int main(void) { assert(1 == pow_caller(3, 0)); assert(9 == pow_caller(3, 2)); assert(25 == pow_caller(5, 2)); assert(36 == pow_caller(6, 2)); assert(27 == pow_caller(3, 3)); printf("All tests passed!\n"); return EXIT_SUCCESS; }This code is calling the pow_caller function:
pow_caller: push rbp ; Store stack initial state mov rbp, rsp sub rsp, 8 ; We make space for p_pow result by ; subtracting 8 bytes to the stack ; pointer (RSP) push rdi ; We push the base (b) to the stack push rsi ; We push the exponent (e) to the stack call p_pow ; Now we call p_pow add rsp, 16 ; We make RSP point to the ; position were the result is. pop rax ; And move the result to rax so ; that it's returned to the C program mov rsp, rbp ; Restore stack initial state pop rbp retAs you can see, pow_caller is passing the parameters (base and exponent) to p_pow using the stack.
Before passing the parameters it's also reserving some space (8 bytes) for the result of p_pow to be returned. That's what sub rsp, 8 is doing, (remember that the stack grows towards decreasing addresses).
Once p_pow finishes, we have to get the result from the stack. To do it we go to the place where the result is by moving the stack pointer RSP (add rsp, 16) and, once there, get the result from the stack moving it to the RAX register (pop rax), so that it's returned to the C program when pow_caller ends.
Before executing add rsp, 16, the result of p_pow was in memory at address rbp+16, but after executing add rsp, 16 it was at address rbp, so it can be reached just using pop.
It's very important to remark that the state of the stack (RSP and RBP) must be the same before and after the function to avoid funny bugs when you go back to the caller.
In pow_caller we've pushed the content of 3 registers (RBP, RDI and RSI) to the stack (24 bytes) and subtracted 8 bytes to store the result of p_pow, that's 32 bytes, so that, $rsp' = $rsp - 32, where $rsp is the content of RSP when entering p_pow_caller and $rsp' is its current value.
After p_pow finished, we added 16 bytes to RSP and then made two pops (another 16 bytes), so at the end $rsp' = $rsp again.
That last pop made also that $rbp' = $rbp again.
It only remains to see how p_pow accesses the stack to get the parameters and store the result.
This is the code of p_pow:
p_pow: push rbp mov rbp, rsp ; Store stack initial state push rax push rbx push rcx push rdx mov eax, 1 ; Initialize rax ; We move the base (b) from the stack to ebx mov ebx, dword[rbp+24] ; We move the exponent (e) from the stack to ecx mov ecx, dword[rbp+16] cmp ecx, 0 ; if e==0 -> b^0=1, and we are done jle p_pow_end p_pow_loop: mul ebx ; eax*ebx=edx:eax dec ecx ; ecx = ecx - 1 jg p_pow_loop ; If ecx > 0 it iterates again p_pow_end: ; Move the result to the space reserved for it in the stack mov [rbp+32], rax pop rdx ; Restore stack initial state pop rcx pop rbx pop rax mov rsp, rbp pop rbp retOk, so again the first thing we do is pushing the initial value of RBP (the value in p_pow_caller), so that we can get it back once the function has finished.
Then we copy in RBP the initial content of RSP+8 (because we did a pop). We do this because we want to be able to go on using the stack doing push and pop (which changes RSP), but we also want to be able to reach the parameters that are at certain distance in memory of the original RSP.
This explains why to get a parameter from the stack we use [rbp+distance].
In p_pow_caller we pushed the base first and then the exponent. Since the stack is a LIFO data structure, the exponent is closer to RBP than the base.
How far is it? Well, before pushing p_pow initial RBP content, the exponent was at a distance of 8 bytes from the caller's RSP and the base at a distance of 16. After pushing RBP, the distance is 8 bytes more: base at 28 and exponent at 16 bytes distance.
In that moment we stored the content of RSP inside RBP, so that later changes to RSP cannot change the distance to the parameters. That's the reason why we are able to grab the base and exponent by doing mov ebx, dword[rbp+24] and mov ecx, dword[rbp+16], respectively.
When we've computed the power, we store it in the place that we reserved for it before pushing the parameters (which is 8 bytes farther than the base) doing mov [rbp+32], rax
At the end of p_pow we restore the initial state of the stack (RBP and RSP) again before returning to p_pow_caller.