CS641 Class 4 Use of
the Stack: P&H Sec. 2.8, plus how our mips-gcc handles the stack.
Handout: AssemblerFunctionExamples.html
Generic steps for executing a procedure: see pg. 113
MIPS register conventions, pg. 114: a-reg’s for args, v reg’s for return val, $ra for return address
· jal instruction: jal ProcAddr “jump-and-link”, saves “return address” in reg $ra and jumps to ProcAddr
· jr $ra: return to caller, using jump to address in $ra
More precisely, the CPU tracks its current location in code with the PC register (program counter), shown at the very start of the SPIM screen. The PC holds the address of the currently executing instruction.
Since all instructions are 4 bytes long, the address of the next instruction is PC + 4 unless a branch or jump is executed. While jal is executing, PC+4 is the value of the return address that is placed in $ra. Then the PC is loaded with ProcAddr, causing the CPU to switch to that address for the next instruction.
So far, no mention of stack. The above mechanism can call and return from a simple procedure just using registers. We can even do syscalls in the procedure as we did with hello.s, since the syscall does not change the preserved registers, which include $ra (maybe more, but can’t find statement of this.)
However, as soon as one function calls another, we have to start saving $ra for use after the lower function returns.
The cleanest way to save something is on the stack, since it just grows as needed.
Pushing and popping data on the stack: on machines with push and pop instructions, we just push rx and later pop rx, but of course that’s CISC thinking.
Remember that the stack grows down in memory, so the word at ($sp) is at “top of stack”, and the words above that address are in use already. The words below that address are up for grabs. This of course assumes sp points to good memory already, but that is set up by the OS or startup code.
Claiming stack space on RISC: just adjust the SP “manually”. A function can allocate stack space for itself by just moving SP down by 4 bytes for each word needed to save. Example in book: need 3 registers saved, so
addi $sp, $sp, -12
Now we (the function) “own” the memory below 12($sp), i.e, 8($sp), 4($sp), and 0($sp). We move into our newly acquired memory to save, say, reg’s t1, t0, and s0:
sw $t1, 8($sp)
sw $t0, 4($sp) # but see note below (how this is an unusual
action)
sw $s0, 0($sp)
If we don’t change $sp during the function, it is easy to restore these registers from the stack:
lw $t1, 8($sp)
lw $t0, 4($sp)
lw $s0, 0($sp)
This is written in opposite order on pg. 115, following the old push-pop idea that you need to pop in the opposite order, but in fact we don’t need to with this setup.
See Figure 2.10 on pg. 116. However, note that saving and restoring $t0 and $t1 is quite unusual, since they are not-preserved registers, i.e., each function can use them without saving and restoring them. Of course so can any function we call, so we do need to be careful across calls. The save and restore of s0 is more typical, since it’s a preserved register, so we are required to save it and restore it if we use it at all.
Save and restore of $ra is particularly common, needed whenever there’s a function calling a function.
Recall that main of hello.s is a function itself, ending with jr $ra. So if we make it call a function, say str_length to find the message length, it gives us a simple next example. We convert the code from last class into a function here.
Since we want to call str_length with a pointer to the string, itself in memory, and then also print it with print-string, it is reasonable to give this pointer a preserved register such as s0, so its value is guaranteed to survive across the call. Once we decide to use s0, we need to actively preserve it for our own caller, so we need to preserve both $s0 and $ra here in main.
countstring1.s: on
handout
.text
#
register usage:
#
s0: pointer to string
.globl
main
main: #program starts at main
addi $sp, $sp, 4
# claim 2 word
sw
$ra, 4($sp) # save ra
sw $s0, 0($sp) # and s0
la
$s0, msg # load address of msg into a0
move $a0, $s0 #
use a0 for arg
jal
str_length # takes str in a0,
returns len in v0
move $a0, $v0
ori
$v0,$0,1 # syscall 1 for print int with addr in a0
syscall
move
$a0, $s0 # reload str addr from $s0 (preserved across
call)
ori
$v0,$0,4 # syscall 4 for print string with addr in a0
syscall
lw $s0, 0($sp) # restore s0
lw $ra, 4($sp) #
restore ra
jr
$ra # return from main
#
register usage:
#
t0: pointer to string
#
t1: count
#
t2: current byte of string
#
Note this function calls no functions itself, making it a “leaf”
#
As a leaf, it doesn’t need to save and restore $ra
str_length:
move $t0, $a0
# point
to string <--changed: arg in a0
li $t1, 0 # count = 0
loop:
lb $t2, 0($t0) # load byte of
string
beq $t2, $0, done # check if byte = 0
addi $t1,$t1,1 # inc count
addi $t0, $t0, 1 # inc pointer
j loop # loop back
done:
move $v0, $t1 # count to return <--changed: return val in v0
jr $ra # return
.data
msg: .asciiz "Hello, world!"
Also look at the factorial example, pg. 117-118. It uses the slti instruction needed for handling inequalities like n < 1 explained on pg. 109.
To understand
C-generated code, we need to tackle the frame pointer idea.
The key idea is stated on pg. 119: We call the block of memory allocated on the stack the “stack frame”. The frame pointer points to the start of this area for the duration of a function call, freeing up the stack pointer to move around if necessary, because the software can use the fp to reference the saved variables on the stack. Thus the frame pointer offers a “stable base pointer” for local-memory variable access.
The text is right, but Fig. 2.12 doesn’t show it right. In the figure, $fp points to the last word of the frame, not the first. Fix it.
Also, the contents of the frame in the diagram should be considered abstractly, i.e., not in order.
Note: There are examples in Appendix B.6 that use fp, but not the way gcc does, so ignore those examples.
Contents of the Stack
Frames
Arguments to Functions: a0-a3 are used for up to 4 arguments, but C allows any number of arguments
Consider the case of 6 args: val = f2(x0, x1, x2, x3, x4, x5);
x0-x3 are passed in a0-a3, x4 and x5 are passed on the stack. And for uniformity, mips-gcc allocates spots for x0-x3 on the stack too. But weirdly, this stack space is allocated by the caller, not the callee. More on this below.
Saved return address: for non-leaf functions
Saved preserved registers: $fp, and any s-registers this function wants to use. Also $gp, which gets 8 bytes (don’t worry about this yet)
Local arrays and structs: for this function
Stack Allocation by
caller for arguments
First example using
mips-gcc conventions: see handout for stringcount2.s, converted to SPIM syntax
from output of “mips-gcc –S stringcount2.c”.
We see that str_length has only one argument, but the caller, main, is expected to allocate space for at least 4 args if it calls anything.
Main needs to save $ra, $fp, $gp (2 words), as well as allocate space for 4 args, total of 8 words, or 32 bytes.
We’re ignoring $gp, not actually in use here anyway. But the calling convention requires allocating space (8 bytes) for it.
Note that main puts the arg to str_length in a0, not on the stack, even though it has allocated space for it on the stack. We can think of this as a convenience for the called function: it can store arg 0 there if it wants to.
In fact, str_length does store arg0 in its spot on the stack right after setting up its own frame pointer. Then it uses that spot for the working value of the string pointer p, loading and storing from the stack. This is not very efficient, but it’s easy for the debugger to follow. Similarly the count variable (a local variable in the .c program) is given a stack location by str_length’s own allocation, and used from there.
Note that mips-gcc gets a lot smarter if optimization is turned on. Try “mips-gcc-O2 –S stringcount2.c” to see all the extra code melt away. Don’t forget to use –O2 for your production C code!
More than 4 args...
Suppose a function f1 calls 2 functions f2 and f3, with 3 and 6 arguments, respectively. By mips-gcc rules, f1 needs to allocate enough arg space for the 6-arg call, i.e. 24 bytes of stack space. (If f3 has only 2 args, it uses the minimal 4 args-worth, for 16 bytes.)
Suppose f1 is itself simple, so that it only needs to preserve $fp and $ra, and $gp. Then its frame looks like this:
--------
saved-$ra
saved-$fp
saved-$gp
(2 words)
space
for 6 arg regs for callers
-------
So its frame is 10*4 = 40 bytes
f1 allocates 40 bytes:
addi $sp, $sp, -40
f1 saves $ra and $fp:
sw $ra, 36($sp)
<--highest word address in the frame
sw
$fp, 32($sp)
With $fp saved, f1 can establish its own $fp:
move $fp, $sp
Although $fp and $sp now contain the same address, $fp is preferred for stack references in the body of f1.
To call f2, f1 puts the 3 args in a0-a2, and does the jal, just as in our previous example. On return, it can access any function return value from v0.
To call f3, f1 puts x0-x3 in a0-a3, and x4 in 4*4($fp), and x5 in 5*4($fp), leaving 0($fp), 4($fp), 8($fp) and 12($fp) reserved for storing $a0-$a3 if the callee wants to do that. In other words, the callee is in charge of these spots on the stack, and is also allowed to change the spots for the #4 and #5 args.
Ready to call f3
The stack now looks like this, after f1 has set up the args for the call to f3. Registers a0-a3 contain the first 4 args.
--------
36($fp): saved-$ra
32($fp):
saved-$fp
8 bytes for $gp
20($fp):
arg 5
16($fp):
arg 4
0($fp)-
15($fp): space for args 0-3, untouched so far
-------<--$sp,
$fp point here
Calling f3
Now f1 does the jal to f3. This itself does nothing to the stack. It just puts the RA in $ra.
f3 in action: setting
up its own frame and $fp
F3 receives the same stack and $sp, $fp contents, but thinks of $fp as the caller’s $fp, not its own.
Suppose f3 wants to use s0, but is a leaf function. Then it needs to preserve $fp and $s0, and leave space for $gp, but not $ra. It needs no argument space on the stack, just the 16 bytes for the preserved registers:
addi
$sp, $sp, -16
sw $fp, 12($sp)
<--save caller’s fp
sw
$s0, 8($sp)
move
$fp, $sp <--establish $fp for f2
f3’s view of the
stack: its own args lie just above
its own frame
Here’s
the stack now, showing 2 stack frames, because f2 has ownership of part of the
caller’s stack space:
--------
saved-$ra
saved-$fp
8
bytes for $gp
44($fp):
arg 5 <--f2 gets arg5 from here and
is allowed to change this spot
40($fp):
arg 4
16($fp)-
39($fp): space for args 0-3, untouched so far: f3 can save a0-a3 to here,
change the values
-------
12($fp): saved-$fp
8($fp):
saved-$s0
8
bytes for $gp
-------<--$sp,
$fp point here
We can see that the general rule is that if f3 has F bytes of frame, then its arg0 spot is F($fp), arg1 is at (F+4)($fp), and so on. We see that its own args lie just above its own frame on the stack.
With the unoptimized compile, mips-gcc will in fact save a0-a3 to their stack spots just after it establishes its own $fp. After this action, all of its arguments are lined up on the stack, and their use in the body of f3 will involve a load-from-stack, operation in registers, and then store-to-stack if the value has changed. This is quite inefficient (and fixed by using optimization), but makes it easy for debuggers to follow the code at runtime.
Why is this so inefficient? Because the stack resides in memory, so all the stack loads and stores are memory references, across the “memory barrier”.