CS641 Class 4 Use of the Stack: P&H Sec. 2.8, plus how our mips-gcc handles the stack.

Handout: AssemblerFunctionExamples.html

Generic steps for executing a procedure: see pg. 113

MIPS register conventions, pg. 114: a-reg’s for args, v reg’s for return val, $ra for return address

·         jal instruction: jal ProcAddr “jump-and-link”, saves “return address” in reg $ra and jumps to ProcAddr

·         jr $ra:  return to caller, using jump to address in $ra

More precisely, the CPU tracks its current location in code with the PC register (program counter), shown at the very start of the SPIM screen.  The PC holds the address of the currently executing instruction.

Since all instructions are 4 bytes long, the address of the next instruction is PC + 4 unless a branch or jump is executed.  While jal is executing, PC+4 is the value of the return address that is placed in $ra.  Then the PC is loaded with ProcAddr, causing the CPU to switch to that address for the next instruction.

So far, no mention of stack.  The above mechanism can call and return from a simple procedure just using registers.  We can even do syscalls in the procedure as we did with hello.s, since the syscall does not change the preserved registers, which include $ra (maybe more, but can’t find statement of this.)

However, as soon as one function calls another, we have to start saving $ra for use after the lower function returns.

The cleanest way to save something is on the stack, since it just grows as needed.

Pushing and popping data on the stack: on machines with push and pop instructions, we just push rx and later pop rx, but of course that’s CISC thinking.

Remember that the stack grows down in memory, so the word at ($sp) is at “top of stack”, and the words above that address are in use already.  The words below that address are up for grabs. This of course assumes sp points to good memory already, but that is set up by the OS or startup code.

Claiming stack space on RISC: just adjust the SP “manually”. A function can allocate stack space for itself by just moving SP down by 4 bytes for each word needed to save. Example in book: need 3 registers saved, so

addi $sp, $sp, -12 

Now we (the function) “own” the memory below 12($sp), i.e, 8($sp), 4($sp), and 0($sp).  We move into our newly acquired memory to save, say, reg’s t1, t0, and s0:

sw $t1, 8($sp)

sw $t0, 4($sp)  # but see note below (how this is an unusual action)

sw $s0, 0($sp)

 

If we don’t change $sp during the function, it is easy to restore these registers from the stack:

lw $t1, 8($sp)

lw $t0, 4($sp)

lw $s0, 0($sp)

 

This is written in opposite order on pg. 115, following the old push-pop idea that you need to pop in the opposite order, but in fact we don’t need to with this setup.

See Figure 2.10 on pg. 116. However, note that saving and restoring $t0 and $t1 is quite unusual, since they are not-preserved registers, i.e., each function can use them without saving and restoring them.  Of course so can any function we call, so we do need to be careful across calls.  The save and restore of s0 is more typical, since it’s a preserved register, so we are required to save it and restore it if we use it at all.

Save and restore of $ra is particularly common, needed whenever there’s a function calling a function.

Recall that main of hello.s is a function itself, ending with jr $ra.  So if we make it call a function, say str_length to find the message length, it gives us a simple next example. We convert the code from last class into a function here.

Since we want to call str_length with a pointer to the string, itself in memory, and then also print it with print-string, it is reasonable to give this pointer a preserved register such as s0, so its value is guaranteed to survive across the call.  Once we decide to use s0, we need to actively preserve it for our own caller, so we need to preserve both $s0 and $ra here in main.

countstring1.s: on handout

.text

# register usage:

# s0: pointer to string

 

        .globl  main

main:                     #program starts at main

        addi    $sp, $sp, 4  # claim 2 word

        sw      $ra, 4($sp)  # save ra

        sw      $s0, 0($sp)  # and s0

        la      $s0, msg     # load address of msg into a0

        move    $a0, $s0     # use a0 for arg

        jal     str_length   # takes str in a0, returns len in v0

        move    $a0, $v0

        ori     $v0,$0,1     # syscall 1 for print int with addr in a0

        syscall

        move    $a0, $s0     # reload str addr from $s0 (preserved across call)

        ori     $v0,$0,4     # syscall 4 for print string with addr in a0

        syscall

        lw      $s0, 0($sp)  # restore s0

        lw      $ra, 4($sp)   # restore ra

        jr      $ra          # return from main

 

# register usage:

# t0: pointer to string

# t1: count

# t2: current byte of string

# Note this function calls no functions itself, making it a “leaf”

# As a leaf, it doesn’t need to save and restore $ra

 

str_length:

move $t0, $a0        # point to string  <--changed: arg in a0

      li $t1, 0            # count = 0

loop: lb $t2, 0($t0)       # load byte of string

      beq $t2, $0, done    # check if byte = 0

      addi $t1,$t1,1       # inc count

      addi $t0, $t0, 1     # inc pointer

      j loop               # loop back

done: move $v0, $t1        # count to return  <--changed: return val in v0

      jr $ra               # return

 

 

.data

msg:    .asciiz "Hello, world!"

 

Also look at the factorial example, pg. 117-118. It uses the slti instruction needed for handling inequalities like n < 1 explained on pg. 109.

To understand C-generated code, we need to tackle the frame pointer idea.

The key idea is stated on pg. 119: We call the block of memory allocated on the stack the “stack frame”. The frame pointer points to the start of this area for the duration of a function call, freeing up the stack pointer to move around if necessary, because the software can use the fp to reference the saved variables on the stack.  Thus the frame pointer offers a “stable base pointer” for local-memory variable access.

The text is right, but Fig. 2.12 doesn’t show it right. In the figure, $fp points to the last word of the frame, not the first. Fix it.

Also, the contents of the frame in the diagram should be considered abstractly, i.e., not in order.

Note: There are examples in Appendix B.6 that use fp, but not the way gcc does, so ignore those examples.

Contents of the Stack Frames

Arguments to Functions: a0-a3 are used for up to 4 arguments, but C allows any number of arguments

Consider the case of 6 args:  val = f2(x0, x1, x2, x3, x4, x5);

x0-x3 are passed in a0-a3, x4 and x5 are passed on the stack. And for uniformity, mips-gcc allocates spots for x0-x3 on the stack too. But weirdly, this stack space is allocated by the caller, not the callee. More on this below.

Saved return address: for non-leaf functions

Saved preserved registers: $fp, and any s-registers this function wants to use. Also $gp, which gets 8 bytes (don’t worry about this yet)

Local arrays and structs: for this function

Stack Allocation by caller for arguments

First example using mips-gcc conventions: see handout for stringcount2.s, converted to SPIM syntax from output of “mips-gcc –S stringcount2.c”.

We see that str_length has only one argument, but the caller, main, is expected to allocate space for at least 4 args if it calls anything.

Main needs to save $ra, $fp, $gp (2 words), as well as allocate space for 4 args, total of 8 words, or 32 bytes.

We’re ignoring $gp, not actually in use here anyway. But the calling convention requires allocating space (8 bytes) for it.

Note that main puts the arg to str_length in a0, not on the stack, even though it has allocated space for it on the stack. We can think of this as a convenience for the called function: it can store arg 0 there if it wants to.

In fact, str_length does store arg0 in its spot on the stack right after setting up its own frame pointer.  Then it uses that spot for the working value of the string pointer p, loading and storing from the stack.  This is not very efficient, but it’s easy for the debugger to follow. Similarly the count variable (a local variable in the .c program) is given a stack location by str_length’s own allocation, and used from there.

Note that mips-gcc gets a lot smarter if optimization is turned on.  Try “mips-gcc-O2 –S stringcount2.c” to see all the extra code melt away. Don’t forget to use –O2 for your production C code!

More than 4 args...

Suppose a function f1 calls 2 functions f2 and f3, with 3 and 6 arguments, respectively. By mips-gcc rules, f1 needs to allocate enough arg space for the 6-arg call, i.e. 24 bytes of stack space. (If f3 has only 2 args, it uses the minimal 4 args-worth, for 16 bytes.)

Suppose f1 is itself simple, so that it only needs to preserve $fp and $ra, and $gp.  Then its frame looks like this:

--------
saved-$ra

saved-$fp

saved-$gp (2 words)

space for 6 arg regs for callers

-------


So its frame is 10*4 = 40 bytes

f1 allocates 40 bytes:

addi $sp, $sp, -40

f1 saves $ra and $fp:

sw  $ra, 36($sp)   <--highest word address in the frame

sw  $fp, 32($sp)

 

With $fp saved, f1 can establish its own $fp:

move $fp, $sp

Although $fp and $sp now contain the same address, $fp is preferred for stack references in the body of f1.

To call f2, f1 puts the 3 args in a0-a2, and does the jal, just as in our previous example. On return, it can access any function return value from v0.

To call f3, f1 puts x0-x3 in a0-a3, and x4 in 4*4($fp), and x5 in 5*4($fp), leaving 0($fp), 4($fp), 8($fp) and 12($fp) reserved for storing $a0-$a3 if the callee wants to do that.  In other words, the callee is in charge of these spots on the stack, and is also allowed to change the spots for the #4 and #5 args.

Ready to call f3

The stack now looks like this, after f1 has set up the args for the call to f3. Registers a0-a3 contain the first 4 args.

--------
36($fp): saved-$ra

32($fp): saved-$fp
8 bytes for $gp

20($fp): arg 5

16($fp): arg 4

0($fp)- 15($fp): space for args 0-3, untouched so far

-------<--$sp, $fp point here

Calling f3

Now f1 does the jal to f3. This itself does nothing to the stack.  It just puts the RA in $ra.

f3 in action: setting up its own frame and $fp

F3 receives the same stack and $sp, $fp contents, but thinks of $fp as the caller’s $fp, not its own. 

Suppose f3 wants to use s0, but is a leaf function. Then it needs to preserve $fp and $s0, and leave space for $gp, but not $ra. It needs no argument space on the stack, just the 16 bytes for the preserved registers:

addi $sp, $sp, -16
sw   $fp, 12($sp)   <--save caller’s fp

sw   $s0, 8($sp)

move $fp, $sp   <--establish $fp for f2

 

f3’s view of the stack: its own args lie just above its own frame

 

Here’s the stack now, showing 2 stack frames, because f2 has ownership of part of the caller’s stack space:

 

--------
saved-$ra

saved-$fp

8 bytes for $gp

44($fp): arg 5    <--f2 gets arg5 from here and is allowed to change this spot

40($fp): arg 4

16($fp)- 39($fp): space for args 0-3, untouched so far: f3 can save a0-a3 to here, change the values

-------
12($fp): saved-$fp

8($fp): saved-$s0

8 bytes for $gp

-------<--$sp, $fp point here

We can see that the general rule is that if f3 has F bytes of frame, then its arg0 spot is F($fp), arg1 is at (F+4)($fp), and so on.  We see that its own args lie just above its own frame on the stack.

With the unoptimized compile, mips-gcc will in fact save a0-a3 to their stack spots just after it establishes its own $fp. After this action, all of its arguments are lined up on the stack, and their use in the body of f3 will involve a load-from-stack, operation in registers, and then store-to-stack if the value has changed.  This is quite inefficient (and fixed by using optimization), but makes it easy for debuggers to follow the code at runtime.

 Why is this so inefficient? Because the stack resides in memory, so all the stack loads and stores are memory references, across the “memory barrier”.