CS641 Class 3
Reading so far: P&H (4th): 1.1-1.3, 2.1-2.3, 2.6, 2.7, 2.9 (ASCII) as on syllabus, plus Appendix B.9 and B.10 on SPIM, and also the 3-page tutorial “Getting Started with SPIM” on the CD-ROM.
Handouts: String Processing Example 1/31/11, MIPS Cross Compiler Setup and First Example
Review: How do we do the following C statement (assuming a in $s0,
b in $s1, ...)?
a = b + c + d;
° Break into multiple instructions
add $t0, $s1, $s2 # temp = b + c
add $s0, $t0, $s3 # a = temp + d
Immediates are numerical constants.
° Add Immediate:
addi $s0,$s1,10 (in MIPS)
f = g + 10 (in C)
Simple C variables map onto registers, but what about large data structures like arrays? Or C variables that are still just in memory, not yet in a register?
° Data transfer instructions transfer data between registers and memory:
• Memory to register, Register to memory
° Example: lw $t0,12($s0)
This instruction will take the pointer in $s0, add 12 bytes to it, and then load the value from the memory pointed to by this calculated sum into register $t0
• $s0 is called the base register
• 12 is called the offset
• If $s0 contains 0x1000 0100 as above, this load accesses address 0x1000 010c. Note that 12 decimal is c in hex. Numbers default to decimal in the assembler; we need to write 0x in front of hex numbers, just as in C.
But how do we get 0x1000 0100 into $s0? Using lui or la.
We saw how to use immediate operands to construct constants, but they can only hold 16 bits.
Another instruction: lui, load upper immediate, loads 16 bits into the upper half of the target register, and clears the lower half.
So to get 0x1000 0100 into $s0:
lui $s0, 0x1000 # $s0 = 0x1000 0000
addi $s0, $s0, 0x0100 # $s0 = 0x1000 0100
Note that the assembler supports a pseudo-instruction la (load address) that knows how to do this, so we can write:
la $s0, 0x10000100
and the assembler will compose the two needed instructions for us (using ori rather than addi, since ori zero-extends its 16-bit immediate while addi sign-extends it).
Store: we also want to store a value from a register into memory.
° Store instruction syntax is identical to Load instruction syntax
° MIPS Instruction Name:
° sw (meaning Store Word, so 32 bits or one word are stored at a time)
° Example: sw $t0,12($s0)
° This instruction will take the pointer in $s0, add 12 bytes to it, and then store the value from register $t0 into the memory address pointed to by the calculated sum
First full program: hello.s, the hello world program linked to the class web page.
# hello world from MIPS assembler
# Note that the first instruction (la) is a pseudo-instruction,
# and assembles to two real instructions, an lui and an ori
# to load the upper half and lower half of the 32-bit address
# for msg into register $a0
.text
.globl main
main: #program starts at main
la $a0, msg # load address of msg into a0
ori $v0,$0,4 # syscall 4 for print string with addr in a0
syscall # ask the OS to print the string
jr $ra # return from main
.data
msg: .asciiz "Hello, world!" # symbol msg gets value = address of the string
The assembler reads the program, in order, and while .text is in effect, it puts the indicated instructions into a growing text segment (i.e. code), and when .data is in effect, it puts the indicated data areas into the data segment. These are loaded into SPIM along with some startup code, and we end up with what you see in the screenshot handout.
Note that our assembled code (main) starts at 0x40 0024, and there is code before it. This code is the user-level startup code. It sets some registers up and then calls main. When main returns, it does the exit syscall, returning execution to the OS. We see the same kind of thing on full-sized UNIX and Linux systems.
Symbols, and the “symbol table”
The assembler is able to handle name-value associations called symbols. The most important kind of symbol is a named memory location like symbol (or label) main, with value 0x40 0024, in the code area (text segment) and symbol msg, with value 0x1001 0000, in the data segment.
At runtime, symbols have fixed values, arrived at by the build process done by SPIM. When we write the program, we don’t yet know these values, but we can trust the build process to allocate proper addresses. We can just use the symbols like named constants.
To see the “global symbols”, the ones with .globl either by you or the startup module, use Simulator>Display symbol table. This had been done in the run shown on the screenshot. See the Console window near the bottom. If you want to see msg in this report, add “.globl msg” at the top to make it global.
Now that we know that msg = 0x1001 0000, we can see how the la is expanded into two instructions by the assembler:
From the SPIM screenshot: assembled pseudo-instruction la $a0, msg
lui $1, 4097 [msg] # note that 4097 = 0x1001, the upper half of msg’s value, put in $at, the assembler’s reg
ori $4, $1, 0 [msg] # this is or-ing in the lower half, which is 0 in this case, and putting the addr in $a0
syscall: special trap instruction to get OS to print the string
Calling syscall 4, print string:
· Put address of string to print in $a0
· Put 4 in $v0
· Do syscall instruction
See pg. B-44 for the list of system calls you can use. The string needs to be null-terminated, as in C. The .asciiz directive does this for us.
Understanding the bytes in the string at 0x1001 0000 (msg): Little-endian representation!
We see [0x10010000] 0x6c6c6548 ...
We can do “man ascii” on Linux/UNIX to get a decent ASCII table that shows hex values. It shows that 0x6c = l, 0x65 = e, and 0x48 = H, so we are seeing the first 4 bytes of Hello, world! in reverse order.
Actually, the byte 0x48 is at 0x1001 0000, the byte 0x65 at 0x1001 0001, and so on, as expected for a string starting at 0x1001 0000. But SPIM is displaying the contents of the word at 0x1001 0000 (4 bytes) as an integer.
Little-endian Integer Layout:
A: lowest byte of number, loads (lw) to lowest byte of register, prints at right end
A+1: next byte
A+2: next byte
A+3: high byte, loads to highest byte of register, prints at left end
So you see that to find the byte at A, the first byte of the string, you need to look on the right-hand end of the printed number.
Working with strings (for problem 1 of hw1)
Let’s count the characters in a string set up like Hello, world!.
Idea: set up a pointer in a register, clear the count reg
load a byte (lb)
if 0, branch to done
otherwise, increment the count and the pointer
branch back to load
Registers we can use without “preserving”: see pg. 118: t’s, a’s, v’s: plenty
Ex: use t0 for ptr, t1 for count, t2 for byte
Need branches: Sec 2.7, pg. 105
beq reg1, reg2, label1
Go to the statement labeled label1 if the contents of reg1 and reg2 are equal. Similarly, bne branches if they are not equal.
Unconditional branch: “jump” j label1
OK, we’re ready to code!
From Handout: String Processing Example 1/31/11
# countstring.s: count chars in a string in data
# t0 - pointer to string
# t1 - count of chars
# t2 - current char
.text
.globl main
main: la $t0, str2count # point to string
li $t1, 0 # count = 0
loop: lb $t2, 0($t0) # load byte of string
beq $t2, $0, done # check if byte = 0
addi $t1,$t1,1 # inc count
addi $t0, $t0, 1 # inc pointer
j loop # loop back
done: move $a0, $t1 # count to print
li $v0, 1 # syscall 1: print integer in $a0
syscall # print the count
jr $ra # return
.data
str2count: .asciiz "abcdefg"
Here is the same program in C, written to avoid #include <stdio.h> so we can compile it with mips-gcc, and with the string as a global so it goes in the data section:
extern int printf(char *s, ...);
char str2count[] = "abcdefg";
int main(void) {
char *p; int i = 0;
for (p = str2count; *p; p++) i++;
printf("%d\n", i);
return 0; }
We can compile this two ways on sf06:
a.out run it
mips-gcc -S countstring.c create countstring.s in MIPS assembler (once PATH is set properly)
But to read this, we need to understand how the stack is used, so let's look at a simpler one now:
Look at the handout MIPS Cross Compiler Setup and First Example.
The cross-compiler and other related tools are in /usr/local/bin/mips.