CS641 Class 3

Reading so far: P&H (4th): 1.1-1.3, 2.1-2.3, 2.6, 2.7, 2.9 (ASCII) as on syllabus, plus Appendix B.9 and B.10 on SPIM, also 3-page tutorial on the CD-ROM “Getting Started with SPIM”.

Handouts: String Processing Example 1/31/11, MIPS Cross Compiler Setup and First Example

Review

How do we do the following C statement? (assuming a in $s0, b in $s1, c in $s2, d in $s3)
a = b + c + d;

° Break into multiple instructions

add $t0, $s1, $s2 # temp = b + c

add $s0, $t0, $s3 # a = temp + d
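To make the two-instruction sequence concrete, here is a small C sketch that models the register file as an array and performs each add in turn. The register indices follow the standard MIPS numbering ($t0 = 8, $s0 to $s3 = 16 to 19). The names reg, add and sum3 are hypothetical, and the sketch ignores the overflow trap that a real add raises.

```c
#include <stdint.h>

/* MIPS register numbers used in this sketch: $t0 = 8, $s0..$s3 = 16..19. */
enum { T0 = 8, S0 = 16, S1 = 17, S2 = 18, S3 = 19 };

/* Toy register file (hypothetical); $zero (index 0) is never written here. */
static uint32_t reg[32];

/* add rd, rs, rt: rd = rs + rt.
   Uses 32-bit wraparound; a real add would trap on signed overflow. */
static void add(int rd, int rs, int rt) {
    reg[rd] = reg[rs] + reg[rt];
}

/* a = b + c + d, with a in $s0, b in $s1, c in $s2, d in $s3. */
static uint32_t sum3(uint32_t b, uint32_t c, uint32_t d) {
    reg[S1] = b;
    reg[S2] = c;
    reg[S3] = d;
    add(T0, S1, S2);   /* add $t0, $s1, $s2   # temp = b + c */
    add(S0, T0, S3);   /* add $s0, $t0, $s3   # a = temp + d */
    return reg[S0];
}
```

After sum3(1, 2, 3), $t0 holds the intermediate 3 and $s0 holds 6.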

Immediates are numerical constants.

° Add Immediate:

addi $s0,$s1,10 (in MIPS)

f = g + 10 (in C, with f in $s0 and g in $s1)
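One detail is worth seeing in code: addi sign-extends its 16-bit immediate to 32 bits before adding, so the usable range is -32768 to 32767. Here is a minimal C sketch of that behavior. The function names are illustrative, and the overflow trap of a real addi is not modeled.

```c
#include <stdint.h>

/* Sign-extend a 16-bit immediate to 32 bits, as addi does.
   Written with explicit arithmetic to avoid implementation-defined casts. */
static int32_t sign_extend16(uint16_t imm) {
    return (imm & 0x8000) ? (int32_t)imm - 0x10000 : (int32_t)imm;
}

/* addi rt, rs, imm: rt = rs + sign-extended imm (32-bit wraparound). */
static uint32_t addi(uint32_t rs, uint16_t imm) {
    return rs + (uint32_t)sign_extend16(imm);
}
```

So f = g + 10 corresponds to addi(g, 10), while an immediate of 0xFFFF acts as -1.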

C variables map onto registers, but what about large data structures like arrays, or C variables that are still just in memory, not yet in a register?

° Data transfer instructions transfer data between registers and memory:

• Memory to register, Register to memory

° Example: lw $t0,12($s0)

This instruction will take the pointer in $s0, add 12 bytes to it, and then load the value from the memory pointed to by this calculated sum into register $t0

° Notes:

• $s0 is called the base register

• 12 is called the offset

• If $s0 contains 0x1000 0100, this load accesses address 0x1000 010c. Note that 12 decimal is c in hex. Numbers default to decimal in the assembler; hex numbers need 0x in front, just as in C.
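The address calculation for lw (and sw) is just base plus offset, with the 16-bit offset sign-extended. Here is a short C sketch, with a hypothetical function name, that checks the example above.

```c
#include <stdint.h>

/* Effective address for lw/sw: base register + sign-extended 16-bit offset. */
static uint32_t effective_address(uint32_t base, int16_t offset) {
    return base + (uint32_t)(int32_t)offset;
}
```

With base 0x1000 0100 and offset 12 this gives 0x1000 010c; a negative offset such as -4 reaches back to 0x1000 00fc.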

But how do we get 0x1000 0100 into $s0? Using lui and la.

We saw how to use immediate operands to construct constants, but they can only handle 16 bits.

Another instruction: lui, load upper immediate, loads 16 bits into the upper half of the target register, and clears the lower half.

So to get 0x1000 0100 into s0:

lui $s0, 0x1000

addi $s0, $s0, 0x0100

Note that the assembler supports a pseudo-instruction la (load address) that knows how to do this, so we can write:

la $s0, 0x10000100

and the assembler will compose the two needed instructions (perhaps using ori instead of addi) for us.
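The lui/addi pair, and the reason the assembler may prefer ori, can be checked with a small C model. This sketch assumes the usual MIPS semantics: lui shifts its immediate into the upper half and clears the lower half, ori zero-extends its immediate, and addi sign-extends it. The function names are hypothetical. With a lower half of 0x0100 both forms agree, but a lower half of 0x8000 or above would be treated as negative by addi and disturb the upper half, which ori avoids.

```c
#include <stdint.h>

/* lui rt, imm: imm goes into the upper 16 bits, lower 16 bits cleared. */
static uint32_t lui(uint16_t imm) {
    return (uint32_t)imm << 16;
}

/* ori rt, rs, imm: imm is zero-extended, then OR-ed into rs. */
static uint32_t ori(uint32_t rs, uint16_t imm) {
    return rs | imm;
}

/* addi rt, rs, imm: imm is sign-extended, then added to rs. */
static uint32_t addi_imm(uint32_t rs, uint16_t imm) {
    int32_t ext = (imm & 0x8000) ? (int32_t)imm - 0x10000 : (int32_t)imm;
    return rs + (uint32_t)ext;
}
```

For example, ori(lui(0x1000), 0x8000) yields 0x1000 8000, but addi_imm(lui(0x1000), 0x8000) yields 0x0fff 8000.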

Store: we also want to store a value from a register into memory.

° Store instruction syntax is identical to Load instruction syntax

° MIPS Instruction Name:

° sw (meaning Store Word, so 32 bits or one word are stored at a time)

° Example: sw $t0,12($s0)

° This instruction will take the pointer in $s0, add 12 bytes to it, and then store the value from register $t0 into the memory address pointed to by the calculated sum
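A toy memory model shows that sw and lw are mirror images: both compute base + offset, and one writes the register value to that word while the other reads it back. In this sketch the data segment base (0x1001 0000), its size, and the function names are assumptions. A misaligned address simply returns -1 here, where real MIPS hardware would raise an address error exception.

```c
#include <stdint.h>

/* Assumed simulated data segment: 64 words starting at 0x1001 0000. */
#define MEM_BASE  0x10010000u
#define MEM_WORDS 64

static uint32_t mem[MEM_WORDS];

/* Word index for an aligned address inside the segment, or -1. */
static int word_index(uint32_t addr) {
    if (addr < MEM_BASE || (addr - MEM_BASE) % 4 != 0) return -1;
    uint32_t idx = (addr - MEM_BASE) / 4;
    return idx < MEM_WORDS ? (int)idx : -1;
}

/* sw rt, offset(base): store the word; returns 0 on success. */
static int sw(uint32_t rt, int16_t offset, uint32_t base) {
    int i = word_index(base + (uint32_t)(int32_t)offset);
    if (i < 0) return -1;
    mem[i] = rt;
    return 0;
}

/* lw rt, offset(base): load the word into *rt; returns 0 on success. */
static int lw(uint32_t *rt, int16_t offset, uint32_t base) {
    int i = word_index(base + (uint32_t)(int32_t)offset);
    if (i < 0) return -1;
    *rt = mem[i];
    return 0;
}
```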

First full program: hello.s, the hello world program linked to the class web page.

# hello world from MIPS assembler

# Note that the first instruction (la) is a pseudo-instruction,

# and assembles to two real instructions, an lui and an ori

# to load the upper half and lower half of the 32-bit address

# for msg into register $a0

.text

.globl main

main: #program starts at main

la $a0, msg # load address of msg into a0

ori $v0,$0,4 # syscall 4 for print string with addr in a0

syscall

jr $ra # return from main

.data

msg: .asciiz "Hello, world!" # symbol msg gets value = address of the string

The assembler reads the program, in order, and while .text is in effect, it puts the indicated instructions into a growing text segment (i.e. code), and when .data is in effect, it puts the indicated data areas into the data segment. These are loaded into SPIM along with some startup code, and we end up with what you see in the screenshot handout.

Startup Code

Note that the assembled code starts at 0x40 0024, and there is code before that. This code is the user-level startup code. It sets some registers up and then calls main. When main returns, it does the exit syscall, returning execution to the OS. We see the same kind of thing on full-sized UNIX and Linux systems.

Symbols, and the “symbol table”

The assembler is able to handle name-value associations called symbols. The most important kind of symbol is a named memory location like symbol (or label) main, with value 0x40 0024, in the code area (text segment) and symbol msg, with value 0x1001 0000, in the data segment.
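Conceptually, the symbol table is just a list of name-value pairs. The hypothetical C sketch below hard-codes the two values from the screenshot and looks names up linearly; a real assembler fills in the table as it encounters labels rather than hard-coding it.

```c
#include <stdint.h>
#include <string.h>

/* One symbol: a name and its value (an address here). */
struct symbol {
    const char *name;
    uint32_t value;
};

/* Hypothetical table holding the two values seen in the screenshot. */
static const struct symbol symtab[] = {
    { "main", 0x00400024u },  /* text segment */
    { "msg",  0x10010000u },  /* data segment */
};

/* Linear lookup: returns 1 and fills *value if found, else 0. */
static int lookup(const char *name, uint32_t *value) {
    for (size_t i = 0; i < sizeof symtab / sizeof symtab[0]; i++) {
        if (strcmp(symtab[i].name, name) == 0) {
            *value = symtab[i].value;
            return 1;
        }
    }
    return 0;
}
```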

At runtime, symbols have fixed values, arrived at by the build process done by SPIM. When we write the program, we don’t yet know these values, but we can trust the build process to allocate proper addresses. We can just use the symbols like named constants.

To see the “global symbols”, the ones declared .globl either by you or by the startup module, use Simulator>Display symbol table. This was done in the run shown on the screenshot; see the Console window near the bottom. If you want to see msg in this report, add “.globl msg” at the top to make it global.

Now that we know that msg = 0x1001 0000, we can see how the la is expanded into two instructions by the assembler:

From the SPIM screenshot: assembled pseudo-instruction la $a0, msg

lui $1, 4097 [msg] # note that 4097 = 0x1001, the upper half of msg’s value, put in $at, the assembler’s reg

ori $4, $1, 0 [msg] # this is or-ing in the lower half, which is 0 in this case, and putting the addr in $a0
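The immediates in that expansion are simply the two halves of msg's address. Here is a C sketch (hypothetical helper names) that computes them and puts them back together the way lui followed by ori does.

```c
#include <stdint.h>

/* Upper half: the immediate for lui $1, hi. */
static uint16_t hi16(uint32_t addr) { return (uint16_t)(addr >> 16); }

/* Lower half: the immediate for ori $4, $1, lo. */
static uint16_t lo16(uint32_t addr) { return (uint16_t)(addr & 0xFFFFu); }

/* What lui followed by ori leaves in the destination register. */
static uint32_t lui_ori(uint16_t hi, uint16_t lo) {
    return ((uint32_t)hi << 16) | lo;
}
```

For msg = 0x1001 0000, hi16 gives 4097 (0x1001) and lo16 gives 0, matching the screenshot.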

syscall: special trap instruction to get OS to print the string

Calling syscall 4, print string:

· Put address of string to print in $a0

· Put 4 in $v0

· Do syscall instruction

See pg. B-44 for the list of system calls you can use. The string needs to be null-terminated, as in C; the .asciiz directive does this for us.
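What syscall 4 does with $a0 can be modeled as a loop over memory that stops at the NUL byte. This is a hypothetical sketch of a handler in a toy simulator, not SPIM's actual code: the data segment holds only the hello.s string at an assumed base of 0x1001 0000, and the output goes into a caller-supplied buffer instead of the console.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed data segment base and contents; .asciiz supplies the NUL. */
#define DATA_BASE 0x10010000u
static const char data_seg[] = "Hello, world!";

/* Copy the NUL-terminated string at address a0 into out (capacity cap,
   including the terminator). Returns the number of characters "printed",
   or -1 if a0 is outside the simulated segment. */
static int syscall_print_string(uint32_t a0, char *out, size_t cap) {
    if (a0 < DATA_BASE || a0 - DATA_BASE >= sizeof data_seg) return -1;
    size_t i = a0 - DATA_BASE, n = 0;
    while (i < sizeof data_seg && data_seg[i] != '\0' && n + 1 < cap) {
        out[n++] = data_seg[i++];   /* copy one byte, advance */
    }
    if (cap > 0) out[n] = '\0';
    return (int)n;
}
```

Passing msg's address prints all 13 characters; passing msg + 7 would print just "world!", because printing always runs to the terminating NUL.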

Understanding the bytes in the string at 0x1001 0000 (msg): Little-endian representation!

We see [0x10010000] 0x6c6c6548 ...

We can do “man ascii” on Linux/UNIX to get a decent ASCII table that shows hex values. It shows that 0x6c = l, 0x65 = e, and 0x48 = H, so we are seeing the first 4 bytes of “Hello, world!” in reverse order.

Actually, the byte 48 is at 0x1001 0000, the byte 65 at 0x1001 0001, and so on, as expected for a string starting at 0x1001 0000. But SPIM is displaying the contents of the word at 0x1001 0000 (4 bytes) as an integer number.

Little-endian Integer Layout:

A: lowest byte of number, loads (lw) to lowest byte of register, prints at right end

A+1: next byte

A+2: next byte

A+3: high byte, loads to highest byte of register, prints at left end

So you see that to find the byte at A, the first byte of the string, you need to look on the right-hand end of the printed number.
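The little-endian layout can be reproduced in a few lines of C by combining four bytes so that the byte at the lowest address lands in the low-order position. Here is a sketch (the function name is hypothetical) applied to the first four bytes of the string.

```c
#include <stdint.h>

/* Little-endian load: byte at A -> bits 0-7, A+1 -> bits 8-15,
   A+2 -> bits 16-23, A+3 -> bits 24-31. */
static uint32_t load_word_le(const unsigned char *p) {
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

Applied to "Hell" (bytes 48 65 6c 6c), this yields 0x6c6c6548, the same value SPIM displays, with 'H' in the low-order (right-hand) byte.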

Working with strings (for problem 1 of hw1)

Let’s count the characters in a string set up like Hello, world!.

Idea: set up a pointer in a register, clear the count reg

Loop:

load a byte (lb)

if 0, branch to done

count it

advance the pointer

branch back to load

Registers we can use without “preserving”: see pg. 118: t’s, a’s, v’s: plenty

Ex: use t0 for ptr, t1 for count, t2 for byte

Need branches: Sec 2.7, pg. 105

beq reg1, reg2, label1

Go to statement label label1 if contents of reg1 and reg2 are equal. Similarly bne.

Unconditional branch: “jump” j label1
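Using these branches, the loop outline maps almost one-for-one onto instructions. The C sketch below writes the same loop with goto so that each statement corresponds to one MIPS instruction; the variable names mirror the registers chosen above (t0 for the pointer, t1 for the count, t2 for the byte), and count_chars is a hypothetical name.

```c
/* Count chars with goto-style control flow mirroring the assembly. */
static int count_chars(const char *t0) {   /* t0: pointer to string   */
    int t1 = 0;                            /* li   $t1, 0             */
    char t2;                               /* t2: current char        */
loop:
    t2 = *t0;                              /* lb   $t2, 0($t0)        */
    if (t2 == 0) goto done;                /* beq  $t2, $0, done      */
    t1 = t1 + 1;                           /* addi $t1, $t1, 1        */
    t0 = t0 + 1;                           /* addi $t0, $t0, 1        */
    goto loop;                             /* j    loop               */
done:
    return t1;                             /* move $a0, $t1           */
}
```

Compared with the for loop in the C version of the program, this form makes the single conditional branch and the unconditional jump at the bottom explicit.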

OK, we’re ready to code!

From Handout: String Processing Example 1/31/11

countstring.s

# countstring.s: count chars in a string in data

# registers:

# t0 - pointer to string

# t1 - count of chars

# t2 - current char

.text

.globl main

.globl str2count

main: la $t0, str2count # point to string

li $t1, 0 # count = 0

loop: lb $t2, 0($t0) # load byte of string

beq $t2, $0, done # check if byte = 0

addi $t1,$t1,1 # inc count

addi $t0, $t0, 1 # inc pointer

j loop # loop back

done: move $a0, $t1 # count to print

li $v0, 1 # syscall 1: print integer in $a0

syscall

jr $ra # return

.data

str2count: .asciiz "abcdefg"

In C, written to avoid #include <stdio.h> (so we can use mips-gcc on it) and with the string as a global array, so that it goes in the data section:

extern int printf(const char *s, ...);

char str2count[] = "abcdefg";

int main()

{

char *p;

int i = 0;

for (p = str2count; *p; p++)

i++;

printf("%d\n", i);

return 0;

}

We can compile this two ways on sf06:

gcc countstring.c        compile for sf06 itself

./a.out                  run it

mips-gcc -S countstring.c        create countstring.s in MIPS assembler (once PATH is set properly; see handout)

But to read the generated code, we need to understand the use of the stack, so let's look at a simpler example first:

Look at handout MIPS Cross Compiler Setup and First Example

The cross-compiler and other related tools are in /usr/local/bin/mips.