CS341 Notes on Program Images: SAPC and UNIX

 

Example program: $pcex/test.c, which shows the use of printf, scanf, etc.  To build it for the SAPC:

On UNIX:

mkdir test1

cd test1

cp $pcex/makefile $pcex/test.c .

make C=test

Now look for test.lnx, the SAPC executable file. It is a file in the UNIX filesystem that contains x86 instructions for the program and related information, including the symbol table.   It is generated by i386-gcc, the cross-gcc-compiler and i386-ld, the cross-loader.  These cross-tools are UNIX programs running on the Sparc CPU of the (Sun) UNIX system, but generating x86 code in binary formats compatible with Linux tools. (However, the resulting executable cannot run on ordinary Linux because it’s a standalone program, not a Linux user program.) The suffix .lnx is there to remind us of its Linux software roots.

 

The symbol table contains names and values for all the important places (addresses) in the program: all the function names (code addresses), external variable names (data addresses), and some overhead symbols as well that we’ll ignore.  The symbol table inside test.lnx is in binary format.  However, the i386-nm tool can dump out the symbol table into a human-readable text file, and this is done as part of the build run by “make C=test” above, producing test.syms, another file in the UNIX filesystem.

 

We can find where printf resides in memory by looking in test.syms for the line about printf:

grep printf test.syms    or    i386-nm test.lnx | grep printf

Either way we see:

00100990 T _printf

00100990 t printf.opc

 

The entry for _printf is the actual function-name symbol.  The other one is an overhead symbol for the object file (printf.opc) from printf.c, the implementation file for printf.  We see that printf is at address 0x100990 in the SAPC program.  Similarly we find that _main is at 0x1002d4.   Both of these are just a little over 0x100100, the start address for the program, which is also the lowest address of the program. The T in the middle column identifies this as a “text” address, i.e., a code address, and because it’s capitalized, it is an external (or “global”) address, one that can be accessed from outside its own .c file.

 

Similarly we find that _msg is at address 0x101ae8, somewhat above the code addresses, and is marked D for an external/global data address.  This is no accident.  All the code is placed together in the first stretch of memory given to the program, followed by all the external (and static) data.  Some of the code comes directly from the program, other code from library functions like printf, still other code from functions called from library functions, and so on.  In a standalone program like this, the code has to do everything, even work directly with hardware, because there’s no operating system (OS).

 

Why are the SAPC program addresses so high?  Why is the start address specified (by the i386-ld command) at 0x100100, a little above 0x100000 = 2**20 = 1M?  Because we have 4M of usable memory on the SAPC, but the first 1M of it contains hardware-defined memory areas such as video memory and BIOS.  See SAPC Programming Environment for more information on this.  We have 3M of clear usable memory starting at 0x100000, so we chose 0x100100 as an easy-to-remember place to start the program.  We can call this 3M area the “user memory area,” as opposed to the “system memory area” of the first 1M.  There is some clear usable memory in the first 1M, and this is where the code and data for Tutor reside.

 

A simple program has code, data, and a program stack.  On the SAPC, the stack grows down from the top of the user memory area, the last byte address of which is 0x3fffff.  All the local (automatic) variables are held on the stack, along with function return addresses and other information.

 

Thus we can make a rough picture of a program running on an SAPC: code starting at 0x100100, data starting somewhere above that after the code, and stack growing down from 0x3fffff.

 

Using Tutor to examine a SAPC program

 

Use mtip (a UNIX program) to download the program into an online SAPC.  It connects your input/output to one of the online SAPCs, and then just shuttles characters back and forth except for “escaped” commands such as ~r, ~d, and ~q, which mtip handles itself.  In the Tutor commands below, “md” is for memory display and “ms” for memory set.

 

mtip -f test.lnx

~r                 <- reset the SAPC if needed (takes 12 secs)

~d                 <- download test.lnx

Tutor> go 100100                 run program

 

Tutor> md 100100                 look at downloaded code

00100100    bc f0 ff 3f 00 bd 00 00 00 00 e8 01 00 00 00 cc ...?............

Tutor> md 101ae8                  look at msg address

00101ae8    74 65 73 74 69 6e 67 00 00 00 00 00 00 00 00 00 testing.........

Tutor> ms 101ae8 70        set byte at 101ae8 to hex 70

Tutor> md 101ae8

00101ae8    70 65 73 74 69 6e 67 00 00 00 00 00 00 00 00 00 pesting.........

 

Note how Tutor gives the hex values of 16 bytes one by one and then shows the ASCII character for each byte if it is printable, or just “.” if not.  Thus from the above we can see that the byte at address 0x100101 is f0 and the byte at address 0x10010f is cc.  The only printable ASCII code in this set of 16 bytes is the fourth one, 0x3f = ‘?’.  The string “testing” resides at 0x101ae8, so at 0x101ae8 itself we see 0x74 = ‘t’, at 0x101ae9 we see 0x65 = ‘e’, and so on.  After the ms command, the byte at 0x101ae8 is set to 0x70 = ‘p’.

 

The x86 CPU (and almost all other current CPUs as well) uses “byte addressing.”  Each byte of memory holds 8 bits and has its own address.  Note that 8 bits of data is neatly represented by 2 hex digits.  Each hex digit represents 4 bits, from 0 = 0000 binary up to f = 1111 binary.  If you’re rusty on this, write them all out—we’ll be using them a lot!

 

Building test.c as a UNIX program.

 

This is really easy.  Just “gcc test.c” and look for file a.out. If you want to use gdb, the debugger, add -g: “gcc -g test.c”.  The resulting a.out file is a UNIX executable containing Sparc instructions (for our Sun UNIX systems) which are entirely different from x86 instructions.  In fact Sparc assembly language is coded by very few programmers because it was never designed for direct use, only for compiler output, unlike the x86 assembly language.  In addition to the Sparc instructions, the executable file contains a symbol table, which we can look at by using “nm”, or “nm -n” to sort by address.  Note: don’t call this (or any) executable “test”.  There is a UNIX program called test (see “man test”) that is extremely easy to confuse with your own program, and the mix-up can drive you crazy.

 

Because the program can call on the OS to do all the i/o work, the program image itself can be smaller than its SAPC counterpart.  It just does an i/o request via a “system call” to the OS.  Each system call shows up as a special “trap” instruction; we’ll return to this at the very end of the term.  There are many advantages to using a real OS.  We are only avoiding using an OS so we can study the hardware.  A modern OS is so successful at separating programs from hardware that the hardware is effectively completely hidden.

 

We can dump the symbol table to file test.usyms with the UNIX command “nm -pnx a.out > test.usyms”.  This gives a file in the same format as the SAPC test.syms, with hex addresses in sorted order (nm options: -p for simple output, -n for sorted, -x for hex).

 

Using gdb to examine a UNIX program (test.c)

 

Script started on Fri Aug 30 20:31:05 2002

warning: could not update utmp entry      <- ignore this warning

 

ulab(1)% gdb a.out

GNU gdb 5.0

Copyright 2000 Free Software Foundation, Inc.

GDB is free software, covered by the GNU General Public License, and you are

welcome to change it and/or distribute copies of it under certain conditions.

Type "show copying" to see the conditions.

There is absolutely no warranty for GDB.  Type "show warranty" for details.

This GDB was configured as "sparc-sun-solaris2.6"...

(gdb) p msg

$1 = "testing"

(gdb) p/x msg

$2 = {0x74, 0x65, 0x73, 0x74, 0x69, 0x6e, 0x67, 0x0}

(gdb) p &msg[0]

$3 = 0x20d80 "testing"

(gdb) p main

$4 = {int ()} 0x1076c <main>

(gdb) p printf

$5 = {<text variable, no debug info>} 0x20c88 <printf>

(gdb) b main

Breakpoint 1 at 0x10770: file test.c, line 10.

(gdb) r

Starting program: /home/eoneil/ulab/a.out

 

 

Breakpoint 1, main () at test.c:10

10        a = 3;

(gdb) p msg

$6 = "testing"

(gdb) p &msg[0]

$7 = 0x20d80 "testing"

(gdb) set var msg[0] = 0x70   <- changing a memory location

(gdb) p msg

$8 = "pesting"

(gdb) p/x msg

$9 = {0x70, 0x65, 0x73, 0x74, 0x69, 0x6e, 0x67, 0x0}

(gdb) x/c 0x20d80

0x20d80 <msg>:  112 'p'

(gdb) x/c 0x20d81

0x20d81 <msg+1>: 101 'e'

(gdb) n

11        printf("\nWelcome to C on this machine, whatever it is\n\n");

(gdb) n

 

 

Welcome to C on this machine, whatever it is

 

 

12        printf("%s, %s, %d, %d, %d...\n",msg,msg,1,2,a);

(gdb) p a

$10 = 3

(gdb) p &a

$11 = (int *) 0xffbeef9c    <- address on stack (very high, near 0xffffffff)

(gdb) x $11    <- using $11 abbreviation from last line

0xffbeef9c:     0 '\000'    <- defaults to c format and the previous size (one byte)

(gdb) x/x $11

0xffbeef9c:     0x00        <- now hex, but still one byte; how do we get 32 bits displayed?

(gdb) help x

Examine memory: x/FMT ADDRESS.

ADDRESS is an expression for the memory address to examine.

FMT is a repeat count followed by a format letter and a size letter.

Format letters are o(octal), x(hex), d(decimal), u(unsigned decimal),

  t(binary), f(float), a(address), i(instruction), c(char) and s(string).

Size letters are b(byte), h(halfword), w(word), g(giant, 8 bytes).

The specified number of objects of the specified size are printed

according to the format.

 

 

Defaults for format and size letters are those previously used.

Default count is 1.  Default address is following last thing printed

with this command or "print".

(gdb) x/wx $11    <- use w for 32-bit display

0xffbeef9c:     0x00000003        <- value of a, on stack

(gdb) q

The program is running.  Exit anyway? (y or n) y

ulab(2)% exit

script done on Fri Aug 30 20:38:33 2002